LinkTask

Defines a link prediction GNN learning task.

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| connector | SnowflakeConnector | The connector object used for sending requests to the GNN engine. | No |
| name | str | The name of the task. It can be anything that describes the task at hand, but it must comply with Snowflake object identifier rules. | No |
| task_data_source | Dict | A dictionary mapping split names to table paths. For training, the "train" and "validation" keys are required, while "test" is optional. For inference-only workflows, only "test" is required. Each value is a fully qualified Snowflake table name in Database.Schema.Table format. Multiple splits may reference the same table. | No |
| source_entity_column | ForeignKey | A foreign key that specifies the name of the source entity column in the training, validation, and test tables, and references the corresponding source entity GNNTable and its column. The column identified by this foreign key represents the source node in the task. The node IDs contained in the foreign key’s column_name must match, or be a subset of, the values in the column specified by the foreign key’s link_to attribute. | No |
| target_entity_column | ForeignKey | A foreign key that specifies the name of the target entity column in the training, validation, and test tables, and references the corresponding target entity GNNTable and its column. The column identified by this foreign key represents the target node in the task. The node IDs contained in the foreign key’s column_name must match, or be a subset of, the values in the column specified by the foreign key’s link_to attribute. (Users can choose not to provide targets for the test table.) | No |
| task_type | TaskType | The type of the link task. It can be either TaskType.LINK_PREDICTION or TaskType.REPEATED_LINK_PREDICTION. | No |
| time_column | str | If the dataset includes a time-based dimension, you can specify a timestamp column to incorporate temporal dependencies. Only one time column is supported. For details, see the Time Columns section. | Yes |
| evaluation_metric | EvaluationMetric | The evaluation metric to optimize for. | Yes |
| current_time | bool | If set to False, the current time of the task table is reduced by one time unit. This is useful when rows in the task table should not see values from the database tables at the same timestamp. | Yes |
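The split rules for task_data_source can be checked locally before a task is created. The following is a minimal sketch and not part of relationalai_gnns: the helper check_task_data_source and its inference_only flag are hypothetical, and the regular expression is a simplification that only handles unquoted identifiers without embedded dots.

```python
import re

# Hypothetical helper, not part of relationalai_gnns.
# Simplified pattern: three dot-separated parts, no whitespace, no quoted identifiers.
_TABLE_PATH = re.compile(r"^[^.\s]+\.[^.\s]+\.[^.\s]+$")


def check_task_data_source(task_data_source, inference_only=False):
    """Return a list of problems; an empty list means the dict follows the split rules."""
    problems = []
    # Training needs "train" and "validation"; inference-only needs just "test".
    required = {"test"} if inference_only else {"train", "validation"}
    for key in sorted(required - set(task_data_source)):
        problems.append(f"missing required split: {key}")
    for key, path in task_data_source.items():
        if not _TABLE_PATH.match(path):
            problems.append(f"{key}: {path!r} is not in Database.Schema.Table format")
    return problems


print(check_task_data_source({"train": "DATABASE.SCHEMA.TRAIN"}))
```

An empty result only means the dictionary has the expected shape. It does not confirm that the tables exist in Snowflake.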

As shown in the figure, this dataset contains three tables:

  • customers with candidate key customer_id
  • articles (products) with candidate key article_id
  • transactions with two foreign keys: customer_id linking to the customers table, and article_id linking to the articles table, as well as a time column t_dat.

Each row in the transactions table records a specific customer (customer_id) buying a specific product (article_id) on a specific date (t_dat).

Our task (purchase_task) is a recommendation task (link_prediction): given a customer and a date, we want to recommend articles the customer is likely to purchase. The time column is required so that the model does not see future transactions of a customer.
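To make the temporal constraint concrete, here is a small sketch of which transactions are visible for a given customer and date. It is an illustration with invented data, not the library's implementation: visible_history and its include_same_timestamp flag are hypothetical, with the flag loosely mirroring the idea behind current_time.

```python
from datetime import date

# Toy rows shaped like the transactions table: (customer_id, article_id, t_dat).
# All values are invented for illustration.
transactions = [
    ("c1", "a1", date(2020, 9, 1)),
    ("c1", "a2", date(2020, 9, 8)),
    ("c1", "a3", date(2020, 9, 15)),
    ("c2", "a1", date(2020, 9, 8)),
]


def visible_history(rows, customer_id, as_of, include_same_timestamp=True):
    """Return the customer's transactions that may be seen for a task row dated `as_of`.

    `include_same_timestamp` is a hypothetical flag: when False, rows dated
    exactly `as_of` are hidden as well, so only strictly earlier rows remain.
    """
    if include_same_timestamp:
        return [r for r in rows if r[0] == customer_id and r[2] <= as_of]
    return [r for r in rows if r[0] == customer_id and r[2] < as_of]


history = visible_history(transactions, "c1", date(2020, 9, 8))
print([article for _, article, _ in history])  # ['a1', 'a2']
```

The purchase of a3 on 2020-09-15 is never returned for a task row dated 2020-09-08, which is the leakage the time column is meant to prevent.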

In this example, the source_entity_column links to the customers table, since we are making predictions about customers, and the target_entity_column links to the articles table, since we are predicting which articles the customers are likely to purchase next.

[Figure: link_task_schema]

In this case, the task can be defined as follows:

from relationalai_gnns import LinkTask, TaskType, ForeignKey

# The task_data_source maps each dataset split to the corresponding table name.
link_task = LinkTask(
    connector=connector,
    name="recommendation_task",
    task_data_source={
        "train": "DATABASE.SCHEMA.TRAIN",
        "test": "DATABASE.SCHEMA.TEST",
        "validation": "DATABASE.SCHEMA.VALIDATION"
    },
    source_entity_column=ForeignKey(column_name='customer_id', link_to='customers.customer_id'),
    target_entity_column=ForeignKey(column_name='article_id', link_to='articles.article_id'),
    time_column="timestamp",
    task_type=TaskType.LINK_PREDICTION
)
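
The parameter table requires the node IDs in each foreign key's column_name to match, or be a subset of, the values in the column named by link_to. A quick local check on sampled values might look like the sketch below. The helper unmatched_ids and the sample lists are hypothetical and do not come from relationalai_gnns. In practice the values would be read from the task and entity tables.

```python
def unmatched_ids(task_column_values, entity_column_values):
    """Return the IDs in the task column that are missing from the referenced entity column."""
    return sorted(set(task_column_values) - set(entity_column_values))


# Invented sample values standing in for customers.customer_id and the
# customer_id column of a task table.
customers_customer_id = ["c1", "c2", "c3"]
train_customer_id = ["c1", "c3", "c3"]
bad_customer_id = ["c1", "c9"]

print(unmatched_ids(train_customer_id, customers_customer_id))  # []
print(unmatched_ids(bad_customer_id, customers_customer_id))    # ['c9']
```

An empty list means the subset requirement holds for the sampled values. Any returned IDs point to rows that reference a node absent from the entity table.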

Repeated Link Prediction Task Setting Evaluation Metric

from relationalai_gnns import LinkTask, TaskType, ForeignKey
from relationalai_gnns import EvaluationMetric

rep_link_task = LinkTask(
    connector=connector,
    name="my_link_task",
    task_data_source={
        "train": "DATABASE.SCHEMA.TRAIN",
        "test": "DATABASE.SCHEMA.TEST",
        "validation": "DATABASE.SCHEMA.VALIDATION"
    },
    source_entity_column=ForeignKey(column_name='source_ids', link_to='TableWithCKey1.Id1'),
    target_entity_column=ForeignKey(column_name='target_ids', link_to='TableWithCKey2.Id2'),
    task_type=TaskType.REPEATED_LINK_PREDICTION,
    evaluation_metric=EvaluationMetric(name="link_prediction_map", eval_at_k=12)
)
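
The metric name link_prediction_map together with eval_at_k=12 suggests mean average precision at k. The exact formula used by the library is not stated here, so the sketch below shows one common definition of MAP@k as an assumption. The function names are hypothetical and the data is invented.

```python
def average_precision_at_k(predicted, actual, k=12):
    """Average precision of one ranked prediction list, cut off at k."""
    actual_set = set(actual)
    if not actual_set:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(predicted[:k], start=1):
        # Count each relevant item once, at the rank where it first appears.
        if item in actual_set and item not in seen:
            seen.add(item)
            hits += 1
            score += hits / rank
    return score / min(len(actual_set), k)


def mean_average_precision_at_k(all_predicted, all_actual, k=12):
    """Mean of the per-source average precision values."""
    scores = [average_precision_at_k(p, a, k) for p, a in zip(all_predicted, all_actual)]
    return sum(scores) / len(scores) if scores else 0.0


# Two source nodes: the first has hits at ranks 1 and 3, the second has none.
predictions = [["a1", "a2", "a3"], ["a2"]]
ground_truth = [["a1", "a3"], ["a1"]]
print(mean_average_precision_at_k(predictions, ground_truth, k=12))
```

For the first source the average precision is (1/1 + 2/3) / 2, and for the second it is 0, so the mean is half of the first value.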

If you only need to run inference (no training), you can create a task with just a "test" split:

from relationalai_gnns import LinkTask, TaskType, ForeignKey

link_task = LinkTask(
    connector=connector,
    name="inference_recommendation_task",
    task_data_source={
        "test": "DATABASE.SCHEMA.TEST"
    },
    source_entity_column=ForeignKey(column_name='source_ids', link_to='TableWithCKey1.Id1'),
    time_column="timestamp",
    task_type=TaskType.LINK_PREDICTION
)

Inference-only tasks can be used with trainer.predict() (by passing the task inside a new Dataset) but cannot be passed to trainer.fit() or trainer.fit_predict().

LinkTask inherits from GNNTable, so it has the same methods. It additionally provides a show_task() method:

Prints the task metadata schema and task details.

rep_link_task.show_task()

Retrieves source_entity_column. This property is read-only and cannot be set after initialization.

Retrieves target_entity_column. This property is read-only and cannot be set after initialization.

Retrieves current_time. This property is read-only and cannot be set after initialization.
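
Read-only behavior of this kind is commonly implemented in Python with a property that has no setter. The class below is a generic sketch of that pattern under that assumption. ReadOnlyTaskSketch is a hypothetical stand-in and says nothing about how LinkTask is actually implemented.

```python
class ReadOnlyTaskSketch:
    """Hypothetical stand-in showing a property that is fixed at initialization."""

    def __init__(self, source_entity_column):
        # The value is stored once on a private attribute.
        self._source_entity_column = source_entity_column

    @property
    def source_entity_column(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._source_entity_column


task = ReadOnlyTaskSketch(source_entity_column="customer_id")
print(task.source_entity_column)

try:
    task.source_entity_column = "other_id"
except AttributeError:
    print("source_entity_column is read-only")
```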