LinkTask

Defines a link prediction GNN learning task.

Parameters

Name	Type	Description	Optional
`connector`	`SnowflakeConnector`	The connector object used for sending requests to the GNN engine.	No
`name`	`str`	The name of the task. Any descriptive name works, but it must comply with Snowflake object identifier rules.	No
`task_data_source`	`Dict`	A dictionary mapping split names to table paths. For training, `"train"` and `"validation"` keys are required, while `"test"` is optional. For inference-only workflows, only `"test"` is required. Each value is a fully qualified Snowflake table name in `Database.Schema.Table` format. Multiple splits may reference the same table.	No
`source_entity_column`	`ForeignKey`	A foreign key that specifies the name of the source entity column in the training, validation, and test tables, and references the corresponding source entity `GNNTable` and its column. The column identified by this foreign key represents the source node in the task. The node IDs contained in the foreign key’s `column_name` must match, or be a subset of, the values in the column specified by the foreign key’s `link_to` attribute.	No
`target_entity_column`	`ForeignKey`	A foreign key that specifies the name of the target entity column in the training, validation, and test tables, and references the corresponding target entity `GNNTable` and its column. The column identified by this foreign key represents the target node in the task. The node IDs contained in the foreign key’s `column_name` must match, or be a subset of, the values in the column specified by the foreign key’s `link_to` attribute. Target values may be omitted from the test table.	No
`task_type`	`TaskType`	The type of link task: either `TaskType.LINK_PREDICTION` or `TaskType.REPEATED_LINK_PREDICTION`.	No
`time_column`	`str`	If the dataset includes a time-based dimension, you can specify a timestamp column to incorporate temporal dependencies. Only one time column is supported. For details, see the Time Columns section.	Yes
`evaluation_metric`	`EvaluationMetric`	The evaluation metric to optimize for, given as an `EvaluationMetric` object.	Yes
`current_time`	`bool`	If set to `False`, the current time of the task table is reduced by one time unit. Use this when rows in the task table should not see values from the database tables that share the same timestamp.	Yes
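As a rough illustration of the split rules for `task_data_source`, the following sketch checks a dictionary locally before it is passed to `LinkTask`. The helper `check_task_data_source` is hypothetical and not part of relationalai_gnns; it only encodes the rules stated in the table, assumes unquoted identifiers (a quoted name containing a dot would be rejected), and does not contact Snowflake.

```python
import re

# Database.Schema.Table with unquoted identifiers (assumption: no dots inside names).
_FQN = re.compile(r"^[^.\s]+\.[^.\s]+\.[^.\s]+$")


def check_task_data_source(task_data_source: dict, training: bool) -> None:
    """Hypothetical helper: raise ValueError if the documented split rules are violated."""
    # Training needs "train" and "validation"; inference-only needs just "test".
    required = {"train", "validation"} if training else {"test"}
    missing = required - set(task_data_source)
    if missing:
        raise ValueError(f"missing required splits: {sorted(missing)}")
    # Every value must be a fully qualified table name. Reusing one table for
    # several splits is allowed, so no uniqueness check is done here.
    for split, table in task_data_source.items():
        if not _FQN.match(table):
            raise ValueError(f"{split!r} table {table!r} is not in Database.Schema.Table format")


check_task_data_source({"test": "DATABASE.SCHEMA.TEST"}, training=False)  # passes
```

Calling it with `training=True` on the same test-only dictionary raises a `ValueError`, mirroring the rule that training requires the `"train"` and `"validation"` splits.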

Examples

Link Prediction Task With Time

As shown in the figure, this dataset contains three tables:

customers with candidate key customer_id
articles (products) with candidate key article_id
transactions with two foreign keys: customer_id linking to the customers table, and article_id linking to the articles table, as well as a time column t_dat.

Each row in the transactions table records that a specific customer (customer_id) bought a specific product (article_id) on a specific date (t_dat).

Our task, recommendation_task, is a link prediction task: given a customer and a date, we want to recommend articles the customer is likely to purchase. The time column is required so that the model does not see a customer's future transactions.
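Conceptually, the time column means that a task row for a customer at a given date may only draw on that customer's transactions up to that date. The sketch below illustrates this rule, together with the effect of `current_time=False`, on a few hypothetical rows. It is not how relationalai_gnns implements temporal sampling internally, and the one-day time unit is an assumption.

```python
from datetime import date, timedelta

# Hypothetical transactions rows: (customer_id, article_id, t_dat).
transactions = [
    ("c1", "a1", date(2020, 9, 1)),
    ("c1", "a2", date(2020, 9, 10)),
    ("c1", "a3", date(2020, 9, 20)),
    ("c2", "a1", date(2020, 9, 5)),
]


def visible_history(rows, customer_id, as_of, current_time=True, unit=timedelta(days=1)):
    """Return the article_ids a customer bought up to the task row's timestamp.

    With current_time=False the cutoff moves back by one time unit (assumed to be
    one day here), so transactions stamped exactly at `as_of` are hidden.
    """
    cutoff = as_of if current_time else as_of - unit
    return [article for customer, article, t in rows if customer == customer_id and t <= cutoff]


print(visible_history(transactions, "c1", date(2020, 9, 10)))                      # ['a1', 'a2']
print(visible_history(transactions, "c1", date(2020, 9, 10), current_time=False))  # ['a1']
```

The purchase of a3 on 2020-09-20 never appears in a query dated 2020-09-10, which is exactly the leakage the time column is meant to prevent.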

In this example, the source_entity_column links to the customers table, since we are making predictions about customers, and the target_entity_column links to the articles table, since we are predicting which articles the customers are likely to purchase next.
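The parameter table requires that the IDs in each task column match, or be a subset of, the key column named by `link_to`. The following sketch shows that subset check on small, hypothetical in-memory stand-ins for the customers and articles tables; in practice the data lives in Snowflake, and `dangling_ids` is a hypothetical helper, not a library function.

```python
# Hypothetical stand-ins for the entity key columns and one task split.
customers = {"c1", "c2", "c3"}           # customers.customer_id (candidate key)
articles = {"a1", "a2", "a3", "a4"}      # articles.article_id (candidate key)
train_rows = [("c1", "a2"), ("c2", "a4"), ("c3", "a1")]  # (customer_id, article_id)


def dangling_ids(task_ids, entity_keys):
    """Return IDs in a task column that do not appear in the referenced key column."""
    return set(task_ids) - set(entity_keys)


# Source side: every customer_id in the task table must exist in customers.customer_id.
assert not dangling_ids((c for c, _ in train_rows), customers)
# Target side: every article_id in the task table must exist in articles.article_id.
assert not dangling_ids((a for _, a in train_rows), articles)
```

A non-empty result, for example an unknown customer ID, would indicate a task table that violates the foreign key requirement.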

[Figure: link_task_schema, the customers, articles, and transactions tables and their keys]

In this case, the task can be defined as follows:

from relationalai_gnns import LinkTask, TaskType, ForeignKey

# `connector` is an existing SnowflakeConnector instance.
# task_data_source maps each dataset split to a fully qualified table name.

link_task = LinkTask(
    connector=connector,
    name="recommendation_task",
    task_data_source={
        "train": "DATABASE.SCHEMA.TRAIN",
        "test": "DATABASE.SCHEMA.TEST",
        "validation": "DATABASE.SCHEMA.VALIDATION"
    },
    source_entity_column=ForeignKey(column_name='customer_id', link_to='customers.customer_id'),
    target_entity_column=ForeignKey(column_name='article_id', link_to='articles.article_id'),
    time_column="timestamp",
    task_type=TaskType.LINK_PREDICTION
)

Repeated Link Prediction Task with an Evaluation Metric

from relationalai_gnns import LinkTask, TaskType, ForeignKey
from relationalai_gnns import EvaluationMetric

rep_link_task = LinkTask(
    connector=connector,
    name="my_link_task",
    task_data_source={
        "train": "DATABASE.SCHEMA.TRAIN",
        "test": "DATABASE.SCHEMA.TEST",
        "validation": "DATABASE.SCHEMA.VALIDATION"
    },
    source_entity_column=ForeignKey(column_name='source_ids', link_to='TableWithCKey1.Id1'),
    target_entity_column=ForeignKey(column_name='target_ids', link_to='TableWithCKey2.Id2'),
    task_type=TaskType.REPEATED_LINK_PREDICTION,
    evaluation_metric=EvaluationMetric(name="link_prediction_map", eval_at_k=12)
)
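The example optimizes for `link_prediction_map` with `eval_at_k=12`, which suggests mean average precision over the top 12 ranked targets. The sketch below uses one common definition of MAP@k, in which each source entity's precision is normalized by `min(number of relevant items, k)`. The library's exact definition (for example, how sources with no ground-truth targets are handled) may differ, so treat this as an illustration of the metric rather than its implementation.

```python
def average_precision_at_k(actual, predicted, k=12):
    """AP@k: sum of precision@i at each rank i <= k holding a new relevant item,
    divided by min(len(actual), k)."""
    if not actual:
        return 0.0  # assumption: sources without ground truth score 0
    relevant = set(actual)
    hits, score, seen = 0, 0.0, set()
    for i, item in enumerate(predicted[:k], start=1):
        if item in relevant and item not in seen:
            hits += 1
            score += hits / i
            seen.add(item)
    return score / min(len(relevant), k)


def map_at_k(actuals, predictions, k=12):
    """MAP@k: AP@k averaged over all source entities (e.g. customers)."""
    if not actuals:
        return 0.0
    return sum(average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)) / len(actuals)


print(round(average_precision_at_k(["a1", "a3"], ["a1", "a2", "a3"]), 4))  # 0.8333
```

Ranking a relevant target higher raises the score, which is why MAP@k suits recommendation-style link tasks where only the top of the list is shown.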

Inference-Only Task

If you only need to run inference (no training), you can create a task with just a "test" split:

from relationalai_gnns import LinkTask, TaskType, ForeignKey

link_task = LinkTask(
    connector=connector,
    name="inference_recommendation_task",
    task_data_source={
        "test": "DATABASE.SCHEMA.TEST"
    },
    source_entity_column=ForeignKey(column_name='source_ids', link_to='TableWithCKey1.Id1'),
    target_entity_column=ForeignKey(column_name='target_ids', link_to='TableWithCKey2.Id2'),
    time_column="timestamp",
    task_type=TaskType.LINK_PREDICTION
)

Inference-only tasks can be used with trainer.predict() (by wrapping the task in a new Dataset), but they cannot be passed to trainer.fit() or trainer.fit_predict(), because training requires the "train" and "validation" splits.

Methods

LinkTask inherits from GNNTable, so it provides the same methods. It additionally provides a show_task() method:

.show_task()

Prints the task metadata schema and task details.

Example

rep_link_task.show_task()

Attributes

.source_entity_column

Returns the task's source_entity_column foreign key. This attribute is read-only and cannot be changed after initialization.

.target_entity_column

Returns the task's target_entity_column foreign key. This attribute is read-only and cannot be changed after initialization.

.current_time

Returns the task's current_time setting. This attribute is read-only and cannot be changed after initialization.
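In plain Python, attributes that are set once in the constructor and cannot be reassigned are typically exposed as getter-only properties. The class below is a hypothetical illustration of that pattern using stand-in types (`ForeignKeySpec`, `ReadOnlyTaskSketch`); it is not the actual LinkTask implementation, which may enforce read-only access differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKeySpec:
    """Hypothetical stand-in for ForeignKey: a task-table column and the key it links to."""
    column_name: str
    link_to: str


class ReadOnlyTaskSketch:
    """Illustrative only: values are fixed in __init__ and exposed via getter-only properties."""

    def __init__(self, source_entity_column, target_entity_column, current_time):
        self._source_entity_column = source_entity_column
        self._target_entity_column = target_entity_column
        self._current_time = current_time

    @property
    def source_entity_column(self):
        return self._source_entity_column

    @property
    def target_entity_column(self):
        return self._target_entity_column

    @property
    def current_time(self):
        return self._current_time


task = ReadOnlyTaskSketch(
    ForeignKeySpec("customer_id", "customers.customer_id"),
    ForeignKeySpec("article_id", "articles.article_id"),
    current_time=True,
)
print(task.source_entity_column.column_name)  # customer_id
try:
    task.current_time = False  # no setter is defined, so assignment fails
except AttributeError:
    print("current_time is read-only")
```

Because no setter is defined, any attempt to reassign one of these properties raises `AttributeError`, while reading them works as usual.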