
NodeTask

Defines a node classification or regression GNN learning task.

| Name | Type | Description | Optional |
| --- | --- | --- | --- |
| connector | SnowflakeConnector | The connector object used for sending requests to the GNN engine. | No |
| name | str | The name of the task; it can be anything that describes the task at hand. The name must comply with Snowflake object identifier rules. | No |
| task_data_source | Dict | A dictionary mapping split names to table paths. For training, the "train" and "validation" keys are required, while "test" is optional. For inference-only workflows, only "test" is required. Each value is a fully qualified Snowflake table name in Database.Schema.Table format. Multiple splits may reference the same table. | No |
| label_column | str | The name of the column in the train, validation, and (optionally) test tables that contains the labels, i.e. the values used to train the model. Labels may be omitted from the test data. | No |
| target_entity_column | ForeignKey | A foreign key that specifies the name of the target entity column in the training, validation, and test tables, and references the corresponding target entity GNNTable and its column. The column identified by this foreign key represents the target nodes in the task for which the model will make predictions. The node IDs contained in the foreign key’s column_name must match, or be a subset of, the values in the column specified by the foreign key’s link_to attribute. | No |
| task_type | TaskType | The type of the node task. One of TaskType.BINARY_CLASSIFICATION, TaskType.MULTICLASS_CLASSIFICATION, TaskType.MULTILABEL_CLASSIFICATION, or TaskType.REGRESSION. | No |
| time_column | str | If the dataset includes a time-based dimension, you can specify a timestamp column to incorporate temporal dependencies. Only one time column is supported. For details, see the Time Columns section. | Yes |
| evaluation_metric | EvaluationMetric | The evaluation metric to optimize for. | Yes |
| current_time | bool | If set to False, the current time of the task table is reduced by one time unit. Useful when the task table does not need to see the values from the database tables at the same timestamp. | Yes |
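The split rules for task_data_source can be checked before a task is constructed. The following is a minimal sketch in plain Python: check_task_data_source is a hypothetical helper, not part of relationalai_gnns, and it assumes unquoted identifiers (a quoted Snowflake identifier containing dots would be rejected by this simple pattern).

```python
import re

# Hypothetical helper, not part of relationalai_gnns. It only mirrors the
# split rules described for task_data_source.
# Assumption: identifiers are unquoted, so a path is three dot-separated parts.
_TABLE_PATH = re.compile(r"^[^.\s]+\.[^.\s]+\.[^.\s]+$")


def check_task_data_source(task_data_source):
    """Return "training" or "inference" for a valid mapping, else raise ValueError."""
    allowed = {"train", "validation", "test"}
    unknown = set(task_data_source) - allowed
    if unknown:
        raise ValueError(f"unknown split names: {sorted(unknown)}")

    for split, path in task_data_source.items():
        # Each value is expected to look like Database.Schema.Table.
        if not _TABLE_PATH.match(path):
            raise ValueError(f"{split!r} is not a Database.Schema.Table path: {path!r}")

    splits = set(task_data_source)
    if {"train", "validation"} <= splits:
        return "training"  # "test" may or may not be present
    if splits == {"test"}:
        return "inference"
    raise ValueError('need both "train" and "validation", or only "test"')
```

For example, check_task_data_source({"test": "DATABASE.SCHEMA.TEST"}) returns "inference", while a mapping with only a "train" key raises ValueError.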

As shown in the figure, this dataset contains three tables:

  • customers with candidate key customer_id
  • articles (products) with candidate key article_id
  • transactions with two foreign keys: customer_id linking to the customers table, and article_id linking to the articles table, as well as a time column t_dat.

Each row in the transactions table records that a specific customer (customer_id) bought a specific product (article_id) on a specific date (t_dat).

Our task (churn_task) is a node-level binary classification task (binary_classification): given a customer and a date, we want to predict whether they are likely to churn in the next month. The time column is required so that the model does not see future transactions when making predictions.
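To make the role of the time column concrete, here is a small sketch in plain Python of how a churn label and the visible history could be derived for one prediction date. The churn definition (no purchase in the following 30 days), the inclusive cutoff, the helper names, and the toy data are all assumptions for illustration; they are not taken from the library.

```python
from datetime import date, timedelta

# Toy transactions as (customer_id, t_dat). Values are made up for illustration.
transactions = [
    ("c1", date(2020, 1, 5)),
    ("c1", date(2020, 2, 10)),
    ("c2", date(2020, 1, 20)),
]


def churn_label(customer_id, as_of, transactions, horizon_days=30):
    """Return 1 if the customer has no transaction in (as_of, as_of + horizon], else 0."""
    window_end = as_of + timedelta(days=horizon_days)
    bought_again = any(
        cid == customer_id and as_of < t_dat <= window_end
        for cid, t_dat in transactions
    )
    return 0 if bought_again else 1


def visible_history(as_of, transactions):
    """Transactions usable for a prediction made at as_of (nothing later)."""
    # Assumption: rows dated exactly as_of are visible (inclusive cutoff).
    return [(cid, t_dat) for cid, t_dat in transactions if t_dat <= as_of]


as_of = date(2020, 1, 31)
labels = {cid: churn_label(cid, as_of, transactions) for cid in ("c1", "c2")}
history = visible_history(as_of, transactions)
```

Here c1 buys again on 2020-02-10 and is labeled 0, c2 does not and is labeled 1, and the February transaction is excluded from the history available at prediction time.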

In this example, the nodes of interest are the rows of the customers table, since we are predicting churn for customers. Therefore, the target_entity_column links to the customers table.
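The requirement that the target IDs in a task table match, or be a subset of, the values in the linked entity column can be sanity-checked on sample data. This is a sketch only: unknown_target_ids is a hypothetical helper and the ID lists are invented.

```python
def unknown_target_ids(task_ids, entity_ids):
    """Return task-table IDs that do not appear in the referenced entity column."""
    return sorted(set(task_ids) - set(entity_ids))


# Made-up sample values for illustration.
customer_ids = ["c1", "c2", "c3"]   # values of customers.customer_id
train_targets = ["c1", "c3", "c3"]  # target column of a task table (repeats allowed)
bad_targets = ["c1", "c9"]          # "c9" has no matching customer

ok = unknown_target_ids(train_targets, customer_ids)
missing = unknown_target_ids(bad_targets, customer_ids)
```

An empty result means every target ID resolves to a row in the entity table; any ID returned would violate the subset requirement.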

[Figure: node_task_schema]

In this case, the task can be defined as follows:

from relationalai_gnns import NodeTask, TaskType, ForeignKey

# `connector` is an existing SnowflakeConnector instance.
# The task_data_source maps each dataset split to the corresponding table name.
binary_clf_task = NodeTask(
    connector=connector,
    name="churn_task",
    task_data_source={
        "train": "DATABASE.SCHEMA.TRAIN",
        "test": "DATABASE.SCHEMA.TEST",
        "validation": "DATABASE.SCHEMA.VALIDATION",
    },
    target_entity_column=ForeignKey(column_name="id", link_to="TableWithCKey.Id"),
    time_column="timestamp",
    label_column="label",
    task_type=TaskType.BINARY_CLASSIFICATION,
)

Regression Task and Evaluation Metric Setup

from relationalai_gnns import NodeTask, TaskType, ForeignKey
from relationalai_gnns import EvaluationMetric

regr_task = NodeTask(
    connector=connector,
    name="my_node_task",
    task_data_source={
        "train": "DATABASE.REGRESSION_SCHEMA.TRAIN",
        "test": "DATABASE.REGRESSION_SCHEMA.TEST",
        "validation": "DATABASE.REGRESSION_SCHEMA.VALIDATION",
    },
    target_entity_column=ForeignKey(column_name="target_id", link_to="TableWithCKey.Id"),
    label_column="value",
    task_type=TaskType.REGRESSION,
    evaluation_metric=EvaluationMetric(name="r2"),
)
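For reference, and assuming "r2" denotes the standard coefficient of determination (the source does not spell out the definition), the metric can be computed by hand as below. The r2_score function here is a plain-Python sketch, not the library's implementation.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    # Note: ss_tot is zero when all targets are equal; this sketch does not handle that case.
    return 1.0 - ss_res / ss_tot


perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicting the mean
```

A perfect fit scores 1.0, and always predicting the mean of the targets scores 0.0, so higher is better.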

If you only need to run inference (no training), you can create a task with just a "test" split:

from relationalai_gnns import NodeTask, TaskType, ForeignKey

inference_task = NodeTask(
    connector=connector,
    name="my_inference_task",
    task_data_source={
        "test": "DATABASE.SCHEMA.TEST",
    },
    target_entity_column=ForeignKey(
        column_name="id",
        link_to="TableWithCKey.Id",
    ),
    task_type=TaskType.BINARY_CLASSIFICATION,
)

Inference-only tasks can be used with trainer.predict() (by passing the task inside a new Dataset) but cannot be passed to trainer.fit() or trainer.fit_predict().

NodeTask inherits from GNNTable, so it exposes the same methods. It additionally provides a show_task() method:

Prints the task metadata schema and task details.

binary_clf_task.show_task()

Retrieves label_column. Read-only; it cannot be set after initialization.

Retrieves target_entity_column. Read-only; it cannot be set after initialization.

Retrieves current_time. Read-only; it cannot be set after initialization.
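As a generic illustration of what read-only means here, the sketch below shows how a Python property without a setter behaves. TaskSketch is a hypothetical stand-in class; how NodeTask actually implements these properties is not stated in this page, and the exception type it raises may differ.

```python
class TaskSketch:
    """Hypothetical stand-in, not the real NodeTask."""

    def __init__(self, label_column):
        self._label_column = label_column

    @property
    def label_column(self):
        # No setter is defined, so the attribute can be read but not assigned.
        return self._label_column


task = TaskSketch("label")
value = task.label_column

try:
    task.label_column = "other"
except AttributeError:
    rejected = True
else:
    rejected = False
```

In this sketch the assignment is rejected with AttributeError and the original value is preserved; to use a different label column, a new task would be created instead.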