Task
Defines the GNN learning task. It inherits from GNNTable, so it has the same methods. There are two Task classes: NodeTask and LinkTask.
Each task type also comes with a default evaluation metric, which is listed in the table below. You can change the evaluation metric if you prefer to use a different one for your specific use case, as shown in the NodeTask example and the LinkTask example. You can find all the evaluation metrics supported in the Evaluation Metric page.
TaskType
Currently there are six types of tasks supported:
| TaskType | Task Class | Description | Default Evaluation Metric |
|---|---|---|---|
| TaskType.BINARY_CLASSIFICATION | NodeTask | Describes a binary classification task. Labels can be of any Column DType, but must take only two distinct values. | roc_auc |
| TaskType.MULTICLASS_CLASSIFICATION | NodeTask | Similar to binary classification, but the label column can have more than two distinct classes. | macro_f1 |
| TaskType.MULTILABEL_CLASSIFICATION | NodeTask | In a multi-label classification problem, each instance can belong to one or more out of N total classes. Labels can be of any Column DType and there should be a separate row for each different label of an instance. | multilabel_auroc_macro |
| TaskType.REGRESSION | NodeTask | Used for regression problems where labels are expected to be either floats or integers. | rmse |
| TaskType.LINK_PREDICTION | LinkTask | A classic link prediction problem where we aim to identify the top-k most similar destination entities given a source entity. Each row represents a single source-target pair, with the source_entity_column specifying the source and the target_entity_column specifying a single destination entity. | link_prediction_map with eval_at_k=12 |
| TaskType.REPEATED_LINK_PREDICTION | LinkTask | A variation of link prediction, reframed as a node classification problem, where we aim to identify the top-k destination entities that a given source entity will visit again. Each row represents a single source-target pair, with the source_entity_column specifying the source and the target_entity_column specifying a single destination entity. | link_prediction_map with eval_at_k=12 |
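The one-row-per-label layout described for TaskType.MULTILABEL_CLASSIFICATION can be illustrated with a minimal sketch in plain Python. The sample rows and the helper group_labels are hypothetical and are not part of the library; the snippet only shows how several rows for the same instance combine into that instance's label set.

```python
from collections import defaultdict

# Hypothetical task rows: one (instance id, label) pair per row.
# An instance with several labels appears on several rows.
rows = [
    ("article_1", "sports"),
    ("article_1", "politics"),
    ("article_2", "sports"),
    ("article_3", "science"),
]

def group_labels(rows):
    """Collect the set of labels attached to each instance."""
    labels = defaultdict(set)
    for instance_id, label in rows:
        labels[instance_id].add(label)
    return dict(labels)

labels_per_instance = group_labels(rows)

# N, the total number of distinct classes across all instances.
num_classes = len({label for _, label in rows})
```

Here article_1 carries two labels because it appears on two rows, while the other instances carry one label each.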
Time columns
When creating a GNNTable object or defining a task, you have the option to specify one of the dataset’s columns as a time column. Time columns are essential for temporal tasks, ensuring that the model respects the chronological order of events and avoids information leakage.
To understand their role, let’s consider a forecasting example.
Suppose we want to train a model to predict the sales of a store on a given date. A sample task table for this regression task might look like:
| STORE_ID | DATE | SALES | PROMO_ACTIVE |
|---|---|---|---|
| 123 | 10/12/2022 | 500 | 0 |
| 123 | 11/12/2022 | 600 | 1 |
| 456 | 10/12/2022 | 500 | 0 |
| 456 | 11/12/2022 | 500 | 1 |
This table contains sales data for two stores over two days. When training a forecasting model, it’s critical that the model only has access to data from dates prior to the one it is predicting. If future information is included during training, it will lead to information leakage and overfitting.
By marking a column (e.g., DATE) as a time column, the learning engine enforces this temporal constraint. It will only use data:
- strictly before the prediction date (<), or
- up to and including the prediction date (<=),
depending on how the task is configured. The choice between < and <= depends on the specific requirements of your problem. When defining the task, you can control this behavior by setting the current_time parameter in the NodeTask or LinkTask definition:
- current_time = True → use <= (data up to and including the prediction date).
- current_time = False → use < (data strictly before the prediction date).
Suppose you’re building a model to predict product sales at multiple stores, and you want a prediction for each store on 2022-11-12. Depending on your use case, the choice of current_time changes the model’s temporal filtering:
- Forecasting (strictly past data)
You want to simulate real forecasting conditions, where future or same-day information (like promotions running today) is not yet known. Here, current_time should be set to False and the model can only use data before 2022-11-12 to make predictions. This ensures there’s no information leakage — it mimics how you’d forecast in production, using only historical data.
- Simulation or “what-if” analysis (include current day)
Now suppose you’re running a simulation to estimate how well your model fits current conditions — for example, you know that the promotion is already active today, and you want to use that information. In this case, current_time should be set to True and the model can use all data up to and including 2022-11-12, so features like PROMO_ACTIVE on that day are available. This setting is useful when your task represents real-time inference (you already have today’s context) rather than pure forecasting.
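The effect of the two settings can be sketched in plain Python. This only illustrates the < versus <= comparison described above and is not the learning engine's implementation; the function visible_rows and the sample feature rows (with dates written in ISO order) are hypothetical.

```python
from datetime import date

# Hypothetical feature rows for one store: (date, promo_active).
history = [
    (date(2022, 11, 10), 0),
    (date(2022, 11, 11), 0),
    (date(2022, 11, 12), 1),
    (date(2022, 11, 13), 1),
]

def visible_rows(rows, prediction_date, current_time):
    """Return the rows a model may use when predicting for prediction_date.

    current_time=True  keeps rows with date <= prediction_date.
    current_time=False keeps rows with date <  prediction_date.
    """
    if current_time:
        return [r for r in rows if r[0] <= prediction_date]
    return [r for r in rows if r[0] < prediction_date]

target = date(2022, 11, 12)

# Forecasting: only rows dated before 2022-11-12 are visible.
forecasting = visible_rows(history, target, current_time=False)

# What-if analysis: the row dated 2022-11-12 is visible as well,
# so the promo flag for that day can be used.
what_if = visible_rows(history, target, current_time=True)
```

In both cases the row dated 2022-11-13 is excluded, because it lies after the prediction date.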
⚠️ Important Note: Only one time column is allowed per table and per task.
Evaluation Metric
Each task type has a default evaluation metric that is automatically used during training. This metric is important because it drives early stopping: training will stop when the evaluation metric no longer improves, helping to prevent overfitting and unnecessary computation.
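The idea of metric-driven early stopping can be sketched as follows. The exact stopping rule used by the trainer is not specified on this page, so the patience parameter, the higher_is_better flag, and the function name early_stopping_epoch are assumptions made purely for illustration.

```python
def early_stopping_epoch(scores, patience=3, higher_is_better=True):
    """Return the index of the epoch at which training would stop.

    scores: validation metric value recorded after each epoch.
    Training stops once `patience` consecutive epochs fail to improve
    on the best score seen so far; otherwise it runs through all epochs.
    """
    best = None
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores):
        improved = (
            best is None
            or (score > best if higher_is_better else score < best)
        )
        if improved:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(scores) - 1

# A metric such as roc_auc, where higher is better.
auc_history = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.79]
stop_auc = early_stopping_epoch(auc_history, patience=3)

# A metric such as rmse, where lower is better.
rmse_history = [12.0, 10.5, 10.6, 10.7]
stop_rmse = early_stopping_epoch(rmse_history, patience=2, higher_is_better=False)
```

The direction of improvement depends on the metric: an error metric like rmse improves when it decreases, while a score like roc_auc improves when it increases.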
While each task has a sensible default shown in the table above, you can also override the metric with a different one if it better suits your use case. See the Evaluation Metric page for the full list of supported metrics and how to configure them.