The RelationalAI Predictive Reasoner
Learn how to use the RelationalAI Predictive Reasoner to produce meaningful predictions and embeddings from your relational data, supporting decision-intelligence tasks such as identifying potential churn and detecting fraud. The Predictive Reasoner helps you turn data into actionable insight.
Create a Provider and a SnowflakeConnector
Use your Snowflake credentials to create a Provider. The Provider supports multiple authentication methods, with examples available here. The following example demonstrates password-based authentication locally and active_session authentication in a Snowflake notebook.
```python
from relationalai_gnns import Provider

# Password-based authentication (e.g. when running locally)
snowflake_config = {
    "user": "<SNOWFLAKE_USER>",
    "password": "<SNOWFLAKE_PASSWORD>",
    "account": "<SNOWFLAKE_ACCOUNT>",
    "role": "<SNOWFLAKE_ROLE_WITH_ACCESS_TO_DB>",
    "warehouse": "<SNOWFLAKE_WAREHOUSE>",
    "app_name": "RELATIONALAI",
    "auth_method": "password",
}
provider = Provider(**snowflake_config)
```

```python
from relationalai_gnns import Provider

# active_session authentication (e.g. in a Snowflake notebook)
snowflake_config = {
    "app_name": "RELATIONALAI",
    "auth_method": "active_session",
    "role": "<SNOWFLAKE_ROLE_WITH_ACCESS_TO_DB>",
}
provider = Provider(**snowflake_config)
```

To get your account name, run the following command in Snowflake:

```sql
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
```

To find the RelationalAI application name, list all applications with:

```sql
SHOW APPLICATIONS;
```

Use the provider to create a new reasoner under the name "<GNN_REASONER>":

```python
provider.create_gnn(
    name="<GNN_REASONER>",
    size="HIGHMEM_X64_S",
)
```

Available compute pool sizes are listed in the create_gnn() method documentation.
Created reasoners are automatically suspended after a period of inactivity, with a default autosuspension time of 1 hour. To modify this setting, see the create_gnn() documentation. To resume a suspended reasoner, refer to the resume_gnn() method.
The Provider can also be used to monitor reasoner status, delete unwanted reasoners, or list all available reasoners. See Provider for usage examples.
⚠️ Important Note:
When configuring the Provider, specify a role that has the required read and write access to the databases you plan to use.
Use the same Snowflake credentials and the reasoner name to create a SnowflakeConnector.
```python
from relationalai_gnns import SnowflakeConnector

connector = SnowflakeConnector(
    **snowflake_config,
    engine_name="<GNN_REASONER>",
)
```

Create a GNNTable from a Snowflake Table
Once you have a SnowflakeConnector, you can construct a Dataset consisting of two or more GNNTables and a Task.
A GNNTable can only be created from a Snowflake table using its fully-qualified name, e.g. <db>.<schema>.<table>, as shown below:
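Before passing a fully-qualified name to the library, it can help to sanity-check its shape. The following is a small hypothetical helper (not part of relationalai_gnns) that splits a `<db>.<schema>.<table>` string; note it does not handle quoted identifiers that contain dots:

```python
def parse_fqn(fqn: str) -> tuple[str, str, str]:
    """Split a fully-qualified Snowflake table name into (db, schema, table).

    Illustrative only: quoted identifiers containing '.' are not supported.
    """
    parts = fqn.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected '<db>.<schema>.<table>', got: {fqn!r}")
    db, schema, table = parts
    return db, schema, table

# Example with the STUDENTS table used below.
print(parse_fqn("SYNTH_DB.SYNTH_SCHEMA.STUDENTS"))
# → ('SYNTH_DB', 'SYNTH_SCHEMA', 'STUDENTS')
```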
```python
from relationalai_gnns import GNNTable, ForeignKey, CandidateKey

# The name parameter specifies a custom name for the table, chosen by the user.
# Create a GNNTable from the 'STUDENTS' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema in Snowflake.
student_table = GNNTable(
    connector=connector,
    name="Students",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.STUDENTS",
    type="node",
    candidate_keys=[CandidateKey(column_name="studentId")],
)

# Create a GNNTable from the 'CLASSES' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema in Snowflake.
class_table = GNNTable(
    connector=connector,
    name="Classes",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.CLASSES",
    type="node",
    candidate_keys=[CandidateKey(column_name="classId")],
)

# Create a GNNTable from the 'STUDENT_TAKES_CLASS' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema.
# Add two foreign keys:
#   - One referencing Students.studentId, where Students is the name of the
#     "SYNTH_DB.SYNTH_SCHEMA.STUDENTS" table defined above, and studentId is its candidate key column.
#   - One referencing Classes.classId, where Classes is the name of the
#     "SYNTH_DB.SYNTH_SCHEMA.CLASSES" table defined above, and classId is its candidate key column.
student_takes_class_table = GNNTable(
    connector=connector,
    name="StudentsTakeClass",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.STUDENT_TAKES_CLASS",
    type="node",
    foreign_keys=[
        ForeignKey(column_name="studentId", link_to="Students.studentId"),
        ForeignKey(column_name="classId", link_to="Classes.classId"),
    ],
)
```

In this example, we construct three GNNTable objects of type node, meaning that each row in the table corresponds to a node in the graph. We also support GNNTable objects of type edge, where each row defines an edge connecting two node instances.
For more details, refer to GNNTable documentation and see a practical example in Smoker Prediction with Edge List.
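Conceptually, each row of a node-type table becomes a graph node, and a table with two foreign keys links one node of each kind. The following is a pure-Python toy sketch of that assembly, using in-memory rows rather than the library's internals:

```python
# Toy rows standing in for the three Snowflake tables above.
students = [{"studentId": 1}, {"studentId": 2}]
classes = [{"classId": 10}, {"classId": 11}]
student_takes_class = [
    {"studentId": 1, "classId": 10},
    {"studentId": 1, "classId": 11},
    {"studentId": 2, "classId": 10},
]

# Each row of a node-type table becomes a node, keyed by its candidate key.
nodes = {("Students", r["studentId"]) for r in students}
nodes |= {("Classes", r["classId"]) for r in classes}

# Each row of the linking table connects a student node to a class node
# by following its two foreign keys.
edges = [
    (("Students", r["studentId"]), ("Classes", r["classId"]))
    for r in student_takes_class
]

print(len(nodes), len(edges))  # → 4 3
```

This is only a mental model; the actual graph construction is handled for you by the Dataset and the reasoner.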
Notice that both student_table and class_table are constructed using the candidate_keys argument, while student_takes_class_table uses the foreign_keys argument.
For a quick summary of the rules for creating a GNNTable object of type node, and for further details, consult GNNTable.
You have now converted three source tables from your Snowflake database into GNNTables. All columns from the source tables are included, and the semantic types of all columns are automatically inferred from their data types. A full description of all supported data types can be found in Column DTypes. Since this inference is automatic, we recommend using the .show_table() method to review the inferred semantic types.
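To see why reviewing inferred types matters, consider a toy heuristic (purely illustrative, not the library's actual inference logic): a numeric column such as credits may look numerical to an automatic rule even when it is better modeled as a category.

```python
def infer_dtype(values):
    """Toy semantic-type inference: all numbers -> 'numerical',
    few distinct values -> 'category', otherwise 'text'."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    if len(set(values)) <= max(1, len(values) // 2):
        return "category"
    return "text"

# 'credits' looks numerical to the heuristic, but a course's credit count
# may be better treated as a category — hence the recommendation to review.
print(infer_dtype([3, 4, 3, 3, 4]))       # → 'numerical'
print(infer_dtype(["A", "B", "A", "A"]))  # → 'category'
```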
```python
student_table.show_table()
```

When changes are needed, a suite of methods is available to modify your GNNTable.
See GNNTable for more details on the methods.
Below are some examples of changes you can make.
```python
from relationalai_gnns import ColumnDType

# Do not use this column as an input feature for the GNNTable
student_table.remove_column(col_name="participation")

# Change a column's semantic type, i.e. its dtype
class_table.update_column_dtype(col_name="credits", dtype=ColumnDType("category"))
```

Finally, it's recommended to use the .validate_table() method to check for any inconsistencies in your GNNTables before proceeding to the next step.
```python
# Validate the table
student_takes_class_table.validate_table()
```

Define a learning task
We support two broad types of learning tasks: NodeTask and LinkTask. For details, please consult the Task API.
The following example defines a NodeTask for binary classification. You will need to prepare three tables with class labels:
- SYNTH_DB.SYNTH_RANK.TRAIN
- SYNTH_DB.SYNTH_RANK.VALIDATION
- SYNTH_DB.SYNTH_RANK.TEST
All three tables should use the same schema. Each table must include:
- A studentId column that serves as a foreign key referencing the studentId candidate key of the GNNTable with the name Students.
- A label column containing the binary label that the model is trying to predict. Note that the test table, SYNTH_DB.SYNTH_RANK.TEST, does not need to have a label column.
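The schema requirements above can be expressed as a small check. The following is a hypothetical helper (not part of relationalai_gnns); the column names follow the example tables:

```python
def check_split_schemas(split_columns: dict[str, set[str]]) -> list[str]:
    """Verify split-table schemas: every split needs 'studentId';
    train and validation also need 'label' (the test table does not)."""
    problems = []
    for split, cols in split_columns.items():
        if "studentId" not in cols:
            problems.append(f"{split}: missing foreign key column 'studentId'")
        if split in ("train", "validation") and "label" not in cols:
            problems.append(f"{split}: missing label column 'label'")
    return problems

splits = {
    "train": {"studentId", "label"},
    "validation": {"studentId", "label"},
    "test": {"studentId"},  # label is optional for the test table
}
print(check_split_schemas(splits))  # → []
```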
```python
from relationalai_gnns import NodeTask, TaskType, ForeignKey

# The task_data_source maps each dataset split to the corresponding table name.
node_task = NodeTask(
    connector=connector,
    name="my_node_task",
    task_data_source={
        "train": "SYNTH_DB.SYNTH_RANK.TRAIN",
        "test": "SYNTH_DB.SYNTH_RANK.TEST",
        "validation": "SYNTH_DB.SYNTH_RANK.VALIDATION",
    },
    target_entity_column=ForeignKey(column_name="studentId", link_to="Students.studentId"),
    label_column="label",
    task_type=TaskType.BINARY_CLASSIFICATION,
)
```

Similarly, you can use the .show_task() method to review the settings of the NodeTask.
For instructions on modifying these default settings, please consult the Task API.
```python
node_task.show_task()
```

Compose a Dataset from GNNTables and a Task
Using the defined GNNTable objects, the NodeTask, and the SnowflakeConnector, you can construct a Dataset as follows:
```python
from relationalai_gnns import Dataset

dataset = Dataset(
    connector=connector,
    dataset_name="my_first_dataset",
    tables=[student_table, class_table, student_takes_class_table],
    task_description=node_task,
)
```

The .visualize_dataset() method is provided to help you inspect the connectivity between all your tables and the task.
For more details, please consult Dataset API.
```python
from IPython.display import Image, display

graph = dataset.visualize_dataset()
plt = Image(graph.create_png())
display(plt)
```

```python
from graphviz import Source

graph = dataset.visualize_dataset()

# Experiment with font size and plot size to get a good visualization
for node in graph.get_nodes():
    node.set("fontsize", "16")

graph.set_graph_defaults(size="10,10!")  # Increase graph size

src = Source(graph.to_string())
src  # Display in notebook
```

Train a model and generate predictions
Once your Dataset is ready, you can proceed to train a model. To do this, you need to define three additional objects, as shown in the example below:
```python
from relationalai_gnns import ExperimentConfig, TrainerConfig, Trainer

experiment_config = ExperimentConfig(database="database_name", schema="schema_name")

trainer_config = TrainerConfig(
    connector=connector,
    experiment_config=experiment_config,
    device="cuda",
    n_epochs=10,
)

trainer = Trainer(connector=connector, config=trainer_config)

train_job = trainer.fit(dataset=dataset)
```

ExperimentConfig specifies the database and schema where the experiment results, such as model metrics, will be stored.
TrainerConfig contains all training parameters. In the example, default values are used for all parameters. For a comprehensive list of training parameters and examples of modifying their values, see TrainerConfig API.
The other object you will need is Trainer, which is simply defined using a SnowflakeConnector and a TrainerConfig object.
Finally, to train a model, use the .fit() method and pass in your prepared Dataset. For more information on Trainer and its methods, refer to the Trainer API.
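Note that .fit() does not block: training runs in the background while you keep working. That pattern can be pictured with a toy stand-in for the job/monitor pair (a threading-based sketch; the real JobMonitor API is documented separately and its method names may differ):

```python
import threading
import time

class ToyJobMonitor:
    """Minimal stand-in for a job monitor: the work runs in a background thread."""

    def __init__(self, work):
        self._status = "running"

        def runner():
            work()
            self._status = "done"

        self._thread = threading.Thread(target=runner)
        self._thread.start()  # submission returns immediately

    def status(self):
        return self._status

    def wait(self):
        self._thread.join()
        return self._status

# The call returns at once; the 'training' keeps running in the background.
monitor = ToyJobMonitor(lambda: time.sleep(0.1))
print(monitor.status())  # typically still 'running' right after submission
print(monitor.wait())    # → 'done'
```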
Note

- Model training runs as a non-blocking job. In other words, the .fit() method returns a JobMonitor object, which you can use to track the progress of the job while it runs in the background. For details on monitoring jobs, see the JobMonitor API.
- Ensure the native app has the necessary permissions on the database and schema specified in ExperimentConfig.
```sql
-- Grant access to resources needed for Snowflake experiment tracking
GRANT USAGE ON DATABASE <DATABASE> TO APPLICATION RELATIONALAI;
GRANT USAGE ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;

-- Grant access to store experiment results
GRANT CREATE EXPERIMENT ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;

-- Grant access to register models
GRANT CREATE MODEL ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;
```

Once a model is trained, you can generate batch predictions using the .predict() method of the Trainer object.
```python
from relationalai_gnns import OutputConfig

inference_job = trainer.predict(
    output_alias="experiment_1",
    output_config=OutputConfig.snowflake(
        database_name="SYNTH_DB",
        schema_name="PUBLIC",
    ),
    dataset=dataset,
    model_run_id=train_job.model_run_id,
)
```

In this example, predictions will be saved to the PREDICTIONS_EXPERIMENT_1 table in the SYNTH_DB.PUBLIC schema in Snowflake. Please make sure that the Snowflake role used in your SnowflakeConnector has write permissions for this schema.
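If the naming shown in this example generalizes (the output table is PREDICTIONS_ plus the upper-cased alias; this is an assumption, so confirm against the actual output), the target table can be derived from the alias like so:

```python
def predictions_table(output_alias: str, database: str, schema: str) -> str:
    """Derive the fully-qualified predictions table name from an output alias.

    Assumes the PREDICTIONS_<ALIAS> pattern seen in the example generalizes.
    """
    return f"{database}.{schema}.PREDICTIONS_{output_alias.upper()}"

print(predictions_table("experiment_1", "SYNTH_DB", "PUBLIC"))
# → SYNTH_DB.PUBLIC.PREDICTIONS_EXPERIMENT_1
```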
For more details on .fit() and .predict() methods, see Trainer API.
To manage all the jobs you have created in a session, we also provide the JobManager class. For more details, please refer to the JobManager API.