The RelationalAI Predictive Reasoner
Learn how to use the RelationalAI Predictive Reasoner to produce meaningful predictions and embeddings from your relational data, supporting decision-intelligence tasks such as identifying potential churn and detecting fraud. The Predictive Reasoner helps you turn data into actionable insight.
Create a Provider and a SnowflakeConnector
Use your Snowflake credentials to create a Provider. The Provider supports multiple authentication methods, with examples available here. The following example demonstrates password-based authentication locally and active_session authentication in a Snowflake notebook.
```python
from relationalai_gnns import Provider

# Password-based authentication (e.g. when running locally)
snowflake_config = {
    "user": "<SNOWFLAKE_USER>",
    "password": "<SNOWFLAKE_PASSWORD>",
    "account": "<SNOWFLAKE_ACCOUNT>",
    "role": "<SNOWFLAKE_ROLE_WITH_ACCESS_TO_DB>",
    "warehouse": "<SNOWFLAKE_WAREHOUSE>",
    "app_name": "RELATIONALAI",
    "auth_method": "password",
}
provider = Provider(**snowflake_config)
```

```python
from relationalai_gnns import Provider

# active_session authentication (e.g. in a Snowflake notebook)
snowflake_config = {
    "app_name": "RELATIONALAI",
    "auth_method": "active_session",
    "role": "<SNOWFLAKE_ROLE_WITH_ACCESS_TO_DB>",
}
provider = Provider(**snowflake_config)
```

To get your account name, run the following command in Snowflake:

```sql
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
```

To find the RelationalAI application name, list all applications with:

```sql
SHOW APPLICATIONS;
```

Use the provider to create a new reasoner under the name "<GNN_REASONER>":

```python
provider.create_gnn(
    name="<GNN_REASONER>",
    size="HIGHMEM_X64_S",
)
```

Available compute pool sizes are listed in the create_gnn() method documentation.
Created reasoners are automatically suspended after a period of inactivity, with a default autosuspension time of 1 hour. To modify this setting, see the create_gnn() documentation. To resume a suspended reasoner, refer to the resume_gnn() method.
The Provider can also be used to monitor reasoner status, delete unwanted reasoners, or list all available reasoners. See Provider for usage examples.
⚠️ Important Note:
When configuring the Provider, specify a role that has the required read and write access to the databases you plan to use.
Use the same Snowflake credentials and the reasoner name to create a SnowflakeConnector.
```python
from relationalai_gnns import SnowflakeConnector

connector = SnowflakeConnector(
    **snowflake_config,
    engine_name="<GNN_REASONER>",
)
```

Create a GNNTable from a Snowflake Table
Once you have a SnowflakeConnector, you can construct a Dataset consisting of two or more GNNTables and a Task.
A GNNTable can only be created from a Snowflake table using its fully-qualified name, e.g. <db>.<schema>.<table>, as shown below:
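Before passing a fully-qualified name to the library, it can help to sanity-check its shape. The following is a small hypothetical helper (not part of relationalai_gnns) that splits a `<db>.<schema>.<table>` string; note it does not handle quoted identifiers that contain dots:

```python
def parse_fqn(fqn: str) -> tuple[str, str, str]:
    """Split a fully-qualified Snowflake table name into (db, schema, table).

    Illustrative only: quoted identifiers containing '.' are not supported.
    """
    parts = fqn.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected '<db>.<schema>.<table>', got: {fqn!r}")
    db, schema, table = parts
    return db, schema, table

# Example with the STUDENTS table used below.
print(parse_fqn("SYNTH_DB.SYNTH_SCHEMA.STUDENTS"))
# → ('SYNTH_DB', 'SYNTH_SCHEMA', 'STUDENTS')
```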
```python
from relationalai_gnns import GNNTable, ForeignKey, CandidateKey

# The name parameter specifies a custom name for the table, chosen by the user.
# Create a GNNTable from the 'STUDENTS' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema in Snowflake.
student_table = GNNTable(
    connector=connector,
    name="Students",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.STUDENTS",
    type="node",
    candidate_keys=[CandidateKey(column_name="studentId")],
)

# Create a GNNTable from the 'CLASSES' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema in Snowflake.
class_table = GNNTable(
    connector=connector,
    name="Classes",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.CLASSES",
    type="node",
    candidate_keys=[CandidateKey(column_name="classId")],
)

# Create a GNNTable from the 'STUDENT_TAKES_CLASS' table in the 'SYNTH_DB.SYNTH_SCHEMA' schema.
# Add two foreign keys:
#   - One referencing Students.studentId, where Students is the name of the
#     "SYNTH_DB.SYNTH_SCHEMA.STUDENTS" table defined above, and studentId is its candidate key column.
#   - One referencing Classes.classId, where Classes is the name of the
#     "SYNTH_DB.SYNTH_SCHEMA.CLASSES" table defined above, and classId is its candidate key column.
student_takes_class_table = GNNTable(
    connector=connector,
    name="StudentsTakeClass",  # A GNNTable is uniquely identified by this name.
    source="SYNTH_DB.SYNTH_SCHEMA.STUDENT_TAKES_CLASS",
    type="node",
    foreign_keys=[
        ForeignKey(column_name="studentId", link_to="Students.studentId"),
        ForeignKey(column_name="classId", link_to="Classes.classId"),
    ],
)
```

In this example, we construct three GNNTable objects of type node, meaning that each row in the table corresponds to a node in the graph. We also support GNNTable objects of type edge, where each row defines an edge connecting two node instances.
For more details, refer to GNNTable documentation and see a practical example in Smoker Prediction with Edge List.
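Conceptually, each row of a node-type table becomes a graph node, and a table with two foreign keys links one node of each kind. The following is a pure-Python toy sketch of that assembly, using in-memory rows rather than the library's internals:

```python
# Toy rows standing in for the three Snowflake tables above.
students = [{"studentId": 1}, {"studentId": 2}]
classes = [{"classId": 10}, {"classId": 11}]
student_takes_class = [
    {"studentId": 1, "classId": 10},
    {"studentId": 1, "classId": 11},
    {"studentId": 2, "classId": 10},
]

# Each row of a node-type table becomes a node, keyed by its candidate key.
nodes = {("Students", r["studentId"]) for r in students}
nodes |= {("Classes", r["classId"]) for r in classes}

# Each row of the linking table connects a student node to a class node
# by following its two foreign keys.
edges = [
    (("Students", r["studentId"]), ("Classes", r["classId"]))
    for r in student_takes_class
]

print(len(nodes), len(edges))  # → 4 3
```

This is only a mental model; the actual graph construction is handled for you by the Dataset and the reasoner.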
Notice that both student_table and class_table are constructed using the candidate_keys argument, while student_takes_class_table uses the foreign_keys argument.
For a quick summary of the rules for creating a GNNTable object of type node, and for further details, consult GNNTable.
You have now converted three source tables from your Snowflake database into GNNTables. All columns from the source tables are included, and the semantic types of all columns are automatically inferred from their data types. A full description of all supported data types can be found in Column DTypes. Since this inference is automatic, we recommend using the .show_table() method to review the inferred semantic types.
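To see why reviewing inferred types matters, consider a toy heuristic (purely illustrative, not the library's actual inference logic): a numeric column such as credits may look numerical to an automatic rule even when it is better modeled as a category.

```python
def infer_dtype(values):
    """Toy semantic-type inference: all numbers -> 'numerical',
    few distinct values -> 'category', otherwise 'text'."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    if len(set(values)) <= max(1, len(values) // 2):
        return "category"
    return "text"

# 'credits' looks numerical to the heuristic, but a course's credit count
# may be better treated as a category — hence the recommendation to review.
print(infer_dtype([3, 4, 3, 3, 4]))       # → 'numerical'
print(infer_dtype(["A", "B", "A", "A"]))  # → 'category'
```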
```python
student_table.show_table()
```

When changes are needed, a suite of methods is available to modify your GNNTable.
See GNNTable for more details on the methods.
Below are some examples of changes you can make.
```python
from relationalai_gnns import ColumnDType

# Do not use this column as an input feature for the GNNTable
student_table.remove_column(col_name="participation")

# Change a column's semantic type, i.e. its dtype
class_table.update_column_dtype(col_name="credits", dtype=ColumnDType("category"))
```

Finally, it's recommended to use the .validate_table() method to check for any inconsistencies in your GNNTables before proceeding to the next step.
```python
# Validate the table
student_takes_class_table.validate_table()
```

Define a learning task
We support two broad types of learning tasks: NodeTask and LinkTask. For details, please consult the Task API.
The following example defines a NodeTask for binary classification. You will need to prepare three tables with class labels:
- SYNTH_DB.SYNTH_RANK.TRAIN
- SYNTH_DB.SYNTH_RANK.VALIDATION
- SYNTH_DB.SYNTH_RANK.TEST
All three tables should use the same schema. Each table must include:
- A studentId column that serves as a foreign key referencing the studentId candidate key of the GNNTable with the name Students.
- A label column containing the binary label that the model is trying to predict. Note that the test table, SYNTH_DB.SYNTH_RANK.TEST, does not need to have a label column.
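The schema requirements above can be expressed as a small check. The following is a hypothetical helper (not part of relationalai_gnns); the column names follow the example tables:

```python
def check_split_schemas(split_columns: dict[str, set[str]]) -> list[str]:
    """Verify split-table schemas: every split needs 'studentId';
    train and validation also need 'label' (the test table does not)."""
    problems = []
    for split, cols in split_columns.items():
        if "studentId" not in cols:
            problems.append(f"{split}: missing foreign key column 'studentId'")
        if split in ("train", "validation") and "label" not in cols:
            problems.append(f"{split}: missing label column 'label'")
    return problems

splits = {
    "train": {"studentId", "label"},
    "validation": {"studentId", "label"},
    "test": {"studentId"},  # label is optional for the test table
}
print(check_split_schemas(splits))  # → []
```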
```python
from relationalai_gnns import NodeTask, TaskType, ForeignKey

# The task_data_source maps each dataset split to the corresponding table name.
node_task = NodeTask(
    connector=connector,
    name="my_node_task",
    task_data_source={
        "train": "SYNTH_DB.SYNTH_RANK.TRAIN",
        "test": "SYNTH_DB.SYNTH_RANK.TEST",
        "validation": "SYNTH_DB.SYNTH_RANK.VALIDATION",
    },
    target_entity_column=ForeignKey(column_name="studentId", link_to="Students.studentId"),
    label_column="label",
    task_type=TaskType.BINARY_CLASSIFICATION,
)
```

Similarly, you can use the .show_task() method to review the settings of the NodeTask.
For instructions on modifying these default settings, please consult the Task API.
```python
node_task.show_task()
```

Compose a Dataset from GNNTables and a Task
Using the defined GNNTable objects, the NodeTask, and the SnowflakeConnector, you can construct a Dataset as follows:
```python
from relationalai_gnns import Dataset

dataset = Dataset(
    connector=connector,
    dataset_name="my_first_dataset",
    tables=[student_table, class_table, student_takes_class_table],
    task_description=node_task,
)
```

The .visualize_dataset() method is provided to help you inspect the connectivity between all your tables and the task.
For more details, please consult Dataset API.
```python
from IPython.display import Image, display

graph = dataset.visualize_dataset()
plt = Image(graph.create_png())
display(plt)
```

```python
from graphviz import Source

graph = dataset.visualize_dataset()

# Experiment with font size and plot size to get a good visualization
for node in graph.get_nodes():
    node.set("fontsize", "16")

graph.set_graph_defaults(size="10,10!")  # Increase graph size

src = Source(graph.to_string())
src  # Display in notebook
```

Train a model and generate predictions
Once your Dataset is ready, you can proceed to train a model. To do this, you need to define three additional objects, as shown in the example below:
```python
from relationalai_gnns import ExperimentConfig, TrainerConfig, Trainer

experiment_config = ExperimentConfig(database="database_name", schema="schema_name")

trainer_config = TrainerConfig(
    connector=connector,
    experiment_config=experiment_config,
    device="cuda",
    n_epochs=10,
)

trainer = Trainer(connector=connector, config=trainer_config)

train_job = trainer.fit(dataset=dataset)
```

ExperimentConfig specifies the database and schema where the experiment results, such as model metrics, will be stored.
TrainerConfig contains all training parameters. In the example, default values are used for all parameters. For a comprehensive list of training parameters and examples of modifying their values, see TrainerConfig API.
The other object you will need is Trainer, which is simply defined using a SnowflakeConnector and a TrainerConfig object.
Finally, to train a model, use the .fit() method and pass in your prepared Dataset. For more information on Trainer and its methods, refer to the Trainer API.
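Note that .fit() does not block: training runs in the background while you keep working. That pattern can be pictured with a toy stand-in for the job/monitor pair (a threading-based sketch; the real JobMonitor API is documented separately and its method names may differ):

```python
import threading
import time

class ToyJobMonitor:
    """Minimal stand-in for a job monitor: the work runs in a background thread."""

    def __init__(self, work):
        self._status = "running"

        def runner():
            work()
            self._status = "done"

        self._thread = threading.Thread(target=runner)
        self._thread.start()  # submission returns immediately

    def status(self):
        return self._status

    def wait(self):
        self._thread.join()
        return self._status

# The call returns at once; the 'training' keeps running in the background.
monitor = ToyJobMonitor(lambda: time.sleep(0.1))
print(monitor.status())  # typically still 'running' right after submission
print(monitor.wait())    # → 'done'
```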
Note

- Model training runs as a non-blocking job. In other words, the .fit() method returns a JobMonitor object, which you can use to track the progress of the job while it runs in the background. For details on monitoring jobs, see the JobMonitor API.
- Ensure the native app has the necessary permissions on the database and schema specified in ExperimentConfig.
```sql
-- Grant access to resources needed for Snowflake experiment tracking
GRANT USAGE ON DATABASE <DATABASE> TO APPLICATION RELATIONALAI;
GRANT USAGE ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;

-- Grant access to store experiment results
GRANT CREATE EXPERIMENT ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;

-- Grant access to register models
GRANT CREATE MODEL ON SCHEMA <DATABASE>.<SCHEMA> TO APPLICATION RELATIONALAI;
```

Once a model is trained, you can generate batch predictions using the .predict() method of the Trainer object.
```python
from relationalai_gnns import OutputConfig

inference_job = trainer.predict(
    output_alias="experiment_1",
    output_config=OutputConfig.snowflake(
        database_name="SYNTH_DB",
        schema_name="PUBLIC",
    ),
    dataset=dataset,
    model_run_id=train_job.model_run_id,
)
```

In this example, predictions will be saved to the PREDICTIONS_EXPERIMENT_1 table in the SYNTH_DB.PUBLIC schema in Snowflake. Please make sure that the Snowflake role used in your SnowflakeConnector has write permissions for this schema.
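If the naming shown in this example generalizes (the output table is PREDICTIONS_ plus the upper-cased alias; this is an assumption, so confirm against the actual output), the target table can be derived from the alias like so:

```python
def predictions_table(output_alias: str, database: str, schema: str) -> str:
    """Derive the fully-qualified predictions table name from an output alias.

    Assumes the PREDICTIONS_<ALIAS> pattern seen in the example generalizes.
    """
    return f"{database}.{schema}.PREDICTIONS_{output_alias.upper()}"

print(predictions_table("experiment_1", "SYNTH_DB", "PUBLIC"))
# → SYNTH_DB.PUBLIC.PREDICTIONS_EXPERIMENT_1
```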
For more details on .fit() and .predict() methods, see Trainer API.
To manage all the jobs you have created in a session, we also provide the JobManager class. For more details, please refer to the JobManager API.