Query a model
Build queries from a semantic model and get results back in Python. Use this when you want to explore a model, validate new facts while iterating, or extract a small result set for downstream code. Building a query is lazy, so compilation and runtime errors usually surface when you first materialize results.
- PyRel is installed and configured. See Set Up Your Environment for instructions.
- You have created a
Modelobject and declared some concepts and relationships. - You have defined some base facts.
Understand fragments and materialization
Section titled “Understand fragments and materialization”A query in PyRel is a Fragment: a composable description of what you want to filter and return.
Most query-building methods return a new fragment instead of running work immediately. That makes it cheap to build up a query in small steps, reuse pieces, and debug the shape of your query before you execute it.
Execution happens when you materialize a fragment.
The two most common materialization paths are Fragment.to_df (to get a pandas DataFrame) and Fragment.into + Fragment.exec (to export into a Snowflake table).
Here’s a minimal “compose, then materialize” pattern using Model.select:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
q = m.select(Order.id)
# Nothing runs until you materialize.# This is where compilation, execution, and requirement checking happen.df = q.to_df()print(df.head())q = m.select(Order.id)builds a fragment. It describes the result shape (oneOrder.idcolumn) but does not execute.q.to_df()is the materialization step. This is where the fragment is compiled and executed, and where requirements (if any) are checked.- Because the snippet declares schema but does not define any facts, the DataFrame is valid but typically empty. That keeps the example focused on the “lazy until materialized” behavior.
Select values with select
Section titled “Select values with select”Selecting values defines what columns you want back when you materialize a fragment.
Choose the smallest explicit selection that matches your question so your DataFrame schema is predictable.
If you start broad, you can always refine later by calling select again on an existing fragment.
Select properties
Section titled “Select properties”You can select properties to return readable scalar columns, like “order id” and “promised ship date”. This is similar to selecting columns from a SQL table.
Use Model.select with properties:
from relationalai.semantics import DateTime, Integer, Modelfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
m.define( Order.new(id=1, promised_ship_date=datetime(2025, 12, 10)), Order.new(id=2, promised_ship_date=datetime(2025, 12, 12)),)
q = m.select( Order.id, Order.promised_ship_date,)
df = q.to_df()print(df.head())m.define(Order.new(...))createsOrderentities withidandpromised_ship_dateproperties.m.select(Order.id, Order.promised_ship_date)selects those properties as columns in the output.- The resulting DataFrame has two columns:
id(integers) andpromised_ship_date(datetimes). There is one row per order.
Select fields of relationships
Section titled “Select fields of relationships”Choose this when you need multiple fields of a relationship (including non-output fields) as separate output columns.
Relationship output fields are the default value.
To reference a non-output field, index the relationship by name to create a FieldRef.
from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})Shipment = m.Concept("Shipment", identify_by={"id": Integer})Carrier = m.Concept("Carrier", identify_by={"id": Integer})
Shipment.shipped_for = m.Relationship( f"{Shipment} via {Carrier:carrier} for {Order:order}")
m.define( o1 := Order.new(id=1), o2 := Order.new(id=2), s1 := Shipment.new(id=1), s2 := Shipment.new(id=2), c1 := Carrier.new(id=1), c2 := Carrier.new(id=2), s1.shipped_for(c1, o1), # Shipment 1 via Carrier 1 for Order 1 s2.shipped_for(c2, o2), # Shipment 2 via Carrier 2 for Order 2)
# Select fields by typeq = m.select( Shipment.id, Shipment.shipped_for[Carrier], Shipment.shipped_for[Order],)
# Alternatively, select fields by nameq = m.select( Shipment.id, Shipment.shipped_for["carrier"].id, Shipment.shipped_for["order"].id,)
# Select the relationship chain without indexing to get the output fieldq = m.select( Shipment.id, Shipment.shipped_for.id, # This is the same as Shipment.shipped_for[Carrier].id)
df = q.to_df()print(df.head())Shipment.shipped_for = m.Relationship(...)declares a relationship with three fields: a Shipment field, a Carrier field (named “carrier”), and an Order field (named “order”).Shipment.shipped_for[Carrier].idselects the Carrier field by type and gets theidproperty for entities in that field.Shipment.shipped_for["carrier"].idis equivalent.- The output DataFrame has three columns: a shipment id, a carrier id, and an order id.
Each row corresponds to a
(Shipment, Carrier, Order)triple that matches the relationship.
Alias selected values
Section titled “Alias selected values”Alias selected values when you want stable, readable column names in your output. This is especially helpful when you export results or when you want your DataFrame schema to stay stable as your model evolves.
Use alias(...) on selected values:
from relationalai.semantics import Integer, Model, String
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")
m.define( Order.new(id=1, status="shipped"), Order.new(id=2, status="delayed"),)
q = m.select( Order.id.alias("order_id"), Order.status.alias("order_status"),)
df = q.to_df()print(df.head())Order.id.alias("order_id")sets the DataFrame column name for the selectedidproperty.Order.status.alias("order_status")does the same for the selectedstatusproperty.- Aliasing affects output column names. It does not change your model schema or property names.
(Advanced) Select concepts
Section titled “(Advanced) Select concepts”You can select a concept to get one row per entity. Although this isn’t very common in practice, it can be helpful for debugging.
Use Model.select with a concept handle:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order)df = q.to_df()print(df.head())m.define(Order.new(...))creates a small set ofOrderentities so the output has rows.m.select(Order)selects entities that are members of theOrderconcept.- The resulting DataFrame has one column (named
Order) and one row per order entity. The values in theOrdercolumn are internal entity references.
- Entity references are internal values that point to specific entities in the model. They are not human-readable, but they are stable and can be helpful for debugging.
Filter results with where
Section titled “Filter results with where”Filtering keeps only the rows that match the conditions you care about.
Start a filter fragment with Model.where, then select the columns you want and materialize.
from relationalai.semantics import DateTime, Integer, Modelfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Shipment = m.Concept("Shipment", identify_by={"id": Integer})Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
m.define( Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)), Shipment.new(id=2, shipped_at=datetime(2025, 12, 9)),)
q = ( m.where( Shipment.shipped_at > datetime(2025, 12, 10), ).select( Shipment.id.alias("shipment_id"), Shipment.shipped_at, ))
df = q.to_df()print(df.head())m.define(...)adds two shipments with different ship dates, so the filter has something to include and exclude.m.where(Shipment.shipped_at > ...)keeps only shipments shipped after the cutoff date. With the sample facts, only shipment 1 matches.select(...)chooses which values become DataFrame columns.alias("shipment_id")makes the column name stable and easy to read.to_df()runs the query and returns results as a pandas DataFrame.
- You can write the same query with
selectbeforewhereorwherebeforeselect. Choose the order that reads best for your use case.
Rank and select top results with rank, asc, desc, and top
Section titled “Rank and select top results with rank, asc, desc, and top”Ranking can help you find the “most important” matches when your query returns a large number of rows.
Ordering helpers like asc and desc define how ranking works.
Use aggregates.rank with aggregates.desc when you want to rank output by descending value:
from relationalai.semantics import Integer, Modelfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define( Order.new(id=1, delay_in_days=5), Order.new(id=2, delay_in_days=2), Order.new(id=3, delay_in_days=7),)
q = m.select( Order.id.alias("order_id"), Order.delay_in_days, aggregates.rank(aggregates.desc(Order.delay_in_days)).alias("delay_rank"),)
df = q.to_df()print(df.head())aggregates.desc(Order.delay_in_days)defines the ordering for ranking. Larger delays are ranked first.
- You can use
aggregates.ascto rank in ascending order instead. In that case, smaller delays would be ranked first. aggregates.rank(...)produces a numeric rank column. It does not sort the output DataFrame.- Because the rank is a column, you can filter on it (for example
where(delay_rank <= 10)) or sort the DataFrame after materialization.
Use aggregates.top when you want a top-
from relationalai.semantics import Integer, Modelfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define( Order.new(id=1, delay_in_days=5), Order.new(id=2, delay_in_days=2), Order.new(id=3, delay_in_days=7),)
q = ( m.select(Order.id.alias("order_id"), Order.delay_in_days) .where(aggregates.top(10, Order.delay_in_days)))
df = q.to_df()print(df.head())aggregates.top(10, Order.delay_in_days)is a predicate you use inwhere(...). It keeps only the top-matches by the chosen value. - In this snippet, there are fewer than 10 orders, so all rows match.
Lower the
10to a smaller number to see the filter remove rows.
- Top-
selection is descending by default (largest values are kept). - You can use
aggregates.bottomto keep the smallest values instead.
Deduplicate rows with distinct
Section titled “Deduplicate rows with distinct”Use distinct to remove duplicate rows or to count unique values.
This comes up most often when relationship multiplicity produces duplicate rows in your output.
For distinct rows, you must wrap the entire selected row.
Use distinct as a SELECT DISTINCT-style wrapper:
from relationalai.semantics import DateTime, Integer, Model, String, distinctfrom relationalai.semantics.std import aggregatesfrom relationalai.semantics.std.datetime import datetime
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")Order.created_at = m.Property(f"{Order} created at is {DateTime}")
m.define( Order.new(id=1, status="shipped", created_at=datetime(2025, 12, 1)), Order.new(id=2, status="delayed", created_at=datetime(2025, 12, 1)), Order.new(id=3, status="shipped", created_at=datetime(2025, 12, 1)),)
q = m.select(distinct(Order.status))print(q.to_df())
q_unique = m.select( aggregates.count(distinct(Order.status)).alias("unique_statuses"))print(q_unique.to_df())select(distinct(Order.status))returns unique scalar values ofOrder.status. This is theSELECT DISTINCTpattern.count(distinct(Order.status))counts how many unique status values exist. With the sample facts, the result is 2 (“shipped” and “delayed”).- Use
distinct(...)to control whether you are deduplicating rows or just the input to an aggregate.
- If you select more than one column, you must wrap the entire row in
distinct(...)to get unique rows. For example,select(distinct(Order.status, Order.created_at))deduplicates the(status, created_at)row pairs.
Aggregate results with count and per
Section titled “Aggregate results with count and per”Aggregates help you summarize results into small tables you can sanity-check quickly.
Use them to answer questions like “how many orders per status” while iterating on a model.
Grouping is explicit, and .per(...) defines the grouping dimensions.
Use aggregates.count with .per(...) to compute counts per group:
from relationalai.semantics import Integer, Model, Stringfrom relationalai.semantics.std import aggregates
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} status is {String}")
m.define( Order.new(id=1, status="shipped"), Order.new(id=2, status="delayed"), Order.new(id=3, status="shipped"),)
q = m.select( Order.status, aggregates.count(Order).per(Order.status).alias("order_count"),)
df = q.to_df()print(df)aggregates.count(Order)counts how manyOrderentities match the current fragment..per(Order.status)groups the count by status. This is the equivalent of “group by status” in SQL-style aggregation.- The resulting DataFrame has one row per status and a corresponding count.
If you need a “unique count per group”, wrap the counted value with
distinct(...)first.
Materialize results with to_df
Section titled “Materialize results with to_df”Materialization turns a fragment into concrete results you can inspect and use in Python. Use it after your fragment selects the right columns and filters the right rows. If materialization fails, the fix is often configuration, credentials, or failing requirements rather than the query shape.
Materialize a fragment with Fragment.to_df:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id)
df = q.to_df() # Materialization happens here.print(df)m.define(Order.new(...))lazily creates some data to query.m.select(Order.id)builds a fragment that describes the query shape (one column with order ids) but does not run any work.q.to_df()executes the fragment and returns a pandas DataFrame. This is the point where compilation and execution happen, and where requirements (if any) are checked.
Export results to Snowflake tables with into and exec
Section titled “Export results to Snowflake tables with into and exec”Exporting writes a fragment’s results into a Snowflake table for downstream use. Use this when you want a snapshot that other tools can query.
Use the following table to help you choose how to specify the destination table for your export:
| What to use | When to use it |
|---|---|
| Use the fully qualified table name | Use this when you want to write the destination table name directly in code. |
| Use a short name defined in your config | Use this when you want the destination to come from raiconfig.yaml, such as when development, staging, and production use different tables. |
Use the fully qualified table name
Section titled “Use the fully qualified table name”Pass a Table object to Fragment.into with a fully qualified name to create an export target, then call Fragment.exec to run the export:
from relationalai.semantics import Integer, Model
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id.alias("order_id"))
# Create a Table object with the fully qualified name of the destination table.out = m.Table("<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>")
q.into(out).exec()out = m.Table("<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>")creates an export target with a fully qualified name.q.into(out).exec()runs the export, writing results into the specified Snowflake table.- If the target table already exists,
q.into(out).exec()replaces it by default. Useq.into(out, update=True).exec()when you want an update export instead.
- You need write access to the destination schema.
Declare a destination schema for stable exports
Section titled “Declare a destination schema for stable exports”Declare a destination schema when the Snowflake table should keep columns that are not present in every export result. Use this when later exports need the same destination columns, even if one run does not produce values for all of them:
from relationalai.semantics import Integer, Model, String
m = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})Order.status = m.Property(f"{Order} has {String:status}")
m.define( Order.new(id=1, status="shipped"), Order.new(id=2, status="pending"),)
q = m.select(Order.id.alias("order_id"), Order.status)
out = m.Table( "<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>", schema={ "order_id": Integer, "status": String, "shipped_at": String, },)
q.into(out).exec()- The query exports only
order_idandstatus. - The destination schema also declares
shipped_at. - PyRel keeps
shipped_atin the destination table, even though it is not present in the query, and fills it withNULL.
Use a short name defined in your config
Section titled “Use a short name defined in your config”Rather than specifying the full destination table name in your code, you can define a short export name in your configuration and refer to it when creating a Table object.
This keeps physical table details out of your code and makes it possible to control physical destinations with profile overrides.
To use a short configured name:
-
Set the export table config in
raiconfig.yamlraiconfig.yaml tables:MY_NAMED_EXPORT:fqn: "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>" -
Use the short configured name in your model
from relationalai.semantics import Integer, Modelm = Model("MyModel")Order = m.Concept("Order", identify_by={"id": Integer})m.define(Order.new(id=1), Order.new(id=2))# Create a query fragment.q = m.select(Order.id.alias("order_id"))# Create a Table object for the configured export name.out = m.Table("MY_NAMED_EXPORT")# Inspect the resolved destination.print(out.info.physical_name)# Execute the export.q.into(out).exec()
-
Create the same
tablesconfig in Pythonfrom relationalai.config import create_configcfg = create_config(tables={"MY_NAMED_EXPORT": {"fqn": "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>",},}) -
Use the short configured name in your model
from relationalai.semantics import Integer, Modelm = Model("MyModel", config=cfg)Order = m.Concept("Order", identify_by={"id": Integer})m.define(Order.new(id=1), Order.new(id=2))# Create a query fragment.q = m.select(Order.id.alias("order_id"))# Create a Table object for the configured export name.out = m.Table("MY_NAMED_EXPORT")# Inspect the resolved destination.print(out.info.physical_name)# Execute the export.q.into(out).exec()
- The
fqnoption in the config defines the physical Snowflake table destination for the export. out = m.Table("MY_NAMED_EXPORT")creates aTableobject that refers to the configured export name. The physical name is resolved from the configuration at this point.q.into(out).exec()runs the export, writing results into the specified Snowflake table.- If the target table already exists,
q.into(out).exec()replaces it by default. Useq.into(out, update=True).exec()when you want an update export instead.
- You need write access to the destination schema.