
Query a model

Build queries from a semantic model and get results back in Python. Use this when you want to explore a model, validate new facts while iterating, or extract a small result set for downstream code. Building a query is lazy, so compilation and runtime errors usually surface when you first materialize results.

A query in PyRel is a Fragment: a composable description of what you want to filter and return.

Most query-building methods return a new fragment instead of running work immediately. That makes it cheap to build up a query in small steps, reuse pieces, and debug the shape of your query before you execute it.

Execution happens when you materialize a fragment. The two most common materialization paths are Fragment.to_df (to get a pandas DataFrame) and Fragment.into + Fragment.exec (to export into a Snowflake table).

Here’s a minimal “compose, then materialize” pattern using Model.select:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
q = m.select(Order.id)
# Nothing runs until you materialize.
# This is where compilation, execution, and requirement checking happen.
df = q.to_df()
print(df.head())
  • q = m.select(Order.id) builds a fragment. It describes the result shape (one Order.id column) but does not execute.
  • q.to_df() is the materialization step. This is where the fragment is compiled and executed, and where requirements (if any) are checked.
  • Because the snippet declares schema but does not define any facts, the DataFrame is valid but typically empty. That keeps the example focused on the “lazy until materialized” behavior.

Selecting values defines what columns you want back when you materialize a fragment. Choose the smallest explicit selection that matches your question so your DataFrame schema is predictable. If you start broad, you can always refine later by calling select again on an existing fragment.

You can select properties to return readable scalar columns, like “order id” and “promised ship date”. This is similar to selecting columns from a SQL table.

Use Model.select with properties:

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.promised_ship_date = m.Property(f"{Order} promised ship date is {DateTime}")
m.define(
    Order.new(id=1, promised_ship_date=datetime(2025, 12, 10)),
    Order.new(id=2, promised_ship_date=datetime(2025, 12, 12)),
)
q = m.select(
    Order.id,
    Order.promised_ship_date,
)
df = q.to_df()
print(df.head())
  • m.define(Order.new(...)) creates Order entities with id and promised_ship_date properties.
  • m.select(Order.id, Order.promised_ship_date) selects those properties as columns in the output.
  • The resulting DataFrame has two columns: id (integers) and promised_ship_date (datetimes). There is one row per order.

Choose this when you need multiple fields of a relationship (including non-output fields) as separate output columns.

By default, a relationship reference resolves to its output field. To reference a non-output field, index the relationship by type or by name to create a FieldRef.

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Carrier = m.Concept("Carrier", identify_by={"id": Integer})
Shipment.shipped_for = m.Relationship(
    f"{Shipment} via {Carrier:carrier} for {Order:order}"
)
m.define(
    o1 := Order.new(id=1),
    o2 := Order.new(id=2),
    s1 := Shipment.new(id=1),
    s2 := Shipment.new(id=2),
    c1 := Carrier.new(id=1),
    c2 := Carrier.new(id=2),
    s1.shipped_for(c1, o1),  # Shipment 1 via Carrier 1 for Order 1
    s2.shipped_for(c2, o2),  # Shipment 2 via Carrier 2 for Order 2
)
# Select fields by type
q = m.select(
    Shipment.id,
    Shipment.shipped_for[Carrier],
    Shipment.shipped_for[Order],
)
# Alternatively, select fields by name
q = m.select(
    Shipment.id,
    Shipment.shipped_for["carrier"].id,
    Shipment.shipped_for["order"].id,
)
# Select the relationship chain without indexing to get the output field
q = m.select(
    Shipment.id,
    Shipment.shipped_for.id,  # This is the same as Shipment.shipped_for[Carrier].id
)
df = q.to_df()
print(df.head())
  • Shipment.shipped_for = m.Relationship(...) declares a relationship with three fields: a Shipment field, a Carrier field (named “carrier”), and an Order field (named “order”).
  • Shipment.shipped_for[Carrier].id selects the Carrier field by type and gets the id property for entities in that field. Shipment.shipped_for["carrier"].id is equivalent.
  • The output DataFrame has three columns: a shipment id, a carrier id, and an order id. Each row corresponds to a (Shipment, Carrier, Order) triple that matches the relationship.

Alias selected values when you want stable, readable column names in your output. This is especially helpful when you export results or when you want your DataFrame schema to stay stable as your model evolves.

Use alias(...) on selected values:

from relationalai.semantics import Integer, Model, String
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
m.define(
    Order.new(id=1, status="shipped"),
    Order.new(id=2, status="delayed"),
)
q = m.select(
    Order.id.alias("order_id"),
    Order.status.alias("order_status"),
)
df = q.to_df()
print(df.head())
  • Order.id.alias("order_id") sets the DataFrame column name for the selected id property.
  • Order.status.alias("order_status") does the same for the selected status property.
  • Aliasing affects output column names. It does not change your model schema or property names.

You can select a concept to get one row per entity. Although this isn’t very common in practice, it can be helpful for debugging.

Use Model.select with a concept handle:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order)
df = q.to_df()
print(df.head())
  • m.define(Order.new(...)) creates a small set of Order entities so the output has rows.
  • m.select(Order) selects entities that are members of the Order concept.
  • The resulting DataFrame has one column (named Order) and one row per order entity. The values in the Order column are internal entity references.
  • Entity references are internal values that point to specific entities in the model. They are not human-readable, but they are stable and can be helpful for debugging.

Filtering keeps only the rows that match the conditions you care about. Start a filter fragment with Model.where, then select the columns you want and materialize.

from relationalai.semantics import DateTime, Integer, Model
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Shipment = m.Concept("Shipment", identify_by={"id": Integer})
Shipment.shipped_at = m.Property(f"{Shipment} shipped at {DateTime}")
m.define(
    Shipment.new(id=1, shipped_at=datetime(2025, 12, 15)),
    Shipment.new(id=2, shipped_at=datetime(2025, 12, 9)),
)
q = (
    m.where(
        Shipment.shipped_at > datetime(2025, 12, 10),
    ).select(
        Shipment.id.alias("shipment_id"),
        Shipment.shipped_at,
    )
)
df = q.to_df()
print(df.head())
  • m.define(...) adds two shipments with different ship dates, so the filter has something to include and exclude.
  • m.where(Shipment.shipped_at > ...) keeps only shipments shipped after the cutoff date. With the sample facts, only shipment 1 matches.
  • select(...) chooses which values become DataFrame columns. alias("shipment_id") makes the column name stable and easy to read.
  • to_df() runs the query and returns results as a pandas DataFrame.
  • You can write the same query with select before where or where before select. Choose the order that reads best for your use case.

Rank and select top results with rank, asc, desc, and top

Ranking can help you find the “most important” matches when your query returns a large number of rows. Ordering helpers like asc and desc define how ranking works.

Use aggregates.rank with aggregates.desc when you want to rank output by descending value:

from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define(
    Order.new(id=1, delay_in_days=5),
    Order.new(id=2, delay_in_days=2),
    Order.new(id=3, delay_in_days=7),
)
q = m.select(
    Order.id.alias("order_id"),
    Order.delay_in_days,
    aggregates.rank(aggregates.desc(Order.delay_in_days)).alias("delay_rank"),
)
df = q.to_df()
print(df.head())
  • aggregates.desc(Order.delay_in_days) defines the ordering for ranking. Larger delays are ranked first.
  • You can use aggregates.asc to rank in ascending order instead. In that case, smaller delays would be ranked first.
  • aggregates.rank(...) produces a numeric rank column. It does not sort the output DataFrame.
  • Because the rank is a column, you can filter on it (for example where(delay_rank <= 10)) or sort the DataFrame after materialization.

Use aggregates.top when you want a top-N filter:

from relationalai.semantics import Integer, Model
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.delay_in_days = m.Property(f"{Order} delay in days is {Integer}")
m.define(
    Order.new(id=1, delay_in_days=5),
    Order.new(id=2, delay_in_days=2),
    Order.new(id=3, delay_in_days=7),
)
q = (
    m.select(Order.id.alias("order_id"), Order.delay_in_days)
    .where(aggregates.top(10, Order.delay_in_days))
)
df = q.to_df()
print(df.head())
  • aggregates.top(10, Order.delay_in_days) is a predicate you use in where(...). It keeps only the top-N matches by the chosen value.
  • In this snippet, there are fewer than 10 orders, so all rows match. Replace 10 with a smaller number to see the filter remove rows.
  • Top-N selection is descending by default (largest values are kept).
  • You can use aggregates.bottom to keep the smallest values instead.

Use distinct to remove duplicate rows or to count unique values. This comes up most often when relationship multiplicity produces duplicate rows in your output. For distinct rows, you must wrap the entire selected row.

Use distinct as a SELECT DISTINCT-style wrapper:

from relationalai.semantics import DateTime, Integer, Model, String, distinct
from relationalai.semantics.std import aggregates
from relationalai.semantics.std.datetime import datetime
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
Order.created_at = m.Property(f"{Order} created at is {DateTime}")
m.define(
    Order.new(id=1, status="shipped", created_at=datetime(2025, 12, 1)),
    Order.new(id=2, status="delayed", created_at=datetime(2025, 12, 1)),
    Order.new(id=3, status="shipped", created_at=datetime(2025, 12, 1)),
)
q = m.select(distinct(Order.status))
print(q.to_df())
q_unique = m.select(
    aggregates.count(distinct(Order.status)).alias("unique_statuses")
)
print(q_unique.to_df())
  • select(distinct(Order.status)) returns unique scalar values of Order.status. This is the SELECT DISTINCT pattern.
  • count(distinct(Order.status)) counts how many unique status values exist. With the sample facts, the result is 2 (“shipped” and “delayed”).
  • Use distinct(...) to control whether you are deduplicating rows or just the input to an aggregate.
  • If you select more than one column, you must wrap the entire row in distinct(...) to get unique rows. For example, select(distinct(Order.status, Order.created_at)) deduplicates the (status, created_at) row pairs.

Aggregates help you summarize results into small tables you can sanity-check quickly. Use them to answer questions like “how many orders per status” while iterating on a model. Grouping is explicit, and .per(...) defines the grouping dimensions.

Use aggregates.count with .per(...) to compute counts per group:

from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import aggregates
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} status is {String}")
m.define(
    Order.new(id=1, status="shipped"),
    Order.new(id=2, status="delayed"),
    Order.new(id=3, status="shipped"),
)
q = m.select(
    Order.status,
    aggregates.count(Order).per(Order.status).alias("order_count"),
)
df = q.to_df()
print(df)
  • aggregates.count(Order) counts how many Order entities match the current fragment.
  • .per(Order.status) groups the count by status. This is the equivalent of “group by status” in SQL-style aggregation.
  • The resulting DataFrame has one row per status and a corresponding count. If you need a “unique count per group”, wrap the counted value with distinct(...) first.

Materialization turns a fragment into concrete results you can inspect and use in Python. Use it after your fragment selects the right columns and filters the right rows. If materialization fails, the fix is often configuration, credentials, or failing requirements rather than the query shape.

Materialize a fragment with Fragment.to_df:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id)
df = q.to_df() # Materialization happens here.
print(df)
  • m.define(Order.new(...)) lazily creates some data to query.
  • m.select(Order.id) builds a fragment that describes the query shape (one column with order ids) but does not run any work.
  • q.to_df() executes the fragment and returns a pandas DataFrame. This is the point where compilation and execution happen, and where requirements (if any) are checked.

Export results to Snowflake tables with into and exec

Exporting writes a fragment’s results into a Snowflake table for downstream use. Use this when you want a snapshot that other tools can query.

Use the following table to help you choose how to specify the destination table for your export:

  • Fully qualified table name: use this when you want to write the destination table name directly in code.
  • Short name defined in your config: use this when you want the destination to come from raiconfig.yaml, such as when development, staging, and production use different tables.

Pass a Table object to Fragment.into with a fully qualified name to create an export target, then call Fragment.exec to run the export:

from relationalai.semantics import Integer, Model
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
m.define(Order.new(id=1), Order.new(id=2))
q = m.select(Order.id.alias("order_id"))
# Create a Table object with the fully qualified name of the destination table.
out = m.Table("<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>")
q.into(out).exec()
  • out = m.Table("<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>") creates an export target with a fully qualified name.
  • q.into(out).exec() runs the export, writing results into the specified Snowflake table.
  • If the target table already exists, q.into(out).exec() replaces it by default. Use q.into(out, update=True).exec() when you want an update export instead.
  • You need write access to the destination schema.

Declare a destination schema for stable exports

Declare a destination schema when the Snowflake table should keep columns that are not present in every export result. Use this when later exports need the same destination columns, even if one run does not produce values for all of them:

from relationalai.semantics import Integer, Model, String
m = Model("MyModel")
Order = m.Concept("Order", identify_by={"id": Integer})
Order.status = m.Property(f"{Order} has {String:status}")
m.define(
    Order.new(id=1, status="shipped"),
    Order.new(id=2, status="pending"),
)
q = m.select(Order.id.alias("order_id"), Order.status)
out = m.Table(
    "<OUTPUT_DB>.<OUTPUT_SCHEMA>.<OUTPUT_TABLE>",
    schema={
        "order_id": Integer,
        "status": String,
        "shipped_at": String,
    },
)
q.into(out).exec()
  • The query exports only order_id and status.
  • The destination schema also declares shipped_at.
  • PyRel keeps shipped_at in the destination table, even though it is not present in the query, and fills it with NULL.

Rather than specifying the full destination table name in your code, you can define a short export name in your configuration and refer to it when creating a Table object. This keeps physical table details out of your code and makes it possible to control physical destinations with profile overrides.

To use a short configured name:

  1. Set the export table config in raiconfig.yaml

    raiconfig.yaml
    tables:
      MY_NAMED_EXPORT:
        fqn: "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>"
  2. Use the short configured name in your model

    from relationalai.semantics import Integer, Model
    m = Model("MyModel")
    Order = m.Concept("Order", identify_by={"id": Integer})
    m.define(Order.new(id=1), Order.new(id=2))
    # Create a query fragment.
    q = m.select(Order.id.alias("order_id"))
    # Create a Table object for the configured export name.
    out = m.Table("MY_NAMED_EXPORT")
    # Inspect the resolved destination.
    print(out.info.physical_name)
    # Execute the export.
    q.into(out).exec()
  • The fqn option in the config defines the physical Snowflake table destination for the export.
  • out = m.Table("MY_NAMED_EXPORT") creates a Table object that refers to the configured export name. The physical name is resolved from the configuration at this point.
  • q.into(out).exec() runs the export, writing results into the specified Snowflake table.
  • If the target table already exists, q.into(out).exec() replaces it by default. Use q.into(out, update=True).exec() when you want an update export instead.
  • You need write access to the destination schema.