Portfolio Balancing
Allocate investment across stocks to minimize risk while achieving a target return.
Multi-reasoner template: rules-based compliance, covariance clustering, and bi-objective Markowitz optimization with a crisis-regime stress test.
What this template is for
Investors and portfolio managers often need to allocate capital across multiple assets while balancing expected return against risk. This template implements a classic Markowitz mean-variance model that chooses non-negative allocations to minimize portfolio variance subject to a minimum expected return target.
This template uses RelationalAI’s prescriptive reasoning (optimization) capabilities to compute an optimal allocation under constraints, and to run a small scenario analysis that illustrates the risk/return trade-off.
Prescriptive reasoning helps you:
- Quantify trade-offs between return targets and risk.
- Enforce constraints like budgets and no-short-selling.
- Explore scenarios by varying the minimum expected return.
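Before diving into the model, it may help to see the two quantities being traded off. Below is a minimal plain-Python sketch (toy numbers, not the template's sample data) of a portfolio's expected return and variance for a fixed allocation:

```python
# Toy two-asset example of the quantities the optimizer trades off.
# All numbers here are illustrative, not the template's sample data.
mu = [0.04, 0.08]                  # expected returns per unit invested
cov = [[0.010, 0.002],
       [0.002, 0.040]]             # covariance matrix (symmetric, PSD)
x = [600.0, 400.0]                 # allocation (sums to a 1000 budget)

# Expected return: sum of return * allocation.
expected_return = sum(m * xi for m, xi in zip(mu, x))

# Portfolio variance: x' * Cov * x, the quadratic objective being minimized.
variance = sum(x[i] * cov[i][j] * x[j] for i in range(2) for j in range(2))

print(expected_return)  # about 56
print(variance)         # about 10960
```

Raising the minimum-return target forces the allocation toward the higher-return asset, which here also carries the larger variance; that is exactly the trade-off the scenario loop explores.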
Who this is for
- You want an end-to-end example of prescriptive reasoning (optimization) with quadratic objectives.
- You’re comfortable with basic Python and optimization concepts (risk/return, covariance).
What you’ll build
- A semantic model for stocks, expected returns, and pairwise covariance.
- A quadratic program that chooses non-negative allocations.
- A minimum return constraint and a variance-minimization objective.
- A scenario loop over different minimum return targets with a summary table.
What’s included
- Model + solve script: `portfolio_balancing.py`
- Sample data: `data/returns.csv`, `data/covariance.csv`
- Outputs: per-scenario solver status/objective, allocation table, and a scenario summary
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
Follow these steps to run the template with the included sample data.
1. Download the ZIP file for this template and extract it:

   ```sh
   curl -O https://private.relational.ai/templates/zips/v0.13/portfolio_balancing.zip
   unzip portfolio_balancing.zip
   cd portfolio_balancing
   ```

2. Create and activate a virtual environment:

   ```sh
   python -m venv .venv
   source .venv/bin/activate
   python -m pip install -U pip
   ```

3. Install dependencies:

   ```sh
   python -m pip install .
   ```

4. Configure the Snowflake connection and RAI profile:

   ```sh
   rai init
   ```

5. Run the template:

   ```sh
   python portfolio_balancing.py
   ```
Expected output
The script solves three scenarios for the minimum expected return target.
```
Running scenario: min_return = 10
  Status: OPTIMAL, Objective: ...

  Portfolio allocation:
  name  value
  ...

==================================================
Scenario Analysis Summary
==================================================
  10: OPTIMAL, obj=...
  20: OPTIMAL, obj=...
  30: OPTIMAL, obj=...
```
Template structure
```
.
├─ README.md
├─ pyproject.toml
├─ portfolio_balancing.py   # main runner / entrypoint
└─ data/                    # sample input data
   ├─ returns.csv
   └─ covariance.csv
```

Start here: portfolio_balancing.py
Sample data
Data files are in data/.
returns.csv
Defines one expected return value per stock.
| Column | Meaning |
|---|---|
| index | Stock identifier |
| returns | Expected return (decimal, e.g., 0.04 = 4%) |
covariance.csv
Defines pairwise covariance values between stock pairs.
| Column | Meaning |
|---|---|
| i | First stock index |
| j | Second stock index |
| covar | Covariance between stocks i and j |
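The covariance file stores the matrix in long format, one (i, j, covar) row per pair. As a minimal standard-library sketch, here is how such rows pivot into a nested matrix (the inline rows below are made up, not the template's sample data):

```python
import csv
import io

# Inline stand-in for a long-format covariance CSV (made-up values).
raw = """i,j,covar
1,1,0.04
1,2,0.002
2,1,0.002
2,2,0.09
"""

# Pivot long-format rows into a nested {i: {j: covar}} matrix.
matrix = {}
for row in csv.DictReader(io.StringIO(raw)):
    matrix.setdefault(int(row["i"]), {})[int(row["j"])] = float(row["covar"])

print(matrix[1][2])  # 0.002
print(matrix[2][2])  # 0.09
```

The template itself leaves the data in long format and joins stock indices inside the semantic model, but the pivot above is the same relationship expressed in plain Python.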
Model overview
The semantic model uses a single concept (Stock) and a pairwise covariance property (Stock.covar). The decision variable is a continuous allocation per stock.
Stock
Represents an investable asset.
| Property | Type | Identifying? | Notes |
|---|---|---|---|
| index | int | Yes | Loaded from data/returns.csv |
| returns | float | No | Expected return |
| covar | float | No | Pairwise covariance with another Stock |
| x_quantity | float | No | Decision variable (continuous, non-negative) |
How it works
This section walks through the highlights in portfolio_balancing.py.
Import libraries and configure inputs
First, the script imports the Semantics and optimization APIs, configures the data directory, and defines the key parameters:
```python
from pathlib import Path

import pandas
from pandas import read_csv

from relationalai.semantics import Float, Model, data, require, select, sum, where
from relationalai.semantics.reasoners.optimization import Solver, SolverModel

# --------------------------------------------------
# Configure inputs
# --------------------------------------------------

DATA_DIR = Path(__file__).parent / "data"

# Disable pandas inference of string types. This ensures that string columns
# in the CSVs are loaded as object dtype. This is only required when using
# relationalai versions prior to v1.0.
pandas.options.future.infer_string = False

# Budget and minimum return parameters.
BUDGET = 1000
MIN_RETURN = 20
```

Define concepts and load CSV data
Next, it creates a Model, defines the Stock concept, and loads both CSVs. The covariance values are defined by joining stock indices using where(...).define(...):
```python
# --------------------------------------------------
# Define semantic model & load data
# --------------------------------------------------

# Create a Semantics model container.
model = Model("portfolio", config=globals().get("config", None), use_lqp=False)

# Stock concept: available investments with expected returns.
Stock = model.Concept("Stock")
Stock.returns = model.Property("{Stock} has {returns:float}")

# Load expected return data from CSV.
data(read_csv(DATA_DIR / "returns.csv")).into(Stock, keys=["index"])

# Stock.covar property: covariance matrix between stock pairs.
Stock.covar = model.Property("{Stock} and {stock2:Stock} have {covar:float}")
Stock2 = Stock.ref()

# Load covariance data from CSV.
covar_csv = read_csv(DATA_DIR / "covariance.csv")
pairs = data(covar_csv)
where(
    Stock.index == pairs.i,
    Stock2.index == pairs.j
).define(
    Stock.covar(Stock, Stock2, pairs.covar)
)
```

Define decision variables, constraints, and objective
Then it creates a decision variable Stock.x_quantity and registers constraints and the quadratic variance objective inside build_formulation(...):
```python
# --------------------------------------------------
# Model the decision problem
# --------------------------------------------------

# Stock.x_quantity decision variable: amount allocated to each stock.
Stock.x_quantity = model.Property("{Stock} quantity is {x:float}")

c = Float.ref()

# Scenario parameter. This is updated inside the scenario loop.
min_return = MIN_RETURN

# Budget is fixed across scenarios.
budget = BUDGET

def build_formulation(s):
    """Register variables, constraints, and objective on the solver model."""
    # Decision variable: quantity of each stock.
    s.solve_for(Stock.x_quantity, name=["qty", Stock.index])

    # Constraint: no short selling.
    bounds = require(Stock.x_quantity >= 0)
    s.satisfy(bounds)

    # Constraint: budget limit.
    budget_constraint = require(sum(Stock.x_quantity) <= budget)
    s.satisfy(budget_constraint)

    # Constraint: minimum return target (scenario parameter).
    return_constraint = require(sum(Stock.returns * Stock.x_quantity) >= min_return)
    s.satisfy(return_constraint)

    # Objective: minimize portfolio risk (variance).
    risk = sum(c * Stock.x_quantity * Stock2.x_quantity).where(Stock.covar(Stock2, c))
    s.minimize(risk)
```

Solve and print results
Finally, the script loops over multiple values of min_return, creates a fresh SolverModel for each scenario, and prints both the allocation and a summary:
```python
# --------------------------------------------------
# Solve with scenario analysis (numeric parameter)
# --------------------------------------------------

SCENARIO_PARAM = "min_return"
SCENARIO_VALUES = [10, 20, 30]

scenario_results = []

for scenario_value in SCENARIO_VALUES:
    print(f"\nRunning scenario: {SCENARIO_PARAM} = {scenario_value}")

    # Set scenario parameter value.
    min_return = scenario_value

    # Create a fresh SolverModel for each scenario.
    s = SolverModel(model, "cont")
    build_formulation(s)

    solver = Solver("highs")
    s.solve(solver, time_limit_sec=60)

    scenario_results.append({
        "scenario": scenario_value,
        "status": str(s.termination_status),
        "objective": s.objective_value,
    })
    print(f"  Status: {s.termination_status}, Objective: {s.objective_value}")

    # Print portfolio allocation from solver results.
    var_df = s.variable_values().to_df()
    qty_df = var_df[
        var_df["name"].str.startswith("qty") & (var_df["float"] > 0.001)
    ].rename(columns={"float": "value"})
    print("\n  Portfolio allocation:")
    print(qty_df.to_string(index=False))

# Print a scenario summary table.
print("\n" + "=" * 50)
print("Scenario Analysis Summary")
print("=" * 50)
for result in scenario_results:
    print(f"  {result['scenario']}: {result['status']}, obj={result['objective']}")
```

Troubleshooting
I get ModuleNotFoundError when running the script
- Confirm you created and activated the virtual environment from the Quickstart.
- Reinstall dependencies with `python -m pip install .`.
- Verify you are running `python portfolio_balancing.py` from the `portfolio_balancing/` folder.
The script fails while reading a CSV from data/
- Confirm `data/returns.csv` and `data/covariance.csv` exist.
- Verify headers match the expected columns (`index`, `returns`, `i`, `j`, `covar`).
- Check for missing values and non-numeric entries in return/covariance columns.
I see an unexpected termination status (not OPTIMAL)
- Try re-running; if you hit a time limit, consider increasing `time_limit_sec`.
- If you changed scenario parameters, confirm the minimum return target is feasible given the budget.
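On the feasibility point above: with non-negative allocations and a budget cap, the best achievable expected return is the budget times the largest per-unit return, so any min_return above that bound makes the problem infeasible. A plain-Python sanity check with hypothetical return values:

```python
def max_achievable_return(returns, budget):
    """Upper bound on sum(r_i * x_i) subject to sum(x_i) <= budget, x_i >= 0.
    Achieved by putting the whole budget into the highest-return asset."""
    return budget * max(returns)

# Hypothetical expected returns per unit invested (not the template's data).
returns = [0.01, 0.03, 0.042]
budget = 1000

bound = max_achievable_return(returns, budget)
print(bound)         # about 42
print(bound >= 30)   # True: a min_return target of 30 is feasible
```

If a scenario's min_return exceeds this bound, the solver will report an infeasible status no matter how long it runs.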
What this template is for
Portfolio managers don’t want to pay twice for the same exposure — if two funds track nearly the same benchmark, owning both is one bet with worse bookkeeping. This template chains four reasoning stages on a single shared ontology to build compliant, risk-optimized portfolios across an 8-stock universe and stress-test them under a crisis regime.
It uses RelationalAI’s rules, graph, and prescriptive reasoners in a chained workflow:
- Rules scan the current book for compliance violations — overconcentrated holdings (> 15% of balance), sector concentration (> 30%), and high-risk traders — as derived Relationships.
- Graph builds a correlation graph from the covariance matrix, runs Louvain clustering, and picks the highest-Sharpe stock per cluster as the cluster’s representative. 8 stocks collapse to 5 distinct bets; near-duplicates are dropped from the investable universe rather than capped within it.
- Prescriptive optimization solves a bi-objective Markowitz QP on the representative-only universe under position and sector caps, tracing the efficient frontier via the epsilon constraint method across a `Scenario` concept that combines three budgets and two regimes.
- Crisis stress test is the same `solve_epsilon` call — no separate model — but `Scenario.regime` picks a PSD-preserving shrinkage covariance, so base and crisis frontiers come out of one pipeline.
Each stage writes derived properties the next reads directly: Rules define the thresholds Stage 3 enforces as constraints, Stage 2’s Stock.is_representative shapes the decision space, and the stress test reads Stock.regime_covar keyed by Scenario.regime. See “How it works” for the full data flow.
Why this problem matters
Portfolio managers don’t want to pay twice for the same exposure. If two funds track nearly the same benchmark, allocating capital to both is one bet with worse bookkeeping. A single optimization pass cannot simultaneously flag existing violations, collapse that redundancy, and stress the result under a crisis regime.
The four-stage approach addresses each gap. Stage 1 surfaces existing violations in the current book (diagnostic). Stage 2 clusters by return covariance and picks the highest-Sharpe representative per cluster, collapsing redundant bets. Stage 3 optimizes over the representative-only universe under position and sector limits. Stage 4 re-solves under a PSD-preserving crisis covariance to stress the resulting portfolio.
Key design patterns demonstrated
- Shared compliance thresholds —
SECTOR_LIMITis defined once and enforced in both stages.POSITION_LIMIT(Stage 1 per-stock compliance) andREP_POSITION_LIMIT(Stage 3 per-representative cap) are deliberately different: a representative carries its cluster’s combined exposure, so the construction-side cap is higher than the holdings-side compliance cap - Graph results feed optimization — Louvain cluster ids and per-cluster argmax (highest Sharpe) both persist on
Stock, and the optimizer’sStock.is_non_representative()constraint forces non-reps to zero (complement defined positively because the prescriptive rewriter doesn’t acceptmodel.not_()in a solver.where()) - Collapse, don’t cap — the graph stage reduces the investable universe to distinct bets rather than allowing all N stocks and capping within redundant groups
- Scenario Concept for parameter sweeps —
Scenarioentities combine budget (1,000, $2,000) and regime (base, crisis) so each epsilon solve handles all six combinations in one call - Epsilon constraint method —
solve_epsilon(eps_rate)sweeps return targets across the feasible range, producing the full Pareto frontier without manually fixing return values - PSD-preserving stress covariance — correlation shrinkage toward all-ones keeps the QP convex at every lambda, unlike naive off-diagonal scaling
- Quadratic programming via Ipopt — the risk objective is quadratic (
x' * Cov * x), solved with Ipopt’s nonlinear optimizer rather than a linear MIP solver - Anchor solves establish feasible range — Anchor 1 (minimize risk) and Anchor 2 (maximize return) determine the return rate range before the epsilon sweep
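The knee detection mentioned among these patterns can be sketched as a marginal-cost scan: compute risk per unit of extra return between adjacent frontier points, and flag the point where that rate jumps by more than a chosen factor. The frontier points and the 3.0x factor below are illustrative:

```python
def find_knee(frontier, jump_factor=3.0):
    """frontier: list of (return, risk) points sorted by increasing return.
    Returns the index of the point at which the marginal risk/return rate
    first jumps by more than jump_factor (the 'knee'), else the last index."""
    rates = [
        (frontier[k + 1][1] - frontier[k][1]) / (frontier[k + 1][0] - frontier[k][0])
        for k in range(len(frontier) - 1)
    ]
    for k in range(len(rates) - 1):
        if rates[k + 1] > jump_factor * rates[k]:
            return k + 1  # the last point before the marginal cost jump
    return len(frontier) - 1

# Illustrative frontier (return, risk); not the template's numbers.
frontier = [(32.0, 1160.0), (33.0, 1176.0), (35.0, 1276.0), (37.0, 1456.0)]
print(find_knee(frontier))  # 1: marginal rate jumps 16 -> 50 after the second point
```

Past the knee, each extra unit of return costs disproportionately more risk, which is why the knee is a natural default allocation.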
Who this is for
- Quantitative analysts and portfolio managers exploring mean-variance optimization
- Data scientists learning quadratic programming with RelationalAI
- Finance students studying the Markowitz efficient frontier
- Anyone interested in risk-return trade-off analysis with scenario comparisons
What you’ll build
- A rules-based compliance pipeline using RAI derived properties and Relationships to flag overconcentrated holdings, sector concentration violations, and high-risk traders
- A correlation graph over stocks with Louvain community detection, plus per-cluster representative selection by highest Sharpe
- A quadratic programming model that minimizes portfolio variance subject to position and sector limits on a representative-only universe (non-reps forced to zero)
- Budget and no-short-selling constraints across multiple (budget, regime) scenarios
- Epsilon constraint method sweeping return targets to trace the efficient frontier
- Anchor solves to establish the feasible return range
- Pareto analysis with marginal cost and knee detection
- A crisis-regime stress test using PSD-preserving correlation shrinkage to compare base vs crisis frontiers side-by-side
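The anchor-plus-sweep structure is easy to sketch without the solver: the two anchor solves bracket the feasible return range, and the epsilon constraint method then fixes evenly spaced interior return targets, one constrained solve each. A plain-Python sketch of just the target generation (the solves themselves are delegated to the template's solve_epsilon; the anchor values below are hypothetical):

```python
def epsilon_targets(r_min, r_max, n_interior):
    """Evenly spaced interior return targets strictly between the two anchors.
    Each target becomes the epsilon bound of one constrained risk-minimization."""
    step = (r_max - r_min) / (n_interior + 1)
    return [r_min + step * (k + 1) for k in range(n_interior)]

# Hypothetical anchors: min-risk return and max-achievable return.
r_min, r_max = 30.0, 42.0
targets = epsilon_targets(r_min, r_max, 5)
print(targets)  # [32.0, 34.0, 36.0, 38.0, 40.0]
```

Together with the two anchors, these five interior points give the seven frontier points per scenario reported in the sample output.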
What’s included
- `portfolio_balancing.py` — Main script with all four stages: rules-based compliance, covariance clustering (Louvain), bi-objective QP with epsilon sweep, and crisis-regime stress test
- `data/returns.csv` — Stock universe: index, ticker, sector, expected returns (8 stocks)
- `data/covar.csv` — Covariance matrix entries (i, j, covariance value)
- `data/users.csv` — User profiles with risk scores
- `data/accounts.csv` — Account balances
- `data/holdings.csv` — Current holdings per account and stock
- `data/transactions.csv` — Transaction history with flagged-transaction indicators
- `pyproject.toml` — Python package configuration with dependencies
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
- RelationalAI Python SDK (`relationalai`) == 1.0.14
Quickstart
1. Download ZIP:

   ```sh
   curl -O https://docs.relational.ai/templates/zips/v1/portfolio_balancing.zip
   unzip portfolio_balancing.zip
   cd portfolio_balancing
   ```

2. Create venv:

   ```sh
   python -m venv .venv
   source .venv/bin/activate
   python -m pip install --upgrade pip
   ```

3. Install:

   ```sh
   python -m pip install .
   ```

4. Configure:

   ```sh
   rai init
   ```

5. Run:

   ```sh
   python portfolio_balancing.py
   ```
Expected output (sample — full output covers all four stages):

```
======================================================================
STAGE 1: COMPLIANCE ANALYSIS (rules)
======================================================================

--- Rule 1: Overconcentrated Holdings (position > 15% of balance) ---
holding_id=1, ticker=AAPL, account_id=1, value=18000.00, balance=100000.00, pct=18.0%
...

--- Rule 2: Sector Concentration (sector > 30% of balance) ---
account_id=1, sector=Technology, sector_value=34000.00, pct=34.0%
...

--- Rule 3: High Risk Traders (risk_score > 0.8 AND >5 flagged txns) ---
user_id=1, name=Alice Chen, risk_score=0.85
...

======================================================================
STAGE 2: GRAPH -- Covariance Clustering (Louvain)
======================================================================

Correlation graph: 4 edges (|correlation| >= 0.3)
Louvain communities: 5 cluster(s)
Cluster 1 (size 2): JNJ (Healthcare), PFE (Healthcare)
Cluster 2 (size 3): AAPL (Technology), MSFT (Technology), GOOGL (Technology)
Cluster 3 (size 1): JPM (Financials)
Cluster 4 (size 1): PG (Consumer Staples)
Cluster 5 (size 1): XOM (Energy)
Avg correlation: intra-cluster = +0.683, inter-cluster = +0.131

Cluster representatives (5 of 8 stocks, picked by highest Sharpe):
Cluster 1: PFE (Healthcare) -- Sharpe = 0.530
Cluster 2: GOOGL (Technology) -- Sharpe = 0.605
Cluster 3: JPM (Financials) -- Sharpe = 0.500
Cluster 4: PG (Consumer Staples) -- Sharpe = 0.444
Cluster 5: XOM (Energy) -- Sharpe = 0.588

======================================================================
STAGE 3: BI-OBJECTIVE OPTIMIZATION
(position + sector limits on representative universe; base & crisis regimes)
======================================================================

ANCHOR SOLVE 1: Minimize risk (no return constraint)
Status: LOCALLY_SOLVED
base_500:    return = 32.4335,  risk = 1160.3926
base_1000:   return = 64.8673,  risk = 4641.5704
base_2000:   return = 129.7346, risk = 18566.2815
crisis_500:  return = 31.6873,  risk = 1913.5995
crisis_1000: return = 63.3745,  risk = 7654.3979
crisis_2000: return = 126.7490, risk = 30617.5917

ANCHOR SOLVE 2: Maximize return
Status: LOCALLY_SOLVED
base_500/crisis_500:   return = 42.0000
base_1000/crisis_1000: return = 84.0000
base_2000/crisis_2000: return = 168.0000
Return rate range: [0.0634, 0.0840] per unit invested

EPSILON SWEEP: 5 interior points
Point 1 .. Point 5: all LOCALLY_SOLVED

EFFICIENT FRONTIER: Risk vs Return (per scenario)

base_500 (budget=500, regime=base):
#  Label     Return  Risk
--------------------------------------
1  min_risk  32.43   1160.3926
2  eps_1     33.41   1176.7790
3  eps_2     35.12   1262.6111
4  eps_3     36.84   1385.8901
5  eps_4     38.56   1545.7909
6  eps_5     40.28   1742.4712

Marginal analysis: rate climbs 16.85 -> 49.94 -> 71.72 -> 93.03 -> 114.43 risk/return.
Knee: Point 2 (eps_1) -- marginal cost jumps 3.0x beyond this point.

(similar tables for base_1000, base_2000, crisis_500, crisis_1000, crisis_2000)

======================================================================
STAGE 4: CRISIS REGIME STRESS TEST
(PSD-preserving correlation shrinkage, alpha = 0.7)
======================================================================

Volatility comparison (sqrt risk) -- base vs crisis at each lambda:

Budget 500:
Label     vol_base  vol_crisis  gap       gap_%
--------------------------------------------------------
min_risk  34.0645   43.7447     +9.6802   +28.4%
eps_1     34.3042   44.5398     +10.2356  +29.8%
eps_2     35.5332   46.1119     +10.5787  +29.8%
eps_3     37.2275   47.9433     +10.7158  +28.8%
eps_4     39.3165   49.9933     +10.6768  +27.2%
eps_5     41.7429   52.2694     +10.5265  +25.2%

(similar tables for Budget 1000 and Budget 2000, identical gap_% pattern)
```

Crisis volatility sits 25-30% above base at every lambda, and the gap peaks in the middle of the frontier (eps_1..eps_2 at +29.8%), not at the concentrated end (eps_5 at +25.2%). That inversion is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster, which incidentally sits in sectors with lower crisis correlations (Energy, Consumer Staples). Without the representative collapse, the concentrated end would stack near-duplicates and see the crisis gap grow, not shrink.
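The crisis covariance behind Stage 4 is built by shrinking the base correlation matrix toward the all-ones matrix. Since both matrices are positive semidefinite, any convex blend of them is too, which is what keeps the crisis QP convex. A minimal plain-Python sketch with hypothetical two-asset inputs (the template's actual alpha is 0.7):

```python
def shrink_correlation(corr, alpha):
    """Crisis correlation: convex blend of the base correlation matrix with
    the all-ones matrix. Both are PSD, so the blend is PSD for alpha in [0, 1]."""
    n = len(corr)
    return [[(1 - alpha) * corr[i][j] + alpha for j in range(n)] for i in range(n)]

def to_covariance(corr, vols):
    """Rebuild a covariance matrix from correlations and per-asset volatilities."""
    n = len(corr)
    return [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]

# Hypothetical two-asset base correlation and volatilities (not the sample data).
base_corr = [[1.0, 0.2], [0.2, 1.0]]
vols = [0.1, 0.3]

crisis_corr = shrink_correlation(base_corr, alpha=0.7)
crisis_cov = to_covariance(crisis_corr, vols)
print(crisis_corr)  # off-diagonal moves from 0.2 toward 1.0 (about 0.76)
print(crisis_cov)
```

Naively scaling only the off-diagonal covariances can push the matrix outside the PSD cone, making the QP non-convex; the shrinkage form avoids that by construction.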
Template structure
```
.
├── README.md
├── pyproject.toml
├── portfolio_balancing.py
└── data/
    ├── returns.csv
    ├── covar.csv
    ├── users.csv
    ├── accounts.csv
    ├── holdings.csv
    └── transactions.csv
```

How it works
This section walks through the highlights in portfolio_balancing.py.
Reasoner overview
| Stage | Reasoner | Reads from ontology | Writes to ontology | Role |
|---|---|---|---|---|
| 1 | Rules | Holding, Account, User, Transaction, Stock | Holding.is_overconcentrated, Holding.is_sector_concentrated, User.is_high_risk_trader | 4 overconcentrated holdings (AAPL 18%, MSFT 16%, JNJ 16%, PFE 16.2%). 2 sector concentrations (Technology 34%, Healthcare 32.2%). 2 high-risk traders (Alice Chen 0.85, Eve Taylor 0.92). |
| 2 | Graph (Louvain) | Stock.covar (diagonal for variance), derived Stock.correlation filtered at threshold 0.3 | Stock.variance, Stock.volatility, Stock.correlation, Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe, Stock.is_representative | 4 edges retained after thresholding. Louvain yields 5 clusters; 5 representatives picked by highest Sharpe (one per cluster). Collapses 8 stocks to 5 distinct bets. |
| 3 | Prescriptive (QP) | Stock.returns, Stock.regime_covar, Stock.is_representative, Scenario.budget, Scenario.regime | Stock.x_quantity indexed by Scenario (non-reps forced to 0) | Min-risk and max-return anchors bracket the frontier. Epsilon sweep traces 5 interior points per (budget, regime). Programmatic knee detection at eps_1. |
| 4 | Prescriptive (stress) | Stock.regime_covar under “crisis” regime | (shares Stock.x_quantity with Stage 3) | Crisis volatility 25-30% higher than base at every lambda; gap peaks at the middle of the frontier (eps_1..eps_3) and narrows toward both ends. The representative-only universe keeps the concentrated end from stacking near-duplicate bets that would otherwise amplify crisis vol. |
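Stage 2's thresholded correlation graph can be sketched without the SDK: normalize each covariance by the two diagonal variances and keep the pairs whose absolute correlation clears the 0.3 threshold. The covariance entries below are made up for illustration:

```python
import math

def correlation_edges(covar, threshold=0.3):
    """Edges of the correlation graph: stock pairs whose |correlation| meets
    the threshold. covar maps (i, j) -> value and includes diagonal entries."""
    nodes = sorted({i for i, _ in covar})
    edges = []
    for a in nodes:
        for b in nodes:
            if a < b:
                corr = covar[(a, b)] / math.sqrt(covar[(a, a)] * covar[(b, b)])
                if abs(corr) >= threshold:
                    edges.append((a, b, round(corr, 3)))
    return edges

# Hypothetical 3-stock covariance entries (diagonal = variances).
covar = {(1, 1): 0.04, (2, 2): 0.09, (3, 3): 0.01,
         (1, 2): 0.045, (1, 3): 0.001, (2, 3): 0.002}
print(correlation_edges(covar))  # only the (1, 2) pair clears the threshold
```

Louvain community detection then runs on exactly this kind of sparse edge list; weakly correlated stocks end up as singleton clusters, matching the sample output above.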
All four stages share a single RAI model. Compliance thresholds are defined once at the top of the script. Stage 1 uses POSITION_LIMIT = 0.15 and SECTOR_LIMIT = 0.30 to flag existing violations as derived Relationships. Stage 3 re-uses SECTOR_LIMIT but applies REP_POSITION_LIMIT = 0.30 to the decision variable: after representative collapse each cluster has exactly one carrier, so its cap is legitimately higher than a per-stock compliance cap.
How the reasoners chain
Each stage writes derived properties the next reads directly. Stage 1's thresholds (`POSITION_LIMIT`, `SECTOR_LIMIT`) become Stage 3 constraints. Stage 2's `Stock.is_representative` and `Stock.is_non_representative` shape Stage 3's decision space (non-representatives are forced to zero). Stage 4 uses the same `solve_epsilon` call as Stage 3 — the `Regime` concept keyed into `Stock.regime_covar` makes base vs. crisis a scenario view on the same solve, not a separate model. The Reasoner overview table above names each property that crosses a stage boundary.
Multi-scenario Pareto frontier in one pipeline
`Scenario` combines three budgets and two regimes — six tuples. Each `solve_epsilon(eps_rate)` call returns one optimal allocation per tuple; the epsilon sweep repeats across return-rate targets. Six scenarios × seven points (two anchors + five interior) = 42 optimal portfolios, all from seven solver invocations. Two consequences:
- Base and crisis are comparable at equal budget and equal lambda: the vol gap is a pure regime effect, not a re-fitting artifact.
- Adding a fourth regime or a fifth budget is a data edit in `scenario_data`, not a code change in `solve_epsilon`. Scenarios are data.
Stage 1: Rules-based compliance analysis
The first stage defines compliance flags as RAI derived properties and Relationships. The model loads portfolio data (users, accounts, holdings, transactions) alongside the stock universe, then evaluates three rules using two configurable thresholds:
```python
POSITION_LIMIT = 0.15  # max fraction of budget per stock
SECTOR_LIMIT = 0.30    # max fraction of budget per sector
```

Rule 1 — Overconcentrated holdings: a holding whose value exceeds `POSITION_LIMIT` of the account balance. The holding value is a derived property:
```python
Holding.value = model.Property(f"{Holding} has value {Float:holding_value}")
model.define(Holding.value(Holding.quantity * Holding.purchase_price))
```
```python
Holding.is_overconcentrated = model.Relationship(f"{Holding} is overconcentrated")
AccountR1 = Account.ref()
model.where(
    Holding.account(AccountR1),
    Holding.value > POSITION_LIMIT * AccountR1.balance,
).define(Holding.is_overconcentrated())
```

Rule 2 — Sector concentration: total holdings in a sector exceeding `SECTOR_LIMIT` of the account balance. Uses aggregation to sum holding values per (account, sector):
```python
sector_exposure = sum(HoldingSC.value).where(
    HoldingSC.account(AccountSC),
    HoldingSC.stock(StockSC),
    StockSC.sector_ref(SectorSC),
).per(AccountSC, SectorSC)
```
```python
model.where(
    Holding.account(AccountSC),
    Holding.stock(StockR2),
    StockR2.sector_ref(SectorSC),
    sector_exposure > SECTOR_LIMIT * AccountSC.balance,
).define(Holding.is_sector_concentrated())
```

Rule 3 — High-risk traders: users with `risk_score > 0.8` and more than 5 flagged transactions. The flagged-transaction count is computed via aggregation:
```python
flagged_count = sum(TransactionHR.is_flagged_val).where(
    TransactionHR.user(User),
).per(User)
```
```python
model.where(
    User.risk_score > 0.8,
    flagged_count > 5,
).define(User.is_high_risk_trader())
```

Stage 2: Graph — covariance clustering
Volatility and correlation are derived in PyRel from the base covariance, so the ontology is the single source of truth for every similarity metric. `Stock.variance` picks the covariance diagonal, `Stock.volatility` applies `sqrt(variance)` via `relationalai.semantics.std.math.sqrt`, and `Stock.correlation(i, j) = covar(i, j) / (vol_i * vol_j)`:
```python
Stock.volatility = model.Property(f"{Stock} has {Float:stock_volatility}")
model.define(Stock.volatility(sqrt(Stock.variance)))
```
```python
Stock.correlation = model.Property(
    f"{Stock} and {Stock} have correlation {Float:stock_correlation}"
)
PairedStockCorr = Stock.ref()
cov_ij_ref = Float.ref()
model.where(
    Stock.covar(PairedStockCorr, cov_ij_ref),
).define(
    Stock.correlation(
        PairedStockCorr,
        cov_ij_ref / (Stock.volatility * PairedStockCorr.volatility),
    )
)
```

The Graph reasoner builds an undirected graph with `Stock` as the node concept. Edges are filtered in PyRel directly against the derived correlation property — no upstream edge list required:
```python
corr_graph = Graph(
    model,
    directed=False,
    weighted=False,
    node_concept=Stock,
    aggregator="sum",
)
```
```python
stock_i_ref = Stock.ref()
stock_j_ref = Stock.ref()
corr_ref = Float.ref()
model.define(corr_graph.Edge.new(src=stock_i_ref, dst=stock_j_ref)).where(
    stock_i_ref.correlation(stock_j_ref, corr_ref),
    stock_i_ref.index < stock_j_ref.index,
    math_abs(corr_ref) >= CORR_THRESHOLD,
)
```

Louvain community detection runs directly on the graph and returns (node, cluster_id) pairs. The cluster id is persisted as a `Stock` property so downstream stages can consume it:
```python
community = corr_graph.louvain()
cluster_label = Integer.ref("cluster_label")
Stock.cluster = model.Property(f"{Stock} in cluster {Integer:cluster_id}")
stock_clust_ref = Stock.ref()
model.define(stock_clust_ref.cluster(cluster_label)).where(
    community(stock_clust_ref, cluster_label)
)
```

The script reports cluster sizes and intra- vs inter-cluster average correlation as a sanity check that the clustering separates co-moving stocks from independent ones.
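The same math can be sanity-checked offline with NumPy. This standalone sketch uses a toy covariance matrix and made-up cluster labels (not the template's data): it derives correlation from covariance exactly as Stage 2 does, then compares intra- vs inter-cluster average correlation.

```python
import numpy as np
from itertools import combinations

# Toy 3-stock covariance matrix (symmetric PSD), standing in for covar.csv.
cov = np.array([
    [0.040, 0.048, 0.002],
    [0.048, 0.090, 0.003],
    [0.002, 0.003, 0.025],
])

vol = np.sqrt(np.diag(cov))       # volatility = sqrt(variance)
corr = cov / np.outer(vol, vol)   # corr(i, j) = cov(i, j) / (vol_i * vol_j)

cluster = [0, 0, 1]  # pretend Louvain grouped stocks 0 and 1; stock 2 is a singleton

intra, inter = [], []
for i, j in combinations(range(len(cluster)), 2):
    (intra if cluster[i] == cluster[j] else inter).append(abs(corr[i, j]))

# Intra-cluster pairs should co-move much more than cross-cluster pairs.
print(round(np.mean(intra), 3), round(np.mean(inter), 3))  # prints: 0.8 0.063
```

If the intra-cluster average is not clearly higher than the inter-cluster one, the threshold or the clustering needs another look.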
After clustering, Stage 2 picks one representative per cluster — the stock with the highest Sharpe ratio — using per-group argmax in PyRel. Only the representatives will be eligible for allocation in Stage 3:
```python
Stock.sharpe = model.Property(f"{Stock} has Sharpe {Float:stock_sharpe}")
model.define(Stock.sharpe(Stock.returns / Stock.volatility))
```
```python
peer_for_max = Stock.ref()
Stock.cluster_max_sharpe = model.Property(
    f"{Stock} has cluster max Sharpe {Float:cluster_max_sharpe}"
)
model.define(
    Stock.cluster_max_sharpe(
        aggs.max(peer_for_max.sharpe)
        .where(peer_for_max.cluster == Stock.cluster)
        .per(Stock)
    )
)
```
```python
Stock.is_representative = model.Relationship(f"{Stock} is cluster representative")
model.where(Stock.sharpe == Stock.cluster_max_sharpe).define(
    Stock.is_representative()
)
```

Stage 3: Bi-objective optimization
Scenario concept and decision variables
The `Stock` concept (defined earlier for all stages) carries ticker, sector, expected returns, and the base covariance matrix. Stage 2 added `Stock.variance`, `Stock.volatility`, `Stock.correlation`, `Stock.cluster`, `Stock.sharpe`, `Stock.cluster_max_sharpe`, `Stock.is_representative`, and `Stock.is_non_representative` on top. Stage 3 consumes the representative flag via its compliance constraints, and adds budget-and-regime scenarios, regime-conditioned covariance, and decision variables.
Scenarios combine budget and regime so each epsilon solve handles all six (budget, regime) combinations simultaneously:
```python
Regime = model.Concept("Regime", identify_by={"regime_name": String})
model.define(Regime.new(regime_name="base"))
model.define(Regime.new(regime_name="crisis"))
```
```python
Scenario = model.Concept("Scenario", identify_by={"name": String})
Scenario.budget = model.Property(f"{Scenario} has {Float:budget}")
Scenario.regime = model.Property(f"{Scenario} in {Regime}")
scenario_data = model.data(
    [
        ("base_500", 500, "base"),
        ("base_1000", 1000, "base"),
        ("base_2000", 2000, "base"),
        ("crisis_500", 500, "crisis"),
        ("crisis_1000", 1000, "crisis"),
        ("crisis_2000", 2000, "crisis"),
    ],
    columns=["name", "budget", "regime"],
)
model.define(
    s := Scenario.new(name=scenario_data["name"]),
    s.budget(scenario_data["budget"]),
)
# Link Scenario to Regime by matching the regime name from the data.
scenario_link_ref = Scenario.ref()
regime_link_ref = Regime.ref()
model.where(
    scenario_link_ref.name == scenario_data["name"],
    regime_link_ref.regime_name == scenario_data["regime"],
).define(scenario_link_ref.regime(regime_link_ref))
```

Define decision variables, constraints, and objective
Each stock gets a continuous quantity variable indexed by Scenario (multi-argument Property).
```python
Stock.x_quantity = model.Property(f"{Stock} in {Scenario} has {Float:quantity}")
x_qty = Float.ref()
```

Two concentration limits plus a representative-only filter are added via `_add_compliance_constraints`. Position and sector caps behave as before; the `Stock.is_non_representative()` relation forces every non-representative stock to zero allocation, which is how the graph stage's redundancy removal shows up at solve time. The complement is defined positively because the prescriptive rewriter can't accept `model.not_(...)` inside a solver constraint:
```python
def _add_compliance_constraints(problem):
    # Position limit: each representative <= REP_POSITION_LIMIT * budget.
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
    ).require(x_qty <= REP_POSITION_LIMIT * Scenario.budget))

    # Sector limit: total allocation to stocks in the same sector <= SECTOR_LIMIT * budget.
    sector_alloc = sum(x_qty).where(
        Stock.x_quantity(Scenario, x_qty),
        Stock.sector == s_sector_ref.sector,
    ).per(Scenario, s_sector_ref.sector)
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
    ).require(sector_alloc <= SECTOR_LIMIT * Scenario.budget))

    # Representative-only: non-representative stocks forced to zero.
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
        Stock.is_non_representative(),
    ).require(x_qty == 0))
```

The risk objective is quadratic in the decision variables and uses the regime-conditioned covariance: each `Scenario` picks its matching regime's covariance, so base and crisis scenarios solve against different covariances in the same call.
```python
problem.minimize(
    sum(regime_cov_val * x_qty * x_qty_paired)
    .where(
        Stock.regime_covar(PairedStock, Scenario.regime, regime_cov_val),
        Stock.x_quantity(Scenario, x_qty),
        PairedStock.x_quantity(Scenario, x_qty_paired),
    )
)
```

Solve anchor points and run the epsilon sweep
Two anchor solves establish the feasible return range. Anchor 1 minimizes risk with no return constraint. Anchor 2 maximizes return.
```python
result1 = solve_epsilon(eps_rate=None)
```

The epsilon sweep then traces interior points. Each solve minimizes risk subject to a return-rate floor that scales with budget.
```python
n_interior = 5
epsilon_rates = [
    return_rate_min + i * (return_rate_max - return_rate_min) / (n_interior + 1)
    for i in range(1, n_interior + 1)
]
```
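For intuition, the comprehension above produces `n_interior` evenly spaced targets strictly inside the anchor range. A standalone sketch with assumed anchor rates of 0.05 and 0.12 (hypothetical numbers, not template output):

```python
# Hypothetical anchor return rates from the two anchor solves.
return_rate_min, return_rate_max = 0.05, 0.12
n_interior = 5

# Interior points exclude both anchors: i runs 1..n_interior out of n_interior + 1 steps.
epsilon_rates = [
    return_rate_min + i * (return_rate_max - return_rate_min) / (n_interior + 1)
    for i in range(1, n_interior + 1)
]

print([round(r, 4) for r in epsilon_rates])
# -> [0.0617, 0.0733, 0.085, 0.0967, 0.1083]
```

Because every interior target lies strictly between the two anchors, each sweep solve has a feasible return floor by construction.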
```python
for i, rate in enumerate(epsilon_rates):
    result = solve_epsilon(eps_rate=rate)
```

Pareto analysis output
The script prints the efficient frontier per (budget, regime) scenario, marginal risk-per-return between adjacent points, and programmatic knee detection where the marginal cost jumps most.
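The knee-detection logic can be sketched independently of the solver. Given frontier points sorted by return (illustrative numbers, not template output), compute the marginal risk per unit of extra return between neighbors and flag the point just before the largest jump:

```python
# Hypothetical (expected_return, risk) frontier points, sorted by return.
frontier = [(0.06, 1.0), (0.08, 1.2), (0.10, 1.5), (0.12, 2.4), (0.14, 4.0)]

# Marginal risk per unit of additional return between adjacent points.
marginal = [
    (k1 - k0) / (r1 - r0)
    for (r0, k0), (r1, k1) in zip(frontier, frontier[1:])
]

# Knee = interior point where the marginal cost jumps the most.
jumps = [b - a for a, b in zip(marginal, marginal[1:])]
knee = frontier[jumps.index(max(jumps)) + 1]
print(knee)  # (0.12, 2.4): past this point each unit of return costs much more risk
```

The template's output uses the same idea on the solver's per-scenario frontier points.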
Stage 4: Crisis regime stress test
Crisis covariance is derived in PyRel via PSD-preserving correlation shrinkage, keyed by a `Regime` concept. The shrinkage formula `rho_crisis = alpha * rho + (1 - alpha) * J` (with `J` the all-ones matrix) re-expressed in covariance units becomes `cov_crisis(i, j) = alpha * cov(i, j) + (1 - alpha) * vol_i * vol_j` — a convex combination of PSD matrices, so PSD is preserved by construction:
```python
Stock.regime_covar = model.Property(
    f"{Stock} and {Stock} in {Regime} have {Float:regime_covar}"
)
```
```python
# Base regime: covariance unchanged.
model.where(
    Stock.covar(PairedStockBase, base_cov_ref),
    base_regime_ref.regime_name == "base",
).define(Stock.regime_covar(PairedStockBase, base_regime_ref, base_cov_ref))
```
```python
# Crisis regime: convex combination of base covariance and vol_i * vol_j.
model.where(
    Stock.covar(PairedStockCrisis, crisis_cov_ref),
    crisis_regime_ref.regime_name == "crisis",
).define(
    Stock.regime_covar(
        PairedStockCrisis,
        crisis_regime_ref,
        CRISIS_ALPHA * crisis_cov_ref
        + (1 - CRISIS_ALPHA) * Stock.volatility * PairedStockCrisis.volatility,
    )
)
```

Both regimes live on the same `Stock.regime_covar` property, keyed by the `Regime` concept, so Stage 3's objective can select the right covariance per scenario without branching.
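The PSD-preservation claim is easy to verify numerically. A standalone NumPy sketch, using a toy covariance matrix and assuming `CRISIS_ALPHA = 0.7` (the template default):

```python
import numpy as np

CRISIS_ALPHA = 0.7  # template default; lower = more severe crisis

# Toy base covariance matrix (symmetric PSD), standing in for covar.csv.
cov = np.array([
    [0.040, 0.048, 0.002],
    [0.048, 0.090, 0.003],
    [0.002, 0.003, 0.025],
])

vol = np.sqrt(np.diag(cov))
# cov_crisis(i, j) = alpha * cov(i, j) + (1 - alpha) * vol_i * vol_j
cov_crisis = CRISIS_ALPHA * cov + (1 - CRISIS_ALPHA) * np.outer(vol, vol)

# Variances sit on the diagonal of both terms, so they are unchanged by shrinkage...
assert np.allclose(np.diag(cov_crisis), np.diag(cov))
# ...while the smallest eigenvalue stays non-negative: the matrix is still PSD.
assert np.linalg.eigvalsh(cov_crisis).min() >= -1e-12
```

Only correlations move toward 1 under shrinkage; individual stock volatilities are untouched, which is why the stress test isolates the co-movement effect.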
After the Stage 3 sweep finishes, Stage 4 emits a side-by-side comparison of base and crisis volatility (sqrt(risk)) at each epsilon point, grouped by budget. Crisis volatility is consistently 25-30% higher than base at every lambda. The gap peaks in the middle of the frontier (eps_1..eps_3) and narrows toward both ends. That shape is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster (Energy and Consumer Staples in this dataset), which happen to have lower crisis correlations than the middle of the frontier. Without the representative collapse, the concentrated end would stack near-duplicates and the crisis gap would grow instead of shrink.
Customize this template
- Adjust compliance thresholds: `POSITION_LIMIT` (default 0.15) applies in Stage 1 compliance rules (per-stock holdings). `REP_POSITION_LIMIT` (default 0.30) applies in Stage 3 optimization (per-representative allocation, which carries its cluster's combined exposure). `SECTOR_LIMIT` (default 0.30) applies to both. Note that `REP_POSITION_LIMIT` must satisfy `REP_POSITION_LIMIT * num_representatives >= 1.0` or the fully-invested constraint becomes infeasible.
- Tune the correlation graph: Raise or lower `CORR_THRESHOLD` (default 0.3) to control graph sparsity. Higher thresholds produce fewer edges and more singleton clusters; lower thresholds produce a denser graph and fewer, larger clusters.
- Change the representative picking rule: Stage 2 picks the highest-Sharpe stock per cluster. To pick differently, change the `Stock.cluster_max_sharpe` derivation — e.g., replace `Stock.sharpe` with `Stock.returns` (highest return), `-Stock.volatility` (lowest vol), or a weighted blend. Singletons are always their own representative regardless of rule.
- Adjust crisis severity: Lower `CRISIS_ALPHA` (default 0.7) shrinks correlations harder toward all-ones (a more severe crisis). `alpha = 1.0` is no crisis (base); `alpha = 0.0` is maximum crisis (all correlations equal 1). Values between 0.5 and 0.9 give interesting comparisons while keeping the QP well-conditioned.
- Add more stocks: Extend `returns.csv` and `covar.csv` with additional assets and their covariance entries.
- Add compliance rules: Define additional Relationships in the rules stage (e.g., minimum holding period, transaction velocity limits).
- Allow short selling: Remove the non-negativity constraint to allow negative holdings.
- Adjust frontier resolution: Increase `n_interior` for a finer-grained efficient frontier.
- Maximize return for given risk: Flip the formulation to maximize expected return subject to a risk budget.
- Transaction costs: Add a linear or quadratic penalty term for rebalancing from an existing portfolio.
Troubleshooting
Problem is infeasible
The return-rate target may be too high for the available stocks and budget. Reduce `n_interior` to use fewer sweep points, or increase the budget values in the scenario data.
`rai init` fails or connection errors
Ensure your Snowflake credentials are configured correctly and that the RAI Native App is installed on your account. Run `rai init` again and verify the connection settings.
`ModuleNotFoundError` for `relationalai`
Make sure you activated the virtual environment and ran `python -m pip install .` from the template directory. The `pyproject.toml` declares the required dependencies.
Solver reports non-convex or numerical issues
Ensure the covariance matrix is symmetric and positive semi-definite. Check that `covar.csv` contains entries for all (i, j) pairs and that `covar(i, j) == covar(j, i)`. Ipopt finds locally optimal solutions; for a convex QP, a local optimum is also the global optimum.