Portfolio Balancing
Allocate investment across stocks to minimize risk while achieving a target return.
Multi-reasoner template: rules-based compliance, covariance clustering, and bi-objective Markowitz optimization with a crisis-regime stress test.
What this template is for
Investors and portfolio managers often need to allocate capital across multiple assets while balancing expected return against risk. This template implements a classic Markowitz mean-variance model that chooses non-negative allocations to minimize portfolio variance subject to a minimum expected return target.
This template uses RelationalAI’s prescriptive reasoning (optimization) capabilities to compute an optimal allocation under constraints, and to run a small scenario analysis that illustrates the risk/return trade-off.
Prescriptive reasoning helps you:
- Quantify trade-offs between return targets and risk.
- Enforce constraints like budgets and no-short-selling.
- Explore scenarios by varying the minimum expected return.
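Before diving into the model, it may help to see the two quantities being traded off. Below is a minimal plain-Python sketch (toy numbers, not the template's sample data) of a portfolio's expected return and variance for a fixed allocation:

```python
# Toy two-asset example of the quantities the optimizer trades off.
# All numbers here are illustrative, not the template's sample data.
mu = [0.04, 0.08]                  # expected returns per unit invested
cov = [[0.010, 0.002],
       [0.002, 0.040]]             # covariance matrix (symmetric, PSD)
x = [600.0, 400.0]                 # allocation (sums to a 1000 budget)

# Expected return: sum of return * allocation.
expected_return = sum(m * xi for m, xi in zip(mu, x))

# Portfolio variance: x' * Cov * x, the quadratic objective being minimized.
variance = sum(x[i] * cov[i][j] * x[j] for i in range(2) for j in range(2))

print(expected_return)  # about 56
print(variance)         # about 10960
```

Raising the minimum-return target forces the allocation toward the higher-return asset, which here also carries the larger variance; that is exactly the trade-off the scenario loop explores.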
Who this is for
- You want an end-to-end example of prescriptive reasoning (optimization) with quadratic objectives.
- You’re comfortable with basic Python and optimization concepts (risk/return, covariance).
What you’ll build
- A semantic model for stocks, expected returns, and pairwise covariance.
- A quadratic program that chooses non-negative allocations.
- A minimum return constraint and a variance-minimization objective.
- A scenario loop over different minimum return targets with a summary table.
What’s included
- Model + solve script: `portfolio_balancing.py`
- Sample data: `data/returns.csv`, `data/covariance.csv`
- Outputs: per-scenario solver status/objective, allocation table, and a scenario summary
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
Follow these steps to run the template with the included sample data.
1. Download the ZIP file for this template and extract it:

   ```sh
   curl -O https://private.relational.ai/templates/zips/v0.13/portfolio_balancing.zip
   unzip portfolio_balancing.zip
   cd portfolio_balancing
   ```

2. Create and activate a virtual environment:

   ```sh
   python -m venv .venv
   source .venv/bin/activate
   python -m pip install -U pip
   ```

3. Install dependencies:

   ```sh
   python -m pip install .
   ```

4. Configure the Snowflake connection and RAI profile:

   ```sh
   rai init
   ```

5. Run the template:

   ```sh
   python portfolio_balancing.py
   ```
Expected output
The script solves three scenarios for the minimum expected return target.
```
Running scenario: min_return = 10
  Status: OPTIMAL, Objective: ...

  Portfolio allocation:
  name  value
  ...

==================================================
Scenario Analysis Summary
==================================================
  10: OPTIMAL, obj=...
  20: OPTIMAL, obj=...
  30: OPTIMAL, obj=...
```
Template structure
```
.
├─ README.md
├─ pyproject.toml
├─ portfolio_balancing.py   # main runner / entrypoint
└─ data/                    # sample input data
   ├─ returns.csv
   └─ covariance.csv
```

Start here: portfolio_balancing.py
Sample data
Data files are in data/.
returns.csv
Defines one expected return value per stock.
| Column | Meaning |
|---|---|
| index | Stock identifier |
| returns | Expected return (decimal, e.g., 0.04 = 4%) |
covariance.csv
Defines pairwise covariance values between stock pairs.
| Column | Meaning |
|---|---|
| i | First stock index |
| j | Second stock index |
| covar | Covariance between stocks i and j |
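The covariance file stores the matrix in long format, one (i, j, covar) row per pair. As a minimal standard-library sketch, here is how such rows pivot into a nested matrix (the inline rows below are made up, not the template's sample data):

```python
import csv
import io

# Inline stand-in for a long-format covariance CSV (made-up values).
raw = """i,j,covar
1,1,0.04
1,2,0.002
2,1,0.002
2,2,0.09
"""

# Pivot long-format rows into a nested {i: {j: covar}} matrix.
matrix = {}
for row in csv.DictReader(io.StringIO(raw)):
    matrix.setdefault(int(row["i"]), {})[int(row["j"])] = float(row["covar"])

print(matrix[1][2])  # 0.002
print(matrix[2][2])  # 0.09
```

The template itself leaves the data in long format and joins stock indices inside the semantic model, but the pivot above is the same relationship expressed in plain Python.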
Model overview
The semantic model uses a single concept (Stock) and a pairwise covariance property (Stock.covar). The decision variable is a continuous allocation per stock.
Stock
Represents an investable asset.
| Property | Type | Identifying? | Notes |
|---|---|---|---|
| index | int | Yes | Loaded from data/returns.csv |
| returns | float | No | Expected return |
| covar | float | No | Pairwise covariance with another Stock |
| x_quantity | float | No | Decision variable (continuous, non-negative) |
How it works
This section walks through the highlights in portfolio_balancing.py.
Import libraries and configure inputs
First, the script imports the Semantics and optimization APIs, configures the data directory, and defines the key parameters:
```python
from pathlib import Path

import pandas
from pandas import read_csv

from relationalai.semantics import Float, Model, data, require, select, sum, where
from relationalai.semantics.reasoners.optimization import Solver, SolverModel

# --------------------------------------------------
# Configure inputs
# --------------------------------------------------

DATA_DIR = Path(__file__).parent / "data"

# Disable pandas inference of string types. This ensures that string columns
# in the CSVs are loaded as object dtype. This is only required when using
# relationalai versions prior to v1.0.
pandas.options.future.infer_string = False

# Budget and minimum return parameters.
BUDGET = 1000
MIN_RETURN = 20
```

Define concepts and load CSV data
Next, it creates a Model, defines the Stock concept, and loads both CSVs. The covariance values are defined by joining stock indices using where(...).define(...):
```python
# --------------------------------------------------
# Define semantic model & load data
# --------------------------------------------------

# Create a Semantics model container.
model = Model("portfolio", config=globals().get("config", None), use_lqp=False)

# Stock concept: available investments with expected returns.
Stock = model.Concept("Stock")
Stock.returns = model.Property("{Stock} has {returns:float}")

# Load expected return data from CSV.
data(read_csv(DATA_DIR / "returns.csv")).into(Stock, keys=["index"])

# Stock.covar property: covariance matrix between stock pairs.
Stock.covar = model.Property("{Stock} and {stock2:Stock} have {covar:float}")
Stock2 = Stock.ref()

# Load covariance data from CSV.
covar_csv = read_csv(DATA_DIR / "covariance.csv")
pairs = data(covar_csv)
where(
    Stock.index == pairs.i,
    Stock2.index == pairs.j
).define(
    Stock.covar(Stock, Stock2, pairs.covar)
)
```

Define decision variables, constraints, and objective
Then it creates a decision variable Stock.x_quantity and registers constraints and the quadratic variance objective inside build_formulation(...):
```python
# --------------------------------------------------
# Model the decision problem
# --------------------------------------------------

# Stock.x_quantity decision variable: amount allocated to each stock.
Stock.x_quantity = model.Property("{Stock} quantity is {x:float}")

c = Float.ref()

# Scenario parameter. This is updated inside the scenario loop.
min_return = MIN_RETURN

# Budget is fixed across scenarios.
budget = BUDGET

def build_formulation(s):
    """Register variables, constraints, and objective on the solver model."""
    # Decision variable: quantity of each stock.
    s.solve_for(Stock.x_quantity, name=["qty", Stock.index])

    # Constraint: no short selling.
    bounds = require(Stock.x_quantity >= 0)
    s.satisfy(bounds)

    # Constraint: budget limit.
    budget_constraint = require(sum(Stock.x_quantity) <= budget)
    s.satisfy(budget_constraint)

    # Constraint: minimum return target (scenario parameter).
    return_constraint = require(sum(Stock.returns * Stock.x_quantity) >= min_return)
    s.satisfy(return_constraint)

    # Objective: minimize portfolio risk (variance).
    risk = sum(c * Stock.x_quantity * Stock2.x_quantity).where(Stock.covar(Stock2, c))
    s.minimize(risk)
```

Solve and print results
Finally, the script loops over multiple values of min_return, creates a fresh SolverModel for each scenario, and prints both the allocation and a summary:
```python
# --------------------------------------------------
# Solve with scenario analysis (numeric parameter)
# --------------------------------------------------

SCENARIO_PARAM = "min_return"
SCENARIO_VALUES = [10, 20, 30]

scenario_results = []

for scenario_value in SCENARIO_VALUES:
    print(f"\nRunning scenario: {SCENARIO_PARAM} = {scenario_value}")

    # Set scenario parameter value.
    min_return = scenario_value

    # Create a fresh SolverModel for each scenario.
    s = SolverModel(model, "cont")
    build_formulation(s)

    solver = Solver("highs")
    s.solve(solver, time_limit_sec=60)

    scenario_results.append({
        "scenario": scenario_value,
        "status": str(s.termination_status),
        "objective": s.objective_value,
    })
    print(f"  Status: {s.termination_status}, Objective: {s.objective_value}")

    # Print portfolio allocation from solver results.
    var_df = s.variable_values().to_df()
    qty_df = var_df[
        var_df["name"].str.startswith("qty") & (var_df["float"] > 0.001)
    ].rename(columns={"float": "value"})
    print("\n  Portfolio allocation:")
    print(qty_df.to_string(index=False))

# Print a scenario summary table.
print("\n" + "=" * 50)
print("Scenario Analysis Summary")
print("=" * 50)
for result in scenario_results:
    print(f"  {result['scenario']}: {result['status']}, obj={result['objective']}")
```

Troubleshooting
I get ModuleNotFoundError when running the script
- Confirm you created and activated the virtual environment from the Quickstart.
- Reinstall dependencies with `python -m pip install .`.
- Verify you are running `python portfolio_balancing.py` from the `portfolio_balancing/` folder.
The script fails while reading a CSV from data/
- Confirm `data/returns.csv` and `data/covariance.csv` exist.
- Verify headers match the expected columns (`index`, `returns`, `i`, `j`, `covar`).
- Check for missing values and non-numeric entries in return/covariance columns.
I see an unexpected termination status (not OPTIMAL)
- Try re-running; if you hit a time limit, consider increasing `time_limit_sec`.
- If you changed scenario parameters, confirm the minimum return target is feasible given the budget.
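On the feasibility point above: with non-negative allocations and a budget cap, the best achievable expected return is the budget times the largest per-unit return, so any min_return above that bound makes the problem infeasible. A plain-Python sanity check with hypothetical return values:

```python
def max_achievable_return(returns, budget):
    """Upper bound on sum(r_i * x_i) subject to sum(x_i) <= budget, x_i >= 0.
    Achieved by putting the whole budget into the highest-return asset."""
    return budget * max(returns)

# Hypothetical expected returns per unit invested (not the template's data).
returns = [0.01, 0.03, 0.042]
budget = 1000

bound = max_achievable_return(returns, budget)
print(bound)         # about 42
print(bound >= 30)   # True: a min_return target of 30 is feasible
```

If a scenario's min_return exceeds this bound, the solver will report an infeasible status no matter how long it runs.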
What this template is for
Portfolio managers don’t want to pay twice for the same exposure — if two funds track nearly the same benchmark, owning both is one bet with worse bookkeeping. This template chains four reasoning stages on a single shared ontology to build compliant, risk-optimized portfolios across an 8-stock universe and stress-test them under a crisis regime.
It uses RelationalAI’s rules, graph, and prescriptive reasoners in a chained workflow:
- Rules scan the current book for compliance violations — overconcentrated holdings (> 15% of balance), sector concentration (> 30%), and high-risk traders — as derived Relationships.
- Graph builds a correlation graph from the covariance matrix, runs Louvain clustering, and picks the highest-Sharpe stock per cluster as the cluster’s representative. 8 stocks collapse to 5 distinct bets; near-duplicates are dropped from the investable universe rather than capped within it.
- Prescriptive optimization solves a bi-objective Markowitz QP on the representative-only universe under position and sector caps, tracing the efficient frontier via the epsilon constraint method across a `Scenario` concept that combines three budgets and two regimes.
- Crisis stress test is the same `solve_epsilon` call — no separate model — but `Scenario.regime` picks a PSD-preserving shrinkage covariance, so base and crisis frontiers come out of one pipeline.
Each stage writes derived properties the next reads directly: Rules define the thresholds Stage 3 enforces as constraints, Stage 2’s Stock.is_representative shapes the decision space, and the stress test reads Stock.regime_covar keyed by Scenario.regime. See “How it works” for the full data flow.
Why this problem matters
Portfolio managers don’t want to pay twice for the same exposure. If two funds track nearly the same benchmark, allocating capital to both is one bet with worse bookkeeping. A single optimization pass cannot simultaneously flag existing violations, collapse that redundancy, and stress the result under a crisis regime.
The four-stage approach addresses each gap. Stage 1 surfaces existing violations in the current book (diagnostic). Stage 2 clusters by return covariance and picks the highest-Sharpe representative per cluster, collapsing redundant bets. Stage 3 optimizes over the representative-only universe under position and sector limits. Stage 4 re-solves under a PSD-preserving crisis covariance to stress the resulting portfolio.
Key design patterns demonstrated
- Shared compliance thresholds —
SECTOR_LIMITis defined once and enforced in both stages.POSITION_LIMIT(Stage 1 per-stock compliance) andREP_POSITION_LIMIT(Stage 3 per-representative cap) are deliberately different: a representative carries its cluster’s combined exposure, so the construction-side cap is higher than the holdings-side compliance cap - Graph results feed optimization — Louvain cluster ids and per-cluster argmax (highest Sharpe) both persist on
Stock, and the optimizer’sStock.is_non_representative()constraint forces non-reps to zero (complement defined positively because the prescriptive rewriter doesn’t acceptmodel.not_()in a solver.where()) - Collapse, don’t cap — the graph stage reduces the investable universe to distinct bets rather than allowing all N stocks and capping within redundant groups
- Scenario Concept for parameter sweeps —
Scenarioentities combine budget (1,000, $2,000) and regime (base, crisis) so each epsilon solve handles all six combinations in one call - Epsilon constraint method —
solve_epsilon(eps_rate)sweeps return targets across the feasible range, producing the full Pareto frontier without manually fixing return values - PSD-preserving stress covariance — correlation shrinkage toward all-ones keeps the QP convex at every lambda, unlike naive off-diagonal scaling
- Quadratic programming via Ipopt — the risk objective is quadratic (
x' * Cov * x), solved with Ipopt’s nonlinear optimizer rather than a linear MIP solver - Anchor solves establish feasible range — Anchor 1 (minimize risk) and Anchor 2 (maximize return) determine the return rate range before the epsilon sweep
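The knee detection mentioned among these patterns can be sketched as a marginal-cost scan: compute risk per unit of extra return between adjacent frontier points, and flag the point where that rate jumps by more than a chosen factor. The frontier points and the 3.0x factor below are illustrative:

```python
def find_knee(frontier, jump_factor=3.0):
    """frontier: list of (return, risk) points sorted by increasing return.
    Returns the index of the point at which the marginal risk/return rate
    first jumps by more than jump_factor (the 'knee'), else the last index."""
    rates = [
        (frontier[k + 1][1] - frontier[k][1]) / (frontier[k + 1][0] - frontier[k][0])
        for k in range(len(frontier) - 1)
    ]
    for k in range(len(rates) - 1):
        if rates[k + 1] > jump_factor * rates[k]:
            return k + 1  # the last point before the marginal cost jump
    return len(frontier) - 1

# Illustrative frontier (return, risk); not the template's numbers.
frontier = [(32.0, 1160.0), (33.0, 1176.0), (35.0, 1276.0), (37.0, 1456.0)]
print(find_knee(frontier))  # 1: marginal rate jumps 16 -> 50 after the second point
```

Past the knee, each extra unit of return costs disproportionately more risk, which is why the knee is a natural default allocation.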
Who this is for
- Quantitative analysts and portfolio managers exploring mean-variance optimization
- Data scientists learning quadratic programming with RelationalAI
- Finance students studying the Markowitz efficient frontier
- Anyone interested in risk-return trade-off analysis with scenario comparisons
What you’ll build
- A rules-based compliance pipeline using RAI derived properties and Relationships to flag overconcentrated holdings, sector concentration violations, and high-risk traders
- A correlation graph over stocks with Louvain community detection, plus per-cluster representative selection by highest Sharpe
- A quadratic programming model that minimizes portfolio variance subject to position and sector limits on a representative-only universe (non-reps forced to zero)
- Budget and no-short-selling constraints across multiple (budget, regime) scenarios
- Epsilon constraint method sweeping return targets to trace the efficient frontier
- Anchor solves to establish the feasible return range
- Pareto analysis with marginal cost and knee detection
- A crisis-regime stress test using PSD-preserving correlation shrinkage to compare base vs crisis frontiers side-by-side
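The anchor-plus-sweep structure is easy to sketch without the solver: the two anchor solves bracket the feasible return range, and the epsilon constraint method then fixes evenly spaced interior return targets, one constrained solve each. A plain-Python sketch of just the target generation (the solves themselves are delegated to the template's solve_epsilon; the anchor values below are hypothetical):

```python
def epsilon_targets(r_min, r_max, n_interior):
    """Evenly spaced interior return targets strictly between the two anchors.
    Each target becomes the epsilon bound of one constrained risk-minimization."""
    step = (r_max - r_min) / (n_interior + 1)
    return [r_min + step * (k + 1) for k in range(n_interior)]

# Hypothetical anchors: min-risk return and max-achievable return.
r_min, r_max = 30.0, 42.0
targets = epsilon_targets(r_min, r_max, 5)
print(targets)  # [32.0, 34.0, 36.0, 38.0, 40.0]
```

Together with the two anchors, these five interior points give the seven frontier points per scenario reported in the sample output.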
What’s included
- `portfolio_balancing.py` — Main script with all four stages: rules-based compliance, covariance clustering (Louvain), bi-objective QP with epsilon sweep, and crisis-regime stress test
- `data/returns.csv` — Stock universe: index, ticker, sector, expected returns (8 stocks)
- `data/covar.csv` — Covariance matrix entries (i, j, covariance value)
- `data/users.csv` — User profiles with risk scores
- `data/accounts.csv` — Account balances
- `data/holdings.csv` — Current holdings per account and stock
- `data/transactions.csv` — Transaction history with flagged-transaction indicators
- `pyproject.toml` — Python package configuration with dependencies
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
- RelationalAI Python SDK (`relationalai`) == 1.0.14
Quickstart
1. Download ZIP:

   ```sh
   curl -O https://docs.relational.ai/templates/zips/v1/portfolio_balancing.zip
   unzip portfolio_balancing.zip
   cd portfolio_balancing
   ```

2. Create venv:

   ```sh
   python -m venv .venv
   source .venv/bin/activate
   python -m pip install --upgrade pip
   ```

3. Install:

   ```sh
   python -m pip install .
   ```

4. Configure:

   ```sh
   rai init
   ```

5. Run:

   ```sh
   python portfolio_balancing.py
   ```
Expected output (sample — full output covers all four stages):

```
======================================================================
STAGE 1: COMPLIANCE ANALYSIS (rules)
======================================================================

--- Rule 1: Overconcentrated Holdings (position > 15% of balance) ---
holding_id=1, ticker=AAPL, account_id=1, value=18000.00, balance=100000.00, pct=18.0%
...

--- Rule 2: Sector Concentration (sector > 30% of balance) ---
account_id=1, sector=Technology, sector_value=34000.00, pct=34.0%
...

--- Rule 3: High Risk Traders (risk_score > 0.8 AND >5 flagged txns) ---
user_id=1, name=Alice Chen, risk_score=0.85
...

======================================================================
STAGE 2: GRAPH -- Covariance Clustering (Louvain)
======================================================================

Correlation graph: 4 edges (|correlation| >= 0.3)
Louvain communities: 5 cluster(s)
Cluster 1 (size 2): JNJ (Healthcare), PFE (Healthcare)
Cluster 2 (size 3): AAPL (Technology), MSFT (Technology), GOOGL (Technology)
Cluster 3 (size 1): JPM (Financials)
Cluster 4 (size 1): PG (Consumer Staples)
Cluster 5 (size 1): XOM (Energy)
Avg correlation: intra-cluster = +0.683, inter-cluster = +0.131

Cluster representatives (5 of 8 stocks, picked by highest Sharpe):
Cluster 1: PFE (Healthcare) -- Sharpe = 0.530
Cluster 2: GOOGL (Technology) -- Sharpe = 0.605
Cluster 3: JPM (Financials) -- Sharpe = 0.500
Cluster 4: PG (Consumer Staples) -- Sharpe = 0.444
Cluster 5: XOM (Energy) -- Sharpe = 0.588

======================================================================
STAGE 3: BI-OBJECTIVE OPTIMIZATION
(position + sector limits on representative universe; base & crisis regimes)
======================================================================

ANCHOR SOLVE 1: Minimize risk (no return constraint)
Status: LOCALLY_SOLVED
base_500:    return = 32.4335,  risk = 1160.3926
base_1000:   return = 64.8673,  risk = 4641.5704
base_2000:   return = 129.7346, risk = 18566.2815
crisis_500:  return = 31.6873,  risk = 1913.5995
crisis_1000: return = 63.3745,  risk = 7654.3979
crisis_2000: return = 126.7490, risk = 30617.5917

ANCHOR SOLVE 2: Maximize return
Status: LOCALLY_SOLVED
base_500/crisis_500:   return = 42.0000
base_1000/crisis_1000: return = 84.0000
base_2000/crisis_2000: return = 168.0000
Return rate range: [0.0634, 0.0840] per unit invested

EPSILON SWEEP: 5 interior points
Point 1 .. Point 5: all LOCALLY_SOLVED

EFFICIENT FRONTIER: Risk vs Return (per scenario)

base_500 (budget=500, regime=base):
#  Label     Return  Risk
--------------------------------------
1  min_risk  32.43   1160.3926
2  eps_1     33.41   1176.7790
3  eps_2     35.12   1262.6111
4  eps_3     36.84   1385.8901
5  eps_4     38.56   1545.7909
6  eps_5     40.28   1742.4712

Marginal analysis: rate climbs 16.85 -> 49.94 -> 71.72 -> 93.03 -> 114.43 risk/return.
Knee: Point 2 (eps_1) -- marginal cost jumps 3.0x beyond this point.

(similar tables for base_1000, base_2000, crisis_500, crisis_1000, crisis_2000)

======================================================================
STAGE 4: CRISIS REGIME STRESS TEST
(PSD-preserving correlation shrinkage, alpha = 0.7)
======================================================================

Volatility comparison (sqrt risk) -- base vs crisis at each lambda:

Budget 500:
Label     vol_base  vol_crisis  gap       gap_%
--------------------------------------------------------
min_risk  34.0645   43.7447     +9.6802   +28.4%
eps_1     34.3042   44.5398     +10.2356  +29.8%
eps_2     35.5332   46.1119     +10.5787  +29.8%
eps_3     37.2275   47.9433     +10.7158  +28.8%
eps_4     39.3165   49.9933     +10.6768  +27.2%
eps_5     41.7429   52.2694     +10.5265  +25.2%

(similar tables for Budget 1000 and Budget 2000, identical gap_% pattern)
```

Crisis volatility sits 25-30% above base at every lambda, and the gap peaks in the middle of the frontier (eps_1..eps_2 at +29.8%), not at the concentrated end (eps_5 at +25.2%). That inversion is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster, which incidentally sits in sectors with lower crisis correlations (Energy, Consumer Staples). Without the representative collapse, the concentrated end would stack near-duplicates and see the crisis gap grow, not shrink.
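The crisis covariance behind Stage 4 is built by shrinking the base correlation matrix toward the all-ones matrix. Since both matrices are positive semidefinite, any convex blend of them is too, which is what keeps the crisis QP convex. A minimal plain-Python sketch with hypothetical two-asset inputs (the template's actual alpha is 0.7):

```python
def shrink_correlation(corr, alpha):
    """Crisis correlation: convex blend of the base correlation matrix with
    the all-ones matrix. Both are PSD, so the blend is PSD for alpha in [0, 1]."""
    n = len(corr)
    return [[(1 - alpha) * corr[i][j] + alpha for j in range(n)] for i in range(n)]

def to_covariance(corr, vols):
    """Rebuild a covariance matrix from correlations and per-asset volatilities."""
    n = len(corr)
    return [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]

# Hypothetical two-asset base correlation and volatilities (not the sample data).
base_corr = [[1.0, 0.2], [0.2, 1.0]]
vols = [0.1, 0.3]

crisis_corr = shrink_correlation(base_corr, alpha=0.7)
crisis_cov = to_covariance(crisis_corr, vols)
print(crisis_corr)  # off-diagonal moves from 0.2 toward 1.0 (about 0.76)
print(crisis_cov)
```

Naively scaling only the off-diagonal covariances can push the matrix outside the PSD cone, making the QP non-convex; the shrinkage form avoids that by construction.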
Template structure
```
.
├── README.md
├── pyproject.toml
├── portfolio_balancing.py
└── data/
    ├── returns.csv
    ├── covar.csv
    ├── users.csv
    ├── accounts.csv
    ├── holdings.csv
    └── transactions.csv
```

How it works
This section walks through the highlights in portfolio_balancing.py.
Reasoner overview
| Stage | Reasoner | Reads from ontology | Writes to ontology | Role |
|---|---|---|---|---|
| 1 | Rules | Holding, Account, User, Transaction, Stock | Holding.is_overconcentrated, Holding.is_sector_concentrated, User.is_high_risk_trader | 4 overconcentrated holdings (AAPL 18%, MSFT 16%, JNJ 16%, PFE 16.2%). 2 sector concentrations (Technology 34%, Healthcare 32.2%). 2 high-risk traders (Alice Chen 0.85, Eve Taylor 0.92). |
| 2 | Graph (Louvain) | Stock.covar (diagonal for variance), derived Stock.correlation filtered at threshold 0.3 | Stock.variance, Stock.volatility, Stock.correlation, Stock.cluster, Stock.sharpe, Stock.cluster_max_sharpe, Stock.is_representative | 4 edges retained after thresholding. Louvain yields 5 clusters; 5 representatives picked by highest Sharpe (one per cluster). Collapses 8 stocks to 5 distinct bets. |
| 3 | Prescriptive (QP) | Stock.returns, Stock.regime_covar, Stock.is_representative, Scenario.budget, Scenario.regime | Stock.x_quantity indexed by Scenario (non-reps forced to 0) | Min-risk and max-return anchors bracket the frontier. Epsilon sweep traces 5 interior points per (budget, regime). Programmatic knee detection at eps_1. |
| 4 | Prescriptive (stress) | Stock.regime_covar under “crisis” regime | (shares Stock.x_quantity with Stage 3) | Crisis volatility 25-30% higher than base at every lambda; gap peaks at the middle of the frontier (eps_1..eps_3) and narrows toward both ends. The representative-only universe keeps the concentrated end from stacking near-duplicate bets that would otherwise amplify crisis vol. |
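Stage 2's thresholded correlation graph can be sketched without the SDK: normalize each covariance by the two diagonal variances and keep the pairs whose absolute correlation clears the 0.3 threshold. The covariance entries below are made up for illustration:

```python
import math

def correlation_edges(covar, threshold=0.3):
    """Edges of the correlation graph: stock pairs whose |correlation| meets
    the threshold. covar maps (i, j) -> value and includes diagonal entries."""
    nodes = sorted({i for i, _ in covar})
    edges = []
    for a in nodes:
        for b in nodes:
            if a < b:
                corr = covar[(a, b)] / math.sqrt(covar[(a, a)] * covar[(b, b)])
                if abs(corr) >= threshold:
                    edges.append((a, b, round(corr, 3)))
    return edges

# Hypothetical 3-stock covariance entries (diagonal = variances).
covar = {(1, 1): 0.04, (2, 2): 0.09, (3, 3): 0.01,
         (1, 2): 0.045, (1, 3): 0.001, (2, 3): 0.002}
print(correlation_edges(covar))  # only the (1, 2) pair clears the threshold
```

Louvain community detection then runs on exactly this kind of sparse edge list; weakly correlated stocks end up as singleton clusters, matching the sample output above.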
All four stages share a single RAI model. Compliance thresholds are defined once at the top of the script. Stage 1 uses POSITION_LIMIT = 0.15 and SECTOR_LIMIT = 0.30 to flag existing violations as derived Relationships. Stage 3 re-uses SECTOR_LIMIT but applies REP_POSITION_LIMIT = 0.30 to the decision variable: after representative collapse each cluster has exactly one carrier, so its cap is legitimately higher than a per-stock compliance cap.
How the reasoners chain
Each stage writes derived properties the next reads directly. Stage 1's thresholds (`POSITION_LIMIT`, `SECTOR_LIMIT`) become Stage 3 constraints. Stage 2's `Stock.is_representative` and `Stock.is_non_representative` shape Stage 3's decision space (non-representatives are forced to zero). Stage 4 uses the same `solve_epsilon` call as Stage 3 — the `Regime` concept keyed into `Stock.regime_covar` makes base vs. crisis a scenario view on the same solve, not a separate model. The Reasoner overview table above names each property that crosses a stage boundary.
Multi-scenario Pareto frontier in one pipeline
`Scenario` combines three budgets and two regimes — six tuples. Each `solve_epsilon(eps_rate)` call returns one optimal allocation per tuple; the epsilon sweep repeats across return-rate targets. Six scenarios × seven points (two anchors + five interior) = 42 optimal portfolios, all from seven solver invocations. Two consequences:
- Base and crisis are comparable at equal budget and equal lambda: the vol gap is a pure regime effect, not a re-fitting artifact.
- Adding a fourth regime or a fifth budget is a data edit in `scenario_data`, not a code change in `solve_epsilon`. Scenarios are data.
Stage 1: Rules-based compliance analysis
The first stage defines compliance flags as RAI derived properties and Relationships. The model loads portfolio data (users, accounts, holdings, transactions) alongside the stock universe, then evaluates three rules using two configurable thresholds:
```python
POSITION_LIMIT = 0.15  # max fraction of budget per stock
SECTOR_LIMIT = 0.30    # max fraction of budget per sector
```

Rule 1 — Overconcentrated holdings: a holding whose value exceeds `POSITION_LIMIT` of the account balance. The holding value is a derived property:
```python
Holding.value = model.Property(f"{Holding} has value {Float:holding_value}")
model.define(Holding.value(Holding.quantity * Holding.purchase_price))
```
```python
Holding.is_overconcentrated = model.Relationship(f"{Holding} is overconcentrated")
AccountR1 = Account.ref()
model.where(
    Holding.account(AccountR1),
    Holding.value > POSITION_LIMIT * AccountR1.balance,
).define(Holding.is_overconcentrated())
```

Rule 2 — Sector concentration: total holdings in a sector exceeding `SECTOR_LIMIT` of the account balance. Uses aggregation to sum holding values per (account, sector):
```python
sector_exposure = sum(HoldingSC.value).where(
    HoldingSC.account(AccountSC),
    HoldingSC.stock(StockSC),
    StockSC.sector_ref(SectorSC),
).per(AccountSC, SectorSC)
```
```python
model.where(
    Holding.account(AccountSC),
    Holding.stock(StockR2),
    StockR2.sector_ref(SectorSC),
    sector_exposure > SECTOR_LIMIT * AccountSC.balance,
).define(Holding.is_sector_concentrated())
```

Rule 3 — High-risk traders: users with `risk_score > 0.8` and more than 5 flagged transactions. The flagged-transaction count is computed via aggregation:
```python
flagged_count = sum(TransactionHR.is_flagged_val).where(
    TransactionHR.user(User),
).per(User)
```
```python
model.where(
    User.risk_score > 0.8,
    flagged_count > 5,
).define(User.is_high_risk_trader())
```

Stage 2: Graph — covariance clustering
Volatility and correlation are derived in PyRel from the base covariance, so the ontology is the single source of truth for every similarity metric. `Stock.variance` picks the covariance diagonal, `Stock.volatility` applies `sqrt(variance)` via `relationalai.semantics.std.math.sqrt`, and `Stock.correlation(i, j) = covar(i, j) / (vol_i * vol_j)`:
```python
Stock.volatility = model.Property(f"{Stock} has {Float:stock_volatility}")
model.define(Stock.volatility(sqrt(Stock.variance)))
```
```python
Stock.correlation = model.Property(
    f"{Stock} and {Stock} have correlation {Float:stock_correlation}"
)
PairedStockCorr = Stock.ref()
cov_ij_ref = Float.ref()
model.where(
    Stock.covar(PairedStockCorr, cov_ij_ref),
).define(
    Stock.correlation(
        PairedStockCorr,
        cov_ij_ref / (Stock.volatility * PairedStockCorr.volatility),
    )
)
```

The Graph reasoner builds an undirected graph with `Stock` as the node concept. Edges are filtered in PyRel directly against the derived correlation property — no upstream edge list required:
```python
corr_graph = Graph(
    model,
    directed=False,
    weighted=False,
    node_concept=Stock,
    aggregator="sum",
)
```
```python
stock_i_ref = Stock.ref()
stock_j_ref = Stock.ref()
corr_ref = Float.ref()
model.define(corr_graph.Edge.new(src=stock_i_ref, dst=stock_j_ref)).where(
    stock_i_ref.correlation(stock_j_ref, corr_ref),
    stock_i_ref.index < stock_j_ref.index,
    math_abs(corr_ref) >= CORR_THRESHOLD,
)
```

Louvain community detection runs directly on the graph and returns (node, cluster_id) pairs. The cluster id is persisted as a `Stock` property so downstream stages can consume it:
```python
community = corr_graph.louvain()
cluster_label = Integer.ref("cluster_label")
Stock.cluster = model.Property(f"{Stock} in cluster {Integer:cluster_id}")
stock_clust_ref = Stock.ref()
model.define(stock_clust_ref.cluster(cluster_label)).where(
    community(stock_clust_ref, cluster_label)
)
```

The script reports cluster sizes and intra- vs inter-cluster average correlation as a sanity check that the clustering separates co-moving stocks from independent ones.
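The same math can be sanity-checked offline with NumPy. This standalone sketch uses a toy covariance matrix and made-up cluster labels (not the template's data): it derives correlation from covariance exactly as Stage 2 does, then compares intra- vs inter-cluster average correlation.

```python
import numpy as np
from itertools import combinations

# Toy 3-stock covariance matrix (symmetric PSD), standing in for covar.csv.
cov = np.array([
    [0.040, 0.048, 0.002],
    [0.048, 0.090, 0.003],
    [0.002, 0.003, 0.025],
])

vol = np.sqrt(np.diag(cov))       # volatility = sqrt(variance)
corr = cov / np.outer(vol, vol)   # corr(i, j) = cov(i, j) / (vol_i * vol_j)

cluster = [0, 0, 1]  # pretend Louvain grouped stocks 0 and 1; stock 2 is a singleton

intra, inter = [], []
for i, j in combinations(range(len(cluster)), 2):
    (intra if cluster[i] == cluster[j] else inter).append(abs(corr[i, j]))

# Intra-cluster pairs should co-move much more than cross-cluster pairs.
print(round(np.mean(intra), 3), round(np.mean(inter), 3))  # prints: 0.8 0.063
```

If the intra-cluster average is not clearly higher than the inter-cluster one, the threshold or the clustering needs another look.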
After clustering, Stage 2 picks one representative per cluster — the stock with the highest Sharpe ratio — using per-group argmax in PyRel. Only the representatives will be eligible for allocation in Stage 3:
```python
Stock.sharpe = model.Property(f"{Stock} has Sharpe {Float:stock_sharpe}")
model.define(Stock.sharpe(Stock.returns / Stock.volatility))
```
```python
peer_for_max = Stock.ref()
Stock.cluster_max_sharpe = model.Property(
    f"{Stock} has cluster max Sharpe {Float:cluster_max_sharpe}"
)
model.define(
    Stock.cluster_max_sharpe(
        aggs.max(peer_for_max.sharpe)
        .where(peer_for_max.cluster == Stock.cluster)
        .per(Stock)
    )
)
```
```python
Stock.is_representative = model.Relationship(f"{Stock} is cluster representative")
model.where(Stock.sharpe == Stock.cluster_max_sharpe).define(
    Stock.is_representative()
)
```

Stage 3: Bi-objective optimization
Scenario concept and decision variables
The `Stock` concept (defined earlier for all stages) carries ticker, sector, expected returns, and the base covariance matrix. Stage 2 added `Stock.variance`, `Stock.volatility`, `Stock.correlation`, `Stock.cluster`, `Stock.sharpe`, `Stock.cluster_max_sharpe`, `Stock.is_representative`, and `Stock.is_non_representative` on top. Stage 3 consumes the representative flag via its compliance constraints, and adds budget-and-regime scenarios, regime-conditioned covariance, and decision variables.
Scenarios combine budget and regime so each epsilon solve handles all six (budget, regime) combinations simultaneously:
```python
Regime = model.Concept("Regime", identify_by={"regime_name": String})
model.define(Regime.new(regime_name="base"))
model.define(Regime.new(regime_name="crisis"))
```
```python
Scenario = model.Concept("Scenario", identify_by={"name": String})
Scenario.budget = model.Property(f"{Scenario} has {Float:budget}")
Scenario.regime = model.Property(f"{Scenario} in {Regime}")
scenario_data = model.data(
    [
        ("base_500", 500, "base"),
        ("base_1000", 1000, "base"),
        ("base_2000", 2000, "base"),
        ("crisis_500", 500, "crisis"),
        ("crisis_1000", 1000, "crisis"),
        ("crisis_2000", 2000, "crisis"),
    ],
    columns=["name", "budget", "regime"],
)
model.define(
    s := Scenario.new(name=scenario_data["name"]),
    s.budget(scenario_data["budget"]),
)
# Link Scenario to Regime by matching the regime name from the data.
scenario_link_ref = Scenario.ref()
regime_link_ref = Regime.ref()
model.where(
    scenario_link_ref.name == scenario_data["name"],
    regime_link_ref.regime_name == scenario_data["regime"],
).define(scenario_link_ref.regime(regime_link_ref))
```

Define decision variables, constraints, and objective
Each stock gets a continuous quantity variable indexed by Scenario (multi-argument Property).
```python
Stock.x_quantity = model.Property(f"{Stock} in {Scenario} has {Float:quantity}")
x_qty = Float.ref()
```

Two concentration limits plus a representative-only filter are added via `_add_compliance_constraints`. Position and sector caps behave as before; the `Stock.is_non_representative()` relation forces every non-representative stock to zero allocation, which is how the graph stage's redundancy removal shows up at solve time. The complement is defined positively because the prescriptive rewriter can't accept `model.not_(...)` inside a solver constraint:
```python
def _add_compliance_constraints(problem):
    # Position limit: each representative <= REP_POSITION_LIMIT * budget.
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
    ).require(x_qty <= REP_POSITION_LIMIT * Scenario.budget))

    # Sector limit: total allocation to stocks in the same sector <= SECTOR_LIMIT * budget.
    sector_alloc = sum(x_qty).where(
        Stock.x_quantity(Scenario, x_qty),
        Stock.sector == s_sector_ref.sector,
    ).per(Scenario, s_sector_ref.sector)
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
    ).require(sector_alloc <= SECTOR_LIMIT * Scenario.budget))

    # Representative-only: non-representative stocks forced to zero.
    problem.satisfy(model.where(
        Stock.x_quantity(Scenario, x_qty),
        Stock.is_non_representative(),
    ).require(x_qty == 0))
```

The risk objective is quadratic in the decision variables and uses the regime-conditioned covariance: each `Scenario` picks its matching regime's covariance, so base and crisis scenarios solve against different covariances in the same call.
```python
problem.minimize(
    sum(regime_cov_val * x_qty * x_qty_paired)
    .where(
        Stock.regime_covar(PairedStock, Scenario.regime, regime_cov_val),
        Stock.x_quantity(Scenario, x_qty),
        PairedStock.x_quantity(Scenario, x_qty_paired),
    )
)
```

Solve anchor points and run the epsilon sweep
Two anchor solves establish the feasible return range. Anchor 1 minimizes risk with no return constraint. Anchor 2 maximizes return.
```python
result1 = solve_epsilon(eps_rate=None)
```

The epsilon sweep then traces interior points. Each solve minimizes risk subject to a return-rate floor that scales with budget.
```python
n_interior = 5
epsilon_rates = [
    return_rate_min + i * (return_rate_max - return_rate_min) / (n_interior + 1)
    for i in range(1, n_interior + 1)
]
```
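For intuition, the comprehension above produces `n_interior` evenly spaced targets strictly inside the anchor range. A standalone sketch with assumed anchor rates of 0.05 and 0.12 (hypothetical numbers, not template output):

```python
# Hypothetical anchor return rates from the two anchor solves.
return_rate_min, return_rate_max = 0.05, 0.12
n_interior = 5

# Interior points exclude both anchors: i runs 1..n_interior out of n_interior + 1 steps.
epsilon_rates = [
    return_rate_min + i * (return_rate_max - return_rate_min) / (n_interior + 1)
    for i in range(1, n_interior + 1)
]

print([round(r, 4) for r in epsilon_rates])
# -> [0.0617, 0.0733, 0.085, 0.0967, 0.1083]
```

Because every interior target lies strictly between the two anchors, each sweep solve has a feasible return floor by construction.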
```python
for i, rate in enumerate(epsilon_rates):
    result = solve_epsilon(eps_rate=rate)
```

Pareto analysis output
The script prints the efficient frontier per (budget, regime) scenario, marginal risk-per-return between adjacent points, and programmatic knee detection where the marginal cost jumps most.
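The knee-detection logic can be sketched independently of the solver. Given frontier points sorted by return (illustrative numbers, not template output), compute the marginal risk per unit of extra return between neighbors and flag the point just before the largest jump:

```python
# Hypothetical (expected_return, risk) frontier points, sorted by return.
frontier = [(0.06, 1.0), (0.08, 1.2), (0.10, 1.5), (0.12, 2.4), (0.14, 4.0)]

# Marginal risk per unit of additional return between adjacent points.
marginal = [
    (k1 - k0) / (r1 - r0)
    for (r0, k0), (r1, k1) in zip(frontier, frontier[1:])
]

# Knee = interior point where the marginal cost jumps the most.
jumps = [b - a for a, b in zip(marginal, marginal[1:])]
knee = frontier[jumps.index(max(jumps)) + 1]
print(knee)  # (0.12, 2.4): past this point each unit of return costs much more risk
```

The template's output uses the same idea on the solver's per-scenario frontier points.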
Stage 4: Crisis regime stress test
Crisis covariance is derived in PyRel via PSD-preserving correlation shrinkage, keyed by a `Regime` concept. The shrinkage formula `rho_crisis = alpha * rho + (1 - alpha) * J` (with `J` the all-ones matrix) re-expressed in covariance units becomes `cov_crisis(i, j) = alpha * cov(i, j) + (1 - alpha) * vol_i * vol_j` — a convex combination of PSD matrices, so PSD is preserved by construction:
```python
Stock.regime_covar = model.Property(
    f"{Stock} and {Stock} in {Regime} have {Float:regime_covar}"
)
```
```python
# Base regime: covariance unchanged.
model.where(
    Stock.covar(PairedStockBase, base_cov_ref),
    base_regime_ref.regime_name == "base",
).define(Stock.regime_covar(PairedStockBase, base_regime_ref, base_cov_ref))
```
```python
# Crisis regime: convex combination of base covariance and vol_i * vol_j.
model.where(
    Stock.covar(PairedStockCrisis, crisis_cov_ref),
    crisis_regime_ref.regime_name == "crisis",
).define(
    Stock.regime_covar(
        PairedStockCrisis,
        crisis_regime_ref,
        CRISIS_ALPHA * crisis_cov_ref
        + (1 - CRISIS_ALPHA) * Stock.volatility * PairedStockCrisis.volatility,
    )
)
```

Both regimes live on the same `Stock.regime_covar` property, keyed by the `Regime` concept, so Stage 3's objective can select the right covariance per scenario without branching.
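The PSD-preservation claim is easy to verify numerically. A standalone NumPy sketch, using a toy covariance matrix and assuming `CRISIS_ALPHA = 0.7` (the template default):

```python
import numpy as np

CRISIS_ALPHA = 0.7  # template default; lower = more severe crisis

# Toy base covariance matrix (symmetric PSD), standing in for covar.csv.
cov = np.array([
    [0.040, 0.048, 0.002],
    [0.048, 0.090, 0.003],
    [0.002, 0.003, 0.025],
])

vol = np.sqrt(np.diag(cov))
# cov_crisis(i, j) = alpha * cov(i, j) + (1 - alpha) * vol_i * vol_j
cov_crisis = CRISIS_ALPHA * cov + (1 - CRISIS_ALPHA) * np.outer(vol, vol)

# Variances sit on the diagonal of both terms, so they are unchanged by shrinkage...
assert np.allclose(np.diag(cov_crisis), np.diag(cov))
# ...while the smallest eigenvalue stays non-negative: the matrix is still PSD.
assert np.linalg.eigvalsh(cov_crisis).min() >= -1e-12
```

Only correlations move toward 1 under shrinkage; individual stock volatilities are untouched, which is why the stress test isolates the co-movement effect.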
After the Stage 3 sweep finishes, Stage 4 emits a side-by-side comparison of base and crisis volatility (sqrt(risk)) at each epsilon point, grouped by budget. Crisis volatility is consistently 25-30% higher than base at every lambda. The gap peaks in the middle of the frontier (eps_1..eps_3) and narrows toward both ends. That shape is the payoff of the representative-only universe: at the concentrated end the optimizer is picking the highest-Sharpe distinct bet per cluster (Energy and Consumer Staples in this dataset), which happen to have lower crisis correlations than the middle of the frontier. Without the representative collapse, the concentrated end would stack near-duplicates and the crisis gap would grow instead of shrink.
Customize this template
- Adjust compliance thresholds: `POSITION_LIMIT` (default 0.15) applies in Stage 1 compliance rules (per-stock holdings). `REP_POSITION_LIMIT` (default 0.30) applies in Stage 3 optimization (per-representative allocation, which carries its cluster's combined exposure). `SECTOR_LIMIT` (default 0.30) applies to both. Note that `REP_POSITION_LIMIT` must satisfy `REP_POSITION_LIMIT * num_representatives >= 1.0` or the fully-invested constraint becomes infeasible.
- Tune the correlation graph: Raise or lower `CORR_THRESHOLD` (default 0.3) to control graph sparsity. Higher thresholds produce fewer edges and more singleton clusters; lower thresholds produce a denser graph and fewer, larger clusters.
- Change the representative picking rule: Stage 2 picks the highest-Sharpe stock per cluster. To pick differently, change the `Stock.cluster_max_sharpe` derivation — e.g., replace `Stock.sharpe` with `Stock.returns` (highest return), `-Stock.volatility` (lowest vol), or a weighted blend. Singletons are always their own representative regardless of rule.
- Adjust crisis severity: Lower `CRISIS_ALPHA` (default 0.7) shrinks correlations harder toward all-ones (a more severe crisis). `alpha = 1.0` is no crisis (base); `alpha = 0.0` is maximum crisis (all correlations equal 1). Values between 0.5 and 0.9 give interesting comparisons while keeping the QP well-conditioned.
- Add more stocks: Extend `returns.csv` and `covar.csv` with additional assets and their covariance entries.
- Add compliance rules: Define additional Relationships in the rules stage (e.g., minimum holding period, transaction velocity limits).
- Allow short selling: Remove the non-negativity constraint to allow negative holdings.
- Adjust frontier resolution: Increase `n_interior` for a finer-grained efficient frontier.
- Maximize return for given risk: Flip the formulation to maximize expected return subject to a risk budget.
- Transaction costs: Add a linear or quadratic penalty term for rebalancing from an existing portfolio.
Troubleshooting
Problem is infeasible
The return-rate target may be too high for the available stocks and budget. Reduce `n_interior` to use fewer sweep points, or increase the budget values in the scenario data.
`rai init` fails or connection errors
Ensure your Snowflake credentials are configured correctly and that the RAI Native App is installed on your account. Run `rai init` again and verify the connection settings.
`ModuleNotFoundError` for `relationalai`
Make sure you activated the virtual environment and ran `python -m pip install .` from the template directory. The `pyproject.toml` declares the required dependencies.
Solver reports non-convex or numerical issues
Ensure the covariance matrix is symmetric and positive semi-definite. Check that `covar.csv` contains entries for all (i, j) pairs and that `covar(i, j) == covar(j, i)`. Ipopt finds locally optimal solutions; for a convex QP, a local optimum is also the global optimum.