Synthetic Eligibility Records
Generate K distinct, internally consistent member eligibility records per solve using a CSP solver in multi-solution mode: each record satisfies CMS Medicare-eligibility, age-by-plan-type CFDs, and PCP-network attribution.
What this template is for
Healthcare payer engineering teams, RegTech rule-certification harnesses, and claim-engine fuzzers all need batches of internally consistent member eligibility records to test against. Real beneficiary data is gated behind PII rules; sampled production data carries cohort biases; hand-crafted fixtures drift out of sync with the regulations they are meant to exercise. The right alternative is a constrained generative model: declare the eligibility rules once, ask the solver for K records that satisfy every rule simultaneously.
Synthetic-data tooling consumers want a batch of K diverse records per solve, not one. A single record can’t expose a CFD cascade or a network-attribution corner case; K records spread across age bands, plan types, and provider networks can. This template encodes member eligibility as a constraint satisfaction model using RelationalAI’s prescriptive reasoning and runs the solver in multi-solution mode: pass solution_limit=K to problem.solve(...), then enumerate each generated record via Variable.values(solution_index, value). The output is one row per generated member — ready to drop into a test fixture, a fuzzing oracle, or a coverage matrix.
The rule structure here is drawn from the public CMS Medicare and NCQA regulatory shape: age-by-plan-type CFDs (over-65 must be on Medicare-Advantage; under-65 must not) and PCP-network attribution (the chosen primary-care provider must be in-network for the chosen plan). The same template structure — decision-valued tuple per record, reference-data lookups via composition, multi-solution enumeration — applies to any rule-driven synthetic-data domain: KYC member records (banking), tenant lease attributes (proptech), shipment manifests (logistics).
Who this is for
- Healthcare payer engineering teams building eligibility-engine test suites
- RegTech / compliance-rules certification harnesses needing rule-coverage fixtures
- Claim-engine and adjudication-engine fuzzers needing diverse, valid input batches
- Data-platform engineers building synthetic-data pipelines that respect domain invariants
What you’ll build
- A constraint model with three integer decision properties on a singleton `Member`: `age_bucket_id`, `plan_id`, `provider_id`. Each solution returns one feasible filling of those three slots.
- A small reference table of representative ages (`AgeBucket`) so age is a categorical decision rather than a per-year integer. This keeps every decision domain compact and similar in size, which is what makes the multi-solution enumeration produce structurally varied records across age, plan, and network.
- A pair of CFD ICs encoding the two arms of the age-by-plan rule. They use the forbidden-pair idiom `implies(Member.age_bucket_id == AgeBucket.id, Member.plan_id != Plan.id)`, which is safe under the CSP rewriter.
- A PCP-network attribution IC that iterates over reference-data `(Plan, Provider)` tuples in different networks and forbids the cross-network combination.
- A pre-solve dense-ID check on `plans.csv`, `providers.csv`, and `age_buckets.csv`. It ensures the solver's integer decision bounds line up with the reference rows the CFDs iterate over. Sparse IDs would let the solver pick a value with no matching row, which leaves the rules unconstrained for that record and silently drops it from the post-solve display join.
- Multi-solution enumeration as the primary code path. `problem.solve(..., solution_limit=MAX_RECORDS)` runs the search in enumeration mode, and `Variable.values(solution_index, value)` joins the three decision variables on a shared solution index to reconstruct each record.
- An empty-result branch driven by `solve_info().num_points`. When no feasible record exists, the script prints a diagnostic instead of hard-failing, which is the right shape for a reusable generator.
- No `problem.verify()` call. Every IC uses `implies`, which is solver-only: passing implies-bodied ICs to `verify()` returns silently-OK without actually evaluating them, so the convention is not to pass them. The CFD and network-attribution invariants are instead directly visible in the expected-output block in the Quickstart, where every record's `age_years` and `plan_type`, and every record's `plan_network` and `provider_network`, are printed side-by-side.
What’s included
- `synthetic_eligibility_records.py`: main script with ontology, decisions, constraints, and solver call
- `data/age_buckets.csv`: 4 representative ages spanning the adult/senior split (2 under 65, 2 at or above)
- `data/plans.csv`: 3 plans (PPO, HMO, MedicareAdvantage), each on its own network
- `data/providers.csv`: 4 primary-care providers (1 PPO, 1 HMO, 2 Medicare), so each plan network has at least one in-network PCP and the bundled 8-record enumeration spans all three plans
- `pyproject.toml`: Python package configuration
Prerequisites
Access
- A Snowflake account that has the RAI Native App installed.
- A Snowflake user with permissions to access the RAI Native App.
Tools
- Python >= 3.10
Quickstart
1. Download ZIP:

   ```bash
   curl -O https://docs.relational.ai/templates/zips/v1/synthetic_eligibility_records.zip
   unzip synthetic_eligibility_records.zip
   cd synthetic_eligibility_records
   ```

2. Create venv:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   python -m pip install --upgrade pip
   ```

3. Install:

   ```bash
   python -m pip install .
   ```

4. Configure (prompts for Snowflake account, role, and profile name):

   ```bash
   rai init
   ```

5. Run:

   ```bash
   python synthetic_eligibility_records.py
   ```

6. Expected output. With `MAX_RECORDS = 16` and a bundled feasible set of 8 records, the solver exhausts the search space and returns status `OPTIMAL` with all 8 records. Each row carries its `solution` index. Solver build strings, exact wall times, and per-solution ordering will vary, but the structure of the output and the set of returned records are stable:

   ```
   Solve result:
   • status: OPTIMAL
   • objective: 0
   • solve time: 0.12s
   • num_points: 8
   • solver: MiniZinc_unknown

   Generated member records (up to 16 per run):
    solution  age_years         plan_type  plan_network  provider_network      provider
           0         78 MedicareAdvantage             3                 3   Dr_Senior_B
           1         68 MedicareAdvantage             3                 3   Dr_Senior_B
           2         78 MedicareAdvantage             3                 3   Dr_Senior_A
           3         68 MedicareAdvantage             3                 3   Dr_Senior_A
           4         28               HMO             2                 2   Dr_East_HMO
           5         50               HMO             2                 2   Dr_East_HMO
           6         50               PPO             1                 1  Dr_North_PPO
           7         28               PPO             1                 1  Dr_North_PPO
   ```

   Each row is one full member. The constraint encoding is visible by eye: `age_years >= 65` always pairs with `MedicareAdvantage`, and `plan_network` always equals `provider_network`. On a real catalog the feasible set typically exceeds `MAX_RECORDS`, and the solver returns status `SOLUTION_LIMIT` once the cap is hit.
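The two invariants called out above can also be checked mechanically. The following is a minimal plain-Python sketch. The rows are copied from the expected-output block rather than read from the script's real `records_df`, and `check_invariants` and `distinct_records` are hypothetical helper names:

```python
SENIOR_THRESHOLD_YEARS = 65

# Rows copied from the expected-output block:
# (solution, age_years, plan_type, plan_network, provider_network, provider)
ROWS = [
    (0, 78, "MedicareAdvantage", 3, 3, "Dr_Senior_B"),
    (1, 68, "MedicareAdvantage", 3, 3, "Dr_Senior_B"),
    (2, 78, "MedicareAdvantage", 3, 3, "Dr_Senior_A"),
    (3, 68, "MedicareAdvantage", 3, 3, "Dr_Senior_A"),
    (4, 28, "HMO", 2, 2, "Dr_East_HMO"),
    (5, 50, "HMO", 2, 2, "Dr_East_HMO"),
    (6, 50, "PPO", 1, 1, "Dr_North_PPO"),
    (7, 28, "PPO", 1, 1, "Dr_North_PPO"),
]


def check_invariants(rows):
    """Return the solution labels of rows that break either rule (hypothetical helper)."""
    bad = []
    for sol, age, plan_type, plan_net, prov_net, _provider in rows:
        is_senior = age >= SENIOR_THRESHOLD_YEARS
        on_medicare = plan_type == "MedicareAdvantage"
        # Age-by-plan CFD: senior <=> Medicare-Advantage.
        # PCP-network attribution: plan and provider share a network.
        if is_senior != on_medicare or plan_net != prov_net:
            bad.append(sol)
    return bad


def distinct_records(rows):
    """Count distinct (age, plan, provider) triples, ignoring the solution label."""
    return len({(age, plan, prov) for _s, age, plan, _pn, _vn, prov in rows})


print(check_invariants(ROWS))  # → []
print(distinct_records(ROWS))  # → 8
```

Because the tuple order matches the column order of `records_df`, passing `records_df.itertuples(index=False)` to `check_invariants` should work the same way as a fixture-level assertion.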
Template structure
```
.
├── README.md
├── pyproject.toml
├── synthetic_eligibility_records.py
└── data/
    ├── age_buckets.csv
    ├── plans.csv
    └── providers.csv
```

How it works
The solver decides three integer attributes of a singleton Member — age bucket, plan, provider — subject to the eligibility rules. Each solution returned by the solver is one feasible filling of those three slots; multi-solution mode enumerates K of them per solve.
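For a catalog this small, the feasible set can be cross-checked by brute force: enumerate every (age bucket, plan, provider) triple and keep those that satisfy both rules. The following is a minimal pure-Python sketch, not the template's code. The ages, plan types, networks, and provider names follow the expected output, but the integer IDs are hypothetical because the CSV contents are not shown here:

```python
from itertools import product

SENIOR_THRESHOLD_YEARS = 65

# Hypothetical dense IDs; values follow the bundled demo's expected output.
AGE_BUCKETS = {1: 28, 2: 50, 3: 68, 4: 78}  # id -> age_years
PLANS = {1: ("PPO", 1), 2: ("HMO", 2), 3: ("MedicareAdvantage", 3)}  # id -> (plan_type, network_id)
PROVIDERS = {
    1: ("Dr_North_PPO", 1),
    2: ("Dr_East_HMO", 2),
    3: ("Dr_Senior_A", 3),
    4: ("Dr_Senior_B", 3),
}  # id -> (name, network_id)


def feasible_records():
    """Yield every (age_bucket_id, plan_id, provider_id) that satisfies both rules."""
    for a, p, v in product(AGE_BUCKETS, PLANS, PROVIDERS):
        senior = AGE_BUCKETS[a] >= SENIOR_THRESHOLD_YEARS
        medicare = PLANS[p][0] == "MedicareAdvantage"
        # Age-by-plan CFD (both arms) and same-network PCP attribution.
        if senior == medicare and PLANS[p][1] == PROVIDERS[v][1]:
            yield (a, p, v)


records = list(feasible_records())
print(len(records))  # → 8, matching num_points in the expected output
```

The solver reaches the same set through constraint propagation rather than by walking the full product, so this sketch is only useful as an oracle on small fixtures.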
1. Categorical age via a small reference table. Age is not a per-year integer decision: instead, the AgeBucket reference table holds four representative ages, and Member.age_bucket_id picks one. The CFDs walk through AgeBucket.age_years to compare against the seniority threshold. This keeps the age decision domain at the same order of magnitude as the plan and provider domains, which is what makes the solver’s enumeration produce structurally varied records across all three dimensions:
```python
AgeBucket = model.Concept("AgeBucket", identify_by={"id": Integer})
AgeBucket.age_years = model.Property(f"{AgeBucket} has {Integer:age_years}")
```

2. Forbidden-pair encoding for CFDs. The Medicare-Advantage CFD has two arms: senior implies Medicare, and non-senior implies non-Medicare. Each arm is encoded as a forbidden-pair iteration. The `where` clause filters reference-data tuples at relational time (here, all `(Plan, AgeBucket)` pairs that violate the arm), and the `implies` inside the `require` gates on the decision-valued match. This sidesteps the rewriter's restriction on decision variables in `where` clauses. `where(Plan.id == Member.plan_id)` would not parse, so iteration over `Plan` and `AgeBucket` happens at relational time and the decision check goes inside `implies`:
```python
senior_must_medicare_ic = model.where(
    Plan.plan_type != "MedicareAdvantage",
    AgeBucket.age_years >= SENIOR_THRESHOLD_YEARS,
).require(
    implies(
        Member.age_bucket_id == AgeBucket.id,
        Member.plan_id != Plan.id,
    )
)
```

The non-senior arm uses the same shape, with `Plan.plan_type == "MedicareAdvantage"` and `AgeBucket.age_years < SENIOR_THRESHOLD_YEARS` in the `where`.
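The forbidden-pair form and the direct rule can be compared exhaustively on small data. For every candidate (age bucket, plan) assignment, the senior arm should hold exactly when no `where`-selected pair matches that assignment. The following is a minimal pure-Python sketch of that equivalence. It models the semantics only, not the RelationalAI rewriter, and the reference IDs and helper names are hypothetical:

```python
SENIOR_THRESHOLD_YEARS = 65
AGE_BUCKETS = {1: 28, 2: 50, 3: 68, 4: 78}  # hypothetical id -> age_years
PLAN_TYPES = {1: "PPO", 2: "HMO", 3: "MedicareAdvantage"}  # hypothetical id -> plan_type


def implies(a, b):
    """Material implication, as used inside require()."""
    return (not a) or b


def senior_arm_forbidden_pairs(age_id, plan_id):
    """Forbidden-pair encoding: where() selects (Plan, AgeBucket) pairs that violate
    the arm, and require(implies(...)) must then hold for every selected pair."""
    violating = [
        (p, b)
        for p, ptype in PLAN_TYPES.items()
        for b, age in AGE_BUCKETS.items()
        if ptype != "MedicareAdvantage" and age >= SENIOR_THRESHOLD_YEARS
    ]
    return all(implies(age_id == b, plan_id != p) for p, b in violating)


def senior_arm_direct(age_id, plan_id):
    """Direct statement of the arm: a senior member must be on Medicare-Advantage."""
    senior = AGE_BUCKETS[age_id] >= SENIOR_THRESHOLD_YEARS
    return implies(senior, PLAN_TYPES[plan_id] == "MedicareAdvantage")


mismatches = [
    (a, p)
    for a in AGE_BUCKETS
    for p in PLAN_TYPES
    if senior_arm_forbidden_pairs(a, p) != senior_arm_direct(a, p)
]
print(mismatches)  # → []
```

The empty mismatch list is the point: the relational-time `where` only enumerates reference data, and the decision variables appear solely inside `implies`.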
3. PCP-network attribution as forbidden cross-network pairs. The chosen provider’s network must equal the chosen plan’s network. Same forbidden-pair shape: iterate over (Plan, Provider) tuples in different networks at relational time, and forbid that combination if the member picks both:
```python
network_match_ic = model.where(Plan.network_id != Provider.network_id).require(
    implies(Member.plan_id == Plan.id, Member.provider_id != Provider.id)
)
```

4. Multi-solution enumeration via `Variable.values(solution_index, value)`. Capturing the variable subconcept from `solve_for(...)` exposes a `.values(sol_idx, val)` relationship that indexes the per-solution outputs. The script holds these as `age_bucket_var`, `plan_id_var`, and `provider_id_var`. Binding the value slot directly to a reference Concept's `.id` walks the chosen ID back to that record's columns in one step:
```python
problem.solve("minizinc", time_limit_sec=60, solution_limit=MAX_RECORDS)
si = problem.solve_info()
si.display()

sol_idx = Integer.ref()
records_df = (
    model.select(
        sol_idx.alias("solution"),
        AgeBucket.age_years.alias("age_years"),
        Plan.plan_type.alias("plan_type"),
        Plan.network_id.alias("plan_network"),
        Provider.network_id.alias("provider_network"),
        Provider.name.alias("provider"),
    )
    .where(
        age_bucket_var.values(sol_idx, AgeBucket.id),
        plan_id_var.values(sol_idx, Plan.id),
        provider_id_var.values(sol_idx, Provider.id),
    )
    .to_df()
    .sort_values("solution")
    .reset_index(drop=True)
)
print(f"\nGenerated member records (up to {MAX_RECORDS} per run):")
print(records_df.to_string(index=False))
```

The variable subconcept exposes a back-pointer named after the entity in its property: `age_bucket_var.member` walks back to the `Member` instance. This template does not use it because it has a single member, but it is useful in multi-member variants where each row of `.values(...)` is one `(Member, solution)` pair.
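The join that the three `.values(sol_idx, ...)` atoms perform can be pictured in plain Python. Each decision variable contributes a set of `(solution_index, chosen_id)` pairs, and a record is reconstructed by joining the three sets on the shared index and looking the chosen IDs up in the reference tables. The following minimal sketch uses hypothetical IDs and two hypothetical solutions. It illustrates the shape of the join, not the RelationalAI implementation:

```python
# Hypothetical per-variable outputs: (solution_index, chosen_id)
age_bucket_values = [(0, 4), (1, 1)]
plan_id_values = [(0, 3), (1, 2)]
provider_id_values = [(0, 4), (1, 2)]

# Hypothetical reference rows, keyed by dense integer id
AGE_BUCKETS = {1: 28, 2: 50, 3: 68, 4: 78}
PLANS = {1: ("PPO", 1), 2: ("HMO", 2), 3: ("MedicareAdvantage", 3)}
PROVIDERS = {1: ("Dr_North_PPO", 1), 2: ("Dr_East_HMO", 2),
             3: ("Dr_Senior_A", 3), 4: ("Dr_Senior_B", 3)}


def reconstruct(age_vals, plan_vals, prov_vals):
    """Join three (solution, id) relations on solution, then walk ids to reference rows."""
    ages, plans, provs = dict(age_vals), dict(plan_vals), dict(prov_vals)
    rows = []
    for sol in sorted(ages.keys() & plans.keys() & provs.keys()):
        a, p, v = ages[sol], plans[sol], provs[sol]
        # Inner-join semantics: an id with no reference row drops the whole record.
        if a not in AGE_BUCKETS or p not in PLANS or v not in PROVIDERS:
            continue
        plan_type, plan_net = PLANS[p]
        name, prov_net = PROVIDERS[v]
        rows.append((sol, AGE_BUCKETS[a], plan_type, plan_net, prov_net, name))
    return rows


for row in reconstruct(age_bucket_values, plan_id_values, provider_id_values):
    print(row)
# (0, 78, 'MedicareAdvantage', 3, 3, 'Dr_Senior_B')
# (1, 28, 'HMO', 2, 2, 'Dr_East_HMO')
```

The `continue` branch mirrors why the pre-solve dense-ID check matters: a chosen ID with no matching reference row makes the record disappear from the display join instead of raising an error.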
Customize this template
- Use your own plans and providers by replacing the two CSV files. The constraint structure does not change. The integer ID columns stay required because the script uses them for the `Member.plan_id` and `Member.provider_id` decision domains, and IDs must remain dense and contiguous (the pre-solve check enforces this).
- Raise the solution limit on a real catalog. The bundled `MAX_RECORDS = 16` is sized so the solver exhausts the small demo feasible set; production test suites typically want 100 to 10,000 records per solve. `time_limit_sec` is your safety net: enumeration stops when either the solution limit or the time budget is reached.
- Adjust the seniority gate by changing `SENIOR_THRESHOLD_YEARS` (currently 65, the CMS Medicare threshold). Both arms of the age-by-plan CFD read this constant directly.
- Add a dependent-count decision by introducing a `Member.num_dependents` integer decision bounded by 0 and a per-plan `max_dependents` cap. To do this:
  - Extend `plans.csv` with a `max_dependents` column.
  - Declare `Plan.max_dependents = model.Property(f"{Plan} has {Integer:max_dependents}")`.
  - Add `Member.num_dependents = model.Property(...)` and a `problem.solve_for(Member.num_dependents, ...)` call.
  - Encode the cap with the same `where`/`implies` idiom: `model.where(Plan.max_dependents >= 0).require(implies(Member.plan_id == Plan.id, Member.num_dependents <= Plan.max_dependents))`.
- Add a coverage-period decision pair by introducing `coverage_start_days` and `coverage_end_days` as integer day decisions (counted from a notional epoch) bounded around a target date. The temporal-interval-containment shape needs two ICs, one requiring `Member.coverage_start_days <= TARGET_DATE_DAYS` and one requiring `TARGET_DATE_DAYS <= Member.coverage_end_days`, plus a minimum-duration IC `Member.coverage_end_days - Member.coverage_start_days >= MIN_DAYS`. This is useful for fuzzing claim-adjudication date logic.
- Switch from “all feasible” to “smallest violating instance” by adding `problem.minimize(...)` over a violation count, dropping a positive IC, and using `solution_limit=1`. This is the negative-mode use case from the constrained-generative-models literature, and it is handy for finding the cheapest counter-example to a candidate rule.
- Adapt to a different regulatory regime by editing the CFD predicates and the network-attribution IC. The shape is identical for KYC member records (banking AML), tenant lease attributes (proptech), and shipment manifests (logistics customs): declare the rules as forbidden-pair iterations, then ask the solver for K records.
- Watch the cross-product cost on real catalogs. `where(Plan.network_id != Provider.network_id)` materializes the full Plan × Provider product at relational time. That is trivial here (3 × 4 = 12 pairs), but a 1,000-plan × 10,000-provider catalog gives 10M pairs before filtering. For production-scale catalogs, partition the IC by region, or pre-filter the relational walk to plans and providers that share at least one common region tag.
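The size of the `where(Plan.network_id != Provider.network_id)` walk can be estimated before solving. It pairs every plan with every provider and keeps the cross-network pairs, so the number of forbidden pairs is the full product minus the same-network pairs. The following minimal sketch counts both from the `network_id` columns. `pair_counts` is a hypothetical name, and the even 10-network split in the second call is made up purely to illustrate the arithmetic:

```python
from collections import Counter


def pair_counts(plan_networks, provider_networks):
    """Return (pairs walked, cross-network pairs kept) for the network IC."""
    plans = Counter(plan_networks)
    providers = Counter(provider_networks)
    walked = len(plan_networks) * len(provider_networks)
    # Same-network pairs are the ones the where() filter discards.
    same_network = sum(n * providers[net] for net, n in plans.items())
    return walked, walked - same_network


# Bundled demo: plans on networks 1, 2, 3; providers on networks 1, 2, 3, 3.
print(pair_counts([1, 2, 3], [1, 2, 3, 3]))  # → (12, 8)

# Illustrative only: 1,000 plans and 10,000 providers spread evenly over 10 networks.
big_plans = [i % 10 for i in range(1_000)]
big_providers = [i % 10 for i in range(10_000)]
print(pair_counts(big_plans, big_providers))  # → (10000000, 9000000)
```

Partitioning by region shrinks the first number directly, because only plans and providers inside the same partition are ever paired.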
Troubleshooting
Import error for relationalai
- Confirm your virtual environment is active: `which python` should point to `.venv`.
- Reinstall dependencies: `python -m pip install .`
Authentication or configuration errors
- Run `rai init` to create or update your RelationalAI/Snowflake configuration.
- If you have multiple profiles, set `export RAI_PROFILE=<your_profile>`.
MiniZinc solver not available
- This template uses the MiniZinc constraint solver. Ensure the RAI Native App version supports MiniZinc.
- HiGHS is not appropriate here — this is a discrete satisfaction model with categorical decisions and structural propagation, not LP/MILP.
Solver returns INFEASIBLE / no feasible eligibility records
- Check the solve status the script prints. `INFEASIBLE` means the reference data admits no record. `UNKNOWN` or `TIME_LIMIT` means the budget expired before a record was found; raise `time_limit_sec` or shrink the decision domains.
- For genuine infeasibility, note that each solution picks one age bucket, so the model is infeasible only when no bucket has a compatible plan-and-provider combination:
  - Senior buckets (`age_years >= SENIOR_THRESHOLD_YEARS`) need at least one Medicare-Advantage plan whose network has a provider.
  - Non-senior buckets (`age_years < SENIOR_THRESHOLD_YEARS`) need at least one non-Medicare plan whose network has a provider.

  Confirm that at least one of these pairings is available in your data. The pre-solve coverage check also warns on asymmetry in either direction between `plans.csv` and `providers.csv`, so read the startup warnings before assuming the data is sound.
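The feasibility condition above can be checked on the reference data before solving. The following minimal sketch assumes the three CSVs have already been loaded into plain Python lists. The helper name `feasible_sides` is hypothetical, and this is not the template's pre-solve coverage check:

```python
SENIOR_THRESHOLD_YEARS = 65


def feasible_sides(ages, plans, provider_networks):
    """Report, per side of the threshold, whether some age bucket can be matched.

    ages: age_years values; plans: (plan_type, network_id) pairs;
    provider_networks: provider network_id values.
    """
    staffed = set(provider_networks)  # networks with at least one provider
    has_senior = any(a >= SENIOR_THRESHOLD_YEARS for a in ages)
    has_non_senior = any(a < SENIOR_THRESHOLD_YEARS for a in ages)
    ma_ok = any(t == "MedicareAdvantage" and n in staffed for t, n in plans)
    non_ma_ok = any(t != "MedicareAdvantage" and n in staffed for t, n in plans)
    return {"senior": has_senior and ma_ok, "non_senior": has_non_senior and non_ma_ok}


demo = feasible_sides(
    ages=[28, 50, 68, 78],
    plans=[("PPO", 1), ("HMO", 2), ("MedicareAdvantage", 3)],
    provider_networks=[1, 2, 3, 3],
)
print(demo)  # → {'senior': True, 'non_senior': True}
```

Under the two rules described in this template, the model has at least one feasible record exactly when at least one of the two values is `True`.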
ValueError: <file> id column must be dense and contiguous
- The pre-solve check ran on `plans.csv`, `providers.csv`, or `age_buckets.csv` and found gaps in the `id` column; the error message includes the file name. The solver bounds each decision by `lower=min(id), upper=max(id)`. Without dense IDs, it can pick a value with no matching reference row. The relational-time `implies` rules gated on the matching row will then not fire, and the post-solve display join will silently drop the record.
- Renumber the rows so IDs run consecutively from the minimum to the maximum (e.g., 1, 2, 3, … or 10, 11, 12, …).
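A dense-ID check of this kind takes only a few lines of plain Python. The following minimal sketch assumes the `id` column has already been read into a list of integers. The function name and the empty-file message are hypothetical; the main message text follows the error shown in this heading:

```python
def check_dense_ids(filename, ids):
    """Raise ValueError unless ids cover every integer from min to max exactly once.

    Returns (min, max), which can serve as the decision's lower/upper bounds.
    """
    if not ids:
        raise ValueError(f"{filename} has no rows")  # hypothetical message
    expected = list(range(min(ids), max(ids) + 1))
    # Gaps and duplicates both make the sorted list differ from the full range.
    if sorted(ids) != expected:
        raise ValueError(f"{filename} id column must be dense and contiguous")
    return min(ids), max(ids)


print(check_dense_ids("plans.csv", [1, 2, 3]))  # → (1, 3)
print(check_dense_ids("age_buckets.csv", [10, 11, 12, 13]))  # → (10, 13)
```

A list such as `[1, 2, 4]` raises, because the integer decision bounded by 1..4 could pick 3, which has no matching reference row.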
Warning: plan network(s) [...] have no providers in providers.csv
- The pre-solve coverage check found a `network_id` value in `plans.csv` that does not appear in any `providers.csv` row. The PCP-network-attribution IC forbids cross-network `(plan, provider)` combinations. A record that picks one of the listed plans therefore has no satisfying provider, so that plan can never appear in a generated record. The model is still solvable from the records that pick the other plans; this is a warning, not an error.
- Add at least one provider for the listed network(s), or remove the affected plan rows from `plans.csv`.
Warning: provider network(s) [...] have no plans in plans.csv
- The script's symmetric coverage check found a `network_id` value in `providers.csv` that does not appear in any `plans.csv` row. The PCP-network-attribution IC forbids cross-network `(plan, provider)` combinations, so providers in those networks can never be matched to any plan and will never appear in a generated record. The model is still solvable; this is a warning, not an error.
- Add a plan on the listed network(s) to make those providers reachable, or remove the dead provider rows from `providers.csv`.
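Both coverage warnings amount to a set difference on the `network_id` columns. The following minimal sketch takes the two columns as plain lists. The helper name is hypothetical, and the warning wording mirrors the two headings in this section rather than being copied from the script:

```python
def coverage_warnings(plan_networks, provider_networks):
    """Return warning strings for networks present on one side but not the other."""
    plans, providers = set(plan_networks), set(provider_networks)
    warnings = []
    unstaffed = sorted(plans - providers)  # plans no provider can serve
    orphaned = sorted(providers - plans)  # providers no plan can reach
    if unstaffed:
        warnings.append(f"plan network(s) {unstaffed} have no providers in providers.csv")
    if orphaned:
        warnings.append(f"provider network(s) {orphaned} have no plans in plans.csv")
    return warnings


print(coverage_warnings([1, 2, 3], [1, 2, 3, 3]))  # → []
for w in coverage_warnings([1, 2, 4], [1, 2, 3]):
    print(w)
# plan network(s) [4] have no providers in providers.csv
# provider network(s) [3] have no plans in plans.csv
```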
How many records will the solver return?
- The solver returns up to `MAX_RECORDS` (16 by default) or however many feasible records exist in the reference data, whichever is smaller. `solve_info().num_points` reports the actual count after the solve. `solve_info().status` reports `SOLUTION_LIMIT` when the limit was hit (more records are available) and `OPTIMAL` when the search has been exhausted.
- Solution ordering is not guaranteed across runs or solver versions. When the limit truncates the search, the set of returned records may also shift if MiniZinc's branching heuristics see new ties. Treat the `solution` column as a label, not a ranking.
- The K returned records are guaranteed to be pairwise distinct on at least one decision (age bucket, plan, or provider), but they are neither maximally diverse nor ranked. For broader spread, raise `MAX_RECORDS` past the size of the feasible set so the solver exhausts every distinct case, or add stratification buckets and re-solve per stratum.
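The status and count rules in this section, together with the empty-result branch, can be summarized as a small decision function. The following minimal sketch uses a hypothetical helper name and takes the status string and `num_points` as plain values rather than the real `solve_info()` object. The status names are the ones this document lists:

```python
def describe_solve(status, num_points, max_records):
    """Turn a solve status and record count into a one-line diagnostic (hypothetical helper)."""
    if num_points == 0:
        if status == "INFEASIBLE":
            return "no feasible record: the reference data admits no member"
        # UNKNOWN / TIME_LIMIT: the budget ran out before any record was found.
        return (f"no record found before the budget expired (status {status}): "
                "raise time_limit_sec or shrink the decision domains")
    if status == "SOLUTION_LIMIT":
        return f"{num_points} records returned; the cap of {max_records} was hit, so more records exist"
    if status == "OPTIMAL":
        return f"{num_points} records returned; the search was exhausted"
    return f"{num_points} records returned with status {status}"


print(describe_solve("OPTIMAL", 8, 16))
# → 8 records returned; the search was exhausted
```

Printing a diagnostic like this rather than raising keeps the generator usable in batch pipelines, where an empty batch is a reportable outcome rather than a crash.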
Adding a where-side filter on a decision variable raises ValueError: Unexpected SymbolicNode result
- `model.where(...)` filters at relational time only, so decision variables are not legal inside it. The rewriter raises this error when it encounters a decision-valued comparison in a `where` clause.
- Move the decision condition into `implies`, and use a tautological relational filter (or a real one) to scope any reference-data Concepts the IC needs. For example, replace `model.where(Plan.id == Member.plan_id).require(Member.num_dependents <= Plan.max_dependents)` with `model.where(Plan.max_dependents >= 0).require(implies(Member.plan_id == Plan.id, Member.num_dependents <= Plan.max_dependents))`.
- See the three constraint definitions in `synthetic_eligibility_records.py` (`network_match_ic`, `senior_must_medicare_ic`, `non_senior_no_medicare_ic`) for the canonical idiom.