Plan: Game-Theoretic COA Engine on uci-demo for SBIR OSW26BZ02-DV004 (D2P2)
Context
SBIR topic OSW26BZ02-DV004 (SCO, Direct-to-Phase-II) asks for a mature, scalable, robust game-theoretic AI. The system must:

- compute approximate Nash equilibria in imperfect-information multi-domain wargames;
- beat expert red teams;
- stay human-interpretable;
- scale from tactical to operational;
- run CPU-only;
- be anytime, returning a usable answer whenever it is stopped.

The floor for a responsive proposal is a working prototype with quantified performance against expert humans or recognized AI benchmarks. Anything less is "non-responsive", and narrative alone does not clear the bar.
uci-demo today (post v1.3.0 main, 21 of 596 UCI v2.5 MTs covered) is a strong demo substrate, but it is not the engine the RFP wants.

What already fits:

- The Agent interface in services/copilot/src/types.ts:73-86 and the modular scriptedAgent.ts ↔ claudeAgent.ts slot fit the "modular doctrinal subroutines, not monolithic NN" attribute exactly.
- Comms-degrade injection (apps/cop-ui/lib/degrade.ts + services/world-sim/src/sim.ts:137-218) and replay (apps/cop-ui/lib/{messageBuffer,replay}.ts) partially answer Phase II V&V.
- The validator audit (services/validator/) and a CI gate exist.

What is missing: the repo has no solver, no Red agent, no utility function, and no headless eval harness. Every scoring story is therefore net-new code.
A reviewer audit of the prior chat plan found three structural weaknesses:

1. The 4–6-week pre-proposal sprint is too short for a credible D2P2 prior-art package; 12 weeks is realistic.
2. Benchmarking only against scriptedAgent does not address the "paramount evaluation criterion" of defeating experienced human red teams.
3. AFSIM access realistically takes 6–12 months post-award (ITAR/JCP), so the M&S transition must lead with Command: Professional Edition.

This plan re-scopes Phase 0 accordingly.
The intended outcome is a 12-week pre-proposal sprint that produces:

- a CPU-only ES-MCCFR + Public Belief State solver;
- a programmatic Red agent;
- a headless eval harness;
- a micro-tournament against 4–6 human SMEs;
- the white paper, plot pair, and video that make the D2P2 proposal responsive on every required attribute.

The architecture choices preserve the demo's existing "smart-but-readable" code character: pure TypeScript, a single MQTT transport, a single XML parser, no GPU dependency, and no neural-network black boxes.
Standards posture (UCI v2.5 + OMS v2.5)
The demo substrate is already shaped like an OMS v2.5
Mission Package (released 2026-01-22, governed by the OACWG): Mosquitto is the Abstract
Service Bus, packages/uci-bus + packages/uci-codec are the de facto Critical Abstraction
Layer, services/* are OMS Services, and services/adsb-bridge is shaped like an OMS Isolator.
Phase 0 does not add OMS conformance work — that's a Phase II Year-1 deliverable in the
companion plan — but the proposal text names OMS v2.5 as the framing standard so reviewers
see the system positioned for an OMS-compliant Phase II without overpromising. The pre-proposal
sprint stays focused on the solver, the Red agent, and the SME tournament.
Recommended approach
Architecture (one paragraph)
The solver stack adds five new workspace members; the model-agnostic packages/uci-llm/ described below is a sixth. Two are library packages that build to dist/ like @uci-demo/bus and @uci-demo/codec. @uci-demo/game holds pure domain types (GameState, InformationSet, PublicBeliefState, Payoff, GameDynamics) with no I/O. @uci-demo/solver holds the ES-MCCFR core, Float32Array-backed regret tables, a StrategyBank of modular subroutines, the AnytimeBlueprint query API, and a Kuhn-poker correctness test. The other three are services. services/solver-daemon/ runs long-lived self-play and owns the only RegretTable instance; it answers info-set queries on the MQTT side-channel uci-demo/solver/query/+/+, replies on uci-demo/solver/reply/<requestId>, and publishes a retained uci-demo/solver/status heartbeat. services/red-agent/ is a programmatic adversary that publishes EntityNotificationMT + PositionReportMT exactly like services/adsb-bridge/src/bridge.ts:120-275, with zero changes to world-sim or scenario.ts. services/eval-harness/ is a headless runner over N scenarios × M agents × K degrade presets that emits a versioned EvalReport JSON. The third agent, SolverAgent, is not a new package: it lives at services/copilot/src/solverAgent.ts alongside the existing scripted and Claude implementations. It queries the daemon over MQTT and returns an AgentDecision, and the copilot's existing orchestration (services/copilot/src/main.ts:88-109) publishes everything to the wire. The solver core is pure TypeScript on tsx with typed-array regret tables. Rust/N-API is deferred behind a RegretTable interface escape hatch and is used only if the operational benchmark workflow demands it.
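A minimal sketch of the `RegretTable` escape hatch follows. It assumes a fixed action count per info set, with dense row indices assigned the first time an info-set key is seen. The method names, `Float32RegretTable`, and the capacity-doubling growth policy are hypothetical choices, not existing repo APIs. The strategy rule is standard CFR regret matching: normalize the positive regrets, and fall back to uniform when none are positive. A Rust/N-API backend would only need to implement the same interface.

```typescript
// Sketch only: RegretTable's method names and Float32RegretTable are hypothetical, not uci-demo APIs.
interface RegretTable {
  /** Dense row index for an info-set key, allocated on first sight. */
  row(infoSetKey: bigint): number;
  addRegret(row: number, action: number, delta: number): void;
  addStrategy(row: number, action: number, weight: number): void;
  /** Regret-matching strategy for the current iteration, written into `out`. */
  currentStrategy(row: number, out: Float32Array): Float32Array;
  /** Normalized strategy sum (the blueprint), written into `out`. */
  averageStrategy(row: number, out: Float32Array): Float32Array;
}

class Float32RegretTable implements RegretTable {
  private readonly numActions: number;
  private readonly index = new Map<bigint, number>();
  private regrets: Float32Array;
  private strategySum: Float32Array;

  constructor(numActions: number, initialRows = 1024) {
    this.numActions = numActions;
    const rows = Math.max(1, initialRows);
    this.regrets = new Float32Array(rows * numActions);
    this.strategySum = new Float32Array(rows * numActions);
  }

  row(infoSetKey: bigint): number {
    let r = this.index.get(infoSetKey);
    if (r === undefined) {
      r = this.index.size;
      this.index.set(infoSetKey, r);
      if ((r + 1) * this.numActions > this.regrets.length) this.grow();
    }
    return r;
  }

  addRegret(row: number, action: number, delta: number): void {
    this.regrets[row * this.numActions + action] += delta;
  }

  addStrategy(row: number, action: number, weight: number): void {
    this.strategySum[row * this.numActions + action] += weight;
  }

  currentStrategy(row: number, out: Float32Array): Float32Array {
    return this.normalizeRow(this.regrets, row, out, true);
  }

  averageStrategy(row: number, out: Float32Array): Float32Array {
    return this.normalizeRow(this.strategySum, row, out, false);
  }

  /** Normalizes one row (optionally only its positive part); uniform if the mass is zero. */
  private normalizeRow(src: Float32Array, row: number, out: Float32Array, positiveOnly: boolean): Float32Array {
    const base = row * this.numActions;
    let total = 0;
    for (let a = 0; a < this.numActions; a++) {
      const v = positiveOnly ? Math.max(0, src[base + a]) : src[base + a];
      out[a] = v;
      total += v;
    }
    for (let a = 0; a < this.numActions; a++) {
      out[a] = total > 0 ? out[a] / total : 1 / this.numActions;
    }
    return out;
  }

  /** Doubles the capacity of both tables, preserving existing rows. */
  private grow(): void {
    const regrets = new Float32Array(this.regrets.length * 2);
    regrets.set(this.regrets);
    this.regrets = regrets;
    const strategySum = new Float32Array(this.strategySum.length * 2);
    strategySum.set(this.strategySum);
    this.strategySum = strategySum;
  }
}
```

Keeping the `Map<bigint, number>` index separate from the flat typed arrays is what would let a native backend swap in without changing callers.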
Language & runtime constraints
- TypeScript only (no Rust core in Phase 0). V8 typed arrays sustain ~10⁷–10⁸ regret updates/sec/core, covering tactical scale.
- One MQTT client per service, via `connectBus` from `@uci-demo/bus`. Solver query/reply rides MQTT, not gRPC.
- One XML parser: solver-daemon reuses the `XMLParser` pattern from `services/copilot/src/worldState.ts:4-10`. Red-agent only builds XML via the codec and never parses it.
- ESM + `verbatimModuleSyntax` + `.js` import extensions for all new TS code (matches `tsconfig.base.json`).
- No `schema/UCI_v2_5/` changes. Every new MT use already has a builder in `packages/uci-codec/src/builders/`.
- No top-level `@uci-demo/codec` import from `apps/cop-ui/`. If cop-ui ever needs solver visualization, it uses `@uci-demo/codec/browser`.
- The LLM rationalization layer is fully model-agnostic, and Claude stays first-class.
  - **Interface.** New `packages/uci-llm/` defines a `LanguageModelClient` interface with structured tool-use, completion, streaming, prompt caching (where supported), and arbitrary sampling-parameter overrides. Operators can use any backend. Every client is a single ≤200-LOC adapter under `packages/uci-llm/src/clients/<name>.ts` implementing the same interface.
  - **Day-1 clients.**
    - `anthropic`: Claude. The preferred default wherever reachable, with prompt caching on and current demo behavior preserved.
    - `ollama`: local models. The preferred default in air-gapped deployments.
    - `bedrock`: Claude / Nova / Llama / Mistral on AWS, including GovCloud.
    - `openai-compat`: one client covering OpenAI, Azure OpenAI, Together, Groq, Fireworks, vLLM with an OpenAI-compatible endpoint, llama.cpp server, and any other OpenAI-shaped HTTP service.
  - **Provider selection.** The provider is chosen via env: `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL`, `LLM_API_KEY`. The default is `LLM_PROVIDER` if set, else `anthropic` when `ANTHROPIC_API_KEY` is present (current demo behavior), else `ollama`.
  - **Capabilities and fallback.** Each client declares capability flags (`supportsToolUse`, `supportsPromptCache`, `supportsStreaming`, `supportsGrammar`). Where a backend lacks native tool-use, the interface layer transparently falls back to JSON mode + grammar or to instruction-prompted structured output. The same `Agent` contract therefore works against any backend.
  - **Custom backends.** A `registerClient(name, factory)` registry lets downstream consumers add backends without forking `packages/uci-llm/`.
  - **claudeAgent refactor.** The existing `services/copilot/src/claudeAgent.ts` is renamed `llmAgent.ts` in Phase 0 weeks 1-2 and refactored to consume the abstraction. The system prompt, tool schemas, prompt-cache strategy, and tool-use semantics are preserved verbatim; only the SDK call site moves behind the interface.
  - **SDK boundary and parity.** No service may import a vendor SDK directly outside `packages/uci-llm/src/clients/`. A shared structured-output parity test (`packages/uci-llm/test/parity.test.ts`) runs the same prompt + tool schema against every shipped backend, so swapping providers never silently changes agent behavior.
- SolverAgent exposes an optional `narrate?: (trace) => Promise<string[]>`. It accepts a `LanguageModelClient` and renders a regret decomposition in natural language; the action is always the solver's.
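The provider-selection rule and the `registerClient(name, factory)` registry could be sketched as follows. Only these pieces come from the plan: the capability-flag names, the env var names, `registerClient`, and the selection order. `CompletionRequest`, `CompletionResult`, `ClientConfig`, `selectProvider`, and `createClient` are hypothetical. The real interface would also cover streaming and prompt caching.

```typescript
// Sketch only: everything except the capability flags, env names, and registerClient is hypothetical.
interface Capabilities {
  supportsToolUse: boolean;
  supportsPromptCache: boolean;
  supportsStreaming: boolean;
  supportsGrammar: boolean;
}

interface ToolSchema {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the tool input
}

interface CompletionRequest {
  system: string;
  messages: { role: "user" | "assistant"; content: string }[];
  tools?: ToolSchema[];
  sampling?: Record<string, number>; // arbitrary sampling-param overrides
}

interface CompletionResult {
  text: string;
  toolCalls: { name: string; input: unknown }[];
}

interface LanguageModelClient {
  readonly name: string;
  readonly capabilities: Capabilities;
  complete(req: CompletionRequest): Promise<CompletionResult>;
}

interface ClientConfig {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

type ClientFactory = (cfg: ClientConfig) => LanguageModelClient;
type Env = Record<string, string | undefined>;

const registry = new Map<string, ClientFactory>();

function registerClient(name: string, factory: ClientFactory): void {
  if (registry.has(name)) throw new Error(`LLM client "${name}" is already registered`);
  registry.set(name, factory);
}

/** Plan's default rule: LLM_PROVIDER if set, else anthropic when ANTHROPIC_API_KEY is present, else ollama. */
function selectProvider(env: Env): string {
  if (env.LLM_PROVIDER) return env.LLM_PROVIDER;
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  return "ollama";
}

function createClient(env: Env): LanguageModelClient {
  const name = selectProvider(env);
  const factory = registry.get(name);
  if (!factory) throw new Error(`no LLM client registered for "${name}"`);
  return factory({ baseUrl: env.LLM_BASE_URL, model: env.LLM_MODEL, apiKey: env.LLM_API_KEY });
}
```

This sketch leaves credential mapping to each adapter, for example whether `anthropic` reads `ANTHROPIC_API_KEY` or `LLM_API_KEY`.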
Utility function (packages/uci-game/src/payoff.ts)
Zero-sum: Red's payoff is -U_B. Version-tagged PAYOFF_V = 1 so weight changes don't silently invalidate historic runs. Computed online by solver-daemon's worldMirror.ts and offline by services/eval-harness/src/scoreReplay.ts.
```
U_B = + 1.0   · neutralized_hostiles             // EntityLostMT for HOSTILE/SUSPECT trackIds
      - 5.0   · fratricide_events                // EntityLostMT for FRIEND inside any CapabilityCoverageAreaMT polygon
      - 0.2   · roe_violations                   // proposals violating uci-demo/world/roe band
      - 0.05  · fuel_fraction_burned_total       // integrated SubsystemStatusMT.state bands
      - 0.3   · failed_effects                   // EffectStatusMT.state = FAILED
      - 0.001 · comms_degrade_seconds            // integral over uci-demo/world/degrade window
      - 0.002 · mean_time_to_decision_ms / 1000  // copilot evaluate() wall time, capped 5s
```
Weights are constants, not learned — they are the operator's doctrinal preferences surface.
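A sketch of `payoff.ts` under stated assumptions follows. The tally field names are hypothetical. "Capped 5s" is read as capping each `evaluate()` wall time before averaging; capping the mean is an equally plausible reading. The weights are copied from the `U_B` definition above. The result carries `PAYOFF_V` so stored runs can be matched to the weights they were scored with.

```typescript
// Sketch only: EpisodeTallies and blueUtility are hypothetical names.
const PAYOFF_V = 1;
const DECISION_CAP_MS = 5_000;

/** Per-episode tallies, one field per U_B term. */
interface EpisodeTallies {
  neutralizedHostiles: number;
  fratricideEvents: number;
  roeViolations: number;
  fuelFractionBurnedTotal: number;
  failedEffects: number;
  commsDegradeSeconds: number;
  decisionTimesMs: number[]; // wall time of each copilot evaluate() call
}

/** Weights from the U_B definition; changing any of them should bump PAYOFF_V. */
const W = {
  neutralizedHostiles: 1.0,
  fratricideEvents: -5.0,
  roeViolations: -0.2,
  fuelFractionBurnedTotal: -0.05,
  failedEffects: -0.3,
  commsDegradeSeconds: -0.001,
  decisionSeconds: -0.002,
} as const;

function blueUtility(t: EpisodeTallies): { payoffVersion: number; blue: number; red: number } {
  // Assumption: cap each evaluate() at 5 s, then average; an empty list contributes 0.
  const capped = t.decisionTimesMs.map((ms) => Math.min(ms, DECISION_CAP_MS));
  const meanMs = capped.length > 0 ? capped.reduce((s, ms) => s + ms, 0) / capped.length : 0;
  const blue =
    W.neutralizedHostiles * t.neutralizedHostiles +
    W.fratricideEvents * t.fratricideEvents +
    W.roeViolations * t.roeViolations +
    W.fuelFractionBurnedTotal * t.fuelFractionBurnedTotal +
    W.failedEffects * t.failedEffects +
    W.commsDegradeSeconds * t.commsDegradeSeconds +
    W.decisionSeconds * (meanMs / 1000);
  // Zero-sum: Red's payoff is the negation of Blue's.
  return { payoffVersion: PAYOFF_V, blue, red: -blue };
}
```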
Information-set / PBS factoring
| Wire field | Visibility | Goes into |
|---|---|---|
| `PositionReportMT.{lat,lng,alt}` | public | `belief.publicPositions` |
| `EntityNotificationMT.Severity` + `EntityMT.Identity.Platform.ThreatType/Confidence` | public (noisy) | drives Bayesian update of `identityBelief` |
| true `Identity` enum | hidden from Blue | `GameState.hidden.trueIdentity` |
| `SubsystemStatusMT.state` band | public | `belief.fuelBelief` |
| exact fuel fraction | hidden from Red | `GameState.hidden.trueFuel` |
| `uci-demo/world/roe` (retained) | public | `belief.roe` |
| `uci-demo/world/degrade` | public | `belief.commsDegrade` |
The InfoSet key is an FNV-1a-64 hash over a canonical encoding of the bucketed tuple (roe, commsBucket, ∀trackId: identityBelief decile + threatType bucket, ∀effectorId: fuel band, recent_actions[last 8]). Bucketing holds the tactical info-set count near 10³.
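The info-set key could be computed as in the sketch below. FNV-1a-64 itself is standard: offset basis `0xcbf29ce484222325`, prime `0x100000001b3`, XOR each byte in, then multiply mod 2⁶⁴. The following are assumptions for illustration: the observation shape, using P(HOSTILE) as the bucketed identity belief, and the string encoding. Sorting the ids makes the encoding canonical, so the same bucketed observation hashes to the same key regardless of message arrival order.

```typescript
// Sketch only: InfoSetObs, canonicalEncode, and infoSetKey are hypothetical names and shapes.
const FNV64_OFFSET = 0xcbf29ce484222325n;
const FNV64_PRIME = 0x100000001b3n;
const MASK64 = (1n << 64n) - 1n;

/** FNV-1a 64-bit over the UTF-8 bytes of `input`: XOR each byte in, then multiply mod 2^64. */
function fnv1a64(input: string): bigint {
  let hash = FNV64_OFFSET;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte);
    hash = (hash * FNV64_PRIME) & MASK64;
  }
  return hash;
}

interface TrackObs { trackId: string; pHostile: number; threatTypeBucket: string }
interface EffectorObs { effectorId: string; fuelBand: string }
interface InfoSetObs {
  roe: string;
  commsBucket: number;
  tracks: TrackObs[];
  effectors: EffectorObs[];
  recentActions: string[];
}

const decile = (p: number): number => Math.min(9, Math.max(0, Math.floor(p * 10)));
const byId = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

/** Canonical encoding: fixed field order, ids sorted, only the last 8 actions kept. */
function canonicalEncode(obs: InfoSetObs): string {
  const tracks = [...obs.tracks]
    .sort((a, b) => byId(a.trackId, b.trackId))
    .map((t) => `${t.trackId}:${decile(t.pHostile)}:${t.threatTypeBucket}`);
  const effectors = [...obs.effectors]
    .sort((a, b) => byId(a.effectorId, b.effectorId))
    .map((e) => `${e.effectorId}:${e.fuelBand}`);
  return [
    `roe=${obs.roe}`,
    `comms=${obs.commsBucket}`,
    `tracks=${tracks.join(",")}`,
    `effectors=${effectors.join(",")}`,
    `recent=${obs.recentActions.slice(-8).join(",")}`,
  ].join("|");
}

function infoSetKey(obs: InfoSetObs): bigint {
  return fnv1a64(canonicalEncode(obs));
}
```

This encoding assumes that ids and bucket labels never contain the `:`, `,`, or `|` separators.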
Modular doctrinal subroutines (packages/uci-solver/src/subroutines/)
The regret table is keyed by subroutine IDs, not raw actions. This is the architectural answer to RFP attribute #2 (interpretability): nothing is a neural blob, and every component is reviewable TS.
The subroutines are seeded from the existing factoring in scriptedAgent.ts:24-66:
| Subroutine | Doctrine |
|---|---|
| `IdentityGate` | withhold when P(FRIEND) > 0.4 |
| `RoeEscalation` | ROE RED ⇒ kinetic-first |
| `SoftKillFirst` | AMBER + low PID confidence ⇒ EW-first |
| `ReplanEscalation` | soft-kill failed ⇒ kinetic |
| `JammerCounter` | `threatType === "JAMMER"` ⇒ skip EW (mirrors scriptedAgent.ts:82-85) |
| `FratricideAvoidance` | withhold if FRIEND inside `CapabilityCoverageAreaMT` polygon |
| `FuelConservation` | degrade effector preference when fuel band CRITICAL |
| `CommsDegradeHedge` | high `belief.commsDegrade` ⇒ prefer autonomous-capable effector |
StrategyBank.composedPolicy softmax-weights each subroutine's distribution by its regret, mixes the weighted distributions, and renormalizes. bank.explain(info, regrets) emits one SubroutineTrace per active subroutine. These traces become the decision.rationale[] strings that the copilot publishes to uci-demo/copilot/reason/<planId> via the existing publishReasoningLine (services/copilot/src/main.ts:120-133).
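A sketch of `StrategyBank.composedPolicy` and `bank.explain` as described above follows. The `Subroutine` and `SubroutineTrace` shapes, the `justify` method, and the `temperature` parameter are hypothetical. The softmax-over-regret weighting, mixing, and renormalization follow the plan's description.

```typescript
// Sketch only: Subroutine, SubroutineTrace, justify(), and temperature are hypothetical.
type Distribution = Map<string, number>; // action id -> probability

interface Subroutine<I> {
  readonly id: string;
  /** Proposed action distribution, or null when the subroutine does not apply. */
  propose(info: I): Distribution | null;
  /** One-line doctrinal justification for the reasoning rail. */
  justify(info: I): string;
}

interface SubroutineTrace {
  id: string;
  weight: number;
  text: string;
}

/** Numerically stable softmax (subtracts the max before exponentiating). */
function softmax(values: number[], temperature: number): number[] {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp((v - max) / temperature));
  const total = exps.reduce((s, e) => s + e, 0);
  return exps.map((e) => e / total);
}

class StrategyBank<I> {
  private readonly subroutines: Subroutine<I>[];
  private readonly temperature: number;

  constructor(subroutines: Subroutine<I>[], temperature = 1) {
    this.subroutines = subroutines;
    this.temperature = temperature;
  }

  /** Active subroutines with their proposals and softmax-over-regret weights. */
  private weigh(info: I, regrets: Map<string, number>) {
    const active: { s: Subroutine<I>; dist: Distribution }[] = [];
    for (const s of this.subroutines) {
      const dist = s.propose(info);
      if (dist !== null) active.push({ s, dist });
    }
    const weights = softmax(active.map((x) => regrets.get(x.s.id) ?? 0), this.temperature);
    return active.map((x, i) => ({ ...x, weight: weights[i] ?? 0 }));
  }

  /** Weighted mixture of the active subroutines' distributions, renormalized. */
  composedPolicy(info: I, regrets: Map<string, number>): Distribution {
    const mixed: Distribution = new Map();
    for (const { dist, weight } of this.weigh(info, regrets)) {
      for (const [action, p] of dist) mixed.set(action, (mixed.get(action) ?? 0) + weight * p);
    }
    const total = [...mixed.values()].reduce((s, p) => s + p, 0);
    for (const [action, p] of mixed) mixed.set(action, p / total);
    return mixed;
  }

  /** One trace per active subroutine, heaviest first; these become decision.rationale[]. */
  explain(info: I, regrets: Map<string, number>): SubroutineTrace[] {
    return this.weigh(info, regrets)
      .sort((a, b) => b.weight - a.weight)
      .map(({ s, weight }) => ({ id: s.id, weight, text: `${s.id} ${weight.toFixed(2)}: ${s.justify(info)}` }));
  }
}
```

Standard CFR regret matching would give zero weight to subroutines with non-positive regret. The softmax instead keeps every active subroutine at non-zero weight, so each one still contributes a rationale line.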
Anytime semantics
The plan uses a blueprint-in-daemon model. Solver-daemon runs ES-MCCFR continuously, so blueprint exploitability falls with daemon uptime. MCCFR's convergence guarantee is probabilistic (average regret shrinks roughly as O(1/√T)), so the measured curve trends downward rather than decreasing strictly monotonically. SolverAgent.evaluate() is an O(1) info-set lookup plus a ~50 ms MQTT RPC, comfortably inside the 5 s budget at services/copilot/src/types.ts:73-86. The retained uci-demo/solver/status heartbeat carries {iterations, exploitability, infoSetCount}, and that retained message is the anytime guarantee. On cold start, the agent falls back to scriptedAgent and tags the rationale with "solver-blueprint cold; using scripted fallback".
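The anytime contract of `SolverAgent.evaluate()` can be sketched with the transports injected, which shows the control flow without MQTT. The `SolverAgentDeps` shape, the `MIN_ITERATIONS` warm-up threshold, and the timeout tag text are hypothetical. The heartbeat fields and the cold-start tag come from the plan. The sketch also falls back when the RPC times out, not only on cold start. That is a design assumption: it keeps the 5 s budget intact even if the daemon stalls.

```typescript
// Sketch only: SolverAgentDeps, MIN_ITERATIONS, and TIMEOUT_TAG are hypothetical.
/** Payload of the retained uci-demo/solver/status heartbeat (fields from the plan). */
interface SolverStatus {
  iterations: number;
  exploitability: number;
  infoSetCount: number;
}

interface Decision {
  action: string;
  rationale: string[];
}

/** Injected dependencies; the real agent would back these with MQTT. */
interface SolverAgentDeps {
  status(): SolverStatus | null; // last retained heartbeat, or null before the first one
  query(infoSetKey: bigint): Promise<Decision>; // request/reply RPC to solver-daemon
  scripted(): Decision; // existing scripted agent's decision for the same context
}

const COLD_TAG = "solver-blueprint cold; using scripted fallback"; // from the plan
const TIMEOUT_TAG = "solver RPC timed out; using scripted fallback"; // hypothetical
const MIN_ITERATIONS = 10_000; // hypothetical warm-up threshold

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => {
        clearTimeout(timer);
        resolve(v);
      },
      (e: unknown) => {
        clearTimeout(timer);
        reject(e);
      },
    );
  });
}

async function evaluate(deps: SolverAgentDeps, infoSetKey: bigint, budgetMs = 5_000): Promise<Decision> {
  const fallback = (tag: string): Decision => {
    const d = deps.scripted();
    return { action: d.action, rationale: [tag, ...d.rationale] };
  };
  const status = deps.status();
  if (status === null || status.iterations < MIN_ITERATIONS) return fallback(COLD_TAG);
  try {
    return await withTimeout(deps.query(infoSetKey), budgetMs);
  } catch {
    return fallback(TIMEOUT_TAG);
  }
}
```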
Red agent
Identical pattern to services/adsb-bridge/src/bridge.ts:120-275. Publishes EntityNotificationMT, EntityMT, PositionReportMT, EntityLostMT with a distinct senderSystemId and RED- prefixed topic ids. No changes to services/world-sim/src/sim.ts or services/world-sim/src/scenario.ts. Two policy backends: scripted (heuristic baseline for Phase 0 ladder) and solver-driven (queries solver-daemon for Red-side policy via the same RPC surface).
Headless eval harness
CLI: tsx services/eval-harness/src/main.ts --scenarios ... --agents ... --degrade ... --episodes-per-cell N --report out/eval/<ts>.json. Boots Mosquitto via existing docker-compose.yml, then imports extracted startWorldSim() and startCopilot() functions in-process — no cop-ui. Deterministic seeds for scenario, Red, MCCFR. Emits EvalReport JSON + per-episode NDJSON timeline + bus log to out/eval/<runId>/<scenario>-<agent>-<degrade>-<idx>/.
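The matrix expansion and seeding could look like the sketch below. `expandMatrix`, `CellSpec`, and the seed derivation (32-bit FNV-1a over the cell coordinates) are hypothetical. The output path layout follows the plan. The seed deliberately leaves out the agent, so every agent faces identical scenario, Red, and degrade draws, and per-seed comparisons are paired.

```typescript
// Sketch only: CellSpec, MatrixOptions, seedFor, and expandMatrix are hypothetical.
interface CellSpec {
  scenario: string;
  agent: string;
  degrade: string;
  episode: number;
  seed: number;
  outDir: string;
}

interface MatrixOptions {
  scenarios: string[];
  agents: string[];
  degrade: string[];
  episodesPerCell: number;
  runId: string;
  baseSeed: number;
}

/** 32-bit FNV-1a over the joined coordinates' UTF-16 code units (byte-exact for ASCII). */
function seedFor(...parts: (string | number)[]): number {
  const s = parts.join("/");
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function expandMatrix(o: MatrixOptions): CellSpec[] {
  const cells: CellSpec[] = [];
  for (const scenario of o.scenarios) {
    for (const agent of o.agents) {
      for (const degrade of o.degrade) {
        for (let episode = 0; episode < o.episodesPerCell; episode++) {
          cells.push({
            scenario,
            agent,
            degrade,
            episode,
            // The agent is left out of the seed so all agents see identical draws.
            seed: seedFor(o.baseSeed, scenario, degrade, episode),
            outDir: `out/eval/${o.runId}/${scenario}-${agent}-${degrade}-${episode}/`,
          });
        }
      }
    }
  }
  return cells;
}
```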
Phase 0: 12-week sprint (not 4-6)
The 4–6-week estimate in the prior chat plan was unrealistic. Re-scoped against the actual D2P2 evidence list:
| Week | Workstream |
|---|---|
| 1-2 | Refactor for extractability. Extract startCopilot() / startWorldSim(); promote worldState.ts to @uci-demo/game/worldMirror; extract runWithConcurrency from adsb-bridge/bridge.ts:89-108; introduce packages/uci-llm/ (LanguageModelClient + AnthropicClient + OllamaClient) and refactor claudeAgent.ts → llmAgent.ts behind it. Every refactor PR runs an end-to-end smoke test — pnpm up, validator audit at http://127.0.0.1:7700/audit?n=50 is 100% valid, approval card / MODIFY round-trip / comms-degrade injection / replay reconstruction all still work. Typecheck + unit tests alone do not gate refactor merges. PR-mergeable to main independently. |
| 2-4 | @uci-demo/game types + dynamics + PBS belief update + payoff. Vitest suite for belief Bayesian update + payoff math. |
| 3-6 | @uci-demo/solver ES-MCCFR + regret/strategy tables + strategy bank + blueprint. Kuhn poker correctness test at packages/uci-solver/test/kuhn.test.ts — converges to <0.01 exploitability in 10k iterations. Without this no review trusts the kernel. |
| 5-7 | services/red-agent/ + services/solver-daemon/. Subprocess gated on USE_SOLVER=1 so default pnpm up is unchanged. |
| 6-8 | services/copilot/src/solverAgent.ts + three-way agent selection at services/copilot/src/main.ts:58-68. |
| 7-9 | services/eval-harness/ + .github/workflows/eval.yml. Tactical bench runs on every PR labeled solver-perf; operational bench nightly on self-hosted. |
| 8-10 | Scaling study. 3-effector Tripwire → 10/30 synthetic → 30/100 synthetic. Plot wall-clock-to-target-exploitability vs problem size (3 data points = credible log-log fit). |
| 9-11 | Human SME micro-tournament. 4–6 retired O-3/O-4 with C-UAS or air-ops backgrounds, 3 games each vs solver on Tripwire/Vanguard. n≈15-20 games with logged decisions. Budget ~$25k in honoraria. Highest-leverage single line item in the entire sprint. |
| 10-12 | White paper (≥15 pages: model, algorithm, abstraction, exploitability plot, scaling plot, tournament results, integration story, transition path) + demo video (3–5 min, solver-vs-scripted side-by-side on same scenario seed) + interpretability case studies (3 worked decisions with regret decomposition + mixed strategy + outcome). |
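The weeks 8-10 scaling study reduces to fitting wall-clock ≈ c · size^k by least squares on log-transformed data. The sketch below is a minimal version under that assumption; `logLogFit` and `SizePoint` are hypothetical names. With three points the fit has a single residual degree of freedom, so the raw points are worth publishing alongside the fitted exponent.

```typescript
// Sketch only: SizePoint and logLogFit are hypothetical names.
interface SizePoint {
  effectors: number;
  wallClockSec: number;
}

/** Ordinary least squares on (ln size, ln time): time ≈ coefficient · size^exponent. */
function logLogFit(points: SizePoint[]): { exponent: number; coefficient: number; r2: number } {
  if (points.length < 2) throw new Error("need at least two points");
  const xs = points.map((p) => Math.log(p.effectors));
  const ys = points.map((p) => Math.log(p.wallClockSec));
  const n = xs.length;
  const mx = xs.reduce((s, x) => s + x, 0) / n;
  const my = ys.reduce((s, y) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  const exponent = sxy / sxx;
  const intercept = my - exponent * mx;
  const r2 = syy === 0 ? 1 : (sxy * sxy) / (sxx * syy);
  return { exponent, coefficient: Math.exp(intercept), r2 };
}
```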
Phase II Year 1 (post-award)
Primary M&S integration: Command: Professional Edition (~$3–10k/seat, commercially licensable, Lua API, achievable on SBIR budget). New services/cpe-bridge/ mirrors the adsb-bridge pattern.
Secondary (stretch): AFSIM via JCP/DD2345 sponsorship. Apply for JCP now, before proposal submission, since the realistic access timeline is 6–12 months. If government access slips a quarter, the fallback is a hardened services/world-sim/ as the V&V environment.
Phase II Year 1 also covers:

- Multi-domain expansion of the services/world-sim/src/scenario.ts schema (sea + land asset kinds; corresponding MT builders).
- Scaling work via PBS subgame re-solving (the DeepStack/ReBeL line of work; Pluribus uses a related depth-limited subgame search).
- Containerization: a Dockerfile per service + docker-compose.prod.yml.
- TDP: architecture doc + interpretability case-book + V&V report.
Phase III (commercialization)
Cut the breadth claim in the prior chat plan ("cyber, supply chain, market-making"). Cyber and supply chain are not naturally two-player zero-sum; the claim signals the team has not thought through the framing limits. Replace with one defensible vertical: Counter-UAS in joint coalition contexts (or naval surface engagement planning). One grounded paragraph beats three handwaves.
Critical files
New files (exhaustive)
Phase 0 refactor PR (independent of solver work):
- packages/uci-game/ — package.json, tsconfig.json, src/{index,types,dynamics,belief,actions,hash,payoff,worldMirror,report}.ts, test/{dynamics,belief,payoff,hash}.test.ts, fixtures/assets.json
- packages/uci-llm/ — package.json, tsconfig.json, src/{index,types,registry,toolUse,fallbackStructuredOutput,select}.ts, src/clients/{anthropic,ollama,bedrock,openai-compat}.ts, test/{registry,parity,toolUse,fallback}.test.ts. LanguageModelClient interface + 4 shipped clients + registerClient(name, factory) registry for user-defined backends. Parity test runs the same prompt + tool schema against every shipped backend and asserts structurally equivalent output.
- services/copilot/src/llmAgent.ts — refactor of claudeAgent.ts:1-207 consuming @uci-demo/llm; original file is renamed in this PR.
- services/copilot/src/service.ts — extracted startCopilot(opts)
- services/world-sim/src/service.ts — extracted startWorldSim(opts)
- packages/uci-bus/src/concurrency.ts — extracted runWithConcurrency
Solver core PR:
- packages/uci-solver/ — package.json, tsconfig.json, src/{index,escfr,regret,bank,blueprint,serialize}.ts, src/subroutines/{identityGate,roeEscalation,softKillFirst,replanEscalation,jammerCounter,fratricideAvoidance,fuelConservation,commsDegradeHedge,index}.ts, test/{escfr,regret,bank,blueprint,kuhn}.test.ts, test/subroutines/*.test.ts
Red + daemon PR:
- services/red-agent/ — package.json, tsconfig.json, src/{main,redLoop,scenarios}.ts, src/policies/{scripted,solverDriven,index}.ts, test/redLoop.test.ts
- services/solver-daemon/ — package.json, tsconfig.json, src/{main,selfPlay,rpc}.ts, test/rpc.test.ts
SolverAgent + eval PR:
- services/copilot/src/solverAgent.ts, services/copilot/src/solverAgent.test.ts
- services/eval-harness/ — package.json, tsconfig.json, src/{main,runner,scoreReplay,scenarios}.ts, regression.config.json, test/runner.test.ts
- .github/workflows/eval.yml
Whitepaper / artifacts PR:
- docs/whitepaper/ — markdown source + figures
- docs/benchmarks/ — exploitability-vs-iterations plot, wall-clock-vs-problem-size plot, tournament results JSON
Edited files
- `services/copilot/src/main.ts`: lines 54-786 body extracted to `service.ts`. Agent selection at lines 58-68 becomes three-way (`USE_SOLVER=1` > `LLM_PROVIDER` set > scripted). Selection no longer keys on `ANTHROPIC_API_KEY` specifically; that env var is just one input to `packages/uci-llm` provider selection.
- `services/copilot/src/claudeAgent.ts` → renamed `services/copilot/src/llmAgent.ts`; body refactored to consume `LanguageModelClient` from `@uci-demo/llm`. No direct `@anthropic-ai/sdk` import remains anywhere in `services/copilot/`.
- `services/copilot/src/worldState.ts`: re-export from `@uci-demo/game/worldMirror`.
- `services/copilot/package.json`: add `@uci-demo/game` and `@uci-demo/solver` workspace deps, plus `@uci-demo/llm`, which `llmAgent.ts` consumes.
- `services/world-sim/src/main.ts`: body extracted to `service.ts`.
- `services/adsb-bridge/src/bridge.ts`: import `runWithConcurrency` from `@uci-demo/bus/concurrency`.
- `package.json` (root): the `pnpm up` script adds `RED` + `SOLVER` to the concurrently list, gated on `USE_SOLVER=1`. The default is off, which keeps existing demo behavior.
- `README.md`: add a Solver Quickstart section pointing at `USE_SOLVER=1 pnpm up`.
- `BUILD.md`: Day 12+ entries.
Existing utilities to reuse (do not duplicate)
- **Agent contract:** `services/copilot/src/types.ts:73-86` (`Agent` interface, `AgentDecision` union, `EvaluationContext`).
- **External-source publishing pattern:** `services/adsb-bridge/src/bridge.ts:120-275`. This is the template for red-agent: emit the EntityNotification + PositionReport + EntityLost lifecycle.
- **World state mirroring:** `services/copilot/src/worldState.ts:4-10` (`XMLParser` instance) and `:49-139` (entity ingestion). Promote to `@uci-demo/game/worldMirror`; both copilot and solver-daemon import from there.
- **Reasoning-line streaming:** `services/copilot/src/main.ts:120-133` (`publishReasoningLine`). SolverAgent populates `decision.rationale[]`, and copilot publishes one line per subroutine via the existing loop at `:447-453`.
- **Codec builders:** `buildEntityNotification`, `buildEntity`, `buildPositionReport`, `buildEntityLost` from `@uci-demo/codec`. The Red agent reuses them and never hand-rolls XML.
- **Bus client:** `connectBus` from `@uci-demo/bus`, used by every new service.
- **Comms-degrade injection:** `publishDegrade()` in `apps/cop-ui/lib/degrade.ts` is callable from the eval harness (it is not UI-coupled).
- **Validator audit:** `http://127.0.0.1:7700/audit?n=N`. The eval harness consumes it for the schema-validity row in `EvalReport`.
  - Note the sampling regime: first 20 per MT + 1-in-10.
  - Extend the validator with a `VALIDATOR_FULL_AUDIT=1` env var for benchmark runs.
- **Scenario YAML schema:** `services/world-sim/src/scenario.ts:1-122`. Extend it with new event types (`runtime_spawn`, `red_inject`) only if a non-bridge Red is needed; the default approach uses the bridge pattern.
- **Playwright extensibility:** `apps/cop-ui/playwright.config.ts` can wrap E2E in a scenario × degrade × agent matrix later. Not Phase 0 critical.
Verification
Algorithmic correctness (CI)
```bash
pnpm -F @uci-demo/solver test
# Must pass: packages/uci-solver/test/kuhn.test.ts
# (ES-MCCFR converges to <0.01 exploitability in 10k iterations on Kuhn poker)
```
This is the gate that says "the kernel is real." Reviewers will look for it.
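For reference, here is a compact external-sampling MCCFR on Kuhn poker. It is a minimal sketch, not the planned `kuhn.test.ts`, and all names in it are hypothetical. How it works:

- It uses string histories and a `Map` instead of the typed-array tables.
- It samples chance and the opponent, while the traverser explores every action.
- It accumulates the average strategy at opponent nodes (the usual "simple" averaging for ES-MCCFR).

The real gate needs exploitability computed with a best response. This sketch skips that and instead checks the average profile's value against Kuhn's known game value of −1/18 for the first player.

```typescript
// Sketch only; all names are hypothetical. Cards 0 = J, 1 = Q, 2 = K.
// History chars: "p" = pass (check/fold), "b" = bet (bet/call).
interface InfoNode {
  regret: Float64Array; // cumulative regret per action [p, b]
  strategySum: Float64Array; // average-strategy accumulator per action
}

const nodes = new Map<string, InfoNode>();
const TERMINAL = new Set(["pp", "bp", "bb", "pbp", "pbb"]);

function getNode(key: string): InfoNode {
  let n = nodes.get(key);
  if (n === undefined) nodes.set(key, (n = { regret: new Float64Array(2), strategySum: new Float64Array(2) }));
  return n;
}

/** Regret matching: normalize positive regrets, uniform if none are positive. */
function regretMatching(n: InfoNode): [number, number] {
  const p = Math.max(0, n.regret[0]);
  const b = Math.max(0, n.regret[1]);
  return p + b > 0 ? [p / (p + b), b / (p + b)] : [0.5, 0.5];
}

function averageStrategy(key: string): [number, number] {
  const n = nodes.get(key);
  const total = n === undefined ? 0 : n.strategySum[0] + n.strategySum[1];
  return n !== undefined && total > 0 ? [n.strategySum[0] / total, n.strategySum[1] / total] : [0.5, 0.5];
}

/** Payoff to player 0 at a terminal history; each player antes 1 and a bet adds 1. */
function payoff0(h: string, cards: number[]): number {
  if (h === "bp") return 1; // player 1 folded to a bet
  if (h === "pbp") return -1; // player 0 folded to a bet
  const stake = h === "pp" ? 1 : 2;
  return cards[0] > cards[1] ? stake : -stake;
}

/** Mulberry32, a small seeded PRNG so runs are reproducible. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** One external-sampling pass; returns the traverser's sampled value at history h. */
function traverse(h: string, cards: number[], traverser: number, rng: () => number): number {
  if (TERMINAL.has(h)) return traverser === 0 ? payoff0(h, cards) : -payoff0(h, cards);
  const player = h.length % 2;
  const n = getNode(`${cards[player]}:${h}`);
  const sigma = regretMatching(n);
  if (player === traverser) {
    // Traverser explores both actions and updates regrets against the node value.
    const u = [traverse(h + "p", cards, traverser, rng), traverse(h + "b", cards, traverser, rng)];
    const nodeValue = sigma[0] * u[0] + sigma[1] * u[1];
    n.regret[0] += u[0] - nodeValue;
    n.regret[1] += u[1] - nodeValue;
    return nodeValue;
  }
  // Opponent: accumulate the average strategy, then sample a single action.
  n.strategySum[0] += sigma[0];
  n.strategySum[1] += sigma[1];
  return traverse(h + (rng() < sigma[0] ? "p" : "b"), cards, traverser, rng);
}

function train(iterations: number, seed: number): void {
  const rng = mulberry32(seed);
  for (let t = 0; t < iterations; t++) {
    for (const traverser of [0, 1]) {
      const c0 = Math.floor(rng() * 3); // chance is sampled: deal two distinct cards
      const c1 = (c0 + 1 + Math.floor(rng() * 2)) % 3;
      traverse("", [c0, c1], traverser, rng);
    }
  }
}

/** Exact expected value for player 0 when both players follow their average strategies. */
function value0(h: string, cards: number[]): number {
  if (TERMINAL.has(h)) return payoff0(h, cards);
  const s = averageStrategy(`${cards[h.length % 2]}:${h}`);
  return s[0] * value0(h + "p", cards) + s[1] * value0(h + "b", cards);
}

function gameValue(): number {
  let total = 0;
  for (const [c0, c1] of [[0, 1], [0, 2], [1, 0], [1, 2], [2, 0], [2, 1]]) total += value0("", [c0, c1]) / 6;
  return total;
}
```

A real `kuhn.test.ts` would add a best-response exploitability computation on top of `averageStrategy` to enforce the <0.01 threshold.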
End-to-end smoke
```bash
# 1. Baseline matrix.
pnpm -F @uci-demo/eval run bench -- \
  --scenarios tripwire,vanguard,stillwater \
  --agents scripted \
  --degrade none,light,heavy \
  --episodes-per-cell 20 \
  --report out/eval/baseline-scripted.json

# 2. Run solver-daemon to convergence (~10 min wall-clock on tactical).
USE_SOLVER=1 pnpm -F @uci-demo/solver-daemon start &
until [ "$(mosquitto_sub -t 'uci-demo/solver/status' -C 1 | jq '.exploitability < 0.05')" = "true" ]; do sleep 30; done

# 3. Solver-agent matrix.
pnpm -F @uci-demo/eval run bench -- \
  --scenarios tripwire,vanguard,stillwater \
  --agents solver \
  --degrade none,light,heavy \
  --episodes-per-cell 20 \
  --report out/eval/solver.json

# 4. Compare. Pass condition: median ΔU_B >= +0.5, Welch's t p < 0.05.
pnpm -F @uci-demo/eval run compare \
  --baseline out/eval/baseline-scripted.json \
  --candidate out/eval/solver.json \
  --metric blueUtility
```
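The statistics behind step 4 could be sketched as follows. The sketch makes three assumptions:

- "Median ΔU_B" means median(candidate) − median(baseline) over episode utilities.
- The test is a two-sided Welch's t-test with Welch–Satterthwaite degrees of freedom.
- The p-value is the regularized incomplete beta function, evaluated with the Numerical Recipes continued fraction and the Lanczos ln Γ series.

`compare`, `welch`, and the helpers are hypothetical names, not the eval package's API.

```typescript
// Sketch only: compare(), welch(), and the helpers are hypothetical names.
/** ln Γ(x) for x > 0 via the Lanczos series used in Numerical Recipes (gammln). */
function lnGamma(x: number): number {
  const cof = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = x;
  let tmp = x + 5.5;
  tmp -= (x + 0.5) * Math.log(tmp);
  let ser = 1.000000000190015;
  for (const c of cof) ser += c / ++y;
  return -tmp + Math.log((2.5066282746310005 * ser) / x);
}

/** Continued fraction for the incomplete beta function (modified Lentz, as in NR betacf). */
function betaContinuedFraction(a: number, b: number, x: number): number {
  const clamp = (v: number): number => (Math.abs(v) < 1e-300 ? 1e-300 : v);
  const qab = a + b, qap = a + 1, qam = a - 1;
  let c = 1;
  let d = 1 / clamp(1 - (qab * x) / qap);
  let h = d;
  for (let m = 1; m <= 200; m++) {
    const m2 = 2 * m;
    let aa = (m * (b - m) * x) / ((qam + m2) * (a + m2));
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    h *= d * c;
    aa = (-(a + m) * (qab + m) * x) / ((a + m2) * (qap + m2));
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    const del = d * c;
    h *= del;
    if (Math.abs(del - 1) < 3e-14) break;
  }
  return h;
}

/** Regularized incomplete beta I_x(a, b). */
function incompleteBeta(a: number, b: number, x: number): number {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const bt = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b) + a * Math.log(x) + b * Math.log(1 - x));
  return x < (a + 1) / (a + b + 2)
    ? (bt * betaContinuedFraction(a, b, x)) / a
    : 1 - (bt * betaContinuedFraction(b, a, 1 - x)) / b;
}

function meanVar(xs: number[]): [number, number] {
  const mean = xs.reduce((s, x) => s + x, 0) / xs.length;
  const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / (xs.length - 1);
  return [mean, variance];
}

/** Two-sided Welch's t-test of candidate vs baseline; p = I_{df/(df+t²)}(df/2, 1/2). */
function welch(baseline: number[], candidate: number[]): { t: number; df: number; p: number } {
  const [ma, sa] = meanVar(baseline);
  const [mb, sb] = meanVar(candidate);
  const va = sa / baseline.length;
  const vb = sb / candidate.length;
  const t = (mb - ma) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (baseline.length - 1) + vb ** 2 / (candidate.length - 1));
  return { t, df, p: incompleteBeta(df / 2, 0.5, df / (df + t * t)) };
}

function median(xs: number[]): number {
  const s = [...xs].sort((x, y) => x - y);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

/** Step 4 pass condition: median ΔU_B >= +0.5 and Welch p < 0.05. */
function compare(baseline: number[], candidate: number[]) {
  const delta = median(candidate) - median(baseline);
  const { t, df, p } = welch(baseline, candidate);
  return { delta, t, df, p, pass: delta >= 0.5 && p < 0.05 };
}
```

Suppose the harness reuses the same seeds across agents. Then a paired test on per-seed differences would be a stronger alternative to Welch's unpaired test.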
Manual demo verification
- Run `USE_SOLVER=1 RED_AGENT=solver-driven pnpm up`, then check the browser:
  - Proposals appear within ~3 s of each Red contact.
  - The reasoning panel shows subroutine-weighted explanations (e.g., "ReplanEscalation 0.72: soft-kill failed twice; escalating to kinetic").
  - The validator audit feed at `http://127.0.0.1:7700/audit?n=20` shows 100% valid.
- Run `mosquitto_pub -t uci-demo/world/degrade -m '{"dropPercent":80,"latencyMs":500,"durationMs":20000}'`. Within one self-play epoch (visible on the retained `uci-demo/solver/status` heartbeat), the Blue blueprint shifts mass toward `CommsDegradeHedge`, and the reasoning trace reflects the shift.
- Force a HAWK-2 fuel exhaustion. The `FuelConservation` subroutine should gain weight on the next evaluation, and the copilot recommends a GUARDIAN-3 handoff in the reasoning rail.
- Click MODIFY on an active proposal. The copilot publishes the UPDATE trio and the reasoning rail logs `OPERATOR // MODIFY`. Solver-daemon's worldMirror records the operator action as a public observation, which the next info-set's belief reflects.
Plot/artifact verification (proposal deliverables)
- `docs/benchmarks/exploitability-vs-iterations.png` exists and shows a decreasing exploitability curve on Kuhn (and on the Tripwire abstraction).
- `docs/benchmarks/wallclock-vs-problemsize.{png,json}` exists with 3 data points (3/10/30 effectors).
- `docs/benchmarks/sme-tournament.json` exists with n≥15 logged games, an aggregate Blue-utility result, and a per-SME breakdown.
- `docs/whitepaper/main.md` is ≥15 pages and covers model, algorithm, abstraction, results, scaling, integration, and transition.
- `docs/video/solver-vs-scripted-tripwire.mp4` exists, runs 3–5 min, and shows solver and scripted side-by-side on an identical scenario seed.
Compliance gate against RFP attributes
Before submission, every row in this table must be answerable with a specific repo artifact:
| RFP attribute | Required artifact | Source |
|---|---|---|
| Dominant Performance | SME tournament JSON (n≥15) + scripted-baseline delta with CI95 | docs/benchmarks/sme-tournament.json + out/eval/*.json |
| Human-Interpretability | 3 worked case studies showing regret decomposition + mixed strategy + outcome | docs/whitepaper/case-studies/ |
| Scalability | Wall-clock-vs-problem-size plot, 3 data points, log-log fit | docs/benchmarks/wallclock-vs-problemsize.{png,json} |
| Computational Efficiency | Single-workstation benchmark, no GPU, named CPU spec | docs/benchmarks/host.json + reproduce instructions |
| Anytime | Exploitability-vs-iterations plot + retained uci-demo/solver/status heartbeat in live demo | docs/benchmarks/exploitability-vs-iterations.png + video |