
Plan — Game-Theoretic COA Engine on uci-demo for SBIR OSW26BZ02-DV004 (D2P2)

Context

SBIR topic OSW26BZ02-DV004 (SCO, Direct-to-Phase-II) asks for a mature, scalable, robust game-theoretic AI that computes approximate Nash equilibria in imperfect-information multi-domain wargames; beats expert red teams; stays human-interpretable; scales tactical → operational; runs CPU-only; is anytime. The "non-responsive" floor for the proposal is a working prototype with quantified performance against expert humans or recognized AI benchmarks — narrative does not clear the bar.

uci-demo today (post v1.3.0 main, 21/596 UCI v2.5 MTs) is a strong demo substrate but is not the engine the RFP wants. The Agent interface in services/copilot/src/types.ts:73-86 and the modular scriptedAgent.ts / claudeAgent.ts slot exactly fit the "modular doctrinal subroutines, not monolithic NN" attribute. Comms-degrade injection (apps/cop-ui/lib/degrade.ts + services/world-sim/src/sim.ts:137-218) and replay (apps/cop-ui/lib/{messageBuffer,replay}.ts) partially answer Phase II V&V. Validator audit (services/validator/) and CI gate exist. But there is no solver, no Red agent, no utility function, and no headless eval harness anywhere in the repo, so every scoring story is net-new code.

A reviewer audit of the prior chat plan found three structural weaknesses: (1) the 4–6-week pre-proposal sprint is too short for a credible D2P2 prior-art package — realistic is 12 weeks; (2) benchmarking only against scriptedAgent does not address the "paramount evaluation criterion" of defeating experienced human red teams; (3) AFSIM access realistically takes 6–12 months post-award (ITAR/JCP), so M&S transition must lead with Command: Professional Edition. This plan re-scopes Phase 0 accordingly.

The intended outcome: a 12-week pre-proposal sprint that produces a CPU-only ES-MCCFR + Public Belief State solver, a programmatic Red agent, a headless eval harness, a 4–6 SME human micro-tournament, and the white-paper / plot pair / video that make the D2P2 proposal responsive on every required attribute. The architecture choices preserve the demo's existing "smart-but-readable" code character: pure TypeScript, single MQTT transport, single XML parser, no GPU dependency, no neural-network black boxes.

Standards posture (UCI v2.5 + OMS v2.5)

The demo substrate is already shaped like an OMS v2.5 Mission Package (released 2026-01-22, governed by the OACWG): Mosquitto is the Abstract Service Bus, packages/uci-bus + packages/uci-codec are the de facto Critical Abstraction Layer, services/* are OMS Services, and services/adsb-bridge is shaped like an OMS Isolator. Phase 0 does not add OMS conformance work — that's a Phase II Year-1 deliverable in the companion plan — but the proposal text names OMS v2.5 as the framing standard so reviewers see the system positioned for an OMS-compliant Phase II without overpromising. The pre-proposal sprint stays focused on the solver, the Red agent, and the SME tournament.


Architecture (one paragraph)

Five new workspace members. Two library packages that build to dist/ like @uci-demo/bus and @uci-demo/codec: @uci-demo/game (pure domain types: GameState, InformationSet, PublicBeliefState, Payoff, GameDynamics; no I/O) and @uci-demo/solver (ES-MCCFR core, Float32Array-backed regret tables, StrategyBank of modular subroutines, AnytimeBlueprint query API, Kuhn-poker correctness test). Three services: services/solver-daemon/ (long-lived self-play, owns the only RegretTable instance, answers info-set queries over MQTT side-channel uci-demo/solver/query/+/+ with a uci-demo/solver/reply/<requestId> response and uci-demo/solver/status retained heartbeat), services/red-agent/ (programmatic adversary publishing EntityNotificationMT + PositionReportMT exactly like services/adsb-bridge/src/bridge.ts:120-275, with zero changes to world-sim or scenario.ts), and services/eval-harness/ (headless N scenarios × M agents × K degrade presets runner emitting a versioned EvalReport JSON). The third agent implementation, SolverAgent, is not a new package: it lives at services/copilot/src/solverAgent.ts alongside the existing scripted/claude impls, queries the daemon over MQTT, and returns an AgentDecision; the copilot's existing orchestration (services/copilot/src/main.ts:88-109) publishes everything to the wire. Solver core is pure TypeScript on tsx with typed-array regret tables; Rust/N-API is deferred behind a RegretTable interface escape hatch, invoked only if the operational benchmark workflow demands it.

Language & runtime constraints

  • TypeScript only (no Rust core in Phase 0). V8 typed arrays sustain ~10⁷–10⁸ regret updates/sec/core, covering tactical scale.
  • One MQTT client per service via @uci-demo/bus connectBus. Solver query/reply rides MQTT, not gRPC.
  • One XML parser: solver-daemon reuses the XMLParser pattern from services/copilot/src/worldState.ts:4-10. Red-agent only builds XML via codec, never parses.
  • ESM + verbatimModuleSyntax + .js import extensions for all new TS code (matches tsconfig.base.json).
  • No schema/UCI_v2_5/ changes. Every new MT use already has a builder in packages/uci-codec/src/builders/.
  • No @uci-demo/codec top-level import from apps/cop-ui/ — if cop-ui ever needs solver visualization, use @uci-demo/codec/browser.
  • LLM rationalization layer is fully model-agnostic; Claude stays first-class. New packages/uci-llm/ defines a LanguageModelClient interface with structured tool-use, completion, streaming, prompt caching (where supported), and arbitrary sampling-param overrides. The operator can use any backend they want — every client is a single ≤200-LOC adapter under packages/uci-llm/src/clients/<name>.ts implementing the same interface. Headline set shipped on day 1: anthropic (Claude — the preferred default wherever reachable, prompt-caching on, current demo behavior preserved), ollama (local models — preferred default in air-gapped deployments), bedrock (Claude / Nova / Llama / Mistral on AWS, including GovCloud), and openai-compat (one client covers OpenAI, Azure OpenAI, Together, Groq, Fireworks, vLLM with OpenAI-compatible endpoint, llama.cpp server, and any OpenAI-shaped HTTP service). Provider chosen via env: LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL, LLM_API_KEY. Default selection: LLM_PROVIDER if set, else anthropic when ANTHROPIC_API_KEY is present (current demo behavior), else ollama. Capability flags (supportsToolUse, supportsPromptCache, supportsStreaming, supportsGrammar) declared per client; where a backend lacks native tool-use the interface layer transparently falls back to JSON-mode + grammar or instruction-prompted structured output, so the same Agent contract works against any backend. A registerClient(name, factory) registry lets downstream consumers add custom backends without forking packages/uci-llm/. The existing services/copilot/src/claudeAgent.ts is renamed llmAgent.ts in Phase 0 week 1-2 and refactored to consume the abstraction — the system prompt, tool schemas, prompt-cache strategy, and tool-use semantics are preserved verbatim; only the SDK call site moves behind the interface. No service may import any vendor SDK directly outside packages/uci-llm/src/clients/. 
A shared structured-output parity test (packages/uci-llm/test/parity.test.ts) runs the same prompt + tool-schema against every shipped backend so swapping providers never silently changes agent behavior. Optional narrate?: (trace) => Promise<string[]> on SolverAgent accepts a LanguageModelClient and natural-languages a regret decomposition; the action is always the solver's.
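The default-selection rule and the registerClient registry described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the packages/uci-llm/ implementation: the LanguageModelClient shape is reduced to a single complete() method, and the Env type, ClientFactory type, createClient helper, and the "echo" stub backend are hypothetical names introduced only for the example.

```typescript
// Hypothetical, reduced shapes: the real interface also covers tool-use,
// streaming, prompt caching, and sampling overrides.
type Env = Record<string, string | undefined>;

interface LanguageModelClient {
  readonly name: string;
  readonly supportsToolUse: boolean;
  complete(prompt: string): Promise<string>;
}

type ClientFactory = (env: Env) => LanguageModelClient;

const registry = new Map<string, ClientFactory>();

// Downstream consumers add backends here instead of forking the package.
function registerClient(name: string, factory: ClientFactory): void {
  registry.set(name, factory);
}

// Selection rule from the text: LLM_PROVIDER if set, else anthropic when
// ANTHROPIC_API_KEY is present, else ollama.
function selectProvider(env: Env): string {
  if (env.LLM_PROVIDER) return env.LLM_PROVIDER;
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  return "ollama";
}

function createClient(env: Env): LanguageModelClient {
  const name = selectProvider(env);
  const factory = registry.get(name);
  if (!factory) throw new Error(`no LLM client registered for "${name}"`);
  return factory(env);
}

// Stub backend standing in for a real adapter (assumption for the example).
registerClient("echo", () => ({
  name: "echo",
  supportsToolUse: false,
  complete: async (prompt: string) => prompt,
}));
```

Keeping selectProvider a pure function of the environment makes the three-way precedence unit-testable without touching any vendor SDK.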

Utility function (packages/uci-game/src/payoff.ts)

Zero-sum: Red's payoff is -U_B. Version-tagged PAYOFF_V = 1 so weight changes don't silently invalidate historic runs. Computed online by solver-daemon's worldMirror.ts and offline by services/eval-harness/src/scoreReplay.ts.

U_B = + 1.0  · neutralized_hostiles               // EntityLostMT for HOSTILE/SUSPECT trackIds
      - 5.0  · fratricide_events                   // EntityLostMT for FRIEND inside any CapabilityCoverageAreaMT polygon
      - 0.2  · roe_violations                      // proposals violating uci-demo/world/roe band
      - 0.05 · fuel_fraction_burned_total          // integrated SubsystemStatusMT.state bands
      - 0.3  · failed_effects                      // EffectStatusMT.state = FAILED
      - 0.001· comms_degrade_seconds               // integral over uci-demo/world/degrade window
      - 0.002· mean_time_to_decision_ms / 1000     // copilot evaluate() wall time, capped 5s

Weights are constants, not learned: they are the surface through which the operator expresses doctrinal preferences.
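The payoff above translates directly into a pure function. The sketch below is illustrative only: the EpisodeCounters field names are hypothetical stand-ins for whatever worldMirror.ts and scoreReplay.ts actually accumulate, and it assumes the "capped 5s" note means decision time is clamped to 5000 ms before weighting.

```typescript
// Version tag travels with every score so weight changes are detectable.
const PAYOFF_V = 1;

// Hypothetical per-episode counters (names are assumptions).
interface EpisodeCounters {
  neutralizedHostiles: number;
  fratricideEvents: number;
  roeViolations: number;
  fuelFractionBurnedTotal: number;
  failedEffects: number;
  commsDegradeSeconds: number;
  meanTimeToDecisionMs: number;
}

function blueUtility(c: EpisodeCounters): number {
  // Assumed reading of "capped 5s": clamp before converting to seconds.
  const decisionSeconds = Math.min(c.meanTimeToDecisionMs, 5000) / 1000;
  return (
    1.0 * c.neutralizedHostiles -
    5.0 * c.fratricideEvents -
    0.2 * c.roeViolations -
    0.05 * c.fuelFractionBurnedTotal -
    0.3 * c.failedEffects -
    0.001 * c.commsDegradeSeconds -
    0.002 * decisionSeconds
  );
}

// Zero-sum: Red's payoff is the negation of Blue's.
function redUtility(c: EpisodeCounters): number {
  return -blueUtility(c);
}

function scoreEpisode(c: EpisodeCounters): { payoffVersion: number; blue: number; red: number } {
  return { payoffVersion: PAYOFF_V, blue: blueUtility(c), red: redUtility(c) };
}
```

For example, two neutralized hostiles, one fratricide event, one ROE violation, half a tank burned, one failed effect, 100 s of degraded comms, and a 2 s mean decision time score 2 - 5 - 0.2 - 0.025 - 0.3 - 0.1 - 0.004 = -3.629 for Blue, which shows how heavily a single fratricide dominates the other terms.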

Information-set / PBS factoring

| Wire field | Visibility | Goes into |
| --- | --- | --- |
| PositionReportMT.{lat,lng,alt} | public | belief.publicPositions |
| EntityNotificationMT.Severity + EntityMT.Identity.Platform.ThreatType/Confidence | public (noisy) | drives Bayesian update of identityBelief |
| true Identity enum | hidden from Blue | GameState.hidden.trueIdentity |
| SubsystemStatusMT.state band | public | belief.fuelBelief |
| exact fuel fraction | hidden from Red | GameState.hidden.trueFuel |
| uci-demo/world/roe (retained) | public | belief.roe |
| uci-demo/world/degrade | public | belief.commsDegrade |
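The "drives Bayesian update of identityBelief" entry can be illustrated with a plain discrete Bayes step. This is a sketch under assumptions: the four-value identity set (NEUTRAL in particular), the likelihood numbers, and the function names are hypothetical; the real observation model would be derived from Severity and ThreatType/Confidence.

```typescript
// Assumed identity hypotheses; the document names HOSTILE, SUSPECT and FRIEND.
const IDENTITIES = ["HOSTILE", "SUSPECT", "NEUTRAL", "FRIEND"] as const;
type Identity = (typeof IDENTITIES)[number];
type Belief = Record<Identity, number>;

// posterior(id) ∝ prior(id) · P(observation | id)
function bayesUpdate(prior: Belief, likelihood: Belief): Belief {
  const post: Belief = { ...prior };
  let z = 0;
  for (const id of IDENTITIES) {
    post[id] = prior[id] * likelihood[id];
    z += post[id];
  }
  // Observation impossible under every hypothesis: keep the prior unchanged.
  if (z === 0) return { ...prior };
  for (const id of IDENTITIES) post[id] /= z;
  return post;
}

// Decile bucket used when the belief is folded into an info-set key.
function identityDecile(p: number): number {
  return Math.min(9, Math.max(0, Math.floor(p * 10)));
}

const uniform: Belief = { HOSTILE: 0.25, SUSPECT: 0.25, NEUTRAL: 0.25, FRIEND: 0.25 };
// Hypothetical likelihood of a high-severity notification under each identity.
const highSeverity: Belief = { HOSTILE: 0.8, SUSPECT: 0.4, NEUTRAL: 0.2, FRIEND: 0.2 };
const posterior = bayesUpdate(uniform, highSeverity);
```

Starting from a uniform prior, the hypothetical high-severity observation moves P(HOSTILE) from 0.25 to 0.5, because 0.25·0.8 = 0.2 out of a total evidence mass of 0.4.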

InfoSet key = FNV-1a-64 over canonical encoding of bucketed (roe, commsBucket, ∀trackId: identityBelief decile + threatType bucket, ∀effectorId: fuel band, recent_actions[last 8]). Bucketing holds tactical info-set count near 10³.
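A minimal sketch of the key derivation follows. The FNV-1a-64 constants are the standard offset basis and prime; everything else is an assumption made for illustration: the BucketedInfo shape, the "|"-and-","-delimited canonical encoding, and the choice to sort tracks and effectors by id so the key does not depend on arrival order.

```typescript
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;

// FNV-1a: XOR each byte into the hash, then multiply by the prime (mod 2^64).
function fnv1a64(bytes: Uint8Array): bigint {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]);
    hash = (hash * FNV_PRIME) & MASK_64;
  }
  return hash;
}

// Hypothetical bucketed view of an information set.
interface BucketedInfo {
  roe: string;
  commsBucket: number;
  tracks: { trackId: string; identityDecile: number; threatBucket: string }[];
  effectors: { effectorId: string; fuelBand: string }[];
  recentActions: string[];
}

const byId = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Canonical encoding: fixed field order, id-sorted collections, last 8 actions.
function canonicalEncode(info: BucketedInfo): string {
  const tracks = [...info.tracks]
    .sort((a, b) => byId(a.trackId, b.trackId))
    .map((t) => `${t.trackId}:${t.identityDecile}:${t.threatBucket}`);
  const effectors = [...info.effectors]
    .sort((a, b) => byId(a.effectorId, b.effectorId))
    .map((e) => `${e.effectorId}:${e.fuelBand}`);
  const recent = info.recentActions.slice(-8);
  return [info.roe, String(info.commsBucket), tracks.join(","), effectors.join(","), recent.join(",")].join("|");
}

function infoSetKey(info: BucketedInfo): string {
  const bytes = new TextEncoder().encode(canonicalEncode(info));
  return fnv1a64(bytes).toString(16).padStart(16, "0");
}
```

BigInt arithmetic keeps the sketch short and exact; a hot-path version would more likely split the hash into two 32-bit halves to avoid BigInt allocation.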

Modular doctrinal subroutines (packages/uci-solver/src/subroutines/)

The regret table is keyed over subroutine IDs, not raw actions. This is the architectural answer to RFP attribute #2 (interpretability) — nothing is a neural blob, every component is reviewable TS.

Seeded from scriptedAgent.ts:24-66's existing factoring:

| Subroutine | Doctrine |
| --- | --- |
| IdentityGate | withhold when P(FRIEND) > 0.4 |
| RoeEscalation | ROE RED ⇒ kinetic-first |
| SoftKillFirst | AMBER + low PID confidence ⇒ EW-first |
| ReplanEscalation | soft-kill failed ⇒ kinetic |
| JammerCounter | threatType==="JAMMER" ⇒ skip EW (mirrors scriptedAgent.ts:82-85) |
| FratricideAvoidance | withhold if FRIEND inside CapabilityCoverageAreaMT polygon |
| FuelConservation | degrade effector preference when fuel band CRITICAL |
| CommsDegradeHedge | high belief.commsDegrade ⇒ prefer autonomous-capable effector |

StrategyBank.composedPolicy softmax-weights each subroutine's distribution by regret, mixes, renormalizes. bank.explain(info, regrets) emits one SubroutineTrace per active subroutine — these become the decision.rationale[] strings the copilot publishes to uci-demo/copilot/reason/<planId> via existing publishReasoningLine (services/copilot/src/main.ts:120-133).
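One way to read "softmax-weights each subroutine's distribution by regret, mixes, renormalizes" is sketched below. This is an interpretation, not the StrategyBank code: the SubroutineVote shape, the action labels, and the explain output format are assumptions, and the softmax is taken directly over each subroutine's scalar regret.

```typescript
type ActionDist = Record<string, number>;

// Hypothetical per-subroutine input: a scalar regret and an action distribution.
interface SubroutineVote {
  id: string;
  regret: number;
  dist: ActionDist;
}

// Numerically stable softmax (subtract the max before exponentiating).
function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

function composedPolicy(votes: SubroutineVote[]): ActionDist {
  if (votes.length === 0) throw new Error("no active subroutines");
  const weights = softmax(votes.map((v) => v.regret));
  const mix: ActionDist = {};
  votes.forEach((v, i) => {
    for (const [action, p] of Object.entries(v.dist)) {
      mix[action] = (mix[action] ?? 0) + weights[i] * p;
    }
  });
  // Renormalize in case an individual distribution did not sum to 1.
  const z = Object.values(mix).reduce((a, b) => a + b, 0);
  for (const action of Object.keys(mix)) mix[action] /= z;
  return mix;
}

// One human-readable line per active subroutine, e.g. "ReplanEscalation 0.75".
function explain(votes: SubroutineVote[]): string[] {
  const weights = softmax(votes.map((v) => v.regret));
  return votes.map((v, i) => `${v.id} ${weights[i].toFixed(2)}`);
}
```

With two active subroutines whose regrets differ by ln 3, the weights come out 0.75 and 0.25, so a subroutine that recommends only a kinetic response at the higher regret contributes 75% of the mixed policy's mass.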

Anytime semantics

Blueprint-in-daemon model. Solver-daemon runs ES-MCCFR continuously; blueprint exploitability trends downward with daemon uptime (sampled CFR converges in expectation, so the measured curve is noisy rather than strictly monotone). SolverAgent.evaluate() is an O(1) info-set lookup plus a ~50ms MQTT RPC, comfortably inside the 5s budget at services/copilot/src/types.ts:73-86. The retained uci-demo/solver/status heartbeat carries {iterations, exploitability, infoSetCount}; that retained message is the anytime guarantee. Cold-start falls back to scriptedAgent and tags the rationale with "solver-blueprint cold; using scripted fallback".
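The cold-start rule can be sketched as a small synchronous function. The names here are assumptions: the Decision shape is a reduced stand-in for AgentDecision, the in-memory Map stands in for the MQTT round trip to the daemon, and MIN_ITERATIONS is a hypothetical warm-up threshold that the plan does not specify.

```typescript
// Payload of the retained uci-demo/solver/status heartbeat.
interface SolverStatus {
  iterations: number;
  exploitability: number;
  infoSetCount: number;
}

// Reduced stand-in for AgentDecision (assumption).
interface Decision {
  action: string;
  rationale: string[];
}

const COLD_TAG = "solver-blueprint cold; using scripted fallback";
const MIN_ITERATIONS = 1000; // hypothetical threshold, not from the plan

function evaluateAnytime(
  infoSetKey: string,
  blueprint: ReadonlyMap<string, Decision>,
  status: SolverStatus | undefined,
  scripted: () => Decision,
): Decision {
  // No heartbeat yet, or too few iterations: treat the blueprint as cold.
  const warm = status !== undefined && status.iterations >= MIN_ITERATIONS;
  const hit = warm ? blueprint.get(infoSetKey) : undefined;
  if (hit !== undefined) return hit;
  // Fall back to the scripted agent and make the fallback visible to the operator.
  const fallback = scripted();
  return { action: fallback.action, rationale: [COLD_TAG, ...fallback.rationale] };
}
```

An info set the blueprint has never visited takes the same fallback path as a cold daemon, so the operator always receives a decision within the evaluation budget.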

Red agent

Identical pattern to services/adsb-bridge/src/bridge.ts:120-275. Publishes EntityNotificationMT, EntityMT, PositionReportMT, EntityLostMT with a distinct senderSystemId and RED- prefixed topic ids. No changes to services/world-sim/src/sim.ts or services/world-sim/src/scenario.ts. Two policy backends: scripted (heuristic baseline for Phase 0 ladder) and solver-driven (queries solver-daemon for Red-side policy via the same RPC surface).

Headless eval harness

CLI: tsx services/eval-harness/src/main.ts --scenarios ... --agents ... --degrade ... --episodes-per-cell N --report out/eval/<ts>.json. Boots Mosquitto via existing docker-compose.yml, then imports extracted startWorldSim() and startCopilot() functions in-process — no cop-ui. Deterministic seeds for scenario, Red, MCCFR. Emits EvalReport JSON + per-episode NDJSON timeline + bus log to out/eval/<runId>/<scenario>-<agent>-<degrade>-<idx>/.
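The scenario × agent × degrade × episode matrix and its deterministic seeds might be enumerated as below. This is a sketch with assumed names (Cell, enumerateCells, baseSeed). It also assumes the per-episode seed deliberately excludes the agent name, so that every agent faces the same scenario realization for a given cell index; a 32-bit FNV-1a string hash is used here only as a convenient seed derivation.

```typescript
interface Cell {
  scenario: string;
  agent: string;
  degrade: string;
  idx: number;
  seed: number;
  outDir: string;
}

// 32-bit FNV-1a over UTF-16 code units; adequate for deriving seeds from ASCII labels.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function enumerateCells(
  runId: string,
  scenarios: string[],
  agents: string[],
  degrades: string[],
  episodesPerCell: number,
  baseSeed: number,
): Cell[] {
  const cells: Cell[] = [];
  for (const scenario of scenarios) {
    for (const agent of agents) {
      for (const degrade of degrades) {
        for (let idx = 0; idx < episodesPerCell; idx++) {
          // Agent is intentionally left out of the seed (paired comparison).
          const seed = fnv1a32(`${baseSeed}/${scenario}/${degrade}/${idx}`);
          const outDir = `out/eval/${runId}/${scenario}-${agent}-${degrade}-${idx}`;
          cells.push({ scenario, agent, degrade, idx, seed, outDir });
        }
      }
    }
  }
  return cells;
}

const cells = enumerateCells(
  "demo-run",
  ["tripwire", "vanguard", "stillwater"],
  ["scripted", "solver"],
  ["none", "light", "heavy"],
  20,
  42,
);
```

With three scenarios, two agents, three degrade presets, and 20 episodes per cell, the run contains 3 × 2 × 3 × 20 = 360 episodes, each with a directory name that matches the layout given above.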

Phase 0 — 12-week sprint (not 4-6)

The 4–6-week estimate in the prior chat plan was unrealistic. Re-scoped against the actual D2P2 evidence list:

| Week | Workstream |
| --- | --- |
| 1-2 | Refactor for extractability. Extract startCopilot() / startWorldSim(); promote worldState.ts to @uci-demo/game/worldMirror; extract runWithConcurrency from adsb-bridge/bridge.ts:89-108; introduce packages/uci-llm/ (LanguageModelClient + AnthropicClient + OllamaClient) and refactor claudeAgent.ts → llmAgent.ts behind it. Every refactor PR runs an end-to-end smoke test: pnpm up, validator audit at http://127.0.0.1:7700/audit?n=50 is 100% valid, and approval card / MODIFY round-trip / comms-degrade injection / replay reconstruction all still work. Typecheck + unit tests alone do not gate refactor merges. PR-mergeable to main independently. |
| 2-4 | @uci-demo/game types + dynamics + PBS belief update + payoff. Vitest suite for belief Bayesian update + payoff math. |
| 3-6 | @uci-demo/solver ES-MCCFR + regret/strategy tables + strategy bank + blueprint. Kuhn poker correctness test at packages/uci-solver/test/kuhn.test.ts: converges to <0.01 exploitability in 10k iterations. Without this no reviewer trusts the kernel. |
| 5-7 | services/red-agent/ + services/solver-daemon/. Subprocess gated on USE_SOLVER=1 so default pnpm up is unchanged. |
| 6-8 | services/copilot/src/solverAgent.ts + three-way agent selection at services/copilot/src/main.ts:58-68. |
| 7-9 | services/eval-harness/ + .github/workflows/eval.yml. Tactical bench runs on every PR labeled solver-perf; operational bench nightly on self-hosted. |
| 8-10 | Scaling study. 3-effector Tripwire → 10/30 synthetic → 30/100 synthetic. Plot wall-clock-to-target-exploitability vs problem size (3 data points = credible log-log fit). |
| 9-11 | Human SME micro-tournament. 4–6 retired O-3/O-4 with C-UAS or air-ops backgrounds, 3 games each vs solver on Tripwire/Vanguard, giving 12–18 games with logged decisions; recruit at least 5 SMEs to meet the n≥15 floor used in the artifact and compliance checks. Budget ~$25k in honoraria. Highest-leverage single line item in the entire sprint. |
| 10-12 | White paper (≥15 pages: model, algorithm, abstraction, exploitability plot, scaling plot, tournament results, integration story, transition path) + demo video (3–5 min, solver-vs-scripted side-by-side on same scenario seed) + interpretability case studies (3 worked decisions with regret decomposition + mixed strategy + outcome). |

Phase II Year 1 (post-award)

Primary M&S integration: Command: Professional Edition (~$3–10k/seat, commercially licensable, Lua API, achievable on SBIR budget). New services/cpe-bridge/ mirrors the adsb-bridge pattern.

Secondary, stretch: AFSIM via JCP/DD2345 sponsorship — apply for JCP now, before proposal submission, since realistic access timeline is 6–12 months. Fallback if government access slips a quarter: hardened services/world-sim/ as the V&V environment.

Multi-domain expansion of services/world-sim/src/scenario.ts schema (sea + land asset kinds; corresponding MT builders). Scaling work via PBS subgame re-solving (the technique family used by DeepStack and ReBeL; Pluribus uses a related depth-limited real-time search). Containerization: Dockerfile per service + docker-compose.prod.yml. TDP: architecture doc + interpretability case-book + V&V report.

Phase III (commercialization)

Cut the breadth claim in the prior chat plan ("cyber, supply chain, market-making"). Cyber and supply chain are not naturally two-player zero-sum; the claim signals the team has not thought through the framing limits. Replace with one defensible vertical: Counter-UAS in joint coalition contexts (or naval surface engagement planning). One grounded paragraph beats three handwaves.


Critical files

New files (exhaustive)

Phase 0 refactor PR (independent of solver work):

  • packages/uci-game/package.json, tsconfig.json, src/{index,types,dynamics,belief,actions,hash,payoff,worldMirror,report}.ts, test/{dynamics,belief,payoff,hash}.test.ts, fixtures/assets.json
  • packages/uci-llm/package.json, tsconfig.json, src/{index,types,registry,toolUse,fallbackStructuredOutput,select}.ts, src/clients/{anthropic,ollama,bedrock,openai-compat}.ts, test/{registry,parity,toolUse,fallback}.test.ts. LanguageModelClient interface + 4 shipped clients + registerClient(name, factory) registry for user-defined backends. Parity test runs the same prompt + tool schema against every shipped backend and asserts structurally equivalent output.
  • services/copilot/src/llmAgent.ts: refactor of claudeAgent.ts:1-207 consuming @uci-demo/llm; original file is renamed in this PR.
  • services/copilot/src/service.ts: extracted startCopilot(opts)
  • services/world-sim/src/service.ts: extracted startWorldSim(opts)
  • packages/uci-bus/src/concurrency.ts: extracted runWithConcurrency

Solver core PR:

  • packages/uci-solver/package.json, tsconfig.json, src/{index,escfr,regret,bank,blueprint,serialize}.ts, src/subroutines/{identityGate,roeEscalation,softKillFirst,replanEscalation,jammerCounter,fratricideAvoidance,fuelConservation,commsDegradeHedge,index}.ts, test/{escfr,regret,bank,blueprint,kuhn}.test.ts, test/subroutines/*.test.ts

Red + daemon PR:

  • services/red-agent/package.json, tsconfig.json, src/{main,redLoop,scenarios}.ts, src/policies/{scripted,solverDriven,index}.ts, test/redLoop.test.ts
  • services/solver-daemon/package.json, tsconfig.json, src/{main,selfPlay,rpc}.ts, test/rpc.test.ts

SolverAgent + eval PR:

  • services/copilot/src/solverAgent.ts, services/copilot/src/solverAgent.test.ts
  • services/eval-harness/package.json, tsconfig.json, src/{main,runner,scoreReplay,scenarios}.ts, regression.config.json, test/runner.test.ts
  • .github/workflows/eval.yml

Whitepaper / artifacts PR:

  • docs/whitepaper/: markdown source + figures
  • docs/benchmarks/: exploitability-vs-iterations plot, wall-clock-vs-problem-size plot, tournament results JSON

Edited files

  • services/copilot/src/main.ts — lines 54-786 body extracted to service.ts; lines 58-68 agent selection becomes three-way (USE_SOLVER=1 > LLM_PROVIDER set > scripted). Selection no longer keys on ANTHROPIC_API_KEY specifically — that env var is just one input to packages/uci-llm provider selection.
  • services/copilot/src/claudeAgent.ts → renamed to services/copilot/src/llmAgent.ts; body refactored to consume LanguageModelClient from @uci-demo/llm. No direct @anthropic-ai/sdk import remains anywhere in services/copilot/.
  • services/copilot/src/worldState.ts — re-export from @uci-demo/game/worldMirror
  • services/copilot/package.json — add @uci-demo/game, @uci-demo/solver workspace deps
  • services/world-sim/src/main.ts — body extracted to service.ts
  • services/adsb-bridge/src/bridge.ts — import runWithConcurrency from @uci-demo/bus/concurrency
  • package.json (root) — pnpm up script adds RED + SOLVER to concurrently list, gated on USE_SOLVER=1 (default off — keeps existing demo behavior)
  • README.md — add Solver Quickstart section pointing at USE_SOLVER=1 pnpm up
  • BUILD.md — Day 12+ entries

Existing utilities to reuse (do not duplicate)

  • Agent contract: services/copilot/src/types.ts:73-86 (Agent interface, AgentDecision union, EvaluationContext).
  • External-source publishing pattern: services/adsb-bridge/src/bridge.ts:120-275 (template for red-agent — emit EntityNotification + PositionReport + EntityLost lifecycle).
  • World state mirroring: services/copilot/src/worldState.ts:4-10 (XMLParser instance) and :49-139 (entity ingestion) — promote to @uci-demo/game/worldMirror, both copilot and solver-daemon import from there.
  • Reasoning-line streaming: services/copilot/src/main.ts:120-133 (publishReasoningLine). SolverAgent populates decision.rationale[]; copilot publishes one line per subroutine via existing loop at :447-453.
  • Codec builders: buildEntityNotification, buildEntity, buildPositionReport, buildEntityLost from @uci-demo/codec (Red agent reuses; never hand-rolls XML).
  • Bus client: connectBus from @uci-demo/bus (every new service).
  • Comms-degrade injection: apps/cop-ui/lib/degrade.ts publishDegrade() is callable from the eval harness (not UI-coupled).
  • Validator audit: http://127.0.0.1:7700/audit?n=N — eval harness consumes for schema-validity row in EvalReport. Note sampling regime (first 20/MT + 1-in-10) — extend validator with VALIDATOR_FULL_AUDIT=1 env for benchmark runs.
  • Scenario YAML schema: services/world-sim/src/scenario.ts:1-122 — extend with new event types (runtime_spawn, red_inject) only if non-bridge Red is needed; default approach uses bridge pattern.
  • Playwright extensibility: apps/cop-ui/playwright.config.ts — can wrap E2E in scenario × degrade × agent matrix later; not Phase 0 critical.

Verification

Algorithmic correctness (CI)

pnpm -F @uci-demo/solver test
# Must pass: packages/uci-solver/test/kuhn.test.ts
# (ES-MCCFR converges to <0.01 exploitability in 10k iterations on Kuhn poker)

This is the gate that says "the kernel is real." Reviewers will look for this.
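For orientation, the kind of check kuhn.test.ts performs can be illustrated with a compact solver for Kuhn poker. Note the substitution: the sketch below is vanilla CFR with full enumeration of the six deals, not the external-sampling MCCFR the plan calls for, chosen because it is deterministic and short. The Node class, function names, and info-set key format (card followed by history) are assumptions for the example.

```typescript
// Kuhn poker: cards 0 (J), 1 (Q), 2 (K); actions "p" (pass/fold) and "b" (bet/call).
const ACTIONS = ["p", "b"];

class Node {
  regretSum: number[] = [0, 0];
  strategySum: number[] = [0, 0];

  // Regret matching: play in proportion to positive cumulative regret.
  currentStrategy(reach: number): number[] {
    const pos = this.regretSum.map((r) => Math.max(r, 0));
    const z = pos[0] + pos[1];
    const s = z > 0 ? [pos[0] / z, pos[1] / z] : [0.5, 0.5];
    this.strategySum[0] += reach * s[0];
    this.strategySum[1] += reach * s[1];
    return s;
  }

  // The average strategy is what converges toward equilibrium.
  averageStrategy(): number[] {
    const z = this.strategySum[0] + this.strategySum[1];
    return z > 0 ? [this.strategySum[0] / z, this.strategySum[1] / z] : [0.5, 0.5];
  }
}

const nodes = new Map<string, Node>();

// Returns expected utility for the player to act at `history`;
// p0 and p1 are each player's reach probability.
function cfr(cards: number[], history: string, p0: number, p1: number): number {
  const player = history.length % 2;
  const opponent = 1 - player;
  if (history.length > 1) {
    const higher = cards[player] > cards[opponent];
    // "pp" is a showdown for the antes; any other trailing pass is a fold.
    if (history.endsWith("p")) return history === "pp" ? (higher ? 1 : -1) : 1;
    if (history.endsWith("bb")) return higher ? 2 : -2;
  }
  const key = `${cards[player]}${history}`;
  let node = nodes.get(key);
  if (!node) {
    node = new Node();
    nodes.set(key, node);
  }
  const strategy = node.currentStrategy(player === 0 ? p0 : p1);
  const util = [0, 0];
  let nodeUtil = 0;
  for (let a = 0; a < 2; a++) {
    const next = history + ACTIONS[a];
    util[a] = player === 0
      ? -cfr(cards, next, p0 * strategy[a], p1)
      : -cfr(cards, next, p0, p1 * strategy[a]);
    nodeUtil += strategy[a] * util[a];
  }
  // Counterfactual regret is weighted by the opponent's reach probability.
  const opponentReach = player === 0 ? p1 : p0;
  for (let a = 0; a < 2; a++) node.regretSum[a] += opponentReach * (util[a] - nodeUtil);
  return nodeUtil;
}

const DEALS = [[0, 1], [0, 2], [1, 0], [1, 2], [2, 0], [2, 1]];

function train(iterations: number): number {
  let total = 0;
  for (let t = 0; t < iterations; t++) {
    for (const deal of DEALS) total += cfr(deal, "", 1, 1);
  }
  return total / (iterations * DEALS.length);
}

const averageGameValue = train(10000);
```

Kuhn poker has a known solution, which is what makes it a useful kernel check: the first player's game value is -1/18, and the second player's equilibrium is unique (bet a Jack one third of the time after a check, call a bet with a Queen one third of the time). A production test would compute exploitability with a best-response pass rather than compare against these known values.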

End-to-end smoke

# 1. Baseline matrix.
pnpm -F @uci-demo/eval run bench -- \
  --scenarios tripwire,vanguard,stillwater \
  --agents scripted \
  --degrade none,light,heavy \
  --episodes-per-cell 20 \
  --report out/eval/baseline-scripted.json

# 2. Run solver-daemon to convergence (~10 min wall-clock on tactical).
USE_SOLVER=1 pnpm -F @uci-demo/solver-daemon start &
until [ "$(mosquitto_sub -t 'uci-demo/solver/status' -C 1 | jq '.exploitability < 0.05')" = "true" ]; do sleep 30; done

# 3. Solver-agent matrix.
pnpm -F @uci-demo/eval run bench -- \
  --scenarios tripwire,vanguard,stillwater \
  --agents solver \
  --degrade none,light,heavy \
  --episodes-per-cell 20 \
  --report out/eval/solver.json

# 4. Compare. Pass condition: median ΔU_B >= +0.5, Welch's t p < 0.05.
pnpm -F @uci-demo/eval run compare \
  --baseline out/eval/baseline-scripted.json \
  --candidate out/eval/solver.json \
  --metric blueUtility
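The statistics behind the compare step's pass condition can be sketched as follows. This is illustrative, not the eval-harness code: it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom, and it reads "median ΔU_B" as the difference of the two sample medians, which is an assumption. Converting t and df into a p-value requires a Student-t CDF, which is deliberately left out here.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) * (x - m), 0) / (xs.length - 1);
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

interface WelchResult {
  t: number;           // positive when the candidate's mean is higher
  df: number;          // Welch-Satterthwaite degrees of freedom
  medianDelta: number; // median(candidate) - median(baseline), assumed reading
}

function welch(baseline: number[], candidate: number[]): WelchResult {
  const vb = sampleVariance(baseline) / baseline.length;
  const vc = sampleVariance(candidate) / candidate.length;
  const t = (mean(candidate) - mean(baseline)) / Math.sqrt(vb + vc);
  const df =
    ((vb + vc) * (vb + vc)) /
    ((vb * vb) / (baseline.length - 1) + (vc * vc) / (candidate.length - 1));
  return { t, df, medianDelta: median(candidate) - median(baseline) };
}

// Toy utilities, not measured results.
const result = welch([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]);
```

For the toy inputs, both samples have variance 2.5 and n = 5, so the standard error is √(0.5 + 0.5) = 1, t = 1, and df = 8. Welch's test compares means while the pass condition quotes a median, so a report would sensibly show both the mean and the median delta.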

Manual demo verification

  1. USE_SOLVER=1 RED_AGENT=solver-driven pnpm up. Browser: proposals appear within ~3s of each Red contact. Reasoning panel shows subroutine-weighted explanations (e.g., "ReplanEscalation 0.72 — soft-kill failed twice; escalating to kinetic"). Validator audit feed at http://127.0.0.1:7700/audit?n=20 shows 100% valid.
  2. mosquitto_pub -t uci-demo/world/degrade -m '{"dropPercent":80,"latencyMs":500,"durationMs":20000}'. Within one self-play epoch (visible on uci-demo/solver/status retained heartbeat), Blue blueprint shifts mass toward CommsDegradeHedge. Reasoning trace reflects the shift.
  3. Force a HAWK-2 fuel exhaustion. Observe FuelConservation subroutine gain weight on the next evaluation; copilot recommends GUARDIAN-3 handoff in the reasoning rail.
  4. Click MODIFY on an active proposal. Copilot publishes UPDATE trio; reasoning rail logs OPERATOR // MODIFY; solver-daemon's worldMirror records the operator action as a public observation, which the next info-set's belief reflects.

Plot/artifact verification (proposal deliverables)

  • docs/benchmarks/exploitability-vs-iterations.png exists and shows exploitability decreasing with iterations on Kuhn (and on the Tripwire abstraction), allowing for sampling noise.
  • docs/benchmarks/wallclock-vs-problemsize.{png,json} exists with 3 data points (3/10/30 effectors).
  • docs/benchmarks/sme-tournament.json exists with n≥15 logged games; aggregate Blue-utility result and per-SME breakdown.
  • docs/whitepaper/main.md ≥15 pages covering model, algorithm, abstraction, results, scaling, integration, transition.
  • docs/video/solver-vs-scripted-tripwire.mp4 exists, 3–5 min, side-by-side on identical scenario seed.

Compliance gate against RFP attributes

Before submission, every row in this table must be answerable with a specific repo artifact:

| RFP attribute | Required artifact | Source |
| --- | --- | --- |
| Dominant Performance | SME tournament JSON (n≥15) + scripted-baseline delta with CI95 | docs/benchmarks/sme-tournament.json + out/eval/*.json |
| Human-Interpretability | 3 worked case studies showing regret decomposition + mixed strategy + outcome | docs/whitepaper/case-studies/ |
| Scalability | Wall-clock-vs-problem-size plot, 3 data points, log-log fit | docs/benchmarks/wallclock-vs-problemsize.{png,json} |
| Computational Efficiency | Single-workstation benchmark, no GPU, named CPU spec | docs/benchmarks/host.json + reproduce instructions |
| Anytime | Exploitability-vs-iterations plot + retained uci-demo/solver/status heartbeat in live demo | docs/benchmarks/exploitability-vs-iterations.png + video |