Design: `services/solver-daemon/` + `SolverPill` (Phase 0 weeks 5-7)¶

Pre-code design memo for the workstream that makes the @uci-demo/solver kernel visibly alive in the demo. The kernel ships (PR #32); doctrinal subroutines + bank ship (PR #33); now we run ES-MCCFR continuously and publish the blueprint so the SolverAgent (next PR) and eval-harness (week 7-9) have a real trained policy to consume.

This PR is deliberately scoped to the daemon plus the visible proof-of-life pill. SolverAgent integration and red-agent are follow-ups in PR #35 and #36.

The headline acceptance gate: a reviewer running USE_SOLVER=1 pnpm run up (or pnpm run up:solver, if the two-script approach recommended in Strand 6 is adopted) sees a SOLVER ▸ 12.4K iter · 3,217 IS · δ 0.014 pill in the top-right of the COP, with the iteration counter incrementing. Without that visible proof, the algorithmic story remains invisible to anyone who doesn't read the code.

Strand 1 — `services/solver-daemon/`¶

What it does (one paragraph)¶

A standalone Node service that loads a synthetic version of the Tripwire scenario, builds a GameDynamics over it via @uci-demo/game, and runs iterate() from @uci-demo/solver continuously. Every ~100 iterations it publishes a status heartbeat to uci-demo/solver/status (retained). Every ~1000 iterations it publishes the current trained blueprint to uci-demo/solver/blueprint (retained). The daemon never reads from uci/v2_5/* — its world is the synthetic dynamics, not the live bus. The blueprint it publishes is the contract; downstream consumers (SolverAgent in PR #35, eval-harness in week 7-9) subscribe and consume in-memory without RPC.

What it does NOT do¶

Drive Blue decisions. PR #35 ships SolverAgent consuming the published blueprint via a BlueprintHolder module. The daemon's only output is the blueprint itself.
Read live bus traffic. Self-play happens against synthetic dynamics. Live integration is the SolverAgent's job.
Run Red. The red-agent service is PR #36.
Compute true exploitability. Tactical-scale exact best-response is intractable. We publish a delta-norm proxy (L2 norm of strategy change over the last 1k iterations) as the visible convergence signal. Real exploitability via local best-response (Lisý et al.) lands with the eval-harness.

Service shape¶

services/solver-daemon/
├── package.json          # workspace deps on @uci-demo/{bus,game,solver}
├── tsconfig.json
├── src/
│   ├── main.ts           # entry point, env parsing, bus connect, kick off daemon
│   ├── daemon.ts         # iteration loop, heartbeat + blueprint publish
│   ├── scenario.ts       # Tripwire YAML → GameState + GameDynamics adapter
│   └── deltaNorm.ts      # convergence-proxy helper
└── test/
    └── daemon.test.ts    # 1k-iter convergence smoke + publish-cadence check

Strand 2 — MQTT side-channels¶

Both topics are under uci-demo/solver/... (free-form JSON, the same convention as the existing uci-demo/world/... and uci-demo/copilot/... channels).

`uci-demo/solver/status` (retained)¶

Published every STATUS_INTERVAL_ITER = 100 iterations.

{
  "schemaV": 1,
  "iterations": 12400,
  "infoSetCount": 3217,
  "deltaNorm": 0.0142,
  "warm": true,           // true once iterations >= WARM_THRESHOLD (5000)
  "scenarioName": "OPERATION TRIPWIRE",
  "ts": 1779320000000,
  "beliefV": 1,
  "payoffV": 2,
  "infosetV": 1,
  "solverSchemaVersion": 1   // matches SCHEMA_VERSION from @uci-demo/solver
}

iterations: monotonically increasing.
infoSetCount: regretTableBlue.size() — visible memory proxy. Reaches a stable plateau as the policy covers the reachable info-set space.
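deltaNorm: root-mean-square difference between the current average strategy (σ̄_now) and the snapshot from roughly 1000 iterations earlier (σ̄_1000_ago), taken over every action probability of the info-sets present in both. A decreasing curve indicates convergence.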
warm: gate flag for SolverAgent — when false, the agent rationale should tag traces as "cold-start; using fallback."
Version tags: every payload embeds the four version constants from @uci-demo/game + @uci-demo/solver. Stale UI or SolverAgent can refuse to trust mismatched payloads.
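
A minimal sketch of the version gate described above, assuming a consumer holds its own copy of the four constants. The names `VersionTags`, `EXPECTED_VERSIONS`, and `mismatchedVersions` are hypothetical and are not part of @uci-demo/game or @uci-demo/solver; the constant values simply mirror the example payload.

```typescript
// Hypothetical shape of the version tags carried by every solver payload.
interface VersionTags {
  readonly beliefV: number;
  readonly payoffV: number;
  readonly infosetV: number;
  readonly solverSchemaVersion: number;
}

// Assumed local constants; in the real code these would be imported
// from @uci-demo/game and @uci-demo/solver rather than hardcoded.
const EXPECTED_VERSIONS: VersionTags = {
  beliefV: 1,
  payoffV: 2,
  infosetV: 1,
  solverSchemaVersion: 1,
};

// Returns the names of the tags that differ from what this consumer expects.
// An empty array means the payload can be trusted.
function mismatchedVersions(
  payload: VersionTags,
  expected: VersionTags = EXPECTED_VERSIONS,
): string[] {
  const keys: (keyof VersionTags)[] = [
    "beliefV",
    "payoffV",
    "infosetV",
    "solverSchemaVersion",
  ];
  return keys.filter((k) => payload[k] !== expected[k]);
}
```

Returning the list of mismatched tags (rather than a bare boolean) lets the UI or agent log which constant drifted before it refuses the payload.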

`uci-demo/solver/blueprint` (retained)¶

Published every BLUEPRINT_INTERVAL_ITER = 1000 iterations. Payload is the JSON produced by serializeBlueprint() from @uci-demo/solver. It is already version-tagged, carries the tables as base64-encoded Float32Array buffers, and refuses to load on version mismatch.

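Size: ~5 MB of raw table data at tactical scale (10⁴ info-sets × 64 actions × 4 bytes × 2 tables ≈ 5.1 MB), which grows to roughly 6.8 MB once base64-encoded. Mosquitto handles this fine. SolverAgent will need to keep the deserialize off the hot path: a multi-megabyte JSON.parse plus base64 decode is synchronous, and queueMicrotask only defers that work without yielding to I/O, so a worker thread or a decode chunked across setImmediate calls is the more likely fix (PR #35 concern).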
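
The size estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that ignores the JSON envelope and assumes each table is encoded as one base64 string; `base64Length` and `estimateBlueprintBytes` are illustrative helper names, not functions from @uci-demo/solver.

```typescript
// Base64 emits 4 output characters for every 3 input bytes (padded up).
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Assumed figures from the estimate: Float32 cells (4 bytes each) and
// 2 tables (regret + average strategy).
function estimateBlueprintBytes(
  infoSets: number,
  actions: number,
  tables: number = 2,
): { raw: number; base64: number } {
  const rawPerTable = infoSets * actions * 4;
  return {
    raw: rawPerTable * tables,
    base64: base64Length(rawPerTable) * tables,
  };
}

const estimate = estimateBlueprintBytes(10_000, 64);
console.log(estimate); // { raw: 5120000, base64: 6826672 }
```

The base64 figure is the one that matters for broker traffic and for the replay buffer, since that is what travels in the retained message.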

What's NOT published yet¶

Per-side blueprints. The current serializeBlueprint envelope carries Blue's regret/avg-strategy only (red's is held in redRegret but not serialized). When PR #36 ships the red-agent, we'll either ship two retained envelopes (one per player) OR extend the schema (and bump SCHEMA_VERSION to 2).

Strand 3 — Synthetic scenario adapter¶

The daemon doesn't drive an MQTT world. It builds an in-memory GameState from the Tripwire YAML once at startup, plus a ScenarioTruth (the hidden identity/threat-type/fuel map), and calls createTacticalDynamics() to get a callable simulator.

Where the truth comes from¶

The Tripwire YAML (scenarios/counter-uas-tripwire.yaml) defines spawn events with identity and threatType per track. For the daemon's purposes, that IS the truth — we mirror those fields into ScenarioTruth.trueIdentity and ScenarioTruth.trueThreatType.

Effector fuel truth: each EFFECTOR_* asset starts at 1.0 (NORMAL), drains by 0.05 per engage. The daemon's trueFuel map is updated by the dynamics' apply step (which already tracks fuelBurnedTotal in counters — we just need to project that back to per-effector fuel fractions).
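
One way to picture the projection back to per-effector fuel fractions, assuming the adapter can recover an engagement count per effector (the memo only states that the counters track fuelBurnedTotal, so the per-effector count is an assumption). `FUEL_STEPS`, `fuelFraction`, and `projectFuel` are hypothetical names.

```typescript
// 1.0 starting fuel divided by 0.05 burned per engage gives 20 discrete steps.
// Counting steps as integers avoids accumulating floating-point error.
const FUEL_STEPS = 20;

// Fuel fraction remaining after a number of engagements, clamped at empty.
function fuelFraction(engagements: number): number {
  return Math.max(0, FUEL_STEPS - engagements) / FUEL_STEPS;
}

// Projects per-effector engagement counts to a trueFuel-style map.
function projectFuel(
  engagementsByEffector: ReadonlyMap<string, number>,
): Map<string, number> {
  const out = new Map<string, number>();
  engagementsByEffector.forEach((count, effectorId) => {
    out.set(effectorId, fuelFraction(count));
  });
  return out;
}
```

For example, an effector that has engaged three times would map to a fuel fraction of 0.85 under this model.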

For Phase 0 simplicity, hardcode the Tripwire truth in the adapter. A YAML schema extension lands as a follow-up when we generalize to multi-scenario daemons.

Adapter signature¶

export interface SolverScenario {
  readonly name: string;
  readonly initialState: GameState;
  readonly truth: ScenarioTruth;
  readonly dynamics: GameDynamics;
}

export function loadTripwireScenario(): SolverScenario;

The adapter:
1. Builds the WorldSnapshot from the Tripwire spawn events
2. Builds the ScenarioTruth from the YAML identity/threatType
3. Calls buildGameState({world, truth, ...}) from @uci-demo/game
4. Calls createTacticalDynamics() for the dynamics
5. Returns the bundle

Reset / episode boundaries¶

The daemon iterates against this single static initial state. dynamics.isTerminal(state) triggers episode reset; the inner loop in @uci-demo/solver's iterate() already handles this via rootStateFactory(rng). The daemon's rootStateFactory returns the adapter's initialState (immutable) and lets apply() walk the tree from there.

Strand 4 — Daemon iteration loop¶

Cadence¶

Inner ES-MCCFR loop: BATCH_ITERATIONS = 50 iterations per loop tick (a small batch keeps the event loop responsive)
Outer batch loop: schedules the next batch via setImmediate() so we don't starve the event loop
Status publish: every STATUS_INTERVAL_ITER = 100 iterations (so every 2 batches)
Blueprint publish: every BLUEPRINT_INTERVAL_ITER = 1000 iterations
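Delta-norm computation: every 100 iterations, comparing current avg-strategy to a snapshot taken 1000 iterations ago. Snapshots are kept in a small ring buffer (the pseudocode below simplifies this to a single snapshot refreshed at each blueprint publish)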
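
The cadence figures above can be sanity-checked with a small simulation. This sketch only counts events; it reuses the constant names from the memo, while `simulateCadence` itself is a hypothetical helper.

```typescript
const BATCH_ITERATIONS = 50;
const STATUS_INTERVAL_ITER = 100;
const BLUEPRINT_INTERVAL_ITER = 1000;

interface CadenceCounts {
  batches: number;
  statusPublishes: number;
  blueprintPublishes: number;
}

// Walks the iteration counter forward one batch at a time and counts how
// often each publish condition (a modulo check on the running total) fires.
function simulateCadence(targetIterations: number): CadenceCounts {
  let total = 0;
  const counts: CadenceCounts = {
    batches: 0,
    statusPublishes: 0,
    blueprintPublishes: 0,
  };
  while (total < targetIterations) {
    total += BATCH_ITERATIONS;
    counts.batches++;
    if (total % STATUS_INTERVAL_ITER === 0) counts.statusPublishes++;
    if (total % BLUEPRINT_INTERVAL_ITER === 0) counts.blueprintPublishes++;
  }
  return counts;
}

console.log(simulateCadence(2000));
// { batches: 40, statusPublishes: 20, blueprintPublishes: 2 }
```

The modulo checks only fire because both intervals are exact multiples of the batch size. If BATCH_ITERATIONS were changed to a value that does not divide 100, the running total could step over the publish points, so a startup assertion on that relationship may be worth adding.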

Pseudocode¶

let totalIterations = 0;
const blueRegret = createRegretTable(64);
const redRegret = createRegretTable(64);
let lastSnapshot: Map<bigint, Float32Array> = new Map();

function tick() {
  if (shuttingDown) return;
  const result = iterate({
    iterations: BATCH_ITERATIONS,
    dynamics: scenario.dynamics,
    rootStateFactory: () => scenario.initialState,
    rng: mulberry32(SEED + totalIterations),
    regretBlue: blueRegret,
    regretRed: redRegret,
  });
  totalIterations += BATCH_ITERATIONS;

  if (totalIterations % STATUS_INTERVAL_ITER === 0) {
    const dnorm = computeDeltaNorm(blueRegret, lastSnapshot);
    publishStatus({
      iterations: totalIterations,
      infoSetCount: blueRegret.size(),
      deltaNorm: dnorm,
      warm: totalIterations >= WARM_THRESHOLD,
      // ... version tags ...
    });
  }

  if (totalIterations % BLUEPRINT_INTERVAL_ITER === 0) {
    publishBlueprint(blueRegret);
    lastSnapshot = snapshotAvgStrategy(blueRegret);  // for the next delta-norm window
  }

  setImmediate(tick);  // yield to other work; resume immediately
}
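
The pseudocode reseeds its generator on every batch with mulberry32(SEED + totalIterations). For reference, the widely circulated form of the mulberry32 generator is sketched below; whether the repo's helper matches it exactly is an assumption.

```typescript
// mulberry32: a small 32-bit PRNG. Returns a function that yields
// deterministic floats in [0, 1) for a given integer seed.
function mulberry32(seed: number): () => number {
  let a = seed | 0; // force the state into 32-bit integer range
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed, same sequence: useful for reproducing a specific batch.
const rngA = mulberry32(1234);
const rngB = mulberry32(1234);
console.log(rngA() === rngB()); // true
```

Because each batch derives its seed from the running iteration count, any single batch can be replayed in isolation as long as the regret tables are restored to the same state.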

Convergence proxy: delta-norm¶

function computeDeltaNorm(table: RegretTable, prev: Map<bigint, Float32Array>): number {
  let sumSq = 0;
  let count = 0;
  for (const { info, avgStrategy } of table.entries()) {
    const prevRow = prev.get(info);
    if (!prevRow) continue;
    for (let i = 0; i < avgStrategy.length; i++) {
      const d = avgStrategy[i] - prevRow[i];
      sumSq += d * d;
      count++;
    }
  }
  return count > 0 ? Math.sqrt(sumSq / count) : 1.0;  // 1.0 = "no signal yet"
}

As the policy approaches a Nash equilibrium, consecutive averages converge and the L2 norm shrinks. It reaches ~0.001 in well-converged tables.
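
The cadence notes mention a small ring buffer for snapshots. A minimal sketch of that idea follows, assuming one snapshot is pushed per status tick so that a capacity of 10 spans 1000 iterations. `SnapshotRing` is a hypothetical class, and it is generic so it does not depend on the real snapshot type.

```typescript
// Fixed-capacity ring: once full, each push overwrites the oldest entry.
class SnapshotRing<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private next = 0; // index the next push writes to
  private filled = 0; // number of occupied slots

  constructor(capacity: number) {
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.filled = Math.min(this.filled + 1, this.capacity);
  }

  // Oldest retained item, or undefined until the ring has filled once.
  // When full, the slot about to be overwritten is the oldest one.
  oldestIfFull(): T | undefined {
    return this.filled === this.capacity ? this.slots[this.next] : undefined;
  }
}
```

At each status tick the daemon would read oldestIfFull() as the comparison baseline and then push the current snapshot. The trade-off is memory: ten snapshots of a tactical-scale average-strategy table would be on the order of 25 MB, which is why the single-snapshot simplification in the pseudocode may be preferable for Phase 0.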

Why setImmediate, not setInterval¶

setImmediate() yields control to I/O (the MQTT publish + any incoming messages) between batches without artificially throttling the iteration rate. Throughput is then bound by the kernel's per-iteration cost, not by a wall-clock interval.

Throughput target¶

Phase 0 acceptance: 1k iterations / second of wall-clock on commodity dev hardware (M1/M2 MacBook). At that rate, the daemon publishes its first blueprint after about 1 second (1000 iterations) and reaches "warm" (5000 iterations) in about 5 seconds, comfortably inside the 1-minute window in the acceptance criteria.

Strand 5 — `SolverPill` UI component¶

Slot¶

Top-right of TopStrip, next to the existing ROE indicator. The TopStrip currently shows scenario name + ROE + comms; the pill goes between ROE and comms.

Props¶

interface SolverPillProps {
  readonly status: SolverStatus | null;
}

The pill is a pure render — it subscribes to state.solverStatus from the zustand store.

Visual spec¶

When status === null (daemon offline):

[ SOLVER ▸ OFFLINE ]

In --color-fg-faint, no glow.

When status is set but !warm (cold-start):

[ SOLVER ▸ 2.1K iter · 412 IS · δ — ]

In --color-amber with phosphor glow. The δ shows — because no prior snapshot exists yet.

When status.warm === true (running normally):

[ SOLVER ▸ 12.4K iter · 3,217 IS · δ 0.014 ]

In --color-cyan with phosphor glow. The δ value renders in:
- --color-grant when δ < 0.02 (converged-ish)
- --color-cyan when 0.02 ≤ δ < 0.1 (training)
- --color-amber when δ ≥ 0.1 (early-training)

Stale heartbeat (last ts > 30s ago):

[ SOLVER ▸ STALLED 12.4K iter ]

In --color-threat. Detection via ts field on the most recent status payload.
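
The four visual states can be expressed as a small pure function, which keeps the component itself trivial. This is a sketch under one assumption the spec does not state: a stale heartbeat takes precedence over the cold/warm distinction. `pillMode`, `deltaColor`, and `STALE_AFTER_MS` are hypothetical names.

```typescript
type PillMode = "offline" | "stalled" | "cold" | "warm";

// Only the fields the state decision needs.
interface StatusLike {
  readonly warm: boolean;
  readonly ts: number; // epoch milliseconds of the heartbeat
  readonly deltaNorm: number;
}

const STALE_AFTER_MS = 30_000;

// Picks the visual state. Assumption: "stalled" wins over cold/warm.
function pillMode(status: StatusLike | null, nowMs: number): PillMode {
  if (status === null) return "offline";
  if (nowMs - status.ts > STALE_AFTER_MS) return "stalled";
  return status.warm ? "warm" : "cold";
}

// Maps the delta-norm to the CSS variable named in the visual spec.
function deltaColor(delta: number): string {
  if (delta < 0.02) return "--color-grant";
  if (delta < 0.1) return "--color-cyan";
  return "--color-amber";
}
```

Passing nowMs in as an argument, rather than calling Date.now() inside, keeps the function deterministic and easy to test once cop-ui gains a test harness.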

Iteration formatter¶

function fmtIter(n: number): string {
  if (n < 1000) return n.toString();
  // Switch to "M" slightly early so rounding never prints "1000.0K".
  if (n < 999_950) return (n / 1000).toFixed(1) + "K";
  return (n / 1_000_000).toFixed(2) + "M";
}

12_437 → 12.4K. 1_412_000 → 1.41M.

InfoSet count formatter¶

function fmtIS(n: number): string {
  return n.toLocaleString();  // 3217 → "3,217"
}

The pill's title attribute (hover tooltip) carries the full breakdown: "iterations=12437 IS=3217 δ=0.0142 warm=true scenario=OPERATION TRIPWIRE payoffV=2 beliefV=1 infosetV=1"
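
A sketch of how that breakdown string could be assembled. `pillTitle` and `TooltipStatus` are hypothetical names, and formatting δ to four decimal places is an assumption chosen to match the example.

```typescript
// Only the fields that appear in the tooltip.
interface TooltipStatus {
  readonly iterations: number;
  readonly infoSetCount: number;
  readonly deltaNorm: number;
  readonly warm: boolean;
  readonly scenarioName: string;
  readonly payoffV: number;
  readonly beliefV: number;
  readonly infosetV: number;
}

// Builds the space-separated key=value string used as the title attribute.
function pillTitle(s: TooltipStatus): string {
  return [
    `iterations=${s.iterations}`,
    `IS=${s.infoSetCount}`,
    `δ=${s.deltaNorm.toFixed(4)}`,
    `warm=${s.warm}`,
    `scenario=${s.scenarioName}`,
    `payoffV=${s.payoffV}`,
    `beliefV=${s.beliefV}`,
    `infosetV=${s.infosetV}`,
  ].join(" ");
}
```

Unlike the pill itself, the tooltip shows the unabbreviated iteration count, so the two formatters stay independent.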

Strand 6 — `USE_SOLVER=1` gating¶

The default pnpm run up should NOT spawn the daemon — existing demo behavior is preserved. The daemon is opt-in via env flag:

USE_SOLVER=1 pnpm run up   # spawns the solver-daemon alongside everything else
pnpm run up                 # existing behavior; daemon stays off

Implementation¶

Two approaches:

A. Conditional concurrently argv (clean, declarative).

Update root package.json's up script to spawn the daemon conditionally via a small shell expansion:

"up": "docker compose up -d mosquitto && concurrently ... $([ \"$USE_SOLVER\" = \"1\" ] && echo '--names VAL,SIM,CP,UI,ADSB,SLV --prefix-colors cyan,yellow,magenta,green,blue,violet' || echo '--names VAL,SIM,CP,UI,ADSB --prefix-colors cyan,yellow,magenta,green,blue') ... \"pnpm --filter @uci-demo/validator start\" ... $([ \"$USE_SOLVER\" = \"1\" ] && echo '\"pnpm --filter @uci-demo/solver-daemon start\"' || echo '')"

That's ugly. Reject.

B. Two scripts (clearest).

"up": "...existing concurrently list, no daemon...",
"up:solver": "... existing list + '\"pnpm --filter @uci-demo/solver-daemon start\"' ..."

Operator runs pnpm run up:solver when they want the solver. Two scripts, no shell magic, completely transparent.

Recommendation: B. Add a one-line note in README.md / CLAUDE.md explaining when to use which.

Why not a runtime flag inside the daemon?¶

We could ship the daemon always-on and have it gate its own heartbeat on an env var. But that wastes CPU on a service that's mostly unused in dev. The two-script approach matches the "intentional gates" discipline from CLAUDE.md.

Strand 7 — Cop-UI integration¶

Store extensions (`lib/store.ts`)¶

interface SolverStatus {
  readonly iterations: number;
  readonly infoSetCount: number;
  readonly deltaNorm: number;
  readonly warm: boolean;
  readonly scenarioName: string;
  readonly ts: number;
  readonly beliefV: number;
  readonly payoffV: number;
  readonly infosetV: number;
}

interface CopState {
  // ...existing fields
  solverStatus: SolverStatus | null;
  /** Most recently received blueprint blob (raw JSON string) — for replay/debug. */
  solverBlueprintJson: string | null;
  setSolverStatus: (s: SolverStatus | null) => void;
  setSolverBlueprintJson: (json: string) => void;
}

Bus subscriber additions (`lib/busSubscriber.ts`)¶

Two new dispatch branches:

if (topic === "uci-demo/solver/status") {
  try {
    const json = JSON.parse(payloadText) as SolverStatus & { schemaV?: number };
    if (typeof json.iterations === "number" && typeof json.ts === "number") {
      useCop.getState().setSolverStatus(json);
    }
  } catch (err) { console.error("[cop-ui] solver/status parse error:", err); }
  return;
}

if (topic === "uci-demo/solver/blueprint") {
  // Just store the JSON; SolverAgent (PR #35) deserializes when it ships.
  if (payloadText.length > 0) {
    useCop.getState().setSolverBlueprintJson(payloadText);
  }
  return;
}

Subscribe list extension (currently uci-demo/copilot/#, uci-demo/world/#, uci-demo/scenario/#): add uci-demo/solver/#.
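
The status branch above trusts the payload after checking only iterations and ts. If a stricter guard is wanted, it could look like the sketch below, which validates every field the pill renders and returns null instead of throwing. `parseSolverStatus` and `ParsedSolverStatus` are hypothetical names, and the set of required fields is an assumption.

```typescript
// Subset of the status payload that the pill actually renders.
interface ParsedSolverStatus {
  readonly iterations: number;
  readonly infoSetCount: number;
  readonly deltaNorm: number;
  readonly warm: boolean;
  readonly ts: number;
}

function isFiniteNumber(v: unknown): v is number {
  return typeof v === "number" && isFinite(v);
}

// Returns a validated status, or null for malformed or incomplete payloads.
function parseSolverStatus(payloadText: string): ParsedSolverStatus | null {
  let raw: unknown;
  try {
    raw = JSON.parse(payloadText);
  } catch {
    return null; // not valid JSON
  }
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const iterations = o["iterations"];
  const infoSetCount = o["infoSetCount"];
  const deltaNorm = o["deltaNorm"];
  const warm = o["warm"];
  const ts = o["ts"];
  if (!isFiniteNumber(iterations) || !isFiniteNumber(infoSetCount)) return null;
  if (!isFiniteNumber(deltaNorm) || !isFiniteNumber(ts)) return null;
  if (typeof warm !== "boolean") return null;
  return { iterations, infoSetCount, deltaNorm, warm, ts };
}
```

Returning null keeps the dispatch branch to a single check before it calls setSolverStatus, and a malformed retained message then leaves the pill in its previous state rather than rendering NaN.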

Replay buffer¶

The existing message buffer captures all subscribed topics with no allowlist filtering. The solver topics are heavy (~5 MB blueprint, retained-and-resent-on-reconnect), so consider capping buffer entries for the blueprint topic OR excluding it from the buffer. Trade-off: blueprint excluded means replay doesn't reconstruct the solver pill's state precisely. Acceptable for Phase 0; revisit when replay UX matures.
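
A middle ground between buffering every blueprint and excluding the topic entirely is to keep only the most recent blueprint. The sketch below assumes the buffer is a plain array of topic/payload records, which may not match the existing buffer's shape; `appendToBuffer` and `LATEST_ONLY_TOPICS` are hypothetical names.

```typescript
interface BufferedMessage {
  readonly topic: string;
  readonly payload: string;
}

// Topics for which only the newest message is worth keeping.
const LATEST_ONLY_TOPICS = new Set<string>(["uci-demo/solver/blueprint"]);

// Returns a new buffer with msg appended. For latest-only topics, any
// earlier message on the same topic is dropped first.
function appendToBuffer(
  buffer: readonly BufferedMessage[],
  msg: BufferedMessage,
): BufferedMessage[] {
  if (!LATEST_ONLY_TOPICS.has(msg.topic)) return [...buffer, msg];
  return [...buffer.filter((m) => m.topic !== msg.topic), msg];
}
```

This bounds the blueprint's memory cost to a single payload while the lightweight status messages are still replayed in full, so the pill's counters can be reconstructed even though intermediate blueprints cannot.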

Strand 8 — Tests¶

`services/solver-daemon/test/daemon.test.ts`¶

Synthetic 1k-iter convergence smoke: build the Tripwire adapter, run 1000 iterations against the dynamics, assert regretBlue.size() > 0 (info-sets were touched) and deltaNorm < 1.0 (not stuck at initial uniform).
Status payload shape: mock the bus client, run 200 iterations, capture published uci-demo/solver/status messages. Assert exactly 2 status emissions (at iter 100 and 200). Assert each has the version tags.
Blueprint cadence: run 2000 iterations, assert exactly 2 blueprint emissions (at iter 1000 and 2000). Deserialize each via deserializeBlueprint from @uci-demo/solver — must not throw.
Delta-norm decreasing: after warming up (5k iterations), the delta-norm at iter 5000 should be smaller than at iter
(Probabilistic — tolerate one failure with a seed sweep, similar to the Kuhn correctness gate.)
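
The "tolerate one failure with a seed sweep" rule can be captured in a small helper so each probabilistic test reads the same way. `passesSeedSweep` is a hypothetical name; the real Kuhn gate may be structured differently.

```typescript
// Runs a probabilistic check once per seed and passes if the number of
// failing seeds does not exceed maxFailures.
function passesSeedSweep(
  seeds: readonly number[],
  check: (seed: number) => boolean,
  maxFailures: number = 1,
): boolean {
  let failures = 0;
  for (const seed of seeds) {
    if (!check(seed)) failures++;
  }
  return failures <= maxFailures;
}
```

In the daemon test, check(seed) would run the daemon with that seed and compare the two delta-norm readings; the test then asserts passesSeedSweep(...) is true instead of asserting on any single seed.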

Cop-UI smoke¶

No formal test (cop-ui has no vitest harness). Manual via the acceptance gate below.

Acceptance criteria¶

pnpm -r typecheck clean across all 12 packages plus the new solver-daemon.
pnpm -r test passes all existing 380 tests plus the new daemon tests.
USE_SOLVER=1 pnpm run up (or pnpm run up:solver — exact syntax per the memo's Strand 6) boots all services AND the solver-daemon. Default pnpm run up continues to NOT spawn the daemon.
Within 10 seconds of pnpm run up:solver:
mosquitto_sub -t uci-demo/solver/status -C 1 shows a JSON payload with iterations > 0
The COP's SolverPill reads SOLVER ▸ <N> iter · <M> IS · δ <V> and the iteration count visibly increments
Within 1 minute:
uci-demo/solver/blueprint has been published at least once
mosquitto_sub on that topic shows a JSON blob ≥ 100 KB
Validator audit stays at 100% valid — none of the new solver topics touch the schema-bound uci/v2_5/# channels.

What this memo deliberately does not specify¶

SolverAgent integration (PR #35) — the daemon publishes; an Agent that consumes the blueprint to drive Blue decisions is a separate workstream. PR #35 adds BlueprintHolder + SolverAgent + three-way agent selection in services/copilot/.
Red-agent service (PR #36) — the bus-symmetric Red. Requires the daemon to be stable but doesn't block on it.
Eval-harness (weeks 7-9) — consumes the blueprint offline, computes true exploitability via local best-response (Lisý et al.), generates the SBIR proposal's exploitability plot.
YAML schema extension for ScenarioTruth — the Tripwire truth is hardcoded in scenario.ts for Phase 0. Generalization lands when the daemon supports multi-scenario training.
Multi-process scaling — single Node process for Phase 0. Worker thread or process-level parallelism (multiple daemons with periodic regret-table merge) is a Phase II problem.
Local best-response exploitability — proxied via delta-norm. True LBR computation is a substantial standalone module.

Open questions to resolve before code lands¶

Blueprint retention behavior on broker restart. Retained messages survive subscriber disconnects but are lost when Mosquitto restarts unless broker persistence is enabled. Acceptable for Phase 0 (the daemon re-emits at the next blueprint cadence, i.e. within 1000 iterations, which is well under a minute at the target throughput). Reviewer can flag if a persistent blueprint store is needed.
Delta-norm sensitivity to info-set growth. When the regret table sees new info-sets during a window, the snapshot map has no rows for them, so they are skipped in the delta calculation. This slightly underestimates the true change, but does so consistently.
Solver shutdown behavior. SIGINT mid-batch should publish a final status with warm=false-equivalent + a shuttingDown flag? Or just stop cleanly without notifying? Recommend the latter for simplicity; downstream consumers treat stale heartbeats as offline (already specified above).
TopStrip space. The current TopStrip has scenario name, ROE, comms, mute indicator, scenario menu. Adding a 5th indicator is fine on standard viewports but might wrap on narrow ones. The memo's recommended slot (between ROE and comms) should work; reviewer can verify visually.

None block writing the rest of the package.
