Design — @uci-demo/game fill-out (Phase 0 weeks 2-4)

Pre-code design memo for the domain-types package introduced in #29. The package currently holds the world-mirror types and the projection logic. This memo specifies what gets added in weeks 2-4 of sbir-osw26bz02-dv004-game-theoretic-coa.md: GameState, InformationSet, PublicBeliefState, Payoff, GameDynamics, and the Bayesian belief update.

The point of writing this first is to fix the math and the encoding before the solver consumes them. The plan named these artifacts; this memo says what their concrete TypeScript shape and numerical behavior must be.


Game model

Two-player zero-sum imperfect-information game with explicit chance nodes.

| Player | Acts at | Action space (tactical, T1 scale) |
| --- | --- | --- |
| Blue (defender) | New contact, replan-after-failure, periodic check | WITHHOLD, (EffectType × EffectorId); typically 1 + 5×2 = 11 actions |
| Red (attacker) | Spawn opportunities, per-track maneuver ticks | (spawnLocation × threatType × identityMask), (maneuverDelta), WITHDRAW; typically 3–5 actions per opportunity |
| Chance (Nature) | Sensor confidence, effect outcomes, comms-degrade events | Discrete weighted draws from named distributions |

The action space at T1 is small enough that the regret table is keyed by (infoSetKey, actionIndex) and stored as a Float32Array. Scaling to T2 (10 effectors / 30 tracks / 1h) keeps this representation; T3 demands subgame resolving and is out of scope for this memo.
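A minimal sketch of that storage layout, assuming a fixed per-info-set stride. `RegretTable`, `MAX_ACTIONS`, and the doubling growth policy are illustrative names and choices, not part of the plan; the regret-matching step is the standard one (positive regrets normalized, uniform fallback).

```typescript
// Hypothetical regret table: one Float32Array row per info-set key.
const MAX_ACTIONS = 16; // assumed stride; Blue's typical T1 count is 11

class RegretTable {
  private readonly rowOf = new Map<bigint, number>(); // infoSetKey -> row index
  private regrets: Float32Array;

  constructor(initialRows = 1024) {
    this.regrets = new Float32Array(Math.max(1, initialRows) * MAX_ACTIONS);
  }

  /** Row index for a key, allocating a row (and doubling the buffer) on first sight. */
  private row(key: bigint): number {
    let r = this.rowOf.get(key);
    if (r === undefined) {
      r = this.rowOf.size;
      this.rowOf.set(key, r);
      if ((r + 1) * MAX_ACTIONS > this.regrets.length) {
        const grown = new Float32Array(this.regrets.length * 2);
        grown.set(this.regrets);
        this.regrets = grown;
      }
    }
    return r;
  }

  /** Accumulate regret for (infoSetKey, actionIndex). */
  add(key: bigint, actionIndex: number, delta: number): void {
    const i = this.row(key) * MAX_ACTIONS + actionIndex;
    this.regrets[i] = (this.regrets[i] ?? 0) + delta;
  }

  /** Regret matching: normalize positive regrets; uniform if none are positive. */
  strategy(key: bigint, actionCount: number): number[] {
    const base = this.row(key) * MAX_ACTIONS;
    const positive: number[] = [];
    let total = 0;
    for (let a = 0; a < actionCount; a++) {
      const p = Math.max(0, this.regrets[base + a] ?? 0);
      positive.push(p);
      total += p;
    }
    return positive.map((p) => (total > 0 ? p / total : 1 / actionCount));
  }
}
```

A flat typed array keeps the hot loop free of per-entry objects; the `Map` is only touched once per info-set lookup.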

Player and turn model

export type Player = "blue" | "red";

export type GamePhase =
  | { kind: "decision"; player: Player }   // active decision node
  | { kind: "chance" };                    // Nature acts

export interface GameTurn {
  phase: GamePhase;
  /** Simulated time at which this turn was reached. */
  tSec: number;
  /** Monotonically increasing decision counter; the `recentActions` cursor. */
  step: number;
}

Chance nodes are explicit and resolved by sampling from the Nature distribution — they do not bypass the dynamics function. The solver unrolls chance with sample averages; the eval-harness with a fixed RNG seed.
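A sketch of a seeded chance draw, under stated assumptions: `mulberry32` is a small public-domain PRNG chosen here only for illustration, and `sampleDiscrete` plus the example distribution are hypothetical names. The point is that the same seed reproduces the same sequence of Nature outcomes, which is what the eval-harness needs.

```typescript
// mulberry32: tiny seedable PRNG returning floats in [0, 1). Illustrative choice only.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface WeightedOutcome<T> {
  readonly value: T;
  readonly weight: number; // non-negative; weights need not sum to 1
}

/** Discrete weighted draw: walk the cumulative weights until the uniform sample is used up. */
function sampleDiscrete<T>(outcomes: readonly WeightedOutcome<T>[], rand: () => number): T {
  if (outcomes.length === 0) throw new Error("empty distribution");
  const total = outcomes.reduce((sum, o) => sum + o.weight, 0);
  let u = rand() * total;
  for (const o of outcomes) {
    u -= o.weight;
    if (u < 0) return o.value;
  }
  return outcomes[outcomes.length - 1]!.value; // rounding guard at the top edge
}

// Hypothetical named distribution: sensor confidence, uniform over four buckets.
const SENSOR_CONFIDENCE: readonly WeightedOutcome<number>[] = [
  { value: 0.3, weight: 1 },
  { value: 0.5, weight: 1 },
  { value: 0.7, weight: 1 },
  { value: 0.9, weight: 1 },
];

const rand = mulberry32(1234);
const draws = Array.from({ length: 5 }, () => sampleDiscrete(SENSOR_CONFIDENCE, rand));
console.log(draws); // identical on every run with seed 1234
```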


State factoring (public vs hidden)

This is the most important table in this memo. Everything downstream (belief update, infoset key, payoff) reads from this split.

| Field | Visibility | Source (wire) | Lives in |
| --- | --- | --- | --- |
| Track position (lat/lon/alt) | public | PositionReportMT | PublicState.tracks[i].position |
| Track severity | public | EntityNotificationMT.Severity | PublicState.tracks[i].severity |
| Platform threat-type | public (noisy) | EntityMT.Identity.Platform.ThreatType | PublicState.tracks[i].threatTypeObs |
| Threat-type confidence | public | EntityMT.Identity.Platform.Confidence | PublicState.tracks[i].threatConfObs |
| True identity | hidden from Blue | scenario YAML, never on wire as truth | HiddenState.trueIdentity[trackId] |
| True threat-type | hidden from Blue | scenario YAML | HiddenState.trueThreatType[trackId] |
| Subsystem state band | public | SubsystemStatusMT.state (CRITICAL/LOW/NORMAL/FULL) | PublicState.effectors[i].fuelBand |
| Exact fuel fraction | hidden from Red | internal | HiddenState.trueFuel[effectorId] |
| ROE band | public (retained) | uci-demo/world/roe | PublicState.roe |
| Comms-degrade window | public | uci-demo/world/degrade | PublicState.commsDegrade |
| Recent actions (last N) | public (on the wire) | EffectPlanCommandMT + EffectStatusMT | PublicState.recentActions (ring buffer, N=8) |

export interface PublicState {
  readonly roe: RoeBand;
  readonly commsDegrade: CommsDegradeState;
  readonly tracks: readonly PublicTrackInfo[];
  readonly effectors: readonly PublicEffectorInfo[];
  readonly recentActions: readonly ActionRecord[];
}

export interface HiddenState {
  /** What Red knows that Blue doesn't. */
  readonly trueIdentity: ReadonlyMap<string, Identity>;
  readonly trueThreatType: ReadonlyMap<string, ThreatType>;
  /** What Blue knows that Red doesn't. */
  readonly trueFuel: ReadonlyMap<string, number>;
}

export interface GameState {
  readonly turn: GameTurn;
  readonly publicState: PublicState;
  readonly hidden: HiddenState;
}

GameState is immutable. Every dynamics step returns a new instance. This matches the rest of the codebase (the world-mirror returns new TrackSnapshot objects rather than mutating in place) and keeps the states returned by apply() safe to cache or use as memoization keys in the solver.
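One way to make the split enforceable in code is to hand each player only the hidden slice it may read. The sketch below is illustrative: `HiddenView` and `hiddenViewFor` are hypothetical names, and the map value types are simplified to strings and numbers.

```typescript
type Player = "blue" | "red";

// Simplified stand-in for the memo's HiddenState (Identity / ThreatType reduced to string).
interface HiddenState {
  readonly trueIdentity: ReadonlyMap<string, string>;
  readonly trueThreatType: ReadonlyMap<string, string>;
  readonly trueFuel: ReadonlyMap<string, number>;
}

// Hypothetical: the part of HiddenState each viewer is allowed to see.
type HiddenView =
  | { readonly viewer: "blue"; readonly trueFuel: ReadonlyMap<string, number> }
  | {
      readonly viewer: "red";
      readonly trueIdentity: ReadonlyMap<string, string>;
      readonly trueThreatType: ReadonlyMap<string, string>;
    };

/** Blue knows its own exact fuel; Red knows true identity and threat type. */
function hiddenViewFor(hidden: HiddenState, viewer: Player): HiddenView {
  if (viewer === "blue") {
    return { viewer: "blue", trueFuel: hidden.trueFuel };
  }
  return {
    viewer: "red",
    trueIdentity: hidden.trueIdentity,
    trueThreatType: hidden.trueThreatType,
  };
}
```

With a discriminated union, an info-set encoder written against `HiddenView` cannot read `trueIdentity` on Blue's side without a type error.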


Bayesian identity belief

Blue does not see trueIdentity. Blue maintains a posterior over the identity enum for each track, updated as observations arrive.

Identity distribution

export type IdentityDistribution = Readonly<Record<Identity, number>>;

/**
 * First-contact prior. Not flat: weights the air-picture context so that
 * `UNKNOWN` carries the most mass before classification arrives.
 */
export const FIRST_CONTACT_PRIOR: IdentityDistribution = Object.freeze({
  UNKNOWN: 0.30,
  ASSUMED_FRIEND: 0.15,
  FRIEND: 0.10,
  NEUTRAL: 0.10,
  SUSPECT: 0.20,
  HOSTILE: 0.15,
});

The non-flat prior reflects the air-picture context: most aerial contacts are not engaged before some classification arrives, so UNKNOWN carries the largest mass at first detection. This is the prior used when no EntityNotificationMT or EntityMT has been received yet.

Observation likelihoods

| Observation | Source | Likelihood matrix |
| --- | --- | --- |
| severity | EntityNotificationMT | P(severity \| identity); 4 severities × 6 identities |
| threatTypeObs | EntityMT.Platform.ThreatType | P(observedThreatType \| trueThreatType) × indicator of identity-consistency |

The severity likelihood matrix used for Phase 0 (calibrated against the Tripwire scenario corpus; refined via self-play in week 5+):

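| Identity | INFO | ADV | CAUTION | WARNING |
| --- | --- | --- | --- | --- |
| UNKNOWN | 0.05 | 0.15 | 0.60 | 0.20 |
| ASSUMED_FRIEND | 0.10 | 0.70 | 0.18 | 0.02 |
| FRIEND | 0.15 | 0.80 | 0.04 | 0.01 |
| NEUTRAL | 0.10 | 0.65 | 0.20 | 0.05 |
| SUSPECT | 0.03 | 0.07 | 0.55 | 0.35 |
| HOSTILE | 0.01 | 0.02 | 0.17 | 0.80 |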

Rows sum to 1. The matrix is exported as a readonly constant and the update function is pure.
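A sketch of how the matrix might be written down as a constant with a row-sum guard. The ordering of `IDENTITIES`, the severity string values (`"ADV"` rather than a longer spelling), and `assertRowsNormalized` are assumptions for illustration; the numbers are the ones in the table.

```typescript
// Assumed enum orderings and spellings; the real ones come from the package's types.
const IDENTITIES = ["UNKNOWN", "ASSUMED_FRIEND", "FRIEND", "NEUTRAL", "SUSPECT", "HOSTILE"] as const;
const SEVERITIES = ["INFO", "ADV", "CAUTION", "WARNING"] as const;
type Identity = (typeof IDENTITIES)[number];
type Severity = (typeof SEVERITIES)[number];

type SeverityRow = Readonly<Record<Severity, number>>;

// P(severity | identity): one row per identity, each row sums to 1.
const SEVERITY_LIKELIHOOD: Readonly<Record<Identity, SeverityRow>> = Object.freeze({
  UNKNOWN:        { INFO: 0.05, ADV: 0.15, CAUTION: 0.60, WARNING: 0.20 },
  ASSUMED_FRIEND: { INFO: 0.10, ADV: 0.70, CAUTION: 0.18, WARNING: 0.02 },
  FRIEND:         { INFO: 0.15, ADV: 0.80, CAUTION: 0.04, WARNING: 0.01 },
  NEUTRAL:        { INFO: 0.10, ADV: 0.65, CAUTION: 0.20, WARNING: 0.05 },
  SUSPECT:        { INFO: 0.03, ADV: 0.07, CAUTION: 0.55, WARNING: 0.35 },
  HOSTILE:        { INFO: 0.01, ADV: 0.02, CAUTION: 0.17, WARNING: 0.80 },
});

/** Hypothetical guard: throws if any row drifts from 1 by more than eps. */
function assertRowsNormalized(eps = 1e-9): void {
  for (const id of IDENTITIES) {
    const row = SEVERITY_LIKELIHOOD[id];
    const sum = SEVERITIES.reduce((acc, sev) => acc + row[sev], 0);
    if (Math.abs(sum - 1) > eps) {
      throw new Error(`row ${id} sums to ${sum}, expected 1`);
    }
  }
}

assertRowsNormalized(); // passes for the table above
```

Running the guard at module load (or in belief.test.ts) catches a mistyped cell before it silently skews every posterior.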

Update rule

Standard Bayesian update:

posterior(i) ∝ likelihood(obs | i) · prior(i)

export function updateIdentityBelief(
  prior: IdentityDistribution,
  obs: { severity?: Severity; threatType?: ThreatType; threatConf?: number },
): IdentityDistribution {
  const unnormalized: Record<Identity, number> = {} as Record<Identity, number>;
  let sum = 0;
  for (const id of IDENTITIES) {
    let lik = 1;
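    // Explicit undefined check: a falsy enum value (e.g. INFO = 0) must still update the belief.
    if (obs.severity !== undefined) lik *= SEVERITY_LIKELIHOOD[id][obs.severity];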
    if (obs.threatType !== undefined) {
      lik *= threatTypeLikelihood(id, obs.threatType, obs.threatConf ?? 50);
    }
    const post = lik * prior[id];
    unnormalized[id] = post;
    sum += post;
  }
  if (sum === 0) return prior;          // pathological — degenerate likelihood, hold belief
  const out = {} as Record<Identity, number>;
  for (const id of IDENTITIES) out[id] = unnormalized[id] / sum;
  return Object.freeze(out);
}

Worked example

Prior: FIRST_CONTACT_PRIOR. Observation: severity=WARNING.

HOSTILE:        0.80 · 0.15 = 0.120
SUSPECT:        0.35 · 0.20 = 0.070
UNKNOWN:        0.20 · 0.30 = 0.060
NEUTRAL:        0.05 · 0.10 = 0.005
ASSUMED_FRIEND: 0.02 · 0.15 = 0.003
FRIEND:         0.01 · 0.10 = 0.001
Σ = 0.259

Posterior (normalized): HOSTILE 0.463, SUSPECT 0.270, UNKNOWN 0.232, NEUTRAL 0.019, ASSUMED_FRIEND 0.012, FRIEND 0.004.

Belief mass on HOSTILE | SUSPECT | UNKNOWN = 0.965. Matches the scripted agent's "hostile-like" gate; the solver will see the same shape and won't disagree on first-contact triage.
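The arithmetic above can be reproduced with a few lines. This is a standalone sketch restricted to the WARNING column; `bayesStep` is a hypothetical helper that mirrors the update rule, not the package's `updateIdentityBelief`.

```typescript
const PRIOR = {
  UNKNOWN: 0.30, ASSUMED_FRIEND: 0.15, FRIEND: 0.10,
  NEUTRAL: 0.10, SUSPECT: 0.20, HOSTILE: 0.15,
} as const;

// P(WARNING | identity), the last column of the severity matrix.
const P_WARNING = {
  UNKNOWN: 0.20, ASSUMED_FRIEND: 0.02, FRIEND: 0.01,
  NEUTRAL: 0.05, SUSPECT: 0.35, HOSTILE: 0.80,
} as const;

type Identity = keyof typeof PRIOR;
type Dist = Record<Identity, number>;
const IDS = Object.keys(PRIOR) as Identity[];

/** posterior(i) = likelihood(i) * prior(i) / sum over j of likelihood(j) * prior(j) */
function bayesStep(prior: Readonly<Dist>, likelihood: Readonly<Dist>): Dist {
  const evidence = IDS.reduce((acc, id) => acc + likelihood[id] * prior[id], 0);
  const out = {} as Dist;
  for (const id of IDS) out[id] = (likelihood[id] * prior[id]) / evidence;
  return out;
}

const post = bayesStep(PRIOR, P_WARNING);
for (const id of IDS) console.log(id.padEnd(15), post[id].toFixed(3));
// HOSTILE 0.463, SUSPECT 0.270, UNKNOWN 0.232,
// NEUTRAL 0.019, ASSUMED_FRIEND 0.012, FRIEND 0.004

const hostileLike = post.HOSTILE + post.SUSPECT + post.UNKNOWN;
console.log(hostileLike.toFixed(3)); // 0.965

// A second WARNING pushes HOSTILE mass up again.
const post2 = bayesStep(post, P_WARNING);
console.log(post2.HOSTILE > post.HOSTILE); // true
```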

Why these numbers

The likelihood matrix is a single-page commitment, but it has three defensible properties:

  1. Diagonal dominance for clean cases. P(WARNING|HOSTILE) = 0.80 and P(ADVISORY|FRIEND) = 0.80 — strong but not perfect channels. Reflects real sensor systems: high agreement, never certain.
  2. Asymmetric noise toward CAUTION. UNKNOWN carries 0.60 mass on CAUTION — the "I don't know yet" channel is the dominant noise mode.
  3. Negligible mass for impossible-looking observations. P(WARNING|FRIEND) = 0.01 — not zero (a malfunctioning sensor or a misclassified friendly can warning-flag), but small enough that one observation is decisive.

These numbers are version-tagged (BELIEF_V = 1) so future tuning doesn't silently invalidate cached blueprints.


Payoff function

The plan specified the weights; this section confirms them, adds the counter-extraction contract, and works three end-game examples.

export const PAYOFF_V = 1;

export interface PayoffCounters {
  /** EntityLostMT for trackIds with trueIdentity ∈ {HOSTILE, SUSPECT}. */
  neutralizedHostiles: number;
  /** EntityLostMT for trackIds with trueIdentity = FRIEND inside any CapabilityCoverageAreaMT polygon. */
  fratricideEvents: number;
  /** Proposals where the chosen effect violates the retained ROE band. */
  roeViolations: number;
  /** Fuel burned over the episode, derived from SubsystemStatusMT.state. Units: fuel fraction, summed across all effectors (the unit the worked examples use). */
  fuelBurnedTotal: number;
  /** EffectStatusMT.state = FAILED events. */
  failedEffects: number;
  /** Integral over the uci-demo/world/degrade window where dropPercent > 0. */
  commsDegradeSeconds: number;
  /** Mean wall-time per decision the copilot spent in evaluate(), in ms; each call is capped at 5s. */
  meanTimeToDecisionMs: number;
}

export function bluePayoff(c: PayoffCounters): number {
  return (
    +1.000 * c.neutralizedHostiles
    -5.000 * c.fratricideEvents
    -0.200 * c.roeViolations
    -0.050 * c.fuelBurnedTotal
    -0.300 * c.failedEffects
    -0.001 * c.commsDegradeSeconds
    -0.002 * (c.meanTimeToDecisionMs / 1000)
  );
}

export function redPayoff(c: PayoffCounters): number {
  return -bluePayoff(c);  // zero-sum
}

Worked examples

Example A — Clean Tripwire run. 3 hostiles neutralized, 0 fratricide, 0 ROE violations, 0.4 fuel-fraction burned across two effectors, 0 failed effects, 0 comms degrade, 2400ms mean decision time.

U_B = 3.000 - 0 - 0 - 0.050·0.4 - 0 - 0 - 0.002·2.4
    = 3.000 - 0.020 - 0.005
    = +2.975

Example B — Fratricide. 2 hostiles neutralized, 1 fratricide event.

U_B = 2.000 - 5.000 - ...
    = -3.000  (fratricide alone outweighs every gain)

This is the property the 5:1 weight ratio targets: a single fratricide cancels five neutralized hostiles, so it is worse than walking past 4 confirmed threats. This is by design.

Example C — ROE violation spike. 3 hostiles neutralized but with 6 proposals that violated GREEN ROE (the operator kept approving them).

U_B = 3.000 - 0 - 0.200·6 - ... = 3.000 - 1.200 = +1.800

Cuts the win by 40%. Operator override is allowed; the cost is logged and the solver learns that strategies relying on ROE violations are fragile to a stricter operator.
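The three examples can be checked mechanically. The sketch below restates the payoff weights in a standalone form; `counters` is a hypothetical helper that zero-fills the fields an example leaves out.

```typescript
interface PayoffCounters {
  neutralizedHostiles: number;
  fratricideEvents: number;
  roeViolations: number;
  fuelBurnedTotal: number;
  failedEffects: number;
  commsDegradeSeconds: number;
  meanTimeToDecisionMs: number;
}

function bluePayoff(c: PayoffCounters): number {
  return (
    1.0 * c.neutralizedHostiles -
    5.0 * c.fratricideEvents -
    0.2 * c.roeViolations -
    0.05 * c.fuelBurnedTotal -
    0.3 * c.failedEffects -
    0.001 * c.commsDegradeSeconds -
    0.002 * (c.meanTimeToDecisionMs / 1000)
  );
}

const redPayoff = (c: PayoffCounters): number => -bluePayoff(c);

/** Hypothetical helper: every counter defaults to 0. */
function counters(partial: Partial<PayoffCounters>): PayoffCounters {
  return {
    neutralizedHostiles: 0, fratricideEvents: 0, roeViolations: 0, fuelBurnedTotal: 0,
    failedEffects: 0, commsDegradeSeconds: 0, meanTimeToDecisionMs: 0,
    ...partial,
  };
}

const exampleA = counters({ neutralizedHostiles: 3, fuelBurnedTotal: 0.4, meanTimeToDecisionMs: 2400 });
const exampleB = counters({ neutralizedHostiles: 2, fratricideEvents: 1 });
const exampleC = counters({ neutralizedHostiles: 3, roeViolations: 6 });

console.log(bluePayoff(exampleA).toFixed(3)); // 2.975
console.log(bluePayoff(exampleB).toFixed(3)); // -3.000 (small terms set to zero)
console.log(bluePayoff(exampleC).toFixed(3)); // 1.800
```

Comparing with `toFixed(3)` or a tolerance (rather than `===`) matters in payoff.test.ts, since terms like 0.2 · 6 are not exact in binary floating point.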

Edge cases

  • Negative payoff floor. No floor. A catastrophically bad blue policy can register U_B < -10 (multiple fratricides + ROE violations). The solver's regret normalization handles unbounded payoffs natively.
  • Ties. If c.neutralizedHostiles + c.fratricideEvents == 0, payoff is the sum of the remaining penalty terms (typically just fuel + decision time), so it is never positive and usually small. A solver that just sits there earns ≈ -0.01 per episode.
  • Counter extraction is offline. scoreReplay.ts in the eval-harness reads the bus log + audit feed for an episode and produces a PayoffCounters. Solver-daemon's online estimate uses the same function over its worldMirror-derived counters.

InfoSet key encoding

The regret table's key. Two info-sets that present identical observations to a player must map to the same bigint key, deterministically, with collision probability vanishing on a tactical-scale game tree.

Canonical encoding (Blue viewer)

| Bytes | Field | Encoding |
| --- | --- | --- |
| 0 | Version | 0x01 (BELIEF_V × INFOSET_V) |
| 1 | Viewer | 0x00 Blue / 0x01 Red |
| 2 | ROE band | 0x00 GREEN / 0x01 AMBER / 0x02 RED |
| 3 | Comms degrade bucket | 0x00 none / 0x01 light (drop<30%) / 0x02 heavy (drop≥30%) |
| 4 | Track count | uint8 |
| 5..(5+T·5−1) | Per-track block, T = trackCount. Sorted by trackId lexicographically | See below |
| (5+T·5) | Effector count | uint8 |
| (5+T·5+1).. | Per-effector block | See below |
| last 16 | Recent-actions ring (last 8, 2 bytes each) | (actionTypeId<<8) \| actionParamHash |

Per-track block (5 bytes):

| Bits | Field | Encoding |
| --- | --- | --- |
| 0..3 | Identity-belief most-likely bucket | Index into IDENTITIES (0..5) |
| 4..7 | Identity-belief confidence decile | floor(maxIdentityProb · 10) clipped to 9 |
| 8..15 | ThreatType bucket | 0=N/A, 1=JAMMER, 2=MANNED_AIRCRAFT, 3=MISSILE, 4=AAA, 5=IED, 6=UNKNOWN_THREAT, 7=OTHER |
| 16..23 | Severity bucket | INFO=0 / ADV=1 / CAUTION=2 / WARNING=3 |
| 24..31 | Range bucket | min(floor(rangeM/500), 31) (0-31 buckets of 500m each, capped at 15.5km) |
| 32..39 | Closing-rate bucket | signed: 128+floor(closingMps/2) clipped |

Per-effector block (1 byte): fuel band (CRITICAL=0 / LOW=1 / NORMAL=2 / FULL=3) in low 2 bits; remaining bits reserved.
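A sketch of the per-track part of the encoder, under two assumptions the tables do not settle: bit 0 is the least-significant bit of the block's first byte, and "lexicographically" means code-unit string order (not locale order). `TrackBuckets`, `encodeTrackBlock`, and `encodeTracks` are hypothetical names.

```typescript
interface TrackBuckets {
  trackId: string;
  identityIndex: number;    // index into IDENTITIES, 0..5
  maxIdentityProb: number;  // probability of the most-likely identity, 0..1
  threatTypeBucket: number; // 0..7
  severityBucket: number;   // 0..3
  rangeM: number;
  closingMps: number;       // sign convention is an open question in this memo
}

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

/** Five bytes per track, following the bit table above. */
function encodeTrackBlock(t: TrackBuckets): Uint8Array {
  const decile = clamp(Math.floor(t.maxIdentityProb * 10), 0, 9);
  const out = new Uint8Array(5);
  out[0] = (t.identityIndex & 0x0f) | (decile << 4);          // bits 0..3 and 4..7
  out[1] = t.threatTypeBucket & 0xff;                         // bits 8..15
  out[2] = t.severityBucket & 0xff;                           // bits 16..23
  out[3] = clamp(Math.floor(t.rangeM / 500), 0, 31);          // bits 24..31
  out[4] = clamp(128 + Math.floor(t.closingMps / 2), 0, 255); // bits 32..39
  return out;
}

/** Canonical order: sort by trackId with plain code-unit comparison, then concatenate. */
function encodeTracks(tracks: readonly TrackBuckets[]): Uint8Array {
  const sorted = [...tracks].sort((a, b) =>
    a.trackId < b.trackId ? -1 : a.trackId > b.trackId ? 1 : 0,
  );
  const out = new Uint8Array(sorted.length * 5);
  sorted.forEach((t, i) => out.set(encodeTrackBlock(t), i * 5));
  return out;
}

const hostile: TrackBuckets = {
  trackId: "trk-002", identityIndex: 5, maxIdentityProb: 0.463,
  threatTypeBucket: 3, severityBucket: 3, rangeM: 7250, closingMps: -40,
};
console.log(Array.from(encodeTrackBlock(hostile))); // [69, 3, 3, 14, 108]
```

Avoiding `localeCompare` in the sort keeps the byte order independent of the host's locale settings, which the determinism requirement depends on.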

Hash

const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME  = 0x100000001b3n;
const MASK_64    = 0xffffffffffffffffn;

export function fnv1a64(bytes: Uint8Array): bigint {
  let h = FNV_OFFSET;
  for (let i = 0; i < bytes.length; i++) {
    h = (h ^ BigInt(bytes[i]!)) & MASK_64;
    h = (h * FNV_PRIME) & MASK_64;
  }
  return h;
}

export function infoSetKey(state: GameState, viewer: Player): bigint {
  const bytes = encodeInfoSet(state, viewer);  // returns Uint8Array per the table above
  return fnv1a64(bytes);
}
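For the fixed-vector test, two well-known FNV-1a 64-bit values are enough to pin the implementation: the empty input hashes to the offset basis, and the single byte "a" (0x61) hashes to 0xaf63dc4c8601ec8c. The sketch below repeats the hash so it runs standalone; `ascii` is a hypothetical helper.

```typescript
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;

function fnv1a64(bytes: Uint8Array): bigint {
  let h = FNV_OFFSET;
  for (let i = 0; i < bytes.length; i++) {
    h = (h ^ BigInt(bytes[i]!)) & MASK_64; // XOR the byte in first (the "1a" order)
    h = (h * FNV_PRIME) & MASK_64;         // then multiply, truncated to 64 bits
  }
  return h;
}

/** ASCII-only string to bytes; enough for test vectors. */
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0x7f;
  return out;
}

console.log(fnv1a64(ascii("")).toString(16));  // cbf29ce484222325
console.log(fnv1a64(ascii("a")).toString(16)); // af63dc4c8601ec8c
```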

Why FNV-1a-64

  • Deterministic across V8 versions and ARM/x64: pure integer math; no floating-point intermediate.
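  • Cheap: one XOR + one BigInt-multiply per byte over ~30-50 bytes per call. The 60-100 ns per-call figure is an estimate to confirm with a benchmark, since BigInt arithmetic in V8 allocates.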
  • Acceptable distribution for our use: FNV-1a passes SMHasher's bias tests for short keys. We are not using it as a cryptographic primitive or as a randomness source; it just needs to be a uniform-enough mapping for our ~10³-10⁴ tactical info-sets.
  • Collision analysis: with N info-sets, expected first collision at N≈√(2·2⁶⁴) ≈ 6·10⁹. Per pair, collision probability is 2⁻⁶⁴ ≈ 5·10⁻²⁰. Across the full tactical table (N = 10³, about N²/2 = 5·10⁵ pairs): ≈ 5·10⁵/2⁶⁴ ≈ 3·10⁻¹⁴; at N = 10⁴, ≈ 3·10⁻¹². Negligible.
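The birthday-bound arithmetic is easy to check numerically. The sketch below uses the pair-count approximation p ≈ N(N−1)/2 / 2⁶⁴, which is accurate while p is far below 1 and assumes the hash behaves uniformly; `collisionProbability` is a hypothetical name.

```typescript
const KEY_SPACE = 2 ** 64; // number of distinct 64-bit keys, as a double

/** Approximate probability of at least one collision among n uniformly hashed keys. */
function collisionProbability(n: number): number {
  const pairs = (n * (n - 1)) / 2;
  return pairs / KEY_SPACE;
}

console.log(collisionProbability(1e3).toExponential(1)); // 2.7e-14
console.log(collisionProbability(1e4).toExponential(1)); // 2.7e-12
console.log(Math.sqrt(2 * KEY_SPACE).toExponential(1));  // 6.1e+9
```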

Bucketing target

The plan named this: "Bucketing holds tactical info-set count near 10³." The encoding above achieves it by:

  • 6 identity buckets × 10 confidence deciles = 60 identity states per track
  • 8 threat-type buckets × 4 severity buckets = 32 platform states per track
  • 32 range buckets × 256 closing-rate buckets = 8192 kinematic states per track

Tactical scenario: ~5 tracks at once, but the regret table only sees distinct info-sets the solver actually reaches in self-play. Empirical upper bound from Tripwire-shaped abstractions: 1500-3000 reached info-sets per scenario. Within the 10³-10⁴ planning range.

If the count blows past 10⁴ in scaling experiments, the first bucketing coarsening is on range (drop to 16 buckets of 1000m each) and closing-rate (drop to 32 signed buckets). The infoset version byte will bump to 0x02 when this happens.
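For scale, the raw per-track state count before and after that coarsening can be computed directly. This is plain arithmetic over the bucket counts named above; it bounds the encodable states per track, not the reached info-sets.

```typescript
const product = (xs: readonly number[]): number => xs.reduce((a, b) => a * b, 1);

// identity × confidence decile × threat type × severity × range × closing rate
const currentPerTrack = product([6, 10, 8, 4, 32, 256]);
const coarsePerTrack = product([6, 10, 8, 4, 16, 32]);

console.log(currentPerTrack);                  // 15728640
console.log(coarsePerTrack);                   // 983040
console.log(currentPerTrack / coarsePerTrack); // 16
```

The 16× reduction comes entirely from the kinematic buckets (8192 down to 512), which is why range and closing rate are the first candidates for coarsening.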


GameDynamics

export interface GameDynamics {
  /** Legal actions for the current decision player. Empty if state.turn.phase.kind === "chance" — caller handles chance via Nature. */
  legalActions(state: GameState): readonly Action[];
  /** Deterministic transition for decision nodes; for chance nodes the caller supplies the resolved outcome. */
  apply(state: GameState, action: Action): GameState;
  /** Resolve a chance node by sampling from its named distribution. */
  resolveChance(state: GameState, rand: () => number): GameState;
  /** Terminal predicate. True at end of scenario loop or when payoff-counter sum has converged. */
  isTerminal(state: GameState): boolean;
  /** Final payoff counters when isTerminal(state). */
  finalCounters(state: GameState): PayoffCounters;
}

The dynamics function reads from PublicState + HiddenState and emits a new GameState. It does not publish to the bus; that's the world-sim service's job in the live demo. The solver-daemon and the eval-harness call apply() directly without going through MQTT, so the same dynamics power both online self-play and offline scoring.
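A sketch of how a caller might drive this interface in a rollout loop. Everything here is a toy stand-in: `ToyState`, `ToyDynamics`, the two-action game, and the 0.7 success probability are invented for illustration and only mirror the shape of GameDynamics.

```typescript
type Phase = { kind: "decision"; player: "blue" | "red" } | { kind: "chance" };
type ToyAction = "hold" | "engage";

interface ToyState {
  readonly phase: Phase;
  readonly step: number;
  readonly neutralized: number;
  readonly pendingEngage: boolean;
}

interface ToyDynamics {
  legalActions(s: ToyState): readonly ToyAction[];
  apply(s: ToyState, a: ToyAction): ToyState;
  resolveChance(s: ToyState, rand: () => number): ToyState;
  isTerminal(s: ToyState): boolean;
}

// Blue decides, then Nature resolves the effect outcome; three rounds per episode.
const toy: ToyDynamics = {
  legalActions: (s) => (s.phase.kind === "decision" ? ["hold", "engage"] : []),
  apply: (s, a) => ({ ...s, phase: { kind: "chance" }, pendingEngage: a === "engage" }),
  resolveChance: (s, rand) => ({
    phase: { kind: "decision", player: "blue" },
    step: s.step + 1,
    neutralized: s.neutralized + (s.pendingEngage && rand() < 0.7 ? 1 : 0),
    pendingEngage: false,
  }),
  isTerminal: (s) => s.step >= 3,
};

/** Play one episode: chance nodes go to resolveChance, decision nodes to the policy. */
function rollout(
  dyn: ToyDynamics,
  start: ToyState,
  pick: (s: ToyState, legal: readonly ToyAction[]) => ToyAction,
  rand: () => number,
): ToyState {
  let s = start;
  while (!dyn.isTerminal(s)) {
    if (s.phase.kind === "chance") {
      s = dyn.resolveChance(s, rand);
      continue;
    }
    const legal = dyn.legalActions(s);
    if (legal.length === 0) throw new Error("decision node with no legal actions");
    s = dyn.apply(s, pick(s, legal)); // returns a new state; `start` is never mutated
  }
  return s;
}

const start: ToyState = {
  phase: { kind: "decision", player: "blue" }, step: 0, neutralized: 0, pendingEngage: false,
};
const end = rollout(toy, start, () => "engage", () => 0.5);
console.log(end.neutralized); // 3
```

Passing `rand` in (instead of calling `Math.random` inside the dynamics) is what lets the same loop serve sampled self-play and seeded offline scoring.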

For the live demo (one process, one operator), apply() is a derived projection over the bus message stream that already flows — the world-sim publishes, the world-mirror in copilot/eval-harness ingests, and the GameState snapshot is built on demand from the mirror.


TypeScript module layout

packages/uci-game/src/
├── index.ts              # re-exports
├── types.ts              # existing — Identity, EffectType, AssetSnapshot, TrackSnapshot, WorldSnapshot
├── game.ts               # NEW — GameState, GameTurn, GamePhase, Player, Action, ActionRecord
├── worldMirror.ts        # existing — createWorldMirror, DEMO_ASSET_ROSTER, WorldMirror
├── belief.ts             # NEW — IdentityDistribution, SEVERITY_LIKELIHOOD, updateIdentityBelief, FIRST_CONTACT_PRIOR, BELIEF_V
├── payoff.ts             # NEW — PayoffCounters, bluePayoff, redPayoff, PAYOFF_V
├── hash.ts               # NEW — fnv1a64, encodeInfoSet, infoSetKey, INFOSET_V
├── dynamics.ts           # NEW — GameDynamics interface + the tactical impl built from scenario events + worldMirror state
└── publicState.ts        # NEW — derive PublicState / HiddenState from a worldMirror + scenario truth

Tests under packages/uci-game/test/:

  • belief.test.ts: every cell of the likelihood matrix; worked example from this memo; degenerate likelihood (sum=0) preserves prior; monotone-update property (additional WARNING observation never decreases HOSTILE mass).
  • payoff.test.ts: each of the three worked examples; zero-sum property (bluePayoff + redPayoff = 0); monotonicity of every weight.
  • hash.test.ts: fnv1a64 reproduces a fixed test vector; the same GameState viewed by the same player always produces the same key; permuting track-id order does not change the key (canonical sort); changing one bucketed field changes the key.
  • dynamics.test.ts: apply(s, legalActions(s)[0]) advances the turn counter; isTerminal is stable (idempotent); resolveChance with a fixed seed produces a deterministic next state.

Tests use only synthetic fixtures — no MQTT, no codec. The package remains a pure-domain workspace member.


Acceptance criteria for the fill-out PR(s)

  1. All new modules typecheck.
  2. pnpm --filter @uci-demo/game test passes with the suites above.
  3. BELIEF_V = 1, PAYOFF_V = 1, INFOSET_V = 1 constants are exported and present in the test fixtures.
  4. The package's public API (packages/uci-game/src/index.ts) exports exactly the types and functions referenced from this memo — no surprises for the upcoming @uci-demo/solver consumer.
  5. pnpm up smoke unaffected. The copilot still consumes only createWorldMirror from this package; the new game/dynamics/belief surface is consumed by the solver-daemon (next workstream), not by the copilot in weeks 2-4.

What this memo deliberately does not specify

  • Strategy bank softmax weighting. Lives in @uci-demo/solver (week 3-6), not in @uci-demo/game. The game package owns the substrate; the solver owns the search policy on top.
  • Subroutine implementations. Same — packages/uci-solver/src/subroutines/ is solver-side code that consumes a WorldSnapshot + IdentityDistribution and emits action preferences. Out of scope here.
  • Action-space pruning heuristics. Performance work that lives with the solver. The game package's legalActions returns all legal actions; the solver prunes if it wants to.
  • Operational-scale (T3) representation. This memo is for tactical T1 only. T3 (30 effectors / 100 tracks / 6h) needs subgame-resolving and is addressed in a separate memo when week 8-10 scaling work lands.

Open questions to resolve before code lands

  1. Closing-rate sign convention. The scenario YAML uses closingMps < 0 for closing-toward-FOB. The infoset key encoding above assumes the same convention. Confirm during code review.
  2. Threat-type likelihood matrix. The severity matrix is specified above; the threat-type matrix is not. Best path: derive from services/world-sim/src/sim.ts:621-680 — what threatType the world-sim actually emits per identity. One worked-out matrix per scenario family (Tripwire, Vanguard, Stillwater) probably suffices.
  3. Chance-node enumeration. Sensor confidence is currently sampled from a per-spawn YAML value. For solver use, we need a small discrete distribution (e.g. {0.3, 0.5, 0.7, 0.9} weighted). The exact weights are calibration parameters; first cut is uniform-over-buckets and refine via empirical fit.
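  4. Action canonicalization for the recent-actions ring. The 2-byte per-action encoding above is shorthand for (actionTypeId, paramHash). The paramHash needs to be defined. Because the slot is (actionTypeId<<8) | actionParamHash, only the low 8 bits are available for it, so a 16-bit hash would overwrite actionTypeId; a likely choice is FNV-1a folded down to 8 bits over the (effect, effector) tuple for Blue and (trackId, threatType) for Red. Confirm during code review.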

These are knowable; they just need a half-hour of empirical fitting and one round of code review. None block writing the rest of the package.