Design: @uci-demo/game fill-out (Phase 0 weeks 2-4)
Pre-code design memo for the domain-types package introduced in
#29. The package currently
holds the world-mirror types and the projection logic. This memo specifies
what gets added in weeks 2-4 of sbir-osw26bz02-dv004-game-theoretic-coa.md:
GameState, InformationSet, PublicBeliefState, Payoff, GameDynamics,
and the Bayesian belief update.
The point of writing this first is to fix the math and the encoding before the solver consumes them. The plan named these artifacts; this memo says what their concrete TypeScript shape and numerical behavior must be.
Game model
Two-player zero-sum imperfect-information game with explicit chance nodes.
| Player | Acts at | Action space (tactical, T1 scale) |
|---|---|---|
| Blue (defender) | New contact, replan-after-failure, periodic check | WITHHOLD ∪ (EffectType × EffectorId) — typically 1 + 5×2 = 11 actions |
| Red (attacker) | Spawn opportunities, per-track maneuver ticks | (spawnLocation × threatType × identityMask) ∪ (maneuverDelta) ∪ WITHDRAW — typically 3–5 actions per opportunity |
| Chance (Nature) | Sensor confidence, effect outcomes, comms-degrade events | Discrete weighted draws from named distributions |
The action space at T1 is small enough that the regret table is keyed by
(infoSetKey, actionIndex) and stored as a Float32Array. Scaling to T2
(10 effectors / 30 tracks / 1h) keeps this representation; T3 demands subgame
resolving and is out of scope for this memo.
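As a rough sketch of that layout (not the package API): a `Map` from info-set key to row index, plus one `Float32Array` that holds `MAX_ACTIONS` regret slots per visited info-set. `RegretTable` and `MAX_ACTIONS` are hypothetical names, and 11 is taken from the Blue row above. The `strategy` method assumes standard regret matching, which this memo leaves to the solver.

```typescript
// Hypothetical layout: one row of MAX_ACTIONS float32 slots per visited info-set.
const MAX_ACTIONS = 11; // Blue's T1 count: WITHHOLD + 5 effect types x 2 effectors

class RegretTable {
  private readonly rowOf = new Map<bigint, number>(); // infoSetKey -> row index
  private data = new Float32Array(64 * MAX_ACTIONS); // grows by doubling

  /** Row for `key`, allocating a zeroed row (and growing storage) on first visit. */
  private row(key: bigint): number {
    let r = this.rowOf.get(key);
    if (r === undefined) {
      r = this.rowOf.size;
      this.rowOf.set(key, r);
      if ((r + 1) * MAX_ACTIONS > this.data.length) {
        const grown = new Float32Array(this.data.length * 2);
        grown.set(this.data); // keep existing rows at the same offsets
        this.data = grown;
      }
    }
    return r;
  }

  /** Accumulate `delta` into the regret of (key, actionIndex). */
  add(key: bigint, actionIndex: number, delta: number): void {
    if (actionIndex < 0 || actionIndex >= MAX_ACTIONS) {
      throw new RangeError(`actionIndex ${actionIndex} out of range`);
    }
    const i = this.row(key) * MAX_ACTIONS + actionIndex;
    this.data[i] = this.data[i]! + delta;
  }

  /** Cumulative regret; 0 for info-sets never visited. */
  get(key: bigint, actionIndex: number): number {
    const r = this.rowOf.get(key);
    return r === undefined ? 0 : this.data[r * MAX_ACTIONS + actionIndex]!;
  }

  /** Assumed regret matching: play in proportion to positive regret, uniform if none is positive. */
  strategy(key: bigint, numLegal: number): number[] {
    const pos = Array.from({ length: numLegal }, (_, a) => Math.max(0, this.get(key, a)));
    const total = pos.reduce((s, x) => s + x, 0);
    return total > 0 ? pos.map((x) => x / total) : pos.map(() => 1 / numLegal);
  }

  get size(): number {
    return this.rowOf.size;
  }
}
```

Doubling the backing array keeps `add` amortized O(1) while each row stays contiguous, which is the main appeal of a flat typed array over one object per info-set.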
Player and turn model
export type Player = "blue" | "red";
export type GamePhase =
| { kind: "decision"; player: Player } // active decision node
| { kind: "chance" }; // Nature acts
export interface GameTurn {
phase: GamePhase;
/** Simulated time at which this turn was reached. */
tSec: number;
/** Monotonically increasing decision counter; the `recentActions` cursor. */
step: number;
}
Chance nodes are explicit and resolved by sampling from the Nature distribution; they do not bypass the dynamics function. The solver unrolls chance with sample averages; the eval-harness samples with a fixed RNG seed.
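A minimal sketch of the eval-harness side, assuming a seeded PRNG feeds the `rand: () => number` argument that `GameDynamics.resolveChance` takes. mulberry32 is a swapped-in choice (the memo does not name a generator), and `NatureDistribution` / `sampleNature` are hypothetical names for a named discrete distribution and its inverse-CDF draw.

```typescript
// mulberry32: small seeded 32-bit PRNG returning floats in [0, 1). Illustrative choice only.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical shape for a named Nature distribution: parallel outcome / weight arrays.
interface NatureDistribution<T> {
  readonly outcomes: readonly T[];
  readonly weights: readonly number[]; // non-negative; need not sum to 1
}

/** Inverse-CDF draw: subtract weights until the scaled uniform falls inside one. */
function sampleNature<T>(dist: NatureDistribution<T>, rand: () => number): T {
  const total = dist.weights.reduce((s, w) => s + w, 0);
  let u = rand() * total;
  for (let i = 0; i < dist.outcomes.length; i++) {
    u -= dist.weights[i]!;
    if (u < 0) return dist.outcomes[i]!;
  }
  // Reached only through float round-off or all-zero weights.
  return dist.outcomes[dist.outcomes.length - 1]!;
}

// Hypothetical first-cut sensor-confidence buckets with uniform weights.
const SENSOR_CONFIDENCE: NatureDistribution<number> = {
  outcomes: [0.3, 0.5, 0.7, 0.9],
  weights: [1, 1, 1, 1],
};
```

Two runs built from `mulberry32(seed)` with the same seed draw identical outcome sequences, which is the property the eval-harness needs; the solver side would instead average over many such draws.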
State factoring (public vs hidden)
This is the most important table in this memo. Everything downstream (belief update, infoset key, payoff) reads from this split.
| Field | Visibility | Source (wire) | Lives in |
|---|---|---|---|
| Track position (lat/lon/alt) | public | PositionReportMT | PublicState.tracks[i].position |
| Track severity | public | EntityNotificationMT.Severity | PublicState.tracks[i].severity |
| Platform threat-type | public (noisy) | EntityMT.Identity.Platform.ThreatType | PublicState.tracks[i].threatTypeObs |
| Threat-type confidence | public | EntityMT.Identity.Platform.Confidence | PublicState.tracks[i].threatConfObs |
| True identity | hidden from Blue | scenario YAML, never on wire as truth | HiddenState.trueIdentity[trackId] |
| True threat-type | hidden from Blue | scenario YAML | HiddenState.trueThreatType[trackId] |
| Subsystem state band | public | SubsystemStatusMT.state (CRITICAL/LOW/NORMAL/FULL) | PublicState.effectors[i].fuelBand |
| Exact fuel fraction | hidden from Red | internal | HiddenState.trueFuel[effectorId] |
| ROE band | public (retained) | uci-demo/world/roe | PublicState.roe |
| Comms-degrade window | public | uci-demo/world/degrade | PublicState.commsDegrade |
| Recent actions (last N) | public (on the wire) | EffectPlanCommandMT + EffectStatusMT | PublicState.recentActions (ring buffer, N=8) |
export interface PublicState {
readonly roe: RoeBand;
readonly commsDegrade: CommsDegradeState;
readonly tracks: readonly PublicTrackInfo[];
readonly effectors: readonly PublicEffectorInfo[];
readonly recentActions: readonly ActionRecord[];
}
export interface HiddenState {
/** What Red knows that Blue doesn't. */
readonly trueIdentity: ReadonlyMap<string, Identity>;
readonly trueThreatType: ReadonlyMap<string, ThreatType>;
/** What Blue knows that Red doesn't. */
readonly trueFuel: ReadonlyMap<string, number>;
}
export interface GameState {
readonly turn: GameTurn;
readonly publicState: PublicState;
readonly hidden: HiddenState;
}
GameState is immutable: every dynamics step returns a new instance.
This matches the rest of the codebase (the world-mirror returns new
TrackSnapshot objects rather than mutating in place) and makes the states
returned by the solver's apply() safe to use as memoization keys.
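The same rule applies to the `recentActions` ring (N = 8) from the state-factoring table. A minimal sketch, assuming a simplified hypothetical `ActionRecord` and oldest-first ordering: appending returns a new frozen array and drops the oldest entry, so a `PublicState` built earlier never changes underneath a memo table.

```typescript
// Hypothetical, simplified record; the real ActionRecord is defined in game.ts.
interface ActionRecord {
  readonly step: number;
  readonly actionTypeId: number;
  readonly paramHash: number;
}

const RECENT_ACTIONS_N = 8;

/** Returns a new frozen ring of at most n records, oldest first. Never mutates `ring`. */
function appendRecentAction(
  ring: readonly ActionRecord[],
  rec: ActionRecord,
  n: number = RECENT_ACTIONS_N,
): readonly ActionRecord[] {
  const next = [...ring, Object.freeze({ ...rec })];
  return Object.freeze(next.length > n ? next.slice(next.length - n) : next);
}
```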
Bayesian identity belief
Blue does not see trueIdentity. Blue maintains a posterior over the
identity enum for each track, updated as observations arrive.
Identity distribution
export type IdentityDistribution = Readonly<Record<Identity, number>>;
/**
* First-contact prior. Not flat: weights the air-picture context so that
* `UNKNOWN` carries the most mass before classification arrives.
*/
export const FIRST_CONTACT_PRIOR: IdentityDistribution = Object.freeze({
UNKNOWN: 0.30,
ASSUMED_FRIEND: 0.15,
FRIEND: 0.10,
NEUTRAL: 0.10,
SUSPECT: 0.20,
HOSTILE: 0.15,
});
The non-flat prior reflects the air-picture context: most aerial contacts
are not engaged before some classification arrives, so UNKNOWN carries
the largest mass at first detection. This is the prior used when no
EntityNotificationMT or EntityMT has been received yet.
Observation likelihoods
| Observation | Source | Likelihood matrix |
|---|---|---|
| severity | EntityNotificationMT | P(severity \| identity): 4 severities × 6 identities |
| threatTypeObs | EntityMT.Identity.Platform.ThreatType | P(observedThreatType \| trueThreatType) × indicator of identity-consistency |
The severity likelihood matrix used for Phase 0 (calibrated against the Tripwire scenario corpus; refined via self-play in week 5+):
| Identity | INFO | ADV | CAUTION | WARNING |
|---|---|---|---|---|
| UNKNOWN | 0.05 | 0.15 | 0.60 | 0.20 |
| ASSUMED_FRIEND | 0.10 | 0.70 | 0.18 | 0.02 |
| FRIEND | 0.15 | 0.80 | 0.04 | 0.01 |
| NEUTRAL | 0.10 | 0.65 | 0.20 | 0.05 |
| SUSPECT | 0.03 | 0.07 | 0.55 | 0.35 |
| HOSTILE | 0.01 | 0.02 | 0.17 | 0.80 |
Rows sum to 1. The matrix is exported as a readonly constant and the
update function is pure.
Update rule
Standard Bayesian update:
posterior(i) ∝ likelihood(obs | i) · prior(i)
export function updateIdentityBelief(
prior: IdentityDistribution,
obs: { severity?: Severity; threatType?: ThreatType; threatConf?: number },
): IdentityDistribution {
const unnormalized: Record<Identity, number> = {} as Record<Identity, number>;
let sum = 0;
for (const id of IDENTITIES) {
let lik = 1;
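if (obs.severity !== undefined) lik *= SEVERITY_LIKELIHOOD[id][obs.severity]; // explicit check, as for threatType: a falsy enum value (e.g. INFO = 0) must not skip the update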
if (obs.threatType !== undefined) {
lik *= threatTypeLikelihood(id, obs.threatType, obs.threatConf ?? 50);
}
const post = lik * prior[id];
unnormalized[id] = post;
sum += post;
}
if (sum === 0) return prior; // pathological — degenerate likelihood, hold belief
const out = {} as Record<Identity, number>;
for (const id of IDENTITIES) out[id] = unnormalized[id] / sum;
return Object.freeze(out);
}
Worked example
Prior: FIRST_CONTACT_PRIOR. Observation: severity=WARNING.
HOSTILE: 0.80 · 0.15 = 0.120
SUSPECT: 0.35 · 0.20 = 0.070
UNKNOWN: 0.20 · 0.30 = 0.060
NEUTRAL: 0.05 · 0.10 = 0.005
ASSUMED_FRIEND: 0.02 · 0.15 = 0.003
FRIEND: 0.01 · 0.10 = 0.001
Σ = 0.259
Posterior (normalized): HOSTILE 0.463, SUSPECT 0.270, UNKNOWN 0.232,
NEUTRAL 0.019, ASSUMED_FRIEND 0.012, FRIEND 0.004.
Combined belief mass on HOSTILE, SUSPECT, and UNKNOWN is 0.965. This matches
the scripted agent's "hostile-like" gate, so the solver sees the same shape
and won't disagree with it on first-contact triage.
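The worked example can be checked mechanically. This self-contained sketch restates the prior and the severity matrix and applies only the severity term of the update; the threat-type term is omitted because its matrix is still an open question. The string unions for `Identity` and `Severity` (including the spelling `ADVISORY`) and the name `updateOnSeverity` are assumptions, not the package's enums.

```typescript
// Assumed string unions; the package's real enums may differ.
type Identity = "UNKNOWN" | "ASSUMED_FRIEND" | "FRIEND" | "NEUTRAL" | "SUSPECT" | "HOSTILE";
type Severity = "INFO" | "ADVISORY" | "CAUTION" | "WARNING";
type IdentityDistribution = Readonly<Record<Identity, number>>;

const IDENTITIES: readonly Identity[] = ["UNKNOWN", "ASSUMED_FRIEND", "FRIEND", "NEUTRAL", "SUSPECT", "HOSTILE"];

const FIRST_CONTACT_PRIOR: IdentityDistribution = Object.freeze({
  UNKNOWN: 0.30, ASSUMED_FRIEND: 0.15, FRIEND: 0.10, NEUTRAL: 0.10, SUSPECT: 0.20, HOSTILE: 0.15,
});

// P(severity | identity); each row sums to 1.
const SEVERITY_LIKELIHOOD: Readonly<Record<Identity, Readonly<Record<Severity, number>>>> = {
  UNKNOWN:        { INFO: 0.05, ADVISORY: 0.15, CAUTION: 0.60, WARNING: 0.20 },
  ASSUMED_FRIEND: { INFO: 0.10, ADVISORY: 0.70, CAUTION: 0.18, WARNING: 0.02 },
  FRIEND:         { INFO: 0.15, ADVISORY: 0.80, CAUTION: 0.04, WARNING: 0.01 },
  NEUTRAL:        { INFO: 0.10, ADVISORY: 0.65, CAUTION: 0.20, WARNING: 0.05 },
  SUSPECT:        { INFO: 0.03, ADVISORY: 0.07, CAUTION: 0.55, WARNING: 0.35 },
  HOSTILE:        { INFO: 0.01, ADVISORY: 0.02, CAUTION: 0.17, WARNING: 0.80 },
};

/** posterior(i) ∝ P(severity | i) · prior(i), renormalized; holds the prior if every term is 0. */
function updateOnSeverity(prior: IdentityDistribution, severity: Severity): IdentityDistribution {
  const out = {} as Record<Identity, number>;
  let sum = 0;
  for (const id of IDENTITIES) {
    out[id] = SEVERITY_LIKELIHOOD[id][severity] * prior[id];
    sum += out[id];
  }
  if (sum === 0) return prior;
  for (const id of IDENTITIES) out[id] /= sum;
  return Object.freeze(out);
}

const afterWarning = updateOnSeverity(FIRST_CONTACT_PRIOR, "WARNING");
// HOSTILE ≈ 0.463, SUSPECT ≈ 0.270, UNKNOWN ≈ 0.232
```

Applying a second WARNING to `afterWarning` raises HOSTILE further (to about 0.72), which is the monotone-update property `belief.test.ts` is meant to pin down.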
Why these numbers
The likelihood matrix is a single-page commitment, but it has three defensible properties:
- Diagonal dominance for clean cases. `P(WARNING|HOSTILE) = 0.80` and `P(ADVISORY|FRIEND) = 0.80`: strong but not perfect channels. Reflects real sensor systems: high agreement, never certain.
- Asymmetric noise toward CAUTION. `UNKNOWN` carries `0.60` mass on `CAUTION`; the "I don't know yet" channel is the dominant noise mode.
- Negligible mass for impossible-looking observations. `P(WARNING|FRIEND) = 0.01`: not zero (a malfunctioning sensor or a misclassified friendly can warning-flag), but small enough that one observation is decisive.
These numbers are version-tagged (BELIEF_V = 1) so future tuning doesn't
silently invalidate cached blueprints.
Payoff function
The plan specified the weights; this section confirms them, adds the counter-extraction contract, and works three end-game examples.
export const PAYOFF_V = 1;
export interface PayoffCounters {
/** EntityLostMT for trackIds with trueIdentity ∈ {HOSTILE, SUSPECT}. */
neutralizedHostiles: number;
/** EntityLostMT for trackIds with trueIdentity = FRIEND inside any CapabilityCoverageAreaMT polygon. */
fratricideEvents: number;
/** Proposals where the chosen effect violates the retained ROE band. */
roeViolations: number;
/** Fuel consumed over the episode, derived from SubsystemStatusMT. Units: fuel fraction, summed across all effectors. */
fuelBurnedTotal: number;
/** EffectStatusMT.state = FAILED events. */
failedEffects: number;
/** Integral over the uci-demo/world/degrade window where dropPercent > 0. */
commsDegradeSeconds: number;
/** Wall-time the copilot spent in evaluate(); capped at 5s per call. */
meanTimeToDecisionMs: number;
}
export function bluePayoff(c: PayoffCounters): number {
return (
+1.000 * c.neutralizedHostiles
-5.000 * c.fratricideEvents
-0.200 * c.roeViolations
-0.050 * c.fuelBurnedTotal
-0.300 * c.failedEffects
-0.001 * c.commsDegradeSeconds
-0.002 * (c.meanTimeToDecisionMs / 1000)
);
}
export function redPayoff(c: PayoffCounters): number {
return -bluePayoff(c); // zero-sum
}
Worked examples
Example A — Clean Tripwire run. 3 hostiles neutralized, 0 fratricide, 0 ROE violations, 0.4 fuel-fraction burned across two effectors, 0 failed effects, 0 comms degrade, 2400ms mean decision time.
U_B = 3.000 - 0 - 0 - 0.050·0.4 - 0 - 0 - 0.002·2.4
= 3.000 - 0.020 - 0.0048
= +2.9752 (≈ +2.975)
Example B — Fratricide. 2 hostiles neutralized, 1 fratricide event.
U_B = 2.000 - 5.000 - ...
≤ -3.000 (fratricide alone outweighs every gain; the elided terms are all non-positive)
This is the property the 5:1 weight ratio targets: a single fratricide costs as much as walking past 5 confirmed threats, so it is strictly worse than walking past 4. By design.
Example C — ROE violation spike. 3 hostiles neutralized but with 6 proposals that violated GREEN ROE (the operator kept approving them).
U_B = 3.000 - 0 - 0.200·6 - ... = 3.000 - 1.200 = +1.800
This cuts the win by 40%. Operator override is allowed; the cost is logged, and the solver learns that strategies relying on ROE violations are fragile to a stricter operator.
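The three examples can be replayed against the payoff function directly. The sketch below copies `PayoffCounters` and the weights from above; `counters` is a hypothetical helper that zero-fills unspecified fields, so the elided `...` terms in Examples B and C are taken as zero.

```typescript
interface PayoffCounters {
  neutralizedHostiles: number;
  fratricideEvents: number;
  roeViolations: number;
  fuelBurnedTotal: number;
  failedEffects: number;
  commsDegradeSeconds: number;
  meanTimeToDecisionMs: number;
}

// Weights copied from the memo (PAYOFF_V = 1).
function bluePayoff(c: PayoffCounters): number {
  return (
    +1.000 * c.neutralizedHostiles
    - 5.000 * c.fratricideEvents
    - 0.200 * c.roeViolations
    - 0.050 * c.fuelBurnedTotal
    - 0.300 * c.failedEffects
    - 0.001 * c.commsDegradeSeconds
    - 0.002 * (c.meanTimeToDecisionMs / 1000)
  );
}

const redPayoff = (c: PayoffCounters): number => -bluePayoff(c); // zero-sum

// Hypothetical helper: zero-fill any counter an example leaves unspecified.
function counters(partial: Partial<PayoffCounters>): PayoffCounters {
  return {
    neutralizedHostiles: 0, fratricideEvents: 0, roeViolations: 0, fuelBurnedTotal: 0,
    failedEffects: 0, commsDegradeSeconds: 0, meanTimeToDecisionMs: 0, ...partial,
  };
}

const exampleA = counters({ neutralizedHostiles: 3, fuelBurnedTotal: 0.4, meanTimeToDecisionMs: 2400 });
const exampleB = counters({ neutralizedHostiles: 2, fratricideEvents: 1 });
const exampleC = counters({ neutralizedHostiles: 3, roeViolations: 6 });
// bluePayoff: A ≈ 2.9752, B = -3, C ≈ 1.8
```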
Edge cases
- Negative payoff floor. No floor. A catastrophically bad Blue policy can register `U_B < -10` (multiple fratricides plus ROE violations). The solver's regret normalization handles unbounded payoffs natively.
- Ties. If `c.neutralizedHostiles + c.fratricideEvents == 0`, the payoff is a sum of non-positive penalty terms. For a policy that never engages, typically only fuel and decision time are nonzero, so the payoff is small and negative: a solver that just sits there earns `≈ -0.01` per episode.
- Counter extraction is offline. `scoreReplay.ts` in the eval-harness reads the bus log and audit feed for an episode and produces a `PayoffCounters`. The solver-daemon's online estimate applies the same function to its `worldMirror`-derived counters.
InfoSet key encoding
The regret table's key. Two info-sets that present identical observations
to a player must map to the same bigint key, deterministically, with
collision probability vanishing on a tactical-scale game tree.
Canonical encoding (Blue viewer)
| Bytes | Field | Encoding |
|---|---|---|
| 0 | Version | 0x01 (BELIEF_V × INFOSET_V) |
| 1 | Viewer | 0x00 Blue / 0x01 Red |
| 2 | ROE band | 0x00 GREEN / 0x01 AMBER / 0x02 RED |
| 3 | Comms degrade bucket | 0x00 none / 0x01 light (drop<30%) / 0x02 heavy (drop≥30%) |
| 4 | Track count | uint8 |
| 5..(4+5T) | Per-track block, T = trackCount. Sorted by trackId lexicographically | See below |
| (5+T·5) | Effector count | uint8 |
| (5+T·5+1).. | Per-effector block | See below |
| last 16 | Recent-actions ring (last 8, 2 bytes each) | (actionTypeId<<8) \| actionParamHash |
Per-track block (5 bytes):
| Bits | Field | Encoding |
|---|---|---|
| 0..3 | Identity-belief most-likely bucket | Index into IDENTITIES (0..5) |
| 4..7 | Identity-belief confidence decile | floor(maxIdentityProb · 10) clipped to 9 |
| 8..15 | ThreatType bucket | 0=N/A, 1=JAMMER, 2=MANNED_AIRCRAFT, 3=MISSILE, 4=AAA, 5=IED, 6=UNKNOWN_THREAT, 7=OTHER |
| 16..23 | Severity bucket | INFO=0 / ADV=1 / CAUTION=2 / WARNING=3 |
| 24..31 | Range bucket | min(floor(rangeM/500), 31) (0-31 buckets of 500m each, capped at 15.5km) |
| 32..39 | Closing-rate bucket | signed: 128+floor(closingMps/2) clipped |
Per-effector block (1 byte): fuel band (CRITICAL=0 / LOW=1 / NORMAL=2 / FULL=3) in low 2 bits; remaining bits reserved.
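A packing sketch for one per-track block and one per-effector byte. It assumes bit 0 is the least-significant bit of the block's first byte (the tables above do not fix a bit order) and that identity, threat-type, and severity bucket indices are computed upstream; `TrackBucketInput`, `packTrackBlock`, and `packEffectorByte` are hypothetical names.

```typescript
// Hypothetical inputs: bucket indices for identity / threat type / severity computed upstream.
interface TrackBucketInput {
  identityIndex: number;    // 0..5, index into IDENTITIES
  maxIdentityProb: number;  // in [0, 1]
  threatTypeBucket: number; // 0..7, per the ThreatType row
  severityBucket: number;   // INFO=0 / ADV=1 / CAUTION=2 / WARNING=3
  rangeM: number;           // metres, >= 0
  closingMps: number;       // signed m/s; sign convention is an open question
}

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

/** Packs one 5-byte per-track block, assuming bit 0 = least-significant bit of byte 0. */
function packTrackBlock(t: TrackBucketInput): Uint8Array {
  const decile = clamp(Math.floor(t.maxIdentityProb * 10), 0, 9);
  const block = new Uint8Array(5);
  block[0] = (t.identityIndex & 0x0f) | (decile << 4);         // bits 0..3 | bits 4..7
  block[1] = t.threatTypeBucket & 0xff;                         // bits 8..15
  block[2] = t.severityBucket & 0xff;                           // bits 16..23
  block[3] = clamp(Math.floor(t.rangeM / 500), 0, 31);          // bits 24..31, 500 m buckets
  block[4] = clamp(128 + Math.floor(t.closingMps / 2), 0, 255); // bits 32..39, offset-128 signed
  return block;
}

/** Fuel band (CRITICAL=0 / LOW=1 / NORMAL=2 / FULL=3) in the low 2 bits; other bits reserved as 0. */
function packEffectorByte(fuelBand: 0 | 1 | 2 | 3): number {
  return fuelBand & 0x03;
}
```

For identity index 5 with max probability 0.463, threat bucket 3, severity WARNING, `rangeM = 3200`, and `closingMps = -20`, the block is `[0x45, 3, 3, 6, 118]`.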
Hash
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;
export function fnv1a64(bytes: Uint8Array): bigint {
let h = FNV_OFFSET;
for (let i = 0; i < bytes.length; i++) {
h = (h ^ BigInt(bytes[i]!)) & MASK_64;
h = (h * FNV_PRIME) & MASK_64;
}
return h;
}
export function infoSetKey(state: GameState, viewer: Player): bigint {
const bytes = encodeInfoSet(state, viewer); // returns Uint8Array per the table above
return fnv1a64(bytes);
}
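The permutation-invariance test in `hash.test.ts` relies on sorting tracks before encoding. A sketch of that step, assuming a hypothetical `EncodedTrack` that already carries its packed 5-byte block: tracks are ordered by plain code-unit comparison of `trackId` (rather than `localeCompare`, whose order can vary with locale), blocks are concatenated, and the bytes go through `fnv1a64`, restated here so the block runs on its own.

```typescript
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;

function fnv1a64(bytes: Uint8Array): bigint {
  let h = FNV_OFFSET;
  for (let i = 0; i < bytes.length; i++) {
    h = (h ^ BigInt(bytes[i]!)) & MASK_64;
    h = (h * FNV_PRIME) & MASK_64;
  }
  return h;
}

// Hypothetical per-track record carrying its already-packed block.
interface EncodedTrack {
  readonly trackId: string;
  readonly block: Uint8Array;
}

/** Concatenates per-track blocks in canonical (UTF-16 code-unit) trackId order. */
function encodeTracksCanonical(tracks: readonly EncodedTrack[]): Uint8Array {
  const sorted = [...tracks].sort((a, b) => (a.trackId < b.trackId ? -1 : a.trackId > b.trackId ? 1 : 0));
  const out = new Uint8Array(sorted.reduce((n, t) => n + t.block.length, 0));
  let off = 0;
  for (const t of sorted) {
    out.set(t.block, off);
    off += t.block.length;
  }
  return out;
}
```

Because `h → (h ^ b) · FNV_PRIME mod 2⁶⁴` is a bijection for any fixed byte `b` (the prime is odd), two equal-length encodings that differ in a single byte always hash differently, so the "changing one bucketed field changes the key" test is deterministic rather than probabilistic.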
Why FNV-1a-64
- Deterministic across V8 versions and ARM/x64: pure integer math; no floating-point intermediate.
- Cheap: one XOR and one BigInt multiply (each masked to 64 bits) per byte, over ~30-50 bytes per call. V8 BigInt arithmetic allocates, so measure the per-call cost rather than assuming a nanosecond-scale figure.
- Acceptable distribution for our use: FNV-1a has weak avalanche behavior by SMHasher standards, but we are not using it as a cryptographic primitive or as a randomness source; it just needs to be a uniform-enough mapping for our ~10³-10⁴ tactical info-sets.
- Collision analysis: with N info-sets in a 64-bit space, the expected first collision is at N ≈ √(π/2 · 2⁶⁴) ≈ 5·10⁹. Tactical (N = 10³): P(any collision) ≈ N²/(2·2⁶⁴) ≈ 2.7·10⁻¹⁴ across the full table, and ≈ 2.7·10⁻¹² at N = 10⁴. Negligible.
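The collision estimate can be recomputed with the usual birthday approximation, P ≈ 1 − exp(−N(N−1) / 2^(b+1)) for N keys hashed uniformly into b bits. `collisionProbability` is an illustrative helper; `Math.expm1` keeps full precision for probabilities this small, where `1 - Math.exp(-x)` would lose most of its significant digits.

```typescript
/** Birthday approximation: P(at least one collision) among n uniform keys in a `bits`-bit space. */
function collisionProbability(n: number, bits: number): number {
  const pairs = (n * (n - 1)) / 2;
  return -Math.expm1(-pairs / 2 ** bits);
}

// collisionProbability(1e3, 64) ≈ 2.7e-14; collisionProbability(1e4, 64) ≈ 2.7e-12
```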
Bucketing target
The plan named this target: "Bucketing holds tactical info-set count near 10³." The encoding bounds each track's observable state as follows:
- 6 identity buckets × 10 confidence deciles = 60 identity states per track
- 8 threat-type buckets × 4 severity buckets = 32 platform states per track
- 32 range buckets × 256 closing-rate buckets = 8192 kinematic states per track
Together that is about 1.6·10⁷ nominal states per track (60 × 32 × 8192), with ~5 tracks at once in a tactical scenario, so the 10³ target rests on reachability rather than on the bucketing alone: the regret table only holds the distinct info-sets the solver actually reaches in self-play. Empirical upper bound from Tripwire-shaped abstractions: 1500-3000 reached info-sets per scenario, within the 10³-10⁴ planning range.
If the count blows past 10⁴ in scaling experiments, the first bucketing
coarsening is on range (drop to 16 buckets of 1000m each) and
closing-rate (drop to 32 signed buckets). The infoset version byte
will bump to 0x02 when this happens.
GameDynamics
export interface GameDynamics {
/** Legal actions for the current decision player. Empty if state.turn.phase.kind === "chance" — caller handles chance via Nature. */
legalActions(state: GameState): readonly Action[];
/** Deterministic transition for decision nodes; for chance nodes the caller supplies the resolved outcome. */
apply(state: GameState, action: Action): GameState;
/** Resolve a chance node by sampling from its named distribution. */
resolveChance(state: GameState, rand: () => number): GameState;
/** Terminal predicate. True at end of scenario loop or when payoff-counter sum has converged. */
isTerminal(state: GameState): boolean;
/** Final payoff counters when isTerminal(state). */
finalCounters(state: GameState): PayoffCounters;
}
The dynamics function reads from PublicState + HiddenState and emits
a new GameState. It does not publish to the bus; that's the world-sim
service's job in the live demo. The solver-daemon and the eval-harness
call apply() directly without going through MQTT, so the same dynamics
power both online self-play and offline scoring.
For the live demo (one process, one operator), apply() is a derived
projection over the bus message stream that already flows — the world-sim
publishes, the world-mirror in copilot/eval-harness ingests, and the
GameState snapshot is built on demand from the mirror.
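To make the division of labor concrete, here is a sketch of one sampled playout against a simplified, generic version of the interface: decision nodes call a policy on `legalActions`, chance nodes call `resolveChance` with an injected `rand`, and the loop stops at `isTerminal`. `DynamicsLike`, `rollout`, and the toy counter game are hypothetical stand-ins so the block runs on its own; they are not the tactical implementation.

```typescript
type Phase = { kind: "decision"; player: "blue" | "red" } | { kind: "chance" };

// Simplified, generic stand-in for GameDynamics; the chance check mirrors turn.phase.kind.
interface DynamicsLike<S extends { readonly phase: Phase }, A> {
  legalActions(state: S): readonly A[];
  apply(state: S, action: A): S;
  resolveChance(state: S, rand: () => number): S;
  isTerminal(state: S): boolean;
}

/** One sampled playout: policy at decision nodes, Nature via `rand` at chance nodes. */
function rollout<S extends { readonly phase: Phase }, A>(
  dyn: DynamicsLike<S, A>,
  start: S,
  policy: (state: S, legal: readonly A[]) => A,
  rand: () => number,
  maxSteps = 10000,
): S {
  let s = start;
  for (let i = 0; i < maxSteps && !dyn.isTerminal(s); i++) {
    s = s.phase.kind === "chance"
      ? dyn.resolveChance(s, rand)
      : dyn.apply(s, policy(s, dyn.legalActions(s)));
  }
  return s;
}

// Toy game: Blue adds 1 or 2, then Nature adds 0 or 1 with equal odds; ends after 6 steps.
// Every transition returns a new object and never mutates its input.
interface ToyState {
  readonly phase: Phase;
  readonly step: number;
  readonly score: number;
}

const toyDynamics: DynamicsLike<ToyState, number> = {
  legalActions: (s) => (s.phase.kind === "chance" ? [] : [1, 2]),
  apply: (s, a) => ({ phase: { kind: "chance" }, step: s.step + 1, score: s.score + a }),
  resolveChance: (s, rand) => ({
    phase: { kind: "decision", player: "blue" },
    step: s.step + 1,
    score: s.score + (rand() < 0.5 ? 0 : 1),
  }),
  isTerminal: (s) => s.step >= 6,
};
```

The eval-harness would pass a seeded `rand`; the solver would average the final counters over many rollouts, which is the sample-average treatment of chance described earlier.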
TypeScript module layout
packages/uci-game/src/
├── index.ts # re-exports
├── types.ts # existing — Identity, EffectType, AssetSnapshot, TrackSnapshot, WorldSnapshot
├── game.ts # NEW — GameState, GameTurn, GamePhase, Player, Action, ActionRecord
├── worldMirror.ts # existing — createWorldMirror, DEMO_ASSET_ROSTER, WorldMirror
├── belief.ts # NEW - IdentityDistribution, SEVERITY_LIKELIHOOD, updateIdentityBelief, FIRST_CONTACT_PRIOR, BELIEF_V
├── payoff.ts # NEW — PayoffCounters, bluePayoff, redPayoff, PAYOFF_V
├── hash.ts # NEW — fnv1a64, encodeInfoSet, infoSetKey, INFOSET_V
├── dynamics.ts # NEW — GameDynamics interface + the tactical impl built from scenario events + worldMirror state
└── publicState.ts # NEW — derive PublicState / HiddenState from a worldMirror + scenario truth
Tests under packages/uci-game/test/:
- `belief.test.ts`: every cell of the likelihood matrix; the worked example from this memo; degenerate likelihood (sum = 0) preserves the prior; monotone-update property (an additional WARNING observation never decreases HOSTILE mass).
- `payoff.test.ts`: each of the three worked examples; zero-sum property (`bluePayoff + redPayoff = 0`); monotonicity of every weight.
- `hash.test.ts`: `fnv1a64` reproduces a fixed test vector; the same `GameState` viewed by the same player always produces the same key; permuting track-id order does not change the key (canonical sort); changing one bucketed field changes the key.
- `dynamics.test.ts`: `apply(s, legalActions(s)[0])` advances the turn counter; `isTerminal` is stable (idempotent); `resolveChance` with a fixed seed produces a deterministic next state.
Tests use only synthetic fixtures — no MQTT, no codec. The package remains a pure-domain workspace member.
Acceptance criteria for the fill-out PR(s)
- All new modules typecheck.
- `pnpm --filter @uci-demo/game test` passes with the suites above.
- `BELIEF_V = 1`, `PAYOFF_V = 1`, `INFOSET_V = 1` constants are exported and present in the test fixtures.
- The package's public API (`packages/uci-game/src/index.ts`) exports exactly the types and functions referenced from this memo, so there are no surprises for the upcoming `@uci-demo/solver` consumer.
- `pnpm up` smoke unaffected. The copilot still consumes only `createWorldMirror` from this package; the new game/dynamics/belief surface is consumed by the solver-daemon (next workstream), not by the copilot in weeks 2-4.
What this memo deliberately does not specify
- Strategy bank softmax weighting. Lives in `@uci-demo/solver` (weeks 3-6), not in `@uci-demo/game`. The game package owns the substrate; the solver owns the search policy on top.
- Subroutine implementations. Same: `packages/uci-solver/src/subroutines/` is solver-side code that consumes a `WorldSnapshot` + `IdentityDistribution` and emits action preferences. Out of scope here.
- Action-space pruning heuristics. Performance work that lives with the solver. The game package's `legalActions` returns all legal actions; the solver prunes if it wants to.
- Operational-scale (T3) representation. This memo is for tactical T1 only. T3 (30 effectors / 100 tracks / 6h) needs subgame resolving and is addressed in a separate memo when the week 8-10 scaling work lands.
Open questions to resolve before code lands
- Closing-rate sign convention. The scenario YAML uses `closingMps < 0` for closing toward the FOB. The infoset key encoding above assumes the same convention. Confirm during code review.
- Threat-type likelihood matrix. The severity matrix is specified above; the threat-type matrix is not. Best path: derive it from `services/world-sim/src/sim.ts:621-680`, i.e. from the threatType the world-sim actually emits per identity. One worked-out matrix per scenario family (Tripwire, Vanguard, Stillwater) probably suffices.
- Chance-node enumeration. Sensor confidence is currently sampled from a per-spawn YAML value. For solver use, we need a small discrete distribution (e.g. `{0.3, 0.5, 0.7, 0.9}`, weighted). The exact weights are calibration parameters; the first cut is uniform over buckets, refined via empirical fit.
- Action canonicalization for the recent-actions ring. The 2-byte per-action encoding above is shorthand for `(actionTypeId, paramHash)`, one byte each. The paramHash needs to be defined: since it gets only 8 bits, probably a small FNV-1a hash folded to 8 bits over the `(effect, effector)` tuple for Blue and `(trackId, threatType)` for Red. Confirm during code review.
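Since the ring layout leaves 8 bits for `actionParamHash`, one option is 32-bit FNV-1a XOR-folded down to a byte (XOR-folding is the reduction the FNV reference material describes for sizes without their own parameters). Everything here is an illustrative assumption pending review: the fold, the `|`-joined UTF-8 serialization of the tuple, the example parameter strings, and the names `fnv1a32`, `fold8`, and `encodeRingEntry`.

```typescript
/** 32-bit FNV-1a (offset basis 0x811c9dc5, prime 0x01000193), using Math.imul for mod-2^32 multiply. */
function fnv1a32(bytes: Uint8Array): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h = Math.imul(h ^ bytes[i]!, 0x01000193) >>> 0;
  }
  return h;
}

/** XOR-fold a 32-bit hash to 8 bits. */
function fold8(h: number): number {
  return (h ^ (h >>> 8) ^ (h >>> 16) ^ (h >>> 24)) & 0xff;
}

const utf8 = new TextEncoder();

/** Hypothetical 2-byte ring entry: (actionTypeId << 8) | 8-bit hash of the joined parameters. */
function encodeRingEntry(actionTypeId: number, params: readonly string[]): number {
  const paramHash = fold8(fnv1a32(utf8.encode(params.join("|"))));
  return ((actionTypeId & 0xff) << 8) | paramHash;
}
```

With only 256 hash values, distinct parameter tuples will sometimes share a `paramHash`; that only merges a few recent-action histories inside one info-set bucket, which is the same kind of coarsening the rest of the encoding already accepts.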
These are knowable; they just need a half-hour of empirical fitting and one round of code review. None block writing the rest of the package.