Plan — Phase II TDP-Ready Application (post-award 24-month execution)

Companion to sbir-osw26bz02-dv004-game-theoretic-coa.md. That plan covers the pre-proposal 12-week evidence sprint. This plan covers the 24-month Phase II execution that turns the demo + Phase 0 solver into a containerized, TDP-ready, government-deployable application.

Context

The companion plan delivers an MCCFR solver, a Red agent, an eval harness, and a white paper / plot pair / SME-tournament evidence package — enough to make the D2P2 proposal responsive. After award, the SBIR contract obligates the team to:

  1. Integrate the AI engine with a government-designated M&S environment (CPE or AFSIM).
  2. Conduct V&V events in government-provided scenarios of increasing complexity, including multi-domain swarm and joint all-domain operations.
  3. Demonstrate robustness under degraded comms, sensor uncertainty, incomplete information, and novel adversary tactics not present in training.
  4. Deliver a containerized software prototype + Technical Data Package (TDP) "sufficient for government use in wargaming, analysis, and COA development."

The current repo is intentionally a stateless demo. CLAUDE.md is explicit: "the demo is intentionally stateless across runs — the broker is the world; don't add database, persistence, or auth layers." That invariant is correct for the demo and wrong for the TDP. Phase II is where it gets explicitly replaced. The 21 UCI MTs, the validator, the cop-ui, the comms-degrade rig, and the scripted/Claude/Solver agent triplet all carry forward; everything below them (transport, storage, identity, deployment, observability) is greenfield.

Verified gaps vs. TDP target (read on 2026-05-19):

  • docker-compose.yml runs only Mosquitto. No auth, no DB, no object store, no metrics, no log aggregation.
  • ops/ contains only the Mosquitto config.
  • No deploy/, no Helm chart, no Dockerfile per service.
  • No auth library in any package.json (no JWT/OIDC/Keycloak/Passport).
  • No persistence library in any package.json (no Postgres/SQLite/Prisma).
  • Validator audit, copilot reasoning, world-sim engagement state, replay buffer are all in-memory and tab/process-local.
  • Claude is reached over the public Anthropic API, which is incompatible with most government environments by default.

The intended outcome of Phase II: a single command-line installation at a government wargaming or planning site brings up the full stack, three named operators with role-separated access plan against a CPE-authored scenario, an immutable audit log captures every decision, the solver operates in any of three doctrinal modes (Blue aide / Red trainer / impartial adjudicator), and an end-of-campaign V&V report is generated automatically and signed for delivery. None of that exists today.


OMS Standard alignment

Open Mission Systems (OMS) v2.5 (released 2026-01-22, governed by the OACWG) is the architectural superset that wraps UCI. UCI defines C2 messages; OMS defines the Mission Package those messages live inside — Platform + Subsystems + Services + Isolators + Data Exchanges + tiered compliance + standardized document templates. OMS explicitly builds on UCI. Where UCI v2.5 governs the XSD-validated payloads on our uci/v2_5/# topic tree, OMS v2.5 governs the system those payloads are exchanged in, and that is the language a government acquisition reviewer speaks. Phase II adopts OMS as the framing standard for the TDP deliverable, with explicit Tier targets and OMS document templates replacing ad-hoc TDP doc structures.

Architecture element mapping (already a free lunch)

The existing repo, almost by accident, maps cleanly onto the OMS Reference Architecture. The naming is the cheapest alignment Phase II will buy:

| OMS element | Definition (OMSC-STD-001 §3.2, §6) | uci-demo realization |
| --- | --- | --- |
| Mission Package | Composition of Platform + Subsystems + Services + Isolators delivering a mission capability | The full pnpm up stack + Phase II auth/persistence/audit/admin additions, packaged via deploy/compose/ and deploy/helm/ |
| Platform | Infrastructure (HW + SW) hosting the package: Abstract Service Bus (ASB), Critical Abstraction Layer (CAL), Open Computing Environment (OCE), data storage, Other Mission Processing | Mosquitto (ASB), packages/uci-bus + packages/uci-codec (CAL), Chainguard FIPS container set on chainguard-base-fips + compose/k8s host (OCE), postgres-fips + minio-fips (data storage) |
| Service | Software-only UoR with unique capability, lifecycle states, service contract | services/copilot, services/world-sim, services/validator, services/solver-daemon, services/red-agent, services/campaign-runner, services/audit, services/auth, services/db, services/blob-store |
| Subsystem | Payload/sensor UoR with dedicated HW + SW | N/A in pure-software demo. World-sim simulates subsystems (radar, EO/IR, ESM, effectors); a real deployment would wire actual subsystems via the same UCI MT contract |
| Isolator | Isolation layer between Mission Package and an external system | services/adsb-bridge (already), services/cpe-bridge (M6-9), services/afsim-bridge (Year 2). Already the pattern; naming it "Isolator" in the SDD is a one-line change |
| OMS Messaging | Pub/sub message exchange across ASB | Our uci/v2_5/# (UCI payload over OMS-shape ASB) + uci-demo/... side channels |
| Data Transfer | Bulk / streaming product exchange via URI (sockets / RDMA / streaming) | MinIO blob references for blueprint snapshots, eval timelines, V&V report PDFs (already in §2). Phase II: URI-conformant references per CAL spec §6.3.2.4 |
| Security Information Exchanges | Standardized security envelope on every message | Existing <SecurityInformation> element in every UCI envelope (packages/uci-codec/src/envelope.ts). Already compliant in shape |

Tier target (the contract we sign in the proposal)

OMS Tier 1 / 2 / 3 is the orderly adoption ladder (OMSC-STD-001 §0). Phase II commits to a demonstrable Tier 2 by M24, with Tier 1 conformance proven at the M12 V&V event and Tier 3 as a Year-3 / Phase II Option stretch:

| Tier | Phase II milestone | What it means concretely |
| --- | --- | --- |
| Tier 1 | M12 Year-1 V&V acceptance | Tier 1 Required Subsystem Message Set implemented on every Service (state machine, status reporting, version reporting). Tier 1 Minimum Message Set on the ASB. Legacy-adoption-friendly: this is the rung most acquisition programs land on first. |
| Tier 2 | M24 Year-2 V&V acceptance | Adds Tier 2 Minimum Message Set + full Service Contracts per services/* + MPDD + PDD + per-service SDD + Checklists passed. The TDP delivers as a Tier 2 OMS Mission Package. |
| Tier 3 | Phase III / Phase II Option | Full OMS message set conformance + CAL CERT verification (OMSC-SPC-001 §6) + language-bindings parity for the CAL API. Not in baseline budget. |

The proposal text claims Tier 2 by Phase II close, identifies Tier 3 as a Phase III or exercised-Option scope, and avoids any claim that the current demo is Tier-compliant (it isn't — the OMS Service state machine and Tier 1 required messages are net-new work).

OMS spec documents that govern Phase II work

These live under docs/oms/ as vendored references (parallel to schema/UCI_v2_5/), tracked by SHA-256 in a new docs/oms/PROVENANCE.md. Do not modify; they are the contract.

  • OMSC-STD-001 RevM — OMS Standard. Governs Service / Subsystem / Isolator / Platform behavior, state machines, Tier message sets, ASB semantics.
  • OMSC-STD-002 RevM — Abbreviations & Glossary. Source of truth for terminology in TDP docs.
  • OMSC-SPC-001 RevL — CAL Specification (the canonical API surface). Governs packages/uci-bus if we claim CAL conformance.
  • OMSC-SPC-013 RevB — Language-Agnostic CAL Specification. Governs the cross-language CAL behavior our TS implementation must match in spirit.
  • OMSC-SPC-005 RevJ — OS Façade Specification. Governs Service ↔ OS interaction; affects FIPS/STIG container choices in §10.
  • OMSC-GDE-003 RevD — Cybersecurity Guide. Maps directly onto the §1 (auth) + §3 (audit) work; the TDP Security Model cross-references it line-by-line alongside CMMC L1.

How OMS reshapes Phase II deliverables

Three concrete things change versus the pre-OMS plan:

  1. TDP doc structure adopts OMS templates verbatim (MPW / MPDD / PDD / SDD / Service Contracts + the five Checklists). See revised TDP composition section below. This means government reviewers grade against forms they recognize, not bespoke architecture docs.
  2. A new packages/uci-cal-bindings/ package publishes a CAL-conformant veneer over packages/uci-bus so any OMS-CAL-targeting integrator can consume our ASB without writing to Mosquitto directly. See new subsystem §12.
  3. services/cal-conformance/ runs the OMS CAL CERT-style verification harness as a CI gate. Phase II Year 1 ships Tier 1 conformance; Year 2 ships Tier 2 conformance; both are part of the M12 / M24 acceptance criteria.

The OACWG governance plan accepts Change Requests for spec gaps. Phase II reserves a small budget for one or two CRs if the solver or game-theoretic engine surfaces a genuine OMS gap (e.g., a mission-package-level "campaign" abstraction OMS doesn't currently model).


Government environment constraints (the hard ones)

These shape every subsystem below. Surface them in the proposal so reviewers see the design already accounts for them.

| Constraint | Source | Design implication |
| --- | --- | --- |
| CMMC Level 1 | RFP topic header | 17 basic safeguarding controls (FAR 52.204-21). Minimal but real: covers physical access, identification/authentication, system/comms protection. Drives the auth subsystem, audit logging, and the "no third-party telemetry" rule. |
| ITAR-controlled tech data | RFP ITAR clause | Source, builds, and the TDP are USML Cat XI(d) software. No public cloud builds, no foreign-national maintainers without licensing, no public GitHub Actions runners for ITAR artifacts (private/self-hosted only). |
| Air-gap default | DoD wargaming center reality | Operates without internet. Public Anthropic API not reachable from air-gapped sites. Solution: the model-agnostic LLM client (see §11 below) selects a local backend (Ollama / vLLM / llama.cpp) for air-gapped prod, Bedrock-on-GovCloud Claude where the deployment is cleared GovCloud, and the Anthropic API where reachable. Claude stays first-class wherever the deployment can reach it; local models cover the rest. The same LanguageModelClient interface drives all paths. |
| No vendor telemetry | DoD policy | No phone-home, no analytics, no error reporting to vendor servers. All telemetry is local and operator-readable. |
| CAC / PIV auth | Operator reality | OIDC with x509 mutual-TLS option; Keycloak supports both. SAML for downstream integration with site SSO. |
| Reproducible builds | V&V requirement | Every Phase II V&V run must be re-runnable bit-for-bit. Forces deterministic build pipeline (locked toolchain, locked deps, locked image digests) and seeded simulation. |
| Hardened, minimal, signed base images | Government accreditation pathway | Use Chainguard FIPS images (cgr.dev/chainguard/*-fips) for every container. Distroless, daily-rebuilt, cosign-signed, with build-time CycloneDX SBOMs. Single base: chainguard-base-fips. Application bases: node-fips, keycloak-fips, postgres-fips, minio-fips, prometheus-fips, grafana-fips, loki-fips, tempo-fips, opentelemetry-collector-fips, ollama-fips, cosign-fips. STIG-equivalent hardening is intrinsic; STIG mapping is documented per image in the PDD rather than inherited from a RHEL benchmark. |
| FIPS 140-2/3 cryptography | Government default | Solved at the image layer by Chainguard's glibc-openssl-fips substrate: FIPS OpenSSL is preconfigured at image build time, applicable CMVP certificate listed on Chainguard's FIPS commitment page. node-fips returns crypto.getFips() === 1 out of the box: no Node compile work, no --openssl-legacy-provider toggling, no Boring/AWS-LC fork. Keycloak / Postgres / MinIO *-fips variants ship their own FIPS-validated stacks. Userspace FIPS is kernel-agnostic: Chainguard FIPS containers run on any host kernel, so the air-gapped target VM does not have to be RHEL or any other specific distro. |
| OMS Tier 2 compliance | Acquisition expectation for non-legacy systems | Forces OMS Service state machine on every Service (OMSC-STD-001 §6.1.1), Tier 2 Minimum Message Set on the ASB, MPDD + PDD + SDD + Service Contract documents per the OMS templates, all five Checklists passed. Drives the new packages/uci-cal-bindings/ and services/cal-conformance/ work. |

Three operating modes (RFP-required)

Phase III text in the RFP names three modes the engine must support. These are surfaced as explicit operator-selectable modes in the cop-ui top strip and in CLI flags; each is implementable as a different copilot agent-selection branch + a different UX skin on top of the existing approval card.

| Mode | Selected agent | UX | Audit-log marker |
| --- | --- | --- | --- |
| Blue planning aide | SolverAgent (with optional Claude narration) | Approval card on every proposal; operator APPROVE/DENY/MODIFY; reasoning rail streams subroutine decomposition. This is what the demo already shows. | mode=PLANNING_AIDE |
| Red trainer | SolverAgent running on Red side; operator drives Blue manually | Operator's COA proposals are evaluated against the solver-driven Red; live exploitability score and per-decision counter-strategy shown. Inverts the approval card: operator authors, solver counters. | mode=RED_TRAINER |
| Impartial adjudicator | Both Blue and Red are operator-driven; solver scores both | No approval gate. Both teams' COAs are logged; solver computes Nash equilibrium baseline and a Δ-utility score per decision against optimal. Used for after-action review. | mode=ADJUDICATOR |

Mode is set at launch (MODE=PLANNING_AIDE|RED_TRAINER|ADJUDICATOR) and surfaced in retained MQTT topic uci-demo/system/mode so all services pick it up.
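A minimal sketch of how a service might validate the launch-time MODE value and describe the retained publish. The helper names (parseMode, modeAnnouncement) are hypothetical, and tolerating lowercase or padded input is an assumption; only the three mode values and the uci-demo/system/mode topic come from this plan.

```typescript
// Hypothetical helpers, not taken from the existing repo.
const MODES = ["PLANNING_AIDE", "RED_TRAINER", "ADJUDICATOR"] as const;
type Mode = (typeof MODES)[number];

// Validate the raw MODE value; fail loudly instead of defaulting silently.
function parseMode(raw: string | undefined): Mode {
  const value = (raw ?? "").trim().toUpperCase(); // assumption: be lenient on case/whitespace
  const match = MODES.find((m) => m === value);
  if (match === undefined) {
    throw new Error(`MODE must be one of ${MODES.join("|")}, got "${raw ?? ""}"`);
  }
  return match;
}

interface RetainedPublish {
  topic: string;
  payload: string;
  retain: true;
}

// Describe the retained message; the actual MQTT publish is left to the bus layer.
function modeAnnouncement(mode: Mode): RetainedPublish {
  return { topic: "uci-demo/system/mode", payload: mode, retain: true };
}
```

Failing at boot on an unknown value keeps a mistyped mode from silently running the wrong agent branch.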


Major new subsystems

Each subsystem is a new workspace member or new directory; ordered roughly by Phase II month when work begins.

1. Identity & access (packages/uci-auth/ + services/auth/)

Why: CMMC L1 minimal auth, multi-operator separation, audit-log subject IDs, CAC/PIV support.

Choice: Keycloak 26 self-hosted via cgr.dev/chainguard/keycloak-fips, OIDC primary, x509 mTLS for CAC, SAML exposed for site SSO. JWTs verified at every service entry point.

Roles (RBAC):

  • viewer: read-only access to cop-ui and audit log.
  • operator: APPROVE/DENY/MODIFY/CANCEL proposals.
  • planner: author and run campaigns (eval harness), edit scenarios.
  • analyst: generate V&V reports, query audit log, export.
  • admin: manage users, blueprint snapshots, deployment config.
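The role list can be read as a role-to-permission table. The sketch below is one possible shape for the role checks in packages/uci-auth/; the permission names and the hasPermission helper are hypothetical, and treating roles as non-hierarchical (a user simply holds several) is an assumption the plan does not settle.

```typescript
type Role = "viewer" | "operator" | "planner" | "analyst" | "admin";

// Hypothetical permission names derived from the role descriptions.
type Permission =
  | "cop:read" | "audit:read" | "audit:export"
  | "proposal:decide"
  | "campaign:run" | "scenario:edit"
  | "report:generate"
  | "user:manage" | "blueprint:manage" | "deploy:configure";

const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  viewer: ["cop:read", "audit:read"],
  operator: ["proposal:decide"],
  planner: ["campaign:run", "scenario:edit"],
  analyst: ["report:generate", "audit:read", "audit:export"],
  admin: ["user:manage", "blueprint:manage", "deploy:configure"],
};

// True when any of the caller's roles (e.g. taken from verified JWT claims)
// grants the needed permission. Roles do not inherit from one another here.
function hasPermission(roles: readonly Role[], needed: Permission): boolean {
  return roles.some((role) => ROLE_PERMISSIONS[role].includes(needed));
}
```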

Files: packages/uci-auth/ (TS lib: token verification, role checks, OIDC client), services/auth/ (Keycloak deployment + realm config + theming), deploy/auth/realm.json (declarative realm import).

Wire impact: every MQTT subscription requires a JWT; Mosquitto's HTTP backend auth plugin verifies against Keycloak. Side-channel publishes (uci-demo/operator/..., uci-demo/copilot/...) carry a subjectId field; copilot rejects publishes with no JWT claim match.

2. Persistence layer (packages/uci-persistence/ + services/db/ + services/blob-store/)

Why: TDP needs durable mission logs, scenario libraries, blueprint snapshots, eval reports, audit trails. None of this can be in-memory.

Choice:

  • Postgres 16 via cgr.dev/chainguard/postgres-fips for relational: users, scenarios metadata, campaigns, V&V reports, audit log shards, blueprint snapshot index.
  • MinIO via cgr.dev/chainguard/minio-fips for blob: blueprint snapshots (gzipped Float32Array regret tables, ~10–100MB each), eval timeline NDJSON, message-buffer dumps, video captures of V&V events.

Schema lives in packages/uci-persistence/migrations/ (numbered, forward-only). Prisma 6 or Kysely 0.27 for the query layer: Prisma if the team values type generation, Kysely if SQL fidelity matters more. Phase II default: Kysely (simpler footprint, easier to audit, no codegen at build time, and only a thin query-builder layer at runtime).

Key tables:

  • users: synced from Keycloak claims on first login.
  • scenarios: registry of scenario YAML/JSON files with metadata (name, classification, source, domain set).
  • campaigns: campaign-runner job records (status, params, started/finished, requested-by).
  • campaign_episodes: per-episode results (scenario, agent, degrade, payoff breakdown, blueprint ref).
  • audit_log_shards: immutable, hash-chained audit-log shards (see §3).
  • blueprint_snapshots: index of MinIO blueprint files (iteration count, exploitability, taken-by, taken-at).
  • vandv_reports: final report metadata + MinIO PDF/HTML ref.

Files: packages/uci-persistence/ (lib), services/db/ (Postgres deployment + migration runner), services/blob-store/ (MinIO deployment + bucket policies).

3. Sealed audit log (packages/uci-audit/ + services/audit/)

Why: V&V acceptance and forensic review demand a tamper-evident record of every operator decision and every solver proposal. Bus messages alone are not enough — they're ephemeral on Mosquitto and only partially captured by the validator.

Design: append-only hash-chained shards (Merkle DAG, every shard hash includes prior shard hash). Sealed every 5 minutes; the seal is signed with an HSM-or-software key. Shard storage in Postgres audit_log_shards; payloads in MinIO. Query API: time-range scan + hash-verify.
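The hash-chain property (every shard hash includes the prior shard hash) can be sketched as follows. This is an illustration under assumptions, not the packages/uci-audit/ design: the Shard shape, the all-zero genesis value, and the field encoding are hypothetical, and seal signing with the HSM-or-software key is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shard shape; real shards would reference payloads in MinIO.
interface Shard {
  index: number;
  prevHash: string;
  payload: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumption: all-zero hash precedes the first shard

function shardHash(index: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

// Append-only: returns a new chain whose last shard commits to the previous one.
function appendShard(chain: readonly Shard[], payload: string): Shard[] {
  const prev = chain[chain.length - 1];
  const prevHash = prev ? prev.hash : GENESIS;
  const index = chain.length;
  return [...chain, { index, prevHash, payload, hash: shardHash(index, prevHash, payload) }];
}

// Returns the index of the first shard that fails verification, or -1 if the chain is intact.
function verifyChain(chain: readonly Shard[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const s = chain[i];
    if (!s || s.index !== i || s.prevHash !== expectedPrev ||
        s.hash !== shardHash(i, s.prevHash, s.payload)) {
      return i;
    }
    expectedPrev = s.hash;
  }
  return -1;
}
```

Because each hash covers the previous one, editing any stored payload invalidates that shard and every later link, which is what makes the time-range scan + hash-verify query meaningful.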

Captures:

  • Every operator action (APPROVE/DENY/MODIFY/CANCEL) with subjectId and requestId.
  • Every solver proposal with the SubroutineTrace[] decomposition.
  • Every mode change, every scenario load, every degrade injection.
  • Every Keycloak login/logout.
  • Every config change.

Files: packages/uci-audit/ (lib: hash-chain, seal/verify, query builder), services/audit/ (subscriber that materializes shards from bus traffic + auth events).

4. Campaign-runner (services/campaign-runner/ — productizes the Phase 0 eval-harness)

Why: Phase 0's services/eval-harness/ is a one-shot CLI for the proposal sprint. Phase II needs a long-running service that authors campaigns, schedules them on a worker pool, persists results, and serves the V&V report UI.

API:

  • POST /campaigns: create campaign (scenarios × agents × degrades × seeds, episodes per cell).
  • POST /campaigns/<id>/run: queue for execution.
  • GET /campaigns/<id>: status + partial results.
  • GET /campaigns/<id>/report: final V&V report (HTML + PDF).
  • WS /campaigns/<id>/stream: live episode events.
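To make the "scenarios × agents × degrades × seeds, episodes per cell" grid concrete, here is a sketch of how a campaign spec could expand into episode jobs. The CampaignSpec and EpisodeJob shapes are hypothetical, and reading a cell as one (scenario, agent, degrade, seed) combination is an interpretation of the plan, not a confirmed request schema.

```typescript
// Hypothetical request body for POST /campaigns.
interface CampaignSpec {
  scenarios: readonly string[];
  agents: readonly string[];
  degrades: readonly string[];
  seeds: readonly number[];
  episodesPerCell: number;
}

interface EpisodeJob {
  scenario: string;
  agent: string;
  degrade: string;
  seed: number;
  episode: number; // 0-based index within the cell
}

// Cartesian product of the four axes, repeated episodesPerCell times per cell.
function expandCampaign(spec: CampaignSpec): EpisodeJob[] {
  if (!Number.isInteger(spec.episodesPerCell) || spec.episodesPerCell < 1) {
    throw new Error("episodesPerCell must be a positive integer");
  }
  const jobs: EpisodeJob[] = [];
  for (const scenario of spec.scenarios) {
    for (const agent of spec.agents) {
      for (const degrade of spec.degrades) {
        for (const seed of spec.seeds) {
          for (let episode = 0; episode < spec.episodesPerCell; episode++) {
            jobs.push({ scenario, agent, degrade, seed, episode });
          }
        }
      }
    }
  }
  return jobs;
}
```

The job count is the product of the axis sizes times episodesPerCell, which is worth surfacing in the authoring UI before a campaign is queued.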

Internally schedules episodes to a worker pool that boots an isolated copilot + world-sim + red-agent per episode (in-process for speed, or container-per-episode for full isolation — behind a flag). Results stream to Postgres + MinIO.

V&V Report is the centerpiece deliverable. Generated automatically from campaign data:

  • Per-cell payoff distributions with CI95.
  • Cross-cell aggregate tables.
  • Robustness curves (payoff vs degrade level).
  • Per-decision interpretability appendix (top-N decisions by Δ-utility, with subroutine decomposition).
  • All artifact hashes for reproducibility.
  • Audit-log seal cross-references.
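For the per-cell CI95, one simple option is a normal-approximation interval on the cell's episode payoffs. The sketch below assumes that method (mean ± 1.96 standard errors, sample variance with n − 1); the plan does not say which interval the report will use, and for small episode counts a t-interval or bootstrap may be the better choice.

```typescript
interface Ci95 {
  n: number;
  mean: number;
  halfWidth: number;
  lower: number;
  upper: number;
}

// Normal-approximation 95% confidence interval for the mean payoff of one cell.
function ci95(payoffs: readonly number[]): Ci95 {
  const n = payoffs.length;
  if (n < 2) {
    throw new Error("need at least two episodes to estimate a confidence interval");
  }
  const mean = payoffs.reduce((acc, x) => acc + x, 0) / n;
  // Unbiased sample variance (divide by n - 1).
  const variance = payoffs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n);
  return { n, mean, halfWidth, lower: mean - halfWidth, upper: mean + halfWidth };
}
```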

5. M&S bridges

Primary: Command: Professional Edition (services/cpe-bridge/)

CPE has a Lua scripting API; the bridge runs CPE under a controller process, exposes a UCI v2.5 facade over MQTT (translating CPE scenario state → UCI MTs and UCI commands → CPE inputs). Mirrors services/adsb-bridge/ structurally:

CPE process ↔ CPE Lua hooks ↔ services/cpe-bridge/ ↔ MQTT (uci/v2_5/#) ↔ rest of stack

CPE seat license budgeted at $3–10k; purchase one before proposal submission so the bridge's first commit can land in Phase II M1.

Secondary: AFSIM (services/afsim-bridge/)

AFSIM is ITAR-controlled (Cat XI(d)). Realistic access timeline 6–12 months post-award via JCP/DD2345 sponsorship. JCP application must be in flight before proposal submission; proposal text should say "JCP application submitted on YYYY-MM-DD; expected access M6 of performance period."

If access slips: hardened services/world-sim/ is the fallback environment for Year-2 V&V. Multi-domain scenarios in scenarios/ can be authored to match the kinds of joint-all-domain operations AFSIM would otherwise host. This fallback is also called out explicitly in the proposal so a reviewer doesn't fixate on the AFSIM risk.

6. Multi-domain scenario authoring (services/world-sim/ + apps/cop-ui/admin/)

Why: RFP demands "thousands of assets across multiple domains (air, sea, land)" and "extended time horizons." Today's scenarios are air-only, 3 effectors, ~5 tracks, 120–200s loop.

Schema extension (services/world-sim/src/scenario.ts):

  • New asset kinds: NAVAL_SURFACE, NAVAL_SUBSURFACE, GROUND_VEHICLE, GROUND_INFANTRY, SPACE_TRACK.
  • New event types: runtime_spawn (already implicit via Red-agent), relocate_asset, comms_event (richer than current degrade), weather_event (sensor confidence modifier).
  • Time-horizon extension: scenarios up to 24h sim-time; world-sim wall-clock loop separated from sim-clock advancement.
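One way the schema extension could be typed is as a discriminated union over the new event types, with a timeline check against the 24h horizon. Only the asset-kind and event-type names and the 24h limit come from this plan; every field name (atSec, assetId, lossRate, and so on) and the validateTimeline helper are hypothetical.

```typescript
type NewAssetKind =
  | "NAVAL_SURFACE" | "NAVAL_SUBSURFACE"
  | "GROUND_VEHICLE" | "GROUND_INFANTRY"
  | "SPACE_TRACK";

// Field names below are illustrative assumptions, not the real scenario.ts schema.
type ScenarioEvent =
  | { type: "runtime_spawn"; atSec: number; assetId: string; kind: NewAssetKind }
  | { type: "relocate_asset"; atSec: number; assetId: string; lat: number; lon: number }
  | { type: "comms_event"; atSec: number; linkId: string; lossRate: number }
  | { type: "weather_event"; atSec: number; sensorConfidenceScale: number };

const MAX_HORIZON_SEC = 24 * 60 * 60; // scenarios up to 24h sim-time

// Returns human-readable problems; an empty array means the timeline fits the horizon.
function validateTimeline(horizonSec: number, events: readonly ScenarioEvent[]): string[] {
  const problems: string[] = [];
  if (horizonSec <= 0 || horizonSec > MAX_HORIZON_SEC) {
    problems.push(`horizon ${horizonSec}s must be within (0, ${MAX_HORIZON_SEC}]`);
  }
  events.forEach((event, i) => {
    if (event.atSec < 0 || event.atSec > horizonSec) {
      problems.push(`event ${i} (${event.type}) at ${event.atSec}s is outside the horizon`);
    }
  });
  return problems;
}
```

All times are sim-clock seconds, which matches the stated separation of sim-clock advancement from the wall-clock loop.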

Scenario authoring UI lives under apps/cop-ui/admin/scenarios/. WYSIWYG drag-and-place on the MapLibre canvas, save/load to scenario library in Postgres, export to YAML for offline review.

Government-authored scenarios land in services/world-sim/ as the same YAML format; an import-cpe-scenario CLI converts CPE-native scenarios to UCI-native YAML.

7. Scaling work (packages/uci-solver/ + packages/uci-game/)

Tactical (3 effectors / 5 tracks / 120s) is in Phase 0. Phase II Year 2 must demonstrate the abstraction + subgame-solving pipeline at three problem sizes spanning 1.5 orders of magnitude:

  • T1: 3 effectors / 5 tracks / 120s (Phase 0 baseline)
  • T2: 10 effectors / 30 tracks / 1h
  • T3: 30 effectors / 100 tracks / 6h

Operational (1000+ assets, joint all-domain) is the stretch claim, projected with stated assumptions, not demonstrated. Don't promise it as a Year-1 deliverable.

When the TS regret-table impl saturates (likely at T3), swap RegretTable for a napi-rs Rust impl behind the existing interface in packages/uci-solver/src/regret.ts. The interface was specifically designed in Phase 0 for this swap. Rust impl lives in packages/uci-solver-native/ as a new optional dep, falls back to TS impl on platforms without prebuilds.
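The swap-behind-an-interface idea can be sketched as below. This is not the actual interface in packages/uci-solver/src/regret.ts, which is not reproduced in this plan: the RegretTable methods, the TsRegretTable class, and createRegretTable are hypothetical, and the strategy shown is plain regret matching (normalized positive regrets, uniform when none are positive).

```typescript
// Hypothetical interface; the real one lives in packages/uci-solver/src/regret.ts.
interface RegretTable {
  add(infoSet: number, action: number, delta: number): void;
  strategy(infoSet: number): number[];
}

// Pure-TS fallback backed by a flat Float32Array (infoSets x actions).
class TsRegretTable implements RegretTable {
  private readonly regrets: Float32Array;
  private readonly actions: number;

  constructor(infoSets: number, actions: number) {
    this.actions = actions;
    this.regrets = new Float32Array(infoSets * actions);
  }

  add(infoSet: number, action: number, delta: number): void {
    const i = infoSet * this.actions + action;
    this.regrets[i] = (this.regrets[i] ?? 0) + delta;
  }

  // Regret matching: play actions in proportion to their positive regret.
  strategy(infoSet: number): number[] {
    const start = infoSet * this.actions;
    const positive = Array.from(
      this.regrets.subarray(start, start + this.actions),
      (r) => Math.max(r, 0),
    );
    const total = positive.reduce((a, b) => a + b, 0);
    return total > 0
      ? positive.map((r) => r / total)
      : positive.map(() => 1 / this.actions);
  }
}

type RegretTableFactory = (infoSets: number, actions: number) => RegretTable;

// Prefer the native factory when one is supplied and loads; otherwise fall back to TS.
function createRegretTable(
  infoSets: number,
  actions: number,
  native?: RegretTableFactory,
): RegretTable {
  if (native) {
    try {
      return native(infoSets, actions);
    } catch {
      // e.g. no prebuilt binary for this platform: fall through to the TS impl
    }
  }
  return new TsRegretTable(infoSets, actions);
}
```

Callers depend only on RegretTable, so a napi-rs implementation could be passed as the native factory without touching solver code.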

8. Observability (deploy/observability/ + packages/uci-otel/)

Why: V&V acceptance demands traceable, correlated logs across services. Government sites also typically require local dashboards (no SaaS observability).

Stack (all Chainguard FIPS images):

  • OpenTelemetry collector (opentelemetry-collector-fips) receives traces + metrics + logs from every service via OTLP.
  • Tempo (tempo-fips) for traces.
  • Prometheus (prometheus-fips) for metrics.
  • Loki (loki-fips) for logs.
  • Grafana (grafana-fips) for dashboards, with preinstalled dashboards for: solver convergence, bus throughput, validator audit rate, per-MT counts, copilot decision latency, campaign runner job state, auth events.

packages/uci-otel/ is a thin wrapper around the OTel JS SDK that every service imports; auto-instruments MQTT publishes and subscribes with topic-as-attribute + bus-message-size + schema-validation-outcome.

9. Admin UI (apps/admin-ui/ — new Next.js app)

Why: cop-ui is operator-facing (tactical). Admin/planner/analyst roles need a different surface: campaign authoring, blueprint snapshot management, audit-log query, V&V report viewing, scenario library, user management.

Stack: Next.js 16, same Tailwind v4 theme tokens as cop-ui (DRY via a new packages/uci-ui-theme/), keycloak-js for OIDC, server components for the heavy reports.

Pages:

  • /campaigns: list, create, monitor, drill-down.
  • /scenarios: library browser + WYSIWYG authoring + import/export.
  • /blueprints: solver blueprint snapshots; promote/rollback, exploitability dashboard.
  • /audit: sealed-log query with hash-chain verification.
  • /reports: V&V reports (live + archived).
  • /users: admin only.

11. Model-agnostic LLM client (packages/uci-llm/)

Why: The operator must be able to point the stack at any LLM backend they want — Anthropic API (current dev default), Bedrock-on-GovCloud Claude (cleared GovCloud), Azure-AI Anthropic, Ollama / vLLM / llama.cpp (air-gapped local), Azure OpenAI Gov, custom on-prem inference, anything OpenAI-compatible. Phase II target deployments span all of these and Claude must stay first-class wherever it's reachable.

Design: one LanguageModelClient interface in packages/uci-llm/; each backend is a single ≤200-LOC TS adapter under packages/uci-llm/src/clients/<name>.ts. Adding a new backend is a ~50-LOC PR. No vendor SDK is imported anywhere outside packages/uci-llm/src/clients/.

Shipped backends (Phase 0):

  • anthropic: Claude via the Anthropic SDK. Preferred default when reachable. Prompt-caching enabled. Current services/copilot/src/claudeAgent.ts:1-207 semantics preserved verbatim.
  • ollama: local model via Ollama HTTP. Preferred default in air-gapped deployments.
  • bedrock: AWS Bedrock (Claude / Nova / Llama / Mistral). Works on GovCloud regions for cleared environments.
  • openai-compat: single client adapter that covers OpenAI, Azure OpenAI, Together, Groq, Fireworks, vLLM with the OpenAI-compatible endpoint, llama.cpp server, and any OpenAI-shaped HTTP service. Configured via LLM_BASE_URL + LLM_API_KEY.

Selection at runtime: env-driven (LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL, LLM_API_KEY). Default selection logic: LLM_PROVIDER if set; else anthropic when ANTHROPIC_API_KEY is present (preserves current demo behavior exactly); else ollama. The operator can override every aspect per-deployment.
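The default selection order can be written down in a few lines. The selectProvider name and the Env type are hypothetical; the precedence (LLM_PROVIDER, then anthropic when ANTHROPIC_API_KEY is present, then ollama) is the one stated above. Treating a blank LLM_PROVIDER as unset is an assumption.

```typescript
// Environment passed in explicitly so the logic is testable without process.env.
type Env = Readonly<Record<string, string | undefined>>;

function selectProvider(env: Env): string {
  // 1. An explicit LLM_PROVIDER always wins (blank counts as unset: assumption).
  const explicit = env["LLM_PROVIDER"]?.trim();
  if (explicit) return explicit;
  // 2. Preserve current demo behavior: an Anthropic key implies the anthropic backend.
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";
  // 3. Otherwise assume an air-gapped site with a local Ollama.
  return "ollama";
}
```

In a service this would typically be called once at boot, for example with process.env, before the health probe runs.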

Capability flags (supportsToolUse, supportsPromptCache, supportsStreaming, supportsGrammar) are declared per client. Where a backend lacks native tool-use, the interface transparently falls back to JSON-mode + parsed grammar or instruction-prompted structured output. Same agent contract works against every backend.

Registry: registerClient(name, factory) lets downstream consumers add a custom backend (private fork of Bedrock, classified inference endpoint, etc.) without forking packages/uci-llm/. The "anything I want" extensibility primitive.
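A sketch of how the registry and capability flags might fit together. registerClient(name, factory) and the four flag names come from this plan; the LanguageModelClient members, ClientConfig, createClient, and the strategy names are hypothetical stand-ins for whatever packages/uci-llm/ actually exposes.

```typescript
interface Capabilities {
  supportsToolUse: boolean;
  supportsPromptCache: boolean;
  supportsStreaming: boolean;
  supportsGrammar: boolean;
}

// Hypothetical minimal shapes; the real interface is defined in packages/uci-llm/.
interface ClientConfig {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

interface LanguageModelClient {
  readonly name: string;
  readonly capabilities: Capabilities;
  complete(prompt: string): Promise<string>;
}

type ClientFactory = (config: ClientConfig) => LanguageModelClient;

const registry = new Map<string, ClientFactory>();

function registerClient(name: string, factory: ClientFactory): void {
  if (registry.has(name)) throw new Error(`LLM client "${name}" is already registered`);
  registry.set(name, factory);
}

function createClient(name: string, config: ClientConfig = {}): LanguageModelClient {
  const factory = registry.get(name);
  if (!factory) throw new Error(`unknown LLM client "${name}"`);
  return factory(config);
}

type StructuredOutputStrategy = "native-tool-use" | "json-grammar" | "prompted-json";

// Fallback order described above: native tool use, then grammar-constrained JSON,
// then instruction-prompted structured output.
function pickStructuredOutputStrategy(caps: Capabilities): StructuredOutputStrategy {
  if (caps.supportsToolUse) return "native-tool-use";
  if (caps.supportsGrammar) return "json-grammar";
  return "prompted-json";
}
```

A downstream consumer would call registerClient once at startup with its private backend, then select it through LLM_PROVIDER like any shipped client.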

Parity test (packages/uci-llm/test/parity.test.ts): runs the same prompt + tool-schema against every shipped backend and asserts structurally equivalent output. Swapping backends never silently changes agent behavior.

Phase II additions:

  • LLM_PROVIDER is surfaced in the admin UI (apps/admin-ui/settings/llm) as a runtime-configurable site setting; admins can switch backends without redeploying.
  • A health probe (packages/uci-llm/src/healthCheck.ts) verifies the configured backend on boot; failure surfaces as a top-strip alert rather than a silent fallback.
  • Air-gap deployment bundle (deploy/compose/prod.yml) ships a default Ollama sidecar based on cgr.dev/chainguard/ollama-fips preloaded with a Llama-class model so the stack is self-contained out of the box. (vLLM via vllm-openai-fips is an alternative for higher-throughput sites.)
  • Cleared-GovCloud deployment bundle defaults to bedrock with Claude-on-GovCloud.

12. OMS conformance & CAL bindings (packages/uci-cal-bindings/ + services/cal-conformance/ + docs/oms/)

Why: Phase II commits to OMS Tier 2 by M24 (see "OMS Standard alignment" above). That commitment requires three concrete artifacts: (a) a CAL-conformant API surface so third-party OMS integrators can consume our ASB through the standard interface; (b) a conformance test harness that verifies our Services implement the OMS Service state machine and Tier 1 / Tier 2 message sets; (c) vendored OMS spec docs tracked by hash, same way schema/UCI_v2_5/ is tracked. None of these exist today.

packages/uci-cal-bindings/ (TypeScript CAL veneer):

  • Implements the CAL API shape from OMSC-SPC-001 RevL (CAL Client, CAL Implementation, CAL Instance, Readers/Writers/Factories, QoS settings, ASB Connection Status state diagram).
  • Sits on top of packages/uci-bus: the bus package is the substrate; the CAL bindings are the standard-conformant surface integrators write against. Internal Services keep using connectBus directly for the side channels that aren't CAL-relevant.
  • One ≤300 LOC adapter; not a rewrite. The CAL spec is mostly a naming + lifecycle contract, not a transport replacement, so Mosquitto remains the wire.
  • Lives parallel to packages/uci-llm/'s shape: clean interface, capability flags (supportsRDMA and supportsStreamingDataProducts are both false, which is sufficient for Tier 1/2 messaging).

services/cal-conformance/ (CI-gated conformance verifier):

  • Validates that every Service implements the OMS Service state transition behaviors (OMSC-STD-001 §6.1.1): INITIALIZING → OPERATE → SHUTDOWN, with the required state reporting messages on the ASB.
  • Validates that every Subsystem-shaped emitter (world-sim simulated subsystems, eventually real hardware) implements the Subsystem state machine (OMSC-STD-001 §6.2.1) including Startup, Erase, and the four Shutdown cases.
  • Validates Tier 1 Required Subsystem Message Set (M12) and Tier 2 Minimum Message Set (M24) presence and well-formedness across a representative campaign.
  • Validates Isolator boundary behavior (OMSC-STD-001 §6.4) on services/adsb-bridge, services/cpe-bridge, and services/afsim-bridge: every message crossing the boundary is externally-standardized inbound and OMS-conformant outbound.
  • Outputs a signed conformance report (tdp/acceptance/oms-conformance-<tier>.json) consumed by the V&V report generator.
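The Service state check could start from something as small as the following. It models only the three states named in this plan (INITIALIZING, OPERATE, SHUTDOWN) with a strictly linear transition table; that table is an assumption, and the authoritative state set and transitions are whatever OMSC-STD-001 §6.1.1 defines. The firstIllegalTransition helper is hypothetical.

```typescript
type ServiceState = "INITIALIZING" | "OPERATE" | "SHUTDOWN";

// Assumed linear lifecycle; the standard may define more states and edges.
const ALLOWED_NEXT: Record<ServiceState, readonly ServiceState[]> = {
  INITIALIZING: ["OPERATE"],
  OPERATE: ["SHUTDOWN"],
  SHUTDOWN: [],
};

// Given the ordered states a Service reported on the ASB, return the index of the
// first illegal report, or -1 when the whole sequence is a valid lifecycle prefix.
// An empty sequence counts as a failure at index 0 (the Service never reported).
function firstIllegalTransition(reported: readonly ServiceState[]): number {
  if (reported.length === 0 || reported[0] !== "INITIALIZING") return 0;
  for (let i = 1; i < reported.length; i++) {
    const prev = reported[i - 1];
    const curr = reported[i];
    if (prev === undefined || curr === undefined || !ALLOWED_NEXT[prev].includes(curr)) {
      return i;
    }
  }
  return -1;
}
```

A harness would collect the reported states per Service from the retained state topic during a campaign and fail the CI gate on any non-negative result.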

docs/oms/ (vendored OMS Definition & Documentation (D&D) set):

  • Copy of the six governing documents listed in "OMS Standard alignment" above.
  • docs/oms/PROVENANCE.md tracks SHA-256 of every file vs. open-arsenal/oms main.
  • Same vendor-and-pin pattern as schema/UCI_v2_5/ (which the existing CLAUDE.md forbids modifying). Do not edit; OMS Change Requests go upstream through OACWG.

Adopting OMS document templates as TDP doc sources of truth:

  • The five Checklists (Mission Package, Platform, Subsystem, Service, Isolator) get filled out as part of the TDP and pass-graded by services/cal-conformance/ where automatable.
  • The MPW / MPDD / PDD / SDD / Service Contract templates replace the bespoke "Architecture spec / Operator manual / Administrator guide" structure in the previous TDP composition. See revised "TDP composition" section below.

Files: packages/uci-cal-bindings/src/{index,types,client,writer,reader,qos,status}.ts, packages/uci-cal-bindings/test/{shape,qos,status}.test.ts, services/cal-conformance/src/{main,serviceState,subsystemState,messageSet,isolator}.ts, services/cal-conformance/test/, docs/oms/{01..20}_*.md + PROVENANCE.md, tdp/oms-templates/{mpw,mpdd,pdd,sdd,service-contract,checklists}/ (filled-out templates).

10. Deployment & packaging (deploy/)

Single-node default for Phase II V&V events; team installs at the government site and runs.

Targets:

  • deploy/compose/: multi-service docker-compose for a single workstation. Default for workgroup-scale V&V events.
  • deploy/helm/: Helm chart for k8s deployment. Multi-host scaling option.
  • deploy/install/: bash installer that bootstraps an air-gapped VM from an offline bundle (Chainguard FIPS images preloaded, scenario library seeded, default admin password).

Image build pipeline:

  • One Dockerfile per service under each service directory.
  • Multi-stage builds: build stage on cgr.dev/chainguard/node-fips:latest-dev (has shell, npm, pnpm); final runtime stage on cgr.dev/chainguard/node-fips:latest (distroless, nonroot node user) for Services that ship our TS code, or the appropriate *-fips runtime image (postgres-fips, keycloak-fips, etc.) for third-party Services. The single OS base for our own images is chainguard-base-fips.
  • Reproducible via locked Node version, locked pnpm lockfile, locked Chainguard image digest per stage. Chainguard images are apko-built and reproducible upstream.
  • All images cosign-signed at release time (cgr.dev/chainguard/cosign-fips for the signer); Chainguard upstream signatures are verified before our re-sign so the chain of trust is intact.
  • High-quality CycloneDX SBOM is shipped by Chainguard with every base image (attestation available via cosign download attestation); our build step concatenates that with our own application-layer SBOM and bundles the merged document in the TDP.


Phase II execution — 24-month plan

Year 1 (months 1-12)

Month Workstreams
M1 Phase 0 work merged to main. Solver, Red agent, eval harness, SolverAgent. Phase 0 carry-over: any rough edges from the sprint get polished. Team onboards new engineers; security clearances initiated. OMS D&D vendored into docs/oms/ with PROVENANCE.md; team reads OMSC-STD-001 RevM + OMSC-SPC-001 RevL end-to-end.
M2-3 Production foundation. Chainguard FIPS base images adopted across every container; reproducible build pipeline pinned to Chainguard image digests; first cosign-signed image release of the existing stack (mosquitto + validator + world-sim + copilot + cop-ui) built on node-fips + chainguard-base-fips. No new features yet — proving the bake. OMS Service state machine (OMSC-STD-001 §6.1.1) added to every Service main.ts; retained uci-demo/oms/service/<id>/state topic publishes state transitions.
M3-5 Identity & auth (§1). Keycloak deployment; OIDC integration in every service; mTLS option for CAC. Roles defined, role gates added at every service entry. Cop-ui requires login. JWT claims flow through MQTT publishes. OMS Cybersecurity Guide (OMSC-GDE-003) controls mapping drafted in tdp/security/oms-controls.md alongside CMMC L1.
M4-7 Persistence (§2). Postgres + MinIO deployment. packages/uci-persistence/ lib. Initial schema: users, scenarios, campaigns, blueprint_snapshots. Migration runner. World-sim, copilot, solver-daemon, eval-harness all gain optional persistence (env-flag controlled; demo mode remains stateless). OMS Data Transfer URI scheme (OMSC-STD-001 §6.3.2.4.1) used for every MinIO blob reference so blueprint snapshots / report PDFs are Tier-2 conformant.
M5-8 Sealed audit log (§3). packages/uci-audit/ + services/audit/. Captures operator actions, solver proposals, mode changes, auth events. Hash-chain verification CLI. Stored in Postgres + MinIO.
M6-9 CPE bridge (§5 primary) as OMS Isolator. Lua hooks, UCI-facade translation, first end-to-end run of a CPE-authored scenario through the existing copilot + solver. SDD authored using OMS Isolator template (tdp/oms-templates/sdd/cpe-bridge.md); Isolator Checklist filled out and pass-graded. This is the Year-1 government-readiness milestone.
M7-10 Campaign-runner (§4). Productize Phase 0 eval harness. REST + WebSocket API. Worker pool. V&V report generator (HTML + PDF).
M8-11 Multi-domain scenario authoring (§6). YAML schema extension; admin-UI scenario editor (page only — full UI in Year 2); naval + ground asset kinds; new MT builders for joint-domain wire signals; world-sim handlers.
M9-12 Observability (§8) + OMS conformance harness (§12). OTel instrumentation; preinstalled dashboards; smoke-tested on the full stack. packages/uci-cal-bindings/ + services/cal-conformance/ ship with OMS Tier 1 verification passing on every Service. SDD authored per Service. Service Contract template filled out for each services/*.
M12 Year-1 V&V event. Government-provided multi-domain scenario, run through CPE bridge into solver-augmented copilot, full audit log captured, V&V report generated. OMS Tier 1 conformance report signed and attached. Acceptance criteria below.

Year 2 (months 13-24)

Month Workstreams
M13-15 AFSIM bridge (§5 secondary) as OMS Isolator, assuming JCP cleared. If slipped: fall back to hardened world-sim multi-domain scenarios; document the fallback in the Year-2 TDP supplement. Same Isolator SDD + Checklist treatment as CPE bridge.
M14-17 Three operating modes formalized (§"Three operating modes"). Red trainer + impartial adjudicator UX in cop-ui; solver-side adjustments (Red-driven blueprint; Δ-utility scoring). Audit-log markers. Mode-switch acceptance test.
M15-18 Scaling work (§7). T2 (10/30/1h) demonstrated. T3 (30/100/6h) demonstrated with PBS subgame-resolving. Rust regret-table swap if needed. Scaling-curve V&V artifact updated.
M16-19 Admin UI (§9). Full Next.js app. Campaign-runner control surface, scenario library, blueprint manager, audit-log query, V&V report viewer, user management.
M17-21 Robustness V&V campaigns at scale + OMS Tier 2 conformance. Systematic matrix: scenarios × degrade levels × sensor-noise levels × novel-Red variants. Campaigns auto-run nightly on the campaign-runner; reports archived; regression gates wired into eval workflow. services/cal-conformance/ extended with Tier 2 Minimum Message Set verification (OMSC-STD-001 §6.3.1.5); CI gate flips from Tier 1 to Tier 2.
M19-22 MPDD + PDD + Mission Package Checklist authored. Mission Package Design Description names every UoR, every data exchange, every Isolator, every Service Contract; Platform Design Description covers ASB (Mosquitto), CAL (packages/uci-cal-bindings/), OCE (Chainguard FIPS image set + compose/helm), data storage (postgres-fips + minio-fips). Mission Package Checklist and Platform Checklist pass-graded against services/cal-conformance/ output.
M20-23 Packaging & deployment (§10). Compose + Helm + air-gapped installer. SBOM generation; cosign signing; install rehearsal on a clean VM. OMS Mission Package distribution bundle assembled per MPDD instructions.
M22-24 TDP composition & delivery. All OMS documents complete (MPW, MPDD, PDD, every SDD, every Service Contract, all five Checklists); final V&V event with Tier 2 conformance report; acceptance walkthrough with government PM.

TDP composition (what gets delivered at M24)

The Technical Data Package is the single most important Phase II artifact. The RFP says it must be "sufficient for government use in wargaming, analysis, and COA development." That's a high bar. The TDP doc structure is the OMS Definition & Documentation (D&D) template set, not a bespoke organization: reviewers grade against forms they already use to evaluate Tier 2 Mission Packages. Bespoke content (algorithm spec, interpretability case-book) attaches as appendices to the relevant OMS document. This costs nothing and dramatically reduces reviewer friction.

Software bundle (tdp/software/)

  • All container images (cosign-signed, Chainguard *-fips base).
  • deploy/compose/ + deploy/helm/ + deploy/install/.
  • Offline image archive (~3 GB) for air-gap installation.
  • SBOM per image (CycloneDX JSON).
  • Source-code release (Apache 2.0 already — verify ITAR scoping for source distribution).

Documentation bundle (tdp/docs/ — organized per OMS D&D template set)

OMS-template documents (sources of truth, government-recognizable):

  • MPW — Mission Package Worksheet (per 10_1_OMSC-TMP-006_RevH). Configuration record of the delivered mission package: which Services, which Isolators, which Platform configuration, which Tier targeted, which OMS messages used.
  • MPDD — Mission Package Design Description (per 11_1_OMSC-TMP-007_RevI). Top-level design doc. Replaces the previous "Architecture spec" bullet. Lists every UoR, every data exchange, every service contract reference. Algorithm spec attaches as an appendix here.
  • PDD: Platform Design Description (per 12_1_OMSC-TMP-001_RevM). ASB (Mosquitto config), CAL (packages/uci-cal-bindings/ interface), OCE (Chainguard FIPS container set + compose/helm), data storage (Postgres-FIPS + MinIO-FIPS), Other Mission Processing. FIPS configuration + per-image STIG-equivalence mapping documented here, referencing the Chainguard CMVP certificate from their published FIPS commitment.
  • SDD (one per Service) (per 13_1_OMSC-TMP-002_RevM). One Service Design Description per services/* (copilot, world-sim, validator, solver-daemon, red-agent, campaign-runner, audit, auth, db, blob-store, cpe-bridge, afsim-bridge). The "Integration guide" content lives inside the relevant SDD as an integration appendix.
  • Service Contract (one per Service) (per 14_1_OMSC-TMP-003_RevM). Formal interface contract: subscribed topics, published topics, message types, QoS expectations, error semantics. Replaces ad-hoc bus contract docs.
  • Mission Package / Platform / Subsystem / Service / Isolator Checklists (per 15_1 / 16_1 / 17_1 / 18_1 / 19_1 Rev M). All five filled out and pass-graded; Subsystem checklist applies to world-sim simulated subsystems and any real subsystems wired via Isolator.
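The Service Contract fields listed above (subscribed topics, published topics, message types, QoS expectations, error semantics) map naturally onto a typed record. The sketch below is an assumption about how such a record might look in this repo, not the OMS template itself: `ServiceContract`, `TopicBinding`, and `contractProblems` are hypothetical names, and the `ServiceState` message type is illustrative. The one hard rule it encodes is an MQTT fact: a publish topic may not contain the `+` or `#` wildcards.

```typescript
type Qos = 0 | 1 | 2; // the three MQTT QoS levels

interface TopicBinding {
  topic: string; // MQTT topic (publish) or topic filter (subscribe)
  messageType: string; // name of the message type carried on the topic
  qos: Qos;
}

interface ServiceContract {
  service: string;
  subscribes: TopicBinding[];
  publishes: TopicBinding[];
  errorSemantics: string; // free-text description of failure behavior
}

// Return a list of human-readable problems; an empty list means the contract passes.
function contractProblems(c: ServiceContract): string[] {
  const problems: string[] = [];
  for (const p of c.publishes) {
    // Wildcards are only legal in subscription filters, never in publish topics.
    if (p.topic.includes("+") || p.topic.includes("#")) {
      problems.push(`publish topic contains a wildcard: ${p.topic}`);
    }
  }
  if (c.errorSemantics.trim() === "") {
    problems.push("error semantics not documented");
  }
  return problems;
}

// Illustrative contract for the audit service (topic filter and message type are assumptions).
const auditContract: ServiceContract = {
  service: "audit",
  subscribes: [{ topic: "uci-demo/#", messageType: "*", qos: 1 }],
  publishes: [
    { topic: "uci-demo/oms/service/audit/state", messageType: "ServiceState", qos: 1 },
  ],
  errorSemantics: "On write failure the service stops acknowledging and retries.",
};

console.log(contractProblems(auditContract));
```

Keeping the contract as data means the same record could feed both the filled-out template and an automated check in services/cal-conformance/.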

Bespoke supporting documents (attached or referenced from the OMS templates above):

  • Operator manual — three operating modes, scenario loading, campaign authoring, approval workflow, audit-log query. Cross-referenced from the cop-ui SDD.
  • Administrator guide — installation, user management, backup/restore, observability. Cross-referenced from the PDD.
  • Interpretability case-book — 10 worked decisions from Year-2 V&V campaigns showing regret decomposition, mixed strategy, and outcome. Attached as an MPDD appendix.
  • V&V report — full Year-2 campaign matrix results with statistical analysis, robustness curves, scaling curves, and acceptance-test pass log. Includes the signed OMS Tier 2 conformance report from services/cal-conformance/.
  • Security model — CMMC L1 controls mapping + OMS Cybersecurity Guide (OMSC-GDE-003) controls mapping, FIPS configuration, audit-log seal verification procedure, threat model.
  • Standards lineage — short doc that names the standards basis (OMS v2.5 RevM, UCI v2.5) with vendored-document hashes from docs/oms/PROVENANCE.md and schema/UCI_v2_5/PROVENANCE.md.

Acceptance test suite (tdp/acceptance/)

  • Reproducible scenario set with expected outcomes (hash-locked).
  • Bit-reproducible solver convergence test (seeded MCCFR on Tripwire → exploitability < ε within N iterations).
  • End-to-end smoke test (full stack boot → scenario load → campaign run → V&V report generated → audit log sealed and verified).
  • Performance benchmark (single-workstation, named CPU, wall-clock targets per scenario size).
  • Negative tests (auth bypass attempt, invalid scenario, malformed audit-log shard, etc.).
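The bit-reproducibility item in the list above rests on one property: with a fixed seed, two solver runs must produce identical output. The sketch below shows that property on a deliberately small stand-in, sampled regret matching in self-play on rock-paper-scissors, rather than MCCFR on Tripwire. The `mulberry32` generator, the `solve` function, and the seed value are all illustrative assumptions; none of this is the project's solver.

```typescript
// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Row player's payoff; rows and columns are rock, paper, scissors.
const PAYOFF: number[][] = [
  [0, -1, 1],
  [1, 0, -1],
  [-1, 1, 0],
];

// Regret matching: play in proportion to positive regret, uniform if none.
function strategyFrom(regret: number[]): number[] {
  const positive = regret.map((r) => Math.max(r, 0));
  const total = positive.reduce((sum, r) => sum + r, 0);
  return total > 0
    ? positive.map((r) => r / total)
    : regret.map(() => 1 / regret.length);
}

// Sample an action index from a probability vector.
function sample(probs: number[], rand: () => number): number {
  const u = rand();
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (u < acc) return i;
  }
  return probs.length - 1;
}

// Self-play with sampled actions; returns player 0's average strategy.
function solve(seed: number, iterations: number): number[] {
  const rand = mulberry32(seed);
  const regret = [[0, 0, 0], [0, 0, 0]];
  const strategySum = [0, 0, 0];
  for (let t = 0; t < iterations; t++) {
    const s0 = strategyFrom(regret[0]);
    const s1 = strategyFrom(regret[1]);
    const a0 = sample(s0, rand);
    const a1 = sample(s1, rand);
    for (let a = 0; a < 3; a++) {
      regret[0][a] += PAYOFF[a][a1] - PAYOFF[a0][a1];
      regret[1][a] += PAYOFF[a][a0] - PAYOFF[a1][a0];
      strategySum[a] += s0[a];
    }
  }
  return strategySum.map((s) => s / iterations);
}

// Same seed, same iteration count: the serialized results must match exactly.
const runA = JSON.stringify(solve(42, 1000));
const runB = JSON.stringify(solve(42, 1000));
console.log(runA === runB ? "reproducible" : "diverged");
```

The real acceptance test would additionally check exploitability against ε. The point here is only that all randomness flows through one seeded generator, so a mismatch between runs signals hidden nondeterminism such as `Math.random()`, iteration-order dependence, or wall-clock inputs.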

Reproducibility bundle (tdp/reproduce/)

  • Locked pnpm-lockfile, Chainguard FIPS image digests (per-tag SHA256 pins from cgr.dev/<org>/*-fips).
  • Build instructions for the toolchain.
  • Hash manifest for every TDP artifact.
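One way to enforce the digest pins listed above is a check that rejects any image reference not ending in a SHA-256 digest. The sketch below is illustrative: `unpinnedImages` is a hypothetical helper, and the digest shown is a placeholder made of repeated characters, not a real Chainguard digest.

```typescript
// An image reference is digest-pinned when it ends in @sha256:<64 hex characters>.
const DIGEST_PINNED = /@sha256:[0-9a-f]{64}$/;

// Return every reference that is tag-only (or otherwise not digest-pinned).
function unpinnedImages(refs: string[]): string[] {
  return refs.filter((ref) => !DIGEST_PINNED.test(ref));
}

const placeholderDigest = "a".repeat(64); // placeholder, not a real digest

const refs = [
  "cgr.dev/chainguard/node-fips:latest", // tag only: mutable
  `cgr.dev/chainguard/node-fips@sha256:${placeholderDigest}`, // pinned
];

const offenders = unpinnedImages(refs);
console.log(offenders);
```

A tag such as `latest` can be repointed upstream at any time, so only the digest form guarantees that a rebuild pulls the same bytes.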

Critical files / new directories

New top-level directories

  • deploy/ — compose, helm, install scripts, observability stacks.
  • docs/oms/ — vendored OMS D&D set (the six governing documents) + PROVENANCE.md. Do not modify, same rules as schema/UCI_v2_5/.
  • tdp/ — staging area for TDP artifacts (gitignored except for templates).
  • tdp/oms-templates/ — filled-out OMS document templates (MPW, MPDD, PDD, SDDs, Service Contracts, Checklists) — these are the TDP docs.
  • plan/ — already exists; Phase II planning + design notes.

New workspace libraries (packages/)

  • packages/uci-llm/ — model-agnostic LanguageModelClient (Phase 0; expanded in Phase II with admin-UI settings, health probe, capability fallbacks).
  • packages/uci-auth/ — OIDC client, JWT verification, role checks, mTLS helpers.
  • packages/uci-persistence/ — Postgres + MinIO query layer (Kysely-based), migrations.
  • packages/uci-audit/ — hash-chained sealed audit log, seal/verify, query API.
  • packages/uci-otel/ — OpenTelemetry auto-instrumentation wrapper.
  • packages/uci-cal-bindings/ — CAL API surface (OMSC-SPC-001) over packages/uci-bus; the OMS-conformant integrator-facing veneer.
  • packages/uci-ui-theme/ — Tailwind v4 theme tokens shared by cop-ui + admin-ui.
  • packages/uci-solver-native/ — optional Rust regret-table impl (Year 2, demand-driven).
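For the hash-chained audit log listed under packages/uci-audit/ above, the core mechanism can be sketched in a few lines. This is an assumption about the general technique, not the package's design: the `AuditEntry` shape, the genesis value, and the field separator are all hypothetical. Each entry's hash covers the previous entry's hash, so editing any entry invalidates the chain from that entry onward.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  seq: number; // position in the log, starting at 0
  payload: string; // serialized event (operator action, proposal, ...)
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string; // SHA-256 over seq, prevHash, and payload
}

const GENESIS = "0".repeat(64);

function entryHash(seq: number, prevHash: string, payload: string): string {
  return createHash("sha256")
    .update(`${seq}\n${prevHash}\n${payload}`)
    .digest("hex");
}

// Append returns a new array; the input log is not mutated.
function append(log: AuditEntry[], payload: string): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  const hash = entryHash(seq, prevHash, payload);
  return [...log, { seq, payload, prevHash, hash }];
}

// Returns the index of the first broken entry, or -1 if the chain is intact.
function verify(log: AuditEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    const expected = entryHash(e.seq, e.prevHash, e.payload);
    if (e.seq !== i || e.prevHash !== prev || e.hash !== expected) return i;
    prev = e.hash;
  }
  return -1;
}

let log: AuditEntry[] = [];
log = append(log, '{"event":"proposal","id":"p-1"}');
log = append(log, '{"event":"operator","action":"APPROVE","id":"p-1"}');
log = append(log, '{"event":"mode","to":"ADJUDICATOR"}');
console.log(verify(log)); // -1 for an untouched chain

// Tampering with a stored payload is detected at that entry.
const tampered = log.map((e) =>
  e.seq === 1 ? { ...e, payload: e.payload.replace("APPROVE", "DENY") } : e,
);
console.log(verify(tampered)); // 1
```

A chain alone only detects edits by someone who cannot recompute the hashes; sealing (for example, signing the latest hash and storing it outside the database) is what makes the log tamper-evident against an insider.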

New workspace services (services/)

  • services/auth/ — Keycloak deployment + realm config.
  • services/db/ — Postgres deployment + migration runner.
  • services/blob-store/ — MinIO deployment + bucket policies.
  • services/audit/ — bus subscriber → audit-log materializer.
  • services/campaign-runner/ — REST + WebSocket API, worker pool, V&V report generator.
  • services/cal-conformance/ — OMS Service / Subsystem / Isolator state-machine + Tier message-set conformance verifier; CI-gated.
  • services/cpe-bridge/ — CPE Lua-hook ↔ UCI MQTT translator. OMS Isolator per SDD template.
  • services/afsim-bridge/ — AFSIM ↔ UCI MQTT translator (Year 2, JCP-gated). OMS Isolator per SDD template.

New apps (apps/)

  • apps/admin-ui/ — Next.js 16 admin/planner/analyst surface.

Existing repo touch-points

File Change
docker-compose.yml Promoted to a multi-service compose; full deployment moves to deploy/compose/dev.yml and deploy/compose/prod.yml.
apps/cop-ui/ Adds Keycloak login wall; reasoning panel gains audit-log seal indicator; mode switcher in top strip.
services/copilot/src/main.ts Adds mode-aware agent selection (PLANNING_AIDE/RED_TRAINER/ADJUDICATOR); JWT verification on operator action subscriptions; audit-log emit on every decision.
services/world-sim/src/scenario.ts Schema extension for naval/ground/space asset kinds + new event types.
services/validator/src/main.ts Audit feed becomes optional Postgres-backed mode; VALIDATOR_FULL_AUDIT=1 for V&V runs; per-message latency tracking added.
package.json (root) pnpm up becomes a thin wrapper around deploy/compose/dev.yml. Production deployment via separate commands.
CLAUDE.md Updated to remove the "no persistence/auth/DB" constraint; replaced with "the demo mode (pnpm up:demo) is stateless; the production mode (pnpm up:prod) is not." Adds "OMS v2.5 is the framing standard; do not modify docs/oms/; treat OMS Service state machine as load-bearing for every Service main.ts."
BUILD.md Day 13+ entries documenting Phase II milestones.
Every services/*/src/main.ts Gains an OMS Service state machine (INITIALIZING → OPERATE → SHUTDOWN) and retained state-publish on uci-demo/oms/service/<id>/state. Existing connect/disconnect logging stays; the state machine wraps it.
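The Service state machine in the last row of the table can be sketched as a small class that validates transitions and publishes each new state as a retained message. This is a sketch under stated assumptions: only the three states named in the plan are modeled (the OMS standard may define more), the `Publish` callback stands in for whatever packages/uci-bus exposes, and the JSON payload shape is hypothetical.

```typescript
type ServiceState = "INITIALIZING" | "OPERATE" | "SHUTDOWN";

// Forward-only transitions along the path named in the plan.
const NEXT: Record<ServiceState, ServiceState[]> = {
  INITIALIZING: ["OPERATE"],
  OPERATE: ["SHUTDOWN"],
  SHUTDOWN: [],
};

// Stand-in for the bus client's publish call (assumed signature).
type Publish = (topic: string, payload: string, opts: { retain: boolean }) => void;

class ServiceStateMachine {
  private state: ServiceState = "INITIALIZING";
  private readonly topic: string;
  private readonly publish: Publish;

  constructor(serviceId: string, publish: Publish) {
    this.topic = `uci-demo/oms/service/${serviceId}/state`;
    this.publish = publish;
    this.announce(); // publish the initial state immediately
  }

  get current(): ServiceState {
    return this.state;
  }

  transition(to: ServiceState): void {
    if (!NEXT[this.state].includes(to)) {
      throw new Error(`illegal transition ${this.state} -> ${to}`);
    }
    this.state = to;
    this.announce();
  }

  // Retained, so a late subscriber sees the latest state on subscribe.
  private announce(): void {
    this.publish(this.topic, JSON.stringify({ state: this.state }), { retain: true });
  }
}

// Usage with an in-memory publish recorder instead of a real broker.
const published: { topic: string; payload: string; retain: boolean }[] = [];
const machine = new ServiceStateMachine("copilot", (topic, payload, opts) => {
  published.push({ topic, payload, retain: opts.retain });
});
machine.transition("OPERATE");
console.log(published.map((p) => p.payload));
```

Publishing with the retain flag is what lets services/cal-conformance/ or a late-joining UI learn every Service's state without waiting for the next transition.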

Existing utilities to reuse (do not duplicate)

All Phase 0 reuses still apply. Additional Phase II reuses:

  • services/adsb-bridge/src/bridge.ts:120-275 — pattern reused by services/cpe-bridge/ and services/afsim-bridge/.
  • services/validator/src/main.ts audit ring buffer — pattern reused by packages/uci-audit/ shard builder, but with Postgres backing.
  • apps/cop-ui/lib/busSubscriber.ts MQTT subscriber + JWT extraction — pattern reused by apps/admin-ui/.
  • apps/cop-ui/components/CommsDegrade.tsx — UX pattern reused by admin-ui campaign-config form.
  • apps/cop-ui/lib/replay.ts — same code drives the V&V report timeline view.
  • packages/uci-codec/ + packages/uci-bus/ — unchanged. All new services consume.

Verification & acceptance

Year-1 V&V event acceptance criteria (M12)

Pass conditions:

  1. Government-provided multi-domain scenario loads through services/cpe-bridge/ (as an OMS Isolator) without manual intervention.
  2. Copilot generates ≥10 distinct proposals over the scenario, each with a logged SubroutineTrace[].
  3. Operator (with role operator) authenticates via OIDC, APPROVES/DENIES/MODIFIES proposals through the cop-ui.
  4. Audit log captures every proposal + operator action; hash-chain seal verifies clean.
  5. V&V report generated automatically; reviewer can drill from aggregate payoff to per-decision subroutine decomposition.
  6. Solver-daemon retains a valid blueprint across a forced restart (persistence works).
  7. Comms-degrade injection at dropPercent: 60 for 60s does not cause data loss or audit-log gaps.
  8. services/cal-conformance/ produces a signed OMS Tier 1 conformance report. Every Service implements the OMS Service state machine; the Tier 1 Required Subsystem Message Set is present on the ASB; the CPE bridge passes the Isolator Checklist.

Year-2 V&V event acceptance criteria (M24)

Pass conditions (additive to Year 1):

  1. All three operating modes (PLANNING_AIDE, RED_TRAINER, ADJUDICATOR) demonstrated on the same scenario in a single session.
  2. T3 problem size (30 effectors / 100 tracks / 6h sim-time) solved to a target exploitability on a 64-core workstation in <2h wall-clock.
  3. Robustness V&V matrix: 3 scenarios × 4 degrade levels × 3 sensor-noise levels × 3 novel-Red variants × 20 episodes each = 2,160 episodes, completed nightly on the campaign-runner with <5% flake rate.
  4. Full air-gapped installation from offline bundle to fully-running stack in <60 minutes on a clean Linux VM. Userspace FIPS is delivered by the Chainguard image set, so the target VM is not distro-locked; the M24 acceptance run is rehearsed on both a stock Ubuntu LTS server and a stock RHEL 9 server to prove kernel-agnosticism.
  5. SBOM + signed images verified by government independent reviewer; CMMC L1 controls mapping and OMS Cybersecurity Guide controls mapping accepted.
  6. TDP delivered on signed offline media + secure transfer.
  7. OMS Tier 2 conformance report signed and attached. Tier 2 Minimum Message Set present; all five OMS Checklists (Mission Package, Platform, Subsystem, Service, Isolator) filled out and pass-graded; MPDD + PDD + every SDD + every Service Contract delivered as part of the TDP.
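The robustness matrix arithmetic in the Year-2 pass conditions can be checked by enumerating the cells directly. The axis values below (scenario names, degrade percentages, noise labels, Red variant names) are placeholders chosen for illustration; only the axis sizes, the 20 episodes per cell, and the <5% flake budget come from the plan.

```typescript
// Placeholder axis values; only the counts (3, 4, 3, 3) come from the plan.
const scenarios = ["scenario-a", "scenario-b", "scenario-c"];
const degradeLevels = [0, 20, 40, 60];
const sensorNoiseLevels = ["low", "medium", "high"];
const redVariants = ["red-1", "red-2", "red-3"];
const EPISODES_PER_CELL = 20;
const MAX_FLAKE_RATE = 0.05; // pass condition: strictly below 5%

interface MatrixCell {
  scenario: string;
  degrade: number;
  sensorNoise: string;
  redVariant: string;
}

// Enumerate every combination of the four axes.
function buildMatrix(): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const scenario of scenarios) {
    for (const degrade of degradeLevels) {
      for (const sensorNoise of sensorNoiseLevels) {
        for (const redVariant of redVariants) {
          cells.push({ scenario, degrade, sensorNoise, redVariant });
        }
      }
    }
  }
  return cells;
}

const matrix = buildMatrix();
const totalEpisodes = matrix.length * EPISODES_PER_CELL;

// "Below 5%" means the flaky-episode count must stay under this bound.
const flakeBound = totalEpisodes * MAX_FLAKE_RATE;

console.log(`${matrix.length} cells, ${totalEpisodes} episodes, flaky must be < ${flakeBound}`);
// prints "108 cells, 2160 episodes, flaky must be < 108"
```

At 2,160 episodes per night, the flake budget works out to at most 107 flaky episodes, which gives the campaign-runner a concrete number to gate on.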

Acceptance test suite (continuous CI gate)

Lives in tdp/acceptance/ and runs in the existing GitHub Actions workflows plus a self-hosted ITAR-cleared runner. Fail conditions break Phase II milestone payments; these are the contractual gates.

Refactor smoke-test gate (every refactor PR)

Phase II involves substantial refactor work: extracting startCopilot() / startWorldSim() service factories, promoting worldState.ts to @uci-demo/game/worldMirror, renaming claudeAgent.ts → llmAgent.ts, extracting runWithConcurrency, splitting persistence-aware vs stateless modes of every service. Every refactor PR, regardless of how mechanical it looks, must run an end-to-end smoke test before merge:

  1. pnpm install && pnpm -r build && pnpm -r typecheck && pnpm -r test — all green.
  2. pnpm up (or the equivalent demo bring-up for the touched services) — full stack boots.
  3. Validator audit at http://127.0.0.1:7700/audit?n=50 reports 100% valid messages.
  4. Affected user-visible path verified manually: approval card appears within 1 cycle; APPROVE / DENY / MODIFY each round-trip cleanly; comms-degrade injection lands; replay reconstructs state correctly.
  5. If the refactor touches a service with persistence (Phase II+), pnpm up:prod and verify Postgres + MinIO + audit log all still flow.

Typecheck + unit tests alone do not gate refactor merges. Document the smoke-test in the PR body as a checklist; reviewers should be able to repeat it independently. Automate where feasible via the campaign-runner regression suite. Refactors that pass typecheck but fail behavioral smoke tests are the single highest source of preventable Phase II regressions — they break MQTT subscription wiring, lifecycle ordering, or retained-topic invariants in ways that unit tests cannot catch. The smoke-test gate is non-negotiable.


Risks specific to government delivery

  1. AFSIM access slippage (likely). JCP can take 6–12 months even with a strong sponsor. Mitigation: lead with CPE; document AFSIM as Year-2 stretch; have a hardened services/world-sim/ multi-domain fallback ready by M12.
  2. CMMC scope creep to Level 2. If the contracting officer reinterprets the topic header mid-performance, L2 adds ~110 controls and ~$200–500k of compliance work. Mitigation: Confirm L1 scope in writing at contract kickoff; if L2 is required, scope it as a Phase II option, not an in-budget surprise.
  3. LLM backend portability across deployment contexts. Target deployments span air-gapped wargaming centers (no Anthropic API), cleared GovCloud (Bedrock Claude), commercial customers with vendor-neutrality requirements, and Shebash dev (Anthropic API directly). Mitigation: the model-agnostic LanguageModelClient (§11) covers all of these behind one interface. Claude stays the preferred default wherever reachable — Anthropic API in dev/non-air-gap, Bedrock Claude in cleared GovCloud. Local Ollama / vLLM is the default in air-gapped sites. The same agent code, same system prompt, same tool schemas, different backend. No proposal sentence implies Claude is mandatory; no proposal sentence implies Claude is excluded. Phase II adds capability-flag fallbacks so even backends without native tool-use (most local models today) produce equivalent structured output.
  4. Multi-domain scaling reality. "Thousands of assets" is hard. T3 (30/100/6h) is demonstrable; 1000+ is not, on the contracted hardware. Mitigation: Be explicit in the proposal about the demonstrated tier vs the projected tier. Tie operational-scale claims to the end of Year 2 and to stated infrastructure assumptions, not to Year 1.
  5. Personnel security clearances. ITAR + likely Secret-level for some V&V events. Lead time 6–18 months. Mitigation: Identify cleared personnel before proposal; budget sponsorship for one additional clearance during Phase II Year 1.
  6. CPE seat licensing & EULA review. Matrix Games / WarfareSims EULA may restrict government redistribution or modification. Mitigation: Confirm CPE EULA terms before proposal; explore an OEM/SDK license if standard seat license is too restrictive.
  7. Chainguard commercial subscription dependency. The full *-fips enterprise image set requires a paid Chainguard subscription with per-organization registry access (the cgr.dev/<org>/<image> pattern). The Starter tier is insufficient for the FIPS lineup. Mitigation: budget a Chainguard enterprise subscription in the labor/licenses line (typical SBIR-scale pricing fits within the software licenses budget below); negotiate a multi-year prepay at contract kickoff for cost certainty. If a government program contractually mandates Iron Bank instead, the build is portable — same Dockerfiles, swap the FROM line for the registry1.dso.mil/ironbank/... equivalent. Document this swap path in the PDD as a contingency.
  8. STIG mapping for non-RHEL base. Chainguard images are built on Wolfi/Chainguard OS, not RHEL/UBI. The DISA STIG benchmarks are written against named distros; a literal "RHEL 9 STIG pass" doesn't apply. Mitigation: per-image STIG-equivalence mapping documented in the PDD, showing each STIG control's enforcement mechanism in Chainguard's distroless hardening. Government reviewers familiar with current acquisition practice (FedRAMP High, CMMC L2 reviewers) generally accept this; if a specific program mandates Iron Bank/RHEL STIG, see risk 7 mitigation.
  9. FIPS provider variance across upstream releases. Chainguard rebuilds glibc-openssl-fips regularly; minor version bumps in OpenSSL's FIPS provider can change available algorithm sets (e.g. MD4 unavailability surfaces in webpack 4). Mitigation: pin to specific Chainguard image digests for any release that goes to a government V&V event; document the digest in tdp/reproduce/; CI smoke-test pins refresh on a deliberate cadence, not on every nightly.
  10. OMS message set divergence from UCI. OMS v2.5 defines its own Tier 1 / Tier 2 / Tier 3 message sets (OMSC-STD-001 §6.3.1); UCI v2.5 defines its own MTs (packages/uci-codec). The standards are aligned by intent (OMS builds on UCI) but the wire-level message catalogs are not 1:1. Phase II must publish both on the ASB where the Tier requires it, not just one. Mitigation: packages/uci-codec gains an OMS-message builder set in parallel with UCI; services/cal-conformance/ validates both catalogs. Budget M9-12 explicitly accounts for this.
  11. OACWG Change Request lead time. If the solver / game-theoretic engine needs a mission-package-level abstraction (e.g., "campaign") that OMS doesn't model, a Change Request through OACWG can take 6+ months. Mitigation: Identify any OMS gaps in M1-3 during the OMS read-through; file CRs early; design the campaign-runner so the campaign concept rides cleanly on top of OMS Mission Package + Services without requiring spec extension (preferred path).
  12. OMS Tier scope creep. Acquisition officer pushes from Tier 2 target to Tier 3. Tier 3 adds CAL CERT verification + full message set conformance + multi-language CAL bindings (Java + C++ per OMSC-SPC-007 / SPC-008). That's ~$300-500k of additional work. Mitigation: Tier 2 is the contractual baseline in the proposal; Tier 3 is a scoped Phase II Option, not in-budget.
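The capability-flag fallback described under risk 3 can be sketched as follows. Only the name `LanguageModelClient` comes from the plan; every member of the interface, the `Proposal` shape, and the prompt wording are assumptions made for illustration. The idea is that a backend with native tool-use returns structured output directly, while a backend without it is asked for JSON and its reply is validated before use.

```typescript
// Hypothetical structured output the agent expects from any backend.
interface Proposal {
  action: string;
  rationale: string;
}

// Interface name from the plan; all members below are assumptions.
interface LanguageModelClient {
  name: string;
  capabilities: { nativeToolUse: boolean };
  complete(prompt: string): Promise<string>;
  callTool?: (prompt: string) => Promise<Proposal>;
}

// Parse and validate a JSON reply from a backend without native tool-use.
function parseProposal(text: string): Proposal {
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("reply is not a JSON object");
  }
  const { action, rationale } = parsed as Record<string, unknown>;
  if (typeof action !== "string" || typeof rationale !== "string") {
    throw new Error("reply is missing action or rationale");
  }
  return { action, rationale };
}

// Same agent-facing call regardless of backend capability.
async function propose(client: LanguageModelClient, prompt: string): Promise<Proposal> {
  if (client.capabilities.nativeToolUse && client.callTool) {
    return client.callTool(prompt);
  }
  const reply = await client.complete(
    `${prompt}\nReply with JSON only: {"action": "...", "rationale": "..."}`,
  );
  return parseProposal(reply);
}

// Stub standing in for a local model without tool-use.
const localStub: LanguageModelClient = {
  name: "local-stub",
  capabilities: { nativeToolUse: false },
  complete: async () => '{"action": "HOLD", "rationale": "insufficient track quality"}',
};

propose(localStub, "Recommend an action for track T-17.").then((p) => {
  console.log(p.action, "-", p.rationale);
});
```

Validating the fallback reply matters because a model without native tool-use can return malformed or partial JSON; rejecting it at this seam keeps bad output from reaching the approval workflow.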

Rough budget envelope

SBIR Phase II SCO awards typically run $1.5–2.0M over 24 months (sometimes $2.5M with options). Rough Phase II allocation:

| Category | % of total | Notes |
| --- | --- | --- |
| Engineering labor (3–4 FTE) | 60–65% | Solver hardening, persistence/auth/audit, M&S bridges, UI, deployment. |
| SME tournament + V&V personnel | 8–12% | Year-2 expanded SME panel for adjudicator validation; cleared operator availability. |
| Government V&V event travel + setup | 5–8% | Two on-site V&V events (M12, M24) + reciprocal team travel. |
| Software licenses | 3–5% | CPE seats, AFSIM-adjacent tooling, Chainguard enterprise FIPS image subscription (replaces line-item for hand-rolled FIPS crypto libs). |
| Cleared facility + secure storage | 2–4% | If team doesn't already have an ITAR-cleared facility. |
| Security clearance sponsorship | 2–3% | One incremental Secret clearance during Year 1. |
| Documentation & TDP composition | 5–8% | Year-2 heavy; technical writer + reviewer hours. |
| Contingency | 8–10% | Standard hedge. |

Phase II Option (if exercised, typical extra $500k–$1M): on-prem-inference for Claude narration, multi-site deployment hardening, additional joint-all-domain scenarios.


Key claims for the proposal (Phase II readiness narrative)

When defending the Phase II execution plan in the proposal:

  1. We have a working, open prototype today. The repo is verifiable; 21/596 UCI MTs on the wire; CI-green; documented architecture. Most competitors do not.
  2. The architecture already separates concerns the TDP requires. Agent interface + service mesh + schema-bound bus + side-channel JSON means auth/persistence/audit can be added at the seams without rearchitecting.
  3. UCI-native AND OMS-shaped = transition-ready twice over. Anything we build plugs into the forthcoming USAF C2 ecosystem with zero translation (UCI v2.5 on the wire). The architecture and TDP doc set are organized as an OMS v2.5 Mission Package (Platform + Services + Isolators + Tier 2 conformance, MPDD/PDD/SDD/Service Contracts/Checklists), so the package transitions cleanly into any OMS Adopting Program acquisition pipeline. Phase III force multiplier.
  4. CPU-only by design. Single-workstation deployment matches the RFP's "modest computational footprint" requirement. GPU procurement is a Phase II delivery risk we eliminate.
  5. Interpretability is architectural, not bolted on. The strategy bank is TS code reviewable line-by-line; the regret table is keyed by subroutine ID, not opaque weights. The "why" is computable from the running system, not narrated.
  6. CPE primary, AFSIM secondary, world-sim fallback. Three-deep M&S strategy, with the most-cited risk (AFSIM access) explicitly hedged.
  7. Air-gap by default. No phone-home, no SaaS dependency, no Anthropic in prod path.

That positioning makes the Phase II execution credible without overpromising.