Design — Kubernetes deploy + GHCR publication (Phase 0 follow-up)

Pre-code design memo for packaging the demo's eight runtime artifacts as container images, publishing them to the repo's GitHub Container Registry, and shipping a Helm chart so an operator can helm install the entire stack on a Kubernetes cluster.

This is the engineering substrate behind the "Phase II transition" story in plan/sbir-phase-ii-tdp-delivery.md: the demo runs on commodity Kubernetes, signs and attests every artifact, and is built from FIPS-validated base images.

The PR landing this memo bundles memo + Dockerfiles + Helm chart + release workflow + kind smoke script so a reviewer can clone the branch, run ./deploy/kind-smoke.sh, and see all eight components healthy in a local cluster in under five minutes.

The headline acceptance gate: a fresh GitHub Actions run on a v* tag produces eight signed container images and one signed Helm chart in ghcr.io/<owner>/uci-demo/..., each with attached SBOMs and SLSA provenance attestations, and a reviewer can install the chart in a kind cluster via helm install uci-demo oci://ghcr.io/<owner>/uci-demo/charts/uci-demo --version <tag> and watch every pod go Ready.


Strand 0 — What we ship in this PR

Container images (8):

| Image | Source | Runtime kind |
|---|---|---|
| validator | services/validator/ | long-lived Deployment (HTTP :7700) |
| world-sim | services/world-sim/ | long-lived Deployment |
| copilot | services/copilot/ | long-lived Deployment |
| solver-daemon | services/solver-daemon/ | long-lived Deployment |
| red-agent | services/red-agent/ | long-lived Deployment |
| adsb-bridge | services/adsb-bridge/ | long-lived Deployment |
| eval-harness | services/eval-harness/ | one-shot Job template |
| cop-ui | apps/cop-ui/ | long-lived Deployment (HTTP :3000) |

services/demo-publisher/ is intentionally NOT shipped — it's a local-only smoke test that publishes intentionally broken payloads to exercise the validator's negative cases.

Helm chart: deploy/helm/uci-demo/, published as an OCI artifact at oci://ghcr.io/<owner>/uci-demo/charts/uci-demo.

Release workflow: .github/workflows/release-images.yml runs on push to main (images tagged :main and :<sha>) and on v* tags (images tagged with the semver). Each image is multi-arch (linux/amd64 + linux/arm64), SBOM-attached (syft), cosign-signed (keyless via GitHub OIDC), and SLSA-attested (build-provenance for SLSA L3).

kind smoke script: deploy/kind-smoke.sh boots a local kind cluster, builds all eight images locally, loads them into kind, helm-installs the chart, and asserts every pod reaches Ready within 90 seconds.

What does NOT ship in this PR (named so reviewers don't ask):

  • Cloud-provider-specific Terraform / Pulumi for EKS / GKE / AKS
  • cert-manager / external-dns / ingress-nginx — the chart leaves Ingress annotations stubbed; operators bring their own controller
  • A second chart for the Mosquitto broker — Mosquitto is part of the uci-demo chart, not a sub-chart (intentional: keeps demo install one command)
  • Horizontal autoscaling — the demo is single-replica; HPA is a Phase II concern
  • Network policies — opinionated allowlists belong in a Phase II hardening pass

Strand 1 — Base-image strategy (Chainguard FIPS)

Decision

Build against cgr.dev/<chainguard-org>/node-fips (Production-tier Chainguard image, FIPS-validated OpenSSL). When no Chainguard org is configured, fall back to cgr.dev/chainguard/node (free, non-FIPS), so the demo builds out of the box for anyone who clones the repo and "flips to FIPS" through repository variables alone, with no Dockerfile changes.

Critical finding

cgr.dev/chainguard/node-fips:latest is not publicly pullable. Pulling that image without authentication returns HTTP 401. The Chainguard documentation directs FIPS users to pull from their org-scoped registry path: cgr.dev/<your-chainguard-org>/node-fips. Access requires a Chainguard Production subscription.

The implication for this repo:

  • The Dockerfiles must NOT hard-code cgr.dev/chainguard/node-fips. They must accept the base images as build-args (BASE_IMAGE_DEV for the builder stage, BASE_IMAGE_RUNTIME for the runtime stage).
  • The release workflow must NOT assume setup-chainctl will succeed without configuration. It runs the Chainguard auth step only when the CHAINGUARD_IDENTITY repository variable is set, and falls back to the free cgr.dev/chainguard/node otherwise.
  • The Helm chart references image.repository per service. Operators override via values.yaml.
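
For a local build, the choice between the free and FIPS base images can be sketched as a small helper. This is an illustration under stated assumptions, not the workflow's actual step: the function name base_image_args is hypothetical, and the sketch assumes the org-scoped node-fips image exposes the same latest and latest-dev tags as the free image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker build flags that select the base images.
# $1 = Chainguard org name; empty or missing selects the free, non-FIPS images.
base_image_args() {
  local org="${1:-}"
  local repo="cgr.dev/chainguard/node"
  if [ -n "$org" ]; then
    repo="cgr.dev/${org}/node-fips"
  fi
  printf '%s\n' \
    "--build-arg=BASE_IMAGE_DEV=${repo}:latest-dev" \
    "--build-arg=BASE_IMAGE_RUNTIME=${repo}:latest"
}

# Usage sketch (not executed here):
#   mapfile -t args < <(base_image_args "${CHAINGUARD_ORG:-}")
#   docker build "${args[@]}" \
#     --build-arg "SERVICE_PKG=@uci-demo/validator" \
#     -f services/validator/Dockerfile .
```

Emitting each flag as a single --build-arg=KEY=VALUE word keeps the output safe to read into a bash array without word-splitting surprises.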

Multi-stage Dockerfile pattern

Every Node service Dockerfile follows the same shape:

# syntax=docker/dockerfile:1.9

ARG BASE_IMAGE_DEV=cgr.dev/chainguard/node:latest-dev
ARG BASE_IMAGE_RUNTIME=cgr.dev/chainguard/node:latest

# ── builder ───────────────────────────────────────────────────────
FROM ${BASE_IMAGE_DEV} AS builder
USER root
WORKDIR /work

# Install pnpm via corepack (bundled with Node 22).
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate

# Copy the workspace boundary files first so this layer caches.
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json ./
COPY packages packages
COPY services services
COPY apps apps
COPY scenarios scenarios

RUN pnpm install --frozen-lockfile --prefer-offline

# Build the entire dep graph for this service (tsc emit).
ARG SERVICE_PKG
RUN pnpm --filter "${SERVICE_PKG}..." build

# Produce a self-contained pruned tree under /deploy.
RUN pnpm --filter "${SERVICE_PKG}" deploy --prod /deploy

# ── runtime ───────────────────────────────────────────────────────
FROM ${BASE_IMAGE_RUNTIME} AS runtime
WORKDIR /app
COPY --from=builder /deploy ./

# Chainguard Node images run as user `node` (nonroot) by default.
# Entrypoint is /usr/bin/node.
USER node
CMD ["dist/main.js"]

The services/*/Dockerfile files override:

  • SERVICE_PKG: the workspace package name (@uci-demo/world-sim)
  • CMD: the compiled entry point

tsc emit, not tsx runtime

The current services use tsx src/main.ts for dev / start. For shipping, each service grows a small build script that emits to dist/:

"build": "tsc -p tsconfig.json"

The runtime image runs node dist/main.js. tsx is left out of the runtime, which reduces the dependency surface and avoids transpiling TypeScript on every cold start. The existing pnpm dev / pnpm start workflows keep working locally; the build script is additive.

Why corepack over npm install -g pnpm

Chainguard's Node image bundles corepack. corepack prepare is the recommended pnpm installation path for hermetic builds: it fetches a specific version, caches it under COREPACK_HOME (by default ~/.cache/node/corepack), and doesn't touch system npm state.

Why pnpm deploy --prod

pnpm deploy produces a self-contained directory with:

  • The target service's dist/ (compiled TS)
  • All dependencies (and transitive workspace package dist/)
  • A flat node_modules/ (no symlinks back to the workspace)
  • NO devDependencies

This is what gets copied to the runtime image. The pruned tree is typically 30-80 MB depending on the service (most of it is the @anthropic-ai/sdk for the copilot image, the rest stay around 20 MB).

cop-ui specifics

Next.js 16 standalone output. Requires output: "standalone" in next.config.ts (additive edit). The runtime image copies .next/standalone + .next/static + public/ and runs node server.js. ~150 MB image (Next.js has a substantial dependency footprint that can't be pruned further).

# apps/cop-ui/Dockerfile (excerpt)

FROM ${BASE_IMAGE_DEV} AS builder
USER root
WORKDIR /work
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json ./
COPY packages packages
COPY apps apps
RUN pnpm install --frozen-lockfile --prefer-offline
RUN pnpm --filter @uci-demo/cop-ui build

FROM ${BASE_IMAGE_RUNTIME} AS runtime
WORKDIR /app
COPY --from=builder /work/apps/cop-ui/.next/standalone ./
COPY --from=builder /work/apps/cop-ui/.next/static ./apps/cop-ui/.next/static
COPY --from=builder /work/apps/cop-ui/public ./apps/cop-ui/public
USER node
EXPOSE 3000
CMD ["apps/cop-ui/server.js"]

Mosquitto

Stays as eclipse-mosquitto:2.0.21 from Docker Hub. Not built from this repo — the image is already battle-tested. The Helm chart's mosquitto.image.repository value lets operators override to a Chainguard variant if they have one.


Strand 2 — Helm chart (deploy/helm/uci-demo/)

Layout

deploy/helm/uci-demo/
├── Chart.yaml
├── values.yaml
├── values.schema.json          # JSON schema for IDE autocomplete
├── README.md
├── templates/
│   ├── _helpers.tpl            # naming + label helpers
│   ├── mosquitto-statefulset.yaml
│   ├── mosquitto-service.yaml
│   ├── mosquitto-configmap.yaml
│   ├── validator-deployment.yaml
│   ├── validator-service.yaml
│   ├── world-sim-deployment.yaml
│   ├── copilot-deployment.yaml
│   ├── copilot-secret.yaml      # optional Anthropic key
│   ├── solver-daemon-deployment.yaml
│   ├── red-agent-deployment.yaml
│   ├── adsb-bridge-deployment.yaml
│   ├── cop-ui-deployment.yaml
│   ├── cop-ui-service.yaml
│   ├── cop-ui-ingress.yaml
│   ├── eval-harness-job.yaml    # rendered only when .Values.evalHarness.enabled
│   └── NOTES.txt
└── tests/
    └── connection-test.yaml      # helm test target

values.yaml shape

# values.yaml — top-level structure
global:
  imageRegistry: ghcr.io/shebashio/uci-demo
  imageTag: ""                # defaults to .Chart.AppVersion when empty
  pullPolicy: IfNotPresent
  imagePullSecrets: []        # add when pulling FIPS images from private cgr.dev

mosquitto:
  image:
    repository: eclipse-mosquitto
    tag: "2.0.21"
  persistence:
    enabled: false            # demo is stateless; true requires a StorageClass
    size: 1Gi
  resources:
    requests: { cpu: 50m, memory: 64Mi }
    limits:   { cpu: 200m, memory: 256Mi }

validator:
  enabled: true
  replicas: 1
  service:
    port: 7700
  resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits:   { cpu: 500m, memory: 512Mi }

worldSim:
  enabled: true
  replicas: 1
  scenarioPath: "/scenarios/counter-uas-tripwire.yaml"
  resources: { ... }

copilot:
  enabled: true
  replicas: 1
  agent:
    use: "scripted"           # scripted | solver | llm
    llmProvider: "anthropic"  # only when agent.use == llm
  anthropicApiKey:
    enabled: false            # toggle to create the Secret + env binding
    value: ""                 # use --set-string copilot.anthropicApiKey.value=...
                              # OR use existingSecret below
    existingSecret: ""        # name of a pre-created Secret with key ANTHROPIC_API_KEY

solverDaemon:
  enabled: false              # opt-in; off by default to match demo defaults
  replicas: 1
  resources: { ... }

redAgent:
  enabled: false              # opt-in
  replicas: 1
  seed: 42
  intervalSec: 45
  maxConcurrent: 4
  policy: "scripted"          # scripted | solver-driven

adsbBridge:
  enabled: true
  replicas: 1
  feedUrl: "https://opensky-network.org/api/states/all"
  resources: { ... }

copUi:
  enabled: true
  replicas: 1
  service:
    type: ClusterIP
    port: 3000
  ingress:
    enabled: false
    className: ""              # set to "nginx" or "traefik" when enabling
    hosts:
      - host: uci-demo.local
        paths:
          - path: /
            pathType: Prefix
    tls: []
  resources: { ... }
  env:
    NEXT_PUBLIC_BROKER_WS_URL: "ws://mosquitto:9001"

evalHarness:
  enabled: false              # ships as a Job template; enable to create
  scenarios: "counter-uas-tripwire"
  agents: "scripted"
  degrade: "none"
  episodesPerCell: 2
  scenarioSimTimeoutSec: 30
  episodeTimeoutMs: 120000

Mosquitto in K8s

StatefulSet, not Deployment, for two reasons:

  1. Pod has a stable network identity (mosquitto-0.mosquitto.default.svc.cluster.local), useful when adding a second broker for HA in Phase II.
  2. Optional PVC for retained-message durability across pod restarts.

Single replica by default. mosquitto.conf mounts from a ConfigMap that mirrors ops/mosquitto/mosquitto.conf (anonymous allow, listeners on 1883 + 9001). A future hardening pass replaces this with mTLS + ACLs.
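
The stable identity in reason 1 follows a fixed naming pattern, which a short helper can make explicit. This is a sketch only: the function name statefulset_pod_dns is hypothetical, and it assumes the governing Service is headless and the cluster uses the default cluster.local domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stable DNS name of one StatefulSet pod.
# Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
# $1 = StatefulSet name, $2 = governing Service name,
# $3 = namespace (default "default"), $4 = pod ordinal (default 0).
statefulset_pod_dns() {
  local sts="$1" svc="$2" ns="${3:-default}" ordinal="${4:-0}"
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$sts" "$ordinal" "$svc" "$ns"
}

# Usage sketch: a second broker added in Phase II would get ordinal 1.
#   statefulset_pod_dns mosquitto mosquitto default 1
```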

Networking

| Component | Port | K8s Service type | Exposure |
|---|---|---|---|
| Mosquitto (MQTT) | 1883 | ClusterIP | internal only |
| Mosquitto (WebSocket) | 9001 | ClusterIP | exposed via cop-ui-ingress second path |
| Validator | 7700 | ClusterIP | internal; ingress optional |
| cop-ui | 3000 | ClusterIP | Ingress (TLS by operator) |
| All other services | | (no Service) | bus-only, no HTTP surface |

Browser-side cop-ui needs WebSocket to Mosquitto. Two options:

  1. Ingress path-rewrite: cop-ui Ingress proxies /ws to mosquitto:9001. Browser uses NEXT_PUBLIC_BROKER_WS_URL=wss://uci-demo.local/ws.
  2. Separate WebSocket service exposure: Mosquitto Service has type: LoadBalancer or a second Ingress. Browser uses wss://mqtt.uci-demo.local.

Chart defaults to option 1 (single ingress, simpler ops). The ingress template has a # nginx.ingress.kubernetes.io/... rewrite annotation block that operators uncomment per their controller.
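
As a rough illustration of option 1, the values override an operator would pass could be generated like this. The helper name ingress_values is hypothetical, and the sketch assumes the chart passes copUi.env through to the container, that cop-ui picks the value up as deployed, and that TLS is terminated at the ingress (required for wss://). The keys mirror the values.yaml shape shown earlier; the /ws routing itself is left to the ingress template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a values override that enables the single-ingress option.
# $1 = public hostname, $2 = ingress class (default "nginx").
ingress_values() {
  local host="$1" class="${2:-nginx}"
  cat <<EOF
copUi:
  ingress:
    enabled: true
    className: "${class}"
    hosts:
      - host: ${host}
        paths:
          - path: /
            pathType: Prefix
  env:
    NEXT_PUBLIC_BROKER_WS_URL: "wss://${host}/ws"
EOF
}

# Usage sketch (not executed here):
#   ingress_values uci-demo.local > ingress-values.yaml
#   helm upgrade --install uci-demo deploy/helm/uci-demo -f ingress-values.yaml
```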

eval-harness Job

The harness is one-shot. The chart renders a Job (NOT a CronJob — operator triggers manually) when evalHarness.enabled: true:

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "uci-demo.fullname" . }}-eval-{{ now | unixEpoch }}
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: eval-harness
          image: "{{ .Values.global.imageRegistry }}/eval-harness:{{ ... }}"
          args:
            - --scenarios={{ .Values.evalHarness.scenarios }}
            - --agents={{ .Values.evalHarness.agents }}
            - --degrade={{ .Values.evalHarness.degrade }}
            - --episodes-per-cell={{ .Values.evalHarness.episodesPerCell }}
            - --broker-url=mqtt://{{ include "uci-demo.fullname" . }}-mosquitto:1883
            - --report=/out/eval-report.json
          volumeMounts:
            - name: out
              mountPath: /out
      volumes:
        - name: out
          emptyDir: {}

To run multiple harness sweeps without manual editing, helm upgrade re-renders the Job with a new now | unixEpoch suffix; combined with --reuse-values --set evalHarness.enabled=true, this gives operators a one-liner sweep trigger.
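
The sweep trigger described above might be wrapped as follows. This is a sketch, not part of the chart: the function name trigger_sweep and the HELM override variable are hypothetical conveniences, and the extra --set-string flag in the usage line is only an example of passing harness values through.

```shell
#!/usr/bin/env bash
# HELM can be overridden (e.g. HELM=echo) to print the command instead of running it.
HELM="${HELM:-helm}"

# Hypothetical wrapper around the one-liner sweep trigger.
# $1 = release name, $2 = chart reference; remaining args are passed to helm as-is.
trigger_sweep() {
  local release="$1" chart="$2"
  shift 2
  "$HELM" upgrade "$release" "$chart" \
    --reuse-values \
    --set evalHarness.enabled=true \
    "$@"
}

# Usage sketch (not executed here):
#   trigger_sweep uci-demo deploy/helm/uci-demo --set-string evalHarness.agents=solver
```

Each invocation re-renders the Job template, so the now | unixEpoch suffix yields a fresh Job name per sweep.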

values.schema.json

Hand-curated JSON Schema. Drives IDE autocomplete and gives helm install clear errors on mistyped values (e.g. copilot.agent.use: "soliver" flags as enum mismatch). Schema lists every required key + value enum constraints.


Strand 3 — Release workflow (.github/workflows/release-images.yml)

Triggers

on:
  push:
    branches: [main]            # publishes :main + :<sha>
    tags: ["v*"]                # publishes :<semver> + :latest
  workflow_dispatch:            # manual trigger for fixup runs

Permissions

permissions:
  contents: read
  packages: write               # push to GHCR
  id-token: write               # OIDC for cosign + setup-chainctl
  attestations: write           # SLSA build-provenance

Matrix shape

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        component:
          - name: validator
            context: .                # repo root; the Dockerfile copies workspace-level files
            dockerfile: services/validator/Dockerfile
            pkg: "@uci-demo/validator"
          - name: world-sim
            context: .
            dockerfile: services/world-sim/Dockerfile
            pkg: "@uci-demo/world-sim"
          # ... (6 more)

Each matrix leg:

  1. Checkout
  2. (Optional) chainguard-dev/setup-chainctl@v0.3.x when the CHAINGUARD_IDENTITY repository variable is set; sets BASE_IMAGE_DEV + BASE_IMAGE_RUNTIME build-args to FIPS paths. Otherwise sets them to the free Chainguard paths.
  3. docker/setup-qemu-action + docker/setup-buildx-action
  4. docker/login-action against ghcr.io with ${{ github.token }}
  5. docker/build-push-action with multi-arch (amd64+arm64), build-args threaded through, tags computed by docker/metadata-action
  6. anchore/sbom-action to generate the SBOM (syft → SPDX-JSON), attach to the image
  7. sigstore/cosign-installer + cosign sign --yes <ref>@<digest> (keyless via OIDC)
  8. actions/attest-build-provenance to write a SLSA L3 attestation
  9. Output the image digest as a job output

Helm chart publish

After the image matrix completes (needs: build):

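publish-chart:
  runs-on: ubuntu-latest
  needs: build
  # VERSION is assumed to be exported earlier in the job and to hold the
  # chart tag from the tag scheme (e.g. 1.2.3).
  steps:
    - uses: actions/checkout@v6
    - uses: azure/setup-helm@v4
    - run: |
        cd deploy/helm
        # Pin both the chart version and appVersion to the just-built tag.
        helm package uci-demo/ --version "${VERSION}" --app-version "${VERSION}"
    - env:
        GHCR_TOKEN: ${{ github.token }}
      run: |
        echo "$GHCR_TOKEN" | helm registry login ghcr.io \
          --username "$GITHUB_ACTOR" --password-stdin
        # Each run step starts at the workspace root, so use the full path.
        helm push deploy/helm/uci-demo-*.tgz oci://ghcr.io/${{ github.repository }}/charts
    - uses: sigstore/cosign-installer@v3
    - run: |
        # crane is assumed to be on PATH; it takes a plain registry ref (no oci:// prefix).
        DIGEST=$(crane digest "ghcr.io/${{ github.repository }}/charts/uci-demo:${VERSION}")
        cosign sign --yes "ghcr.io/${{ github.repository }}/charts/uci-demo@${DIGEST}"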

Tag scheme

| Trigger | Image tags | Chart tag |
|---|---|---|
| push to main | :main, :<short-sha> | 0.0.0-main.<short-sha> |
| v1.2.3 git tag | :1.2.3, :1.2, :1, :latest | 1.2.3 |
| workflow_dispatch (no ref) | :dispatch-<short-sha> | 0.0.0-dispatch.<sha> |
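
The tag scheme can be expressed as two small functions, shown here purely to make the mapping concrete. In the workflow itself the image tags come from docker/metadata-action; the names image_tags and chart_version are hypothetical, and the sketch assumes plain vMAJOR.MINOR.PATCH tags with no pre-release suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the image tags for a git ref, one per line.
# $1 = full ref (refs/heads/main, refs/tags/v1.2.3, or anything else for dispatch)
# $2 = short commit SHA
image_tags() {
  local ref="$1" sha="$2"
  case "$ref" in
    refs/heads/main)
      printf '%s\n' "main" "$sha"
      ;;
    refs/tags/v*)
      local ver="${ref#refs/tags/v}"   # 1.2.3
      local minor="${ver%.*}"          # 1.2
      local major="${ver%%.*}"         # 1
      printf '%s\n' "$ver" "$minor" "$major" "latest"
      ;;
    *)
      printf '%s\n' "dispatch-${sha}"
      ;;
  esac
}

# Hypothetical helper: print the chart version for the same inputs.
chart_version() {
  local ref="$1" sha="$2"
  case "$ref" in
    refs/heads/main) printf '0.0.0-main.%s\n' "$sha" ;;
    refs/tags/v*)    printf '%s\n' "${ref#refs/tags/v}" ;;
    *)               printf '0.0.0-dispatch.%s\n' "$sha" ;;
  esac
}
```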

Image visibility

GHCR packages default to private. After the first publish, an operator must (once) set them to public in the GitHub Packages UI. Documented in README + the workflow's NOTES.


Strand 4 — Chainguard authentication (when used)

Without Chainguard subscription (default)

  • Repository has no CHAINGUARD_IDENTITY variable set
  • Workflow uses cgr.dev/chainguard/node:latest-dev (builder) + cgr.dev/chainguard/node:latest (runtime)
  • Both images are publicly pullable, no auth needed
  • The resulting binaries are NOT FIPS-validated but ARE distroless + nonroot + minimal-surface
  • Suitable for the SBIR demo + most evaluator clusters

With Chainguard subscription

Setup (one-time, in the operator's Chainguard console):

  1. Create a Chainguard identity: chainctl iam identities create github-actions-uci-demo
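  2. Bind the identity to the repository's OIDC subject: chainctl iam identities update <id> --add-claim-match sub=repo:<owner>/uci-demo:ref:refs/heads/main. Runs triggered by a v* tag present a subject ending in ref:refs/tags/<tag> instead, so the match must also cover those refs if tag builds are to pull the FIPS images.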
  3. Grant the identity pull on the FIPS image: chainctl iam policies create --identity <id> --image cgr.dev/<chainguard-org>/node-fips
  4. In GitHub repo settings → Variables, add CHAINGUARD_IDENTITY=<the-uidp-from-step-1> and CHAINGUARD_ORG=<your-chainguard-org-name>

The workflow's conditional auth step:

- name: Authenticate to Chainguard (FIPS images)
  if: vars.CHAINGUARD_IDENTITY != ''
  uses: chainguard-dev/setup-chainctl@v0.3.1
  with:
    identity: ${{ vars.CHAINGUARD_IDENTITY }}

- name: Configure base image refs
  id: base
  run: |
    if [ -n "${{ vars.CHAINGUARD_IDENTITY }}" ]; then
      ORG="${{ vars.CHAINGUARD_ORG }}"
      echo "dev=cgr.dev/$ORG/node-fips:latest-dev"   >> "$GITHUB_OUTPUT"
      echo "runtime=cgr.dev/$ORG/node-fips:latest"   >> "$GITHUB_OUTPUT"
    else
      echo "dev=cgr.dev/chainguard/node:latest-dev"  >> "$GITHUB_OUTPUT"
      echo "runtime=cgr.dev/chainguard/node:latest"  >> "$GITHUB_OUTPUT"
    fi

With CHAINGUARD_ORG in place, setting or clearing the CHAINGUARD_IDENTITY variable switches the entire release pipeline between FIPS and non-FIPS base images.


Strand 5 — Supply-chain attestations

Three artifacts attached to every image:

SBOM (Software Bill of Materials)

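anchore/sbom-action@v0 runs syft against the built image and produces an SPDX-JSON SBOM. The action itself uploads the SBOM as a workflow artifact and, on tag builds, as a release asset; attaching it to the image as a registry referrer needs a follow-on step (for example cosign attest with the SPDX predicate). The generation step: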

- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    image: ghcr.io/<owner>/uci-demo/<component>@${{ steps.build.outputs.digest }}
    format: spdx-json
    upload-artifact: true
    upload-release-assets: ${{ startsWith(github.ref, 'refs/tags/') }}

Cosign signature

Keyless via GitHub OIDC. No private key management — the signature is bound to the GitHub Actions workflow identity:

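- uses: sigstore/cosign-installer@v3
- name: Sign image
  run: |
    cosign sign --yes "${IMAGE}@${DIGEST}"
  env:
    # Keyless signing is the default in cosign v2; no COSIGN_EXPERIMENTAL needed.
    IMAGE: ghcr.io/<owner>/uci-demo/<component>
    DIGEST: ${{ steps.build.outputs.digest }}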

Consumers verify with:

cosign verify ghcr.io/<owner>/uci-demo/world-sim:1.2.3 \
  --certificate-identity-regexp '^https://github\.com/<owner>/uci-demo/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
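
To check all eight images in one pass, the verify command can be looped over the image names from Strand 0. This is a minimal sketch: the function name verify_all and the COSIGN override variable are hypothetical, and the owner is assumed to be given in lowercase as it appears in the registry path.

```shell
#!/usr/bin/env bash
# COSIGN can be overridden (e.g. COSIGN=echo) to print commands instead of running them.
COSIGN="${COSIGN:-cosign}"

# The eight image names from the table in Strand 0.
IMAGES=(validator world-sim copilot solver-daemon red-agent adsb-bridge eval-harness cop-ui)

# Hypothetical helper: verify every image for one owner and tag.
# $1 = GitHub owner, $2 = image tag. Stops at the first image that fails.
verify_all() {
  local owner="$1" tag="$2" image
  for image in "${IMAGES[@]}"; do
    "$COSIGN" verify "ghcr.io/${owner}/uci-demo/${image}:${tag}" \
      --certificate-identity-regexp "^https://github\.com/${owner}/uci-demo/" \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      || return 1
  done
}

# Usage sketch (not executed here):
#   verify_all <owner> 1.2.3
```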

SLSA L3 build provenance

actions/attest-build-provenance@v3 writes an in-toto attestation asserting the image was built by THIS workflow, on THIS commit, from THIS source tree:

- uses: actions/attest-build-provenance@v3
  with:
    subject-name: ghcr.io/<owner>/uci-demo/<component>
    subject-digest: ${{ steps.build.outputs.digest }}
    push-to-registry: true

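The combination of SBOM, signature, and provenance is the supply-chain evidence a federal customer expects for procurement-eligible artifacts. The SLSA level itself is determined by the provenance and the build platform, not by the SBOM or the signature: GitHub's documentation describes artifact attestations on their own as meeting SLSA v1.0 Build Level 2, with Build Level 3 requiring the build to run in a reusable workflow, so the L3 target depends on structuring the build that way. The story aligns with the OpenSSF Scorecard run already in .github/workflows/scorecard.yml.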


Strand 6 — kind smoke (deploy/kind-smoke.sh)

End-to-end local validation. Boots a kind cluster, builds all eight images, loads them in, helm-installs, asserts all pods Ready:

#!/usr/bin/env bash
set -euo pipefail

CLUSTER_NAME="uci-demo"
IMAGE_REGISTRY="local"
IMAGE_TAG="dev"

# 1. Bring up the cluster if it doesn't exist.
if ! kind get clusters | grep -qx "${CLUSTER_NAME}"; then
  kind create cluster --name "${CLUSTER_NAME}"
fi

# 2. Build all 8 images locally.
for svc in validator world-sim copilot solver-daemon red-agent adsb-bridge eval-harness; do
  docker build -t "${IMAGE_REGISTRY}/uci-demo/${svc}:${IMAGE_TAG}" \
    --build-arg "SERVICE_PKG=@uci-demo/${svc}" \
    -f "services/${svc}/Dockerfile" .
  kind load docker-image "${IMAGE_REGISTRY}/uci-demo/${svc}:${IMAGE_TAG}" \
    --name "${CLUSTER_NAME}"
done

docker build -t "${IMAGE_REGISTRY}/uci-demo/cop-ui:${IMAGE_TAG}" \
  -f "apps/cop-ui/Dockerfile" .
kind load docker-image "${IMAGE_REGISTRY}/uci-demo/cop-ui:${IMAGE_TAG}" \
  --name "${CLUSTER_NAME}"

# 3. Helm install.
helm upgrade --install uci-demo deploy/helm/uci-demo \
  --set "global.imageRegistry=${IMAGE_REGISTRY}/uci-demo" \
  --set "global.imageTag=${IMAGE_TAG}" \
  --set "global.pullPolicy=Never" \
  --wait --timeout=90s

# 4. Assert pods Ready.
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=uci-demo \
  --timeout=60s

echo "✓ All pods Ready. Port-forward to see the COP:"
echo "  kubectl port-forward svc/uci-demo-cop-ui 3000:3000"

This script doubles as the chart's CI gate — a follow-up workflow can run it on every PR that touches deploy/ or services/.


Strand 7 — Files changed by this PR

New files

plan/design-k8s-deploy.md               # this memo

services/validator/Dockerfile
services/world-sim/Dockerfile
services/copilot/Dockerfile
services/solver-daemon/Dockerfile
services/red-agent/Dockerfile
services/adsb-bridge/Dockerfile
services/eval-harness/Dockerfile
apps/cop-ui/Dockerfile

.dockerignore                            # shared exclusion list

deploy/helm/uci-demo/Chart.yaml
deploy/helm/uci-demo/values.yaml
deploy/helm/uci-demo/values.schema.json
deploy/helm/uci-demo/README.md
deploy/helm/uci-demo/templates/_helpers.tpl
deploy/helm/uci-demo/templates/mosquitto-statefulset.yaml
deploy/helm/uci-demo/templates/mosquitto-service.yaml
deploy/helm/uci-demo/templates/mosquitto-configmap.yaml
deploy/helm/uci-demo/templates/validator-deployment.yaml
deploy/helm/uci-demo/templates/validator-service.yaml
deploy/helm/uci-demo/templates/world-sim-deployment.yaml
deploy/helm/uci-demo/templates/copilot-deployment.yaml
deploy/helm/uci-demo/templates/copilot-secret.yaml
deploy/helm/uci-demo/templates/solver-daemon-deployment.yaml
deploy/helm/uci-demo/templates/red-agent-deployment.yaml
deploy/helm/uci-demo/templates/adsb-bridge-deployment.yaml
deploy/helm/uci-demo/templates/cop-ui-deployment.yaml
deploy/helm/uci-demo/templates/cop-ui-service.yaml
deploy/helm/uci-demo/templates/cop-ui-ingress.yaml
deploy/helm/uci-demo/templates/eval-harness-job.yaml
deploy/helm/uci-demo/templates/NOTES.txt

deploy/kind-smoke.sh

.github/workflows/release-images.yml

Edited files (additive)

  • apps/cop-ui/next.config.ts — add output: "standalone"
  • services/validator/package.json — add "build": "tsc -p tsconfig.json"
  • services/world-sim/package.json — same
  • services/copilot/package.json — same
  • services/solver-daemon/package.json — same
  • services/red-agent/package.json — same
  • services/adsb-bridge/package.json — same
  • services/eval-harness/package.json — same
  • README.md — add "Deploying to Kubernetes" section
  • CLAUDE.md — add "Deploy notes" callout pointing at this memo

Strand 8 — Open questions

  1. Should Mosquitto use a Chainguard image? There's no chainguard/eclipse-mosquitto today. Sticking with the official image keeps the broker on the upstream support path. Operators who need fully-Chainguarded infra can swap in a custom build later. Recommendation: stay on eclipse-mosquitto:2.0.21 for the demo; document the upgrade path.

  2. Do we ship a CronJob for the eval-harness? Operator-triggered one-shot Jobs are simpler. Recurring sweeps belong in CI, not in the cluster. Recommendation: ship Job template only; no CronJob.

  3. WebSocket access for the browser-side cop-ui: same ingress or separate? Single ingress is simpler ops but couples the two paths to one cert. Recommendation: single ingress with path-based routing (/ → cop-ui, /ws → mosquitto:9001); operators override via the rewrite annotation block.

  4. Should the chart bundle Mosquitto or take a dependency on the Bitnami Mosquitto chart? Bundling is simpler and lets us pin exact behavior; depending on Bitnami means an upstream upgrade path. Recommendation: bundle for this PR; Bitnami sub-chart migration is a follow-up if a customer needs it.

  5. Image tagging on tag pushes: :1, :1.2, :1.2.3, :latest all at once? Standard semver mobility lets helm install --version "^1" work. Recommendation: ship all four; the cost is negligible.


Strand 9 — Acceptance gates

For the PR landing this memo + the implementation:

  • ./deploy/kind-smoke.sh boots a kind cluster, builds 8 images, helm-installs the chart, and reaches Ready within 90 seconds.
  • helm lint deploy/helm/uci-demo/ clean.
  • helm template deploy/helm/uci-demo/ | kubectl apply --dry-run=client -f - validates against the schema.
  • pnpm -r typecheck and pnpm -r test green (the additive build scripts must not break existing typecheck).
  • The release workflow runs against the PR branch (triggered via workflow_dispatch) and produces 8 image tags + 1 chart tag visible in ghcr.io/<owner>/uci-demo/.
  • Cosign signatures verify against the GitHub OIDC issuer.
  • SBOM is attached and readable via cosign download sbom ghcr.io/<owner>/uci-demo/world-sim:<tag>.
  • README + CLAUDE.md updated.

For the named follow-ups (out of scope for this PR):

  • EKS / GKE / AKS provider modules (terraform under deploy/iac/).
  • Network policy templates.
  • HPA + PodDisruptionBudget templates.
  • Mosquitto mTLS + ACL hardening pass.
  • kind smoke as a CI gate (.github/workflows/k8s-smoke.yml).

Strand 10 — Why this design over alternatives

Why Helm over Kustomize? A single OCI artifact published next to the images gives operators helm install oci://... — one command, no git clone. Kustomize requires source-tree access. For a demo + SBIR proposal, the Helm path reads more professionally and demonstrates the supply-chain story end-to-end (chart is signed, images are signed, both ride GHCR).

Why Chainguard over Alpine / Debian? Distroless + nonroot by default + the FIPS variant ready when a customer needs it. Alpine's musl libc has a track record of subtle Node breakage; Debian-based images carry a much larger package footprint. Chainguard is the path forward (PR #27 memo settled this).

Why per-service images instead of a monolith? Decouples deploy cadence (the copilot ships LLM SDKs; world-sim doesn't). Smaller attack surface per image. Allows independent scaling in Phase II. The pnpm workspace makes per-service Dockerfiles tractable via pnpm deploy --prod.

Why tsc emit instead of shipping tsx? Smaller runtime image and no TypeScript transpile step at cold start; plain compiled JS can also benefit from Node's compile cache. The dev / start developer workflows keep using tsx; the production artifact uses compiled JS. Both paths live side-by-side.

Why build-arg with a free fallback for Chainguard? Future-proofs the FIPS migration without making the repo dead-on-arrival for anyone without a Chainguard subscription. The artifacts produced without FIPS credentials are still distroless + nonroot — a real hardening uplift, just not the FIPS-validated variant.

Why cosign keyless + SLSA L3? Federal procurement increasingly requires SLSA Level 3 + verifiable signatures. Keyless cosign + the official GitHub attest-build-provenance action are the lightest implementation path; no HSM, no key rotation, no secret leakage risk.


Strand 11 — What comes next after this PR

In sequence:

  1. kind-smoke CI gate: .github/workflows/k8s-smoke.yml runs deploy/kind-smoke.sh on every PR that touches deploy/ or services/*/Dockerfile. Becomes part of the PR review.

  2. Cloud-provider modules: deploy/iac/eks/, deploy/iac/gke/, deploy/iac/aks/ with Terraform that boots a cluster + an ingress controller + cert-manager, then helm-installs the chart. Demos against a real cloud cluster live behind a separate workflow.

  3. Hardening pass — Network policies, PodDisruptionBudgets, HPA for cop-ui, mTLS + ACLs on Mosquitto. Probably a dedicated PR once a customer engagement names a target deployment env.

  4. Multi-broker HA — Mosquitto bridge mode or EMQX swap, with the chart selecting one via mosquitto.driver: mosquitto|emqx.

  5. Operator-facing dashboards — Grafana dashboards for the payoff / belief / blueprint metrics published on the bus. Probably a deploy/grafana/ sub-tree with importable JSON.

This memo + its implementation PR is the foundation everything else rests on.