Design: Kubernetes deploy + GHCR publication (Phase 0 follow-up)
Pre-code design memo for packaging the demo's eight runtime artifacts
as container images, publishing them to the repo's GitHub Container
Registry, and shipping a Helm chart so an operator can
helm install the entire stack on a Kubernetes cluster.
This is the engineering substrate behind the "Phase II transition"
story in plan/sbir-phase-ii-tdp-delivery.md:
the demo runs on commodity Kubernetes, signs and attests every
artifact, and is built from FIPS-validated base images.
The PR landing this memo bundles memo + Dockerfiles + Helm chart +
release workflow + kind smoke script so a reviewer can clone the
branch, run ./deploy/kind-smoke.sh, and see all eight components
healthy in a local cluster in under five minutes.
The headline acceptance gate: a fresh GitHub Actions run on a v*
tag produces eight signed container images and one signed Helm chart
in ghcr.io/<owner>/uci-demo/..., each with attached SBOMs and SLSA
provenance attestations, and a reviewer can install the chart in a
kind cluster via
helm install uci-demo oci://ghcr.io/<owner>/uci-demo/charts/uci-demo
--version <tag> and watch every pod go Ready.
Strand 0: What we ship in this PR
Container images (8):
| Image | Source | Runtime kind |
|---|---|---|
| validator | services/validator/ | long-lived Deployment (HTTP :7700) |
| world-sim | services/world-sim/ | long-lived Deployment |
| copilot | services/copilot/ | long-lived Deployment |
| solver-daemon | services/solver-daemon/ | long-lived Deployment |
| red-agent | services/red-agent/ | long-lived Deployment |
| adsb-bridge | services/adsb-bridge/ | long-lived Deployment |
| eval-harness | services/eval-harness/ | one-shot Job template |
| cop-ui | apps/cop-ui/ | long-lived Deployment (HTTP :3000) |
services/demo-publisher/ is intentionally NOT shipped — it's a
local-only smoke test that publishes intentionally broken payloads
to exercise the validator's negative cases.
Helm chart — deploy/helm/uci-demo/, published as an OCI
artifact at oci://ghcr.io/<owner>/uci-demo/charts/uci-demo.
Release workflow — .github/workflows/release-images.yml runs
on push to main (tags as main + <sha>) and on v* tags
(tags as the semver). Each image is multi-arch (linux/amd64 +
linux/arm64), SBOM-attached (syft), cosign-signed (keyless via
GitHub OIDC), and SLSA-attested (build-provenance for SLSA L3).
kind smoke script — deploy/kind-smoke.sh boots a local kind
cluster, builds all eight images locally, loads them into kind,
helm-installs the chart, and asserts every pod reaches Ready
within 90 seconds.
What does NOT ship in this PR (named so reviewers don't ask):
- Cloud-provider-specific Terraform / Pulumi for EKS / GKE / AKS
- cert-manager / external-dns / ingress-nginx: the chart leaves Ingress annotations stubbed; operators bring their own controller
- A second chart for the Mosquitto broker: Mosquitto is part of the uci-demo chart, not a sub-chart (intentional: keeps the demo install to one command)
- Horizontal autoscaling: the demo is single-replica; HPA is a Phase II concern
- Network policies: opinionated allowlists belong in a Phase II hardening pass
Strand 1: Base-image strategy (Chainguard FIPS)
Decision
Build against cgr.dev/<chainguard-org>/node-fips (Production-tier
Chainguard image, FIPS-validated OpenSSL). Default fallback to
cgr.dev/chainguard/node (free, non-FIPS) when no Chainguard org is
configured — so the demo builds out-of-the-box for anyone who clones
the repo, and "flips to FIPS" by setting one repository variable.
Critical finding
cgr.dev/chainguard/node-fips:latest is not publicly pullable.
Pulling that image without authentication returns HTTP 401. The
Chainguard documentation directs FIPS users to pull from their
org-scoped registry path: cgr.dev/<your-chainguard-org>/node-fips.
Access requires a Chainguard Production subscription.
The implication for this repo:
- The Dockerfiles must NOT hard-code cgr.dev/chainguard/node-fips. They must accept the base images as build-args (BASE_IMAGE_DEV and BASE_IMAGE_RUNTIME in the Dockerfile pattern in this memo).
- The release workflow must NOT assume setup-chainctl will succeed without configuration. It runs the Chainguard auth step only when the CHAINGUARD_IDENTITY repository variable is set, and falls back to the free cgr.dev/chainguard/node otherwise.
- The Helm chart references image.repository per service. Operators override via values.yaml.
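As a rough illustration of the build-arg contract, the following shell sketch picks the two base-image references for a local build. The function name `base_image_args` and the idea of reading a `CHAINGUARD_ORG` environment variable on a developer machine are assumptions made for this example; the image paths and the `BASE_IMAGE_DEV` / `BASE_IMAGE_RUNTIME` argument names follow the Dockerfile pattern in this memo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the --build-arg flags for one image build, one token per line.
# Uses the org-scoped FIPS paths when CHAINGUARD_ORG is set (hypothetical
# local convention), otherwise the free, publicly pullable images.
base_image_args() {
  local org="${CHAINGUARD_ORG:-}"
  local repo="cgr.dev/chainguard/node"
  if [ -n "$org" ]; then
    repo="cgr.dev/${org}/node-fips"
  fi
  printf '%s\n' \
    "--build-arg" "BASE_IMAGE_DEV=${repo}:latest-dev" \
    "--build-arg" "BASE_IMAGE_RUNTIME=${repo}:latest"
}

# Possible usage (requires docker; shown as a comment so the sketch stays inert):
#   mapfile -t args < <(base_image_args)
#   docker build "${args[@]}" --build-arg SERVICE_PKG=@uci-demo/validator \
#     -f services/validator/Dockerfile .
```

Keeping the selection in one function means the Dockerfiles never need to know which registry path is in play; only the caller does.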
Multi-stage Dockerfile pattern
Every Node service Dockerfile follows the same shape:
# syntax=docker/dockerfile:1.9
ARG BASE_IMAGE_DEV=cgr.dev/chainguard/node:latest-dev
ARG BASE_IMAGE_RUNTIME=cgr.dev/chainguard/node:latest
# ── builder ───────────────────────────────────────────────────────
FROM ${BASE_IMAGE_DEV} AS builder
USER root
WORKDIR /work
# Install pnpm via corepack (bundled with Node 22).
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
# Copy the workspace manifests, then the sources pnpm needs to resolve the workspace.
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json ./
COPY packages packages
COPY services services
COPY apps apps
COPY scenarios scenarios
RUN pnpm install --frozen-lockfile --prefer-offline
# Build the entire dep graph for this service (tsc emit).
ARG SERVICE_PKG
RUN pnpm --filter "${SERVICE_PKG}..." build
# Produce a self-contained pruned tree under /deploy.
RUN pnpm --filter "${SERVICE_PKG}" deploy --prod /deploy
# ── runtime ───────────────────────────────────────────────────────
FROM ${BASE_IMAGE_RUNTIME} AS runtime
WORKDIR /app
COPY --from=builder /deploy ./
# Chainguard Node images run as user `node` (nonroot) by default.
# Entrypoint is /usr/bin/node.
USER node
CMD ["dist/main.js"]
The services/*/Dockerfile files override:
- SERVICE_PKG — the workspace package name (@uci-demo/world-sim)
- CMD — the compiled entry point
tsc emit, not tsx runtime
The current services use tsx src/main.ts for dev / start. For
shipping, each service grows a small build script that emits to
dist/:
"build": "tsc -p tsconfig.json"
The runtime image runs node dist/main.js. tsx is left out of the
runtime, which trims the dependency surface and avoids transpiling
TypeScript on every cold start. The existing pnpm dev / pnpm start
workflows keep working locally; the build script is additive.
Why corepack over npm install -g pnpm
Chainguard's Node image bundles corepack. corepack prepare is
the recommended pnpm installation path for hermetic builds: it
fetches a specific version, caches it under COREPACK_HOME (by default
$HOME/.cache/node/corepack on Linux), and doesn't touch system npm state.
Why pnpm deploy --prod
pnpm deploy produces a self-contained directory with:
- The target service's dist/ (compiled TS)
- All dependencies (and transitive workspace package dist/)
- A flat node_modules/ (no symlinks back to the workspace)
- NO devDependencies
This is what gets copied to the runtime image. The pruned tree is
typically a few tens of MB: the copilot tree is the largest (up to
about 80 MB, most of it @anthropic-ai/sdk), while the rest stay
around 20 MB.
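One way to check the "no symlinks back to the workspace" property on a pruned tree is sketched below. The function name `check_pruned_tree` is hypothetical, and the sketch assumes GNU `readlink -f` and `find` as found in a typical Linux builder image; it is not part of the repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_pruned_tree DIR
# Returns non-zero if any symlink under DIR resolves to a path outside DIR.
# Symlinks that stay inside the tree (pnpm creates these) are allowed.
check_pruned_tree() {
  local root escaped=0 link target
  root="$(cd "$1" && pwd -P)"
  while IFS= read -r -d '' link; do
    target="$(readlink -f "$link" || true)"
    case "$target" in
      "$root"|"$root"/*) ;;                       # stays inside the tree
      *) echo "escapes tree: $link -> $target" >&2
         escaped=1 ;;
    esac
  done < <(find "$root" -type l -print0)
  return "$escaped"
}

# Possible usage inside the builder stage, after `pnpm deploy`:
#   check_pruned_tree /deploy && du -sh /deploy
```

Running it in the builder stage would fail the image build early if the deploy output ever starts pointing back into /work.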
cop-ui specifics
Next.js 16 standalone output. Requires output: "standalone" in
next.config.ts (additive edit). The runtime image copies
.next/standalone + .next/static + public/ and runs
node server.js. ~150 MB image (Next.js has a substantial
dependency footprint that can't be pruned further).
# apps/cop-ui/Dockerfile (excerpt)
FROM ${BASE_IMAGE_DEV} AS builder
USER root
WORKDIR /work
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json ./
COPY packages packages
COPY apps apps
RUN pnpm install --frozen-lockfile --prefer-offline
RUN pnpm --filter @uci-demo/cop-ui build
FROM ${BASE_IMAGE_RUNTIME} AS runtime
WORKDIR /app
COPY --from=builder /work/apps/cop-ui/.next/standalone ./
COPY --from=builder /work/apps/cop-ui/.next/static ./apps/cop-ui/.next/static
COPY --from=builder /work/apps/cop-ui/public ./apps/cop-ui/public
USER node
EXPOSE 3000
CMD ["apps/cop-ui/server.js"]
Mosquitto
Stays as eclipse-mosquitto:2.0.21 from Docker Hub. Not built from
this repo — the image is already battle-tested. The Helm chart's
mosquitto.image.repository value lets operators override to a
Chainguard variant if they have one.
Strand 2: Helm chart (deploy/helm/uci-demo/)
Layout
deploy/helm/uci-demo/
├── Chart.yaml
├── values.yaml
├── values.schema.json              # JSON schema for IDE autocomplete
├── README.md
├── templates/
│   ├── _helpers.tpl                # naming + label helpers
│   ├── mosquitto-statefulset.yaml
│   ├── mosquitto-service.yaml
│   ├── mosquitto-configmap.yaml
│   ├── validator-deployment.yaml
│   ├── validator-service.yaml
│   ├── world-sim-deployment.yaml
│   ├── copilot-deployment.yaml
│   ├── copilot-secret.yaml         # optional Anthropic key
│   ├── solver-daemon-deployment.yaml
│   ├── red-agent-deployment.yaml
│   ├── adsb-bridge-deployment.yaml
│   ├── cop-ui-deployment.yaml
│   ├── cop-ui-service.yaml
│   ├── cop-ui-ingress.yaml
│   ├── eval-harness-job.yaml       # rendered only when .Values.evalHarness.enabled
│   └── NOTES.txt
└── tests/
    └── connection-test.yaml        # helm test target
values.yaml shape
# values.yaml: top-level structure
global:
  imageRegistry: ghcr.io/shebashio/uci-demo
  imageTag: ""                  # falls back to .Chart.AppVersion when empty
  pullPolicy: IfNotPresent
  imagePullSecrets: []          # add when pulling FIPS images from private cgr.dev
mosquitto:
  image:
    repository: eclipse-mosquitto
    tag: "2.0.21"
  persistence:
    enabled: false              # demo is stateless; true requires a StorageClass
    size: 1Gi
  resources:
    requests: { cpu: 50m, memory: 64Mi }
    limits: { cpu: 200m, memory: 256Mi }
validator:
  enabled: true
  replicas: 1
  service:
    port: 7700
  resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
worldSim:
  enabled: true
  replicas: 1
  scenarioPath: "/scenarios/counter-uas-tripwire.yaml"
  resources: { ... }
copilot:
  enabled: true
  replicas: 1
  agent:
    use: "scripted"             # scripted | solver | llm
    llmProvider: "anthropic"    # only when agent.use == llm
  anthropicApiKey:
    enabled: false              # toggle to create the Secret + env binding
    value: ""                   # use --set-string copilot.anthropicApiKey.value=...
                                # OR use existingSecret below
    existingSecret: ""          # name of a pre-created Secret with key ANTHROPIC_API_KEY
solverDaemon:
  enabled: false                # opt-in; off by default to match demo defaults
  replicas: 1
  resources: { ... }
redAgent:
  enabled: false                # opt-in
  replicas: 1
  seed: 42
  intervalSec: 45
  maxConcurrent: 4
  policy: "scripted"            # scripted | solver-driven
adsbBridge:
  enabled: true
  replicas: 1
  feedUrl: "https://opensky-network.org/api/states/all"
  resources: { ... }
copUi:
  enabled: true
  replicas: 1
  service:
    type: ClusterIP
    port: 3000
  ingress:
    enabled: false
    className: ""               # set to "nginx" or "traefik" when enabling
    hosts:
      - host: uci-demo.local
        paths:
          - path: /
            pathType: Prefix
    tls: []
  resources: { ... }
  env:
    NEXT_PUBLIC_BROKER_WS_URL: "ws://mosquitto:9001"
evalHarness:
  enabled: false                # ships as a Job template; enable to create
  scenarios: "counter-uas-tripwire"
  agents: "scripted"
  degrade: "none"
  episodesPerCell: 2
  scenarioSimTimeoutSec: 30
  episodeTimeoutMs: 120000
Mosquitto in K8s
A StatefulSet rather than a Deployment, for two reasons:
1. Pod has a stable network identity (mosquitto-0.mosquitto.default.svc.cluster.local) — useful when adding a second broker for HA in Phase II.
2. Optional PVC for retained-message durability across pod restarts.
Single replica by default. mosquitto.conf mounts from a ConfigMap
that mirrors ops/mosquitto/mosquitto.conf (anonymous allow,
listeners on 1883 + 9001). A future hardening pass replaces this
with mTLS + ACLs.
Networking
| Component | Port | K8s Service type | Exposure |
|---|---|---|---|
| Mosquitto (MQTT) | 1883 | ClusterIP | internal only |
| Mosquitto (WebSocket) | 9001 | ClusterIP | exposed via cop-ui-ingress second path |
| Validator | 7700 | ClusterIP | internal; ingress optional |
| cop-ui | 3000 | ClusterIP | Ingress (TLS by operator) |
| All other services | — | (no Service) | bus-only, no HTTP surface |
Browser-side cop-ui needs WebSocket to Mosquitto. Two options:
1. Ingress path-rewrite: cop-ui Ingress proxies /ws to
mosquitto:9001. Browser uses
NEXT_PUBLIC_BROKER_WS_URL=wss://uci-demo.local/ws.
2. Separate WebSocket service exposure: Mosquitto Service has
type: LoadBalancer or a second Ingress. Browser uses
wss://mqtt.uci-demo.local.
Chart defaults to option 1 (single ingress, simpler ops). The
ingress template has a # nginx.ingress.kubernetes.io/... rewrite
annotation block that operators uncomment per their controller.
eval-harness Job
The harness is one-shot. The chart renders a Job (NOT a CronJob;
the operator triggers it manually) when evalHarness.enabled: true:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "uci-demo.fullname" . }}-eval-{{ now | unixEpoch }}
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: eval-harness
          image: "{{ .Values.global.imageRegistry }}/eval-harness:{{ ... }}"
          args:
            - --scenarios={{ .Values.evalHarness.scenarios }}
            - --agents={{ .Values.evalHarness.agents }}
            - --degrade={{ .Values.evalHarness.degrade }}
            - --episodes-per-cell={{ .Values.evalHarness.episodesPerCell }}
            - --broker-url=mqtt://{{ include "uci-demo.fullname" . }}-mosquitto:1883
            - --report=/out/eval-report.json
          volumeMounts:
            - name: out
              mountPath: /out
      volumes:
        - name: out
          emptyDir: {}
To run repeated harness sweeps without editing manifests, rely on
the fact that every helm upgrade re-renders the Job with a fresh
now | unixEpoch suffix. Combined with --reuse-values --set
evalHarness.enabled=true, this gives operators a one-liner sweep
trigger.
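The one-liner could be wrapped roughly as follows. This is a sketch, not the repo's tooling: the function name `trigger_sweep` and the `RELEASE`, `CHART`, `SCENARIOS`, and `DRY_RUN` environment variables are assumptions for illustration, while the helm flags and the `evalHarness.*` keys come from this memo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# trigger_sweep: print (DRY_RUN=1) or run one eval-harness sweep.
# RELEASE / CHART default to the names used in this memo; SCENARIOS is an
# optional override of evalHarness.scenarios.
trigger_sweep() {
  local release="${RELEASE:-uci-demo}"
  local chart="${CHART:-deploy/helm/uci-demo}"
  local cmd=(helm upgrade "$release" "$chart"
    --reuse-values
    --set evalHarness.enabled=true)
  if [ -n "${SCENARIOS:-}" ]; then
    cmd+=(--set-string "evalHarness.scenarios=${SCENARIOS}")
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

One consequence worth keeping in mind: because the Job name changes on every render, any later helm upgrade that still carries evalHarness.enabled=true (for example through --reuse-values) creates a new Job and starts another sweep.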
values.schema.json
Hand-curated JSON Schema. Drives IDE autocomplete and gives
helm install clear errors on mistyped values (e.g.
copilot.agent.use: "soliver" flags as enum mismatch). Schema lists
every required key + value enum constraints.
Strand 3: Release workflow (.github/workflows/release-images.yml)
Triggers
on:
  push:
    branches: [main]        # publishes :main + :<sha>
    tags: ["v*"]            # publishes :<semver> + :latest
  workflow_dispatch:        # manual trigger for fixup runs
Permissions
permissions:
  contents: read
  packages: write           # push to GHCR
  id-token: write           # OIDC for cosign + setup-chainctl
  attestations: write       # SLSA build-provenance
Matrix shape
jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        component:
          - name: validator
            # Build context is the repo root: the Dockerfile copies
            # pnpm-workspace.yaml, packages/, and services/ from there.
            context: .
            dockerfile: services/validator/Dockerfile
            pkg: "@uci-demo/validator"
          - name: world-sim
            context: .
            dockerfile: services/world-sim/Dockerfile
            pkg: "@uci-demo/world-sim"
          # ... (6 more)
Each matrix leg:
1. Checkout
2. (Optional) chainguard-dev/setup-chainctl@v0.3.x when the
CHAINGUARD_IDENTITY repository variable is set; sets
BASE_IMAGE_DEV + BASE_IMAGE_RUNTIME build-args to FIPS paths.
Otherwise sets them to the free Chainguard paths.
3. docker/setup-qemu-action + docker/setup-buildx-action
4. docker/login-action against ghcr.io with ${{ github.token }}
5. docker/build-push-action with multi-arch (amd64+arm64), build-args
threaded through, tags computed by docker/metadata-action
6. anchore/sbom-action to generate the SBOM (syft → SPDX-JSON),
attach to the image
7. sigstore/cosign-installer + cosign sign --yes <ref>@<digest>
(keyless via OIDC)
8. actions/attest-build-provenance to write a SLSA L3 attestation
9. Output the image digest as a job output
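Helm chart publish
After the image matrix completes (needs: build):
publish-chart:
  runs-on: ubuntu-latest
  needs: build
  env:
    # Assumption: VERSION (the chart tag for this run) is exported by an earlier step.
    GHCR_TOKEN: ${{ github.token }}
  steps:
    - uses: actions/checkout@v6
    - uses: azure/setup-helm@v4
    - run: |
        cd deploy/helm
        # Pin appVersion to the just-built tag.
        sed -i "s/^appVersion:.*/appVersion: \"${VERSION}\"/" uci-demo/Chart.yaml
        # Set the chart version too, so the pushed tag matches ${VERSION}.
        helm package uci-demo/ --version "${VERSION}"
    - run: |
        echo "$GHCR_TOKEN" | helm registry login ghcr.io \
          --username "$GITHUB_ACTOR" --password-stdin
        # Each run step starts at the workspace root, so use the full path.
        helm push deploy/helm/uci-demo-*.tgz oci://ghcr.io/${{ github.repository }}/charts
    - uses: sigstore/cosign-installer@v3
    - run: |
        # crane takes a plain registry reference (no oci:// prefix) and must be on PATH.
        # crane and cosign read Docker-style credentials, so a docker/login-action
        # step against ghcr.io is assumed to have run in this job.
        DIGEST=$(crane digest "ghcr.io/${{ github.repository }}/charts/uci-demo:${VERSION}")
        cosign sign --yes "ghcr.io/${{ github.repository }}/charts/uci-demo@${DIGEST}"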
Tag scheme
| Trigger | Image tags | Chart tag |
|---|---|---|
| push to main | :main, :<short-sha> | 0.0.0-main.<short-sha> |
| v1.2.3 git tag | :1.2.3, :1.2, :1, :latest | 1.2.3 |
| workflow_dispatch (no ref) | :dispatch-<short-sha> | 0.0.0-dispatch.<sha> |
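The image-tag column of the table can be expressed as a small shell function. This is only an illustration of the mapping: in the workflow the tags are computed by docker/metadata-action, and the function name `image_tags`, the 7-character short SHA, and the treatment of every other ref as a dispatch build are assumptions. It handles plain vMAJOR.MINOR.PATCH tags only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# image_tags REF SHA
# Print one image tag per line for a git ref and a full commit SHA.
image_tags() {
  local ref="$1" short="${2:0:7}"
  case "$ref" in
    refs/tags/v[0-9]*.[0-9]*.[0-9]*)
      local ver="${ref#refs/tags/v}"    # e.g. 1.2.3
      local minor="${ver%.*}"           # e.g. 1.2
      local major="${ver%%.*}"          # e.g. 1
      printf '%s\n' "$ver" "$minor" "$major" latest
      ;;
    refs/heads/main)
      printf '%s\n' main "$short"
      ;;
    *)
      # Anything else is treated as a manual (workflow_dispatch) build.
      printf '%s\n' "dispatch-${short}"
      ;;
  esac
}
```

For example, `image_tags refs/tags/v1.2.3 <sha>` yields the four tags in the second table row, one per line.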
Image visibility
GHCR packages default to private. After the first publish, an
operator must (once) set them to public in the GitHub
Packages UI. Documented in README + the workflow's NOTES.
Strand 4: Chainguard authentication (when used)
Without Chainguard subscription (default)
- Repository has no CHAINGUARD_IDENTITY variable set
- Workflow uses cgr.dev/chainguard/node:latest-dev (builder) + cgr.dev/chainguard/node:latest (runtime)
- Both images are publicly pullable, no auth needed
- The resulting binaries are NOT FIPS-validated but ARE distroless + nonroot + minimal-surface
- Suitable for the SBIR demo + most evaluator clusters
With Chainguard subscription
Setup (one-time, in the operator's Chainguard console):
1. Create a Chainguard identity: chainctl iam identities create github-actions-uci-demo
2. Bind the identity to the repository's OIDC subject: chainctl iam identities update <id> --add-claim-match sub=repo:<owner>/uci-demo:ref:refs/heads/main
3. Grant the identity pull on the FIPS image: chainctl iam policies create --identity <id> --image cgr.dev/<chainguard-org>/node-fips
4. In GitHub repo settings → Variables, add CHAINGUARD_IDENTITY=<the-uidp-from-step-1> and CHAINGUARD_ORG=<your-chainguard-org-name>
The workflow's conditional auth step:
- name: Authenticate to Chainguard (FIPS images)
  if: vars.CHAINGUARD_IDENTITY != ''
  uses: chainguard-dev/setup-chainctl@v0.3.1
  with:
    identity: ${{ vars.CHAINGUARD_IDENTITY }}
- name: Configure base image refs
  id: base
  run: |
    if [ -n "${{ vars.CHAINGUARD_IDENTITY }}" ]; then
      ORG="${{ vars.CHAINGUARD_ORG }}"
      echo "dev=cgr.dev/$ORG/node-fips:latest-dev" >> "$GITHUB_OUTPUT"
      echo "runtime=cgr.dev/$ORG/node-fips:latest" >> "$GITHUB_OUTPUT"
    else
      echo "dev=cgr.dev/chainguard/node:latest-dev" >> "$GITHUB_OUTPUT"
      echo "runtime=cgr.dev/chainguard/node:latest" >> "$GITHUB_OUTPUT"
    fi
Operators can flip a single GitHub Variable to switch the entire release pipeline to FIPS.
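Strand 5: Supply-chain attestations
Three artifacts attached to every image:
SBOM (Software Bill of Materials)
anchore/sbom-action@v0 runs syft against the built image and
produces an SPDX-JSON SBOM. With the inputs shown here it uploads
the SBOM as a workflow artifact (and as a release asset on tag
builds); attaching it to the image as a referrer needs a further
step, for example an SBOM attestation: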
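- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    # Scan by digest: steps.meta.outputs.tags can expand to several tags.
    image: ghcr.io/<owner>/uci-demo/<component>@${{ steps.build.outputs.digest }}
    format: spdx-json
    upload-artifact: true
    upload-release-assets: ${{ startsWith(github.ref, 'refs/tags/') }}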
Cosign signature
Keyless via GitHub OIDC, so there is no private key to manage. The signature is bound to the GitHub Actions workflow identity:
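- uses: sigstore/cosign-installer@v3
- name: Sign image
  run: |
    cosign sign --yes "${IMAGE}@${DIGEST}"
  env:
    # Keyless signing is the default in cosign 2.x; COSIGN_EXPERIMENTAL is not needed.
    IMAGE: ghcr.io/<owner>/uci-demo/<component>
    DIGEST: ${{ steps.build.outputs.digest }}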
Consumers verify with:
cosign verify ghcr.io/<owner>/uci-demo/world-sim:1.2.3 \
--certificate-identity-regexp '^https://github\.com/<owner>/uci-demo/' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
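To check a whole release rather than a single image, the same command can be looped over the eight components. The sketch below is illustrative: `image_refs` and `verify_all` are hypothetical helper names, and it assumes the GitHub repository is named uci-demo under `<owner>`, as in the verify command shown in this section.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The eight images shipped by this PR.
COMPONENTS=(validator world-sim copilot solver-daemon red-agent adsb-bridge eval-harness cop-ui)

# image_refs OWNER TAG: print the eight image references, one per line.
image_refs() {
  local owner="$1" tag="$2" c
  for c in "${COMPONENTS[@]}"; do
    printf 'ghcr.io/%s/uci-demo/%s:%s\n' "$owner" "$c" "$tag"
  done
}

# verify_all OWNER TAG: run cosign verify on each image.
# With `set -e`, the first failed verification aborts the script.
verify_all() {
  local owner="$1" tag="$2" ref refs
  mapfile -t refs < <(image_refs "$owner" "$tag")
  for ref in "${refs[@]}"; do
    cosign verify "$ref" \
      --certificate-identity-regexp "^https://github\.com/${owner}/uci-demo/" \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      >/dev/null
    echo "verified: $ref"
  done
}
```

Calling `verify_all <owner> 1.2.3` would then print one "verified" line per image, or stop at the first image whose signature does not match the expected workflow identity.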
SLSA L3 build provenance
actions/attest-build-provenance@v3 writes an in-toto attestation
asserting the image was built by THIS workflow, on THIS commit,
from THIS source tree:
- uses: actions/attest-build-provenance@v3
  with:
    subject-name: ghcr.io/<owner>/uci-demo/<component>
    subject-digest: ${{ steps.build.outputs.digest }}
    push-to-registry: true
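The combination of SBOM, signature, and provenance is the
supply-chain evidence a federal customer expects for
procurement-eligible artifacts. The SLSA level itself depends on the
build provenance only: GitHub documents artifact attestations as
meeting SLSA v1.0 Build Level 2 on their own, and reaching Build
Level 3 additionally requires running the build in a reusable
workflow. The story aligns with the OpenSSF Scorecard run already in
.github/workflows/scorecard.yml.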
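On the consumer side, the provenance attestation can be checked with the GitHub CLI's `gh attestation verify` command. The wrapper below is a sketch under assumptions: `verify_provenance` and the `DRY_RUN` switch are hypothetical names, and the repository is assumed to be `<owner>/uci-demo`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# verify_provenance OWNER COMPONENT TAG
# Prints the gh command when DRY_RUN=1, otherwise runs it.
verify_provenance() {
  local owner="$1" component="$2" tag="$3"
  local cmd=(gh attestation verify
    "oci://ghcr.io/${owner}/uci-demo/${component}:${tag}"
    --repo "${owner}/uci-demo")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Pairing this with the cosign check gives a reviewer two independent confirmations: who signed the image, and which workflow run built it.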
Strand 6: kind smoke (deploy/kind-smoke.sh)
End-to-end local validation. Boots a kind cluster, builds all eight images, loads them in, helm-installs, asserts all pods Ready:
#!/usr/bin/env bash
set -euo pipefail
CLUSTER_NAME="uci-demo"
IMAGE_REGISTRY="local"
IMAGE_TAG="dev"
# 1. Bring up the cluster if it doesn't exist.
if ! kind get clusters | grep -qx "${CLUSTER_NAME}"; then
  kind create cluster --name "${CLUSTER_NAME}"
fi
# 2. Build all 8 images locally.
for svc in validator world-sim copilot solver-daemon red-agent adsb-bridge eval-harness; do
  docker build -t "${IMAGE_REGISTRY}/uci-demo/${svc}:${IMAGE_TAG}" \
    --build-arg "SERVICE_PKG=@uci-demo/${svc}" \
    -f "services/${svc}/Dockerfile" .
  kind load docker-image "${IMAGE_REGISTRY}/uci-demo/${svc}:${IMAGE_TAG}" \
    --name "${CLUSTER_NAME}"
done
docker build -t "${IMAGE_REGISTRY}/uci-demo/cop-ui:${IMAGE_TAG}" \
  -f "apps/cop-ui/Dockerfile" .
kind load docker-image "${IMAGE_REGISTRY}/uci-demo/cop-ui:${IMAGE_TAG}" \
  --name "${CLUSTER_NAME}"
# 3. Helm install.
helm upgrade --install uci-demo deploy/helm/uci-demo \
  --set "global.imageRegistry=${IMAGE_REGISTRY}/uci-demo" \
  --set "global.imageTag=${IMAGE_TAG}" \
  --set "global.pullPolicy=Never" \
  --wait --timeout=90s
# 4. Assert pods Ready.
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=uci-demo \
  --timeout=60s
echo "✓ All pods Ready. Port-forward to see the COP:"
echo "  kubectl port-forward svc/uci-demo-cop-ui 3000:3000"
This script doubles as the chart's CI gate — a follow-up workflow
can run it on every PR that touches deploy/ or services/.
Strand 7: Files changed by this PR
New files
plan/design-k8s-deploy.md # this memo
services/validator/Dockerfile
services/world-sim/Dockerfile
services/copilot/Dockerfile
services/solver-daemon/Dockerfile
services/red-agent/Dockerfile
services/adsb-bridge/Dockerfile
services/eval-harness/Dockerfile
apps/cop-ui/Dockerfile
.dockerignore # shared exclusion list
deploy/helm/uci-demo/Chart.yaml
deploy/helm/uci-demo/values.yaml
deploy/helm/uci-demo/values.schema.json
deploy/helm/uci-demo/README.md
deploy/helm/uci-demo/templates/_helpers.tpl
deploy/helm/uci-demo/templates/mosquitto-statefulset.yaml
deploy/helm/uci-demo/templates/mosquitto-service.yaml
deploy/helm/uci-demo/templates/mosquitto-configmap.yaml
deploy/helm/uci-demo/templates/validator-deployment.yaml
deploy/helm/uci-demo/templates/validator-service.yaml
deploy/helm/uci-demo/templates/world-sim-deployment.yaml
deploy/helm/uci-demo/templates/copilot-deployment.yaml
deploy/helm/uci-demo/templates/copilot-secret.yaml
deploy/helm/uci-demo/templates/solver-daemon-deployment.yaml
deploy/helm/uci-demo/templates/red-agent-deployment.yaml
deploy/helm/uci-demo/templates/adsb-bridge-deployment.yaml
deploy/helm/uci-demo/templates/cop-ui-deployment.yaml
deploy/helm/uci-demo/templates/cop-ui-service.yaml
deploy/helm/uci-demo/templates/cop-ui-ingress.yaml
deploy/helm/uci-demo/templates/eval-harness-job.yaml
deploy/helm/uci-demo/templates/NOTES.txt
deploy/kind-smoke.sh
.github/workflows/release-images.yml
Edited files (additive)
- apps/cop-ui/next.config.ts: add output: "standalone"
- services/validator/package.json: add "build": "tsc -p tsconfig.json"
- services/world-sim/package.json: same
- services/copilot/package.json: same
- services/solver-daemon/package.json: same
- services/red-agent/package.json: same
- services/adsb-bridge/package.json: same
- services/eval-harness/package.json: same
- README.md: add "Deploying to Kubernetes" section
- CLAUDE.md: add "Deploy notes" callout pointing at this memo
Strand 8: Open questions
- Should Mosquitto use a Chainguard image? There's no chainguard/eclipse-mosquitto today. Sticking with the official image keeps the broker on the upstream support path. Operators who need fully-Chainguarded infra can swap in a custom build later. Recommendation: stay on eclipse-mosquitto:2.0.21 for the demo; document the upgrade path.
- Do we ship a CronJob for the eval-harness? Operator-triggered one-shot Jobs are simpler. Recurring sweeps belong in CI, not in the cluster. Recommendation: ship Job template only; no CronJob.
- WebSocket access for the browser-side cop-ui: same ingress or separate? Single ingress is simpler ops but couples the two paths to one cert. Recommendation: single ingress with path-based routing (/ → cop-ui, /ws → mosquitto:9001); operators override via the rewrite annotation block.
- Should the chart bundle Mosquitto or take a dependency on the Bitnami Mosquitto chart? Bundling is simpler and lets us pin exact behavior; depending on Bitnami means an upstream upgrade path. Recommendation: bundle for this PR; Bitnami sub-chart migration is a follow-up if a customer needs it.
- Image tagging on tag pushes: :1, :1.2, :1.2.3, :latest all at once? Moving major and minor tags let consumers follow a release line (:1 or :1.2) without re-pinning on every patch release. Recommendation: ship all four; the cost is negligible.
Strand 9: Acceptance gates
For the PR landing this memo + the implementation:
- ./deploy/kind-smoke.sh boots a kind cluster, builds 8 images, helm-installs the chart, and reaches Ready within 90 seconds.
- helm lint deploy/helm/uci-demo/ clean.
- helm template deploy/helm/uci-demo/ | kubectl apply --dry-run=client -f - validates against the schema.
- pnpm -r typecheck and pnpm -r test green (the additive build scripts must not break existing typecheck).
- The release workflow runs on the PR branch (triggered manually via workflow_dispatch) and produces 8 image tags + 1 chart tag visible in ghcr.io/<owner>/uci-demo/.
- Cosign signatures verify against the GitHub OIDC issuer.
- SBOM is attached and readable via cosign download sbom ghcr.io/<owner>/uci-demo/world-sim:<tag>.
- README + CLAUDE.md updated.
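The tool-only gates in this list (lint, template dry-run, typecheck, test) lend themselves to a single script. The following is a sketch under assumptions: `run`, `static_gates`, `CHART_DIR`, and `DRY_RUN` are hypothetical names, while the commands themselves are the ones listed in the gates.

```shell
#!/usr/bin/env bash
set -euo pipefail

CHART_DIR="${CHART_DIR:-deploy/helm/uci-demo}"

# run CMD...: echo the command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# static_gates: the gates that need no cluster and no registry access.
# With `set -e`, the first failing gate stops the run.
static_gates() {
  run helm lint "$CHART_DIR"
  # pipefail makes a helm template failure fail the whole pipeline.
  run bash -o pipefail -c "helm template '$CHART_DIR' | kubectl apply --dry-run=client -f -"
  run pnpm -r typecheck
  run pnpm -r test
}
```

Running `DRY_RUN=1 static_gates` prints the four commands without executing them, which is handy for reviewing what a CI job would do.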
For the named follow-ups (out of scope for this PR):
- EKS / GKE / AKS provider modules (terraform under deploy/iac/).
- Network policy templates.
- HPA + PodDisruptionBudget templates.
- Mosquitto mTLS + ACL hardening pass.
- kind smoke as a CI gate (.github/workflows/k8s-smoke.yml).
Strand 10: Why this design over alternatives
Why Helm over Kustomize? A single OCI artifact published next
to the images gives operators helm install oci://... — one command,
no git clone. Kustomize requires source-tree access. For a demo +
SBIR proposal, the Helm path reads more professionally and
demonstrates the supply-chain story end-to-end (chart is signed,
images are signed, both ride GHCR).
Why Chainguard over Alpine / Debian? Distroless and nonroot by default, with the FIPS variant ready when a customer needs it. Alpine's musl libc has a track record of subtle Node breakage, and Debian-based images carry a much larger package footprint. Chainguard is the path forward (PR #27 memo settled this).
Why per-service images instead of a monolith? Decouples deploy
cadence (the copilot ships LLM SDKs; world-sim doesn't). Smaller
attack surface per image. Allows independent scaling in Phase II.
The pnpm workspace makes per-service Dockerfiles tractable via
pnpm deploy --prod.
Why tsc emit instead of shipping tsx? Smaller runtime image and no
per-start TypeScript transpile cost, since Node loads plain
JavaScript directly. The dev / start developer workflows keep using
tsx; the production artifact uses compiled JS. Both paths live
side-by-side.
Why build-arg with a free fallback for Chainguard? Future-proofs the FIPS migration without making the repo dead-on-arrival for anyone without a Chainguard subscription. The artifacts produced without FIPS credentials are still distroless + nonroot — a real hardening uplift, just not the FIPS-validated variant.
Why cosign keyless + SLSA L3? Federal procurement increasingly
requires SLSA Level 3 + verifiable signatures. Keyless cosign + the
official GitHub attest-build-provenance action are the lightest
implementation path; no HSM, no key rotation, no secret leakage
risk.
Strand 11: What comes next after this PR
In sequence:
1. kind-smoke CI gate: .github/workflows/k8s-smoke.yml runs deploy/kind-smoke.sh on every PR that touches deploy/ or services/*/Dockerfile. Becomes part of the PR review.
2. Cloud-provider modules: deploy/iac/eks/, deploy/iac/gke/, deploy/iac/aks/ with Terraform that boots a cluster + an ingress controller + cert-manager, then helm installs the chart. Demos against a real cloud cluster live behind a separate workflow.
3. Hardening pass: Network policies, PodDisruptionBudgets, HPA for cop-ui, mTLS + ACLs on Mosquitto. Probably a dedicated PR once a customer engagement names a target deployment env.
4. Multi-broker HA: Mosquitto bridge mode or EMQX swap, with the chart selecting one via mosquitto.driver: mosquitto|emqx.
5. Operator-facing dashboards: Grafana dashboards for the payoff / belief / blueprint metrics published on the bus. Probably a deploy/grafana/ sub-tree with importable JSON.
This memo + its implementation PR is the foundation everything else rests on.