agatha 1b3468b72d Feat: Add production-grade multi-stage container image for UI
Two-stage build (node:22-slim builder + nginxinc/nginx-unprivileged:alpine
runtime) with SPA fallback routing, long-lived cache headers for fingerprinted
assets, non-root user (UID 101), and no Node.js toolchain in runtime image
(82 MB vs 329 MB+ single-stage). Verified by ui/tests/build/verify_production_image.sh
covering build, health, SPA routing, non-root, stdout logging, cache-control
headers, SIGTERM exit 0, Node.js absent, secret-free layers, and dep-layer
cache hit. 102 integration tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 20:18:55 +00:00

Feature Specification: Production-Grade UI Container Image

Feature Branch: 011-ui-prod-dockerfile
Created: 2026-05-07
Status: Draft
Input: User description: "Production-grade UI container image build"

User Scenarios & Testing (mandatory)

User Story 1 - UI Serves Reliably in Production (Priority: P1)

A production deployment starts the UI container and it serves the compiled application correctly — returning the app shell for all routes, responding quickly, and shutting down cleanly when the orchestrator stops it.

Why this priority: A container that can't serve traffic is not deployable. All other properties (security, build speed) are meaningless without a running service.

Independent Test: Build the image, start the container, and verify that the root path returns a 200 response; then stop the container and confirm it exits cleanly with code 0. This alone constitutes a deployable MVP.
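To make "exits cleanly" checkable, a verification step can read the exit code after stopping the container (for example with `docker inspect --format '{{.State.ExitCode}}' <container>`) and classify it. The sketch below relies on the common convention that a process killed by signal N is reported as 128 + N; the function name `classify_exit` is hypothetical.

```python
import signal


def classify_exit(code: int) -> str:
    """Describe a container exit code observed after an orchestrator stop request."""
    if code == 0:
        return "clean"  # the server handled the stop signal and exited normally
    if code > 128:
        # "Killed by signal N" is conventionally reported as 128 + N.
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exited with {code}"
        if name == "SIGKILL":
            # Orchestrators send SIGKILL once the stop timeout expires.
            return "killed: stop timeout expired before the process exited"
        return f"terminated by {name} without a clean shutdown"
    return f"exited with error code {code}"
```

Under this reading, 137 (128 + SIGKILL) means the stop timeout was exceeded, and 143 (128 + SIGTERM) means the process died on the signal instead of shutting down on its own terms.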

Acceptance Scenarios:

  1. Given a built production image, When the container starts, Then it serves the application on port 8080 within 30 seconds.
  2. Given the container is running, When a request is made to any client-side route (e.g., /library, /tags), Then the server returns the app shell (200 OK) so client-side routing can take over.
  3. Given the container is running, When a static asset is requested, Then it is returned with cache-control headers suited to its type, including long-lived caching for fingerprinted assets (FR-008).
  4. Given a running container, When the orchestrator sends a stop signal, Then the container exits with code 0 before the orchestrator's stop timeout expires (Docker's default is 10 seconds).
  5. Given the production image, When a health probe is issued to a designated endpoint, Then the container reports healthy.
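Scenarios 1 and 5 both reduce to "poll until the server answers 200, within a deadline". A minimal sketch using only the standard library follows; the function names are hypothetical, and the 30-second deadline and 0.5-second interval are assumptions taken from SC-001 and chosen for illustration respectively.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and socket
        # timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it succeeds.

    Returns the seconds elapsed before the first success, or None if the
    deadline passed first. `clock` and `sleep` are injectable for testing.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            return None
        sleep(interval_s)
```

A call such as `wait_until_ready(lambda: http_ok("http://localhost:8080/"))` probes the root path; the spec does not yet name the dedicated health endpoint, so that URL would need to be swapped in once it is chosen.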

User Story 2 - Minimal, Secure Container (Priority: P2)

The production image contains only what is needed to serve static files — no build tools, no source code, no node_modules. It runs as a non-privileged user.

Why this priority: Shipping build tools and source code in production images increases attack surface and image size. Running as root violates least-privilege principles.

Independent Test: Inspect the running container and confirm the process user is non-root; then attempt to run the Node.js binary (node) inside the image and confirm it is absent.
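One way to automate the non-root check (SC-005) is to read the `Uid:` line of `/proc/1/status` inside the container (for example via `docker exec <container> cat /proc/1/status`). That line lists the real, effective, saved-set and filesystem UIDs of PID 1. The sketch below parses it; the function names are hypothetical.

```python
def effective_uid(proc_status: str) -> int:
    """Extract the effective UID from the text of /proc/<pid>/status."""
    for line in proc_status.splitlines():
        if line.startswith("Uid:"):
            # Fields after the label: real, effective, saved-set, filesystem.
            fields = line.split()[1:]
            return int(fields[1])
    raise ValueError("no Uid: line found in /proc status text")


def runs_as_non_root(proc_status: str) -> bool:
    """True when PID 1 does not run with effective UID 0."""
    return effective_uid(proc_status) != 0
```

Checking the effective UID rather than the real UID matters because privileges are granted based on the effective UID.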

Acceptance Scenarios:

  1. Given the production image, When the running process user is inspected, Then it is not root (UID ≠ 0).
  2. Given the production image, When the image contents are inspected, Then node_modules/, source TypeScript files, and the Node.js runtime are absent.
  3. Given the production image, When image layer history is inspected, Then no secrets, API keys, or credentials appear in any layer command.
  4. Given the production image, When the image size is measured, Then it is substantially smaller (SC-002 expects a reduction of more than 60%) than a single-stage image that includes the Node.js toolchain.
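Scenario 3 can be approximated by scanning the command recorded for each layer, for example the output of `docker history --no-trunc --format '{{.CreatedBy}}' <image>`, for credential-like patterns. The patterns and the function name below are illustrative assumptions, not an exhaustive detector; a dedicated secret scanner covers far more formats.

```python
import re

# Illustrative patterns only (assumed); real secret scanners ship much larger rule sets.
SUSPICIOUS = [
    # KEY=value style assignments whose name mentions a key, secret, password or token
    re.compile(r"(?i)\b[\w-]*(api[_-]?key|secret|passw(or)?d|token)[\w-]*\s*[=:]\s*\S+"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def flag_layer_commands(created_by: list[str]) -> list[str]:
    """Return the layer commands that match any credential-like pattern."""
    return [cmd for cmd in created_by if any(p.search(cmd) for p in SUSPICIOUS)]
```

Note that a BuildKit secret mount such as `--mount=type=secret,id=npmrc` is not flagged, since the secret value never enters the recorded command.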

User Story 3 - Fast, Reproducible Builds (Priority: P3)

Rebuilding the image after a source-only change (no dependency changes) reuses the dependency installation layer from cache, completing in seconds rather than minutes.

Why this priority: Slow builds impede the development feedback loop and CI pipeline throughput. Dependency installs are the dominant time cost.

Independent Test: Build once, then change a source file and build again — the build output confirms the dependency layer was served from cache.

Acceptance Scenarios:

  1. Given the image has been built once, When only a source file is changed and the image is rebuilt, Then the dependency installation step is skipped (cache hit).
  2. Given a dependency file is changed, When the image is rebuilt, Then the dependency installation step runs fresh (cache miss is correct behaviour).
  3. Given identical source and lockfile, When the image is built twice in succession, Then both builds produce functionally identical output.
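Scenario 1 can be checked from the build log. The sketch below assumes BuildKit's plain progress format (`docker build --progress=plain`), where a step is announced as `#6 [builder 3/6] RUN npm ci` and a cache hit is reported as `#6 CACHED`. It also assumes the dependency step contains `npm ci`; both the format and that command string are assumptions, and the function name is hypothetical.

```python
import re
from typing import Optional

STEP = re.compile(r"^#(\d+) \[[^\]]*\] (.+)$")  # "#6 [builder 3/6] RUN npm ci"
CACHED = re.compile(r"^#(\d+) CACHED\s*$")      # "#6 CACHED"


def dependency_step_cached(plain_log: str, fragment: str = "npm ci") -> Optional[bool]:
    """Report whether the build step containing `fragment` was served from cache.

    Returns None when no such step appears in the log.
    """
    descriptions: dict[str, str] = {}
    cached: set[str] = set()
    for line in plain_log.splitlines():
        if m := STEP.match(line):
            descriptions[m.group(1)] = m.group(2)
        elif m := CACHED.match(line):
            cached.add(m.group(1))
    matching = [sid for sid, desc in descriptions.items() if fragment in desc]
    if not matching:
        return None
    return all(sid in cached for sid in matching)
```

Scenario 2 is the inverse check: after editing the lockfile, the same function should return False.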

Edge Cases

  • What happens when the container starts but the built assets are missing or corrupted?
  • How does the server handle requests for non-existent routes that should fall back to the app shell (SPA routing)?
  • What happens when the container receives a stop signal while actively serving requests?
  • What happens if the port is already in use at startup?

Requirements (mandatory)

Functional Requirements

  • FR-001: The production image MUST be built via a multi-stage process — a build stage compiles the application into static assets, and a separate runtime stage serves only those assets.
  • FR-002: The runtime stage MUST NOT contain the Node.js runtime, npm, source TypeScript, or node_modules/.
  • FR-003: The container MUST serve the application on port 8080. External orchestrators (docker-compose, Kubernetes ingress) map this to port 80 as needed.
  • FR-004: The container MUST handle SPA (single-page application) routing by returning the app shell for any unmatched path, so client-side routing works correctly.
  • FR-005: The container MUST run as a non-root user.
  • FR-006: The container MUST expose a health-check endpoint that returns success when the service is ready to accept traffic.
  • FR-007: The container MUST exit with code 0 when sent a graceful stop signal (e.g., SIGTERM).
  • FR-008: Static assets MUST be served with cache-control headers that enable client-side caching for fingerprinted assets.
  • FR-009: The Dockerfile MUST structure layers so that dependency installation is cached independently from source code changes.
  • FR-010: The build MUST be reproducible — given the same source and lockfile, successive builds produce equivalent images.
  • FR-011: Credentials, secrets, and API keys MUST NOT appear in any image layer.
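FR-004 and FR-008 together determine how a request path maps to a file and a cache policy. The commit message describes an nginx runtime. The Python sketch below models an nginx-style `try_files $uri $uri/ /index.html` lookup plus a cache policy. The fingerprint heuristic and the header values (`max-age=31536000, immutable` for hashed files, `no-cache` for everything else) are assumptions, not the project's actual configuration.

```python
import re

# Assumed heuristic: a fingerprinted file name carries a content hash of 8+
# alphanumeric characters (at least one digit) right before the extension,
# e.g. "main-3F2A9C1B.js" or "styles.5d1e8a7f.css".
FINGERPRINT = re.compile(r"[.-](?=[A-Za-z0-9]*\d)[A-Za-z0-9]{8,}\.[a-z0-9]+$")

LONG_LIVED = "public, max-age=31536000, immutable"  # one year; the name changes when content does
REVALIDATE = "no-cache"  # the browser must revalidate before reusing its copy


def resolve(path: str, files: set[str]) -> tuple[str, str]:
    """Return (file served, Cache-Control value) for a request path.

    Approximates `try_files $uri $uri/ /index.html` over the set of files in
    the static root; every outcome here is a 200 response.
    """
    path = path.split("?", 1)[0]  # the query string does not affect file lookup
    for candidate in (path, path.rstrip("/") + "/index.html"):
        if candidate in files:
            name = candidate.rsplit("/", 1)[-1]
            cache = LONG_LIVED if FINGERPRINT.search(name) else REVALIDATE
            return candidate, cache
    # SPA fallback: unmatched paths get the app shell so the client-side router takes over.
    return "/index.html", REVALIDATE
```

One consequence relevant to the edge cases: under this rule a request for a missing fingerprinted bundle also receives the app shell with a 200. Some deployments instead return 404 for paths that carry a file extension.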

Key Entities

  • Build Stage: The intermediate container that installs dependencies and compiles source into static assets; discarded after build.
  • Static Assets: The compiled output (HTML, JS bundles, CSS, fonts, images) that the runtime stage serves.
  • Runtime Stage: The minimal final image containing only a web server and the compiled static assets.
  • Production Image: The tagged, distributable image produced by the build; used directly in deployment.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: The container serves a 200 response on port 8080 within 30 seconds of starting.
  • SC-002: The production image is substantially smaller than a single-stage image that retains the Node.js toolchain. A manual size comparison after the initial build confirms the multi-stage approach delivers a meaningful reduction (expected: >60% reduction).
  • SC-003: A source-only rebuild completes in under 30 seconds (dependency layer served from cache).
  • SC-004: All 11 functional requirements pass automated verification on every build.
  • SC-005: The running container process has UID ≠ 0, confirmed by automated check.
  • SC-006: No existing integration tests regress after the Dockerfile and supporting files are introduced.
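SC-002's threshold is simple arithmetic over two measured sizes. The sketch below applies it to the figures reported in the commit message (82 MB runtime image versus 329 MB or more for a single-stage image); the function name is hypothetical.

```python
def size_reduction(runtime_mb: float, single_stage_mb: float) -> float:
    """Fractional size reduction of the multi-stage image vs. a single-stage baseline."""
    if single_stage_mb <= 0:
        raise ValueError("baseline size must be positive")
    return 1.0 - runtime_mb / single_stage_mb


# Figures from the commit message: 82 MB runtime vs. a single-stage image of 329 MB or more.
reduction = size_reduction(82, 329)
print(f"{reduction:.1%}")  # 75.1%
assert reduction > 0.60, "SC-002 expects a reduction of more than 60%"
```

Because the single-stage figure is a lower bound ("329 MB+"), the actual reduction is at least about 75%, comfortably above the 60% target.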

Assumptions

  • The Angular application is built for production using the standard build toolchain (ng build --configuration production or equivalent), producing a dist/ output directory.
  • The production web server is responsible for SPA fallback routing (returning the app shell for unmatched paths).
  • Gzip or Brotli compression at the web server layer is desirable but not mandatory for the initial implementation.
  • The UI container does not need to proxy API requests: the application running in the browser calls the API directly (the Angular proxy config is used only in local development).
  • The container listens on port 8080 (non-privileged, enabling non-root operation). External load balancers or ingress controllers map this to port 80. TLS termination occurs upstream.
  • The build context is the ui/ directory; files excluded from the build context (source maps in CI, node_modules/ already present locally) are managed via .dockerignore.
  • The same verification approach used for the API image (a shell script as the TDD artefact) applies here.