Feat: Add production-grade multi-stage container image for UI

Two-stage build (node:22-slim builder + nginxinc/nginx-unprivileged:alpine
runtime) with SPA fallback routing, long-lived cache headers for fingerprinted
assets, non-root user (UID 101), and no Node.js toolchain in runtime image
(82 MB vs 329 MB+ single-stage). Verified by ui/tests/build/verify_production_image.sh
covering build, health, SPA routing, non-root, stdout logging, cache-control
headers, SIGTERM exit 0, Node.js absent, secret-free layers, and dep-layer
cache hit. 102 integration tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 20:18:55 +00:00
parent 12176471e1
commit 1b3468b72d
16 changed files with 885 additions and 3 deletions

@@ -0,0 +1,110 @@
# Feature Specification: Production-Grade UI Container Image
**Feature Branch**: `011-ui-prod-dockerfile`
**Created**: 2026-05-07
**Status**: Draft
**Input**: User description: "Production-grade UI container image build"
## User Scenarios & Testing *(mandatory)*
### User Story 1 - UI Serves Reliably in Production (Priority: P1)
A production deployment starts the UI container and it serves the compiled application correctly — returning the app shell for all routes, responding quickly, and shutting down cleanly when the orchestrator stops it.
**Why this priority**: A container that can't serve traffic is not deployable. All other properties (security, build speed) are meaningless without a running service.
**Independent Test**: Build the image, start the container, and verify the root path returns a 200 response. Stopping the container produces a clean exit. This alone constitutes a deployable MVP.
**Acceptance Scenarios**:
1. **Given** a built production image, **When** the container starts, **Then** it serves the application on port 8080 within 30 seconds.
2. **Given** the container is running, **When** a request is made to any client-side route (e.g., `/library`, `/tags`), **Then** the server returns the app shell (200 OK) so client-side routing can take over.
3. **Given** the container is running, **When** a static asset is requested, **Then** it is returned with appropriate caching headers.
4. **Given** a running container, **When** the orchestrator sends a stop signal, **Then** the container exits with code 0 within the orchestrator's stop grace period (that is, before it would be force-killed).
5. **Given** the production image, **When** a health probe is issued to a designated endpoint, **Then** the container reports healthy.
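
The SPA fallback in scenario 2 can be pictured as a small decision rule. The following is a minimal sketch in Python rather than actual web server configuration; the function name, the `/index.html` shell path, and the sample file names are hypothetical and chosen only for illustration.

```python
def resolve_request(path: str, static_files: set[str], shell: str = "/index.html") -> str:
    """Return the file the server would send for a GET request.

    Assumption: any path that does not match a real file falls back to the
    app shell, so the client-side router can handle it.
    """
    clean = path.split("?", 1)[0]  # ignore any query string
    if clean in static_files:
        return clean               # a real static asset: serve it directly
    return shell                   # unmatched route: serve the app shell


# Hypothetical contents of the compiled output directory.
files = {"/index.html", "/main-5XHJKL2M.js", "/favicon.ico"}

print(resolve_request("/library", files))
print(resolve_request("/main-5XHJKL2M.js", files))
```

Under this rule both `/library` and `/tags` resolve to the shell with a 200 response, while requests for existing assets are served unchanged.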
---
### User Story 2 - Minimal, Secure Container (Priority: P2)
The production image contains only what is needed to serve static files — no build tools, no source code, no `node_modules`. It runs as a non-privileged user.
**Why this priority**: Shipping build tools and source code in production images increases attack surface and image size. Running as root violates least-privilege principles.
**Independent Test**: Inspect the running container and confirm the process user is non-root; then attempt to run `node` and `npm` inside the image and confirm both are absent.
**Acceptance Scenarios**:
1. **Given** the production image, **When** the running process user is inspected, **Then** it is not root (UID ≠ 0).
2. **Given** the production image, **When** the image contents are inspected, **Then** `node_modules/`, source TypeScript files, and the Node.js runtime are absent.
3. **Given** the production image, **When** image layer history is inspected, **Then** no secrets, API keys, or credentials appear in any layer command.
4. **Given** the production image, **When** the image size is measured, **Then** it is substantially smaller than a single-stage image that includes the Node.js toolchain.
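
Scenario 3 amounts to scanning the text of each layer command for secret-like strings. The sketch below assumes the layer commands have already been captured as plain text lines (for example from `docker history --no-trunc`); the pattern list and the function name are illustrative assumptions, not the checks the verification script actually performs.

```python
import re

# Illustrative patterns only; a real check would likely need a broader list.
SECRET_PATTERNS = [
    # KEY=value or KEY: value where the key name looks credential-like
    re.compile(r"(api[_-]?key|secret|token|password|passwd)\w*\s*[=:]\s*\S+", re.IGNORECASE),
    # Shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # PEM private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_suspect_layers(history_lines: list[str]) -> list[str]:
    """Return the layer command lines that match any secret-like pattern."""
    return [
        line
        for line in history_lines
        if any(pattern.search(line) for pattern in SECRET_PATTERNS)
    ]
```

An empty result is the passing condition. Pattern matching of this kind can miss secrets and can flag harmless lines, so it complements rather than replaces keeping credentials out of the build entirely.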
---
### User Story 3 - Fast, Reproducible Builds (Priority: P3)
Rebuilding the image after a source-only change (no dependency changes) reuses the dependency installation layer from cache, completing in seconds rather than minutes.
**Why this priority**: Slow builds impede the development feedback loop and CI pipeline throughput. Dependency installs are the dominant time cost.
**Independent Test**: Build once, then change a source file and build again — the build output confirms the dependency layer was served from cache.
**Acceptance Scenarios**:
1. **Given** the image has been built once, **When** only a source file is changed and the image is rebuilt, **Then** the dependency installation step is skipped (cache hit).
2. **Given** a dependency file is changed, **When** the image is rebuilt, **Then** the dependency installation step runs fresh (cache miss is correct behaviour).
3. **Given** identical inputs, **When** the image is built twice in succession, **Then** both builds produce functionally identical output.
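
The cache behaviour in scenarios 1 and 2 follows from how layer cache keys chain together. The following is a simplified model, not the builder's real algorithm: each step's key is assumed to depend on the previous key, the instruction text, and the contents of the files it copies. The step list, file names, and `npm` commands are hypothetical.

```python
import hashlib


def layer_keys(steps: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    """Compute one cache key per build step.

    Each key hashes the parent key, the instruction text, and the contents
    of the step's input files, so a change invalidates that step and every
    step after it.
    """
    keys = []
    parent = ""
    for instruction, inputs in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        for path in sorted(inputs):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(before: list[str], after: list[str]) -> list[bool]:
    """True where a step's key is unchanged between two builds."""
    return [old == new for old, new in zip(before, after)]


# Hypothetical builder-stage steps: dependency manifests are copied and
# installed before the rest of the source is copied.
STEPS = [
    ("COPY package.json package-lock.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ["package.json", "package-lock.json", "src/main.ts"]),
    ("RUN npm run build", []),
]
```

In this model, editing only `src/main.ts` leaves the first two keys unchanged, so the install step is a cache hit; editing the lockfile changes the first key and therefore every key after it.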
---
### Edge Cases
- What happens when the container starts but the built assets are missing or corrupted?
- How does the server handle requests for non-existent routes that should fall back to the app shell (SPA routing)?
- What happens when the container receives a stop signal while actively serving requests?
- What happens if the port is already in use at startup?
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: The production image MUST be built via a multi-stage process — a build stage compiles the application into static assets, and a separate runtime stage serves only those assets.
- **FR-002**: The runtime stage MUST NOT contain the Node.js runtime, npm, source TypeScript, or `node_modules/`.
- **FR-003**: The container MUST serve the application on port 8080. External orchestrators (docker-compose, Kubernetes ingress) map this to port 80 as needed.
- **FR-004**: The container MUST handle SPA (single-page application) routing by returning the app shell for any unmatched path, so client-side routing works correctly.
- **FR-005**: The container MUST run as a non-root user.
- **FR-006**: The container MUST expose a health-check endpoint that returns success when the service is ready to accept traffic.
- **FR-007**: The container MUST exit with code 0 when sent a graceful stop signal.
- **FR-008**: Static assets MUST be served with cache-control headers that enable client-side caching for fingerprinted assets.
- **FR-009**: The Dockerfile MUST structure layers so that dependency installation is cached independently from source code changes.
- **FR-010**: The build MUST be reproducible — given the same source and lockfile, successive builds produce equivalent images.
- **FR-011**: Credentials, secrets, and API keys MUST NOT appear in any image layer.
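
FR-008 implies a rule for deciding which responses get long-lived caching. A minimal sketch of one such rule follows, assuming fingerprinted files carry a hash segment of at least eight characters before the extension. The regular expression, the header values, and the `no-cache` default for everything else are assumptions for illustration, not values taken from the server configuration.

```python
import re

# Assumed fingerprint shape: "." or "-" followed by 8+ lowercase-hex characters
# or 8+ uppercase letters and digits, then a static-asset extension.
FINGERPRINTED = re.compile(
    r"[.-](?:[0-9a-f]{8,}|[0-9A-Z]{8,})\.(?:js|css|woff2?|png|jpe?g|svg|webp)$"
)


def cache_control_for(path: str) -> str:
    """Choose a Cache-Control value for a response path."""
    if FINGERPRINTED.search(path):
        # Content-addressed file: the name changes whenever the content does,
        # so clients can safely cache it for a long time.
        return "public, max-age=31536000, immutable"
    # Everything else (including the app shell) is revalidated on each use.
    return "no-cache"
```

Keeping the app shell out of the long-lived bucket matters because it is the file that references the fingerprinted bundles; if it were cached for a year, clients would not pick up new bundle names after a deployment.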
### Key Entities
- **Build Stage**: The intermediate container that installs dependencies and compiles source into static assets; discarded after build.
- **Static Assets**: The compiled output (HTML, JS bundles, CSS, fonts, images) that the runtime stage serves.
- **Runtime Stage**: The minimal final image containing only a web server and the compiled static assets.
- **Production Image**: The tagged, distributable image produced by the build; used directly in deployment.
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: The container serves a 200 response on port 8080 within 30 seconds of starting.
- **SC-002**: The production image is substantially smaller than a single-stage image that retains the Node.js toolchain. A manual size comparison after the initial build confirms the multi-stage approach delivers a meaningful reduction (expected: >60% reduction).
- **SC-003**: A source-only rebuild completes in under 30 seconds (dependency layer served from cache).
- **SC-004**: All 11 functional requirements pass automated verification on every build.
- **SC-005**: The running container process has UID ≠ 0, confirmed by automated check.
- **SC-006**: No existing integration tests regress after the Dockerfile and supporting files are introduced.
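
As a worked check of SC-002, the sizes reported in the commit message (82 MB multi-stage, 329 MB or more single-stage) can be compared against the 60% target. The function name is hypothetical; the arithmetic is the only point of the sketch.

```python
def size_reduction_pct(single_stage_mb: float, multi_stage_mb: float) -> float:
    """Percentage by which the multi-stage image is smaller."""
    return (single_stage_mb - multi_stage_mb) / single_stage_mb * 100


reduction = size_reduction_pct(329, 82)
print(f"{reduction:.1f}% reduction")  # 75.1% reduction
print(reduction > 60)                 # True
```

Because 329 MB is a lower bound for the single-stage image, the actual reduction is at least this figure.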
## Assumptions
- The Angular application is built for production using the standard build toolchain (`ng build --configuration production` or equivalent), producing a `dist/` output directory.
- The production web server is responsible for SPA fallback routing (returning the app shell for unmatched paths).
- Gzip or Brotli compression at the web server layer is desirable but not mandatory for the initial implementation.
- The UI container does not need to proxy API requests — it communicates with the API directly from the browser (the Angular proxy config is only used in local development).
- The container listens on port 8080 (non-privileged, enabling non-root operation). External load balancers or ingress controllers map this to port 80. TLS termination occurs upstream.
- The build context is the `ui/` directory; files excluded from the build context (source maps in CI, `node_modules/` already present locally) are managed via `.dockerignore`.
- The same verification approach used for the API image (a shell script as the TDD artefact) applies here.