Feat: Add production-grade multi-stage container image for API

Two-stage build (uv builder + python:3.12-slim runtime) with non-root user (UID 1001), no dev deps, layer-cache-optimised dep install, and graceful SIGTERM shutdown.

Verified by api/tests/build/verify_production_image.sh covering build, health endpoint, non-root, stdout logging, secret-free layers, missing-env-var exit, and dep-layer cache hit. All 102 integration tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
96
specs/010-api-prod-dockerfile/spec.md
Normal file
@@ -0,0 +1,96 @@
# Feature Specification: Production-Grade API Container Image

**Feature Branch**: `010-api-prod-dockerfile`
**Created**: 2026-05-07
**Status**: Draft
**Input**: User description: "We need a production-grade Dockerfile for the API to start preparing for a production deployment."
## User Scenarios & Testing *(mandatory)*

### User Story 1 - API Runs Reliably in Production (Priority: P1)

An operator builds and runs the API container in a production environment. The container starts successfully, serves requests, and can be health-checked by an orchestrator (e.g., Kubernetes). When the orchestrator signals shutdown, the container drains in-flight requests before exiting cleanly, avoiding dropped connections.

**Why this priority**: Without a correctly functioning container, no production deployment is possible. This is the baseline that all other stories depend on.

**Independent Test**: Build the image from source, run the container with required env vars, call the health endpoint, send SIGTERM, and verify the process exits cleanly with code 0. No other stories are required.

**Acceptance Scenarios**:

1. **Given** a built container image and all required env vars, **When** the container starts, **Then** it begins serving requests within 30 seconds and the health endpoint returns a success response.
2. **Given** a running container, **When** a SIGTERM is received, **Then** the process finishes any in-flight requests and exits with code 0 within 30 seconds.
3. **Given** a built container image, **When** the container is started with a required env var absent, **Then** the process exits immediately with a non-zero code and logs a clear error message identifying the missing variable.
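
The fail-fast behaviour in scenario 3 could be sketched roughly as follows. This is an illustration only, not the API's actual startup code: the variable names in `REQUIRED_ENV_VARS` and both function names are hypothetical, since the spec does not list the real configuration keys.

```python
import os
import sys
from typing import Iterable, List, Mapping

# Hypothetical names: the spec does not enumerate the real required variables.
REQUIRED_ENV_VARS = ("DATABASE_URL", "JWT_SECRET")


def missing_env_vars(required: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or empty, in declaration order."""
    return [name for name in required if not environ.get(name)]


def check_environment(environ: Mapping[str, str] = os.environ) -> None:
    """Exit with code 1 and a clear message if any required variable is missing."""
    missing = missing_env_vars(REQUIRED_ENV_VARS, environ)
    if missing:
        # Write to stderr so the container runtime captures the reason for the exit.
        print(
            f"Missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
        )
        sys.exit(1)
```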
---

### User Story 2 - Minimal, Secure Container (Priority: P2)

A security-conscious operator audits the container image before promotion to production. They verify the API process does not run as root, the image contains no development tooling or test artefacts, and no credentials are baked into the image layers.

**Why this priority**: Running as root or including unnecessary tools increases the blast radius of any container breakout. This is a production-readiness requirement, not optional hardening.

**Independent Test**: Inspect the built image to confirm the runtime user is non-root, confirm no dev/test files are present in the image layers, and scan the image with a standard vulnerability scanner. Passes independently of any deployment environment.

**Acceptance Scenarios**:

1. **Given** a built container image, **When** the running process user is inspected, **Then** the API process runs as a non-root user with a numeric UID.
2. **Given** a built container image, **When** the image layers are inspected, **Then** no development dependencies, test files, or local configuration are present.
3. **Given** a built container image, **When** the image layers are scanned for hardcoded secrets, **Then** no credentials, API keys, or secret values are found embedded in any layer.
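
As a rough illustration of scenario 1, the check below interprets the `Config.User` string that `docker image inspect` reports for an image. The function name `is_non_root` is hypothetical, and the sketch assumes the value has already been extracted from the inspect output; it is not the logic used by the project's verification script.

```python
def is_non_root(user_field: str) -> bool:
    """Return True if an image's Config.User value is a non-zero numeric UID.

    `user_field` is the raw string, for example "1001" or "1001:1001".
    An empty value means the container would default to root.
    """
    # Config.User may be "uid" or "uid:gid"; only the user part matters here.
    uid = user_field.split(":", 1)[0].strip()
    # Names such as "appuser" are rejected on purpose: the scenario asks for
    # a numeric UID, which can be verified without reading /etc/passwd.
    return uid.isdecimal() and int(uid) != 0
```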
---

### User Story 3 - Fast, Reproducible Builds (Priority: P3)

A developer rebuilds the container image after a code change. The build completes quickly because unchanged layers (dependencies) are cached. Given identical source inputs, the resulting image is functionally equivalent across builds, enabling confident CI/CD promotion.

**Why this priority**: Slow or non-deterministic builds reduce developer confidence and slow deployment pipelines. Important for velocity, but the container already works (P1, P2) before this is optimised.

**Independent Test**: Build the image twice from the same source; confirm the second build reuses dependency layers from cache and completes significantly faster than the first.

**Acceptance Scenarios**:

1. **Given** an image built once, **When** only application source files change and the image is rebuilt, **Then** the dependency installation step is served from cache and the rebuild completes faster than a clean build.
2. **Given** two builds from the same source commit, **When** the images are run, **Then** both produce identical API behaviour.
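
One way to approximate the cache-reuse check in scenario 1 is to compare the layer digests of two builds. The sketch below assumes each list comes from the `RootFS.Layers` array of `docker image inspect`, ordered from base layer upwards; the function name is hypothetical. Matching digests are only a proxy for a cache hit, and the build output remains the authoritative source.

```python
from typing import Sequence


def shared_layer_prefix(first: Sequence[str], second: Sequence[str]) -> int:
    """Count how many leading layers two images have in common."""
    count = 0
    for a, b in zip(first, second):
        if a != b:
            # Layers are stacked, so everything above the first
            # difference belongs to a different build step.
            break
        count += 1
    return count


# Illustrative digests only: base and dependency layers match,
# the application source layer differs after a code change.
before = ["sha256:base", "sha256:deps", "sha256:app-v1"]
after = ["sha256:base", "sha256:deps", "sha256:app-v2"]
reused = shared_layer_prefix(before, after)
```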
---

### Edge Cases

- What happens when the database is unavailable at container startup?
- What happens when the container is sent SIGKILL instead of SIGTERM (hard kill by orchestrator)?
- What happens if the container runs out of memory mid-request?
- How does the image behave when run with a read-only filesystem (`--read-only`)?
## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: The container image MUST start the API service and begin accepting requests without manual intervention once the required env vars are supplied.
- **FR-002**: The container image MUST expose a health check that an orchestrator can poll to determine service readiness.
- **FR-003**: The container image MUST handle the SIGTERM signal by completing in-flight requests and then exiting cleanly within 30 seconds.
- **FR-004**: The container image MUST run the API process as a non-root, non-privileged user.
- **FR-005**: The container image MUST NOT contain development dependencies, test files, source control metadata, or local configuration files.
- **FR-006**: The container image MUST NOT contain any hardcoded credentials, secrets, or environment-specific values; all configuration MUST be supplied via environment variables at runtime.
- **FR-007**: The container image MUST log to standard output and standard error so logs are captured by the container runtime without additional configuration.
- **FR-008**: The container image MUST be buildable reproducibly from the same source inputs: a rebuild from the same commit MUST produce a functionally equivalent image.
- **FR-009**: Rebuilding the image after a source-only change (no dependency changes) MUST reuse the cached dependency installation layer.
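
The drain-then-exit behaviour required by FR-003 can be sketched with the standard library alone. The class and method names below are hypothetical, and the API's server process may already provide equivalent handling; the sketch only shows the shape of the technique: the SIGTERM handler sets a flag, and shutdown waits until the in-flight counter reaches zero or the 30-second budget runs out.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and waits for them to finish after SIGTERM."""

    def __init__(self) -> None:
        self.stopping = threading.Event()
        self._in_flight = 0
        self._cond = threading.Condition()

    def install(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only set a flag here; the serving loop checks it and stops
        # accepting new work.
        self.stopping.set()

    def request_started(self) -> None:
        with self._cond:
            self._in_flight += 1

    def request_finished(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float = 30.0) -> bool:
        """Block until no requests are in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

A caller would exit with code 0 when `drain()` returns `True` and could treat `False` as a timeout worth logging before exiting.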
## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: The container starts and serves its first successful health-check response within 30 seconds of launch with all required env vars present.
- **SC-002**: The container exits cleanly (code 0) within 30 seconds of receiving a SIGTERM, with no in-flight requests dropped.
- **SC-003**: The API process inside the container runs as a non-root user (inspectable via container runtime tooling).
- **SC-004**: A rebuild after a source-only change completes in under 60 seconds on a warm cache (dependency layer reused).
- **SC-005**: The image contains zero hardcoded secrets (verifiable by static layer inspection).
- **SC-006**: All API logs appear on stdout/stderr and are captured by the container runtime log driver without an additional sidecar or configuration.
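
SC-001 could be measured with a small polling helper like the one below. It is a sketch under stated assumptions: `probe` is any caller-supplied function that returns `True` once the health endpoint answers successfully (the endpoint path is not given in this spec), and the `clock` and `sleep` parameters exist only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or reset while the server is still starting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```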
## Assumptions

- The existing test Dockerfile (used by the integration test stack) is not suitable for production and will remain separate; this feature produces a distinct production image.
- All required runtime configuration (database URL, S3 credentials, JWT secret, etc.) will be injected as environment variables by the deployment platform; the image itself carries no environment-specific values.
- The deployment target supports OCI-compatible container images (Kubernetes, Docker, etc.).
- No persistent local storage is needed by the API container; all state lives in the database and object storage.
- The production image does not need to run database migrations; migrations are applied by a separate step in the deployment pipeline.
- A single-architecture image (linux/amd64) is sufficient for the initial production target.