Two-stage build (uv builder + python:3.12-slim runtime) with non-root user (UID 1001), no dev deps, layer-cache-optimised dep install, and graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh covering build, health endpoint, non-root, stdout logging, secret-free layers, missing-env-var exit, and dep-layer cache hit. All 102 integration tests still pass; shellcheck clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Research: Production API Container Image
Decision 1 — Use a Separate Dockerfile.prod
Decision: Add api/Dockerfile.prod alongside the existing api/Dockerfile.
Rationale: The existing api/Dockerfile installs dev dependencies (.[dev]), mounts source with --reload, and is used by the Docker Compose integration test stack. Modifying it would break make test-integration. A separate file keeps the two images independent with zero coupling.
Alternatives considered:
- Build-arg flag in a single Dockerfile: adds conditional complexity and makes both the dev and production variants harder to read.
- Rename existing to Dockerfile.dev and make Dockerfile the production image: would require updating docker-compose.test.yml with an explicit file reference, a wider change than needed for this feature.
Decision 2 — Multi-Stage Build: uv Builder + python:3.12-slim Runtime
Decision: Two-stage build. Stage 1 (builder) uses ghcr.io/astral-sh/uv:python3.12-bookworm-slim to install production dependencies into a virtual environment. Stage 2 (runtime) uses python:3.12-slim and copies only the .venv and application source from the builder. uv is not present in the final image.
Rationale:
- uv's official Docker image ships uv preinstalled and produces a pinned, bytecode-compiled venv directly from uv.lock.
- Keeping uv out of the runtime image reduces attack surface and image size.
- python:3.12-slim is a well-maintained, widely scanned base; using it for the runtime stage aligns with existing project images.
Layer caching strategy:
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project ← cache hits when only source changes
COPY app/ ./app/ ← only reaches here on source changes
--no-install-project installs all listed dependencies without the project package itself. The project source is then copied separately. This means a source-only change reuses the dependency layer from cache.
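One way to check the cache behaviour described above is to inspect the build log after a source-only change. The sketch below assumes BuildKit's plain progress output (docker build --progress=plain), in which a reused step is reported as a "#N CACHED" line following the step header; the function name dep_layer_cached is hypothetical and not taken from the project's script.

```shell
# Reads a BuildKit plain-progress build log on stdin.
# Succeeds only if the "RUN uv sync" step was reported as CACHED.
dep_layer_cached() {
  awk '
    # Remember the step id (e.g. "#9") of the dependency-install step.
    /^#[0-9]+ \[.*\] RUN uv sync/ { id = $1 }
    # A later line "<id> CACHED" means the layer was reused.
    id != "" && $1 == id && $2 == "CACHED" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example use after touching only files under app/:
#   docker build --progress=plain -f api/Dockerfile.prod api/ 2>&1 | dep_layer_cached
```

If the log format differs in a given Docker version, the pattern would need adjusting; the idea is only that the dependency step, and nothing after the source COPY, should be served from cache.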
Environment variables for optimal builds:
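- UV_COMPILE_BYTECODE=1: pre-compile .pyc files; slightly larger venv but faster cold starts.
- UV_LINK_MODE=copy: copy packages into the venv instead of hard-linking them from uv's cache, which avoids link failures when the cache and the venv sit on different filesystems (for example, a build cache mount).
- UV_PYTHON_DOWNLOADS=never: ensures the builder stage uses the bundled Python, not a downloaded one.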
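Putting the two stages, the layer order, and the environment variables together, a minimal sketch of the Dockerfile might look like the following. It is wrapped in a shell heredoc so it can be written to a scratch path and built without touching api/Dockerfile.prod. The function name write_dockerfile_sketch and the /app working directory are assumptions; user creation, HEALTHCHECK, and CMD are left out because they belong to the later decisions.

```shell
# Writes a two-stage Dockerfile sketch to the path given as $1.
write_dockerfile_sketch() {
  cat > "$1" <<'EOF'
# Stage 1: resolve and install production dependencies with uv.
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_PYTHON_DOWNLOADS=never
WORKDIR /app
# Dependency manifests first, so this layer is reused on source-only changes.
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY app/ ./app/

# Stage 2: runtime image without uv.
FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/app /app/app
# Put the venv's executables ahead of the system Python.
ENV PATH="/app/.venv/bin:$PATH"
EOF
}

# Example use:
#   write_dockerfile_sketch /tmp/Dockerfile.sketch
#   docker build -f /tmp/Dockerfile.sketch api/
```

Because uv sync places the virtual environment in .venv under the project directory, copying /app/.venv to the same path in the runtime stage keeps the interpreter paths recorded inside the venv valid.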
Alternatives considered:
- Installing deps into the system Python (--system): rejected because it pollutes the base image and makes it harder to copy deps cleanly into the runtime stage.
- Using a single FROM python:3.12-slim with pip: slower builds, no lockfile pinning, no bytecode compilation step.
Decision 3 — Non-Root User (UID 1001, System User)
Decision: Create a system user appuser (UID 1001) in group appgroup (GID 1001) in the runtime stage. All application files are assigned to that user at COPY time using --chown=appuser:appgroup.
Rationale: Running as root inside a container widens the impact of a container breakout. A numeric UID is required by some Kubernetes pod security admission policies: when runAsNonRoot is set, the kubelet cannot verify that a named user is non-root and refuses to start the container. UID 1001 avoids collision with UID 1000 (the typical first interactive user on a Linux host) while remaining a predictable, inspectable value.
Alternatives considered:
- UID 1000: small risk of collision with host user when bind mounts are involved.
- USER nobody:nobody (UID 65534): works, but its name and UID are not consistent across distros.
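The non-root requirement can be checked from the host by asking the image which UID it runs as. The sketch below assumes the image contains the id utility (it is present in Debian-based images such as python:3.12-slim); the function name image_runs_as_non_root and the example image tag are hypothetical.

```shell
# Succeeds if the image's default user has a numeric, non-zero UID.
image_runs_as_non_root() {
  local image=$1 uid
  # Override the entrypoint so only `id -u` runs, under the image's USER.
  uid=$(docker run --rm --entrypoint id "$image" -u) || return 1
  # Reject empty or non-numeric output before comparing.
  case "$uid" in
    '' | *[!0-9]*) return 1 ;;
  esac
  [ "$uid" -ne 0 ]
}

# Example use (image tag is a placeholder):
#   image_runs_as_non_root api-prod:test && echo "non-root OK"
```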
Decision 4 — SIGTERM Graceful Shutdown via uvicorn --timeout-graceful-shutdown
Decision: Use uvicorn's built-in --timeout-graceful-shutdown 30 flag. No process supervisor (tini, s6) is required.
Rationale: uvicorn handles SIGTERM natively when run as PID 1 in single-worker mode (the production Dockerfile runs one worker); this relies on the exec form of CMD so that no shell sits between Docker and uvicorn. On SIGTERM it stops accepting new connections, waits up to --timeout-graceful-shutdown seconds for in-flight requests to complete, then exits with code 0. No additional init system is needed.
Alternatives considered:
- tini: adds a small init shim that reaps zombies and forwards signals. Not necessary with a single uvicorn worker (no child processes to reap).
- Gunicorn + uvicorn workers: more complex; appropriate for multi-worker setups but the deployment platform (Kubernetes) scales horizontally via pod replicas rather than in-process workers.
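The shutdown behaviour can be asserted from the host roughly as follows. docker stop sends SIGTERM and only falls back to SIGKILL after the timeout, so an exit code of 0 indicates that uvicorn shut down on its own, while 137 would indicate a forced kill. The function name assert_graceful_stop is hypothetical, and the container must have been started without --rm so that it can still be inspected after it exits.

```shell
# Succeeds if the container exits with code 0 after SIGTERM.
assert_graceful_stop() {
  local cid=$1 code
  # Sends SIGTERM, then SIGKILL if the container is still running after 30s.
  docker stop -t 30 "$cid" >/dev/null || return 1
  code=$(docker inspect -f '{{.State.ExitCode}}' "$cid") || return 1
  [ "$code" -eq 0 ]
}

# Example use:
#   cid=$(docker run -d api-prod:test)   # image tag is a placeholder
#   assert_graceful_stop "$cid"; docker rm "$cid" >/dev/null
```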
Decision 5 — curl for HEALTHCHECK
Decision: Install curl (via apt-get --no-install-recommends) in the runtime stage and use it in the HEALTHCHECK directive.
Rationale: The existing dev Dockerfile already installs curl for the same reason. curl -f exits non-zero on HTTP errors, making it a reliable single-command health probe. A Python one-liner adds interpreter startup overhead (~100ms) per check; curl is ~5ms.
Alternatives considered:
- wget -q --spider: available on Alpine but not on Debian-slim by default; requires separate install.
- Python urllib.request: no extra install, but slower and adds noise to the process table during health checks.
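The same curl -f probe can also serve the verification script when it waits for the container to become ready. The following is a minimal sketch, not the project's actual script: the function name wait_for_health is hypothetical, and the port in the usage line is an assumption.

```shell
# Polls a URL once per second until curl -f succeeds or the timeout expires.
wait_for_health() {
  local url=$1 timeout=${2:-30} elapsed=0
  # -f: exit non-zero on HTTP 4xx/5xx; -sS: quiet, but still print errors.
  until curl -fsS -o /dev/null "$url"; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
}

# Example use (port 8000 is an assumption):
#   wait_for_health http://localhost:8000/api/v1/health 30
```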
Decision 6 — TDD Verification via Shell Script
Decision: Write api/tests/build/verify_production_image.sh before Dockerfile.prod. The script builds the image and runs behavioral checks (health endpoint, non-root user, clean SIGTERM exit). It is the "failing test" per §5.1.
Rationale: The production image is a build artifact, not Python business logic. pytest cannot test a Docker image without Docker-in-Docker, which the current CI stack does not support. A shell script run on the host (via make verify-prod) is the appropriate TDD vehicle for this artifact type.
Verification steps the script covers:
- docker build -f api/Dockerfile.prod api/ → fails (red) until Dockerfile.prod exists.
- Run container with required env vars; wait for health endpoint → GET /api/v1/health returns 200.
- Inspect running process user → UID ≠ 0 (non-root).
- Send SIGTERM to container; assert exit code 0 within 30s (graceful shutdown).
- Assert dev packages are absent: pip show pytest inside container must return non-zero.
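The steps above lend themselves to a small pass/fail harness in which each check is a command whose exit status decides the result. The sketch below shows one possible shape; run_check and the failures counter are hypothetical names, and the real script may be organised differently.

```shell
# Number of failed checks so far.
failures=0

# run_check NAME CMD [ARGS...]: runs CMD quietly and reports PASS or FAIL.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS %s\n' "$name"
  else
    printf 'FAIL %s\n' "$name"
    failures=$((failures + 1))
  fi
}

# Example use:
#   run_check "image builds" docker build -f api/Dockerfile.prod api/
#   [ "$failures" -eq 0 ] || exit 1
```

Before Dockerfile.prod exists, the build check fails and the script exits non-zero, which is the red state the TDD flow expects.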
Alternatives considered:
- pytest with docker SDK: requires docker Python package and DinD in CI; rejected as over-engineered for a single-file build artifact.
- Manual verification only: rejected because §5.1 mandates automated failing tests before production code.