agatha 12176471e1 Feat: Add production-grade multi-stage container image for API
Two-stage build (uv builder + python:3.12-slim runtime) with non-root
user (UID 1001), no dev deps, layer-cache-optimised dep install, and
graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh
covering build, health endpoint, non-root, stdout logging, secret-free
layers, missing-env-var exit, and dep-layer cache hit. All 102 integration
tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 19:59:29 +00:00


Tasks: Production-Grade API Container Image

Input: Design documents from specs/010-api-prod-dockerfile/

Prerequisites: plan.md, spec.md, research.md, contracts/container.md, quickstart.md

Tests: TDD is non-negotiable (§5.1). The "test" for a Docker build artefact is api/tests/build/verify_production_image.sh, written before api/Dockerfile.prod exists. Running the script immediately fails (red) because the build step cannot find the file; writing Dockerfile.prod turns it green.

Organization: Phase 1 sets up Makefile targets and .dockerignore; Phase 2 creates the test directory; Phase 3 (US1) writes the verification script and the Dockerfile; Phase 4 (US2) extends the script with security checks; Phase 5 (US3) extends it with a cache-hit check; Phase 6 polishes.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel with other [P] tasks in the same phase
  • [Story]: Which user story this task belongs to
  • Exact file paths included in every task description

Phase 1: Setup

  • T001 Add build-prod and verify-prod targets (and their .PHONY entries) to the root Makefile at /workspace/Makefile: build-prod runs docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest; verify-prod runs bash api/tests/build/verify_production_image.sh

  • T002 Update api/.dockerignore at /workspace/api/.dockerignore: append three lines — tests/, alembic/, and alembic.ini — so these are excluded from the production build context (the Dockerfile.prod copies only app/ explicitly, but excluding them from the context keeps the transfer to the Docker daemon fast)
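The append in T002 can be made idempotent so that re-running the task does not duplicate entries. The following is a minimal sketch, not the repository's actual tooling; append_unique is a hypothetical helper name introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: append each pattern to an ignore file only if that exact line is missing.
# append_unique is a hypothetical helper, not something defined in the repository.
append_unique() {
  local file="$1" line
  shift
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one so the
  # first appended pattern does not get glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for line in "$@"; do
    # -x: whole-line match, -F: fixed string (no regex), -q: no output
    if ! grep -qxF -- "$line" "$file"; then
      printf '%s\n' "$line" >> "$file"
    fi
  done
}

# Usage for T002 (run from /workspace):
# append_unique api/.dockerignore 'tests/' 'alembic/' 'alembic.ini'
```

Matching with -x and -F keeps alembic/ and alembic.ini distinct and avoids treating the dot as a regex wildcard.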


Phase 2: Foundational

  • T003 Create directory api/tests/build/ at /workspace/api/tests/build/ with mkdir -p and add a .gitkeep so the directory is tracked

Checkpoint: Directory structure is ready; Makefile and .dockerignore are updated.


Phase 3: User Story 1 — API Runs Reliably in Production (Priority: P1) 🎯 MVP

Goal: The container builds, starts, serves the health endpoint, and exits cleanly on SIGTERM.

Independent Test: make verify-prod — passes when Dockerfile.prod exists and all US1 checks pass.

Test for User Story 1 (TDD red — write first, confirm failure before T005)

  • T004 [US1] Create api/tests/build/verify_production_image.sh as an executable bash script (chmod +x) with #!/usr/bin/env bash and set -euo pipefail; the script MUST:
    1. Set IMAGE="reactbin-api-prod:verify-$$" and PG_CONTAINER="" and APP_CONTAINER="";
    2. Define a cleanup() function that runs docker rm -f "$APP_CONTAINER" "$PG_CONTAINER" 2>/dev/null || true and docker rmi "$IMAGE" 2>/dev/null || true, and register it with trap cleanup EXIT;
    3. [US1 check 1 — build] Run docker build -f api/Dockerfile.prod api/ -t "$IMAGE" — this is the line that fails red because api/Dockerfile.prod does not yet exist; print [verify] Building $IMAGE... before and [verify] Build OK after;
    4. [US1 check 2 — start with real DB] Launch a throwaway postgres: PG_CONTAINER=$(docker run -d -e POSTGRES_DB=reactbin_verify -e POSTGRES_USER=verify -e POSTGRES_PASSWORD=verify postgres:16-alpine); poll docker exec "$PG_CONTAINER" pg_isready -U verify up to 30 × 1s, fail if timeout; capture PG_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$PG_CONTAINER");
    5. Start the production container: APP_CONTAINER=$(docker run -d -p 18000:8000 -e JWT_SECRET_KEY=verify-key -e OWNER_USERNAME=testowner -e OWNER_PASSWORD=testpassword -e DATABASE_URL="postgresql+asyncpg://verify:verify@${PG_IP}:5432/reactbin_verify" -e S3_ENDPOINT_URL=http://noop:9000 -e S3_BUCKET_NAME=noop -e S3_ACCESS_KEY_ID=noop -e S3_SECRET_ACCESS_KEY=noop -e S3_REGION=us-east-1 "$IMAGE"); note — S3 credentials are placeholders; the health endpoint does not require S3;
    6. [US1 check 3 — health endpoint] Poll curl -sf http://localhost:18000/api/v1/health up to 30 × 1s, fail with a message if timeout; print [verify] Health check passed on success;
    7. [US1 check 4 — SIGTERM → exit 0] Run docker stop "$APP_CONTAINER" (sends SIGTERM); capture EXIT_CODE=$(docker wait "$APP_CONTAINER"); assert "$EXIT_CODE" -eq 0, fail with FAIL: non-zero exit $EXIT_CODE otherwise; print [verify] Graceful shutdown OK (exit $EXIT_CODE);
    8. Print [verify] US1 checks passed.
    9. [C3: missing env var → non-zero exit] Run docker run --rm -e JWT_SECRET_KEY=verify-key "$IMAGE" 2>&1 inside an if (or with ||) so that set -e does not abort the script on the expected failure; assert the exit code is non-zero (OWNER_USERNAME is absent, so Pydantic settings validation must fail at startup); print [verify] Missing-env-var exit check OK.

    After writing the script, run make verify-prod and confirm it fails with a Docker build error (red state: Dockerfile.prod does not exist).
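The cleanup trap and the two "poll up to 30 × 1s" steps in T004 share a common shape, which can be sketched as below. This is an illustrative skeleton under stated assumptions rather than the final script: the poll helper and its argument order are hypothetical, and cleanup() here skips empty container IDs, a small deviation from the literal wording of step 2.

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE="reactbin-api-prod:verify-$$"
PG_CONTAINER=""
APP_CONTAINER=""

cleanup() {
  # Remove only containers that were actually started; ignore all errors.
  local c
  for c in "$APP_CONTAINER" "$PG_CONTAINER"; do
    if [ -n "$c" ]; then
      docker rm -f "$c" >/dev/null 2>&1 || true
    fi
  done
  docker rmi "$IMAGE" >/dev/null 2>&1 || true
}
trap cleanup EXIT

# poll ATTEMPTS DELAY CMD...  (hypothetical helper)
# Runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# after each failure. Returns 0 on success, 1 if every attempt failed.
poll() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    # Running CMD inside `if` keeps `set -e` from aborting on a failed attempt.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage matching steps 4 and 6:
# poll 30 1 docker exec "$PG_CONTAINER" pg_isready -U verify \
#   || { echo "FAIL: postgres not ready after 30s" >&2; exit 1; }
# poll 30 1 curl -sf http://localhost:18000/api/v1/health \
#   || { echo "FAIL: health endpoint not reachable after 30s" >&2; exit 1; }
```

Keeping the retry loop in one function means the timeout and the failure message are the only things that differ between the database and health checks.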

Implementation for User Story 1

  • T005 [US1] Create api/Dockerfile.prod at /workspace/api/Dockerfile.prod as a two-stage build:
    1. Stage 1 (builder):
       FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
       WORKDIR /app
       ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_PYTHON_DOWNLOADS=never
       COPY pyproject.toml uv.lock ./
       RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen --no-dev --no-install-project
       COPY app/ ./app/
    2. Stage 2 (runtime):
       FROM python:3.12-slim
       WORKDIR /app
       RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
       RUN groupadd --system --gid 1001 appgroup && useradd --system --uid 1001 --gid 1001 --no-create-home appuser
       COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv
       COPY --chown=appuser:appgroup app/ ./app/
       USER appuser
       ENV PATH="/app/.venv/bin:$PATH"
       EXPOSE 8000
       HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 CMD curl -f http://localhost:8000/api/v1/health || exit 1
       CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--timeout-graceful-shutdown", "30"]

  • T006 [US1] Verify TDD green for US1: run make verify-prod and confirm all four US1 checks pass (build OK, container starts against the throwaway database, health endpoint returns 200, SIGTERM produces exit code 0) and that [verify] US1 checks passed. is printed.

Checkpoint: US1 is complete. Production container builds, starts, serves traffic, and shuts down gracefully.
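The graceful-shutdown check from T004 step 7 can be sketched as a small function. The name check_graceful_shutdown and the STOP_TIMEOUT variable are assumptions made for this sketch; the task text itself calls docker stop without a timeout flag.

```shell
#!/usr/bin/env bash
# Sketch of the SIGTERM check. check_graceful_shutdown is a hypothetical name.
# STOP_TIMEOUT is an assumed knob: docker stop sends SIGTERM, waits this many
# seconds, then falls back to SIGKILL.
STOP_TIMEOUT="${STOP_TIMEOUT:-35}"

check_graceful_shutdown() {
  local container="$1" exit_code
  # docker stop prints the container name on success; discard it.
  docker stop -t "$STOP_TIMEOUT" "$container" >/dev/null
  # docker wait prints the exit code of the (now stopped) container.
  exit_code="$(docker wait "$container")"
  if [ "$exit_code" -ne 0 ]; then
    echo "FAIL: non-zero exit $exit_code" >&2
    return 1
  fi
  echo "[verify] Graceful shutdown OK (exit $exit_code)"
}

# Usage:
# check_graceful_shutdown "$APP_CONTAINER" || exit 1
```

Without -t, docker stop waits 10 seconds before sending SIGKILL, which is shorter than the 30-second --timeout-graceful-shutdown given to uvicorn. An idle container should exit well inside that window, but a longer stop timeout avoids a spurious exit code 137 if shutdown is ever slow.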


Phase 4: User Story 2 — Minimal, Secure Container (Priority: P2)

Goal: The production image runs as non-root and contains no dev dependencies or embedded secrets.

Independent Test: US2 checks in make verify-prod: the same script extended with non-root, dev-deps-absent, stdout-logging, and secret-free-layer assertions.

Tests for User Story 2 (TDD extension — add checks, confirm they pass against existing Dockerfile.prod)

  • T007 [US2] Extend api/tests/build/verify_production_image.sh with four checks. The two that need a live container (non-root, stdout logs) go before the docker stop call; the two that need only the image (dev deps, image history) go after the SIGTERM check and before the final success line. Because the script runs under set -euo pipefail, wrap any command that is expected to fail, or any grep that is expected to find nothing, in an if so the script does not abort early:
    1. [US2 check 1: non-root] While APP_CONTAINER is running, run UID_IN_CONTAINER=$(docker exec "$APP_CONTAINER" id -u); assert "$UID_IN_CONTAINER" -ne 0, fail with FAIL: process running as root (UID 0) if violated; print [verify] Non-root user OK (UID $UID_IN_CONTAINER);
    2. [C1: stdout log capture] While APP_CONTAINER is still running, run docker logs "$APP_CONTAINER" 2>&1; assert the output is non-empty and contains Started server or Application startup complete (uvicorn startup lines); fail with FAIL: no startup logs found on stdout/stderr if absent; print [verify] Stdout logging OK;
    3. [US2 check 2: dev deps absent] After APP_CONTAINER has stopped (the image is still present), run docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null; assert the command returns non-zero (i.e., pytest is NOT importable); if it returns 0, fail with FAIL: pytest importable in production image (dev deps present); print [verify] Dev deps absent OK;
    4. [C2: no hardcoded secrets in layers] Run docker history --no-trunc "$IMAGE" 2>&1 and pipe it through grep -iE "(password|secret_key|api_key|token)"; assert zero matching lines; if any match, fail with FAIL: potential secret found in image history; print [verify] No secrets in image layers OK;
    5. Update the final success line to [verify] All checks passed (US1 + US2). and confirm make verify-prod passes.
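Three of the T007 checks assert that something is absent or fails, which interacts badly with set -e and pipefail if written naively. The helpers below are a sketch of one way to express them; expect_failure, assert_non_root, and assert_no_secrets are hypothetical names, and the two assert functions take their input as an argument or on stdin so they can be exercised without Docker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# expect_failure CMD...: succeed only if CMD exits non-zero.
# Running CMD inside `if` keeps `set -e` from aborting on the expected failure.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# assert_non_root UID: fail when the reported UID is 0.
assert_non_root() {
  local uid="$1"
  if [ "$uid" -eq 0 ]; then
    echo "FAIL: process running as root (UID 0)" >&2
    return 1
  fi
  echo "[verify] Non-root user OK (UID $uid)"
}

# assert_no_secrets: read image history on stdin; fail if any line mentions
# a suspicious key name. grep's "no match" exit status (1) is the success case.
assert_no_secrets() {
  if grep -iE '(password|secret_key|api_key|token)' >/dev/null; then
    echo "FAIL: potential secret found in image history" >&2
    return 1
  fi
  echo "[verify] No secrets in image layers OK"
}

# Usage:
# assert_non_root "$(docker exec "$APP_CONTAINER" id -u)"
# expect_failure docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" \
#   || { echo "FAIL: pytest importable in production image (dev deps present)" >&2; exit 1; }
# docker history --no-trunc "$IMAGE" 2>&1 | assert_no_secrets
```

One caveat of expect_failure is that it cannot tell why the command failed: a missing image or a Docker daemon error also counts as a failure, so it is only meaningful after the build check has passed.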

Checkpoint: US2 is verified. Image runs as UID 1001 and contains no test tooling.


Phase 5: User Story 3 — Fast, Reproducible Builds (Priority: P3)

Goal: Rebuilding after a source-only change reuses the dependency layer from cache.

Independent Test: US3 check in make verify-prod: a second build after touching a source file asserts, from the build output, that the dep layer was cached.

Tests for User Story 3 (TDD extension)

  • T008 [US3] Extend api/tests/build/verify_production_image.sh with a US3 cache check appended after all other checks (before the final success line):
    1. Set IMAGE2="reactbin-api-prod:verify-cache-$$" near the top of the script, alongside IMAGE, so that cleanup() never sees it unset under set -u; add docker rmi "$IMAGE2" 2>/dev/null || true to the cleanup() function;
    2. [US3 check: dep layer cached on source-only rebuild] Run touch api/app/main.py, then capture the output of docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1 (the --progress=plain flag ensures consistent CACHED output regardless of Docker version or TTY settings);
    3. Assert the output contains the string CACHED; if CACHED is absent, fail with FAIL: dependency layer not reused on source-only rebuild; otherwise print [verify] Dep layer cache hit confirmed (US3 OK);
    4. Update the final success line to [verify] All checks passed (US1 + US2 + US3).

  • T009 [US3] Verify TDD green for US3: run make verify-prod and confirm the full script passes including the cache check — the build output for the second image must contain CACHED, and [verify] All checks passed (US1 + US2 + US3). must print.
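The US3 assertion reduces to a text search over the second build's output. A minimal sketch follows; assert_cache_hit is a hypothetical name, and it reads the build log on stdin so the logic can be tested with canned text.

```shell
#!/usr/bin/env bash
# assert_cache_hit: read `docker build --progress=plain` output on stdin and
# succeed only if at least one build step is reported as CACHED.
assert_cache_hit() {
  if grep -F 'CACHED' >/dev/null; then
    echo "[verify] Dep layer cache hit confirmed (US3 OK)"
  else
    echo "FAIL: dependency layer not reused on source-only rebuild" >&2
    return 1
  fi
}

# Usage:
# touch api/app/main.py
# docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1 \
#   | assert_cache_hit
```

This is a coarse check: CACHED on any step satisfies it, not specifically the uv sync step. If the builder ignores modification times when checksumming COPY inputs (Docker's build cache documentation describes this behaviour), a bare touch leaves the source layer cached as well, so the check still passes but proves less than a rebuild after a real content change would.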

Checkpoint: All three user stories are verified end-to-end by make verify-prod.


Phase 6: Polish & Cross-Cutting Concerns

  • T010 Run make test-integration from /workspace and confirm all 102 existing tests still pass — verifies that the .dockerignore additions (T002) do not break the existing test Dockerfile build or any integration test (§5.4 regression gate)

  • T011 Run shellcheck api/tests/build/verify_production_image.sh and fix any violations (common: unquoted variables, [ ] vs [[ ]], missing -- before arguments)


Dependencies & Execution Order

Phase Dependencies

  • Phase 1 (Setup): No external dependencies — start immediately
  • Phase 2 (Foundational): No dependencies — start immediately (parallel with Phase 1)
  • Phase 3 (US1): Depends on Phase 1 (Makefile + .dockerignore must exist before make verify-prod can run) and Phase 2 (test directory must exist)
  • Phase 4 (US2): Depends on Phase 3 (US1 script and Dockerfile must exist to extend)
  • Phase 5 (US3): Depends on Phase 4 (full US2 script must exist to extend)
  • Phase 6 (Polish): Depends on all prior phases; T010 (regression test) must precede T011 (shellcheck)

Within Phase 3

  • T004 before T005 (write the test script and confirm the red state before writing the Dockerfile)
  • T006 after T005 (verify green after implementation)

Execution Order Summary

Step 1: T001 ∥ T002 ∥ T003  (setup — parallel, different files)
Step 2: T004                 (write verification script — TDD red)
Step 3: T005                 (write Dockerfile.prod — implementation)
Step 4: T006                 (verify US1 green)
Step 5: T007                 (extend script with US2 checks, verify pass)
Step 6: T008                 (extend script with US3 check)
Step 7: T009                 (verify US3 green)
Step 8: T010                 (make test-integration — regression gate)
Step 9: T011                 (shellcheck polish)

Implementation Strategy

MVP (US1 — reliable production run)

  1. Complete T001–T003 (setup)
  2. Complete T004–T006 (core blocking: write script → write Dockerfile → verify green)
  3. Validate: make verify-prod passes; make test-integration still passes (no regressions)
  4. US2 and US3 add explicit verification coverage for properties already implemented

Incremental Delivery

  • After Phase 3: Production image builds, starts, and shuts down gracefully — safe to deploy
  • After Phase 4: Security properties (non-root, no dev deps) are explicitly verified
  • After Phase 5: Build efficiency (layer caching) is confirmed by automated check
  • After Phase 6: Script is lint-clean, ready for CI integration