agatha 12176471e1 Feat: Add production-grade multi-stage container image for API
Two-stage build (uv builder + python:3.12-slim runtime) with non-root
user (UID 1001), no dev deps, layer-cache-optimised dep install, and
graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh
covering build, health endpoint, non-root, stdout logging, secret-free
layers, missing-env-var exit, and dep-layer cache hit. All 102 integration
tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 19:59:29 +00:00


Implementation Plan: Production-Grade API Container Image

Branch: 010-api-prod-dockerfile | Date: 2026-05-07 | Spec: spec.md
Input: Feature specification from specs/010-api-prod-dockerfile/spec.md

Summary

Produce a production-ready api/Dockerfile.prod using a two-stage build: a uv builder stage that installs lockfile-pinned, production-only dependencies into a virtual environment, and a lean python:3.12-slim runtime stage that contains only the venv, application source, and curl for health checks. The runtime process runs as a non-root user (UID 1001), handles SIGTERM gracefully via uvicorn's built-in drain, and logs exclusively to stdout/stderr. Behavioral verification is automated via a shell script (api/tests/build/verify_production_image.sh) written before the Dockerfile (§5.1 TDD).


Technical Context

Language/Version: Python 3.12 (existing API), Docker multi-stage build
Build tool: uv (lockfile: api/uv.lock, already committed)
Base images: ghcr.io/astral-sh/uv:python3.12-bookworm-slim (builder), python:3.12-slim (runtime)
Testing: Shell verification script (verify_production_image.sh) + make verify-prod target
Target Platform: linux/amd64 container (Kubernetes or Docker host)
Performance Goals: Container starts and passes health check within 30s; rebuild from warm cache in under 60s
Constraints: No root process, no hardcoded secrets, no dev deps in final image, compatible with --read-only filesystem
Scale/Scope: Single-file addition (Dockerfile.prod) + shell test + two Makefile targets; zero changes to existing source code
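The warm-cache rebuild goal (under 60s) can be checked mechanically rather than by eye. Below is a minimal bash sketch. The helper name assert_within is hypothetical and not part of the plan. Timing uses bash's SECONDS counter, so the resolution is whole seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the plan): run a command and fail if it
# takes longer than BUDGET whole seconds, or if the command itself fails.
assert_within() {
  local budget=$1
  shift
  local start=$SECONDS
  "$@" || return $?                 # propagate the command's own failure
  local elapsed=$(( SECONDS - start ))
  if (( elapsed > budget )); then
    echo "FAIL: '$*' took ${elapsed}s (budget ${budget}s)" >&2
    return 1
  fi
  echo "OK: '$*' took ${elapsed}s (budget ${budget}s)"
}

# Usage (requires Docker): run make build-prod once to warm the layer cache,
# then time the second run:
#   assert_within 60 make build-prod
```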


Constitution Check

GATE: Must pass before Phase 0 research. Re-checked post-design below.

| Principle | Status | Notes |
| --- | --- | --- |
| §5.1 TDD non-negotiable | COMPLIANT | verify_production_image.sh is written before Dockerfile.prod; the script fails (red) because the build file is absent, then passes (green) once it is added |
| §5.2 Test pyramid | COMPLIANT | The shell verification script is the integration-level test for this build artefact; no unit tests apply (no Python business logic is added) |
| §5.4 CI must pass | COMPLIANT | The make verify-prod target runs in host CI; it needs Docker on the runner, which make test-integration already requires |
| §6 Tech Stack (Docker) | COMPLIANT | Docker and Docker Compose are mandated; this adds a production Dockerfile within that constraint |
| §7.1 One-command local start | COMPLIANT | api/Dockerfile (dev stack) is unchanged; docker compose up is unaffected |
| §7.2 Environment configuration | COMPLIANT | Dockerfile.prod hardcodes no application configuration or secrets; all of it is injected at runtime (the only ENV instructions set PATH and uv build options) |
| §7.3 Ruff/lint | COMPLIANT | No new Python files; the shell script is linted with shellcheck |
| §2.6 No speculative abstraction | COMPLIANT | A single Dockerfile; no plugin system or generics |
| §8 Scope boundaries | COMPLIANT | Purely infrastructure; no new API routes, data model, or UI changes |

Post-design re-check: All gates remain green. No violations.
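The §5.1 red step can also be asserted explicitly in CI, so that a script which accidentally passes before Dockerfile.prod exists is caught. Below is a minimal bash sketch. The helper name expect_failure is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the wrapped command FAILS (TDD "red").
expect_failure() {
  if "$@"; then
    echo "UNEXPECTED PASS: '$*' succeeded but was expected to fail" >&2
    return 1
  fi
  echo "RED as expected: '$*' failed"
}

# Usage (requires Docker), before api/Dockerfile.prod exists:
#   expect_failure make verify-prod
# After Dockerfile.prod is written, plain "make verify-prod" must pass (green).
```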


Project Structure

Documentation (this feature)

specs/010-api-prod-dockerfile/
├── plan.md              # This file
├── research.md          # Phase 0 decisions
├── contracts/
│   └── container.md     # Container interface contract (port, env vars, signals, user)
├── quickstart.md        # Build and verification scenarios
└── tasks.md             # Generated by /speckit-tasks

Source Code Changes

api/
├── Dockerfile           # Existing dev/test image — UNCHANGED
├── Dockerfile.prod      # NEW: production multi-stage image
├── .dockerignore        # Existing — verify test files are excluded from build context
└── tests/
    └── build/
        └── verify_production_image.sh   # NEW: TDD verification script (written first)

Makefile                 # Root Makefile — add build-prod and verify-prod targets

Dockerfile.prod — Annotated Reference

# syntax=docker/dockerfile:1

# ════════════════════════════════════════════════
# Build stage: install production deps via uv
# ════════════════════════════════════════════════
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder

WORKDIR /app

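# Pre-compile bytecode; copy (rather than hardlink) packages out of the cache
# mount, which sits on a separate filesystem; never download a managed Python,
# so the venv is built against the base image's interpreter, which the
# python:3.12-slim runtime also provides
ENV UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy \
    UV_PYTHON_DOWNLOADS=never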

# ── Layer cache split: deps only (changes rarely) ──
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# ── Layer cache split: source (changes often) ──
COPY app/ ./app/

# ════════════════════════════════════════════════
# Runtime stage: lean image with venv + source
# ════════════════════════════════════════════════
FROM python:3.12-slim

WORKDIR /app

# curl for HEALTHCHECK — only tool added beyond base Python
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Non-root system user (UID/GID 1001)
RUN groupadd --system --gid 1001 appgroup \
    && useradd --system --uid 1001 --gid 1001 --no-create-home appuser

# Copy venv from builder; copy source directly from build context
COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv
COPY --chown=appuser:appgroup app/ ./app/

USER appuser

# Activate the venv by prepending its bin to PATH
ENV PATH="/app/.venv/bin:$PATH"

EXPOSE 8000

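# Docker-only: Kubernetes ignores HEALTHCHECK; configure liveness/readiness
# probes against /api/v1/health in the pod spec instead
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/api/v1/health || exit 1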

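# uvicorn (PID 1 via exec-form CMD) handles SIGTERM; --timeout-graceful-shutdown
# allows up to 30s to drain in-flight requests. The orchestrator's stop timeout
# must exceed this: docker stop defaults to 10s and Kubernetes'
# terminationGracePeriodSeconds to 30s, after which SIGKILL is sent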
CMD ["uvicorn", "app.main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--timeout-graceful-shutdown", "30"]

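Note on COPY paths: The build context is api/ (as set by the Makefile target), so COPY app/ ./app/ refers to api/app/. The runtime stage copies source directly from the build context rather than from the builder stage. This is simpler and avoids an extra intermediate layer. Only /app/.venv crosses the stage boundary, so the builder's own COPY app/ ./app/ never reaches the final image and can be dropped without changing the result.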
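When the SIGTERM path misbehaves, docker wait reports a signal-derived exit code rather than a message. Below is a minimal bash sketch that turns that code into a readable verdict. The helper name describe_exit_code is hypothetical. Codes above 128 follow the shell convention of 128 plus the signal number.

Because uvicorn runs as PID 1, the kernel ignores any SIGTERM it has not installed a handler for. A failed drain therefore typically surfaces as 137 (SIGKILL after the stop timeout), not 143.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the plan): turn the exit code printed by
# "docker wait" into a human-readable shutdown verdict.
describe_exit_code() {
  local code=$1
  case "$code" in
    0)   echo "clean: process exited 0 after SIGTERM" ;;
    137) echo "killed: SIGKILL (128+9), usually the stop timeout expired before the drain finished" ;;
    143) echo "terminated: SIGTERM (128+15) ended the process without a clean exit" ;;
    *)
      if (( code > 128 )); then
        echo "killed by signal $(( code - 128 ))"
      else
        echo "application error: exit code $code"
      fi
      ;;
  esac
}

# Usage in Step 5 of the verification script:
#   [[ "$EXIT_CODE" -eq 0 ]] || { echo "FAIL: $(describe_exit_code "$EXIT_CODE")"; exit 1; }
```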


verify_production_image.sh — Structure

#!/usr/bin/env bash
# TDD verification script for api/Dockerfile.prod
# Fails (red) if Dockerfile.prod does not exist or any check fails.
set -euo pipefail

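IMAGE="reactbin-api-prod:verify-$$"
CONTAINER=""   # set in Step 2; pre-initialised so the EXIT trap is safe under set -u

cleanup() {
  if [[ -n "$CONTAINER" ]]; then docker rm -f "$CONTAINER" >/dev/null 2>&1 || true; fi
  docker rmi "$IMAGE" >/dev/null 2>&1 || true
}
trap cleanup EXIT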

# Step 1: Build — fails red if Dockerfile.prod is absent
docker build -f api/Dockerfile.prod api/ -t "$IMAGE"

# Step 2: Start container with minimal env vars
CONTAINER=$(docker run -d -p 18000:8000 \
  -e JWT_SECRET_KEY=verify-test-key \
  -e OWNER_USERNAME=testowner \
  -e OWNER_PASSWORD=testpassword \
  -e DATABASE_URL=postgresql+asyncpg://noop:noop@noop/noop \
  -e S3_ENDPOINT_URL=http://noop:9000 \
  -e S3_BUCKET_NAME=noop \
  -e S3_ACCESS_KEY_ID=noop \
  -e S3_SECRET_ACCESS_KEY=noop \
  -e S3_REGION=us-east-1 \
  "$IMAGE")

# Step 3: Poll health endpoint (app will fail to connect to DB, but /health is pre-DB)
for i in $(seq 1 30); do
  if curl -sf http://localhost:18000/api/v1/health > /dev/null; then break; fi
  sleep 1
  [[ $i -eq 30 ]] && { echo "FAIL: health check timed out"; exit 1; }
done

# Step 4: Assert non-root user
UID_IN_CONTAINER=$(docker exec "$CONTAINER" id -u)
[[ "$UID_IN_CONTAINER" -ne 0 ]] || { echo "FAIL: process running as root"; exit 1; }

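# Step 5: Graceful shutdown. -t 35 outlasts uvicorn's 30s drain window; with
# docker stop's default 10s, a slow drain would end in SIGKILL (exit 137)
docker stop -t 35 "$CONTAINER"     # sends SIGTERM, then SIGKILL after 35s
EXIT_CODE=$(docker wait "$CONTAINER")
[[ "$EXIT_CODE" -eq 0 ]] || { echo "FAIL: non-zero exit code $EXIT_CODE"; exit 1; }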

# Step 6: Dev deps absent
if docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null; then
  echo "FAIL: pytest importable in production image (dev deps present)"; exit 1
fi

echo "All production image checks passed."
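Step 6 probes a single dev dependency (pytest). A broader check could list every distribution installed in the venv and compare it against the project's dev dependency group. Below is a minimal bash sketch under these assumptions:

- The helper names normalise_names and assert_no_dev_packages are hypothetical.
- The forbidden names should come from the dev group in api/pyproject.toml; pytest and ruff below are only examples.
- Because uv sync does not install pip into the venv, the names are read with Python's importlib.metadata rather than pip list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read installed distribution names (one per line) on
# stdin and fail if any forbidden name appears. Both sides are normalised as
# in PEP 503: lowercase, with runs of "-", "_" and "." collapsed to "-".
normalise_names() {
  tr '[:upper:]' '[:lower:]' | sed -E 's/[-_.]+/-/g'
}

assert_no_dev_packages() {
  local installed forbidden found=0
  installed=$(normalise_names)             # consumes stdin
  for forbidden in "$@"; do
    forbidden=$(printf '%s\n' "$forbidden" | normalise_names)
    if grep -qxF -- "$forbidden" <<<"$installed"; then
      echo "FAIL: dev package '$forbidden' is installed" >&2
      found=1
    fi
  done
  return "$found"
}

# Usage against the image (requires Docker; package names are examples):
#   docker run --rm "$IMAGE" /app/.venv/bin/python -c \
#     'import importlib.metadata as m; print("\n".join(d.metadata["Name"] for d in m.distributions()))' \
#     | assert_no_dev_packages pytest ruff
```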

Note on health check feasibility: /api/v1/health is a simple JSON response that does not require a database connection (confirmed in api/app/main.py). The verification script can therefore pass even without a real PostgreSQL instance.
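The polling loop in Step 3 can be factored into a reusable retry helper that also skips the pointless sleep after the final attempt. Below is a minimal bash sketch. The helper name retry is hypothetical, and the 30 attempts with a 1s delay simply mirror the original loop's budget.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failures; returns 0 on the first success, 1 if all fail.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"            # no sleep after the final attempt
    fi
  done
  echo "FAIL: '$*' did not succeed after $attempts attempts" >&2
  return 1
}

# Usage in Step 3 (requires the running container):
#   retry 30 1 curl -sf -o /dev/null http://localhost:18000/api/v1/health
```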


Makefile Targets

Add to root Makefile:

.PHONY: build-prod verify-prod

build-prod:
	docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest

verify-prod:
	bash api/tests/build/verify_production_image.sh

.dockerignore Review

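The existing api/.dockerignore already excludes .venv/, __pycache__/, .env, etc. Three additions (tests/, alembic/, alembic.ini) slim the production build context. *.egg-info/ is listed for completeness but is already present in the existing file.

tests/
*.egg-info/
alembic/
alembic.ini

tests/, alembic/ and alembic.ini are not needed in the production image, because Dockerfile.prod copies only app/ explicitly. Excluding them from the build context reduces the data sent to the Docker daemon.

Caveat: api/.dockerignore applies to every build that uses api/ as its context, including the existing api/Dockerfile used by make test-integration. If that Dockerfile copies tests/ or alembic/, place these exclusions in api/Dockerfile.prod.dockerignore instead. BuildKit uses a Dockerfile-specific ignore file of that name in preference to the context's .dockerignore.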
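A small guard can confirm that the intended exclusions are actually listed before the build runs. Below is a minimal bash sketch with these limits:

- The helper name assert_ignored_entries is hypothetical.
- It compares literal entries only and does not reproduce Docker's wildcard matching.
- It treats lines starting with # in column 1 as comments, as Docker's format does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail unless each given entry is listed in the ignore
# file. Column-1 "#" lines are comments; surrounding whitespace and trailing
# "/" are stripped on both sides before a literal, whole-line comparison.
assert_ignored_entries() {
  local file=$1 entry missing=0
  shift
  for entry in "$@"; do
    entry=${entry%/}
    if ! grep -qxF -- "$entry" < <(sed -e '/^#/d' -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' -e 's#/*$##' "$file"); then
      echo "MISSING: '$entry' is not listed in $file" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (paths relative to the repository root):
#   assert_ignored_entries api/.dockerignore tests/ alembic/ alembic.ini
```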


Implementation Order

Tasks are generated by /speckit-tasks, but the logical dependency order is:

  1. Write verify_production_image.sh (TDD red — build fails because Dockerfile.prod absent)
  2. Add Makefile targets (build-prod, verify-prod) — references the script
  3. Write api/Dockerfile.prod (implement to make TDD pass)
  4. Update api/.dockerignore (exclude tests/, alembic/ from build context)
  5. Run make verify-prod (TDD green — all 6 checks pass)
  6. Run shellcheck on verify_production_image.sh

No existing tests are modified. make test-integration continues to use api/Dockerfile unchanged.