Feat: Add production-grade multi-stage container image for API

Two-stage build (uv builder + python:3.12-slim runtime) with non-root
user (UID 1001), no dev deps, layer-cache-optimised dep install, and
graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh
covering build, health endpoint, non-root, stdout logging, secret-free
layers, missing-env-var exit, and dep-layer cache hit. All 102 integration
tests still pass; shellcheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 19:59:29 +00:00
parent 7a835d3172
commit 12176471e1
15 changed files with 1067 additions and 3 deletions

@@ -0,0 +1,242 @@
# Implementation Plan: Production-Grade API Container Image
**Branch**: `010-api-prod-dockerfile` | **Date**: 2026-05-07 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `specs/010-api-prod-dockerfile/spec.md`
## Summary
Produce a production-ready `api/Dockerfile.prod` using a two-stage build: a uv builder stage that installs lockfile-pinned, production-only dependencies into a virtual environment, and a lean `python:3.12-slim` runtime stage that contains only the venv, application source, and `curl` for health checks. The runtime process runs as a non-root user (UID 1001), handles SIGTERM gracefully via uvicorn's built-in drain, and logs exclusively to stdout/stderr. Behavioral verification is automated via a shell script (`api/tests/build/verify_production_image.sh`) written before the Dockerfile (§5.1 TDD).
---
## Technical Context
**Language/Version**: Python 3.12 (existing API), Docker multi-stage build
**Build tool**: uv (lockfile: `api/uv.lock`, already committed)
**Base images**: `ghcr.io/astral-sh/uv:python3.12-bookworm-slim` (builder), `python:3.12-slim` (runtime)
**Testing**: Shell verification script (`verify_production_image.sh`) + `make verify-prod` target
**Target Platform**: linux/amd64 container (Kubernetes or Docker host)
**Performance Goals**: Container starts and passes health check within 30s; rebuild from warm cache in under 60s
**Constraints**: No root process, no hardcoded secrets, no dev deps in final image, compatible with `--read-only` filesystem
**Scale/Scope**: Single-file addition (`Dockerfile.prod`) + shell test + two Makefile targets; zero changes to existing source code
---
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-checked post-design below.*
| Principle | Status | Notes |
|-----------|--------|-------|
| §5.1 TDD non-negotiable | **COMPLIANT** | `verify_production_image.sh` written before `Dockerfile.prod`; script fails (red) because the build file is absent, then passes (green) after |
| §5.2 Test pyramid | **COMPLIANT** | Shell verification script is the integration-level test for this build artefact; no unit tests applicable (no Python business logic added) |
| §5.4 CI must pass | **COMPLIANT** | `make verify-prod` target is runnable in host CI (requires Docker on the runner, which the existing `make test-integration` already requires) |
| §6 Tech Stack — Docker | **COMPLIANT** | Docker + Docker Compose are mandated; this adds a production Docker file within that constraint |
| §7.1 One-command local start | **COMPLIANT** | `api/Dockerfile` (dev stack) is unchanged; `docker compose up` is unaffected |
| §7.2 Environment configuration | **COMPLIANT** | `Dockerfile.prod` contains zero hardcoded env values; all config is injected at runtime |
| §7.3 Ruff/lint | **COMPLIANT** | No new Python files; shell script linted with `shellcheck` |
| §2.6 No speculative abstraction | **COMPLIANT** | Single Dockerfile, no plugin system or generics |
| §8 Scope boundaries | **COMPLIANT** | Purely infrastructure; no new API routes, data model, or UI changes |
**Post-design re-check**: All gates remain green. No violations.
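
The §7.2 claim (no hardcoded env values) can be spot-checked on a built image. The following is a minimal sketch, not the check the verification script actually performs: the function name `find_secret_like_lines` and the grep pattern are illustrative assumptions, and the variable names are the secret-bearing ones the verification script injects at runtime.

```sh
# find_secret_like_lines: read text on stdin and print every line that looks
# like a KEY=value assignment for one of the secret-bearing variables.
# Hypothetical helper; the pattern is an assumption, not project code.
find_secret_like_lines() {
  # `|| true` keeps a no-match result from aborting callers that use `set -e`
  grep -E '(JWT_SECRET_KEY|OWNER_PASSWORD|S3_SECRET_ACCESS_KEY)=[^[:space:]]+' || true
}

# Possible use against a built image (requires Docker):
#   docker history --no-trunc --format '{{.CreatedBy}}' reactbin-api-prod:latest \
#     | find_secret_like_lines
# Any output means a value was written into a layer's build instruction.
```

A pattern match like this only catches values assigned by name; it would not detect a secret copied in as a file.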
---
## Project Structure
### Documentation (this feature)
```text
specs/010-api-prod-dockerfile/
├── plan.md # This file
├── research.md # Phase 0 decisions
├── contracts/
│ └── container.md # Container interface contract (port, env vars, signals, user)
├── quickstart.md # Build and verification scenarios
└── tasks.md # Generated by /speckit-tasks
```
### Source Code Changes
```text
api/
├── Dockerfile # Existing dev/test image — UNCHANGED
├── Dockerfile.prod # NEW: production multi-stage image
├── .dockerignore # Existing — verify test files are excluded from build context
└── tests/
    └── build/
        └── verify_production_image.sh # NEW: TDD verification script (written first)
Makefile # Root Makefile — add build-prod and verify-prod targets
```
---
## Dockerfile.prod — Annotated Reference
```dockerfile
# syntax=docker/dockerfile:1
# ════════════════════════════════════════════════
# Build stage: install production deps via uv
# ════════════════════════════════════════════════
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
WORKDIR /app
# Pre-compile bytecode; use copy mode for cross-layer compatibility
ENV UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy \
    UV_PYTHON_DOWNLOADS=never
# ── Layer cache split: deps only (changes rarely) ──
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project
# ── Layer cache split: source (changes often) ──
COPY app/ ./app/
# ════════════════════════════════════════════════
# Runtime stage: lean image with venv + source
# ════════════════════════════════════════════════
FROM python:3.12-slim
WORKDIR /app
# curl for HEALTHCHECK — only tool added beyond base Python
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Non-root system user (UID/GID 1001)
RUN groupadd --system --gid 1001 appgroup \
    && useradd --system --uid 1001 --gid 1001 --no-create-home appuser
# Copy venv from builder; copy source directly from build context
COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv
COPY --chown=appuser:appgroup app/ ./app/
USER appuser
# Activate the venv by prepending its bin to PATH
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/api/v1/health || exit 1
# uvicorn handles SIGTERM; --timeout-graceful-shutdown gives 30s to drain requests
CMD ["uvicorn", "app.main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--timeout-graceful-shutdown", "30"]
```
> **Note on COPY paths**: Build context is `api/` (as set by the Makefile target), so `COPY app/ ./app/` in both stages refers to `api/app/`. The runtime stage copies source directly from the build context, not from the builder stage; this is simpler and avoids an extra intermediate layer. Because the builder runs `uv sync` with `--no-install-project`, the only thing the runtime stage takes from it is `/app/.venv`.
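
The `CMD` above uses exec form, so uvicorn receives the SIGTERM sent by `docker stop` directly. When the graceful-shutdown check fails, the exit status printed by `docker wait` usually says why. The helper below is a sketch for reading that status; `describe_exit_code` is a hypothetical name and is not part of the verification script.

```sh
# describe_exit_code CODE
# Maps a container exit status (as printed by `docker wait`) to a short label.
# Hypothetical helper for interpreting a failed graceful-shutdown check.
describe_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    137) echo "killed by SIGKILL (128+9)" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "application error" ;;
  esac
}
```

By default `docker stop` waits 10 seconds before escalating to SIGKILL, which is shorter than the 30-second drain window configured above. A caller that wants the full drain would need a longer grace period, for example `docker stop -t 35`.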
---
## verify_production_image.sh — Structure
```sh
#!/usr/bin/env bash
# TDD verification script for api/Dockerfile.prod
# Fails (red) if Dockerfile.prod does not exist or any check fails.
set -euo pipefail
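IMAGE="reactbin-api-prod:verify-$$"
CONTAINER=""  # set in Step 2; initialised here so cleanup is safe under `set -u` if the build fails
cleanup() {
  # Remove the container only if one was started; always try to remove the image
  if [[ -n "$CONTAINER" ]]; then docker rm -f "$CONTAINER" >/dev/null 2>&1 || true; fi
  docker rmi "$IMAGE" >/dev/null 2>&1 || true
}
trap cleanup EXIT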
# Step 1: Build — fails red if Dockerfile.prod is absent
docker build -f api/Dockerfile.prod api/ -t "$IMAGE"
# Step 2: Start container with minimal env vars
CONTAINER=$(docker run -d -p 18000:8000 \
  -e JWT_SECRET_KEY=verify-test-key \
  -e OWNER_USERNAME=testowner \
  -e OWNER_PASSWORD=testpassword \
  -e DATABASE_URL=postgresql+asyncpg://noop:noop@noop/noop \
  -e S3_ENDPOINT_URL=http://noop:9000 \
  -e S3_BUCKET_NAME=noop \
  -e S3_ACCESS_KEY_ID=noop \
  -e S3_SECRET_ACCESS_KEY=noop \
  -e S3_REGION=us-east-1 \
  "$IMAGE")
# Step 3: Poll health endpoint (app will fail to connect to DB, but /health is pre-DB)
for i in $(seq 1 30); do
  if curl -sf http://localhost:18000/api/v1/health > /dev/null; then break; fi
  sleep 1
  [[ $i -eq 30 ]] && { echo "FAIL: health check timed out"; exit 1; }
done
# Step 4: Assert non-root user
UID_IN_CONTAINER=$(docker exec "$CONTAINER" id -u)
[[ "$UID_IN_CONTAINER" -ne 0 ]] || { echo "FAIL: process running as root"; exit 1; }
# Step 5: Graceful shutdown
docker stop "$CONTAINER" # sends SIGTERM
EXIT_CODE=$(docker wait "$CONTAINER")
[[ "$EXIT_CODE" -eq 0 ]] || { echo "FAIL: non-zero exit code $EXIT_CODE"; exit 1; }
# Step 6: Dev deps absent
if docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null; then
  echo "FAIL: pytest importable in production image (dev deps present)"; exit 1
fi
echo "All production image checks passed."
```
> **Note on health check feasibility**: `/api/v1/health` is a simple JSON response that does not require a database connection (confirmed in `api/app/main.py`). The verification script can therefore pass even without a real PostgreSQL instance.
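
The polling loop in Step 3 could be factored into a reusable retry helper. This is a sketch under stated assumptions: `wait_for` is a hypothetical name, and the script as planned keeps the loop inline.

```sh
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
# Hypothetical helper; not part of the planned script.
wait_for() {
  local max_attempts="$1"
  local delay="$2"
  local attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt follows
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible use in Step 3:
#   wait_for 30 1 curl -sf http://localhost:18000/api/v1/health \
#     || { echo "FAIL: health check timed out"; exit 1; }
```

Because the helper is called on the left of `||`, a timeout does not trip `set -e` before the failure message is printed.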
---
## Makefile Targets
Add to root `Makefile`:
```makefile
.PHONY: build-prod verify-prod

build-prod:
	docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest

verify-prod:
	bash api/tests/build/verify_production_image.sh
```
---
## `.dockerignore` Review
The existing `api/.dockerignore` already excludes `.venv/`, `__pycache__/`, `.env`, etc. The production build context should also exclude the entries below; `tests/`, `alembic/`, and `alembic.ini` are the new additions:
```
tests/
*.egg-info/
alembic/
alembic.ini
```
`tests/` and `alembic/` are not needed in the production image (we `COPY app/ ./app/` explicitly). Excluding them from the build context reduces the data sent to the Docker daemon.
> `*.egg-info/` is already present in the existing `.dockerignore`.
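
Whether the entries above are present can be checked mechanically. A minimal sketch, assuming exact-line matches are sufficient; `missing_ignore_entries` is a hypothetical helper and not one of the planned checks.

```sh
# missing_ignore_entries FILE ENTRY...
# Prints each ENTRY that does not appear as an exact line in FILE.
# Hypothetical helper; it does not interpret .dockerignore glob semantics.
missing_ignore_entries() {
  local file="$1"
  local entry
  shift
  for entry in "$@"; do
    # -x: whole-line match, -F: literal string (no regex), -q: quiet
    grep -qxF -- "$entry" "$file" || echo "$entry"
  done
}

# Possible use from the repository root:
#   missing_ignore_entries api/.dockerignore tests/ alembic/ alembic.ini
```

An empty result means every listed entry is present. A broader pattern that already covers an entry would still be reported as missing, since the comparison is literal.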
---
## Implementation Order
Tasks are generated by `/speckit-tasks`, but the logical dependency order is:
1. **Write `verify_production_image.sh`** (TDD red — build fails because `Dockerfile.prod` absent)
2. **Add `Makefile` targets** (`build-prod`, `verify-prod`) — references the script
3. **Write `api/Dockerfile.prod`** (implement to make TDD pass)
4. **Update `api/.dockerignore`** (exclude `tests/`, `alembic/`, and `alembic.ini` from build context)
5. **Run `make verify-prod`** (TDD green — all 6 checks pass)
6. **Run `shellcheck`** on `verify_production_image.sh`
No existing tests are modified. `make test-integration` continues to use `api/Dockerfile` unchanged.