Two-stage build (uv builder + python:3.12-slim runtime) with non-root user (UID 1001), no dev deps, layer-cache-optimised dep install, and graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh covering build, health endpoint, non-root, stdout logging, secret-free layers, missing-env-var exit, and dep-layer cache hit. All 102 integration tests still pass; shellcheck clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Tasks: Production-Grade API Container Image

**Input**: Design documents from `specs/010-api-prod-dockerfile/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/container.md ✅, quickstart.md ✅

**Tests**: TDD is non-negotiable (§5.1). The "test" for a Docker build artefact is `api/tests/build/verify_production_image.sh`, written before `api/Dockerfile.prod` exists. Running the script immediately fails (red) because the build step cannot find the file; writing `Dockerfile.prod` turns it green.

**Organization**: Phase 1 sets up Makefile targets and `.dockerignore`; Phase 2 creates the test directory; Phase 3 (US1) writes the verification script and the Dockerfile; Phase 4 (US2) extends the script with security checks; Phase 5 (US3) extends it with a cache-hit check; Phase 6 runs the regression suite and shellcheck.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel with other [P] tasks in the same phase
- **[Story]**: Which user story this task belongs to
- Exact file paths included in every task description

---

## Phase 1: Setup

- [X] T001 Add `build-prod` and `verify-prod` targets (and their `.PHONY` entries) to the root `Makefile` at `/workspace/Makefile`: `build-prod` runs `docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest`; `verify-prod` runs `bash api/tests/build/verify_production_image.sh`

- [X] T002 Update `api/.dockerignore` at `/workspace/api/.dockerignore`: append three lines (`tests/`, `alembic/`, and `alembic.ini`) so these are excluded from the production build context. `Dockerfile.prod` copies only `pyproject.toml`, `uv.lock`, and `app/` explicitly, but excluding the rest from the context keeps the transfer to the Docker daemon fast.
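
One way to apply T002 without duplicating entries when the task is re-run is an idempotent append. This is a minimal sketch only: the helper name `append_if_missing` is hypothetical and not part of the repository, and the commented usage assumes it is run from the repository root. The helper also adds a newline first if the existing file does not end with one, so the new entry never fuses with the last line.

```bash
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_if_missing() {
  local file="$1" line="$2"
  touch "$file"
  # -x: whole-line match, -F: fixed string, -q: quiet, --: end of options
  if ! grep -qxF -- "$line" "$file"; then
    # Make sure a non-empty file ends with a newline before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

# Possible usage for T002 (commented out so the sketch has no side effects):
#   for entry in tests/ alembic/ alembic.ini; do
#     append_if_missing api/.dockerignore "$entry"
#   done
```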

---

## Phase 2: Foundational

- [X] T003 Create directory `api/tests/build/` at `/workspace/api/tests/build/` with `mkdir -p` and add a `.gitkeep` so the directory is tracked

**Checkpoint**: Directory structure is ready; Makefile and .dockerignore are updated.

---

## Phase 3: User Story 1 - API Runs Reliably in Production (Priority: P1) 🎯 MVP

**Goal**: The container builds, starts, serves the health endpoint, and exits cleanly on SIGTERM.

**Independent Test**: `make verify-prod` passes when `Dockerfile.prod` exists and all US1 checks pass.

### Test for User Story 1 (TDD red: write first, confirm failure before T005)

- [X] T004 [US1] Create `api/tests/build/verify_production_image.sh` as an executable bash script (`chmod +x`) with `#!/usr/bin/env bash` and `set -euo pipefail`; the script MUST:
  1. Set `IMAGE="reactbin-api-prod:verify-$$"` and `PG_CONTAINER=""` and `APP_CONTAINER=""`;
  2. Define a `cleanup()` function that runs `docker rm -f "$APP_CONTAINER" "$PG_CONTAINER" 2>/dev/null || true` and `docker rmi "$IMAGE" 2>/dev/null || true`, and register it with `trap cleanup EXIT`;
  3. **[US1 check 1 - build]** Run `docker build -f api/Dockerfile.prod api/ -t "$IMAGE"`; this is the line that fails **red** because `api/Dockerfile.prod` does not yet exist; print `[verify] Building $IMAGE...` before and `[verify] Build OK` after;
  4. **[US1 check 2 - start with real DB]** Launch a throwaway postgres: `PG_CONTAINER=$(docker run -d -e POSTGRES_DB=reactbin_verify -e POSTGRES_USER=verify -e POSTGRES_PASSWORD=verify postgres:16-alpine)`; poll `docker exec "$PG_CONTAINER" pg_isready -U verify` up to 30 × 1s, fail if timeout; capture `PG_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$PG_CONTAINER")`;
  5. Start the production container: `APP_CONTAINER=$(docker run -d -p 18000:8000 -e JWT_SECRET_KEY=verify-key -e OWNER_USERNAME=testowner -e OWNER_PASSWORD=testpassword -e DATABASE_URL="postgresql+asyncpg://verify:verify@${PG_IP}:5432/reactbin_verify" -e S3_ENDPOINT_URL=http://noop:9000 -e S3_BUCKET_NAME=noop -e S3_ACCESS_KEY_ID=noop -e S3_SECRET_ACCESS_KEY=noop -e S3_REGION=us-east-1 "$IMAGE")`; note that the S3 credentials are placeholders, since the health endpoint does not require S3;
  6. **[US1 check 3 - health endpoint]** Poll `curl -sf http://localhost:18000/api/v1/health` up to 30 × 1s, fail with a message if timeout; print `[verify] Health check passed` on success;
  7. **[US1 check 4 - SIGTERM → exit 0]** Run `docker stop "$APP_CONTAINER"` (sends SIGTERM); capture `EXIT_CODE=$(docker wait "$APP_CONTAINER")`; assert `"$EXIT_CODE" -eq 0`, fail with `FAIL: non-zero exit $EXIT_CODE` otherwise; print `[verify] Graceful shutdown OK (exit $EXIT_CODE)`;
  8. Print `[verify] US1 checks passed.`;
  9. **[C3 - missing env var → non-zero exit]** Run `docker run --rm -e JWT_SECRET_KEY=verify-key "$IMAGE" 2>&1` as the condition of an `if` (or capture its status with `|| status=$?`) so that `set -e` does not abort the script on the expected failure; assert the exit code is **non-zero** (OWNER_USERNAME is absent, so Pydantic settings validation must fail at startup); print `[verify] Missing-env-var exit check OK`.

  After writing the script, run `make verify-prod` and confirm it **fails** with a Docker build error (red state: `Dockerfile.prod` does not exist).
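
The two "poll up to 30 × 1s" steps in T004 share the same retry shape, so they could be factored into one helper. The following is a minimal sketch under that assumption; `wait_until` is a hypothetical name and is not taken from the actual script, and the failure messages in the commented call sites are illustrative.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Running the command as an `if` condition keeps `set -e` from aborting
    # the script on the (expected) early failures.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# How the two polling steps might call it (commented out: needs Docker):
#   wait_until 30 1 docker exec "$PG_CONTAINER" pg_isready -U verify \
#     || { echo "FAIL: postgres not ready after 30s" >&2; exit 1; }
#   wait_until 30 1 curl -sf http://localhost:18000/api/v1/health \
#     || { echo "FAIL: health endpoint not ready after 30s" >&2; exit 1; }
```

Passing the command as separate arguments rather than as a single string avoids `eval` and keeps the quoting of values such as `"$PG_CONTAINER"` intact.
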

### Implementation for User Story 1

- [X] T005 [US1] Create `api/Dockerfile.prod` at `/workspace/api/Dockerfile.prod`, a two-stage build:

  **Stage 1 (builder)**: `FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder`; `WORKDIR /app`; set `ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_PYTHON_DOWNLOADS=never`; `COPY pyproject.toml uv.lock ./`; `RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen --no-dev --no-install-project`; `COPY app/ ./app/`

  **Stage 2 (runtime)**: `FROM python:3.12-slim`; `WORKDIR /app`; `RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*`; `RUN groupadd --system --gid 1001 appgroup && useradd --system --uid 1001 --gid 1001 --no-create-home appuser`; `COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv`; `COPY --chown=appuser:appgroup app/ ./app/`; `USER appuser`; `ENV PATH="/app/.venv/bin:$PATH"`; `EXPOSE 8000`; `HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 CMD curl -f http://localhost:8000/api/v1/health || exit 1`; `CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--timeout-graceful-shutdown", "30"]`

- [X] T006 [US1] Verify TDD green for US1: run `make verify-prod` and confirm all four US1 checks pass (build OK, the container starts against the throwaway postgres, the health endpoint returns 200, and SIGTERM produces exit code 0) and that `[verify] US1 checks passed.` is printed.

**Checkpoint**: US1 is complete. Production container builds, starts, serves traffic, and shuts down gracefully.

---

## Phase 4: User Story 2 - Minimal, Secure Container (Priority: P2)

**Goal**: The production image runs as non-root and contains no dev dependencies or embedded secrets.

**Independent Test**: US2 checks in `make verify-prod`: the same script extended with non-root, dev-deps-absent, stdout-logging, and secret-scan assertions.

### Tests for User Story 2 (TDD extension: add checks, confirm they pass against existing Dockerfile.prod)

- [X] T007 [US2] Extend `api/tests/build/verify_production_image.sh` with two US2 checks and two further checks (C1, C2), each placed as noted in its item:

  **[US2 check 1 - non-root]** After the container is running (before `docker stop`), run `UID_IN_CONTAINER=$(docker exec "$APP_CONTAINER" id -u)`; assert `"$UID_IN_CONTAINER" -ne 0`, fail with `FAIL: process running as root (UID 0)` if violated; print `[verify] Non-root user OK (UID $UID_IN_CONTAINER)`;

  **[US2 check 2 - dev deps absent]** After cleanup of APP_CONTAINER but still holding the image, run `docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null` as an `if` condition, so that `set -e` does not abort on the expected failure; assert the command returns **non-zero** (i.e., pytest is NOT importable); if it returns 0, fail with `FAIL: pytest importable in production image (dev deps present)`; print `[verify] Dev deps absent OK`;

  **[C1 - stdout log capture]** Run `docker logs "$APP_CONTAINER" 2>&1`; assert the output is non-empty and contains `Started server` or `Application startup complete` (uvicorn startup lines); fail with `FAIL: no startup logs found on stdout/stderr` if absent; print `[verify] Stdout logging OK`; note: insert this check while APP_CONTAINER is still running, before the `docker stop` call;

  **[C2 - no hardcoded secrets in layers]** Run `docker history --no-trunc "$IMAGE" 2>&1`; pipe through `grep -iE "(password|secret_key|api_key|token)"`; assert zero matching lines (`grep` exits 1 when nothing matches, so guard the pipeline against `set -euo pipefail`); if any match, fail with `FAIL: potential secret found in image history`; print `[verify] No secrets in image layers OK`;

  Update the final success line to `[verify] All checks passed (US1 + US2).`; confirm `make verify-prod` passes.
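
Two of the T007 checks (dev deps absent, no secrets in layers) pass when a command fails or finds nothing, which is easy to get wrong under `set -euo pipefail`. The sketch below shows one way to invert those results safely. Both helper names are hypothetical and not taken from the actual script; the pattern list is the one from C2. Wrapping the command in an `if` means its non-zero status is consumed by the condition instead of terminating the script.

```bash
# Hypothetical helper: succeed only when the wrapped command fails.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Hypothetical helper: read `docker history --no-trunc` output on stdin and
# succeed only when no line matches the secret-like patterns from C2.
history_is_secret_free() {
  # grep exits 0 on a match and 1 on no match, so the result is inverted here.
  if grep -qiE '(password|secret_key|api_key|token)'; then
    return 1
  fi
  return 0
}

# Possible call sites (commented out: they need Docker and a built image):
#   expect_failure docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" \
#     || { echo "FAIL: pytest importable in production image (dev deps present)" >&2; exit 1; }
#   docker history --no-trunc "$IMAGE" 2>&1 | history_is_secret_free \
#     || { echo "FAIL: potential secret found in image history" >&2; exit 1; }
```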

**Checkpoint**: US2 is verified. Image runs as UID 1001 and contains no test tooling.

---

## Phase 5: User Story 3 - Fast, Reproducible Builds (Priority: P3)

**Goal**: Rebuilding after a source-only change reuses the dependency layer from cache.

**Independent Test**: US3 check in `make verify-prod`: a second build after touching a source file asserts the dep layer was cached.

### Tests for User Story 3 (TDD extension)

- [X] T008 [US3] Extend `api/tests/build/verify_production_image.sh` with a US3 cache check appended after all other checks (before the final success line):

  **[US3 check - dep layer cached on source-only rebuild]** Set `IMAGE2="reactbin-api-prod:verify-cache-$$"`; `touch api/app/main.py`; capture the output of `docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1` (the `--progress=plain` flag gives line-oriented BuildKit output in which reused steps are marked `CACHED`, independent of TTY settings); assert the output contains the string `CACHED`; if `CACHED` is absent, fail with `FAIL: dependency layer not reused on source-only rebuild`; add `docker rmi "$IMAGE2" 2>/dev/null || true` to the `cleanup()` function; print `[verify] Dep layer cache hit confirmed (US3 OK)`;

  Update the final success line to `[verify] All checks passed (US1 + US2 + US3).`
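
Matching the bare string `CACHED` passes as soon as any step is reused, and, as far as Docker's build-cache documentation describes, `COPY` cache keys do not take file modification times into account, so `touch` alone may leave every step cached. A stricter variant could tie the assertion to the dependency step itself. The sketch below is one possible approach, not the script's actual check: `dep_layer_cached` is a hypothetical name, and the code assumes BuildKit's plain format, in which a step header such as `#8 [builder 4/5] RUN ... uv sync ...` is later followed by a status line `#8 CACHED`.

```bash
# Hypothetical helper: read `docker build --progress=plain` output on stdin and
# succeed only if the step whose command contains "uv sync" was reported CACHED.
# Assumes each step header starts with an id like "#8" and that a reused step
# is reported on its own line as "#8 CACHED".
dep_layer_cached() {
  awk '
    # Remember the id of the step header that runs the dependency install.
    $1 ~ /^#[0-9]+$/ && /uv sync/ { step = $1 }
    # A later "<id> CACHED" line for that same id means the layer was reused.
    step != "" && $1 == step && $2 == "CACHED" { hit = 1 }
    END { exit (hit ? 0 : 1) }
  '
}
```

With the build output captured in a variable (here a hypothetical `BUILD_OUTPUT`), a call site could look like `printf '%s\n' "$BUILD_OUTPUT" | dep_layer_cached || { echo "FAIL: dependency layer not reused on source-only rebuild" >&2; exit 1; }`.
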

- [X] T009 [US3] Verify TDD green for US3: run `make verify-prod` and confirm the full script passes including the cache check: the build output for the second image must contain `CACHED`, and `[verify] All checks passed (US1 + US2 + US3).` must print.

**Checkpoint**: All three user stories are verified end-to-end by `make verify-prod`.

---

## Phase 6: Polish & Cross-Cutting Concerns

- [X] T010 Run `make test-integration` from `/workspace` and confirm all 102 existing tests still pass; this verifies that the `.dockerignore` additions (T002) do not break the existing test Dockerfile build or any integration test (§5.4 regression gate)

- [X] T011 Run `shellcheck api/tests/build/verify_production_image.sh` and fix any violations (common: unquoted variables, `[ ]` vs `[[ ]]`, missing `--` before arguments)

---

## Dependencies & Execution Order

### Phase Dependencies

- **Phase 1 (Setup)**: No external dependencies; start immediately
- **Phase 2 (Foundational)**: No dependencies; start immediately (parallel with Phase 1)
- **Phase 3 (US1)**: Depends on Phase 1 (Makefile + .dockerignore must exist before `make verify-prod` can run) and Phase 2 (test directory must exist)
- **Phase 4 (US2)**: Depends on Phase 3 (US1 script and Dockerfile must exist to extend)
- **Phase 5 (US3)**: Depends on Phase 4 (full US2 script must exist to extend)
- **Phase 6 (Polish)**: Depends on all prior phases; T010 (regression test) must precede T011 (shellcheck)

### Within Phase 3

- T004 before T005 (write the test script and confirm the red state before writing the Dockerfile)
- T006 after T005 (verify green after implementation)

### Execution Order Summary

```
Step 1: T001 ∥ T002 ∥ T003   (setup: parallel, different files)
Step 2: T004                 (write verification script, TDD red)
Step 3: T005                 (write Dockerfile.prod, implementation)
Step 4: T006                 (verify US1 green)
Step 5: T007                 (extend script with US2 checks, verify pass)
Step 6: T008                 (extend script with US3 check)
Step 7: T009                 (verify US3 green)
Step 8: T010                 (make test-integration, regression gate)
Step 9: T011                 (shellcheck polish)
```

---

## Implementation Strategy

### MVP (US1: reliable production run)

1. Complete T001–T003 (setup)
2. Complete T004–T006 (core blocking: write script → write Dockerfile → verify green)
3. **Validate**: `make verify-prod` passes; `make test-integration` still passes (no regressions)
4. US2 and US3 add explicit verification coverage for properties already implemented

### Incremental Delivery

- After Phase 3: Production image builds, starts, and shuts down gracefully; safe to deploy
- After Phase 4: Security properties (non-root, no dev deps) are explicitly verified
- After Phase 5: Build efficiency (layer caching) is confirmed by automated check
- After Phase 6: Script is lint-clean, ready for CI integration