Chore: Revert initContainer command after successful migration

Chore: Bump manifests and add migration init container sequence
Fix: Include scripts/ in production Docker image
2026-05-09 20:39:22 -04:00 · 2026-05-09 20:26:51 -04:00 · 2026-05-10 00:18:48 +00:00 · 2026-05-10 00:13:55 +00:00 · 2026-05-09 18:43:33 -04:00 · 2026-05-09 22:42:23 +00:00
150 changed files with 8660 additions and 183 deletions
--- a/.env.example
+++ b/.env.example
@@ -11,6 +11,10 @@ S3_REGION=us-east-1
 # Angular SPA — injected at build or runtime
 API_BASE_URL=http://localhost:8000

+# CDN base URL for serving images (e.g. https://cdn.example.com).
+# Leave empty in local dev to use API proxy fallback.
+S3_PUBLIC_BASE_URL=
+
 # Upload size limit in bytes (default 50 MiB)
 MAX_UPLOAD_BYTES=52428800

@@ -19,3 +23,15 @@ JWT_SECRET_KEY=change-me-to-a-long-random-string
 JWT_EXPIRY_SECONDS=86400
 OWNER_USERNAME=owner
 OWNER_PASSWORD=change-me
+
+# Login brute-force protection
+LOGIN_MAX_FAILURES=5
+LOGIN_WINDOW_SECONDS=300
+LOGIN_COOLDOWN_SECONDS=900
+# Comma-separated IPs/CIDRs of trusted upstream proxies (e.g. nginx ingress pod CIDR).
+# Leave empty when not behind a reverse proxy.
+LOGIN_TRUSTED_PROXY_IPS=
+
+# API documentation endpoints (Swagger UI, ReDoc, OpenAPI schema)
+# Set to false in production to avoid exposing the API surface publicly.
+API_DOCS_ENABLED=true
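
The `S3_PUBLIC_BASE_URL` comment above describes a fallback: with a CDN base set, image URLs point at the CDN; with it empty, they point at the API proxy route. A minimal sketch of that resolution, mirroring the logic this change adds to `_image_to_dict` in `api/app/routers/images.py` (the function name `resolve_file_url` is hypothetical, chosen only for illustration):

```python
from typing import Optional


def resolve_file_url(storage_key: str, short_id: str, cdn_base: Optional[str]) -> str:
    """Return a CDN URL when a base is configured, else the API proxy path."""
    # Normalise the configured base: surrounding whitespace and a trailing
    # slash are dropped, and an empty or whitespace-only value counts as unset.
    base = cdn_base.strip().rstrip("/") if cdn_base else None
    if base:
        return f"{base}/{storage_key}"
    # Local-dev fallback: the API serves the file itself.
    return f"/api/v1/i/{short_id}/file"
```

Leaving `S3_PUBLIC_BASE_URL=` empty yields an empty string at runtime, which takes the fallback branch.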
--- a/.env.test.example
+++ b/.env.test.example
@@ -27,3 +27,10 @@ OWNER_PASSWORD=testpassword
 # API
 API_BASE_URL=http://localhost:8000
 MAX_UPLOAD_BYTES=52428800
+
+# Login brute-force protection
+LOGIN_MAX_FAILURES=5
+LOGIN_WINDOW_SECONDS=300
+LOGIN_COOLDOWN_SECONDS=900
+# Comma-separated IPs/CIDRs of trusted upstream proxies; leave empty for direct connections.
+LOGIN_TRUSTED_PROXY_IPS=
--- a/.gitignore
+++ b/.gitignore
@@ -16,6 +16,8 @@ venv/
 *.egg-info/
 dist/
 build/
+!api/tests/build/
+!ui/tests/build/
 .pytest_cache/
 .ruff_cache/
 .coverage
--- a/.img/reactbin-ui.png
+++ b/.img/reactbin-ui.png
--- a/.specify/feature.json
+++ b/.specify/feature.json
@@ -1 +1 @@
-{"feature_directory":"specs/008-postgres-integration-tests"}
+{"feature_directory":"specs/017-short-id-migration"}
--- a/.specify/memory/constitution.md
+++ b/.specify/memory/constitution.md
@@ -1,8 +1,8 @@
 <!--
 SYNC IMPACT REPORT
 ==================
-Version change: 1.2.0 → 1.3.0
-Ratified: 2026-05-01 | Last amended: 2026-05-06
+Version change: 1.3.0 → 1.4.0
+Ratified: 2026-05-01 | Last amended: 2026-05-08

 Principles introduced (first population from docs/CONSTITUTION.md):
  - §2  Architecture Principles (6 sub-principles)
@@ -171,11 +171,14 @@ OR/NOT logic is explicitly out of scope until the constitution is revised.

 ## 5. Testing Discipline

-### 5.1 TDD is non-negotiable
+### 5.1 Tests are required alongside every implementation task

-No production code MAY be written before a failing test exists for it. This
-applies to both API and UI. Tasks MUST include a "write failing test" step
-before any implementation step.
+Every implementation task MUST be accompanied by tests covering its behaviour.
+The ideal is red-green-refactor: write a failing test, then make it pass. In
+practice, tests written in the same task as the implementation are acceptable;
+what is non-negotiable is that no implementation task is marked done without
+corresponding test coverage. Tasks MUST NOT be split such that implementation
+is complete but tests are deferred to a later task.

 ### 5.2 Test pyramid

@@ -194,10 +197,15 @@ Unit and integration tests are required. E2E tests are best-effort in v1.
 API tests in `api/tests/`, UI tests colocated with their components. No
 separate top-level `tests/` directory that mirrors the source tree.

-### 5.4 CI must pass before any task is considered done
+### 5.4 The test suite must pass before any task is considered done

 "Done" means: all tests pass, linter passes, type checker passes. A task MUST
-NOT be marked complete while CI is failing.
+NOT be marked complete while any of these are failing.
+
+The acceptance gate is `make test-unit && make test-integration` plus `ruff
+check` / `ruff format --check` for the API. A formal CI pipeline is planned
+but not yet in place; until one exists, passing the above commands locally is
+the required gate. When CI is introduced it MUST enforce the same checks.

 ---

@@ -214,6 +222,9 @@ NOT be marked complete while CI is failing.
 | UI framework     | Angular (latest stable)                   | Job-relevant, learning goal               |
 | UI language      | TypeScript strict mode                    | No `any`, no implicit types               |
 | Containerisation | Docker + Docker Compose                   | Local dev must start with one command     |
+| Production runtime | k3s (Kubernetes)                        | Manifests in `k8s/`; see deployment docs  |
+| Ingress          | nginx ingress controller + cert-manager   | TLS via Let's Encrypt (`letsencrypt-prod` ClusterIssuer) |
+| Secret management | HashiCorp Vault + VSO (Vault Secrets Operator) | Secrets never committed; VSO syncs Vault KV v2 → K8s Secrets |

 ---

@@ -251,6 +262,15 @@ revised:
 - Mobile-native app
 - OIDC auth (planned Phase 3)

+**Known gaps carried forward from v1** — these are not out of scope; they are
+acknowledged deficiencies that MUST be resolved before the affected area is
+expanded:
+
+- **Password hashing**: The owner password is currently stored and compared in
+  plaintext. Hashing (bcrypt or Argon2) MUST be implemented before any
+  additional authentication work (e.g. OIDC, additional accounts) is started.
+  Specs that touch credential storage MUST address this first.
+
 ---

 ## 9. Governance
@@ -289,7 +309,8 @@ Phase 1 design is complete.
 | 1.1.1   | 2026-05-03 | Clarify that the only acceptable form of image transformation or editing is thumbnail generation                                |
 | 1.2.0   | 2026-05-03 | §2.4: Mark Phase 2 (JWT bearer auth) complete, reword phase status; §6: Add PyJWT to tech stack table; §8: Remove username/password auth from out-of-scope (now shipped) |
 | 1.3.0   | 2026-05-06 | §2.5: Remove planned PostgreSQL → SQLite refactor note; prohibit alternative database engines in integration tests. §5.2: Explicitly require PostgreSQL for integration tests; prohibit SQLite — a production HAVING/GROUP BY bug was masked by SQLite's permissive dialect. |
+| 1.4.0   | 2026-05-08 | §5.1: Soften strict TDD wording to reflect actual practice — tests alongside implementation are acceptable; deferring tests to a later task is not. §5.4: Replace "CI must pass" with local test suite gate; note CI is planned but not yet in place. §6: Add production runtime rows (k3s, nginx ingress + cert-manager, Vault + VSO). §8: Add "known gaps" subsection; document plaintext password storage as a deficiency that must be resolved before further auth work. |

 ---

-**Version**: 1.3.0 | **Ratified**: 2026-05-01 | **Last Amended**: 2026-05-06
+**Version**: 1.4.0 | **Ratified**: 2026-05-01 | **Last Amended**: 2026-05-08
--- a/.yamllint.yml
+++ b/.yamllint.yml
@@ -0,0 +1,4 @@
+extends: relaxed
+rules:
+  line-length:
+    max: 120
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,5 +1,5 @@
 <!-- SPECKIT START -->
 For additional context about technologies to be used, project structure,
 shell commands, and other important information, read the current plan at
-`specs/008-postgres-integration-tests/plan.md`.
+`specs/017-short-id-migration/plan.md`.
 <!-- SPECKIT END -->
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,25 @@
-.PHONY: test-unit test-integration
+.PHONY: test-unit test-integration build-prod verify-prod build-ui-prod verify-ui-prod validate-k8s

 test-unit:
 	cd api && python -m pytest tests/unit/ -v

 test-integration:
+	docker compose -f docker-compose.test.yml build api-test
 	docker compose -f docker-compose.test.yml run --rm api-test
+
+build-prod:
+	docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest
+
+verify-prod:
+	bash api/tests/build/verify_production_image.sh
+
+build-ui-prod:
+	docker build -f ui/Dockerfile.prod ui/ -t reactbin-ui-prod:latest
+
+verify-ui-prod:
+	bash ui/tests/build/verify_production_image.sh
+
+# Runs yamllint (offline) first, then a client-side kubectl dry run (requires kubeconfig).
+validate-k8s:
+	yamllint -c .yamllint.yml k8s/
+	kubectl apply --dry-run=client -f k8s/
--- a/README.md
+++ b/README.md
@@ -2,3 +2,141 @@
 _Organize your reaction images._

 ![Reactbin UI](.img/reactbin-ui.png)
+
+A self-hosted reaction image board. Single owner account, tag-based browsing, S3-compatible image storage.
+
+---
+
+## Local development
+
+```bash
+cp .env.example .env
+# Edit .env — defaults work out of the box for local dev
+docker compose up
+```
+
+- UI: http://localhost:4200
+- API: http://localhost:8000
+- MinIO console: http://localhost:9001 (minioadmin / minioadmin)
+
+The API serves on port 8000 directly in dev. In production the nginx ingress routes `/api/` there.
+
+### Running tests
+
+```bash
+make test-unit          # pytest unit tests (no Docker)
+make test-integration   # builds api-test image, runs full suite against Postgres + MinIO
+```
+
+### Production image builds
+
+```bash
+make build-prod         # builds reactbin-api-prod:latest from api/Dockerfile.prod
+make verify-prod        # smoke-tests the production image
+make build-ui-prod      # builds reactbin-ui-prod:latest from ui/Dockerfile.prod
+make verify-ui-prod     # smoke-tests the production UI image
+```
+
+---
+
+## Production deployment (k3s)
+
+### Cluster prerequisites
+
+- nginx ingress controller
+- cert-manager with a `letsencrypt-prod` ClusterIssuer
+- Vault Secrets Operator (VSO) installed and connected to Vault
+- Vault KV v2 secrets populated (see below)
+
+### Vault secrets
+
+Two KV v2 paths. VSO syncs these into Kubernetes Secrets automatically.
+
+**`reactbin/api/config`** → K8s Secret `api-env`
+
+| Key | Notes |
+|-----|-------|
+| `DATABASE_URL` | `postgresql+asyncpg://user:pass@host:5432/db` |
+| `JWT_SECRET_KEY` | Long random string — `openssl rand -base64 48` |
+| `OWNER_USERNAME` | Login username |
+| `OWNER_PASSWORD` | Login password |
+| `S3_ENDPOINT_URL` | `http://minio.reactbin.svc.cluster.local:9000` |
+| `S3_BUCKET_NAME` | `reactbin` |
+| `S3_ACCESS_KEY_ID` | Same value as `MINIO_ROOT_USER` |
+| `S3_SECRET_ACCESS_KEY` | Same value as `MINIO_ROOT_PASSWORD` |
+| `API_BASE_URL` | `https://<your-domain>` |
+| `LOGIN_TRUSTED_PROXY_IPS` | Pod CIDR of nginx ingress pods, e.g. `10.42.0.0/16` — needed for per-client login rate limiting behind the ingress |
+
+**`reactbin/minio/credentials`** → K8s Secret `minio-credentials`
+
+| Key | Notes |
+|-----|-------|
+| `MINIO_ROOT_USER` | MinIO admin username |
+| `MINIO_ROOT_PASSWORD` | `openssl rand -base64 32` |
+
+### Apply order
+
+```bash
+# 1. Namespace first
+kubectl apply -f k8s/namespace.yaml
+
+# 2. Vault CRDs — wait for VSO to create api-env and minio-credentials Secrets
+kubectl apply -f k8s/vault/
+kubectl get secret -n reactbin api-env minio-credentials   # wait until both appear
+
+# 3. API, UI, Ingress — replace 'latest' tags and <your-domain> first
+kubectl apply -f k8s/api/ -f k8s/ui/ -f k8s/ingress.yaml
+kubectl rollout status deployment/api -n reactbin          # Alembic init container runs here
+
+# 4. MinIO — wait for StatefulSet ready before running the bucket init Job
+kubectl apply -f k8s/minio/service.yaml -f k8s/minio/statefulset.yaml
+kubectl rollout status statefulset/minio -n reactbin
+kubectl apply -f k8s/minio/init-job.yaml
+```
+
+Before applying: substitute real image tags in the Deployment manifests and replace `<your-domain>` in `k8s/ingress.yaml`.
+
+### Updating a secret
+
+1. Update the value in Vault
+2. Force VSO to sync immediately (otherwise it waits up to 1 hour):
+   ```bash
+   kubectl annotate vaultstaticsecret api-secret -n reactbin \
+     secrets.hashicorp.com/force-sync=$(date +%s) --overwrite
+   ```
+3. Restart the deployment to pick up the new Secret:
+   ```bash
+   kubectl rollout restart deployment/api -n reactbin
+   ```
+
+### Validating manifests
+
+```bash
+make validate-k8s   # yamllint + kubectl apply --dry-run=client (requires kubeconfig)
+```
+
+---
+
+## Environment variables reference
+
+All variables are read at startup from environment / `.env`.
+
+| Variable | Default | Notes |
+|----------|---------|-------|
+| `DATABASE_URL` | — | Async DSN: `postgresql+asyncpg://...` |
+| `JWT_SECRET_KEY` | — | Required; use a long random string in production |
+| `JWT_EXPIRY_SECONDS` | `86400` | Token lifetime (24 h) |
+| `OWNER_USERNAME` | — | Single owner account username |
+| `OWNER_PASSWORD` | — | Single owner account password |
+| `S3_ENDPOINT_URL` | — | MinIO or any S3-compatible endpoint |
+| `S3_BUCKET_NAME` | `reactbin` | |
+| `S3_ACCESS_KEY_ID` | — | |
+| `S3_SECRET_ACCESS_KEY` | — | |
+| `S3_REGION` | `us-east-1` | |
+| `MAX_UPLOAD_BYTES` | `52428800` | 50 MiB |
+| `API_BASE_URL` | — | Used for generating public URLs |
+| `API_DOCS_ENABLED` | `true` | Set to `false` in production |
+| `LOGIN_MAX_FAILURES` | `5` | Failed attempts before cooldown |
+| `LOGIN_WINDOW_SECONDS` | `300` | Window for counting failures; the count resets once the window has elapsed |
+| `LOGIN_COOLDOWN_SECONDS` | `900` | Lock duration after threshold hit |
+| `LOGIN_TRUSTED_PROXY_IPS` | `""` | Comma-separated IPs/CIDRs of trusted upstream proxies |
--- a/api/Dockerfile.prod
+++ b/api/Dockerfile.prod
@@ -0,0 +1,54 @@
+# syntax=docker/dockerfile:1
+
+# ════════════════════════════════════════════════
+# Build stage: install production deps via uv
+# ════════════════════════════════════════════════
+FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
+
+WORKDIR /app
+
+ENV UV_COMPILE_BYTECODE=1 \
+    UV_LINK_MODE=copy \
+    UV_PYTHON_DOWNLOADS=never
+
+# Layer cache split: deps only (changes rarely)
+COPY pyproject.toml uv.lock ./
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-dev --no-install-project
+
+# Layer cache split: source (changes often)
+COPY app/ ./app/
+
+# ════════════════════════════════════════════════
+# Runtime stage: lean image with venv + source
+# ════════════════════════════════════════════════
+FROM python:3.12-slim
+
+WORKDIR /app
+
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends curl \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN groupadd --system --gid 1001 appgroup \
+    && useradd --system --uid 1001 --gid 1001 --no-create-home appuser
+
+COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv
+COPY --chown=appuser:appgroup app/ ./app/
+COPY --chown=appuser:appgroup alembic/ ./alembic/
+COPY --chown=appuser:appgroup alembic.ini .
+COPY --chown=appuser:appgroup scripts/ ./scripts/
+
+USER appuser
+
+ENV PATH="/app/.venv/bin:$PATH"
+
+EXPOSE 8000
+
+HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+    CMD curl -f http://localhost:8000/api/v1/health || exit 1
+
+CMD ["uvicorn", "app.main:app", \
+     "--host", "0.0.0.0", \
+     "--port", "8000", \
+     "--timeout-graceful-shutdown", "30"]
--- a/api/alembic/versions/003_add_short_id.py
+++ b/api/alembic/versions/003_add_short_id.py
@@ -0,0 +1,24 @@
+"""add short_id column to images
+
+Revision ID: 003
+Revises: 002
+Create Date: 2026-05-09
+"""
+
+from alembic import op
+import sqlalchemy as sa
+
+revision = "003"
+down_revision = "002"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.add_column("images", sa.Column("short_id", sa.String(8), nullable=True))
+    op.create_index("ix_images_short_id", "images", ["short_id"], unique=True)
+
+
+def downgrade() -> None:
+    op.drop_index("ix_images_short_id", table_name="images")
+    op.drop_column("images", "short_id")
--- a/api/alembic/versions/004_short_id_not_null.py
+++ b/api/alembic/versions/004_short_id_not_null.py
@@ -0,0 +1,24 @@
+"""set short_id NOT NULL on images
+
+Revision ID: 004
+Revises: 003
+Create Date: 2026-05-09
+
+IMPORTANT: Run the migrate_to_short_ids.py script BEFORE applying this migration.
+This migration will fail if any rows still have short_id IS NULL.
+"""
+
+from alembic import op
+
+revision = "004"
+down_revision = "003"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.alter_column("images", "short_id", nullable=False)
+
+
+def downgrade() -> None:
+    op.alter_column("images", "short_id", nullable=True)
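
Migrations 003 and 004 bracket a backfill step: the column is added as nullable, every existing row receives a `short_id`, and only then is NOT NULL applied. Neither `migrate_to_short_ids.py` nor `app.utils.generate_short_id` appears in this part of the diff, so the following is only a sketch of that sequence under stated assumptions: in-memory dicts stand in for table rows, and the alphabet is inferred from the `^[a-zA-Z0-9]{8}$` pattern used by the image router.

```python
import secrets
import string
from typing import Dict, List, Optional

# Assumption: 8 characters drawn from [a-zA-Z0-9], matching the router regex.
ALPHABET = string.ascii_letters + string.digits


def generate_short_id(length: int = 8) -> str:
    """Hypothetical stand-in for app.utils.generate_short_id."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def backfill_short_ids(rows: List[Dict[str, Optional[str]]], max_attempts: int = 10) -> None:
    """Assign a unique short_id to every row that still has none."""
    used = {row["short_id"] for row in rows if row["short_id"] is not None}
    for row in rows:
        if row["short_id"] is not None:
            continue
        for _ in range(max_attempts):
            candidate = generate_short_id()
            if candidate not in used:  # retry on the (rare) collision
                row["short_id"] = candidate
                used.add(candidate)
                break
        else:
            raise RuntimeError("could not assign a unique short_id")


def safe_to_apply_not_null(rows: List[Dict[str, Optional[str]]]) -> bool:
    """Migration 004 fails if any row still has a NULL short_id."""
    return all(row["short_id"] is not None for row in rows)
```

A pre-flight check along the lines of `safe_to_apply_not_null` is one way to confirm the backfill has finished before migration 004 runs.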
--- a/api/app/auth/rate_limiter.py
+++ b/api/app/auth/rate_limiter.py
@@ -0,0 +1,99 @@
+import ipaddress
+import logging
+import time
+from dataclasses import dataclass, field
+from ipaddress import IPv4Network, IPv6Network
+from threading import Lock
+
+from starlette.requests import Request
+
+logger = logging.getLogger(__name__)
+
+
+def get_client_ip(
+    request: Request,
+    trusted_networks: list[IPv4Network | IPv6Network],
+) -> str:
+    """Return the resolved client IP.
+
+    Prefers X-Real-IP over X-Forwarded-For when the TCP peer is a trusted
+    proxy. ingress-nginx sets X-Real-IP via its realip module using an
+    authoritative CIDR allowlist; it overwrites any client-supplied value, so
+    it cannot be spoofed via XFF injection. XFF[0] is the fallback for paths
+    that lack nginx (none currently exist, but kept for defence in depth).
+    """
+    peer = request.client.host if request.client else "unknown"
+    if trusted_networks and peer != "unknown":
+        try:
+            peer_addr = ipaddress.ip_address(peer)
+            if any(peer_addr in net for net in trusted_networks):
+                real_ip = request.headers.get("X-Real-IP", "").strip()
+                if real_ip:
+                    return real_ip
+                # XFF[0] fallback — warn because this path should not be
+                # reached in production (nginx always sets X-Real-IP).
+                xff = request.headers.get("X-Forwarded-For", "").split(",")[0].strip()
+                if xff:
+                    logger.warning(
+                        "X-Real-IP absent from trusted peer %s; falling back to XFF[0]", peer
+                    )
+                    return xff
+        except ValueError:
+            pass
+    return peer
+
+
+@dataclass
+class _Record:
+    failures: int = 0
+    window_start: float = field(default_factory=time.time)
+    blocked_until: float = 0.0
+
+
+class LoginRateLimiter:
+    def __init__(
+        self,
+        max_failures: int = 5,
+        window_seconds: int = 300,
+        cooldown_seconds: int = 900,
+    ) -> None:
+        self._max = max_failures
+        self._window = window_seconds
+        self._cooldown = cooldown_seconds
+        self._store: dict[str, _Record] = {}
+        self._lock = Lock()
+
+    @property
+    def cooldown_seconds(self) -> int:
+        return self._cooldown
+
+    def is_blocked(self, ip: str) -> bool:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                return False
+            if rec.blocked_until > now:
+                return True
+            if rec.blocked_until > 0:
+                del self._store[ip]
+            return False
+
+    def record_failure(self, ip: str) -> None:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                rec = _Record(window_start=now)
+                self._store[ip] = rec
+            if now - rec.window_start > self._window:
+                rec.failures = 0
+                rec.window_start = now
+            rec.failures += 1
+            if rec.failures >= self._max:
+                rec.blocked_until = now + self._cooldown
+                logger.warning("Login blocked for %s after %d failures", ip, rec.failures)
+
+    def record_success(self, ip: str) -> None:
+        with self._lock:
+            self._store.pop(ip, None)
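
The window handling in `record_failure` is easiest to see with a controllable clock. The sketch below is a simplified, single-threaded re-statement of the same bookkeeping, not the class from this diff: locking and record eviction are omitted, and `WindowedLimiter`, `FakeClock`, and `retry_after` are hypothetical names. It shows that the failure count restarts once the window measured from the first failure has elapsed, and how the remaining cooldown could be derived for a `Retry-After` header.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Record:
    failures: int = 0
    window_start: float = 0.0
    blocked_until: float = 0.0


class FakeClock:
    """Manually advanced clock so the behaviour is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class WindowedLimiter:
    def __init__(
        self,
        max_failures: int = 5,
        window_seconds: int = 300,
        cooldown_seconds: int = 900,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._max = max_failures
        self._window = window_seconds
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._store: Dict[str, Record] = {}

    def record_failure(self, key: str) -> None:
        now = self._clock()
        rec = self._store.get(key)
        if rec is None:
            rec = self._store[key] = Record(window_start=now)
        # The window is anchored at the first failure; once it has elapsed
        # the count starts again from zero.
        if now - rec.window_start > self._window:
            rec.failures = 0
            rec.window_start = now
        rec.failures += 1
        if rec.failures >= self._max:
            rec.blocked_until = now + self._cooldown

    def retry_after(self, key: str) -> int:
        """Whole seconds left in the cooldown, or 0 when not blocked."""
        rec = self._store.get(key)
        if rec is None:
            return 0
        return max(0, math.ceil(rec.blocked_until - self._clock()))

    def is_blocked(self, key: str) -> bool:
        return self.retry_after(key) > 0


if __name__ == "__main__":
    clock = FakeClock()
    limiter = WindowedLimiter(max_failures=3, window_seconds=10, cooldown_seconds=60, clock=clock)
    limiter.record_failure("203.0.113.9")
    limiter.record_failure("203.0.113.9")
    clock.now = 11  # window elapsed: the next failure restarts the count
    limiter.record_failure("203.0.113.9")
    print(limiter.is_blocked("203.0.113.9"))
```

Injecting the clock is a design choice made for this sketch so that tests need not sleep; the class in the diff calls `time.time()` directly.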
--- a/api/app/config.py
+++ b/api/app/config.py
@@ -1,5 +1,6 @@
 from functools import lru_cache

+from pydantic import field_validator
 from pydantic_settings import BaseSettings, SettingsConfigDict


@@ -13,11 +14,29 @@ class Settings(BaseSettings):
    s3_secret_access_key: str
    s3_region: str = "us-east-1"
    api_base_url: str = "http://localhost:8000"
+    s3_public_base_url: str | None = None
    max_upload_bytes: int = 52_428_800  # 50 MiB
    jwt_secret_key: str
    jwt_expiry_seconds: int = 86400
    owner_username: str
    owner_password: str
+    login_max_failures: int = 5
+    login_window_seconds: int = 300
+    login_cooldown_seconds: int = 900
+    login_trusted_proxy_ips: str = ""
+    api_docs_enabled: bool = True
+
+    @field_validator("api_docs_enabled", mode="before")
+    @classmethod
+    def coerce_docs_enabled(cls, v):
+        if isinstance(v, bool):
+            return v
+        try:
+            from pydantic import TypeAdapter
+
+            return TypeAdapter(bool).validate_python(v)
+        except Exception:
+            return True


@lru_cache
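
The `coerce_docs_enabled` validator above falls back to `True` when the value cannot be parsed as a boolean, so a typo in `API_DOCS_ENABLED` leaves the docs endpoints on. The stdlib-only sketch below illustrates that fail-open behaviour. It is an approximation, not pydantic's parser: the accepted spellings and the whitespace stripping are assumptions, and `coerce_flag` is a hypothetical name.

```python
from typing import Any

# Assumption: a typical set of truthy/falsy spellings, compared case-insensitively.
_TRUE = {"1", "true", "t", "yes", "y", "on"}
_FALSE = {"0", "false", "f", "no", "n", "off"}


def coerce_flag(value: Any, default: bool = True) -> bool:
    """Parse a flag from the environment, returning `default` on garbage."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    # Unparseable input: with default=True this keeps the feature enabled.
    return default
```

With `default=True`, a misspelt value such as `flase` keeps the docs exposed; a deployment that prefers to fail closed could pass `default=False` instead.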
--- a/api/app/main.py
+++ b/api/app/main.py
@@ -1,17 +1,30 @@
-from contextlib import asynccontextmanager
+import ipaddress
+from contextlib import asynccontextmanager, suppress

 from fastapi import FastAPI, Request
 from fastapi.exceptions import HTTPException
 from fastapi.responses import JSONResponse

+from app.auth.rate_limiter import LoginRateLimiter
 from app.config import get_settings
 from app.database import Base, get_engine


@asynccontextmanager
 async def lifespan(application: FastAPI):
-    get_settings()
-    # Verify DB connection and run migrations on startup
+    settings = get_settings()
+    application.state.login_rate_limiter = LoginRateLimiter(
+        max_failures=settings.login_max_failures,
+        window_seconds=settings.login_window_seconds,
+        cooldown_seconds=settings.login_cooldown_seconds,
+    )
+    trusted_networks = []
+    for part in settings.login_trusted_proxy_ips.split(","):
+        part = part.strip()
+        if part:
+            with suppress(ValueError):
+                trusted_networks.append(ipaddress.ip_network(part, strict=False))
+    application.state.login_trusted_networks = trusted_networks
    engine = get_engine()
    async with engine.begin() as conn:
        # In production, Alembic handles migrations; this is a dev convenience
@@ -20,7 +33,20 @@ async def lifespan(application: FastAPI):
    await engine.dispose()


-app = FastAPI(title="Reactbin API", version="1.0.0", lifespan=lifespan)
+_settings = get_settings()
+
+app = FastAPI(
+    title="Reactbin API",
+    version="1.0.0",
+    lifespan=lifespan,
+    docs_url="/docs" if _settings.api_docs_enabled else None,
+    redoc_url="/redoc" if _settings.api_docs_enabled else None,
+    openapi_url="/openapi.json" if _settings.api_docs_enabled else None,
+)
+
+# Defaults so app.state is populated even when lifespan doesn't run (e.g. tests)
+app.state.login_rate_limiter = LoginRateLimiter()
+app.state.login_trusted_networks = []


@app.exception_handler(HTTPException)
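
The lifespan hook parses `LOGIN_TRUSTED_PROXY_IPS` into network objects, and `get_client_ip` then trusts forwarding headers only when the TCP peer falls inside one of them. A self-contained sketch of both steps using only the standard library follows; a plain `dict` stands in for Starlette's case-insensitive headers, and the function names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_trusted_networks(raw: str) -> List[Network]:
    """Split a comma-separated list of IPs/CIDRs, skipping invalid entries."""
    networks: List[Network] = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        try:
            # strict=False accepts host bits, e.g. "10.42.0.1/16".
            networks.append(ipaddress.ip_network(part, strict=False))
        except ValueError:
            continue
    return networks


def resolve_client_ip(peer: str, headers: Dict[str, str], trusted: List[Network]) -> str:
    """Prefer X-Real-IP, then the first X-Forwarded-For entry, but only
    when the direct peer is a trusted proxy."""
    try:
        peer_addr = ipaddress.ip_address(peer)
    except ValueError:
        return peer  # e.g. "unknown"
    if any(peer_addr in net for net in trusted):
        real_ip = headers.get("X-Real-IP", "").strip()
        if real_ip:
            return real_ip
        xff = headers.get("X-Forwarded-For", "").split(",")[0].strip()
        if xff:
            return xff
    return peer
```

Headers from an untrusted peer are ignored entirely, so a client connecting directly cannot pick the address that the rate limiter counts against.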
--- a/api/app/models.py
+++ b/api/app/models.py
@@ -22,6 +22,7 @@ class Image(Base):
    size_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False)
    width: Mapped[int] = mapped_column(Integer, nullable=False)
    height: Mapped[int] = mapped_column(Integer, nullable=False)
+    short_id: Mapped[str | None] = mapped_column(String(8), unique=True, nullable=True, index=True)
    storage_key: Mapped[str] = mapped_column(String(64), nullable=False)
    thumbnail_key: Mapped[str | None] = mapped_column(String(70), nullable=True, default=None)
    created_at: Mapped[datetime] = mapped_column(
--- a/api/app/repositories/image_repo.py
+++ b/api/app/repositories/image_repo.py
@@ -27,6 +27,14 @@ class ImageRepository:
        )
        return result.scalar_one_or_none()

+    async def get_by_short_id(self, short_id: str) -> Image | None:
+        result = await self._session.execute(
+            select(Image)
+            .where(Image.short_id == short_id)
+            .options(selectinload(Image.image_tags).selectinload(ImageTag.tag))
+        )
+        return result.scalar_one_or_none()
+
    async def create(
        self,
        *,
@@ -37,6 +45,7 @@ class ImageRepository:
        width: int,
        height: int,
        storage_key: str,
+        short_id: str,
        thumbnail_key: str | None = None,
    ) -> Image:
        image = Image(
@@ -47,6 +56,7 @@ class ImageRepository:
            width=width,
            height=height,
            storage_key=storage_key,
+            short_id=short_id,
            thumbnail_key=thumbnail_key,
        )
        self._session.add(image)
--- a/api/app/repositories/tag_repo.py
+++ b/api/app/repositories/tag_repo.py
@@ -48,9 +48,7 @@ class TagRepository:
        for name in tag_names:
            tag = await self.upsert_by_name(name)
            existing = await self._session.execute(
-                select(ImageTag).where(
-                    ImageTag.image_id == image.id, ImageTag.tag_id == tag.id
-                )
+                select(ImageTag).where(ImageTag.image_id == image.id, ImageTag.tag_id == tag.id)
            )
            if existing.scalar_one_or_none() is None:
                self._session.add(ImageTag(image_id=image.id, tag_id=tag.id))
@@ -88,7 +86,7 @@ class TagRepository:

        query = select(Tag, count_subq.label("image_count"))
        if prefix:
-            query = query.where(Tag.name.like(f"{prefix}%"))
+            query = query.where(Tag.name.ilike(f"%{prefix}%"))
        if min_count > 0:
            query = query.where(count_subq >= min_count)

@@ -102,7 +100,6 @@ class TagRepository:
        rows = await self._session.execute(paginated)

        items = [
-            {"id": str(tag.id), "name": tag.name, "image_count": count}
-            for tag, count in rows.all()
+            {"id": str(tag.id), "name": tag.name, "image_count": count} for tag, count in rows.all()
        ]
        return items, total
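
The tag filter now wraps the user-supplied term in `%...%`, so any `%` or `_` inside the term itself also acts as a wildcard. If literal matching is wanted, the term can be escaped before it is interpolated. A minimal sketch, assuming a backslash escape character; `escape_like` and `contains_pattern` are hypothetical helpers, not part of this change:

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so the term matches literally."""
    return (
        term.replace(escape, escape + escape)  # escape the escape character first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def contains_pattern(term: str) -> str:
    """Build a substring-match pattern around an escaped term."""
    return f"%{escape_like(term)}%"
```

SQLAlchemy's `ilike()` accepts an `escape` keyword, so a pattern built this way would be paired with `escape="\\"` in the query.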
--- a/api/app/routers/auth.py
+++ b/api/app/routers/auth.py
@@ -1,7 +1,9 @@
-from fastapi import APIRouter, Depends, HTTPException
+from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi.responses import JSONResponse
 from pydantic import BaseModel

 from app.auth.jwt_provider import JWTAuthProvider
+from app.auth.rate_limiter import LoginRateLimiter, get_client_ip
 from app.dependencies import get_jwt_auth

 router = APIRouter(tags=["auth"])
@@ -19,12 +21,32 @@ class TokenResponse(BaseModel):


@router.post("/auth/token", response_model=TokenResponse)
-async def login(body: LoginRequest, auth: JWTAuthProvider = Depends(get_jwt_auth)):
+async def login(
+    request: Request,
+    body: LoginRequest,
+    auth: JWTAuthProvider = Depends(get_jwt_auth),
+):
+    limiter: LoginRateLimiter = request.app.state.login_rate_limiter
+    ip: str = get_client_ip(request, request.app.state.login_trusted_networks)
+
+    if limiter.is_blocked(ip):
+        return JSONResponse(
+            status_code=429,
+            content={
+                "detail": "Too many failed login attempts. Please try again later.",
+                "code": "login_rate_limited",
+            },
+            headers={"Retry-After": str(limiter.cooldown_seconds)},
+        )
+
    if not auth.verify_credentials(body.username, body.password):
+        limiter.record_failure(ip)
        raise HTTPException(
            status_code=401,
            detail={"detail": "Invalid credentials", "code": "invalid_credentials"},
        )
+
+    limiter.record_success(ip)
    token = auth.create_token()
    return TokenResponse(
        access_token=token,
--- a/api/app/routers/images.py
+++ b/api/app/routers/images.py
@@ -1,7 +1,7 @@
 import asyncio
 import logging
+import re
 import struct
-import uuid
 from typing import Any

 from fastapi import APIRouter, Depends, File, Form, HTTPException, Response, UploadFile
@@ -15,7 +15,7 @@ from app.repositories.image_repo import ImageRepository
 from app.repositories.tag_repo import TagRepository
 from app.storage.backend import StorageBackend
 from app.thumbnail import generate_thumbnail
-from app.utils import compute_sha256
+from app.utils import compute_sha256, generate_short_id
 from app.validation import FileSizeError, MimeTypeError, validate_file_size, validate_mime_type

 logger = logging.getLogger(__name__)
@@ -23,13 +23,35 @@ logger = logging.getLogger(__name__)
 router = APIRouter(tags=["images"])


+_SHORT_ID_RE = re.compile(r"^[a-zA-Z0-9]{8}$")
+
+
 def _error(detail: str, code: str, status: int):
    raise HTTPException(status_code=status, detail={"detail": detail, "code": code})


-def _image_to_dict(image: Image, *, duplicate: bool | None = None) -> dict[str, Any]:
+def _validate_short_id(short_id: str) -> str:
+    if not _SHORT_ID_RE.match(short_id):
+        raise HTTPException(
+            status_code=422,
+            detail={"detail": "Invalid image ID", "code": "invalid_short_id"},
+        )
+    return short_id
+
+
+def _image_to_dict(
+    image: Image, *, cdn_base: str | None = None, duplicate: bool | None = None
+) -> dict[str, Any]:
+    _base = cdn_base.strip().rstrip("/") if cdn_base else None
+    file_url = f"{_base}/{image.storage_key}" if _base else f"/api/v1/i/{image.short_id}/file"
+    thumbnail_url = (
+        (f"{_base}/{image.thumbnail_key}" if _base else f"/api/v1/i/{image.short_id}/thumbnail")
+        if image.thumbnail_key
+        else None
+    )
    data: dict[str, Any] = {
        "id": str(image.id),
+        "short_id": image.short_id,
        "hash": image.hash,
        "filename": image.filename,
        "mime_type": image.mime_type,
@@ -38,6 +60,8 @@ def _image_to_dict(image: Image, *, duplicate: bool | None = None) -> dict[str,
        "height": image.height,
        "storage_key": image.storage_key,
        "thumbnail_key": image.thumbnail_key,
+        "file_url": file_url,
+        "thumbnail_url": thumbnail_url,
        "created_at": image.created_at.isoformat(),
        "tags": image.tags,
    }
@@ -133,10 +157,13 @@ async def upload_image(

    hash_hex = compute_sha256(data)
    image_repo = ImageRepository(db)
+    _cdn_base = settings.s3_public_base_url
    existing = await image_repo.get_by_hash(hash_hex)
    if existing:
        return Response(
-            content=__import__("json").dumps(_image_to_dict(existing, duplicate=True)),
+            content=__import__("json").dumps(
+                _image_to_dict(existing, cdn_base=_cdn_base, duplicate=True)
+            ),
            status_code=200,
            media_type="application/json",
        )
@@ -155,35 +182,55 @@ async def upload_image(
            )

    width, height = _read_image_dimensions(data, mime_type)
-    await storage.put(hash_hex, data, mime_type)

-    thumbnail_key: str | None = None
-    try:
-        thumb_bytes = await asyncio.to_thread(generate_thumbnail, data, mime_type)
-        await storage.put(f"{hash_hex}-thumb", thumb_bytes, "image/webp")
-        thumbnail_key = f"{hash_hex}-thumb"
-    except Exception:
-        logger.warning(
-            "Thumbnail generation failed for %s; upload will proceed without thumbnail", hash_hex
+    from sqlalchemy.exc import IntegrityError
+
+    for _ in range(10):
+        short_id = generate_short_id()
+        await storage.put(short_id, data, mime_type)
+
+        thumbnail_key: str | None = None
+        try:
+            thumb_bytes = await asyncio.to_thread(generate_thumbnail, data, mime_type)
+            await storage.put(f"{short_id}-thumb", thumb_bytes, "image/webp")
+            thumbnail_key = f"{short_id}-thumb"
+        except Exception:
+            logger.warning(
+                "Thumbnail generation failed for %s; proceeding without thumbnail", short_id
+            )
+
+        try:
+            image = await image_repo.create(
+                hash_hex=hash_hex,
+                filename=file.filename or "upload",
+                mime_type=mime_type,
+                size_bytes=len(data),
+                width=width,
+                height=height,
+                storage_key=short_id,
+                short_id=short_id,
+                thumbnail_key=thumbnail_key,
+            )
+            break
+        except IntegrityError:
+            await db.rollback()
+            await storage.delete(short_id)
+            if thumbnail_key:
+                await storage.delete(thumbnail_key)
+            thumbnail_key = None
+            continue
+    else:
+        raise HTTPException(
+            status_code=500,
+            detail={"detail": "Failed to assign unique ID", "code": "id_collision"},
        )

-    image = await image_repo.create(
-        hash_hex=hash_hex,
-        filename=file.filename or "upload",
-        mime_type=mime_type,
-        size_bytes=len(data),
-        width=width,
-        height=height,
-        storage_key=hash_hex,
-        thumbnail_key=thumbnail_key,
-    )
-
    if tag_names:
        tag_repo = TagRepository(db)
        await tag_repo.attach_tags(image, tag_names)
        image = await image_repo.reload_with_tags(image.id)

-    return _image_to_dict(image, duplicate=False)
+    return _image_to_dict(image, cdn_base=_cdn_base, duplicate=False)


@router.get("/images")
@@ -192,42 +239,48 @@ async def list_images(
    limit: int = 50,
    offset: int = 0,
    db: AsyncSession = Depends(get_db),
+    settings=Depends(get_settings),
 ):
    limit = min(limit, 100)
+    _cdn_base = settings.s3_public_base_url
    tag_names = [t.strip() for t in tags.split(",") if t.strip()] if tags else None
    image_repo = ImageRepository(db)
    images, total = await image_repo.list_images(tag_names=tag_names, limit=limit, offset=offset)
    return {
-        "items": [_image_to_dict(img) for img in images],
+        "items": [_image_to_dict(img, cdn_base=_cdn_base) for img in images],
        "total": total,
        "limit": limit,
        "offset": offset,
    }


-@router.get("/images/{image_id}")
+@router.get("/i/{short_id}")
 async def get_image(
-    image_id: uuid.UUID,
+    short_id: str,
    db: AsyncSession = Depends(get_db),
+    settings=Depends(get_settings),
 ):
+    _validate_short_id(short_id)
+    _cdn_base = settings.s3_public_base_url
    image_repo = ImageRepository(db)
-    image = await image_repo.get_by_id(image_id)
+    image = await image_repo.get_by_short_id(short_id)
    if not image:
        raise HTTPException(
            status_code=404,
            detail={"detail": "Image not found", "code": "image_not_found"},
        )
-    return _image_to_dict(image)
+    return _image_to_dict(image, cdn_base=_cdn_base)


-@router.get("/images/{image_id}/file")
+@router.get("/i/{short_id}/file")
 async def serve_image_file(
-    image_id: uuid.UUID,
+    short_id: str,
    db: AsyncSession = Depends(get_db),
    storage: StorageBackend = Depends(get_storage),
 ):
+    _validate_short_id(short_id)
    image_repo = ImageRepository(db)
-    image = await image_repo.get_by_id(image_id)
+    image = await image_repo.get_by_short_id(short_id)
    if not image:
        raise HTTPException(
            status_code=404,
@@ -250,14 +303,15 @@ async def serve_image_file(
    )


-@router.get("/images/{image_id}/thumbnail")
+@router.get("/i/{short_id}/thumbnail")
 async def serve_image_thumbnail(
-    image_id: uuid.UUID,
+    short_id: str,
    db: AsyncSession = Depends(get_db),
    storage: StorageBackend = Depends(get_storage),
 ):
+    _validate_short_id(short_id)
    image_repo = ImageRepository(db)
-    image = await image_repo.get_by_id(image_id)
+    image = await image_repo.get_by_short_id(short_id)
    if not image:
        raise HTTPException(
            status_code=404,
@@ -282,15 +336,18 @@ async def serve_image_thumbnail(
    )


-@router.patch("/images/{image_id}/tags")
+@router.patch("/i/{short_id}/tags")
 async def update_image_tags(
-    image_id: uuid.UUID,
+    short_id: str,
    body: dict,
    db: AsyncSession = Depends(get_db),
    _: Identity = Depends(require_auth),
+    settings=Depends(get_settings),
 ):
+    _validate_short_id(short_id)
+    _cdn_base = settings.s3_public_base_url
    image_repo = ImageRepository(db)
-    image = await image_repo.get_by_id(image_id)
+    image = await image_repo.get_by_short_id(short_id)
    if not image:
        raise HTTPException(
            status_code=404,
@@ -309,18 +366,19 @@ async def update_image_tags(

    await tag_repo.replace_tags_on_image(image, tag_names)
    image = await image_repo.reload_with_tags(image.id)
-    return _image_to_dict(image)
+    return _image_to_dict(image, cdn_base=_cdn_base)


-@router.delete("/images/{image_id}", status_code=204)
+@router.delete("/i/{short_id}", status_code=204)
 async def delete_image(
-    image_id: uuid.UUID,
+    short_id: str,
    db: AsyncSession = Depends(get_db),
    storage: StorageBackend = Depends(get_storage),
    _: Identity = Depends(require_auth),
 ):
+    _validate_short_id(short_id)
    image_repo = ImageRepository(db)
-    image = await image_repo.get_by_id(image_id)
+    image = await image_repo.get_by_short_id(short_id)
    if not image:
        raise HTTPException(
            status_code=404,
--- a/api/app/utils.py
+++ b/api/app/utils.py
@@ -1,5 +1,13 @@
 import hashlib
+import secrets
+import string
+
+BASE62 = string.ascii_letters + string.digits


 def compute_sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
+
+
+def generate_short_id(length: int = 8) -> str:
+    return "".join(secrets.choice(BASE62) for _ in range(length))
--- a/api/scripts/init.py
+++ b/api/scripts/init.py
--- a/api/scripts/migrate_to_short_ids.py
+++ b/api/scripts/migrate_to_short_ids.py
@@ -0,0 +1,107 @@
+"""
+Migrate existing images to use short_id-based storage keys.
+
+Run after applying Alembic migration 003 (adds short_id column).
+Run before applying migration 004 (sets short_id NOT NULL).
+
+Usage:
+    python -m scripts.migrate_to_short_ids
+"""
+
+import asyncio
+import logging
+from typing import Any
+
+from sqlalchemy import select
+
+from app.database import get_session_factory
+from app.models import Image
+from app.storage.s3_backend import S3StorageBackend
+from app.utils import generate_short_id
+
+logger = logging.getLogger(__name__)
+
+
+async def migrate_image(image: Any, storage: Any, session: Any) -> bool:
+    """Migrate one image to a short_id key. Return True if migrated, False if skipped or failed."""
+    if image.short_id is not None:
+        return False
+
+    new_short_id = generate_short_id()
+    old_key = image.storage_key
+    old_thumb_key = image.thumbnail_key
+
+    try:
+        data = await storage.get(old_key)
+        await storage.put(new_short_id, data, image.mime_type)
+        # Verify copy succeeded
+        await storage.get(new_short_id)
+    except Exception as exc:
+        logger.error("Failed to copy storage object for image %s: %s", image.id, exc)
+        return False
+
+    new_thumb_key: str | None = None
+    if old_thumb_key:
+        try:
+            thumb_data = await storage.get(old_thumb_key)
+            new_thumb_key = f"{new_short_id}-thumb"
+            await storage.put(new_thumb_key, thumb_data, "image/webp")
+            await storage.get(new_thumb_key)
+        except Exception as exc:
+            logger.warning("Failed to copy thumbnail for image %s: %s", image.id, exc)
+            new_thumb_key = None
+
+    try:
+        image.short_id = new_short_id
+        image.storage_key = new_short_id
+        image.thumbnail_key = new_thumb_key
+        await session.flush()
+
+        await storage.delete(old_key)
+        if old_thumb_key and new_thumb_key:
+            await storage.delete(old_thumb_key)
+    except Exception as exc:
+        logger.error("DB update or old-object cleanup failed for image %s: %s", image.id, exc)
+        return False
+
+    return True
+
+
+async def run_migration(images: list, storage: Any, session: Any) -> tuple[int, int, int]:
+    """Process a list of images. Returns (migrated, skipped, failed) counts."""
+    migrated = skipped = failed = 0
+    for image in images:
+        if image.short_id is not None:
+            skipped += 1
+            continue
+        try:
+            success = await migrate_image(image, storage, session)
+            if success:
+                migrated += 1
+            else:
+                failed += 1
+        except Exception as exc:
+            logger.error("Unexpected error migrating image %s: %s", image.id, exc)
+            failed += 1
+
+    return migrated, skipped, failed
+
+
+async def main() -> None:
+    logging.basicConfig(level=logging.INFO)
+
+    storage = S3StorageBackend()
+
+    async with get_session_factory()() as session:
+        result = await session.execute(select(Image).where(Image.short_id.is_(None)))
+        images = list(result.scalars().all())
+        logger.info("Found %d images to migrate", len(images))
+
+        migrated, skipped, failed = await run_migration(images, storage, session)
+        await session.commit()
+
+    print(f"Migrated: {migrated}, Skipped: {skipped}, Failed: {failed}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
--- a/api/tests/build/.gitkeep
+++ b/api/tests/build/.gitkeep
--- a/api/tests/build/verify_production_image.sh
+++ b/api/tests/build/verify_production_image.sh
@@ -0,0 +1,119 @@
+#!/usr/bin/env bash
+# TDD verification script for api/Dockerfile.prod
+# Fails (red) if Dockerfile.prod does not exist or any check fails.
+set -euo pipefail
+
+IMAGE="reactbin-api-prod:verify-$$"
+IMAGE2="reactbin-api-prod:verify-cache-$$"
+PG_CONTAINER=""
+APP_CONTAINER=""
+
+cleanup() {
+    [ -n "$APP_CONTAINER" ] && docker rm -f "$APP_CONTAINER" 2>/dev/null || true
+    [ -n "$PG_CONTAINER" ] && docker rm -f "$PG_CONTAINER" 2>/dev/null || true
+    docker rmi "$IMAGE" 2>/dev/null || true
+    docker rmi "$IMAGE2" 2>/dev/null || true
+}
+trap cleanup EXIT
+
+# ── US1 check 1: build ────────────────────────────────────────────────────────
+echo "[verify] Building $IMAGE..."
+docker build -f api/Dockerfile.prod api/ -t "$IMAGE"
+echo "[verify] Build OK"
+
+# ── US1 check 2: start with a throwaway postgres ──────────────────────────────
+echo "[verify] Starting postgres..."
+PG_CONTAINER=$(docker run -d \
+    -e POSTGRES_DB=reactbin_verify \
+    -e POSTGRES_USER=verify \
+    -e POSTGRES_PASSWORD=verify \
+    postgres:16-alpine)
+
+for i in $(seq 1 30); do
+    if docker exec "$PG_CONTAINER" pg_isready -U verify -q 2>/dev/null; then break; fi
+    sleep 1
+    if [[ $i -eq 30 ]]; then echo "FAIL: postgres did not become ready"; exit 1; fi
+done
+
+PG_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$PG_CONTAINER")
+
+echo "[verify] Starting production container..."
+APP_CONTAINER=$(docker run -d \
+    -p 18000:8000 \
+    -e JWT_SECRET_KEY=verify-key \
+    -e OWNER_USERNAME=testowner \
+    -e OWNER_PASSWORD=testpassword \
+    -e DATABASE_URL="postgresql+asyncpg://verify:verify@${PG_IP}:5432/reactbin_verify" \
+    -e S3_ENDPOINT_URL=http://noop:9000 \
+    -e S3_BUCKET_NAME=noop \
+    -e S3_ACCESS_KEY_ID=noop \
+    -e S3_SECRET_ACCESS_KEY=noop \
+    -e S3_REGION=us-east-1 \
+    "$IMAGE")
+
+# ── US1 check 3: health endpoint ──────────────────────────────────────────────
+echo "[verify] Polling health endpoint..."
+for i in $(seq 1 30); do
+    if curl -sf http://localhost:18000/api/v1/health > /dev/null; then break; fi
+    sleep 1
+    if [[ $i -eq 30 ]]; then echo "FAIL: health check timed out after 30s"; exit 1; fi
+done
+echo "[verify] Health check passed"
+
+# ── US2 check 1: non-root user ────────────────────────────────────────────────
+UID_IN_CONTAINER=$(docker exec "$APP_CONTAINER" id -u)
+if [[ "$UID_IN_CONTAINER" -eq 0 ]]; then
+    echo "FAIL: process running as root (UID 0)"; exit 1
+fi
+echo "[verify] Non-root user OK (UID $UID_IN_CONTAINER)"
+
+# ── C1: stdout/stderr log capture ────────────────────────────────────────────
+LOGS=$(docker logs "$APP_CONTAINER" 2>&1)
+if [[ -z "$LOGS" ]]; then
+    echo "FAIL: no output on stdout/stderr"; exit 1
+fi
+if ! echo "$LOGS" | grep -qiE "(started server|application startup complete|uvicorn)"; then
+    echo "FAIL: no startup logs found on stdout/stderr"; exit 1
+fi
+echo "[verify] Stdout logging OK"
+
+# ── US1 check 4: SIGTERM → exit 0 ────────────────────────────────────────────
+docker stop "$APP_CONTAINER" > /dev/null
+EXIT_CODE=$(docker wait "$APP_CONTAINER")
+if [[ "$EXIT_CODE" -ne 0 ]]; then
+    echo "FAIL: non-zero exit code $EXIT_CODE after SIGTERM"; exit 1
+fi
+echo "[verify] Graceful shutdown OK (exit $EXIT_CODE)"
+
+# ── US2 check 2: dev deps absent ─────────────────────────────────────────────
+if docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null; then
+    echo "FAIL: pytest importable in production image (dev deps present)"; exit 1
+fi
+echo "[verify] Dev deps absent OK"
+
+# ── C2: no hardcoded secrets in image layers ─────────────────────────────────
+if docker history --no-trunc "$IMAGE" 2>&1 | grep -qiE "(password|secret_key|api_key|token)"; then
+    echo "FAIL: potential secret found in image history"; exit 1
+fi
+echo "[verify] No secrets in image layers OK"
+
+# ── C3: missing env var → non-zero exit ──────────────────────────────────────
+set +e
+docker run --rm -e JWT_SECRET_KEY=verify-key "$IMAGE" 2>/dev/null
+MISSING_ENV_EXIT=$?
+set -e
+if [[ "$MISSING_ENV_EXIT" -eq 0 ]]; then
+    echo "FAIL: container exited 0 despite missing OWNER_USERNAME"; exit 1
+fi
+echo "[verify] Missing-env-var exit check OK (exit $MISSING_ENV_EXIT)"
+
+# ── US3: dep layer cached on source-only rebuild ──────────────────────────────
+echo "[verify] Testing cache hit on source-only rebuild..."
+touch api/app/main.py
+BUILD2_OUTPUT=$(docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1)
+if ! echo "$BUILD2_OUTPUT" | grep -q "CACHED"; then
+    echo "FAIL: dependency layer not reused on source-only rebuild"; exit 1
+fi
+echo "[verify] Dep layer cache hit confirmed (US3 OK)"
+
+echo "[verify] All checks passed (US1 + US2 + US3 + C1-C3)."
--- a/api/tests/integration/test_delete.py
+++ b/api/tests/integration/test_delete.py
@@ -1,10 +1,9 @@
 """
-T065 — DELETE /api/v1/images/{id} → 204; subsequent GET returns 404
+T065 — DELETE /api/v1/i/{short_id} → 204; subsequent GET returns 404
 T066 — DELETE verifies MinIO object is removed
 T067 — DELETE of unknown ID → 404 image_not_found
 """
 import io
-import uuid

 import pytest
 from PIL import Image as PILImage
@@ -28,12 +27,12 @@ async def test_delete_removes_record(authed_client):
        files={"file": ("del-test.jpg", io.BytesIO(data), "image/jpeg")},
        headers=headers,
    )
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]

-    delete_resp = await client.delete(f"/api/v1/images/{image_id}", headers=headers)
+    delete_resp = await client.delete(f"/api/v1/i/{image_id}", headers=headers)
    assert delete_resp.status_code == 204

-    get_resp = await client.get(f"/api/v1/images/{image_id}")
+    get_resp = await client.get(f"/api/v1/i/{image_id}")
    assert get_resp.status_code == 404
    assert get_resp.json()["code"] == "image_not_found"

@@ -49,13 +48,13 @@ async def test_delete_removes_storage_object(authed_client):
        headers=headers,
    )
    assert upload.status_code in (200, 201)
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]

-    delete_resp = await client.delete(f"/api/v1/images/{image_id}", headers=headers)
+    delete_resp = await client.delete(f"/api/v1/i/{image_id}", headers=headers)
    assert delete_resp.status_code == 204

    # Confirm storage redirect no longer works (404 since record is gone)
-    file_resp = await client.get(f"/api/v1/images/{image_id}/file")
+    file_resp = await client.get(f"/api/v1/i/{image_id}/file")
    assert file_resp.status_code == 404


@@ -63,7 +62,7 @@ async def test_delete_removes_storage_object(authed_client):
 async def test_delete_unknown_id_returns_404(authed_client):
    client, token = authed_client
    response = await client.delete(
-        f"/api/v1/images/{uuid.uuid4()}",
+        "/api/v1/i/NotFound",
        headers={"Authorization": f"Bearer {token}"},
    )
    assert response.status_code == 404
@@ -85,12 +84,12 @@ async def test_delete_removes_thumbnail(authed_client):
        headers=headers,
    )
    assert upload.status_code == 201
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]
    assert upload.json()["thumbnail_key"] is not None

-    delete_resp = await client.delete(f"/api/v1/images/{image_id}", headers=headers)
+    delete_resp = await client.delete(f"/api/v1/i/{image_id}", headers=headers)
    assert delete_resp.status_code == 204

-    thumb_resp = await client.get(f"/api/v1/images/{image_id}/thumbnail")
+    thumb_resp = await client.get(f"/api/v1/i/{image_id}/thumbnail")
    assert thumb_resp.status_code == 404
    assert thumb_resp.json()["code"] == "image_not_found"
--- a/api/tests/integration/test_docs_gate.py
+++ b/api/tests/integration/test_docs_gate.py
@@ -0,0 +1,48 @@
+import importlib
+
+from starlette.testclient import TestClient
+
+from app.config import get_settings
+
+_BASE_ENV = {
+    "DATABASE_URL": "postgresql+asyncpg://u:p@localhost/db",
+    "JWT_SECRET_KEY": "test-secret",
+    "OWNER_USERNAME": "admin",
+    "OWNER_PASSWORD": "password",
+    "S3_ENDPOINT_URL": "http://localhost:9000",
+    "S3_BUCKET_NAME": "test-bucket",
+    "S3_ACCESS_KEY_ID": "key",
+    "S3_SECRET_ACCESS_KEY": "secret",
+}
+
+
+def _set_env(monkeypatch, extra=None):
+    for k, v in {**_BASE_ENV, **(extra or {})}.items():
+        monkeypatch.setenv(k, v)
+
+
+def test_docs_hidden_when_flag_disabled(monkeypatch):
+    _set_env(monkeypatch, {"API_DOCS_ENABLED": "false"})
+    get_settings.cache_clear()
+    import app.main as m
+
+    importlib.reload(m)
+    client = TestClient(m.app, raise_server_exceptions=False)
+    assert client.get("/docs").status_code == 404
+    assert client.get("/redoc").status_code == 404
+    assert client.get("/openapi.json").status_code == 404
+    assert client.get("/api/v1/health").status_code == 200
+    get_settings.cache_clear()
+
+
+def test_docs_visible_when_flag_enabled(monkeypatch):
+    _set_env(monkeypatch, {"API_DOCS_ENABLED": "true"})
+    get_settings.cache_clear()
+    import app.main as m
+
+    importlib.reload(m)
+    client = TestClient(m.app, raise_server_exceptions=False)
+    assert client.get("/docs").status_code == 200
+    assert client.get("/redoc").status_code == 200
+    assert client.get("/openapi.json").status_code == 200
+    get_settings.cache_clear()
--- a/api/tests/integration/test_login_rate_limit.py
+++ b/api/tests/integration/test_login_rate_limit.py
@@ -0,0 +1,121 @@
+import os
+
+import pytest
+from httpx import AsyncClient
+
+from app.auth.rate_limiter import LoginRateLimiter
+from app.main import app
+
+BAD_CREDS = {"username": "attacker", "password": "wrong"}
+VALID_CREDS = {
+    "username": os.environ.get("OWNER_USERNAME", "testowner"),
+    "password": os.environ.get("OWNER_PASSWORD", "testpassword"),
+}
+
+
+def _fresh_limiter():
+    return LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=30)
+
+
+@pytest.mark.asyncio
+async def test_repeated_failures_trigger_429(client: AsyncClient):
+    original_limiter = app.state.login_rate_limiter
+    original_networks = app.state.login_trusted_networks
+    app.state.login_rate_limiter = _fresh_limiter()
+    app.state.login_trusted_networks = []
+    try:
+        for _ in range(3):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        assert resp.status_code == 429
+        assert resp.json()["code"] == "login_rate_limited"
+    finally:
+        app.state.login_rate_limiter = original_limiter
+        app.state.login_trusted_networks = original_networks
+
+
+@pytest.mark.asyncio
+async def test_success_resets_counter(client: AsyncClient):
+    original_limiter = app.state.login_rate_limiter
+    original_networks = app.state.login_trusted_networks
+    app.state.login_rate_limiter = _fresh_limiter()
+    app.state.login_trusted_networks = []
+    try:
+        for _ in range(2):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        await client.post("/api/v1/auth/token", json=VALID_CREDS)
+        for _ in range(3):
+            resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+            assert resp.status_code == 401, "counter should have reset after success"
+    finally:
+        app.state.login_rate_limiter = original_limiter
+        app.state.login_trusted_networks = original_networks
+
+
+@pytest.mark.asyncio
+async def test_429_has_retry_after_header(client: AsyncClient):
+    original_limiter = app.state.login_rate_limiter
+    original_networks = app.state.login_trusted_networks
+    app.state.login_rate_limiter = _fresh_limiter()
+    app.state.login_trusted_networks = []
+    try:
+        for _ in range(3):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        assert resp.status_code == 429
+        assert "Retry-After" in resp.headers
+        assert int(resp.headers["Retry-After"]) > 0
+    finally:
+        app.state.login_rate_limiter = original_limiter
+        app.state.login_trusted_networks = original_networks
+
+
+@pytest.mark.asyncio
+async def test_429_body_shape(client: AsyncClient):
+    original_limiter = app.state.login_rate_limiter
+    original_networks = app.state.login_trusted_networks
+    app.state.login_rate_limiter = _fresh_limiter()
+    app.state.login_trusted_networks = []
+    try:
+        for _ in range(3):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        assert resp.status_code == 429
+        assert resp.json() == {
+            "detail": "Too many failed login attempts. Please try again later.",
+            "code": "login_rate_limited",
+        }
+    finally:
+        app.state.login_rate_limiter = original_limiter
+        app.state.login_trusted_networks = original_networks
+
+
+@pytest.mark.asyncio
+async def test_xff_header_ignored_when_no_trusted_networks(client: AsyncClient):
+    original_limiter = app.state.login_rate_limiter
+    original_networks = app.state.login_trusted_networks
+    app.state.login_rate_limiter = _fresh_limiter()
+    app.state.login_trusted_networks = []
+    try:
+        # Send 3 failures all claiming to be "1.2.3.4" via XFF
+        for _ in range(3):
+            await client.post(
+                "/api/v1/auth/token",
+                json=BAD_CREDS,
+                headers={"X-Forwarded-For": "1.2.3.4"},
+            )
+        # 4th request with a *different* XFF — if XFF were trusted, this
+        # would appear to be a fresh IP and get 401. Since XFF is ignored,
+        # the real peer ("testclient") is blocked and we get 429.
+        resp = await client.post(
+            "/api/v1/auth/token",
+            json=BAD_CREDS,
+            headers={"X-Forwarded-For": "9.9.9.9"},
+        )
+        assert resp.status_code == 429, (
+            "XFF should be ignored when no trusted networks are configured; "
+            "expected real peer to be blocked"
+        )
+    finally:
+        app.state.login_rate_limiter = original_limiter
+        app.state.login_trusted_networks = original_networks
--- a/api/tests/integration/test_protected.py
+++ b/api/tests/integration/test_protected.py
@@ -3,7 +3,6 @@ Tests that write endpoints require authentication (US2).
 These use the authed_client fixture which wires JWTAuthProvider.
 """
 import io
-import uuid

 import pytest

@@ -42,8 +41,7 @@ async def test_upload_with_valid_token_succeeds(authed_client):
@pytest.mark.asyncio
 async def test_delete_without_token_returns_401(authed_client):
    client, _ = authed_client
-    fake_id = uuid.uuid4()
-    response = await client.delete(f"/api/v1/images/{fake_id}")
+    response = await client.delete("/api/v1/i/NotFound")
    assert response.status_code == 401
    assert response.json().get("code") == "unauthorized"

@@ -57,9 +55,9 @@ async def test_delete_with_valid_token_succeeds(authed_client):
        files={"file": ("del-protected.jpg", io.BytesIO(data), "image/jpeg")},
        headers={"Authorization": f"Bearer {token}"},
    )
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]
    response = await client.delete(
-        f"/api/v1/images/{image_id}",
+        f"/api/v1/i/{image_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    assert response.status_code == 204
@@ -68,9 +66,8 @@ async def test_delete_with_valid_token_succeeds(authed_client):
@pytest.mark.asyncio
 async def test_patch_tags_without_token_returns_401(authed_client):
    client, _ = authed_client
-    fake_id = uuid.uuid4()
    response = await client.patch(
-        f"/api/v1/images/{fake_id}/tags",
+        "/api/v1/i/NotFound/tags",
        json={"tags": ["a"]},
    )
    assert response.status_code == 401
@@ -86,9 +83,9 @@ async def test_patch_tags_with_valid_token_succeeds(authed_client):
        files={"file": ("tag-protected.jpg", io.BytesIO(data), "image/jpeg")},
        headers={"Authorization": f"Bearer {token}"},
    )
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]
    response = await client.patch(
-        f"/api/v1/images/{image_id}/tags",
+        f"/api/v1/i/{image_id}/tags",
        json={"tags": ["protected-tag"]},
        headers={"Authorization": f"Bearer {token}"},
    )
--- a/api/tests/integration/test_public_access.py
+++ b/api/tests/integration/test_public_access.py
@@ -30,8 +30,8 @@ async def test_get_image_without_token_is_200(authed_client):
        files={"file": ("pub-test.jpg", io.BytesIO(data), "image/jpeg")},
        headers={"Authorization": f"Bearer {token}"},
    )
-    image_id = upload.json()["id"]
-    response = await client.get(f"/api/v1/images/{image_id}")
+    image_id = upload.json()["short_id"]
+    response = await client.get(f"/api/v1/i/{image_id}")
    assert response.status_code == 200


@@ -44,8 +44,8 @@ async def test_serve_file_without_token_is_200(authed_client):
        files={"file": ("pub-file.jpg", io.BytesIO(data), "image/jpeg")},
        headers={"Authorization": f"Bearer {token}"},
    )
-    image_id = upload.json()["id"]
-    response = await client.get(f"/api/v1/images/{image_id}/file")
+    image_id = upload.json()["short_id"]
+    response = await client.get(f"/api/v1/i/{image_id}/file")
    assert response.status_code == 200


@@ -58,8 +58,8 @@ async def test_serve_thumbnail_without_token_is_200(authed_client):
        files={"file": ("pub-thumb.jpg", io.BytesIO(data), "image/jpeg")},
        headers={"Authorization": f"Bearer {token}"},
    )
-    image_id = upload.json()["id"]
-    response = await client.get(f"/api/v1/images/{image_id}/thumbnail")
+    image_id = upload.json()["short_id"]
+    response = await client.get(f"/api/v1/i/{image_id}/thumbnail")
    assert response.status_code == 200


--- a/api/tests/integration/test_serving.py
+++ b/api/tests/integration/test_serving.py
@@ -1,10 +1,9 @@
 """
-T055 — GET /api/v1/images/{id}/file → 200 with binary content, ETag, Cache-Control
+T055 — GET /api/v1/i/{short_id}/file → 200 with binary content, ETag, Cache-Control
 T056 — /file for unknown ID → 404 image_not_found
 T057 — /file response exposes no storage-specific details
 """
 import io
-import uuid

 import pytest
 from PIL import Image as PILImage
@@ -39,10 +38,10 @@ async def test_file_returns_200_with_content(authed_client):
    )
    assert upload.status_code in (200, 201)
    upload_body = upload.json()
-    image_id = upload_body["id"]
+    image_id = upload_body["short_id"]
    image_hash = upload_body["hash"]

-    response = await client.get(f"/api/v1/images/{image_id}/file")
+    response = await client.get(f"/api/v1/i/{image_id}/file")
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("image/")
    assert response.headers["etag"] == f'"{image_hash}"'
@@ -52,7 +51,7 @@ async def test_file_returns_200_with_content(authed_client):

@pytest.mark.asyncio
 async def test_file_unknown_id_returns_404(client):
-    response = await client.get(f"/api/v1/images/{uuid.uuid4()}/file")
+    response = await client.get("/api/v1/i/NotFound/file")
    assert response.status_code == 404
    body = response.json()
    assert body["code"] == "image_not_found"
@@ -68,9 +67,9 @@ async def test_file_response_exposes_no_storage_details(authed_client):
        headers={"Authorization": f"Bearer {token}"},
    )
    assert upload.status_code in (200, 201)
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]

-    response = await client.get(f"/api/v1/images/{image_id}/file")
+    response = await client.get(f"/api/v1/i/{image_id}/file")
    assert response.status_code == 200
    assert "location" not in response.headers
    assert "minio" not in response.text.lower()
@@ -89,10 +88,10 @@ async def test_thumbnail_returns_webp(authed_client):
    )
    assert upload.status_code == 201
    body = upload.json()
-    image_id = body["id"]
+    image_id = body["short_id"]
    image_hash = body["hash"]

-    response = await client.get(f"/api/v1/images/{image_id}/thumbnail")
+    response = await client.get(f"/api/v1/i/{image_id}/thumbnail")
    assert response.status_code == 200
    assert response.headers["content-type"] == "image/webp"
    assert response.headers["etag"] == f'"{image_hash}"'
@@ -110,15 +109,15 @@ async def test_thumbnail_fallback_returns_original(authed_client, db_session):
        headers={"Authorization": f"Bearer {token}"},
    )
    assert upload.status_code == 201
-    image_id = upload.json()["id"]
+    image_id = upload.json()["short_id"]

    await db_session.execute(
-        update(Image).where(Image.id == uuid.UUID(image_id)).values(thumbnail_key=None)
+        update(Image).where(Image.short_id == image_id).values(thumbnail_key=None)
    )
    await db_session.flush()
    db_session.expire_all()

-    response = await client.get(f"/api/v1/images/{image_id}/thumbnail")
+    response = await client.get(f"/api/v1/i/{image_id}/thumbnail")
    assert response.status_code == 200
    assert "image/jpeg" in response.headers["content-type"]
    assert len(response.content) > 0
@@ -126,7 +125,7 @@ async def test_thumbnail_fallback_returns_original(authed_client, db_session):

@pytest.mark.asyncio
 async def test_thumbnail_unknown_id_returns_404(client):
-    response = await client.get(f"/api/v1/images/{uuid.uuid4()}/thumbnail")
+    response = await client.get("/api/v1/i/NotFound/thumbnail")
    assert response.status_code == 404
    body = response.json()
    assert body["code"] == "image_not_found"
--- a/api/tests/integration/test_tags.py
+++ b/api/tests/integration/test_tags.py
@@ -81,10 +81,10 @@ async def test_patch_replaces_tag_set(authed_client):
        data={"tags": "old-tag"},
        headers=headers,
    )
-    image_id = r1.json()["id"]
+    image_id = r1.json()["short_id"]

    patch = await client.patch(
-        f"/api/v1/images/{image_id}/tags",
+        f"/api/v1/i/{image_id}/tags",
        json={"tags": ["new-tag", "another"]},
        headers=headers,
    )
@@ -104,10 +104,10 @@ async def test_patch_invalid_tag_returns_422(authed_client):
        files={"file": ("invalid-tag-test.png", io.BytesIO(data), "image/png")},
        headers=headers,
    )
-    image_id = r1.json()["id"]
+    image_id = r1.json()["short_id"]

    patch = await client.patch(
-        f"/api/v1/images/{image_id}/tags",
+        f"/api/v1/i/{image_id}/tags",
        json={"tags": ["valid", "INVALID TAG WITH SPACES!"]},
        headers=headers,
    )
--- a/api/tests/integration/test_upload.py
+++ b/api/tests/integration/test_upload.py
@@ -3,10 +3,10 @@ T026 — valid JPEG upload → 201, record in DB, object in MinIO
 T027 — same image uploaded twice → 200, duplicate: true, no second MinIO object
 T028 — invalid MIME type → 422 invalid_mime_type (error envelope with code field)
 T029 — file > MAX_UPLOAD_BYTES → 422 file_too_large
-T079 — GET /api/v1/images/{id} 404 → error envelope shape
+T013 — upload produces short_id; storage_key equals short_id; thumbnail_key = {short_id}-thumb
 """
 import io
-import uuid
+import re
 from unittest.mock import patch

 import pytest
@@ -111,13 +111,81 @@ async def test_upload_oversized_file_returns_422(authed_client):

@pytest.mark.asyncio
 async def test_get_unknown_image_returns_404_with_envelope(client):
-    response = await client.get(f"/api/v1/images/{uuid.uuid4()}")
+    response = await client.get("/api/v1/i/NotFound")
    assert response.status_code == 404
    body = response.json()
    assert body["code"] == "image_not_found"
    assert "detail" in body


+_SHORT_ID_RE = re.compile(r"^[a-zA-Z0-9]{8}$")
+
+
+@pytest.mark.asyncio
+async def test_upload_returns_short_id(authed_client):
+    client, token = authed_client
+    data = _minimal_jpeg()
+    response = await client.post(
+        "/api/v1/images",
+        files={"file": ("s1.jpg", io.BytesIO(data), "image/jpeg")},
+        headers={"Authorization": f"Bearer {token}"},
+    )
+    assert response.status_code == 201
+    body = response.json()
+    assert "short_id" in body
+    assert _SHORT_ID_RE.match(body["short_id"]), f"short_id invalid: {body['short_id']}"
+
+
+@pytest.mark.asyncio
+async def test_upload_storage_key_equals_short_id(authed_client):
+    client, token = authed_client
+    data = _real_jpeg(color=(10, 20, 30))
+    response = await client.post(
+        "/api/v1/images",
+        files={"file": ("s2.jpg", io.BytesIO(data), "image/jpeg")},
+        headers={"Authorization": f"Bearer {token}"},
+    )
+    assert response.status_code == 201
+    body = response.json()
+    assert body["storage_key"] == body["short_id"]
+
+
+@pytest.mark.asyncio
+async def test_upload_thumbnail_key_equals_short_id_thumb(authed_client):
+    client, token = authed_client
+    data = _real_jpeg(color=(30, 60, 90))
+    response = await client.post(
+        "/api/v1/images",
+        files={"file": ("s3.jpg", io.BytesIO(data), "image/jpeg")},
+        headers={"Authorization": f"Bearer {token}"},
+    )
+    assert response.status_code == 201
+    body = response.json()
+    if body["thumbnail_key"] is not None:
+        assert body["thumbnail_key"] == f"{body['short_id']}-thumb"
+
+
+@pytest.mark.asyncio
+async def test_duplicate_upload_returns_same_short_id(authed_client):
+    client, token = authed_client
+    data = _real_jpeg(color=(200, 100, 50))
+    headers = {"Authorization": f"Bearer {token}"}
+    r1 = await client.post(
+        "/api/v1/images",
+        files={"file": ("dup_short.jpg", io.BytesIO(data), "image/jpeg")},
+        headers=headers,
+    )
+    assert r1.status_code in (200, 201)
+    r2 = await client.post(
+        "/api/v1/images",
+        files={"file": ("dup_short.jpg", io.BytesIO(data), "image/jpeg")},
+        headers=headers,
+    )
+    assert r2.status_code == 200
+    assert r2.json()["duplicate"] is True
+    assert r2.json()["short_id"] == r1.json()["short_id"]
+
+
@pytest.mark.asyncio
 async def test_upload_returns_thumbnail_key(authed_client):
    client, token = authed_client
@@ -132,6 +200,10 @@ async def test_upload_returns_thumbnail_key(authed_client):
    assert "thumbnail_key" in body
    assert body["thumbnail_key"] is not None
    assert body["thumbnail_key"].endswith("-thumb")
+    assert "file_url" in body
+    assert body["file_url"].startswith("/api/v1/i/")
+    assert "thumbnail_url" in body
+    assert body["thumbnail_url"].startswith("/api/v1/i/")


@pytest.mark.asyncio
@@ -172,3 +244,6 @@ async def test_upload_succeeds_when_thumbnail_fails(authed_client):
    assert response.status_code in (200, 201)
    body = response.json()
    assert body["thumbnail_key"] is None
+    assert "file_url" in body
+    assert body["file_url"].startswith("/api/v1/i/")
+    assert body["thumbnail_url"] is None
--- a/api/tests/unit/test_config.py
+++ b/api/tests/unit/test_config.py
@@ -1,5 +1,3 @@
-
-
 _BASE_ENV = {
    "DATABASE_URL": "postgresql+asyncpg://u:p@localhost/db",
    "S3_ENDPOINT_URL": "http://localhost:9000",
@@ -26,6 +24,7 @@ def test_settings_load_from_env(monkeypatch):
    import importlib

    import app.config as config_module
+
    importlib.reload(config_module)

    s = config_module.Settings()
@@ -43,6 +42,7 @@ def test_settings_max_upload_bytes_override(monkeypatch):
    import importlib

    import app.config as config_module
+
    importlib.reload(config_module)

    s = config_module.Settings()
@@ -55,7 +55,47 @@ def test_settings_jwt_expiry_override(monkeypatch):
    import importlib

    import app.config as config_module
+
    importlib.reload(config_module)

    s = config_module.Settings()
    assert s.jwt_expiry_seconds == 3600
+
+
+def test_api_docs_enabled_default(monkeypatch):
+    _apply_env(monkeypatch)
+
+    import importlib
+
+    import app.config as config_module
+
+    importlib.reload(config_module)
+
+    s = config_module.Settings()
+    assert s.api_docs_enabled is True
+
+
+def test_api_docs_enabled_false(monkeypatch):
+    _apply_env(monkeypatch, {"API_DOCS_ENABLED": "false"})
+
+    import importlib
+
+    import app.config as config_module
+
+    importlib.reload(config_module)
+
+    s = config_module.Settings()
+    assert s.api_docs_enabled is False
+
+
+def test_api_docs_invalid_value_defaults_to_enabled(monkeypatch):
+    _apply_env(monkeypatch, {"API_DOCS_ENABLED": "not-a-bool"})
+
+    import importlib
+
+    import app.config as config_module
+
+    importlib.reload(config_module)
+
+    s = config_module.Settings()
+    assert s.api_docs_enabled is True
--- a/api/tests/unit/test_hashing.py
+++ b/api/tests/unit/test_hashing.py
@@ -1,6 +1,6 @@
 import hashlib

-from app.utils import compute_sha256
+from app.utils import compute_sha256, generate_short_id


 def test_sha256_known_bytes():
@@ -19,3 +19,24 @@ def test_sha256_returns_64_char_hex():
    result = compute_sha256(b"test data")
    assert len(result) == 64
    assert all(c in "0123456789abcdef" for c in result)
+
+
+def test_generate_short_id_length():
+    assert len(generate_short_id()) == 8
+
+
+def test_generate_short_id_charset():
+    result = generate_short_id()
+    assert all(
+        c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" for c in result
+    )
+
+
+def test_generate_short_id_randomness():
+    assert generate_short_id() != generate_short_id()
+
+
+def test_generate_short_id_importable():
+    from app.utils import generate_short_id as fn
+
+    assert callable(fn)
--- a/api/tests/unit/test_migration.py
+++ b/api/tests/unit/test_migration.py
@@ -0,0 +1,110 @@
+"""Unit tests for migrate_to_short_ids script logic."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+
+@pytest.fixture
+def mock_image_null_short_id():
+    img = MagicMock()
+    img.id = "img-uuid-1"
+    img.short_id = None
+    img.storage_key = "oldhashkey1234567890"
+    img.thumbnail_key = "oldhashkey1234567890-thumb"
+    img.mime_type = "image/jpeg"
+    return img
+
+
+@pytest.fixture
+def mock_image_with_short_id():
+    img = MagicMock()
+    img.id = "img-uuid-2"
+    img.short_id = "AbCd1234"
+    img.storage_key = "AbCd1234"
+    img.thumbnail_key = "AbCd1234-thumb"
+    img.mime_type = "image/jpeg"
+    return img
+
+
+@pytest.mark.asyncio
+async def test_migrate_processes_image_without_short_id(mock_image_null_short_id):
+    """Images with short_id IS NULL are processed: storage copied, DB updated, old keys deleted."""
+    from scripts.migrate_to_short_ids import migrate_image
+
+    storage = MagicMock()
+    storage.get = AsyncMock(return_value=b"imagedata")
+    storage.put = AsyncMock()
+    storage.delete = AsyncMock()
+
+    session = MagicMock()
+    session.execute = AsyncMock()
+    session.flush = AsyncMock()
+
+    old_key = mock_image_null_short_id.storage_key
+    new_short_id = "NewSh123"
+    with patch("scripts.migrate_to_short_ids.generate_short_id", return_value=new_short_id):
+        result = await migrate_image(mock_image_null_short_id, storage, session)
+
+    assert result is True
+    storage.put.assert_any_call(new_short_id, b"imagedata", "image/jpeg")
+    storage.delete.assert_any_call(old_key)
+
+
+@pytest.mark.asyncio
+async def test_migrate_skips_image_with_short_id(mock_image_with_short_id):
+    """Images that already have a short_id are skipped."""
+    from scripts.migrate_to_short_ids import migrate_image
+
+    storage = MagicMock()
+    session = MagicMock()
+
+    result = await migrate_image(mock_image_with_short_id, storage, session)
+
+    assert result is False
+    storage.get.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_migrate_continues_on_storage_error(mock_image_null_short_id):
+    """If storage copy fails, error is logged and migrate_image returns False without aborting."""
+    from scripts.migrate_to_short_ids import migrate_image
+
+    storage = MagicMock()
+    storage.get = AsyncMock(side_effect=Exception("storage read error"))
+    storage.put = AsyncMock()
+    storage.delete = AsyncMock()
+
+    session = MagicMock()
+    session.execute = AsyncMock()
+    session.flush = AsyncMock()
+
+    with patch("scripts.migrate_to_short_ids.generate_short_id", return_value="ErrSh123"):
+        result = await migrate_image(mock_image_null_short_id, storage, session)
+
+    assert result is False
+    storage.put.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_migrate_summary_counts(mock_image_null_short_id, mock_image_with_short_id):
+    """run_migration reports correct migrated and skipped counts."""
+    from scripts.migrate_to_short_ids import run_migration
+
+    storage = MagicMock()
+    storage.get = AsyncMock(return_value=b"data")
+    storage.put = AsyncMock()
+    storage.delete = AsyncMock()
+
+    session = MagicMock()
+    session.execute = AsyncMock()
+    session.flush = AsyncMock()
+
+    images = [mock_image_null_short_id, mock_image_with_short_id]
+
+    with patch("scripts.migrate_to_short_ids.generate_short_id", return_value="NewSh999"):
+        migrated, skipped, failed = await run_migration(images, storage, session)
+
+    assert migrated == 1
+    assert skipped == 1
+    assert failed == 0
--- a/api/tests/unit/test_rate_limiter.py
+++ b/api/tests/unit/test_rate_limiter.py
@@ -0,0 +1,105 @@
+import ipaddress
+from unittest.mock import MagicMock
+
+from starlette.requests import Request
+
+from app.auth.rate_limiter import LoginRateLimiter, get_client_ip
+
+# ---------------------------------------------------------------------------
+# LoginRateLimiter tests
+# ---------------------------------------------------------------------------
+
+
+def make_limiter():
+    return LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+
+
+def test_not_blocked_initially():
+    assert make_limiter().is_blocked("1.2.3.4") is False
+
+
+def test_blocked_after_threshold():
+    limiter = make_limiter()
+    for _ in range(3):
+        limiter.record_failure("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is True
+
+
+def test_success_clears_failures():
+    limiter = make_limiter()
+    limiter.record_failure("1.2.3.4")
+    limiter.record_failure("1.2.3.4")
+    limiter.record_success("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is False
+
+
+def test_ips_are_isolated():
+    limiter = make_limiter()
+    for _ in range(3):
+        limiter.record_failure("1.1.1.1")
+    assert limiter.is_blocked("2.2.2.2") is False
+
+
+def test_window_resets_after_expiry():
+    import time
+
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=0, cooldown_seconds=300)
+    limiter.record_failure("1.2.3.4")
+    limiter.record_failure("1.2.3.4")
+    time.sleep(0.01)
+    limiter.record_failure("1.2.3.4")
+    # window expired — counter reset on third call, so failures = 1, not 3
+    assert limiter.is_blocked("1.2.3.4") is False
+
+
+def test_log_warning_on_lockout(caplog):
+    import logging
+
+    limiter = make_limiter()
+    with caplog.at_level(logging.WARNING, logger="app.auth.rate_limiter"):
+        for _ in range(3):
+            limiter.record_failure("5.6.7.8")
+    assert "Login blocked" in caplog.text
+    assert "5.6.7.8" in caplog.text
+
+
+# ---------------------------------------------------------------------------
+# get_client_ip tests
+# ---------------------------------------------------------------------------
+
+
+def make_request(peer: str, headers: dict) -> MagicMock:
+    req = MagicMock(spec=Request)
+    req.client.host = peer
+    req.headers = headers
+    return req
+
+
+def test_get_client_ip_no_trusted_networks_returns_peer():
+    req = make_request("203.0.113.1", {"X-Forwarded-For": "10.0.0.1"})
+    assert get_client_ip(req, []) == "203.0.113.1"
+
+
+def test_get_client_ip_trusted_peer_uses_real_ip():
+    req = make_request("10.0.0.1", {"X-Real-IP": "203.0.113.9"})
+    nets = [ipaddress.ip_network("10.0.0.0/8")]
+    assert get_client_ip(req, nets) == "203.0.113.9"
+
+
+def test_get_client_ip_real_ip_wins_over_xff():
+    # Regression: spoofed XFF must not override nginx-set X-Real-IP.
+    req = make_request("10.0.0.1", {"X-Real-IP": "203.0.113.9", "X-Forwarded-For": "1.2.3.4"})
+    nets = [ipaddress.ip_network("10.0.0.0/8")]
+    assert get_client_ip(req, nets) == "203.0.113.9"
+
+
+def test_get_client_ip_untrusted_peer_ignores_xff():
+    req = make_request("8.8.8.8", {"X-Forwarded-For": "203.0.113.5"})
+    nets = [ipaddress.ip_network("10.0.0.0/8")]
+    assert get_client_ip(req, nets) == "8.8.8.8"
+
+
+def test_get_client_ip_trusted_peer_falls_back_to_xff_when_no_real_ip():
+    req = make_request("10.0.0.1", {"X-Forwarded-For": "203.0.113.5"})
+    nets = [ipaddress.ip_network("10.0.0.0/8")]
+    assert get_client_ip(req, nets) == "203.0.113.5"
--- a/api/tests/unit/test_short_id.py
+++ b/api/tests/unit/test_short_id.py
@@ -0,0 +1,59 @@
+"""Unit tests for short_id generation, validation, and repository lookup."""
+
+import re
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+from fastapi import HTTPException
+
+from app.routers.images import _validate_short_id
+from app.utils import generate_short_id
+
+_SHORT_ID_RE = re.compile(r"^[a-zA-Z0-9]{8}$")
+
+
+def test_validate_short_id_accepts_valid():
+    _validate_short_id("AbCd1234")  # must not raise
+
+
+def test_validate_short_id_rejects_too_long():
+    with pytest.raises(HTTPException) as exc:
+        _validate_short_id("toolong!!")
+    assert exc.value.status_code == 422
+
+
+def test_validate_short_id_rejects_too_short():
+    with pytest.raises(HTTPException) as exc:
+        _validate_short_id("short")
+    assert exc.value.status_code == 422
+
+
+def test_validate_short_id_rejects_invalid_chars():
+    with pytest.raises(HTTPException) as exc:
+        _validate_short_id("has spa!")
+    assert exc.value.status_code == 422
+
+
+def test_generate_short_id_unique():
+    ids = {generate_short_id() for _ in range(100)}
+    assert len(ids) == 100  # 62**8 possible IDs make a collision in 100 draws vanishingly unlikely
+
+
+def test_repo_get_by_short_id_uses_correct_field():
+    """get_by_short_id selects on Image.short_id, not Image.id."""
+    import asyncio
+
+    from app.repositories.image_repo import ImageRepository
+
+    mock_session = MagicMock()
+    scalar = MagicMock()
+    scalar.scalar_one_or_none = MagicMock(return_value=None)
+    mock_session.execute = AsyncMock(return_value=scalar)
+
+    repo = ImageRepository(mock_session)
+    asyncio.run(repo.get_by_short_id("AbCd1234"))
+
+    call_args = mock_session.execute.call_args[0][0]
+    compiled = call_args.compile(compile_kwargs={"literal_binds": True})
+    assert "short_id" in str(compiled)
+    assert "AbCd1234" in str(compiled)
--- a/api/tests/unit/test_tags.py
+++ b/api/tests/unit/test_tags.py
@@ -2,17 +2,21 @@
 T037 — tag normalisation: uppercase → lowercase, whitespace stripped
 T038 — tag validation: rejects names > 64 chars, invalid chars
 """
+
 import pytest

 from app.repositories.tag_repo import TagRepository


-@pytest.mark.parametrize("raw,expected", [
-    ("Cat", "cat"),
-    ("  funny  ", "funny"),
-    ("REACTION", "reaction"),
-    (" MiXeD ", "mixed"),
-])
+@pytest.mark.parametrize(
+    "raw,expected",
+    [
+        ("Cat", "cat"),
+        ("  funny  ", "funny"),
+        ("REACTION", "reaction"),
+        (" MiXeD ", "mixed"),
+    ],
+)
 def test_normalise_lowercases_and_strips(raw, expected):
    assert TagRepository.normalise(raw) == expected

--- a/api/tests/unit/test_thumbnail.py
+++ b/api/tests/unit/test_thumbnail.py
@@ -1,4 +1,5 @@
 """Unit tests for thumbnail generation utility."""
+
 import io

 from PIL import Image as PILImage
--- a/api/tests/unit/test_url_construction.py
+++ b/api/tests/unit/test_url_construction.py
@@ -0,0 +1,72 @@
+import uuid
+from unittest.mock import MagicMock
+
+from app.routers.images import _image_to_dict
+
+
+def _make_image(*, thumbnail_key=None):
+    img = MagicMock()
+    img.id = uuid.UUID("00000000-0000-0000-0000-000000000001")
+    img.short_id = "AbCd1234"
+    img.hash = "abc123"
+    img.filename = "test.jpg"
+    img.mime_type = "image/jpeg"
+    img.size_bytes = 1024
+    img.width = 100
+    img.height = 100
+    img.storage_key = "abc123storagekey"
+    img.thumbnail_key = thumbnail_key
+    img.created_at.isoformat.return_value = "2026-05-09T00:00:00"
+    img.tags = []
+    return img
+
+
+def test_cdn_configured_with_thumbnail():
+    img = _make_image(thumbnail_key="abc123storagekey-thumb")
+    result = _image_to_dict(img, cdn_base="https://cdn.example.com")
+    assert result["file_url"] == "https://cdn.example.com/abc123storagekey"
+    assert result["thumbnail_url"] == "https://cdn.example.com/abc123storagekey-thumb"
+    assert result["short_id"] == "AbCd1234"
+
+
+def test_cdn_configured_no_thumbnail():
+    img = _make_image(thumbnail_key=None)
+    result = _image_to_dict(img, cdn_base="https://cdn.example.com")
+    assert result["file_url"] == "https://cdn.example.com/abc123storagekey"
+    assert result["thumbnail_url"] is None
+    assert result["short_id"] == "AbCd1234"
+
+
+def test_no_cdn_with_thumbnail():
+    img = _make_image(thumbnail_key="abc123storagekey-thumb")
+    result = _image_to_dict(img, cdn_base=None)
+    assert result["file_url"] == "/api/v1/i/AbCd1234/file"
+    assert result["thumbnail_url"] == "/api/v1/i/AbCd1234/thumbnail"
+
+
+def test_no_cdn_no_thumbnail():
+    img = _make_image(thumbnail_key=None)
+    result = _image_to_dict(img, cdn_base=None)
+    assert result["file_url"] == "/api/v1/i/AbCd1234/file"
+    assert result["thumbnail_url"] is None
+
+
+def test_cdn_trailing_slash_normalised():
+    img = _make_image(thumbnail_key="abc123storagekey-thumb")
+    result = _image_to_dict(img, cdn_base="https://cdn.example.com/")
+    assert result["file_url"] == "https://cdn.example.com/abc123storagekey"
+    assert result["thumbnail_url"] == "https://cdn.example.com/abc123storagekey-thumb"
+    assert "//" not in result["file_url"].replace("https://", "")
+
+
+def test_cdn_trailing_whitespace_normalised():
+    img = _make_image(thumbnail_key="abc123storagekey-thumb")
+    result = _image_to_dict(img, cdn_base="https://cdn.example.com  ")
+    assert result["file_url"] == "https://cdn.example.com/abc123storagekey"
+    assert result["thumbnail_url"] == "https://cdn.example.com/abc123storagekey-thumb"
+
+
+def test_short_id_in_response():
+    img = _make_image()
+    result = _image_to_dict(img, cdn_base=None)
+    assert result["short_id"] == "AbCd1234"
--- a/k8s/api/deployment.yaml
+++ b/k8s/api/deployment.yaml
@@ -0,0 +1,52 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: api
+  namespace: reactbin
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: api
+  template:
+    metadata:
+      labels:
+        app: api
+    spec:
+      initContainers:
+        - name: migrate
+          image: git.juggalol.com/juggalol/reactbin-api:v1.4.0
+          command: ["alembic", "upgrade", "head"]
+          workingDir: /app
+          envFrom:
+            - secretRef:
+                name: api-env
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1001
+      containers:
+        - name: api
+          image: git.juggalol.com/juggalol/reactbin-api:v1.4.0
+          ports:
+            - containerPort: 8000
+          envFrom:
+            - secretRef:
+                name: api-env
+          env:
+            - name: API_DOCS_ENABLED
+              value: "false"
+          livenessProbe:
+            httpGet:
+              path: /api/v1/health
+              port: 8000
+            initialDelaySeconds: 10
+            periodSeconds: 30
+          readinessProbe:
+            httpGet:
+              path: /api/v1/health
+              port: 8000
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1001
--- a/k8s/api/service.yaml
+++ b/k8s/api/service.yaml
@@ -0,0 +1,13 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: api
+  namespace: reactbin
+spec:
+  type: ClusterIP
+  selector:
+    app: api
+  ports:
+    - name: http
+      port: 8000
+      targetPort: 8000
--- a/k8s/ingress.yaml
+++ b/k8s/ingress.yaml
@@ -0,0 +1,34 @@
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: reactbin
+  namespace: reactbin
+  annotations:
+    cert-manager.io/cluster-issuer: letsencrypt-prod
+    kubernetes.io/tls-acme: "true"
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+    nginx.ingress.kubernetes.io/proxy-body-size: "52m"
+spec:
+  ingressClassName: nginx-public
+  tls:
+    - hosts:
+        - reactbin.juggalol.com
+      secretName: reactbin-tls
+  rules:
+    - host: reactbin.juggalol.com
+      http:
+        paths:
+          - path: /api/
+            pathType: Prefix
+            backend:
+              service:
+                name: api
+                port:
+                  number: 8000
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: ui
+                port:
+                  number: 8080
--- a/k8s/minio/init-job.yaml
+++ b/k8s/minio/init-job.yaml
@@ -0,0 +1,24 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: minio-init
+  namespace: reactbin
+spec:
+  template:
+    spec:
+      restartPolicy: OnFailure
+      containers:
+        - name: mc
+          image: minio/mc:latest
+          # mc runs as root by default; FR-013 exception documented in spec
+          securityContext:
+            runAsNonRoot: false
+          command:
+            - sh
+            - -c
+            - |
+              mc alias set local http://minio:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
+              mc mb --ignore-existing local/reactbin
+          envFrom:
+            - secretRef:
+                name: minio-credentials
--- a/k8s/minio/service.yaml
+++ b/k8s/minio/service.yaml
@@ -0,0 +1,16 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: minio
+  namespace: reactbin
+spec:
+  type: ClusterIP
+  selector:
+    app: minio
+  ports:
+    - name: api
+      port: 9000
+      targetPort: 9000
+    - name: console
+      port: 9001
+      targetPort: 9001
--- a/k8s/minio/statefulset.yaml
+++ b/k8s/minio/statefulset.yaml
@@ -0,0 +1,59 @@
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: minio
+  namespace: reactbin
+spec:
+  serviceName: minio
+  replicas: 1
+  selector:
+    matchLabels:
+      app: minio
+  template:
+    metadata:
+      labels:
+        app: minio
+    spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        runAsGroup: 1000
+        fsGroup: 1000
+      containers:
+        - name: minio
+          image: minio/minio:latest
+          args:
+            - server
+            - /data
+            - --console-address
+            - ":9001"
+          ports:
+            - containerPort: 9000
+            - containerPort: 9001
+          envFrom:
+            - secretRef:
+                name: minio-credentials
+          livenessProbe:
+            httpGet:
+              path: /minio/health/live
+              port: 9000
+            initialDelaySeconds: 10
+            periodSeconds: 30
+          readinessProbe:
+            httpGet:
+              path: /minio/health/ready
+              port: 9000
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          volumeMounts:
+            - name: data
+              mountPath: /data
+  volumeClaimTemplates:
+    - metadata:
+        name: data
+      spec:
+        accessModes:
+          - ReadWriteOnce
+        resources:
+          requests:
+            storage: 10Gi
--- a/k8s/namespace.yaml
+++ b/k8s/namespace.yaml
@@ -0,0 +1,4 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: reactbin
--- a/k8s/ui/deployment.yaml
+++ b/k8s/ui/deployment.yaml
@@ -0,0 +1,29 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ui
+  namespace: reactbin
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: ui
+  template:
+    metadata:
+      labels:
+        app: ui
+    spec:
+      containers:
+        - name: ui
+          image: git.juggalol.com/juggalol/reactbin-ui:v1.4.0
+          ports:
+            - containerPort: 8080
+          livenessProbe:
+            httpGet:
+              path: /
+              port: 8080
+            initialDelaySeconds: 10
+            periodSeconds: 30
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 101  # nginxinc/nginx-unprivileged default UID
--- a/k8s/ui/service.yaml
+++ b/k8s/ui/service.yaml
@@ -0,0 +1,13 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: ui
+  namespace: reactbin
+spec:
+  type: ClusterIP
+  selector:
+    app: ui
+  ports:
+    - name: http
+      port: 8080
+      targetPort: 8080
--- a/k8s/vault/api-secret.yaml
+++ b/k8s/vault/api-secret.yaml
@@ -0,0 +1,18 @@
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: api-secret
+  namespace: reactbin
+spec:
+  vaultAuthRef: reactbin-vault-auth
+  mount: kv
+  type: kv-v2
+  # Required Vault keys at this path:
+  #   DATABASE_URL, JWT_SECRET_KEY, OWNER_USERNAME, OWNER_PASSWORD,
+  #   S3_ENDPOINT_URL, S3_BUCKET_NAME, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY,
+  #   API_BASE_URL
+  path: reactbin/api/config
+  refreshAfter: 1h
+  destination:
+    name: api-env
+    create: true
--- a/k8s/vault/minio-secret.yaml
+++ b/k8s/vault/minio-secret.yaml
@@ -0,0 +1,16 @@
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: minio-secret
+  namespace: reactbin
+spec:
+  vaultAuthRef: reactbin-vault-auth
+  mount: kv
+  type: kv-v2
+  # Required Vault keys at this path:
+  #   MINIO_ROOT_USER, MINIO_ROOT_PASSWORD
+  path: reactbin/minio/credentials
+  refreshAfter: 1h
+  destination:
+    name: minio-credentials
+    create: true
--- a/k8s/vault/vault-auth.yaml
+++ b/k8s/vault/vault-auth.yaml
@@ -0,0 +1,22 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: vso-reactbin
+  namespace: reactbin
+---
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultAuth
+metadata:
+  name: reactbin-vault-auth
+  namespace: reactbin
+spec:
+  method: kubernetes
+  mount: kubernetes
+  kubernetes:
+    # The operator must create this role in Vault and bind it to the
+    # vso-reactbin service account in the reactbin namespace with read access
+    # to both reactbin/api/config and reactbin/minio/credentials.
+    role: vso-reactbin
+    serviceAccount: vso-reactbin
+    audiences:
+      - vault
--- a/scripts/test_lockout.sh
+++ b/scripts/test_lockout.sh
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+#
+# Test reactbin's login rate limiter and check whether X-Forwarded-For (XFF) spoofing bypasses it.
+#
+# Phase 1: Send 6 bad login attempts in quick succession.
+#   Attempts 1-5 should return 401 (invalid credentials).
+#   Attempt 6 should return 429 (rate limited) — the limiter blocks after
+#   max_failures=5 within the window.
+#
+# Phase 2: Send a 7th bad attempt with a spoofed X-Forwarded-For header
+#   pointing at a different IP. If the lockout keys correctly on the trusted
+#   client IP, this should still return 429 (same client, still locked).
+#   If reactbin trusts client-supplied XFF blindly, this would return 401
+#   instead — the spoof would make the request look like a different client
+#   that hasn't accumulated failures.
+#
+# Interpretation:
+#   - 429 on attempt 7  → lockout is correctly identifying the client
+#   - 401 on attempt 7  → XFF injection succeeded; server treated us as a
+#                          new client because we set a fake XFF
+#
+# Note: this script is ONLY useful when run against the public origin path
+# where XFF spoofing is potentially possible. It does not exercise the
+# Cloudflare-proxied path because Cloudflare strips/replaces XFF before
+# forwarding to origin.
+
+set -u
+
+URL="${URL:-https://reactbin.juggalol.com/api/v1/auth/token}"
+SPOOFED_IP="${SPOOFED_IP:-198.51.100.99}"  # TEST-NET-2, never routed
+USERNAME="${USERNAME:-not-a-real-user}"
+PASSWORD="${PASSWORD:-not-a-real-password}"
+
+# JSON body for a bad login. Username/password chosen to be obviously fake;
+# adjust if your auth provider has its own validation that would 400 instead
+# of 401 on these values.
+BODY=$(printf '{"username":"%s","password":"%s"}' "$USERNAME" "$PASSWORD")
+
+echo "Target: $URL"
+echo "Body:   $BODY"
+echo
+
+echo "=== Phase 1: 6 bad logins from real client IP ==="
+for i in 1 2 3 4 5 6; do
+    code=$(curl -sS -o /dev/null -w '%{http_code}' \
+        -X POST \
+        -H 'Content-Type: application/json' \
+        --data "$BODY" \
+        "$URL")
+    echo "Attempt $i: HTTP $code"
+done
+
+echo
+echo "=== Phase 2: 7th attempt with spoofed X-Forwarded-For ==="
+echo "Setting X-Forwarded-For: $SPOOFED_IP"
+code=$(curl -sS -o /dev/null -w '%{http_code}' \
+    -X POST \
+    -H 'Content-Type: application/json' \
+    -H "X-Forwarded-For: $SPOOFED_IP" \
+    --data "$BODY" \
+    "$URL")
+echo "Attempt 7: HTTP $code"
+
+echo
+echo "Interpretation:"
+echo "  Attempt 7 = 429  → lockout correctly tracks real client; XFF spoof ineffective"
+echo "  Attempt 7 = 401  → XFF spoof succeeded; server believed the fake client IP"
--- a/specs/009-login-rate-limiting/checklists/requirements.md
+++ b/specs/009-login-rate-limiting/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Login Brute-Force Protection
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-06
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Spec is ready for `/speckit-plan`.
--- a/specs/009-login-rate-limiting/contracts/auth.md
+++ b/specs/009-login-rate-limiting/contracts/auth.md
@@ -0,0 +1,85 @@
+# API Contract: Authentication
+
+## POST /api/v1/auth/token
+
+Authenticates the owner and returns a JWT access token.
+
+**This endpoint is modified by feature 009** to enforce brute-force protection.
+All previous behaviour is preserved. One new response code (429) is added.
+
+### Request
+
+```
+POST /api/v1/auth/token
+Content-Type: application/json
+```
+
+```json
+{
+  "username": "string",
+  "password": "string"
+}
+```
+
+### Responses
+
+#### 200 OK — Credentials accepted
+
+```json
+{
+  "access_token": "<jwt>",
+  "token_type": "bearer",
+  "expires_in": 86400
+}
+```
+
+Side effect: resets the failure counter for the caller's IP address.
+
+---
+
+#### 401 Unauthorized — Credentials rejected
+
+```json
+{
+  "detail": "Invalid credentials",
+  "code": "invalid_credentials"
+}
+```
+
+Side effect: increments the failure counter for the caller's IP address. If the
+counter reaches `LOGIN_MAX_FAILURES`, subsequent requests from this IP will receive
+429 until the cooldown expires.
+
+---
+
+#### 429 Too Many Requests — Source blocked after repeated failures
+
+**This response is new in feature 009.**
+
+```
+HTTP/1.1 429 Too Many Requests
+Retry-After: 900
+Content-Type: application/json
+```
+
+```json
+{
+  "detail": "Too many failed login attempts. Please try again later.",
+  "code": "login_rate_limited"
+}
+```
+
+The `Retry-After` header value is the configured cooldown duration in seconds (default: 900).
+It reflects the maximum possible wait, not the exact remaining lockout time.
+
+No credentials are verified when this response is returned — the request is
+rejected before authentication is attempted.
+
+---
+
+### Notes
+
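+- The failure counter is keyed by the resolved client IP address: the TCP peer by default,
+  or the first `X-Forwarded-For` entry (falling back to `X-Real-IP`) when the TCP peer is
+  listed in `LOGIN_TRUSTED_PROXY_IPS`.
+- `LOGIN_MAX_FAILURES` and `LOGIN_WINDOW_SECONDS` are not disclosed in any response.
+  `LOGIN_COOLDOWN_SECONDS` is exposed only through the `Retry-After` header.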
+- Counters are in-memory and reset on process restart.
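+
+A client could act on the 429 response roughly as follows. This is only a sketch:
+`seconds_to_wait` and its `default` parameter are hypothetical names, and it assumes the
+response headers are available as a plain dict keyed by the exact string `Retry-After`.
+
+```python
+def seconds_to_wait(status_code: int, headers: dict[str, str], default: int = 60) -> int:
+    """Return how long a client should pause before retrying a login."""
+    if status_code != 429:
+        return 0  # 200 and 401 carry no retry hint in this contract
+    raw = headers.get("Retry-After", "")
+    # Retry-After may also be an HTTP date; this sketch only handles the
+    # delta-seconds form shown in the contract above.
+    return int(raw) if raw.isdigit() else default
+
+
+print(seconds_to_wait(429, {"Retry-After": "900"}))  # 900
+```
+
+Because the header carries the configured cooldown rather than the remaining time, a
+client that waits this long may wait longer than strictly necessary.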
--- a/specs/009-login-rate-limiting/data-model.md
+++ b/specs/009-login-rate-limiting/data-model.md
@@ -0,0 +1,53 @@
+# Data Model: Login Brute-Force Protection
+
+## Overview
+
+This feature introduces no new database tables. The only data entity is a transient,
+in-memory rate-limit record that does not survive process restarts. This is intentional
+(see research.md Decision 3).
+
+---
+
+## Entity: Rate-Limit Record (in-memory only)
+
+| Field          | Type    | Description                                                                 |
+|----------------|---------|-----------------------------------------------------------------------------|
+| `failures`     | int     | Count of consecutive failed login attempts in the current window            |
+| `window_start` | float   | Unix timestamp marking when the current counting window began               |
+| `blocked_until`| float   | Unix timestamp after which the source is no longer blocked; 0.0 if not blocked |
+
+**Keyed by**: resolved client IP address string (e.g., `"192.168.1.1"`); see `get_client_ip()` in `rate_limiter.py` for resolution logic
+
+**Lifecycle**:
+1. Record is created on the first failed login from a source.
+2. `failures` increments on each subsequent failure within the window.
+3. When `failures >= LOGIN_MAX_FAILURES`, `blocked_until` is set to `now + LOGIN_COOLDOWN_SECONDS`.
+4. When `blocked_until` has passed, the record is deleted on the next request from that source.
+5. A successful login deletes the record immediately (failure counter reset).
+6. If `now - window_start > LOGIN_WINDOW_SECONDS` without triggering lockout, the counter resets within the existing record.
+
+**State machine**:
+
+```
+[no record]
+     │ first failure
+     ▼
+[tracking] ──── failure N ≥ max ────► [blocked]
+     │                                     │
+     │ success / window expires             │ cooldown expires
+     ▼                                     ▼
+[no record] ◄─────────────────────── [no record]
+```
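+
+The lifecycle and state machine above can be walked through with an explicit clock. The
+sketch below is illustrative only: `Record`, `on_failure`, and `state` are hypothetical
+names, the constants mirror the documented defaults, and time is passed in as a plain
+number instead of being read from `time.time()`.
+
+```python
+from dataclasses import dataclass
+
+MAX_FAILURES, WINDOW, COOLDOWN = 5, 300, 900  # documented defaults
+
+
+@dataclass
+class Record:
+    failures: int
+    window_start: float
+    blocked_until: float = 0.0
+
+
+def on_failure(rec: Record | None, now: float) -> Record:
+    """Lifecycle steps 1-3 and 6: create, reset a stale window, count, block."""
+    if rec is None:
+        rec = Record(failures=0, window_start=now)
+    if now - rec.window_start > WINDOW:
+        rec.failures, rec.window_start = 0, now
+    rec.failures += 1
+    if rec.failures >= MAX_FAILURES:
+        rec.blocked_until = now + COOLDOWN
+    return rec
+
+
+def state(rec: Record | None, now: float) -> str:
+    """Map a record onto the labels used in the state machine."""
+    if rec is None or 0.0 < rec.blocked_until <= now:
+        return "no record"  # step 4: an expired block is dropped on the next request
+    return "blocked" if rec.blocked_until > now else "tracking"
+
+
+rec = None
+for t in range(5):  # five failures, one second apart
+    rec = on_failure(rec, float(t))
+print(state(rec, 5.0))             # blocked
+print(state(rec, 4.0 + COOLDOWN))  # no record
+```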
+
+---
+
+## Configuration Entity: Rate-Limit Settings
+
+Stored as environment variables; loaded via `app.config.Settings`:
+
+| Env Var                    | Default | Description                                              |
+|----------------------------|---------|----------------------------------------------------------|
+| `LOGIN_MAX_FAILURES`       | `5`     | Failures within window before lockout                    |
+| `LOGIN_WINDOW_SECONDS`     | `300`   | Fixed counting window duration in seconds (5 minutes)    |
+| `LOGIN_COOLDOWN_SECONDS`   | `900`   | Lockout duration in seconds after threshold exceeded (15 minutes) |
+| `LOGIN_TRUSTED_PROXY_IPS`  | `""`    | Comma-separated IPs/CIDRs of trusted upstream proxies (e.g., `10.0.0.0/8`); empty = disabled |
--- a/specs/009-login-rate-limiting/plan.md
+++ b/specs/009-login-rate-limiting/plan.md
@@ -0,0 +1,388 @@
+# Implementation Plan: Login Brute-Force Protection
+
+**Branch**: `009-login-rate-limiting` | **Date**: 2026-05-06 | **Spec**: [spec.md](spec.md)  
+**Input**: Feature specification from `specs/009-login-rate-limiting/spec.md`
+
+## Summary
+
+Add failure-counting brute-force protection to the login endpoint (`POST /api/v1/auth/token`).
+After a configurable number of consecutive failed attempts from the same resolved client IP,
+the endpoint returns HTTP 429 with a `Retry-After` header for a configurable cooldown period.
+A successful login resets the counter. All thresholds are configurable via environment variables.
+When deployed behind a reverse proxy (nginx, Kubernetes ingress), a `LOGIN_TRUSTED_PROXY_IPS`
+setting enables extraction of the real client IP from `X-Forwarded-For`. No new infrastructure
+(no Redis, no new DB table) — counters live in process memory.
+
+---
+
+## Technical Context
+
+**Language/Version**: Python 3.12+  
+**Primary Dependencies**: FastAPI, pydantic-settings (already in use); no new dependencies added  
+**Storage**: In-memory `dict` (no persistence across restarts — intentional)  
+**Testing**: pytest + pytest-asyncio (existing test infrastructure)  
+**Target Platform**: Linux server (Docker)  
+**Project Type**: Web service (API only — this feature has no UI surface)  
+**Performance Goals**: Rate limiter adds negligible overhead (dict lookup + lock acquisition; sub-millisecond)  
+**Constraints**: Must not add new runtime service dependencies; must not change any auth behaviour for non-blocked sources  
+**Scale/Scope**: Single process, single user; in-memory store is sufficient
+
+---
+
+## Constitution Check
+
+| Principle | Status | Notes |
+|-----------|--------|-------|
+| §2.4 Auth abstraction (AuthProvider interface) | ✅ Pass | Rate limiter is a guard *before* `JWTAuthProvider.verify_credentials()`, not a bypass of the interface |
+| §2.5 DB abstraction (repository layer) | ✅ Pass | No database access; in-memory only |
+| §2.6 No speculative abstraction | ✅ Pass | Concrete `LoginRateLimiter` class, no interface; only one implementation planned |
+| §3.3 Error envelope (`detail` + `code`) | ✅ Pass | 429 response uses `{"detail": "...", "code": "login_rate_limited"}` |
+| §5.1 TDD | ✅ Required | Tasks follow red → green order |
+| §5.2 Integration tests against PostgreSQL | ✅ Pass | Integration test for the login endpoint will run against the Docker PostgreSQL stack |
+| §7.2 Environment configuration | ✅ Pass | `LOGIN_MAX_FAILURES`, `LOGIN_WINDOW_SECONDS`, `LOGIN_COOLDOWN_SECONDS`, `LOGIN_TRUSTED_PROXY_IPS` from env vars |
+| §7.3 Linting (ruff) | ✅ Required | All new files must pass `ruff check` |
+
+**Gate result**: No violations. Cleared to proceed.
+
+---
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/009-login-rate-limiting/
+├── plan.md              ← this file
+├── research.md          ← decisions on approach
+├── data-model.md        ← rate-limit record entity
+├── quickstart.md        ← curl runbook
+├── contracts/
+│   └── auth.md          ← updated POST /api/v1/auth/token with 429
+└── tasks.md             ← generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+api/
+├── app/
+│   ├── auth/
+│   │   ├── rate_limiter.py          ← NEW: LoginRateLimiter class
+│   │   ├── jwt_provider.py          (unchanged)
+│   │   ├── noop.py                  (unchanged)
+│   │   └── provider.py              (unchanged)
+│   ├── config.py                    ← add login_max_failures, login_window_seconds, login_cooldown_seconds, login_trusted_proxy_ips
+│   ├── main.py                      ← init LoginRateLimiter in lifespan, attach to app.state
+│   └── routers/
+│       └── auth.py                  ← check rate limit before auth, record outcome
+└── tests/
+    ├── unit/
+    │   └── test_rate_limiter.py     ← NEW: unit tests for LoginRateLimiter logic
+    └── integration/
+        └── test_login_rate_limit.py ← NEW: integration tests for 429 behaviour via HTTP
+```
+
+---
+
+## Implementation Detail
+
+### `api/app/auth/rate_limiter.py`
+
+```python
+import ipaddress
+import logging
+import time
+from dataclasses import dataclass, field
+from ipaddress import IPv4Network, IPv6Network
+from threading import Lock
+
+from starlette.requests import Request
+
+logger = logging.getLogger(__name__)
+
+
+def get_client_ip(
+    request: Request,
+    trusted_networks: list[IPv4Network | IPv6Network],
+) -> str:
+    """Return the resolved client IP, honouring X-Forwarded-For when the
+    TCP peer is a trusted upstream proxy. Falls back to the TCP peer address
+    when no trusted networks are configured or the peer is not in the list."""
+    peer = request.client.host if request.client else "unknown"
+    if trusted_networks and peer != "unknown":
+        try:
+            peer_addr = ipaddress.ip_address(peer)
+            if any(peer_addr in net for net in trusted_networks):
+                xff = request.headers.get("X-Forwarded-For", "").split(",")[0].strip()
+                if xff:
+                    return xff
+                real_ip = request.headers.get("X-Real-IP", "").strip()
+                if real_ip:
+                    return real_ip
+        except ValueError:
+            pass
+    return peer
+
+
+@dataclass
+class _Record:
+    failures: int = 0
+    window_start: float = field(default_factory=time.time)
+    blocked_until: float = 0.0
+
+
+class LoginRateLimiter:
+    def __init__(
+        self,
+        max_failures: int = 5,
+        window_seconds: int = 300,
+        cooldown_seconds: int = 900,
+    ) -> None:
+        self._max = max_failures
+        self._window = window_seconds
+        self._cooldown = cooldown_seconds
+        self._store: dict[str, _Record] = {}
+        self._lock = Lock()
+
+    @property
+    def cooldown_seconds(self) -> int:
+        return self._cooldown
+
+    def is_blocked(self, ip: str) -> bool:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                return False
+            if rec.blocked_until > now:
+                return True
+            if rec.blocked_until > 0:
+                del self._store[ip]
+            return False
+
+    def record_failure(self, ip: str) -> None:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                rec = _Record(window_start=now)
+                self._store[ip] = rec
+            if now - rec.window_start > self._window:
+                rec.failures = 0
+                rec.window_start = now
+            rec.failures += 1
+            if rec.failures >= self._max:
+                rec.blocked_until = now + self._cooldown
+                logger.warning(
+                    "Login blocked for %s after %d failures", ip, rec.failures
+                )
+
+    def record_success(self, ip: str) -> None:
+        with self._lock:
+            self._store.pop(ip, None)
+```
+
+### `api/app/config.py` additions
+
+```python
+login_max_failures: int = 5
+login_window_seconds: int = 300
+login_cooldown_seconds: int = 900
+login_trusted_proxy_ips: str = ""  # comma-separated IPs/CIDRs; empty = disabled
+```
+
+### `api/app/main.py` lifespan update
+
+```python
+import ipaddress
+
+from app.auth.rate_limiter import LoginRateLimiter
+
+@asynccontextmanager
+async def lifespan(application: FastAPI):
+    settings = get_settings()
+    application.state.login_rate_limiter = LoginRateLimiter(
+        max_failures=settings.login_max_failures,
+        window_seconds=settings.login_window_seconds,
+        cooldown_seconds=settings.login_cooldown_seconds,
+    )
+    trusted_networks = []
+    for part in settings.login_trusted_proxy_ips.split(","):
+        part = part.strip()
+        if part:
+            try:
+                trusted_networks.append(ipaddress.ip_network(part, strict=False))
+            except ValueError:
+                pass  # invalid entry — skip silently
+    application.state.login_trusted_networks = trusted_networks
+    # ... existing DB setup unchanged
+    engine = get_engine()
+    async with engine.begin() as conn:
+        await conn.run_sync(Base.metadata.create_all)
+    yield
+    await engine.dispose()
+```
+
+### `api/app/routers/auth.py` update
+
+```python
+from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel
+
+from app.auth.jwt_provider import JWTAuthProvider
+from app.auth.rate_limiter import LoginRateLimiter, get_client_ip
+from app.dependencies import get_jwt_auth
+
+router = APIRouter(tags=["auth"])
+
+
+class LoginRequest(BaseModel):
+    username: str
+    password: str
+
+
+class TokenResponse(BaseModel):
+    access_token: str
+    token_type: str = "bearer"
+    expires_in: int
+
+
+@router.post("/auth/token", response_model=TokenResponse)
+async def login(
+    request: Request,
+    body: LoginRequest,
+    auth: JWTAuthProvider = Depends(get_jwt_auth),
+):
+    limiter: LoginRateLimiter = request.app.state.login_rate_limiter
+    ip: str = get_client_ip(request, request.app.state.login_trusted_networks)
+
+    if limiter.is_blocked(ip):
+        return JSONResponse(
+            status_code=429,
+            content={
+                "detail": "Too many failed login attempts. Please try again later.",
+                "code": "login_rate_limited",
+            },
+            headers={"Retry-After": str(limiter.cooldown_seconds)},
+        )
+
+    if not auth.verify_credentials(body.username, body.password):
+        limiter.record_failure(ip)
+        raise HTTPException(
+            status_code=401,
+            detail={"detail": "Invalid credentials", "code": "invalid_credentials"},
+        )
+
+    limiter.record_success(ip)
+    token = auth.create_token()
+    return TokenResponse(
+        access_token=token,
+        token_type="bearer",
+        expires_in=auth._expiry_seconds,
+    )
+```
+
+### `api/tests/unit/test_rate_limiter.py` (representative cases)
+
+```python
+import time
+import pytest
+from app.auth.rate_limiter import LoginRateLimiter
+
+
+def test_not_blocked_initially():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    assert limiter.is_blocked("1.2.3.4") is False
+
+
+def test_blocked_after_threshold():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    for _ in range(3):
+        limiter.record_failure("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is True
+
+
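+def test_success_clears_failures():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    limiter.record_failure("1.2.3.4")
+    limiter.record_failure("1.2.3.4")
+    limiter.record_success("1.2.3.4")
+    # Two more failures would reach the threshold if the counter had not been reset.
+    limiter.record_failure("1.2.3.4")
+    limiter.record_failure("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is False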
+
+
+def test_ips_are_isolated():
+    limiter = LoginRateLimiter(max_failures=2, window_seconds=60, cooldown_seconds=300)
+    limiter.record_failure("1.1.1.1")
+    limiter.record_failure("1.1.1.1")
+    assert limiter.is_blocked("2.2.2.2") is False
+```
+
+### `api/tests/integration/test_login_rate_limit.py` (representative cases)
+
+```python
+import pytest
+from httpx import AsyncClient
+
+# Uses the 'client' fixture (NoOpAuthProvider) from conftest — sufficient for this
+# endpoint since we're testing the rate-limit layer, not auth correctness.
+# The login endpoint instantiates its own limiter via app.state, so we need
+# the full ASGI app.
+
+BAD_CREDS = {"username": "attacker", "password": "wrong"}
+
+
+@pytest.mark.asyncio
+async def test_repeated_failures_trigger_429(client: AsyncClient):
+    # Use a custom limiter with low threshold to avoid slow tests
+    # (the app.state.login_rate_limiter is set in lifespan; override for test)
+    from app.auth.rate_limiter import LoginRateLimiter
+    from app.main import app
+    original = app.state.login_rate_limiter
+    app.state.login_rate_limiter = LoginRateLimiter(
+        max_failures=3, window_seconds=60, cooldown_seconds=30
+    )
+    try:
+        for _ in range(3):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        assert resp.status_code == 429
+        assert resp.json()["code"] == "login_rate_limited"
+        assert "Retry-After" in resp.headers
+    finally:
+        app.state.login_rate_limiter = original
+```
+
+---
+
+## Implementation Phases
+
+### Phase 1 (MVP — P1): Blocking after repeated failures
+
+1. Add `login_max_failures`, `login_window_seconds`, `login_cooldown_seconds`, `login_trusted_proxy_ips` to `api/app/config.py`
+2. Create `api/app/auth/rate_limiter.py` with `LoginRateLimiter` and `get_client_ip()`
+3. Initialize rate limiter and parse trusted networks in `api/app/main.py` lifespan; attach both to `app.state`
+4. Update `api/app/routers/auth.py` to resolve client IP via `get_client_ip()`, then check + record outcomes
+5. Unit tests: `api/tests/unit/test_rate_limiter.py`
+6. Integration tests: `api/tests/integration/test_login_rate_limit.py`
+
+### Phase 2 (US2 — observability): Logging and response hints
+
+Delivered as part of Phase 1 (the `logger.warning(...)` call and `Retry-After` header
+are embedded in the same implementation). No separate phase needed.
+
+---
+
+## Environment Variables to Add to `.env.example`
+
+```dotenv
+# Login brute-force protection
+LOGIN_MAX_FAILURES=5
+LOGIN_WINDOW_SECONDS=300
+LOGIN_COOLDOWN_SECONDS=900
+# Comma-separated IPs/CIDRs of trusted upstream proxies (e.g. nginx ingress pod CIDR).
+# Leave empty when not behind a reverse proxy.
+LOGIN_TRUSTED_PROXY_IPS=
+```
+
+These are optional (have defaults) so existing `.env` files without them continue working.
--- a/specs/009-login-rate-limiting/quickstart.md
+++ b/specs/009-login-rate-limiting/quickstart.md
@@ -0,0 +1,112 @@
+# Quickstart: Login Brute-Force Protection
+
+## Prerequisites
+
+- API running (via `docker compose up` or locally with `.env` set)
+- `curl` available
+
+---
+
+## Scenario 1: Trigger the rate limiter
+
+Send 6 consecutive failed login attempts (default threshold is 5):
+
+```bash
+for i in $(seq 1 6); do
+  printf "Attempt %s: " "$i"
+  curl -s -o /dev/null -w "%{http_code}\n" \
+    -X POST http://localhost:8000/api/v1/auth/token \
+    -H "Content-Type: application/json" \
+    -d '{"username": "wrong", "password": "wrong"}'
+done
+```
+
+Expected output:
+```
+Attempt 1: 401
+Attempt 2: 401
+Attempt 3: 401
+Attempt 4: 401
+Attempt 5: 401
+Attempt 6: 429
+```
+
+The 6th attempt returns 429. Inspect the headers:
+
+```bash
+curl -i -X POST http://localhost:8000/api/v1/auth/token \
+  -H "Content-Type: application/json" \
+  -d '{"username": "wrong", "password": "wrong"}'
+```
+
+Expected headers include:
+```
+HTTP/1.1 429 Too Many Requests
+Retry-After: 900
+```
+
+Expected body:
+```json
+{"detail": "Too many failed login attempts. Please try again later.", "code": "login_rate_limited"}
+```
+
+---
+
+## Scenario 2: Successful login resets the counter
+
+Make some failed attempts, then log in with valid credentials:
+
+```bash
+# Fail twice
+for i in 1 2; do
+  curl -s -o /dev/null -w "fail $i: %{http_code}\n" \
+    -X POST http://localhost:8000/api/v1/auth/token \
+    -H "Content-Type: application/json" \
+    -d '{"username": "wrong", "password": "wrong"}'
+done
+
+# Succeed — resets counter
+curl -s -o /dev/null -w "success: %{http_code}\n" \
+  -X POST http://localhost:8000/api/v1/auth/token \
+  -H "Content-Type: application/json" \
+  -d '{"username": "'"$OWNER_USERNAME"'", "password": "'"$OWNER_PASSWORD"'"}'
+
+# Now fail 5 more times — counter was reset, so no 429 yet
+for i in $(seq 1 5); do
+  curl -s -o /dev/null -w "fail after reset $i: %{http_code}\n" \
+    -X POST http://localhost:8000/api/v1/auth/token \
+    -H "Content-Type: application/json" \
+    -d '{"username": "wrong", "password": "wrong"}'
+done
+```
+
+Expected: all "fail after reset" lines return 401 (not 429), confirming the counter was reset.
+
+---
+
+## Scenario 3: Observe log output
+
+While triggering the rate limiter (Scenario 1), watch API logs:
+
+```bash
+docker compose logs -f api
+```
+
+After the threshold is crossed you should see a line like:
+
+```
+WARNING  app.auth.rate_limiter:rate_limiter.py:NN Login blocked for 172.18.0.1 after 5 failures
+```
+
+---
+
+## Environment variable overrides
+
+To test with a lower threshold without code changes:
+
+```bash
+LOGIN_MAX_FAILURES=2 LOGIN_WINDOW_SECONDS=60 LOGIN_COOLDOWN_SECONDS=30 \
+  uvicorn app.main:app --reload
+```
+
+Then only 2 failures trigger the lockout, and it clears after 30 seconds.
--- a/specs/009-login-rate-limiting/research.md
+++ b/specs/009-login-rate-limiting/research.md
@@ -0,0 +1,67 @@
+# Research: Login Brute-Force Protection
+
+## Decision 1: Library vs. custom implementation
+
+**Decision**: Custom in-memory failure tracker (no new library dependency)
+
+**Rationale**: The requirement is to count *failed* login attempts specifically and reset on success — not to rate-limit all requests regardless of outcome. Popular libraries like `slowapi` count all requests to a route, which would break FR-004 (reset on success) without significant workarounds. A purpose-built 60-line class is simpler, more auditable, and has no dependency footprint.
+
+**Alternatives considered**:
+- `slowapi` (built on `limits`): Counts all requests, not failures. Requires patching the exception handler to decrement on success — fragile and non-obvious.
+- `slowapi` with a custom key function: Could be done, but the library's storage model doesn't expose a "reset this key" API in a clean way.
+- Redis-backed counter: Overkill for a single-user personal app with one instance. No new infrastructure justified.
+
+---
+
+## Decision 2: Fixed window vs. sliding window
+
+**Decision**: Fixed window with per-source reset on successful login
+
+**Rationale**: Fixed window is simpler to implement correctly and sufficient for this use case. The main attack — rapid sequential guessing — is fully addressed. The known "burst at window boundary" weakness is irrelevant here because: (a) the cooldown period is separate from the counting window, and (b) a successful login resets the counter entirely.
+
+**Alternatives considered**:
+- Sliding window: More accurate, but adds complexity (requires storing timestamps of each request). The marginal security benefit doesn't justify the implementation cost for a personal single-user app.
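+
+For comparison, the rejected sliding-window alternative might look like the sketch below.
+`SlidingWindowCounter` is a hypothetical class, not part of this feature; it shows the
+extra state (one timestamp per failure) that the fixed-window record avoids.
+
+```python
+from collections import deque
+
+
+class SlidingWindowCounter:
+    """Hypothetical alternative: keeps one timestamp per recorded failure."""
+
+    def __init__(self, max_failures: int, window_seconds: float) -> None:
+        self._max = max_failures
+        self._window = window_seconds
+        self._stamps: deque[float] = deque()
+
+    def record_failure(self, now: float) -> bool:
+        """Record a failure at `now`; return True once the threshold is reached."""
+        self._stamps.append(now)
+        # Drop failures that have fallen out of the trailing window.
+        while self._stamps and now - self._stamps[0] > self._window:
+            self._stamps.popleft()
+        return len(self._stamps) >= self._max
+
+
+counter = SlidingWindowCounter(max_failures=3, window_seconds=60)
+print([counter.record_failure(t) for t in (0, 50, 70, 100)])
+# [False, False, False, True]
+```
+
+The failure at t=0 has left the trailing window by t=70, so the threshold is only reached
+at t=100, when three failures fall inside the last 60 seconds.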
+
+---
+
+## Decision 3: In-memory backing store
+
+**Decision**: Python `dict` keyed by source IP, protected by a threading `Lock`
+
+**Rationale**: The application runs as a single process. In-memory storage means counters reset on restart — this is acceptable and matches the "fail open" assumption in the spec. No new infrastructure (Redis, database table) is required.
+
+**Alternatives considered**:
+- Database-backed counters: Persistent across restarts, but adds a DB round-trip to every login request (including successful ones). Not justified.
+- Redis: Distributed-safe and persistent, but requires a new service dependency. Out of scope for a personal single-instance app.
+
+---
+
+## Decision 4: Source identifier
+
+**Decision**: `request.client.host` (the TCP peer address) by default; forwarded headers are honoured only when the peer is listed in `LOGIN_TRUSTED_PROXY_IPS`
+
+**Rationale**: The spec (FR-008) requires that `X-Forwarded-For` be ignored unless the TCP peer is a trusted proxy. `request.client.host` in Starlette/FastAPI is the address of the TCP peer, which an attacker cannot change by sending arbitrary headers.
+
+**Alternatives considered**:
+- Always trusting the first `X-Forwarded-For` value: Spoofable whenever the request does not arrive through a trusted proxy (an attacker can set arbitrary header values).
+- Always trusting `X-Real-IP`: Same spoofing concern.
+
+---
+
+## Decision 5: 429 response and Retry-After header
+
+**Decision**: Return HTTP 429 with `{"detail": "...", "code": "login_rate_limited"}` and a `Retry-After` header set to the configured cooldown duration in seconds
+
+**Rationale**: HTTP 429 is the standard "Too Many Requests" status. The `Retry-After` header is explicitly mentioned in the spec (US2 acceptance scenario); RFC 6585 permits, but does not require, it on 429 responses. Setting it to the *configured* cooldown (not the exact remaining time) satisfies FR-005: it doesn't reveal precise expiry, just the maximum wait. The response body follows §3.3 of the constitution (error envelope with `detail` and `code`).
+
+---
+
+## Decision 6: Default threshold values
+
+**Decision**: `LOGIN_MAX_FAILURES=5`, `LOGIN_WINDOW_SECONDS=300` (5 min), `LOGIN_COOLDOWN_SECONDS=900` (15 min)
+
+**Rationale**: These values fall within the range commonly used for account-lockout policies in web apps. 5 attempts is enough for legitimate typos but makes brute-force infeasible at human scale. A 5-minute counting window matches typical "I fat-fingered my password" retry patterns. A 15-minute cooldown is a meaningful deterrent without locking out a legitimate owner indefinitely.
+
+**Alternatives considered**:
+- 3 failures / 60 s window / 300 s cooldown: More aggressive, but too likely to lock out the legitimate owner on a bad day.
+- 10 failures: Too permissive for a brute-force defense.
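+
+As a rough sanity check on these defaults, the sketch below estimates how many guesses a
+single source could make per day. `max_guesses_per_day` is a hypothetical helper, and the
+model assumes each burst of failures is instantaneous and is followed by one full
+cooldown, so the result is an upper bound and not a measurement.
+
+```python
+def max_guesses_per_day(max_failures: int, cooldown_seconds: int) -> int:
+    """Upper bound on password guesses per source per day."""
+    # Each cycle is one burst of `max_failures` guesses plus one cooldown.
+    cycles_per_day = 86_400 // cooldown_seconds
+    return cycles_per_day * max_failures
+
+
+print(max_guesses_per_day(5, 900))  # 480
+```
+
+Under these assumptions the defaults cap one source at 480 guesses per day.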
--- a/specs/009-login-rate-limiting/spec.md
+++ b/specs/009-login-rate-limiting/spec.md
@@ -0,0 +1,84 @@
+# Feature Specification: Login Brute-Force Protection
+
+**Feature Branch**: `009-login-rate-limiting`  
+**Created**: 2026-05-06  
+**Status**: Draft  
+**Input**: User description: "Login API endpoints should be rate limited or otherwise protected against brute force attacks"
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Repeated failed logins are blocked (Priority: P1)
+
+An attacker (or misconfigured client) sending many rapid login attempts with the wrong password is slowed or blocked before they can exhaustively guess credentials. After a threshold number of consecutive failures from the same source, the system refuses further attempts for a cooldown period and returns a clear, non-leaking error.
+
+**Why this priority**: Directly prevents credential-stuffing and brute-force attacks against the sole privileged account. Without this, the owner account is exposed to automated password guessing with no friction.
+
+**Independent Test**: Send more than the allowed number of failed login requests in quick succession and confirm that subsequent attempts are rejected with a rate-limit or lockout response — without knowing or changing the real password.
+
+**Acceptance Scenarios**:
+
+1. **Given** an attacker sends N+1 failed login attempts within the configured window, **When** the (N+1)th request arrives, **Then** the system returns an error response indicating the request is blocked (not the normal "invalid credentials" error) and does not process the login attempt.
+2. **Given** a legitimate user has been temporarily blocked after too many failures, **When** the cooldown period elapses and they retry with the correct password, **Then** they are logged in successfully.
+3. **Given** a legitimate user makes a few failed attempts and then waits beyond the counting window, **When** they retry within the next window, **Then** their failure counter resets and they are not blocked.
+
+---
+
+### User Story 2 - Operators can observe and reason about blocking activity (Priority: P2)
+
+When the protection triggers, the system produces enough observable signal (log entries, response metadata) that an operator can confirm the feature is working, diagnose false positives, and tune thresholds — without exposing sensitive details to the client.
+
+**Why this priority**: Invisible security controls are unmanageable. Operators need to know the system is doing what it claims, and blocked legitimate users need a clear (but not exploitable) explanation.
+
+**Independent Test**: Trigger the rate limiter and confirm that: (a) the response body or headers communicate that the request was blocked and when the client may retry; (b) the server logs an entry identifying the blocked source and the reason.
+
+**Acceptance Scenarios**:
+
+1. **Given** a source is blocked, **When** they receive the rejection response, **Then** the response indicates they should wait before retrying (e.g., a `Retry-After` hint) without disclosing the exact threshold values.
+2. **Given** the rate limiter fires, **When** an operator inspects server logs, **Then** there is a log entry at WARNING level or above recording the blocked source and timestamp.
+
+---
+
+### Edge Cases
+
+- What happens when a distributed attacker rotates IPs to avoid per-IP limits?
+- How does the system behave if the backing store for rate-limit counters is temporarily unavailable — does it fail open (allow all) or fail closed (block all)?
+- Are IPv6 addresses and IPv4-mapped-IPv6 addresses treated consistently?
+- Does a successful login reset the failure counter for that source?
+- What happens if many legitimate users share a NAT/proxy IP (e.g., corporate network)?
+- What if `LOGIN_TRUSTED_PROXY_IPS` is configured to include an IP that an external attacker controls? (An attacker could then spoof `X-Forwarded-For` and rotate fake source IPs to bypass the rate limiter; operators must only list genuinely trusted upstream infrastructure.)
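+
+For the IPv4-mapped IPv6 edge case, one possible approach is to normalise addresses
+before they are used as a rate-limit key. The sketch below is an assumption, not part of
+the planned implementation: `normalise_ip` is a hypothetical helper built on the standard
+library `ipaddress` module.
+
+```python
+import ipaddress
+
+
+def normalise_ip(raw: str) -> str:
+    """Collapse IPv4-mapped IPv6 addresses onto their IPv4 form."""
+    addr = ipaddress.ip_address(raw)  # raises ValueError on malformed input
+    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
+        return str(addr.ipv4_mapped)
+    return str(addr)
+
+
+print(normalise_ip("::ffff:192.168.1.1"))  # 192.168.1.1
+print(normalise_ip("192.168.1.1"))         # 192.168.1.1
+```
+
+With a step like this, both spellings of the same client would share one failure counter.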
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The system MUST enforce a maximum number of failed login attempts per source identifier (the resolved client IP address) within a rolling time window before blocking further attempts.
+- **FR-002**: Once a source exceeds the failure threshold, the system MUST reject subsequent login requests for a configurable cooldown period, returning a distinct response (not the normal invalid-credentials response).
+- **FR-003**: After the cooldown period expires, the system MUST permit the source to attempt login again, resetting its failure count.
+- **FR-004**: A successful login MUST reset the failure counter for that source, preventing accumulation of old failures from blocking future legitimate access.
+- **FR-005**: The rejection response MUST NOT reveal the specific threshold values or remaining lockout duration in a way that aids an attacker in timing their attempts, but MUST provide enough information (e.g., "try again later") for a legitimate user to understand the situation.
+- **FR-006**: The system MUST log a structured warning event whenever a source is blocked, including the source identifier and timestamp.
+- **FR-007**: Rate-limit thresholds (maximum attempts, window duration, cooldown duration) MUST be configurable without code changes.
+- **FR-008**: The system MUST support a configurable list of trusted upstream proxy IP addresses and CIDR ranges. When the TCP peer address matches a trusted proxy, the resolved client IP MUST be extracted from the `X-Forwarded-For` request header (first entry) or, if absent, `X-Real-IP`. When no trusted proxies are configured, the TCP peer address MUST be used directly and forwarded-IP headers MUST be ignored.
+
+### Key Entities
+
+- **Rate-limit record**: Tracks the number of consecutive failures and the window start time for a given source identifier; expires automatically after the cooldown period.
+- **Source identifier**: The resolved client IP address used to key rate-limit records. When `LOGIN_TRUSTED_PROXY_IPS` is empty (default), this is the TCP peer address. When one or more proxy IPs/CIDRs are configured and the TCP peer matches, the first `X-Forwarded-For` entry (or `X-Real-IP`) is used instead.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: An automated script sending 100 consecutive failed login requests completes with at least 90 of those requests rejected after the threshold is crossed — verified in a controlled test environment.
+- **SC-002**: A legitimate user who has been temporarily blocked can successfully log in within 5 minutes of the cooldown period expiring without any manual intervention.
+- **SC-003**: Zero information about threshold values or exact lockout expiry is present in blocked response bodies or headers.
+- **SC-004**: Every blocking event produces a corresponding log entry; 100% of triggered blocking events are observable in logs during testing.
+
+## Assumptions
+
+- The application has a single login endpoint used by all clients (the owner login introduced in feature 004).
+- Source identification uses the resolved client IP address. By default (when `LOGIN_TRUSTED_PROXY_IPS` is empty) this is the TCP peer address. When one or more proxy IPs/CIDRs are configured, the first entry of `X-Forwarded-For` (or `X-Real-IP`) is used instead — but only when the TCP peer is in the trusted list, preventing header spoofing by external clients.
+- If the rate-limit backing store is unavailable, the system fails open (allows the attempt through) rather than blocking all logins — this preserves the owner's access, which is critical for a single-user admin application.
+- No CAPTCHA or multi-factor step is in scope; protection is purely count/time-based.
+- The feature targets the login endpoint only; other endpoints are out of scope.
+- The single-user nature of the app means IP-based identification is sufficient — there is no need for per-username lockout, and using IP (rather than username) avoids contributing to username enumeration risk.
--- a/specs/009-login-rate-limiting/tasks.md
+++ b/specs/009-login-rate-limiting/tasks.md
@@ -0,0 +1,120 @@
+# Tasks: Login Brute-Force Protection
+
+**Input**: Design documents from `specs/009-login-rate-limiting/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/auth.md ✅, quickstart.md ✅
+
+**Tests**: TDD is non-negotiable (§5.1). Every test task appears before the implementation task it covers. For each red step, run the test and confirm it fails before proceeding to the implementation.
+
+**Organization**: Phase 1 adds env vars; Phase 2 adds config fields (shared by both stories); Phase 3 implements the core blocking behaviour (US1 MVP); Phase 4 adds observability-specific test coverage (US2); Phase 5 is polish.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel with other [P] tasks in the same phase
+- **[Story]**: Which user story this task belongs to
+- Exact file paths included in every task description
+
+---
+
+## Phase 1: Setup
+
+- [X] T001 Add a `# Login brute-force protection` comment block with `LOGIN_MAX_FAILURES=5`, `LOGIN_WINDOW_SECONDS=300`, `LOGIN_COOLDOWN_SECONDS=900`, and `LOGIN_TRUSTED_PROXY_IPS=` (empty by default, with an inline comment explaining it accepts comma-separated IPs/CIDRs) to both `.env.example` and `.env.test.example` at the repo root
+
+---
+
+## Phase 2: Foundational
+
+**Purpose**: Add the four new settings fields, which are required before any story implementation.
+
+- [X] T002 Add `login_max_failures: int = 5`, `login_window_seconds: int = 300`, `login_cooldown_seconds: int = 900`, `login_trusted_proxy_ips: str = ""` to the `Settings` class in `api/app/config.py` (append after `owner_password`)
+
+**Checkpoint**: `api/app/config.py` accepts all four new env vars with defaults.
+
+---
+
+## Phase 3: User Story 1 — Repeated failed logins are blocked (Priority: P1) 🎯 MVP
+
+**Goal**: After `LOGIN_MAX_FAILURES` consecutive failed login attempts from the same source IP within `LOGIN_WINDOW_SECONDS`, `POST /api/v1/auth/token` returns HTTP 429 for `LOGIN_COOLDOWN_SECONDS`. A successful login resets the counter.
+
+**Independent Test**: `cd api && python -m pytest tests/unit/test_rate_limiter.py tests/integration/test_login_rate_limit.py::test_repeated_failures_trigger_429 tests/integration/test_login_rate_limit.py::test_success_resets_counter tests/integration/test_login_rate_limit.py::test_429_has_retry_after_header tests/integration/test_login_rate_limit.py::test_xff_header_ignored_when_no_trusted_networks -v` — all pass.
+
+### Tests for User Story 1 (TDD red — write first, confirm failure before T005)
+
+- [X] T003 [P] [US1] Create `api/tests/unit/test_rate_limiter.py` with ten failing unit tests — import `LoginRateLimiter` and `get_client_ip` from `app.auth.rate_limiter`; for `LoginRateLimiter` (instantiate with `max_failures=3, window_seconds=60, cooldown_seconds=300`): `test_not_blocked_initially`, `test_blocked_after_threshold`, `test_success_clears_failures`, `test_ips_are_isolated`, `test_window_resets_after_expiry`, `test_log_warning_on_lockout` (caplog at WARNING level: call `record_failure()` until threshold, assert `"Login blocked" in caplog.text` and IP in log output); for `get_client_ip` (construct a mock using `from unittest.mock import MagicMock` and `from starlette.requests import Request`: `req = MagicMock(spec=Request); req.client.host = "10.0.0.1"; req.headers = {"X-Forwarded-For": "203.0.113.5"}`): `test_get_client_ip_no_trusted_networks_returns_peer` (empty `trusted_networks=[]` → returns `req.client.host`), `test_get_client_ip_trusted_peer_uses_xff` (peer `"10.0.0.1"` in trusted CIDR `"10.0.0.0/8"` → returns `"203.0.113.5"`), `test_get_client_ip_untrusted_peer_ignores_xff` (peer `"8.8.8.8"` not in trusted CIDR `"10.0.0.0/8"` → returns `"8.8.8.8"` despite XFF), `test_get_client_ip_trusted_peer_falls_back_to_real_ip` (peer trusted, no XFF header, `X-Real-IP: "203.0.113.9"` → returns `"203.0.113.9"`); run `python -m pytest tests/unit/test_rate_limiter.py -v` and confirm `ImportError` or `ModuleNotFoundError` (red)
+- [X] T004 [P] [US1] Create `api/tests/integration/test_login_rate_limit.py` with four failing integration tests; each must override both `app.state.login_rate_limiter` (fresh `LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=30)`) and `app.state.login_trusted_networks` (set to `[]` for all four tests — the `ASGITransport` peer is `"testclient"`, not a valid IP, so trusted-network matching can't be exercised here; proxy extraction is fully covered by T003 unit tests) via try/finally: (1) `test_repeated_failures_trigger_429` — POST three bad-credential requests then assert fourth returns 429 with `resp.json()["code"] == "login_rate_limited"`; (2) `test_success_resets_counter` — two failures → one valid login using `{"username": os.environ["OWNER_USERNAME"], "password": os.environ["OWNER_PASSWORD"]}` (matching conftest.py defaults: `testowner`/`testpassword`) → three more failures → assert all three return 401, not 429; (3) `test_429_has_retry_after_header` — trigger lockout (three failures), then assert `"Retry-After" in resp.headers` and `int(resp.headers["Retry-After"]) > 0`; (4) `test_xff_header_ignored_when_no_trusted_networks` — send three bad-cred requests with `headers={"X-Forwarded-For": "1.2.3.4"}` then a fourth with `headers={"X-Forwarded-For": "9.9.9.9"}` — assert the fourth returns 429 (not 401), proving the limiter tracked the real peer `"testclient"` for all requests and XFF was ignored; run `python -m pytest tests/integration/test_login_rate_limit.py -v` and confirm failure (red)
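+
+For orientation, the resolution rule that the four `get_client_ip` cases in T003 pin down can be sketched as below. This is a minimal, stdlib-only sketch and not the shipped module: `resolve_client_ip` and `parse_trusted_networks` are hypothetical names, and the function takes the peer address and a plain header mapping instead of a Starlette `Request` so that it runs without extra dependencies.
+
+```python
+from __future__ import annotations
+
+import ipaddress
+from collections.abc import Mapping
+from ipaddress import IPv4Network, IPv6Network
+
+
+def parse_trusted_networks(raw: str) -> list[IPv4Network | IPv6Network]:
+    """Parse comma-separated IPs/CIDRs; empty and invalid parts are skipped."""
+    networks: list[IPv4Network | IPv6Network] = []
+    for part in raw.split(","):
+        part = part.strip()
+        if not part:
+            continue
+        try:
+            # strict=False accepts host addresses such as "10.0.0.1/8".
+            networks.append(ipaddress.ip_network(part, strict=False))
+        except ValueError:
+            continue
+    return networks
+
+
+def resolve_client_ip(
+    peer: str,
+    headers: Mapping[str, str],
+    trusted_networks: list[IPv4Network | IPv6Network],
+) -> str:
+    """Return the address that should key the rate-limit record.
+
+    Assumption: `headers` is a plain mapping with exact-case keys; a real
+    Starlette `Request.headers` object is case-insensitive.
+    """
+    if not trusted_networks:
+        return peer  # no proxies configured: forwarded headers are ignored
+    try:
+        peer_ip = ipaddress.ip_address(peer)
+    except ValueError:
+        return peer  # e.g. "testclient" or "unknown" is not a valid IP
+    if not any(peer_ip in network for network in trusted_networks):
+        return peer  # untrusted peer: forwarded headers are ignored
+    first_hop = headers.get("X-Forwarded-For", "").split(",")[0].strip()
+    if first_hop:
+        return first_hop
+    return headers.get("X-Real-IP", "").strip() or peer
+
+
+if __name__ == "__main__":
+    trusted = parse_trusted_networks("10.0.0.0/8")
+    xff = {"X-Forwarded-For": "203.0.113.5"}
+    print(resolve_client_ip("10.0.0.1", xff, []))       # peer, no trusted list
+    print(resolve_client_ip("10.0.0.1", xff, trusted))  # first XFF entry
+    print(resolve_client_ip("8.8.8.8", xff, trusted))   # untrusted peer
+```
+
+The non-IP branch is why the integration tests in T004 keep `login_trusted_networks` empty: a peer such as `"testclient"` can never match a trusted network.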
+
+### Implementation for User Story 1
+
+- [X] T005 [US1] Create `api/app/auth/rate_limiter.py` with two exports: (1) `get_client_ip(request: Request, trusted_networks: list[IPv4Network | IPv6Network]) -> str` — imports `ipaddress`, `from ipaddress import IPv4Network, IPv6Network`, `from starlette.requests import Request`; extracts `peer = request.client.host if request.client else "unknown"`; if `trusted_networks` is non-empty and peer is parseable as an IP address and falls within any trusted network, returns first `X-Forwarded-For` entry (strip whitespace) or `X-Real-IP` value, otherwise returns `peer`; wraps `ipaddress.ip_address(peer)` in `try/except ValueError` and falls back to `peer` on error; (2) `LoginRateLimiter` class: `__init__(self, max_failures: int = 5, window_seconds: int = 300, cooldown_seconds: int = 900)` storing params as `_max`, `_window`, `_cooldown`; `_store: dict[str, _Record]` and `_lock: threading.Lock`; `@dataclass _Record` with `failures: int = 0`, `window_start: float = field(default_factory=time.time)`, `blocked_until: float = 0.0`; `is_blocked(ip: str) -> bool`, `record_failure(ip: str) -> None` (logs WARNING on lockout), `record_success(ip: str) -> None`, `cooldown_seconds` property; stdlib imports: `import ipaddress, logging, time`, `from dataclasses import dataclass, field`, `from threading import Lock`
+- [X] T006 [US1] Update `api/app/main.py` lifespan: add `import ipaddress` at top; import `LoginRateLimiter` from `app.auth.rate_limiter`; inside `lifespan` before `engine = get_engine()`, consolidate to `settings = get_settings()` (remove the existing bare `get_settings()` call), then set `application.state.login_rate_limiter = LoginRateLimiter(max_failures=settings.login_max_failures, window_seconds=settings.login_window_seconds, cooldown_seconds=settings.login_cooldown_seconds)`; then parse `settings.login_trusted_proxy_ips` — split on `","`, strip each part, skip empty strings, call `ipaddress.ip_network(part, strict=False)` inside a `try/except ValueError` (skip invalid entries silently), collect results into `trusted_networks: list`; set `application.state.login_trusted_networks = trusted_networks`
+- [X] T007 [US1] Update `api/app/routers/auth.py` login endpoint: add `Request` to FastAPI imports and add `from fastapi.responses import JSONResponse`; add `from app.auth.rate_limiter import LoginRateLimiter, get_client_ip`; add `request: Request` as first parameter to `login()`; extract `limiter: LoginRateLimiter = request.app.state.login_rate_limiter` and `ip: str = get_client_ip(request, request.app.state.login_trusted_networks)`; add guard block — if `limiter.is_blocked(ip)`: return `JSONResponse(status_code=429, content={"detail": "Too many failed login attempts. Please try again later.", "code": "login_rate_limited"}, headers={"Retry-After": str(limiter.cooldown_seconds)})`; after `verify_credentials` returns False: call `limiter.record_failure(ip)` before the existing `HTTPException`; after `auth.create_token()`: call `limiter.record_success(ip)` before returning `TokenResponse`
+- [X] T008 [US1] Verify TDD green: run `cd api && python -m pytest tests/unit/test_rate_limiter.py -v` — all 10 pass; run `make test-integration` — all tests pass including `test_repeated_failures_trigger_429`, `test_success_resets_counter`, `test_429_has_retry_after_header`, and `test_xff_header_ignored_when_no_trusted_networks`
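+
+The limiter described in T005 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the shipped `api/app/auth/rate_limiter.py`: the `clock` parameter is a hypothetical addition that lets the example be exercised without sleeping, and the exact wording of the log message is assumed (T003 only requires that it contains `"Login blocked"` and the IP).
+
+```python
+from __future__ import annotations
+
+import logging
+import time
+from dataclasses import dataclass, field
+from threading import Lock
+from typing import Callable
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class _Record:
+    failures: int = 0
+    window_start: float = field(default_factory=time.time)
+    blocked_until: float = 0.0
+
+
+class LoginRateLimiter:
+    def __init__(
+        self,
+        max_failures: int = 5,
+        window_seconds: int = 300,
+        cooldown_seconds: int = 900,
+        clock: Callable[[], float] = time.time,  # assumption: injectable clock
+    ) -> None:
+        self._max = max_failures
+        self._window = window_seconds
+        self._cooldown = cooldown_seconds
+        self._clock = clock
+        self._store: dict[str, _Record] = {}
+        self._lock = Lock()
+
+    @property
+    def cooldown_seconds(self) -> int:
+        return self._cooldown
+
+    def is_blocked(self, ip: str) -> bool:
+        now = self._clock()
+        with self._lock:
+            record = self._store.get(ip)
+            if record is None:
+                return False
+            if record.blocked_until > now:
+                return True
+            if record.blocked_until:
+                # Cooldown has expired: drop the record so counting restarts.
+                del self._store[ip]
+            return False
+
+    def record_failure(self, ip: str) -> None:
+        now = self._clock()
+        with self._lock:
+            record = self._store.get(ip)
+            stale = record is not None and (
+                now - record.window_start > self._window
+                or 0.0 < record.blocked_until <= now
+            )
+            if record is None or stale:
+                record = _Record(window_start=now)
+                self._store[ip] = record
+            record.failures += 1
+            if record.failures >= self._max:
+                record.blocked_until = now + self._cooldown
+                logger.warning("Login blocked for %s after %d failures", ip, record.failures)
+
+    def record_success(self, ip: str) -> None:
+        with self._lock:
+            self._store.pop(ip, None)
+```
+
+With `max_failures=3`, the third failure sets the block, so the fourth request is the first to see `is_blocked()` return `True`; this matches the sequence asserted by `test_repeated_failures_trigger_429`. State lives in process memory, so in this sketch each worker process would keep its own counters.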
+
+**Checkpoint**: Brute-force blocking is live. Automated repeated failures are stopped after threshold; the owner can still log in after cooldown; unit and integration tests pass.
+
+---
+
+## Phase 4: User Story 2 — Operators can observe blocking activity (Priority: P2)
+
+**Goal**: The 429 response includes a `Retry-After` header with a positive integer; the response body `code` is `"login_rate_limited"` and contains no threshold numeric values; server logs a WARNING when blocking triggers.
+
+**Independent Test**: Trigger the rate limiter (already works from Phase 3) and assert `Retry-After` header is present in the response and `code` field is `"login_rate_limited"`.
+
+### Tests for User Story 2 (TDD red — extend existing file)
+
+- [X] T009 [US2] Add one test to `api/tests/integration/test_login_rate_limit.py` targeting observability properties not yet covered: `test_429_body_shape` — override `app.state.login_rate_limiter` with a fresh `LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=30)` via try/finally (same isolation pattern as T004), trigger lockout (three failures), then assert `resp.json() == {"detail": "Too many failed login attempts. Please try again later.", "code": "login_rate_limited"}` (exact match — confirms no threshold values leak and shape is correct); confirm this test is green immediately against the US1 implementation (T007 already returns this exact body)
+
+**Checkpoint**: US2 observability properties are explicitly exercised by integration tests; a future regression in the Retry-After header or code field will be caught.
+
+---
+
+## Phase 5: Polish & Cross-Cutting Concerns
+
+- [X] T010 Run `cd api && ruff check app/auth/rate_limiter.py app/routers/auth.py app/config.py app/main.py tests/unit/test_rate_limiter.py tests/integration/test_login_rate_limit.py` — fix any violations
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Phase 1 (Setup)**: No external dependencies — can start immediately
+- **Phase 2 (Foundational)**: No external dependencies — can start immediately (parallel with Phase 1)
+- **Phase 3 (US1)**: Depends on Phase 2 (T002 must exist before T006 can use `settings.login_max_failures`)
+- **Phase 4 (US2)**: Depends on Phase 3 (tests verify behaviour implemented in T007)
+- **Phase 5 (Polish)**: Depends on all prior phases
+
+### Within Phase 3
+
+- T003 ∥ T004 (different files, no dependency — write tests in parallel)
+- T005 after T003, T004 (implement after tests confirm they fail)
+- T006 ∥ T007 after T005 (both import from `rate_limiter.py`; write to different files — `main.py` and `auth.py`; T006 sets `app.state.login_trusted_networks` which T007's router reads)
+- T008 after T005, T006, T007 (verify all pass)
+
+### Execution Order Summary
+
+```
+Step 1: T001 ∥ T002 (setup + foundational — parallel, different files)
+Step 2: T003 ∥ T004 (write failing tests — parallel)
+Step 3: T005 (implement LoginRateLimiter — after red tests confirmed)
+Step 4: T006 ∥ T007 (wire limiter into app — parallel, different files)
+Step 5: T008 (verify green)
+Step 6: T009 (US2 observability tests — verify green immediately)
+Step 7: T010 (ruff clean)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 — the blocker)
+
+1. Complete T001–T002 (config setup)
+2. Complete T003–T008 (core blocking)
+3. **Validate**: Run `make test-integration`; all 88 existing tests still pass and the 4 new rate-limit integration tests from T004 pass
+4. US2 adds verification coverage for already-implemented observability features
+
+### Incremental Delivery
+
+- After Phase 3: Brute-force attacks on the login endpoint are blocked — core security net is in place
+- After Phase 4: Observability properties are explicitly tested — regressions in headers/logs will be caught
+- After Phase 5: Lint clean, ready for merge
--- a/specs/010-api-prod-dockerfile/checklists/requirements.md
+++ b/specs/010-api-prod-dockerfile/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Production-Grade API Container Image
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-07
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Ready for `/speckit-plan`.
--- a/specs/010-api-prod-dockerfile/contracts/container.md
+++ b/specs/010-api-prod-dockerfile/contracts/container.md
@@ -0,0 +1,122 @@
+# Contract: Production API Container Image
+
+This document defines the observable interface of the `reactbin-api-prod` container image. Any orchestration layer (Kubernetes manifests, Docker Compose, CI pipeline) MUST be written against this contract.
+
+---
+
+## Network Interface
+
+| Property | Value |
+|----------|-------|
+| Protocol | HTTP/1.1 |
+| Port | 8000 (TCP) |
+| Bind address | `0.0.0.0` (all interfaces inside the container) |
+
+---
+
+## Health Check
+
+The container exposes a health check at the existing API health endpoint:
+
+```
+GET /api/v1/health
+```
+
+**Success response** (`200 OK`):
+```json
+{ "status": "ok" }
+```
+
+The container declares a built-in `HEALTHCHECK` with the following defaults:
+
+| Parameter | Value |
+|-----------|-------|
+| Interval | 30s |
+| Timeout | 5s |
+| Start period | 10s |
+| Retries | 3 |
+
+Orchestrators that define their own probes (e.g. Kubernetes `livenessProbe` / `readinessProbe`) SHOULD use this same endpoint.
+
+---
+
+## Required Environment Variables
+
+All configuration is supplied at runtime via environment variables. The image contains no defaults for secret or environment-specific values.
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `JWT_SECRET_KEY` | HS256 signing key for bearer tokens | `change-me-long-random-string` |
+| `OWNER_USERNAME` | Username of the single owner account | `owner` |
+| `OWNER_PASSWORD` | Password of the single owner account | `change-me` |
+| `DATABASE_URL` | PostgreSQL connection URL (asyncpg scheme) | `postgresql+asyncpg://user:pass@host:5432/db` |
+| `S3_ENDPOINT_URL` | S3-compatible object storage endpoint | `https://s3.amazonaws.com` |
+| `S3_BUCKET_NAME` | Storage bucket name | `reactbin-prod` |
+| `S3_ACCESS_KEY_ID` | Storage access key | `AKIAIOSFODNN7EXAMPLE` |
+| `S3_SECRET_ACCESS_KEY` | Storage secret key | `wJalrXUtnFEMI/K7MDENG` |
+| `S3_REGION` | Storage region | `us-east-1` |
+
+**Optional environment variables** (safe defaults apply):
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `JWT_EXPIRY_SECONDS` | `86400` | Token lifetime in seconds |
+| `MAX_UPLOAD_BYTES` | `52428800` | Maximum upload file size (50 MiB) |
+| `LOGIN_MAX_FAILURES` | `5` | Brute-force lock threshold |
+| `LOGIN_WINDOW_SECONDS` | `300` | Failure counting window |
+| `LOGIN_COOLDOWN_SECONDS` | `900` | Lock duration after threshold |
+| `LOGIN_TRUSTED_PROXY_IPS` | `` | Comma-separated trusted proxy CIDRs |
+| `API_BASE_URL` | _(not required at runtime)_ | Used only by client tooling |
+
+**Startup failure behaviour**: If a required variable is absent, the application exits with a non-zero code before accepting any requests. The error is logged to stderr identifying the missing variable.
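+
+The fail-fast behaviour above can be illustrated with a short stdlib-only sketch. It is an assumption-laden illustration, not the application's actual mechanism: the real checks may live in the `Settings` class in `api/app/config.py`, and `REQUIRED_VARS`, `missing_required_vars`, and `check_environment` are hypothetical names.
+
+```python
+import os
+import sys
+from collections.abc import Mapping
+
+# Names taken from the "Required Environment Variables" table above.
+REQUIRED_VARS = (
+    "JWT_SECRET_KEY",
+    "OWNER_USERNAME",
+    "OWNER_PASSWORD",
+    "DATABASE_URL",
+    "S3_ENDPOINT_URL",
+    "S3_BUCKET_NAME",
+    "S3_ACCESS_KEY_ID",
+    "S3_SECRET_ACCESS_KEY",
+    "S3_REGION",
+)
+
+
+def missing_required_vars(env: Mapping[str, str]) -> list[str]:
+    """Return the required variable names that are absent from `env`."""
+    return [name for name in REQUIRED_VARS if name not in env]
+
+
+def check_environment(env: Mapping[str, str] = os.environ) -> None:
+    """Exit non-zero, naming the missing variables on stderr, before serving."""
+    missing = missing_required_vars(env)
+    if missing:
+        print(
+            f"Missing required environment variable(s): {', '.join(missing)}",
+            file=sys.stderr,
+        )
+        raise SystemExit(1)
+```
+
+Calling `check_environment()` before the server starts would give an orchestrator a clear crash-loop signal instead of a container that starts but cannot serve requests.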
+
+---
+
+## Signal Handling
+
+| Signal | Behaviour |
+|--------|-----------|
+| `SIGTERM` | Stop accepting new connections; drain in-flight requests; exit 0 within 30s |
+| `SIGKILL` | Immediate termination (OS-level; no graceful drain possible) |
+
+Kubernetes deployments SHOULD set `terminationGracePeriodSeconds` to at least 30 so that the full drain window is available.
+
+---
+
+## Process Identity
+
+| Property | Value |
+|----------|-------|
+| User | `appuser` |
+| UID | `1001` |
+| GID | `1001` |
+| Root privileges | None |
+
+The container MUST NOT be run with `--privileged` or as UID 0.
+
+---
+
+## Filesystem
+
+- **Working directory**: `/app`
+- **Application source**: `/app/app/`
+- **Virtual environment**: `/app/.venv/`
+- **No writable state**: The container requires no persistent local storage. All state is in PostgreSQL and S3.
+- **Read-only root**: The container is compatible with `--read-only` (no writes to the filesystem at runtime).
+
+---
+
+## Logging
+
+All log output is written to **stdout** (info/debug) and **stderr** (warnings/errors). No log files are written inside the container. The container runtime log driver captures all output without additional configuration.
+
+---
+
+## Image Tags
+
+| Tag pattern | Meaning |
+|-------------|---------|
+| `reactbin-api-prod:latest` | Latest build from `master` |
+| `reactbin-api-prod:<git-sha>` | Immutable build for a specific commit |
+
+Deployments SHOULD pin to a specific git SHA tag, not `latest`.
--- a/specs/010-api-prod-dockerfile/plan.md
+++ b/specs/010-api-prod-dockerfile/plan.md
@@ -0,0 +1,242 @@
+# Implementation Plan: Production-Grade API Container Image
+
+**Branch**: `010-api-prod-dockerfile` | **Date**: 2026-05-07 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `specs/010-api-prod-dockerfile/spec.md`
+
+## Summary
+
+Produce a production-ready `api/Dockerfile.prod` using a two-stage build: a uv builder stage that installs lockfile-pinned, production-only dependencies into a virtual environment, and a lean `python:3.12-slim` runtime stage that contains only the venv, application source, and `curl` for health checks. The runtime process runs as a non-root user (UID 1001), handles SIGTERM gracefully via uvicorn's built-in drain, and logs exclusively to stdout/stderr. Behavioral verification is automated via a shell script (`api/tests/build/verify_production_image.sh`) written before the Dockerfile (§5.1 TDD).
+
+---
+
+## Technical Context
+
+**Language/Version**: Python 3.12 (existing API), Docker multi-stage build  
+**Build tool**: uv (lockfile: `api/uv.lock`, already committed)  
+**Base images**: `ghcr.io/astral-sh/uv:python3.12-bookworm-slim` (builder), `python:3.12-slim` (runtime)  
+**Testing**: Shell verification script (`verify_production_image.sh`) + `make verify-prod` target  
+**Target Platform**: linux/amd64 container (Kubernetes or Docker host)  
+**Performance Goals**: Container starts and passes health check within 30s; rebuild from warm cache in under 60s  
+**Constraints**: No root process, no hardcoded secrets, no dev deps in final image, compatible with `--read-only` filesystem  
+**Scale/Scope**: Single-file addition (`Dockerfile.prod`) + shell test + two Makefile targets; zero changes to existing source code
+
+---
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-checked post-design below.*
+
+| Principle | Status | Notes |
+|-----------|--------|-------|
+| §5.1 TDD non-negotiable | **COMPLIANT** | `verify_production_image.sh` written before `Dockerfile.prod`; script fails (red) because the build file is absent, then passes (green) after |
+| §5.2 Test pyramid | **COMPLIANT** | Shell verification script is the integration-level test for this build artefact; no unit tests applicable (no Python business logic added) |
+| §5.4 CI must pass | **COMPLIANT** | `make verify-prod` target is runnable in host CI (requires Docker on the runner, which the existing `make test-integration` already requires) |
+| §6 Tech Stack — Docker | **COMPLIANT** | Docker + Docker Compose are mandated; this adds a production Dockerfile within that constraint |
+| §7.1 One-command local start | **COMPLIANT** | `api/Dockerfile` (dev stack) is unchanged; `docker compose up` is unaffected |
+| §7.2 Environment configuration | **COMPLIANT** | `Dockerfile.prod` contains zero hardcoded env values; all config is injected at runtime |
+| §7.3 Ruff/lint | **COMPLIANT** | No new Python files; shell script linted with `shellcheck` |
+| §2.6 No speculative abstraction | **COMPLIANT** | Single Dockerfile, no plugin system or generics |
+| §8 Scope boundaries | **COMPLIANT** | Purely infrastructure; no new API routes, data model, or UI changes |
+
+**Post-design re-check**: All gates remain green. No violations.
+
+---
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/010-api-prod-dockerfile/
+├── plan.md              # This file
+├── research.md          # Phase 0 decisions
+├── contracts/
+│   └── container.md     # Container interface contract (port, env vars, signals, user)
+├── quickstart.md        # Build and verification scenarios
+└── tasks.md             # Generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+api/
+├── Dockerfile           # Existing dev/test image — UNCHANGED
+├── Dockerfile.prod      # NEW: production multi-stage image
+├── .dockerignore        # Existing — verify test files are excluded from build context
+└── tests/
+    └── build/
+        └── verify_production_image.sh   # NEW: TDD verification script (written first)
+
+Makefile                 # Root Makefile — add build-prod and verify-prod targets
+```
+
+---
+
+## Dockerfile.prod — Annotated Reference
+
+```dockerfile
+# syntax=docker/dockerfile:1
+
+# ════════════════════════════════════════════════
+# Build stage: install production deps via uv
+# ════════════════════════════════════════════════
+FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
+
+WORKDIR /app
+
+# Pre-compile bytecode; use copy mode for cross-layer compatibility
+ENV UV_COMPILE_BYTECODE=1 \
+    UV_LINK_MODE=copy \
+    UV_PYTHON_DOWNLOADS=never
+
+# ── Layer cache split: deps only (changes rarely) ──
+COPY pyproject.toml uv.lock ./
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-dev --no-install-project
+
+# ── Layer cache split: source (changes often) ──
+COPY app/ ./app/
+
+# ════════════════════════════════════════════════
+# Runtime stage: lean image with venv + source
+# ════════════════════════════════════════════════
+FROM python:3.12-slim
+
+WORKDIR /app
+
+# curl for HEALTHCHECK — only tool added beyond base Python
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Non-root system user (UID/GID 1001)
+RUN groupadd --system --gid 1001 appgroup \
+    && useradd --system --uid 1001 --gid 1001 --no-create-home appuser
+
+# Copy venv from builder; copy source directly from build context
+COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv
+COPY --chown=appuser:appgroup app/ ./app/
+
+USER appuser
+
+# Activate the venv by prepending its bin to PATH
+ENV PATH="/app/.venv/bin:$PATH"
+
+EXPOSE 8000
+
+HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+    CMD curl -f http://localhost:8000/api/v1/health || exit 1
+
+# uvicorn handles SIGTERM; --timeout-graceful-shutdown gives 30s to drain requests
+CMD ["uvicorn", "app.main:app", \
+     "--host", "0.0.0.0", \
+     "--port", "8000", \
+     "--timeout-graceful-shutdown", "30"]
+```
+
+> **Note on COPY paths**: Build context is `api/` (as set by the Makefile target). `COPY app/ ./app/` in both stages refers to `api/app/`. The runtime stage copies source directly from the build context, not from the builder stage — this is simpler and avoids an extra intermediate layer.
+
+---
+
+## verify_production_image.sh — Structure
+
+```sh
+#!/usr/bin/env bash
+# TDD verification script for api/Dockerfile.prod
+# Fails (red) if Dockerfile.prod does not exist or any check fails.
+set -euo pipefail
+
+IMAGE="reactbin-api-prod:verify-$$"
+
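+# Initialised up front so the EXIT trap is safe under `set -u` if the build fails
+CONTAINER=""
+
+cleanup() {
+  [[ -n "$CONTAINER" ]] && docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
+  docker rmi "$IMAGE" >/dev/null 2>&1 || true
+}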
+trap cleanup EXIT
+
+# Step 1: Build — fails red if Dockerfile.prod is absent
+docker build -f api/Dockerfile.prod api/ -t "$IMAGE"
+
+# Step 2: Start container with minimal env vars
+CONTAINER=$(docker run -d -p 18000:8000 \
+  -e JWT_SECRET_KEY=verify-test-key \
+  -e OWNER_USERNAME=testowner \
+  -e OWNER_PASSWORD=testpassword \
+  -e DATABASE_URL=postgresql+asyncpg://noop:noop@noop/noop \
+  -e S3_ENDPOINT_URL=http://noop:9000 \
+  -e S3_BUCKET_NAME=noop \
+  -e S3_ACCESS_KEY_ID=noop \
+  -e S3_SECRET_ACCESS_KEY=noop \
+  -e S3_REGION=us-east-1 \
+  "$IMAGE")
+
+# Step 3: Poll health endpoint (app will fail to connect to DB, but /health is pre-DB)
+for i in $(seq 1 30); do
+  if curl -sf http://localhost:18000/api/v1/health > /dev/null; then break; fi
+  sleep 1
+  [[ $i -eq 30 ]] && { echo "FAIL: health check timed out"; exit 1; }
+done
+
+# Step 4: Assert non-root user
+UID_IN_CONTAINER=$(docker exec "$CONTAINER" id -u)
+[[ "$UID_IN_CONTAINER" -ne 0 ]] || { echo "FAIL: process running as root"; exit 1; }
+
+# Step 5: Graceful shutdown
+docker stop "$CONTAINER"          # sends SIGTERM
+EXIT_CODE=$(docker wait "$CONTAINER")
+[[ "$EXIT_CODE" -eq 0 ]] || { echo "FAIL: non-zero exit code $EXIT_CODE"; exit 1; }
+
+# Step 6: Dev deps absent
+if docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null; then
+  echo "FAIL: pytest importable in production image (dev deps present)"; exit 1
+fi
+
+echo "All production image checks passed."
+```
+
+> **Note on health check feasibility**: `/api/v1/health` is a simple JSON response that does not require a database connection (confirmed in `api/app/main.py`). The verification script can therefore pass even without a real PostgreSQL instance.
+
+---
+
+## Makefile Targets
+
+Add to root `Makefile`:
+
+```makefile
+.PHONY: build-prod verify-prod
+
+build-prod:
+	docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest
+
+verify-prod:
+	bash api/tests/build/verify_production_image.sh
+```
+
+---
+
+## `.dockerignore` Review
+
+The existing `api/.dockerignore` already excludes `.venv/`, `__pycache__/`, `.env`, etc. The production build context should also exclude the entries below; `tests/`, `alembic/`, and `alembic.ini` are the new additions:
+
+```
+tests/
+*.egg-info/
+alembic/
+alembic.ini
+```
+
+`tests/` and `alembic/` are not needed in the production image (we `COPY app/ ./app/` explicitly). Excluding them from the build context reduces the data sent to the Docker daemon.
+
+> `*.egg-info/` is already present in the existing `.dockerignore`.
+
+---
+
+## Implementation Order
+
+Tasks are generated by `/speckit-tasks`, but the logical dependency order is:
+
+1. **Write `verify_production_image.sh`** (TDD red — build fails because `Dockerfile.prod` absent)
+2. **Add `Makefile` targets** (`build-prod`, `verify-prod`) — references the script
+3. **Write `api/Dockerfile.prod`** (implement to make TDD pass)
+4. **Update `api/.dockerignore`** (exclude `tests/`, `alembic/` from build context)
+5. **Run `make verify-prod`** (TDD green — all 6 checks pass)
+6. **Run `shellcheck`** on `verify_production_image.sh`
+
+No existing tests are modified. `make test-integration` continues to use `api/Dockerfile` unchanged.
--- a/specs/010-api-prod-dockerfile/quickstart.md
+++ b/specs/010-api-prod-dockerfile/quickstart.md
@@ -0,0 +1,138 @@
+# Quickstart: Production API Container Image
+
+## Prerequisites
+
+- Docker 24+ installed and running on the host
+- `make` available
+- A copy of `.env` (or the env vars from `.env.example`) for smoke-testing
+
+---
+
+## Build the Production Image
+
+```sh
+make build-prod
+# Equivalent: docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest
+```
+
+On a warm cache (deps unchanged), the build should complete in under 60 seconds because the dependency layer is reused.
+
+---
+
+## Verify the Production Image (TDD Smoke Test)
+
+```sh
+make verify-prod
+```
+
+This runs `api/tests/build/verify_production_image.sh`, which:
+1. Builds the image (fails fast if `Dockerfile.prod` is missing — the **red** TDD state)
+2. Starts the container with test env vars
+3. Polls `/api/v1/health` until it returns 200 (or times out after 30s)
+4. Asserts the API process is running as a non-root user (UID ≠ 0)
+5. Sends SIGTERM and asserts the container exits with code 0 within 30s
+6. Asserts `pytest` is NOT importable inside the container (dev deps excluded)
+
+**Expected output (green)**:
+```
+[verify] Building reactbin-api-prod:test ...
+[verify] Build OK
+[verify] Starting container ...
+[verify] Health check passed (GET /api/v1/health → 200)
+[verify] Process user: 1001 (non-root ✓)
+[verify] Sending SIGTERM ...
+[verify] Container exited with code 0 (graceful shutdown ✓)
+[verify] Dev deps absent ✓
+[verify] All checks passed.
+```
+
+---
+
+## User Story Integration Scenarios
+
+### US1 — API Runs Reliably in Production
+
+```sh
+# Start container with real (or test) env vars.
+# --rm is omitted so the stopped container still exists when `docker wait` reads its exit code.
+docker run -d \
+  --name reactbin-test \
+  -p 8000:8000 \
+  -e JWT_SECRET_KEY=my-secret \
+  -e OWNER_USERNAME=owner \
+  -e OWNER_PASSWORD=changeme \
+  -e DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/db \
+  -e S3_ENDPOINT_URL=http://minio:9000 \
+  -e S3_BUCKET_NAME=reactbin \
+  -e S3_ACCESS_KEY_ID=minioadmin \
+  -e S3_SECRET_ACCESS_KEY=minioadmin \
+  -e S3_REGION=us-east-1 \
+  reactbin-api-prod:latest
+
+# Check health
+curl http://localhost:8000/api/v1/health
+# → {"status":"ok"}
+
+# Graceful shutdown
+docker stop -t 30 reactbin-test   # sends SIGTERM; -t 30 allows the full drain window before SIGKILL
+docker wait reactbin-test         # → exit code 0
+docker rm reactbin-test           # clean up the stopped container
+```
+
+### US2 — Minimal, Secure Container
+
+```sh
+# Verify non-root user
+docker inspect --format='{{.Config.User}}' reactbin-api-prod:latest
+# → appuser (or 1001)
+
+# Verify no dev packages (pytest should not be importable)
+docker run --rm reactbin-api-prod:latest \
+  /app/.venv/bin/python -c "import pytest" 2>&1
+# → ModuleNotFoundError: No module named 'pytest'
+
+# Verify no source control or test files in image
+docker run --rm reactbin-api-prod:latest ls -A /app
+# → only app and .venv are listed (no tests/, no alembic/, no .git/)
+```
+
+### US3 — Fast, Reproducible Builds
+
+```sh
+# First build (cold): installs all deps
+time docker build --no-cache -f api/Dockerfile.prod api/ -t reactbin-api-prod:cold
+
+# Touch a source file only (no dep change)
+touch api/app/main.py
+
+# Second build: dependency layer served from cache
+time docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:warm
+# Expect: warm build under 60s (SC-004); cold build varies (network-dependent)
+
+# Confirm same health response from both
+docker run --rm ... reactbin-api-prod:cold
+docker run --rm ... reactbin-api-prod:warm
+```
+
+---
+
+## Missing Env Var Behaviour
+
+```sh
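+# OWNER_USERNAME (and the other required vars) intentionally omitted.
+# The note sits above the command: a comment line placed after a trailing backslash would cut the command short.
+docker run --rm \
+  -e JWT_SECRET_KEY=my-secret \
+  reactbin-api-prod:latest
+# → Container exits non-zero; stderr reports a settings validation error naming the missing field(s), e.g. owner_username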
+```
+
+---
+
+## Read-Only Filesystem Compatibility
+
+```sh
+docker run --rm -d --read-only \
+  -p 8000:8000 \
+  -e JWT_SECRET_KEY=... [other env vars] \
+  reactbin-api-prod:latest
+
+curl http://localhost:8000/api/v1/health
+# → {"status":"ok"}
+```
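The polling step in the quickstart's verification list (retry the health endpoint until it answers or 30 seconds pass) can be sketched as a small retry helper. This is a minimal sketch, not the contents of `verify_production_image.sh`: the function name `wait_for` and its argument order are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_for ATTEMPTS DELAY CMD...
# Hypothetical helper: runs CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds after each failed try.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Running CMD inside `if` keeps `set -e` from aborting on a failed probe.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (assumes the container publishes port 8000 on localhost):
# wait_for 30 1 curl -sf http://localhost:8000/api/v1/health \
#   || { echo "FAIL: health check timed out" >&2; exit 1; }
```

Passing the probe as a command rather than hard-coding `curl` lets the same helper wait on other readiness checks, such as `pg_isready` for a throwaway database container.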
--- a/specs/010-api-prod-dockerfile/research.md
+++ b/specs/010-api-prod-dockerfile/research.md
@@ -0,0 +1,94 @@
+# Research: Production API Container Image
+
+## Decision 1 — Use a Separate `Dockerfile.prod`
+
+**Decision**: Add `api/Dockerfile.prod` alongside the existing `api/Dockerfile`.
+
+**Rationale**: The existing `api/Dockerfile` installs dev dependencies (`.[dev]`), mounts source with `--reload`, and is used by the Docker Compose integration test stack. Modifying it would break `make test-integration`. A separate file keeps the two images independent with zero coupling.
+
+**Alternatives considered**:
+- Build-arg flag in a single Dockerfile: adds conditional complexity and makes both files harder to read.
+- Rename existing to `Dockerfile.dev` and make `Dockerfile` the production image: would require updating `docker-compose.test.yml` with an explicit file reference — a wider change than needed for this feature.
+
+---
+
+## Decision 2 — Multi-Stage Build: uv Builder + python:3.12-slim Runtime
+
+**Decision**: Two-stage build. Stage 1 (`builder`) uses `ghcr.io/astral-sh/uv:python3.12-bookworm-slim` to install production dependencies into a virtual environment. Stage 2 (`runtime`) uses `python:3.12-slim` and copies only the `.venv` and application source from the builder. uv is not present in the final image.
+
+**Rationale**: 
+- uv's official Docker image is the fastest, most correct way to produce a pinned, bytecode-compiled venv from `uv.lock`.
+- Keeping uv out of the runtime image reduces attack surface and image size.
+- `python:3.12-slim` is a well-maintained, widely scanned base; using it for the runtime stage aligns with existing project images.
+
+**Layer caching strategy**:
+```
+COPY pyproject.toml uv.lock ./
+RUN uv sync --frozen --no-dev --no-install-project   ← cache hits when only source changes
+COPY app/ ./app/                                       ← only reaches here on source changes
+```
+`--no-install-project` installs all listed dependencies without the project package itself. The project source is then copied separately. This means a source-only change reuses the dependency layer from cache.
+
+**Environment variables for optimal builds**:
+- `UV_COMPILE_BYTECODE=1` — pre-compile `.pyc` files; slightly larger venv but faster cold starts.
+- `UV_LINK_MODE=copy` — avoids hard-link issues when copying between image layers.
+- `UV_PYTHON_DOWNLOADS=never` — ensures the builder stage uses the bundled Python, not a downloaded one.
+
+**Alternatives considered**:
+- Installing deps into the system Python (`--system`): rejected because it pollutes the base image and makes it harder to copy deps cleanly into the runtime stage.
+- Using a single `FROM python:3.12-slim` with pip: slower builds, no lockfile pinning, no bytecode compilation step.
+
+---
+
+## Decision 3 — Non-Root User (UID 1001, System User)
+
+**Decision**: Create a system user `appuser` (UID 1001) in a system group `appgroup` (GID 1001) in the runtime stage. All application files are owned by that user from the start, using `--chown=appuser:appgroup` at `COPY` time.
+
+**Rationale**: Running as root inside a container increases the damage a container breakout can do. Kubernetes' `runAsNonRoot` check needs a numeric UID to verify that the process is not root, so a fixed, known UID (rather than a name that only resolves inside the image) keeps the image compatible with stricter pod security policies. UID 1001 avoids collision with UID 1000 (the typical first interactive user on a Linux host) while remaining a predictable, inspectable value.
+
+**Alternatives considered**:
+- UID 1000: small risk of collision with host user when bind mounts are involved.
+- `USER nobody`: `nobody` (UID 65534) works but its name and UID are not consistent across distros.
+
+---
+
+## Decision 4 — SIGTERM Graceful Shutdown via uvicorn `--timeout-graceful-shutdown`
+
+**Decision**: Use `uvicorn`'s built-in `--timeout-graceful-shutdown 30` flag. No process supervisor (tini, s6) is required.
+
+**Rationale**: uvicorn handles SIGTERM natively when run as PID 1 in single-worker mode (the production Dockerfile runs one worker). On SIGTERM it stops accepting new connections, waits up to `--timeout-graceful-shutdown` seconds for in-flight requests to complete, then exits with code 0. No additional init system is needed.
+
+**Alternatives considered**:
+- tini: adds a small init shim that reaps zombies and forwards signals. Not necessary with a single uvicorn worker (no child processes to reap).
+- Gunicorn + uvicorn workers: more complex; appropriate for multi-worker setups but the deployment platform (Kubernetes) scales horizontally via pod replicas rather than in-process workers.
+
+---
+
+## Decision 5 — `curl` for HEALTHCHECK
+
+**Decision**: Install `curl` (via `apt-get --no-install-recommends`) in the runtime stage and use it in the `HEALTHCHECK` directive.
+
+**Rationale**: The existing dev Dockerfile already installs `curl` for the same reason. `curl -f` exits non-zero on HTTP errors, making it a reliable single-command health probe. A Python one-liner pays interpreter startup overhead on every check (on the order of 100 ms, versus a few milliseconds for `curl`).
+
+**Alternatives considered**:
+- `wget -q --spider`: available on Alpine but not on Debian-slim by default; requires separate install.
+- Python `urllib.request`: no extra install, but slower and adds noise to the process table during health checks.
+
+---
+
+## Decision 6 — TDD Verification via Shell Script
+
+**Decision**: Write `api/tests/build/verify_production_image.sh` before `Dockerfile.prod`. The script builds the image and runs behavioral checks (health endpoint, non-root user, clean SIGTERM exit, dev dependencies absent). It is the "failing test" per §5.1.
+
+**Rationale**: The production image is a build artifact, not Python business logic. pytest cannot test a Docker image without Docker-in-Docker, which the current CI stack does not support. A shell script run on the host (via `make verify-prod`) is the appropriate TDD vehicle for this artefact type.
+
+**Verification steps the script covers**:
+1. `docker build -f api/Dockerfile.prod api/` → fails (red) until Dockerfile.prod exists.
+2. Run container with required env vars; wait for health endpoint → `GET /api/v1/health` returns 200.
+3. Inspect running process user → UID ≠ 0 (non-root).
+4. Send SIGTERM to container; assert exit code 0 within 30s (graceful shutdown).
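+5. Assert dev packages are absent: `/app/.venv/bin/python -c "import pytest"` inside the container must return non-zero (the same check the quickstart uses; it tests the venv the API actually runs from).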
+
+**Alternatives considered**:
+- pytest with docker SDK: requires `docker` Python package and DinD in CI; rejected as over-engineered for a single-file build artifact.
+- Manual verification only: rejected because §5.1 mandates automated failing tests before production code.
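Several of the verification steps above assert that a command fails (for example, that `pytest` is not importable). In a script running under `set -euo pipefail`, an expected failure has to be wrapped so it does not abort the script. The sketch below shows one way to do that; the helper name `expect_failure` is hypothetical and not taken from the actual script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# expect_failure LABEL CMD...
# Hypothetical helper: passes only when CMD exits non-zero.
# Running CMD inside `if` keeps `set -e` from aborting on the expected failure.
expect_failure() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: ${label} unexpectedly succeeded" >&2
    return 1
  fi
  echo "[verify] ${label} OK"
}

# Example against the production image (IMAGE is assumed to be set by the caller):
# expect_failure "Dev deps absent" \
#   docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest"
```

The same wrapper would fit a missing-env-var check, where the container is expected to exit non-zero at startup.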
--- a/specs/010-api-prod-dockerfile/spec.md
+++ b/specs/010-api-prod-dockerfile/spec.md
@@ -0,0 +1,96 @@
+# Feature Specification: Production-Grade API Container Image
+
+**Feature Branch**: `010-api-prod-dockerfile`
+**Created**: 2026-05-07
+**Status**: Draft
+**Input**: User description: "We need a production-grade Dockerfile for the API to start preparing for a production deployment."
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 — API Runs Reliably in Production (Priority: P1)
+
+An operator builds and runs the API container in a production environment. The container starts successfully, serves requests, and can be health-checked by an orchestrator (e.g., Kubernetes). When the orchestrator signals shutdown, the container drains in-flight requests before exiting cleanly, avoiding dropped connections.
+
+**Why this priority**: Without a correctly functioning container, no production deployment is possible. This is the baseline that all other stories depend on.
+
+**Independent Test**: Build the image from source, run the container with required env vars, call the health endpoint, send SIGTERM, and verify the process exits cleanly with code 0. No other stories are required.
+
+**Acceptance Scenarios**:
+
+1. **Given** a built container image and all required env vars, **When** the container starts, **Then** it begins serving requests within 30 seconds and the health endpoint returns a success response.
+2. **Given** a running container, **When** a SIGTERM is received, **Then** the process finishes any in-flight requests and exits with code 0 within 30 seconds.
+3. **Given** a built container image, **When** the container is started with a required env var absent, **Then** the process exits immediately with a non-zero code and logs a clear error message identifying the missing variable.
+
+---
+
+### User Story 2 — Minimal, Secure Container (Priority: P2)
+
+A security-conscious operator audits the container image before promotion to production. They verify the API process does not run as root, the image contains no development tooling or test artefacts, and no credentials are baked into the image layers.
+
+**Why this priority**: Running as root or including unnecessary tools increases the blast radius of any container breakout. This is a production-readiness requirement, not optional hardening.
+
+**Independent Test**: Inspect the built image to confirm the runtime user is non-root, confirm no dev/test files are present in the image layers, and scan the image with a standard vulnerability scanner. Passes independently of any deployment environment.
+
+**Acceptance Scenarios**:
+
+1. **Given** a built container image, **When** the running process user is inspected, **Then** the API process runs as a non-root user with a numeric UID.
+2. **Given** a built container image, **When** the image layers are inspected, **Then** no development dependencies, test files, or local configuration are present.
+3. **Given** a built container image, **When** the image layers are scanned for hardcoded secrets, **Then** no credentials, API keys, or secret values are found embedded in any layer.
+
+---
+
+### User Story 3 — Fast, Reproducible Builds (Priority: P3)
+
+A developer rebuilds the container image after a code change. The build completes quickly because unchanged layers (dependencies) are cached. Given identical source inputs, the resulting image is functionally equivalent across builds, enabling confident CI/CD promotion.
+
+**Why this priority**: Slow or non-deterministic builds reduce developer confidence and slow deployment pipelines. Important for velocity, but the container already works (P1, P2) before this is optimised.
+
+**Independent Test**: Build the image twice from the same source; confirm the second build reuses dependency layers from cache and completes significantly faster than the first.
+
+**Acceptance Scenarios**:
+
+1. **Given** an image built once, **When** only application source files change and the image is rebuilt, **Then** the dependency installation step is served from cache and the rebuild completes faster than a clean build.
+2. **Given** two builds from the same source commit, **When** the images are run, **Then** both produce identical API behaviour.
+
+---
+
+### Edge Cases
+
+- What happens when the database is unavailable at container startup?
+- What happens when the container is sent SIGKILL instead of SIGTERM (hard kill by orchestrator)?
+- What happens if the container runs out of memory mid-request?
+- How does the image behave when run with a read-only filesystem (`--read-only`)?
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The container image MUST start the API service and begin accepting requests without manual intervention after supplying required env vars.
+- **FR-002**: The container image MUST expose a health check that an orchestrator can poll to determine service readiness.
+- **FR-003**: The container image MUST handle the SIGTERM signal by completing in-flight requests then exiting cleanly within 30 seconds.
+- **FR-004**: The container image MUST run the API process as a non-root, non-privileged user.
+- **FR-005**: The container image MUST NOT contain development dependencies, test files, source control metadata, or local configuration files.
+- **FR-006**: The container image MUST NOT contain any hardcoded credentials, secrets, or environment-specific values — all configuration MUST be supplied via environment variables at runtime.
+- **FR-007**: The container image MUST log to standard output and standard error so logs are captured by the container runtime without additional configuration.
+- **FR-008**: The container image MUST be buildable reproducibly from the same source inputs — a rebuild from the same commit MUST produce a functionally equivalent image.
+- **FR-009**: Rebuilding the image after a source-only change (no dependency changes) MUST reuse the cached dependency installation layer.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: The container starts and serves its first successful health-check response within 30 seconds of launch with all required env vars present.
+- **SC-002**: The container exits cleanly (code 0) within 30 seconds of receiving a SIGTERM, with no in-flight requests dropped.
+- **SC-003**: The API process inside the container runs as a non-root user (inspectable via container runtime tooling).
+- **SC-004**: A rebuild after a source-only change completes in under 60 seconds on a warm cache (dependency layer reused).
+- **SC-005**: The image contains zero hardcoded secrets (verifiable by static layer inspection).
+- **SC-006**: All API logs appear on stdout/stderr and are captured by the container runtime log driver without additional sidecar or configuration.
+
+## Assumptions
+
+- The existing test Dockerfile (used by the integration test stack) is not suitable for production and will remain separate; this feature produces a distinct production image.
+- All required runtime configuration (database URL, S3 credentials, JWT secret, etc.) will be injected as environment variables by the deployment platform — the image itself carries no environment-specific values.
+- The deployment target supports OCI-compatible container images (Kubernetes, Docker, etc.).
+- No persistent local storage is needed by the API container; all state lives in the database and object storage.
+- The production image does not need to run database migrations; migrations are applied by a separate step in the deployment pipeline.
+- A single-architecture image (linux/amd64) is sufficient for the initial production target.
--- a/specs/010-api-prod-dockerfile/tasks.md
+++ b/specs/010-api-prod-dockerfile/tasks.md
@@ -0,0 +1,158 @@
+# Tasks: Production-Grade API Container Image
+
+**Input**: Design documents from `specs/010-api-prod-dockerfile/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/container.md ✅, quickstart.md ✅
+
+**Tests**: TDD is non-negotiable (§5.1). The "test" for a Docker build artefact is `api/tests/build/verify_production_image.sh`, written before `api/Dockerfile.prod` exists. Running the script immediately fails (red) because the build step cannot find the file; writing `Dockerfile.prod` turns it green.
+
+**Organization**: Phase 1 sets up Makefile targets and `.dockerignore`; Phase 2 creates the test directory; Phase 3 (US1) writes the verification script and the Dockerfile; Phase 4 (US2) extends the script with security checks; Phase 5 (US3) extends it with a cache-hit check; Phase 6 polishes.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel with other [P] tasks in the same phase
+- **[Story]**: Which user story this task belongs to
+- Exact file paths included in every task description
+
+---
+
+## Phase 1: Setup
+
+- [X] T001 Add `build-prod` and `verify-prod` targets (and their `.PHONY` entries) to the root `Makefile` at `/workspace/Makefile`: `build-prod` runs `docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:latest`; `verify-prod` runs `bash api/tests/build/verify_production_image.sh`
+
+- [X] T002 Update `api/.dockerignore` at `/workspace/api/.dockerignore`: append three lines — `tests/`, `alembic/`, and `alembic.ini` — so these are excluded from the production build context (the Dockerfile.prod copies only `app/` explicitly, but excluding them from the context keeps the transfer to the Docker daemon fast)
+
+---
+
+## Phase 2: Foundational
+
+- [X] T003 Create directory `api/tests/build/` at `/workspace/api/tests/build/` with `mkdir -p` and add a `.gitkeep` so the directory is tracked
+
+**Checkpoint**: Directory structure is ready; Makefile and .dockerignore are updated.
+
+---
+
+## Phase 3: User Story 1 — API Runs Reliably in Production (Priority: P1) 🎯 MVP
+
+**Goal**: The container builds, starts, serves the health endpoint, and exits cleanly on SIGTERM.
+
+**Independent Test**: `make verify-prod` — passes when `Dockerfile.prod` exists and all US1 checks pass.
+
+### Test for User Story 1 (TDD red — write first, confirm failure before T005)
+
+- [X] T004 [US1] Create `api/tests/build/verify_production_image.sh` as an executable bash script (`chmod +x`) with `#!/usr/bin/env bash` and `set -euo pipefail`; the script MUST:
+  1. Set `IMAGE="reactbin-api-prod:verify-$$"` and `PG_CONTAINER=""` and `APP_CONTAINER=""`;
+  2. Define a `cleanup()` function that runs `docker rm -f "$APP_CONTAINER" "$PG_CONTAINER" 2>/dev/null || true` and `docker rmi "$IMAGE" 2>/dev/null || true`, and register it with `trap cleanup EXIT`;
+  3. **[US1 check 1 — build]** Run `docker build -f api/Dockerfile.prod api/ -t "$IMAGE"` — this is the line that fails **red** because `api/Dockerfile.prod` does not yet exist; print `[verify] Building $IMAGE...` before and `[verify] Build OK` after;
+  4. **[US1 check 2 — start with real DB]** Launch a throwaway postgres: `PG_CONTAINER=$(docker run -d -e POSTGRES_DB=reactbin_verify -e POSTGRES_USER=verify -e POSTGRES_PASSWORD=verify postgres:16-alpine)`; poll `docker exec "$PG_CONTAINER" pg_isready -U verify` up to 30 × 1s, fail if timeout; capture `PG_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$PG_CONTAINER")`;
+  5. Start the production container: `APP_CONTAINER=$(docker run -d -p 18000:8000 -e JWT_SECRET_KEY=verify-key -e OWNER_USERNAME=testowner -e OWNER_PASSWORD=testpassword -e DATABASE_URL="postgresql+asyncpg://verify:verify@${PG_IP}:5432/reactbin_verify" -e S3_ENDPOINT_URL=http://noop:9000 -e S3_BUCKET_NAME=noop -e S3_ACCESS_KEY_ID=noop -e S3_SECRET_ACCESS_KEY=noop -e S3_REGION=us-east-1 "$IMAGE")`; note — S3 credentials are placeholders; the health endpoint does not require S3;
+  6. **[US1 check 3 — health endpoint]** Poll `curl -sf http://localhost:18000/api/v1/health` up to 30 × 1s, fail with a message if timeout; print `[verify] Health check passed` on success;
+  7. **[US1 check 4 — SIGTERM → exit 0]** Run `docker stop "$APP_CONTAINER"` (sends SIGTERM); capture `EXIT_CODE=$(docker wait "$APP_CONTAINER")`; assert `"$EXIT_CODE" -eq 0`, fail with `FAIL: non-zero exit $EXIT_CODE` otherwise; print `[verify] Graceful shutdown OK (exit $EXIT_CODE)`;
+  8. Print `[verify] US1 checks passed.`
+  9. **[C3 — missing env var → non-zero exit]** Run `docker run --rm -e JWT_SECRET_KEY=verify-key "$IMAGE" 2>&1`; assert the exit code is **non-zero** (OWNER_USERNAME is absent so Pydantic settings validation must fail at startup); print `[verify] Missing-env-var exit check OK`;
+  After writing the script, run `make verify-prod` and confirm it **fails** with a Docker build error (red state — `Dockerfile.prod` does not exist).
+
+### Implementation for User Story 1
+
+- [X] T005 [US1] Create `api/Dockerfile.prod` at `/workspace/api/Dockerfile.prod` — a two-stage multi-stage build:
+  **Stage 1 (builder)**: `FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder`; `WORKDIR /app`; set `ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_PYTHON_DOWNLOADS=never`; `COPY pyproject.toml uv.lock ./`; `RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen --no-dev --no-install-project`; `COPY app/ ./app/`
+  **Stage 2 (runtime)**: `FROM python:3.12-slim`; `WORKDIR /app`; `RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*`; `RUN groupadd --system --gid 1001 appgroup && useradd --system --uid 1001 --gid 1001 --no-create-home appuser`; `COPY --from=builder --chown=appuser:appgroup /app/.venv /app/.venv`; `COPY --chown=appuser:appgroup app/ ./app/`; `USER appuser`; `ENV PATH="/app/.venv/bin:$PATH"`; `EXPOSE 8000`; `HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 CMD curl -f http://localhost:8000/api/v1/health || exit 1`; `CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--timeout-graceful-shutdown", "30"]`
+
+- [X] T006 [US1] Verify TDD green for US1: run `make verify-prod` and confirm all four US1 checks pass (build OK, container starts against the throwaway Postgres, health endpoint returns 200, SIGTERM produces exit code 0) and that `[verify] US1 checks passed.` is printed.
+
+**Checkpoint**: US1 is complete. Production container builds, starts, serves traffic, and shuts down gracefully.
+
+---
+
+## Phase 4: User Story 2 — Minimal, Secure Container (Priority: P2)
+
+**Goal**: The production image runs as non-root and contains no dev dependencies or embedded secrets.
+
+**Independent Test**: US2 checks in `make verify-prod` — the same script extended with non-root and dev-deps-absent assertions.
+
+### Tests for User Story 2 (TDD extension — add checks, confirm they pass against existing Dockerfile.prod)
+
+- [X] T007 [US2] Extend `api/tests/build/verify_production_image.sh` with two US2 checks and two cross-cutting checks (C1, C2), all before the final `US1 checks passed` line; where placement relative to `docker stop` matters, it is noted per check:
+  **[US2 check 1 — non-root]** After the container is running (before `docker stop`), run `UID_IN_CONTAINER=$(docker exec "$APP_CONTAINER" id -u)`; assert `"$UID_IN_CONTAINER" -ne 0`, fail with `FAIL: process running as root (UID 0)` if violated; print `[verify] Non-root user OK (UID $UID_IN_CONTAINER)`;
+  **[US2 check 2 — dev deps absent]** After cleanup of APP_CONTAINER but still holding the image, run `docker run --rm "$IMAGE" /app/.venv/bin/python -c "import pytest" 2>/dev/null`; assert the command returns **non-zero** (i.e., pytest is NOT importable); if it returns 0, fail with `FAIL: pytest importable in production image (dev deps present)`; print `[verify] Dev deps absent OK`;
+  **[C1 — stdout log capture]** Run `docker logs "$APP_CONTAINER" 2>&1`; assert the output is non-empty and contains `Started server` or `Application startup complete` (uvicorn startup lines); fail with `FAIL: no startup logs found on stdout/stderr` if absent; print `[verify] Stdout logging OK`; note — insert this check while APP_CONTAINER is still running, before the `docker stop` call;
+  **[C2 — no hardcoded secrets in layers]** Run `docker history --no-trunc "$IMAGE" 2>&1`; pipe through `grep -iE "(password|secret_key|api_key|token)"`; assert zero matching lines; if any match, fail with `FAIL: potential secret found in image history`; print `[verify] No secrets in image layers OK`;
+  Update the final success line to `[verify] All checks passed (US1 + US2).`; confirm `make verify-prod` passes.
+
+**Checkpoint**: US2 is verified. Image runs as UID 1001 and contains no test tooling.
+
+---
+
+## Phase 5: User Story 3 — Fast, Reproducible Builds (Priority: P3)
+
+**Goal**: Rebuilding after a source-only change reuses the dependency layer from cache.
+
+**Independent Test**: US3 check in `make verify-prod` — a timed second build after touching a source file asserts the dep layer was cached.
+
+### Tests for User Story 3 (TDD extension)
+
+- [X] T008 [US3] Extend `api/tests/build/verify_production_image.sh` with a US3 cache check appended after all other checks (before final success line):
+  **[US3 check — dep layer cached on source-only rebuild]** Set `IMAGE2="reactbin-api-prod:verify-cache-$$"`; `touch api/app/main.py`; capture the output of `docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1` (the `--progress=plain` flag ensures consistent `CACHED` output regardless of Docker version or TTY settings); assert the output contains the string `CACHED`; if `CACHED` is absent, fail with `FAIL: dependency layer not reused on source-only rebuild`; add `docker rmi "$IMAGE2" 2>/dev/null || true` to the `cleanup()` function; print `[verify] Dep layer cache hit confirmed (US3 OK)`;
+  Update the final success line to `[verify] All checks passed (US1 + US2 + US3).`
+
+- [X] T009 [US3] Verify TDD green for US3: run `make verify-prod` and confirm the full script passes including the cache check — the build output for the second image must contain `CACHED`, and `[verify] All checks passed (US1 + US2 + US3).` must print.
+
+**Checkpoint**: All three user stories are verified end-to-end by `make verify-prod`.
+
+---
+
+## Phase 6: Polish & Cross-Cutting Concerns
+
+- [X] T010 Run `make test-integration` from `/workspace` and confirm all 102 existing tests still pass — verifies that the `.dockerignore` additions (T002) do not break the existing test Dockerfile build or any integration test (§5.4 regression gate)
+
+- [X] T011 Run `shellcheck api/tests/build/verify_production_image.sh` and fix any violations (common: unquoted variables, `[ ]` vs `[[ ]]`, missing `--` before arguments)
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Phase 1 (Setup)**: No external dependencies — start immediately
+- **Phase 2 (Foundational)**: No dependencies — start immediately (parallel with Phase 1)
+- **Phase 3 (US1)**: Depends on Phase 1 (Makefile + .dockerignore must exist before `make verify-prod` can run) and Phase 2 (test directory must exist)
+- **Phase 4 (US2)**: Depends on Phase 3 (US1 script and Dockerfile must exist to extend)
+- **Phase 5 (US3)**: Depends on Phase 4 (full US2 script must exist to extend)
+- **Phase 6 (Polish)**: Depends on all prior phases; T010 (regression test) must precede T011 (shellcheck)
+
+### Within Phase 3
+
+- T004 before T005 (write the test script and confirm the red state before writing the Dockerfile)
+- T006 after T005 (verify green after implementation)
+
+### Execution Order Summary
+
+```
+Step 1: T001 ∥ T002 ∥ T003  (setup — parallel, different files)
+Step 2: T004                 (write verification script — TDD red)
+Step 3: T005                 (write Dockerfile.prod — implementation)
+Step 4: T006                 (verify US1 green)
+Step 5: T007                 (extend script with US2 checks, verify pass)
+Step 6: T008                 (extend script with US3 check)
+Step 7: T009                 (verify US3 green)
+Step 8: T010                 (make test-integration — regression gate)
+Step 9: T011                 (shellcheck polish)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 — reliable production run)
+
+1. Complete T001–T003 (setup)
+2. Complete T004–T006 (core blocking: write script → write Dockerfile → verify green)
+3. **Validate**: `make verify-prod` passes; `make test-integration` still passes (no regressions)
+4. US2 and US3 add explicit verification coverage for properties already implemented
+
+### Incremental Delivery
+
+- After Phase 3: Production image builds, starts, and shuts down gracefully — safe to deploy
+- After Phase 4: Security properties (non-root, no dev deps) are explicitly verified
+- After Phase 5: Build efficiency (layer caching) is confirmed by automated check
+- After Phase 6: Script is lint-clean, ready for CI integration
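The US3 cache check in T008 passes if the string `CACHED` appears anywhere in the build output, so a cached `WORKDIR` or `COPY` step would satisfy it even when the dependency install was re-run. A stricter variant, sketched below, ties the assertion to the `uv sync` step itself. It assumes the BuildKit `--progress=plain` layout, where a step header such as `#8 [builder 4/5] RUN ... uv sync ...` is later followed by a line `#8 CACHED`. The function name `step_cached` is hypothetical, and the log format should be checked against the Docker version in use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# step_cached LOG PATTERN
# Hypothetical helper: succeeds only if the build step whose header line
# contains PATTERN is reported as CACHED in a `--progress=plain` build log.
step_cached() {
  local log="$1" pattern="$2" step_id
  # Find the first header line ("#<n> [stage i/j] ...") containing PATTERN
  # and keep its step id (the "#<n>" token).
  step_id=$(awk -v pat="$pattern" \
    'index($0, pat) && $1 ~ /^#[0-9]+$/ && $2 ~ /^\[/ { print $1; exit }' <<<"$log")
  [ -n "$step_id" ] || return 1
  # The step counts as cached only if a line reads exactly "#<n> CACHED".
  grep -qxF -- "${step_id} CACHED" <<<"$log"
}

# Example (IMAGE2 is assumed to be set by the caller):
# build_log=$(docker build --progress=plain -f api/Dockerfile.prod api/ -t "$IMAGE2" 2>&1)
# step_cached "$build_log" "uv sync" \
#   || { echo "FAIL: dependency layer not reused on source-only rebuild" >&2; exit 1; }
```

Here-strings are used instead of `printf ... | grep -q` so that an early `grep -q` exit cannot surface as a pipeline failure under `pipefail`.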
--- a/specs/011-ui-prod-dockerfile/checklists/requirements.md
+++ b/specs/011-ui-prod-dockerfile/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Production-Grade UI Container Image
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-07
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Spec is ready for `/speckit-plan`.
--- a/specs/011-ui-prod-dockerfile/contracts/container.md
+++ b/specs/011-ui-prod-dockerfile/contracts/container.md
@@ -0,0 +1,90 @@
+# Container Interface Contract: UI Production Image
+
+## Image Identity
+
+| Property    | Value                        |
+|-------------|------------------------------|
+| Image name  | `reactbin-ui-prod`           |
+| Runtime     | nginx-unprivileged (Alpine)  |
+| Listen port | `8080`                       |
+| Run user    | non-root (UID ≠ 0)           |
+
+## Runtime Inputs
+
+### Environment Variables
+
+The UI container is a static file server. It has **no required environment variables at runtime** — all configuration is compiled into the static assets at build time by the Angular build toolchain.
+
+> Note: The API base URL is baked in at build time via Angular's environment configuration. A future iteration may introduce runtime environment injection via a served `config.json`, but this is out of scope for v1.
+
+## Runtime Outputs
+
+### HTTP Interface
+
+| Route pattern      | Behaviour                                                         |
+|--------------------|-------------------------------------------------------------------|
+| `/`                | Returns `index.html` with HTTP 200                               |
+| `/` (any SPA path) | Returns `index.html` with HTTP 200 (SPA fallback via `try_files`)|
+| `/main.*.js`       | Returns fingerprinted JS bundle with long-lived cache headers     |
+| `/styles.*.css`    | Returns fingerprinted CSS with long-lived cache headers           |
+| `/assets/*`        | Returns static assets                                             |
+| Any path not found | Returns `index.html` with HTTP 200 (Angular router handles 404)  |
+
+### Cache Headers
+
+| Asset type                                   | Cache-Control header                  |
+|----------------------------------------------|---------------------------------------|
+| Fingerprinted bundles (`.js`, `.css`, fonts) | `public, max-age=31536000, immutable` |
+| `index.html`                                 | `no-store, no-cache, must-revalidate` |
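+
+A minimal sketch of how these headers might be asserted from a shell check. The helper name `has_cache_directive` is hypothetical and not part of the verification script. It reads raw response headers on stdin and succeeds when any `Cache-Control` line contains the given directive; matching any line matters because nginx can emit more than one `Cache-Control` header when `expires` and `add_header` are combined.
+
+```bash
+# Hypothetical helper: succeed if any Cache-Control header on stdin contains
+# the directive passed as $1. Header name and directive match case-insensitively.
+has_cache_directive() {
+    local directive="$1"
+    # No -q on the second grep, so it drains its input and the first grep
+    # never sees SIGPIPE (relevant under `set -o pipefail`).
+    grep -i '^cache-control:' | grep -iF -- "$directive" > /dev/null
+}
+
+# Example usage (assumes the container is published on host port 4200):
+#   curl -sI http://localhost:4200/index.html | has_cache_directive "no-store"
+```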
+
+### Process Exit
+
+| Signal   | Expected exit code | Maximum wait |
+|----------|--------------------|--------------|
+| SIGTERM  | 0                  | 30 seconds   |
+| SIGKILL  | non-zero           | immediate    |
+
+## Health Check
+
+| Property        | Value                              |
+|-----------------|------------------------------------|
+| Command         | `wget -qO- http://localhost:8080/` |
+| Interval        | 30 seconds                         |
+| Timeout         | 5 seconds                          |
+| Start period    | 15 seconds                         |
+| Retries         | 3                                  |
+
+The health check passes when nginx responds with any 2xx status on the root path.
+
+## Image Constraints
+
+| Constraint              | Requirement                                   |
+|-------------------------|-----------------------------------------------|
+| Node.js runtime present | MUST NOT be present in runtime image          |
+| `node_modules/` present | MUST NOT be present in runtime image          |
+| Source TypeScript files | MUST NOT be present in runtime image          |
+| Secrets in layer history| MUST NOT appear in any `docker history` layer |
+| Run as root             | MUST NOT — process UID MUST be non-zero       |
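+
+As a rough illustration of the layer-history constraint, the sketch below scans `docker history --no-trunc` output for secret-like assignments. The function name `history_has_no_secrets` and the pattern list are assumptions made for this example; they are not taken from the verification script and are not an exhaustive secret detector.
+
+```bash
+# Hypothetical helper: read `docker history --no-trunc` output on stdin and
+# succeed only if no line looks like a credential assignment.
+history_has_no_secrets() {
+    # Illustrative pattern: a secret-like name followed by '=' or ':'.
+    local pattern='(password|passwd|secret|api[_-]?key|token)[[:space:]]*[=:]'
+    ! grep -Eiq -- "$pattern"
+}
+
+# Example usage:
+#   docker history --no-trunc reactbin-ui-prod:latest | history_has_no_secrets \
+#       || echo "FAIL: secret-like string found in image history"
+```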
+
+## Build Interface
+
+| Property        | Value                                                               |
+|-----------------|---------------------------------------------------------------------|
+| Dockerfile path | `ui/Dockerfile.prod`                                                |
+| Build context   | `ui/` directory                                                     |
+| Build command   | `docker build -f ui/Dockerfile.prod ui/ -t reactbin-ui-prod:latest` |
+
+### Build Context Exclusions (`.dockerignore`)
+
+The following MUST be excluded from the build context to keep transfers fast and avoid leaking dev state:
+
+- `node_modules/` — always rebuilt via `npm ci` in the build stage
+- `dist/` — always rebuilt; must not pollute the build stage
+- `.git/` — not needed for build
+- `**/*.spec.ts` — test files at any depth; not compiled into production output (a bare `*.spec.ts` pattern matches only at the root of the build context)
+- `.env*` — dev environment files
+
+## Verification
+
+The contract is verified end-to-end by `ui/tests/build/verify_production_image.sh`. Running `make verify-ui-prod` MUST pass all contract checks.
--- a/specs/011-ui-prod-dockerfile/plan.md
+++ b/specs/011-ui-prod-dockerfile/plan.md
@@ -0,0 +1,152 @@
+# Implementation Plan: Production-Grade UI Container Image
+
+**Branch**: `011-ui-prod-dockerfile` | **Date**: 2026-05-07 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `specs/011-ui-prod-dockerfile/spec.md`
+
+## Summary
+
+Build a production-grade multi-stage Docker image for the Angular UI. A `node:22-slim` build stage compiles the Angular app into static assets; an `nginxinc/nginx-unprivileged:alpine` runtime stage serves those assets on port 8080 as a non-root user with SPA fallback routing, long-lived cache headers for fingerprinted bundles, and clean SIGTERM handling. The image is verified by a TDD shell script that covers all three user stories (reliable service, security, build caching) in one `make verify-ui-prod` run.
+
+## Technical Context
+
+**Language/Version**: Node.js 22 (build stage); no runtime language in the final image
+**Primary Dependencies**: Angular CLI 19 (`npm run build`); nginx-unprivileged (runtime web server)
+**Storage**: None — container serves pre-compiled static files
+**Testing**: `ui/tests/build/verify_production_image.sh` (shell script TDD artefact, same pattern as `api/tests/build/verify_production_image.sh`)
+**Target Platform**: Linux container (amd64); Docker 23+ with BuildKit enabled (default); `--mount=type=cache` used for npm cache layer
+**Project Type**: Static file server (SPA)
+**Performance Goals**: Cold build < 3 minutes; warm (source-only) rebuild < 30 seconds; health check response < 500ms
+**Constraints**: Non-root process (UID ≠ 0); Node.js absent from runtime image; no secrets in image layers
+**Scale/Scope**: Single container; no horizontal scaling concerns at this stage
+
+## Constitution Check
+
+### Pre-research gates
+
+| Principle | Requirement | Status |
+|-----------|-------------|--------|
+| §5.1 TDD | Failing test (verify script) must exist before `Dockerfile.prod` | ✅ Plan includes TDD-first task ordering |
+| §5.3 Tests next to code | `ui/tests/build/` mirrors `api/tests/build/` | ✅ Correct location |
+| §5.4 CI before done | All tasks marked done only after verify passes | ✅ Enforced in task ordering |
+| §7.1 One-command start | `docker compose up` must still work | ✅ Only adds prod Dockerfile; dev Dockerfile unchanged |
+| §7.2 Env config | No hardcoded credentials in Dockerfile | ✅ No runtime env vars needed; build-time config via Angular environment files |
+| §7.3 Linting | shellcheck on verify script | ✅ T011 in task plan |
+| §8 Scope | Server-side rendering, OIDC, multi-user — not addressed | ✅ Spec scoped to static asset serving only |
+
+**No violations. All gates pass.**
+
+### Post-design re-check
+
+Same gates apply. No design decisions introduced in Phase 1 conflict with the constitution.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/011-ui-prod-dockerfile/
+├── plan.md           ← this file
+├── research.md       ← technology decisions (10 decisions)
+├── contracts/
+│   └── container.md  ← container interface contract
+├── quickstart.md     ← build and verify scenarios
+└── tasks.md          ← generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+ui/
+├── Dockerfile.prod             ← NEW (multi-stage production build)
+├── nginx.conf                  ← NEW (SPA routing + cache headers)
+├── .dockerignore               ← NEW (does not exist yet; created for production build)
+└── tests/
+    └── build/
+        ├── .gitkeep            ← NEW (track directory in git)
+        └── verify_production_image.sh  ← NEW (TDD verification script)
+
+Makefile                        ← MODIFIED (add build-ui-prod, verify-ui-prod targets)
+```
+
+## Dockerfile Design
+
+### Stage 1 — Builder (`node:22-slim`)
+
+```dockerfile
+# Deps layer: cached until package.json or the lockfile changes
+COPY package.json package-lock.json ./
+# Reproducible install; npm's download cache lives in a BuildKit cache mount
+RUN --mount=type=cache,target=/root/.npm npm ci
+# Source layer: invalidated by any source change
+COPY . .
+# Runs ng build --configuration production
+RUN npm run build
+```
+
+Output of `npm run build`: `dist/reactbin-ui/browser/` (confirmed: Angular 19 application builder creates `browser/` subdirectory under `outputPath`).
+
+### Stage 2 — Runtime (`nginxinc/nginx-unprivileged:alpine`)
+
+- Runs as non-root by design (no manual `useradd` needed)
+- Listens on port 8080
+- `COPY --from=builder /app/dist/reactbin-ui/browser /usr/share/nginx/html`
+- `COPY nginx.conf /etc/nginx/conf.d/default.conf`
+- HEALTHCHECK via `wget` (curl not present in Alpine nginx-unprivileged)
+- No CMD override needed — the base image entrypoint starts nginx
+
+### nginx.conf
+
+```nginx
+server {
+    listen 8080;
+
+    root /usr/share/nginx/html;
+    index index.html;
+
+    # SPA fallback — unmatched paths return app shell
+    location / {
+        try_files $uri $uri/ /index.html;
+    }
+
+    # Long-lived cache for fingerprinted assets
+    location ~* \.(js|css|woff2?|ttf|eot|svg|png|jpg|jpeg|gif|ico)$ {
+        expires 1y;
+        add_header Cache-Control "public, immutable";
+    }
+
+    # Never cache the entry point
+    location = /index.html {
+        add_header Cache-Control "no-store, no-cache, must-revalidate";
+    }
+}
+```
+
+## Verification Script Design (`ui/tests/build/verify_production_image.sh`)
+
+Mirrors `api/tests/build/verify_production_image.sh` structure:
+
+| Check | Story | Description |
+|-------|-------|-------------|
+| Build | US1 | `docker build -f ui/Dockerfile.prod ui/` succeeds |
+| Health endpoint | US1 | `curl -sf http://localhost:18080/` returns 200 within 30s |
+| SPA routing | US1 | `curl -sf http://localhost:18080/library` returns 200 |
+| Graceful shutdown | US1 | `docker stop <container>` → exit code 0 |
+| Non-root user | US2 | `docker exec <container> id -u` ≠ 0 |
+| Node.js absent | US2 | `docker run --rm <image> node --version` exits non-zero |
+| No secrets in history | US2 | `docker history --no-trunc` contains no secret-like strings |
+| Dep layer cache hit | US3 | `touch ui/src/app/app.component.ts` + rebuild → output contains `CACHED` |
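+
+The cache-hit check can be made stricter than "output contains `CACHED`" by tying the `CACHED` marker to the dependency step itself. The sketch below is one way to do that; the helper name `step_was_cached` is hypothetical, and it assumes BuildKit's `--progress=plain` log format, in which a step header `#<n> [stage i/j] <command>` is later followed by `#<n> CACHED`.
+
+```bash
+# Hypothetical helper: read `docker build --progress=plain` output on stdin and
+# succeed only if the step whose header contains $1 is reported as CACHED.
+step_was_cached() {
+    local needle="$1"
+    awk -v needle="$needle" '
+        # Remember the step id (e.g. "#8") of the header line containing the needle.
+        index($0, needle) && $1 ~ /^#[0-9]+$/ { step = $1 }
+        # A later "#8 CACHED" line for that same id confirms the cache hit.
+        step != "" && $1 == step && $2 == "CACHED" { found = 1 }
+        END { exit found ? 0 : 1 }
+    '
+}
+
+# Example usage (plain progress output is written to stderr, hence 2>&1):
+#   docker build --progress=plain -f ui/Dockerfile.prod ui/ 2>&1 | step_was_cached "npm ci"
+```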
+
+## Makefile Additions
+
+```makefile
+.PHONY: build-ui-prod verify-ui-prod
+
+build-ui-prod:
+	docker build -f ui/Dockerfile.prod ui/ -t reactbin-ui-prod:latest
+
+verify-ui-prod:
+	bash ui/tests/build/verify_production_image.sh
+```
+
+## Dependencies & Risks
+
+| Item | Risk | Mitigation |
+|------|------|------------|
+| `dist/reactbin-ui/browser/` path | If Angular changes the output directory structure in a future version, the COPY path breaks | Path is verified in research; a test build during verify catches drift |
+| `nginxinc/nginx-unprivileged` UID | UID may vary between image versions | Check is `UID ≠ 0`, not a specific UID value |
+| `wget` availability | Alpine images may change toolset | HEALTHCHECK is tested as part of US1 verify |
+| Port 18080 collision | Another process may use 18080 during verify | Acceptable risk for a dev-time test; port is not a system service |
--- a/specs/011-ui-prod-dockerfile/quickstart.md
+++ b/specs/011-ui-prod-dockerfile/quickstart.md
@@ -0,0 +1,100 @@
+# Quickstart: UI Production Image
+
+## Prerequisites
+
+- Docker with BuildKit enabled (default in Docker 23+)
+- `make` available in the shell
+
+## Build the Image
+
+```bash
+make build-ui-prod
+# Equivalent: docker build -f ui/Dockerfile.prod ui/ -t reactbin-ui-prod:latest
+```
+
+Expected: Build completes in ~2 minutes on first run (npm install), ~15 seconds on subsequent source-only changes.
+
+## Run the Container
+
+```bash
+docker run --rm -p 4200:8080 reactbin-ui-prod:latest
+```
+
+Open http://localhost:4200 — the app shell loads. Navigate to `/library` or `/tags` — the page loads (SPA routing returns `index.html`).
+
+## Verify All Production Checks
+
+```bash
+make verify-ui-prod
+```
+
+This runs `ui/tests/build/verify_production_image.sh`, which exercises all three user stories:
+
+```
+[verify] Building reactbin-ui-prod:verify-<PID>...
+[verify] Build OK
+[verify] Polling health endpoint...
+[verify] Health check passed
+[verify] SPA routing OK (/library → 200)
+[verify] Non-root user OK (UID <n>)
+[verify] Stdout logging OK
+[verify] Graceful shutdown OK (exit 0)
+[verify] Node.js absent in runtime image OK
+[verify] No secrets in image layers OK
+[verify] Dep layer cache hit confirmed (US3 OK)
+[verify] All checks passed (US1 + US2 + US3).
+```
+
+## Integration Test Scenarios
+
+### Scenario 1: Initial Build (Cold Cache)
+
+```bash
+docker rmi reactbin-ui-prod:latest 2>/dev/null || true
+make build-ui-prod
+```
+
+Expected: `npm ci` runs fully (~30–90s depending on network). All packages installed from lockfile.
+
+### Scenario 2: Source-Only Rebuild (Warm Cache)
+
+```bash
+touch ui/src/app/app.component.ts
+make build-ui-prod
+```
+
+Expected: `npm ci` step is CACHED (skipped). Only the Angular compilation runs (~10–20s).
+
+### Scenario 3: Dependency Change (Cache Invalidation)
+
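+```bash
+# Simulate a lockfile change. The content must actually change: `touch` only
+# updates the mtime, which Docker's checksum-based COPY cache may ignore.
+echo >> ui/package-lock.json
+make build-ui-prod
+git checkout -- ui/package-lock.json   # restore the lockfile afterwards
+```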
+
+Expected: `npm ci` runs fresh (cache miss is intentional and correct).
+
+### Scenario 4: SPA Deep-Link Routing
+
+```bash
+docker run --rm -d -p 4200:8080 --name ui-test reactbin-ui-prod:latest
+curl -sf http://localhost:4200/library        # 200 + index.html
+curl -sf http://localhost:4200/tags           # 200 + index.html
+curl -sf http://localhost:4200/nonexistent    # 200 + index.html (Angular handles 404)
+docker stop ui-test
+```
+
+### Scenario 5: Non-Root Assertion
+
+```bash
+docker run --rm reactbin-ui-prod:latest id
+# Must NOT output uid=0(root)
+```
+
+### Scenario 6: No Node.js in Runtime Image
+
+```bash
+docker run --rm reactbin-ui-prod:latest node --version 2>&1
+# Must exit non-zero (node not found)
+```
--- a/specs/011-ui-prod-dockerfile/research.md
+++ b/specs/011-ui-prod-dockerfile/research.md
@@ -0,0 +1,69 @@
+# Research: Production-Grade UI Container Image
+
+## Decision 1: Build-stage base image
+
+**Decision**: `node:22-slim`
+**Rationale**: Matches the version in the existing dev `ui/Dockerfile`. Slim variant reduces the builder layer size and attack surface relative to the full Debian image.
+**Alternatives considered**: `node:22-alpine` — lighter, but can introduce musl/glibc compatibility issues with some native npm packages; `node:22-bookworm-slim` — functionally equivalent to `node:22-slim`, same image.
+
+## Decision 2: Runtime base image
+
+**Decision**: `nginxinc/nginx-unprivileged:alpine`
+**Rationale**: Runs fully as a non-root user on port 8080 out of the box — no manual user creation or privilege workarounds required. Alpine-based keeps the final image small. The official `nginx:alpine` image requires the master process to run as root to bind port 80; `nginx-unprivileged` avoids this by binding to 8080 instead.
+**Alternatives considered**:
+- `nginx:alpine` — master process must be root (violates FR-005); workers run as `nginx` user but `id -u` inside container still shows 0 for PID 1.
+- `caddy:alpine` — also supports non-root but adds Caddy's Go runtime footprint unnecessarily for pure static serving.
+
+## Decision 3: Container port
+
+**Decision**: Expose port `8080` in the container; external orchestrators (docker-compose, Kubernetes ingress) map it to port 80 or 4200 as needed.
+**Rationale**: `nginxinc/nginx-unprivileged` defaults to port 8080; deviating would require overriding nginx config with no benefit. Port remapping is standard practice — containers should not run as root just to bind to a privileged port.
+**Alternatives considered**: Running nginx on port 80 requires either root or Linux capabilities (`CAP_NET_BIND_SERVICE`), both of which increase the attack surface.
+
+## Decision 4: Angular build output directory
+
+**Decision**: COPY `dist/reactbin-ui/browser/` into the nginx document root.
+**Rationale**: The Angular 19 `@angular-devkit/build-angular:application` builder (esbuild-based) places browser assets in `dist/{projectName}/browser/` — confirmed by inspecting the existing `dist/reactbin-ui/browser/` directory in the repo. The parent `dist/reactbin-ui/` also contains `prerendered-routes.json` and `3rdpartylicenses.txt` which must not be served as the web root.
+**Alternatives considered**: Serving from `dist/reactbin-ui/` directly — would expose the `3rdpartylicenses.txt` file at the root and include the prerendering metadata file.
+
+## Decision 5: Dependency install command
+
+**Decision**: `npm ci` (not `npm install`)
+**Rationale**: `npm ci` installs exactly what `package-lock.json` specifies — reproducible, faster on CI, and fails loudly on lockfile mismatches. All dependencies (including `devDependencies`) are needed in the build stage because Angular CLI and build tools are `devDependencies`.
+**Alternatives considered**: `npm install` — non-deterministic across environments; `npm install --omit=dev` — would break the Angular build since `@angular/cli` is a devDependency.
+
+## Decision 6: Layer cache strategy
+
+**Decision**: Two COPY layers — lockfiles first, then source.
+```dockerfile
+# Invalidated only on dependency changes
+COPY package.json package-lock.json ./
+# Expensive step, cached when the lockfiles are unchanged
+RUN npm ci
+# Invalidated on every source change
+COPY . .
+RUN npm run build
+```
+**Rationale**: Mirrors the proven pattern used in the API's `Dockerfile.prod`. Dependency installation (30s–2min) is cached independently from source compilation.
+**Alternatives considered**: Single COPY of all source — trivial source changes would always re-run `npm ci`.
+
+## Decision 7: SPA routing
+
+**Decision**: nginx `try_files $uri $uri/ /index.html` fallback in a custom `nginx.conf`.
+**Rationale**: Angular is a single-page application. All non-asset routes (e.g., `/library`, `/tags`, `/login`) must return `index.html` so Angular's router can handle them client-side. Without this, direct navigation to any deep link returns 404.
+**Alternatives considered**: Redirect to `/` — would break deep linking; returning 404 — breaks client-side routing entirely.
+
+## Decision 8: Cache-control headers
+
+**Decision**: Long-lived `Cache-Control: public, max-age=31536000, immutable` for fingerprinted JS/CSS/font assets; `Cache-Control: no-store` for `index.html`.
+**Rationale**: Angular's production build fingerprints all bundles (e.g., `main.a1b2c3d4.js`). These are safe to cache indefinitely. `index.html` is never fingerprinted and must always be fresh so users pick up new deployments.
+**Alternatives considered**: No cache-control headers — acceptable for MVP but fails FR-008.
+
+## Decision 9: Health check probe
+
+**Decision**: Use `wget -qO- http://localhost:8080/` as the HEALTHCHECK command (no `curl` in `nginx-unprivileged:alpine`).
+**Rationale**: The `nginxinc/nginx-unprivileged:alpine` image is minimal and does not include `curl`. `wget` is available in Alpine. The health check tests that nginx is accepting connections and returning the app shell.
+**Alternatives considered**: Installing `curl` via `apk add` — adds package manager overhead and unnecessary tooling to the runtime image.
+
+## Decision 10: TDD verification approach
+
+**Decision**: Shell script `ui/tests/build/verify_production_image.sh` mirrors the approach used for the API in feature 010.
+**Rationale**: There is no pytest equivalent for Docker build artifacts. A shell script that fails because `Dockerfile.prod` does not exist satisfies §5.1 TDD (the script is the failing test; writing the Dockerfile turns it green).
+**Alternatives considered**: No TDD — violates §5.1; a Python test with subprocess — overkill when a shell script is simpler and already proven.
--- a/specs/011-ui-prod-dockerfile/spec.md
+++ b/specs/011-ui-prod-dockerfile/spec.md
@@ -0,0 +1,110 @@
+# Feature Specification: Production-Grade UI Container Image
+
+**Feature Branch**: `011-ui-prod-dockerfile`
+**Created**: 2026-05-07
+**Status**: Draft
+**Input**: User description: "Production-grade UI container image build"
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - UI Serves Reliably in Production (Priority: P1)
+
+A production deployment starts the UI container and it serves the compiled application correctly — returning the app shell for all routes, responding quickly, and shutting down cleanly when the orchestrator stops it.
+
+**Why this priority**: A container that can't serve traffic is not deployable. All other properties (security, build speed) are meaningless without a running service.
+
+**Independent Test**: Build the image, start the container, and verify the root path returns a 200 response. Stopping the container produces a clean exit. This alone constitutes a deployable MVP.
+
+**Acceptance Scenarios**:
+
+1. **Given** a built production image, **When** the container starts, **Then** it serves the application on port 8080 within 30 seconds.
+2. **Given** the container is running, **When** a request is made to any client-side route (e.g., `/library`, `/tags`), **Then** the server returns the app shell (200 OK) so client-side routing can take over.
+3. **Given** the container is running, **When** a static asset is requested, **Then** it is returned with appropriate caching headers.
+4. **Given** a running container, **When** the orchestrator sends a stop signal, **Then** the container exits with code 0 within a reasonable timeout.
+5. **Given** the production image, **When** a health probe is issued to a designated endpoint, **Then** the container reports healthy.
+
+---
+
+### User Story 2 - Minimal, Secure Container (Priority: P2)
+
+The production image contains only what is needed to serve static files — no build tools, no source code, no `node_modules`. It runs as a non-privileged user.
+
+**Why this priority**: Shipping build tools and source code in production images increases attack surface and image size. Running as root violates least-privilege principles.
+
+**Independent Test**: Inspect the running container and confirm the process user is non-root; then attempt to run the Node.js binary inside the image and confirm it is absent.
+
+**Acceptance Scenarios**:
+
+1. **Given** the production image, **When** the running process user is inspected, **Then** it is not root (UID ≠ 0).
+2. **Given** the production image, **When** the image contents are inspected, **Then** `node_modules/`, source TypeScript files, and the Node.js runtime are absent.
+3. **Given** the production image, **When** image layer history is inspected, **Then** no secrets, API keys, or credentials appear in any layer command.
+4. **Given** the production image, **When** the image size is measured, **Then** it is substantially smaller than a single-stage image that includes the Node.js toolchain.
+
+---
+
+### User Story 3 - Fast, Reproducible Builds (Priority: P3)
+
+Rebuilding the image after a source-only change (no dependency changes) reuses the dependency installation layer from cache, completing in seconds rather than minutes.
+
+**Why this priority**: Slow builds impede the development feedback loop and CI pipeline throughput. Dependency installs are the dominant time cost.
+
+**Independent Test**: Build once, then change a source file and build again — the build output confirms the dependency layer was served from cache.
+
+**Acceptance Scenarios**:
+
+1. **Given** the image has been built once, **When** only a source file is changed and the image is rebuilt, **Then** the dependency installation step is skipped (cache hit).
+2. **Given** a dependency file is changed, **When** the image is rebuilt, **Then** the dependency installation step runs fresh (cache miss is correct behaviour).
+3. **Given** two successive builds with identical inputs, **Then** both produce functionally identical output.
+
+---
+
+### Edge Cases
+
+- What happens when the container starts but the built assets are missing or corrupted?
+- How does the server handle requests for non-existent routes that should fall back to the app shell (SPA routing)?
+- What happens when the container receives a stop signal while actively serving requests?
+- What happens if the port is already in use at startup?
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The production image MUST be built via a multi-stage process — a build stage compiles the application into static assets, and a separate runtime stage serves only those assets.
+- **FR-002**: The runtime stage MUST NOT contain the Node.js runtime, npm, source TypeScript, or `node_modules/`.
+- **FR-003**: The container MUST serve the application on port 8080. External orchestrators (docker-compose, Kubernetes ingress) map this to port 80 as needed.
+- **FR-004**: The container MUST handle SPA (single-page application) routing by returning the app shell for any unmatched path, so client-side routing works correctly.
+- **FR-005**: The container MUST run as a non-root user.
+- **FR-006**: The container MUST expose a health-check endpoint that returns success when the service is ready to accept traffic.
+- **FR-007**: The container MUST exit with code 0 when sent a graceful stop signal.
+- **FR-008**: Static assets MUST be served with cache-control headers that enable client-side caching for fingerprinted assets.
+- **FR-009**: The Dockerfile MUST structure layers so that dependency installation is cached independently from source code changes.
+- **FR-010**: The build MUST be reproducible — given the same source and lockfile, successive builds produce equivalent images.
+- **FR-011**: Credentials, secrets, and API keys MUST NOT appear in any image layer.
+
+### Key Entities
+
+- **Build Stage**: The intermediate container that installs dependencies and compiles source into static assets; discarded after build.
+- **Static Assets**: The compiled output (HTML, JS bundles, CSS, fonts, images) that the runtime stage serves.
+- **Runtime Stage**: The minimal final image containing only a web server and the compiled static assets.
+- **Production Image**: The tagged, distributable image produced by the build; used directly in deployment.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: The container serves a 200 response on port 8080 within 30 seconds of starting.
+- **SC-002**: The production image is substantially smaller than a single-stage image that retains the Node.js toolchain. A manual size comparison after the initial build confirms the multi-stage approach delivers a meaningful reduction (expected: >60% reduction).
+- **SC-003**: A source-only rebuild completes in under 30 seconds (dependency layer served from cache).
+- **SC-004**: All 11 functional requirements pass automated verification on every build.
+- **SC-005**: The running container process has UID ≠ 0, confirmed by automated check.
+- **SC-006**: No existing integration tests regress after the Dockerfile and supporting files are introduced.
+
+## Assumptions
+
+- The Angular application is built for production using the standard build toolchain (`ng build --configuration production` or equivalent), producing a `dist/` output directory.
+- The production web server is responsible for SPA fallback routing (returning the app shell for unmatched paths).
+- Gzip or Brotli compression at the web server layer is desirable but not mandatory for the initial implementation.
+- The UI container does not need to proxy API requests — it communicates with the API directly from the browser (the Angular proxy config is only used in local development).
+- The container listens on port 8080 (non-privileged, enabling non-root operation). External load balancers or ingress controllers map this to port 80. TLS termination occurs upstream.
+- The build context is the `ui/` directory; files excluded from the build context (source maps in CI, `node_modules/` already present locally) are managed via `.dockerignore`.
+- The same verification approach used for the API image (a shell script as the TDD artefact) applies here.
--- a/specs/011-ui-prod-dockerfile/tasks.md
+++ b/specs/011-ui-prod-dockerfile/tasks.md
@@ -0,0 +1,166 @@
+# Tasks: Production-Grade UI Container Image
+
+**Input**: Design documents from `specs/011-ui-prod-dockerfile/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/container.md ✅, quickstart.md ✅
+
+**Tests**: TDD is non-negotiable (§5.1). The "test" for a Docker build artefact is `ui/tests/build/verify_production_image.sh`, written before `ui/Dockerfile.prod` exists. Running the script immediately fails (red) because the build step cannot find the file; writing `Dockerfile.prod` turns it green.
+
+**Organization**: Phase 1 sets up Makefile targets, `.dockerignore`, and supporting files; Phase 3 (US1) writes the verification script and the Dockerfile; Phase 4 (US2) extends the script with security checks; Phase 5 (US3) extends it with a cache-hit check; Phase 6 polishes.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel with other [P] tasks in the same phase
+- **[Story]**: Which user story this task belongs to
+- Exact file paths included in every task description
+
+---
+
+## Phase 1: Setup
+
+- [X] T001 Add `build-ui-prod` and `verify-ui-prod` targets (and their `.PHONY` entries) to the root `Makefile` at `/workspace/Makefile`: `build-ui-prod` runs `docker build -f ui/Dockerfile.prod ui/ -t reactbin-ui-prod:latest`; `verify-ui-prod` runs `bash ui/tests/build/verify_production_image.sh`
+
+- [X] T002 Create `ui/.dockerignore` at `/workspace/ui/.dockerignore` with the following exclusions (the file does not yet exist — create it fresh): `.git/`, `node_modules/`, `dist/`, `.angular/`, `coverage/`, `*.spec.ts`, `.env`, `.env.*`, `!.env.example`, `tests/`; these keep the build context transfer fast and prevent dev state from leaking into the production image
+
+- [X] T003 Create directory `ui/tests/build/` at `/workspace/ui/tests/build/` with `mkdir -p` and add a `.gitkeep` so the directory is tracked in git
+
+---
+
+**Checkpoint**: Directory structure is ready; Makefile and .dockerignore are created.
+
+---
+
+## Phase 2: Foundational
+
+No blocking foundational prerequisites exist for this feature — the setup tasks in Phase 1 directly enable all user story phases. Phase 2 is intentionally omitted.
+
+---
+
+## Phase 3: User Story 1 — UI Serves Reliably in Production (Priority: P1) 🎯 MVP
+
+**Goal**: The container builds, starts, serves the health endpoint and SPA routes, and exits cleanly on SIGTERM.
+
+**Independent Test**: `make verify-ui-prod` — passes when `Dockerfile.prod` and `nginx.conf` exist and all US1 checks pass.
+
+### Test for User Story 1 (TDD red — write first, confirm failure before T005)
+
+- [X] T004 [US1] Create `ui/tests/build/verify_production_image.sh` as an executable bash script (`chmod +x`) with `#!/usr/bin/env bash` and `set -euo pipefail`; the script MUST:
+  1. Set `IMAGE="reactbin-ui-prod:verify-$$"` and `IMAGE2="reactbin-ui-prod:verify-cache-$$"` and `APP_CONTAINER=""`;
+  2. Define a `cleanup()` function that runs `docker rm -f "$APP_CONTAINER" 2>/dev/null || true`, `docker rmi "$IMAGE" 2>/dev/null || true`, and `docker rmi "$IMAGE2" 2>/dev/null || true`, then register it with `trap cleanup EXIT`;
+  3. **[US1 check 1 — build]** Run `docker build -f ui/Dockerfile.prod ui/ -t "$IMAGE"` — this is the line that fails **red** because `ui/Dockerfile.prod` does not yet exist; print `[verify] Building $IMAGE...` before and `[verify] Build OK` after;
+  4. **[US1 check 2 — start container]** Start the production container: `APP_CONTAINER=$(docker run -d -p 18080:8080 "$IMAGE")`; print `[verify] Starting production container...`;
+  5. **[US1 check 3 — health endpoint]** Poll `curl -sf http://localhost:18080/` up to 30 × 1s, fail with `FAIL: health check timed out after 30s` if timeout; print `[verify] Health check passed` on success;
+  6. **[US1 check 4 — SPA routing]** Run `curl -sf http://localhost:18080/library > /dev/null`; assert exit code is 0 (200 response); fail with `FAIL: SPA routing check failed (/library did not return 200)` if violated; print `[verify] SPA routing OK (/library → 200)`;
+  7. **[US1 check 5 — SIGTERM → exit 0]** Run `docker stop "$APP_CONTAINER"` (sends SIGTERM); capture `EXIT_CODE=$(docker wait "$APP_CONTAINER")`; assert `"$EXIT_CODE" -eq 0`, fail with `FAIL: non-zero exit code $EXIT_CODE after SIGTERM` otherwise; print `[verify] Graceful shutdown OK (exit $EXIT_CODE)`;
+  8. Print `[verify] US1 checks passed.`
+  After writing the script, run `make verify-ui-prod` and confirm it **fails** with a Docker build error (red state — `ui/Dockerfile.prod` does not exist).
+
+### Implementation for User Story 1
+
+- [X] T005 [US1] Create `ui/nginx.conf` at `/workspace/ui/nginx.conf` — an nginx server block that: listens on port `8080`; sets `root /usr/share/nginx/html` and `index index.html`; adds a `location /` block with `try_files $uri $uri/ /index.html` for SPA fallback routing; adds a `location ~* \.(js|css|woff2?|ttf|eot|svg|png|jpg|jpeg|gif|ico)$` block with `expires 1y` and `add_header Cache-Control "public, immutable"` for fingerprinted assets; adds a `location = /index.html` block with `add_header Cache-Control "no-store, no-cache, must-revalidate"` so the entry point is never cached
+
+- [X] T006 [US1] Create `ui/Dockerfile.prod` at `/workspace/ui/Dockerfile.prod` — a two-stage multi-stage build:
+  **Stage 1 (builder)**: `FROM node:22-slim AS builder`; `WORKDIR /app`; `COPY package.json package-lock.json ./`; `RUN --mount=type=cache,target=/root/.npm npm ci`; `COPY . .`; `RUN npm run build`
+  **Stage 2 (runtime)**: `FROM nginxinc/nginx-unprivileged:alpine`; `COPY --from=builder /app/dist/reactbin-ui/browser /usr/share/nginx/html`; `COPY nginx.conf /etc/nginx/conf.d/default.conf`; `EXPOSE 8080`; `HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 CMD wget -qO- http://localhost:8080/ || exit 1`
+
+- [X] T007 [US1] Verify TDD green for US1: run `make verify-ui-prod` and confirm all five US1 checks pass (build OK, container starts, health endpoint returns 200, SPA routing returns 200, SIGTERM produces exit code 0) and that `[verify] US1 checks passed.` is printed.
+
+**Checkpoint**: US1 is complete. Production container builds, starts, serves traffic (including SPA routes), and shuts down gracefully.
+
+---
+
+## Phase 4: User Story 2 — Minimal, Secure Container (Priority: P2)
+
+**Goal**: The production image runs as non-root and contains no Node.js runtime, source, or embedded secrets.
+
+**Independent Test**: US2 checks in `make verify-ui-prod` — the same script extended with non-root, node-absent, and secrets-free assertions.
+
+### Tests for User Story 2 (TDD extension — add checks, confirm they pass against existing Dockerfile.prod)
+
+- [X] T008 [US2] Extend `ui/tests/build/verify_production_image.sh` with the US2 checks below, each inserted at the position it states relative to `docker stop` (all before the final `US1 checks passed` line), and update the final success message to `[verify] All checks passed (US1 + US2).`:
+  **[US2 check 1 — non-root]** Before `docker stop`, run `UID_IN_CONTAINER=$(docker exec "$APP_CONTAINER" id -u)`; assert `"$UID_IN_CONTAINER" -ne 0`, fail with `FAIL: process running as root (UID 0)` if violated; print `[verify] Non-root user OK (UID $UID_IN_CONTAINER)`;
+  **[C1 — stdout log capture]** Run `LOGS=$(docker logs "$APP_CONTAINER" 2>&1)`; assert `"$LOGS"` is non-empty, fail with `FAIL: no output on stdout/stderr` if empty; print `[verify] Stdout logging OK`; insert this check before `docker stop`;
+  **[US2 check 2 — Node.js absent]** After SIGTERM cleanup, run `docker run --rm "$IMAGE" node --version 2>/dev/null`; assert the exit code is **non-zero** (node not present in runtime image); if it returns 0, fail with `FAIL: node runtime found in production image`; print `[verify] Node.js absent in runtime image OK`;
+  **[C2 — no hardcoded secrets in layers]** Run `docker history --no-trunc "$IMAGE" 2>&1`; pipe through `grep -qiE "(password|secret_key|api_key|token)"`; assert zero matching lines; if any match, fail with `FAIL: potential secret found in image history`; print `[verify] No secrets in image layers OK`;
+  **[FR-008 — cache-control headers on assets]** While APP_CONTAINER is running, find the first JS bundle filename: `JS_FILE=$(docker run --rm "$IMAGE" ls /usr/share/nginx/html | grep -E '\.js$' | head -1)`; run `curl -sI "http://localhost:18080/${JS_FILE}"`; assert the response contains `Cache-Control` with `immutable` or `max-age=31536000`, fail with `FAIL: cache-control header not set on fingerprinted asset` if absent; print `[verify] Cache-Control header OK`;
+  Confirm `make verify-ui-prod` passes with the extended checks.
+
+**Checkpoint**: US2 is verified. Image runs as a non-root user and contains no Node.js toolchain.
+
+---
+
+## Phase 5: User Story 3 — Fast, Reproducible Builds (Priority: P3)
+
+**Goal**: Rebuilding after a source-only change reuses the `npm ci` dependency layer from cache.
+
+**Independent Test**: US3 check in `make verify-ui-prod` — a second build after touching a source file asserts the dep layer was cached.
+
+### Tests for User Story 3 (TDD extension)
+
+- [X] T009 [US3] Extend `ui/tests/build/verify_production_image.sh` with a US3 cache check appended after all other checks (before the final success line):
+  **[US3 check — dep layer cached on source-only rebuild]** Print `[verify] Testing cache hit on source-only rebuild...`; `touch ui/src/app/app.component.ts`; capture `BUILD2_OUTPUT=$(docker build --progress=plain -f ui/Dockerfile.prod ui/ -t "$IMAGE2" 2>&1)` (the `--progress=plain` flag ensures consistent `CACHED` output regardless of Docker version or TTY); assert the output contains the string `CACHED`; if absent, fail with `FAIL: dependency layer not reused on source-only rebuild`; print `[verify] Dep layer cache hit confirmed (US3 OK)`;
+  Update the final success line to `[verify] All checks passed (US1 + US2 + US3).`
+
+- [X] T010 [US3] Verify TDD green for US3: run `make verify-ui-prod` and confirm the full script passes including the cache check — the build output for the second image must contain `CACHED`, and `[verify] All checks passed (US1 + US2 + US3).` must print.
+
+**Checkpoint**: All three user stories are verified end-to-end by `make verify-ui-prod`.
+
+---
+
+## Phase 6: Polish & Cross-Cutting Concerns
+
+- [X] T011 Run `make test-integration` from `/workspace` and confirm all 102 existing tests still pass — verifies that the new files (Makefile targets, ui/.dockerignore, ui/tests/build/) do not break the existing test Dockerfile build or any integration test (§5.4 regression gate)
+
+- [X] T012 Confirm image size reduction (SC-002): run `docker images reactbin-ui-prod:latest --format "{{.Size}}"` and compare against a reference single-stage image built from `FROM node:22-slim` + `npm ci` + `npm run build` to confirm the production image is substantially smaller (expected >60% reduction); document the sizes in a comment or log line
+
+- [X] T013 Run `shellcheck ui/tests/build/verify_production_image.sh` and fix any violations (common: unquoted variables, `[ ]` vs `[[ ]]`, missing `--` before arguments); also verify `make verify-ui-prod` still passes after any fixes
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Phase 1 (Setup)**: No external dependencies — start immediately
+- **Phase 3 (US1)**: Depends on Phase 1 (the Makefile target, `.dockerignore`, and the test directory from T003 must exist before `make verify-ui-prod` can run)
+- **Phase 4 (US2)**: Depends on Phase 3 (US1 script and Dockerfile must exist to extend)
+- **Phase 5 (US3)**: Depends on Phase 4 (full US2 script must exist to extend)
+- **Phase 6 (Polish)**: Depends on all prior phases; T011 before T012
+
+### Within Phase 3
+
+- T004 before T005/T006 (write test script before writing the nginx config and Dockerfile)
+- T005 and T006 can run in parallel (different files, no mutual dependency)
+- T007 after T005 and T006 (verify green after both implementation files exist)
+
+### Execution Order Summary
+
+```
+Step 1: T001 ∥ T002 ∥ T003  (setup — parallel, different files)
+Step 2: T004                 (write verification script — TDD red)
+Step 3: T005 ∥ T006          (write nginx.conf and Dockerfile.prod — parallel)
+Step 4: T007                 (verify US1 green)
+Step 5: T008                 (extend script with US2 checks, verify pass)
+Step 6: T009                 (extend script with US3 check)
+Step 7: T010                 (verify US3 green)
+Step 8: T011                 (make test-integration — regression gate)
+Step 9: T012                 (image size comparison — SC-002)
+Step 10: T013                (shellcheck polish)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 — reliable production run)
+
+1. Complete T001–T003 (setup)
+2. Complete T004–T007 (core: write script → write nginx.conf + Dockerfile → verify green)
+3. **Validate**: `make verify-ui-prod` passes; `make test-integration` still passes
+4. US2 and US3 add explicit verification coverage for properties already implemented by the two-stage build
+
+### Incremental Delivery
+
+- After Phase 3: Production image builds, starts, serves traffic with SPA routing — safe to deploy
+- After Phase 4: Security properties (non-root, no Node.js runtime) are explicitly verified
+- After Phase 5: Build efficiency (npm ci layer caching) is confirmed by automated check
+- After Phase 6: Script is lint-clean, ready for CI integration
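
Reviewer note on T004, US1 check 3: the bounded poll (up to 30 attempts, 1 s apart) is the step of the verification script that is easiest to get subtly wrong. The script itself is bash; the Python sketch below only illustrates the retry logic under that assumption. `http_ok` and `wait_until` are hypothetical names, not part of the repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with a 2xx status (roughly `curl -sf`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final failed attempt
    return False
```

In the script's terms, `wait_until(lambda: http_ok("http://localhost:18080/"))` returning `False` corresponds to the `FAIL: health check timed out after 30s` branch.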
--- a/specs/012-api-docs-gate/checklists/requirements.md
+++ b/specs/012-api-docs-gate/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: API Documentation Visibility Gate
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-07
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Spec is ready for `/speckit-plan`.
--- a/specs/012-api-docs-gate/contracts/docs-endpoints.md
+++ b/specs/012-api-docs-gate/contracts/docs-endpoints.md
@@ -0,0 +1,40 @@
+# Contract: API Documentation Endpoints
+
+These three endpoints exist in FastAPI by default. This feature makes their availability conditional on a runtime configuration flag.
+
+## Affected Endpoints
+
+| Endpoint | Default path | Purpose |
+|----------|-------------|---------|
+| Swagger UI | `GET /docs` | Interactive browser-based API documentation |
+| ReDoc UI | `GET /redoc` | Alternative read-only API documentation |
+| OpenAPI schema | `GET /openapi.json` | Raw JSON schema of the entire API surface |
+
+## Behaviour by Flag State
+
+### `API_DOCS_ENABLED=true` (default)
+
+All three endpoints respond exactly as they did before this feature. No change.
+
+| Endpoint | Response |
+|----------|----------|
+| `GET /docs` | `200 OK` — Swagger UI HTML |
+| `GET /redoc` | `200 OK` — ReDoc UI HTML |
+| `GET /openapi.json` | `200 OK` — OpenAPI schema JSON |
+
+### `API_DOCS_ENABLED=false`
+
+All three endpoints are unregistered. Requests fall through to the framework's default 404 handler.
+
+| Endpoint | Response |
+|----------|----------|
+| `GET /docs` | `404 Not Found` |
+| `GET /redoc` | `404 Not Found` |
+| `GET /openapi.json` | `404 Not Found` |
+
+## Invariants
+
+- All other endpoints are unaffected in both flag states.
+- The `GET /api/v1/health` endpoint always returns `200 OK` regardless of the flag.
+- Internal OpenAPI schema generation (used for request/response validation) is not disabled — only the HTTP routes serving it are removed.
+- The flag is read once at application startup. A running process does not respond to live changes; a restart is required.
--- a/specs/012-api-docs-gate/plan.md
+++ b/specs/012-api-docs-gate/plan.md
@@ -0,0 +1,138 @@
+# Implementation Plan: API Documentation Visibility Gate
+
+**Branch**: `012-api-docs-gate` | **Date**: 2026-05-07 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `specs/012-api-docs-gate/spec.md`
+
+## Summary
+
+Add `API_DOCS_ENABLED` (boolean, default `true`) to `app/config.py`. When `false`, pass `docs_url=None`, `redoc_url=None`, `openapi_url=None` to the `FastAPI()` constructor in `app/main.py`, making all three documentation routes return 404. A field validator provides graceful fallback for invalid flag values. Two new integration tests verify both flag states; the existing unit test suite is extended with two settings tests.
+
+## Technical Context
+
+**Language/Version**: Python 3.12
+**Primary Dependencies**: FastAPI (constructor params), pydantic-settings (field validator)
+**Storage**: None
+**Testing**: pytest unit (`api/tests/unit/test_config.py`), pytest + ASGI test client (`api/tests/integration/test_docs_gate.py`)
+**Target Platform**: API container (same as existing)
+**Project Type**: Web service configuration change
+**Performance Goals**: No measurable impact — one boolean read at startup
+**Constraints**: Default must be `true` (backwards compatible); invalid env var value must not crash startup; no other routes affected
+**Scale/Scope**: Three files changed (`config.py`, `main.py`, `.env.example`); one new test file; one existing test file extended
+
+## Constitution Check
+
+| Principle | Requirement | Status |
+|-----------|-------------|--------|
+| §5.1 TDD | Failing tests written before implementation | ✅ Tasks order tests first |
+| §5.2 Integration tests | New integration tests follow existing pattern | ✅ |
+| §5.3 Tests next to code | `api/tests/unit/` and `api/tests/integration/` | ✅ |
+| §5.4 CI before done | All tests pass before task marked done | ✅ |
+| §7.2 Env config | Flag via environment variable, not hardcoded | ✅ |
+| §7.3 Linting | `ruff` passes on all changed files | ✅ Enforced in polish task |
+| §2.6 No speculative abstraction | One boolean field, no plugin system | ✅ |
+
+**No violations. All gates pass.**
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/012-api-docs-gate/
+├── plan.md                    ← this file
+├── research.md                ← 6 decisions
+├── contracts/
+│   └── docs-endpoints.md     ← behaviour contract for 3 affected endpoints
+├── quickstart.md              ← 4 test scenarios
+└── tasks.md                  ← generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+api/
+├── app/
+│   ├── config.py              ← MODIFIED: add api_docs_enabled field + validator
+│   └── main.py                ← MODIFIED: conditional docs_url/redoc_url/openapi_url
+├── tests/
+│   ├── unit/
+│   │   └── test_config.py     ← MODIFIED: 2 new tests for api_docs_enabled
+│   └── integration/
+│       └── test_docs_gate.py  ← NEW: 2 integration tests (disabled + enabled)
+
+.env.example                   ← MODIFIED: document API_DOCS_ENABLED
+```
+
+## Implementation Design
+
+### `app/config.py` — new field with graceful fallback validator
+
+```python
+from pydantic import TypeAdapter, field_validator
+from pydantic_settings import BaseSettings
+
+class Settings(BaseSettings):
+    # ... existing fields ...
+    api_docs_enabled: bool = True
+
+    @field_validator('api_docs_enabled', mode='before')
+    @classmethod
+    def coerce_docs_enabled(cls, v):
+        """Coerce the raw env value to bool; fall back to True if it is invalid."""
+        if isinstance(v, bool):
+            return v
+        try:
+            # Reuse Pydantic's own bool parsing ("true", "0", "off", ...)
+            return TypeAdapter(bool).validate_python(v)
+        except Exception:
+            return True  # FR-007: invalid value → safe default (enabled)
+
+### `app/main.py` — conditional docs URLs
+
+```python
+_settings = get_settings()
+
+app = FastAPI(
+    title="Reactbin API",
+    version="1.0.0",
+    lifespan=lifespan,
+    docs_url="/docs" if _settings.api_docs_enabled else None,
+    redoc_url="/redoc" if _settings.api_docs_enabled else None,
+    openapi_url="/openapi.json" if _settings.api_docs_enabled else None,
+)
+```
+
+### Integration test pattern
+
+The `app` object is constructed at module import time. Tests reload the module with the env var pre-set:
+
+```python
+import importlib
+
+from starlette.testclient import TestClient
+
+
+def test_docs_disabled(monkeypatch, _base_env):
+    monkeypatch.setenv("API_DOCS_ENABLED", "false")
+    from app.config import get_settings
+    get_settings.cache_clear()  # drop cached settings so the reload re-reads the env
+    import app.main as m
+    importlib.reload(m)  # rebuild the module-level FastAPI app with the new flag
+    client = TestClient(m.app)
+    assert client.get("/docs").status_code == 404
+    assert client.get("/redoc").status_code == 404
+    assert client.get("/openapi.json").status_code == 404
+    assert client.get("/api/v1/health").status_code == 200
+
+`get_settings.cache_clear()` is required before the reload so the new env var is picked up.
+
+### `.env.example` addition
+
+```bash
+# API documentation endpoints (Swagger UI, ReDoc, OpenAPI schema)
+# Set to false in production to avoid exposing the API surface publicly.
+API_DOCS_ENABLED=true
+```
+
+## Dependencies & Risks
+
+| Item | Risk | Mitigation |
+|------|------|------------|
+| `@lru_cache` on `get_settings()` | Tests may pick up cached settings across reloads | Always call `get_settings.cache_clear()` before reloading `app.main` in tests |
+| Module-level `get_settings()` in `main.py` | Import fails if required settings are absent (pre-existing behaviour) | Not a new risk; same as today |
+| `openapi_url=None` | Disables HTTP route but not internal schema generation | Intentional; request validation is unaffected |
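
Reviewer note on the `@lru_cache` risk in the table above: the stale-settings behaviour can be reproduced without FastAPI. This is a minimal sketch under stated assumptions; `get_settings` here is a stand-in that returns a plain dict read from `os.environ`, not the project's real `Settings` object.

```python
import os
from functools import lru_cache


@lru_cache
def get_settings() -> dict:
    # Stand-in for app.config.get_settings (assumption: the real one returns Settings).
    raw = os.environ.get("API_DOCS_ENABLED", "true")
    return {"api_docs_enabled": raw.lower() != "false"}


os.environ["API_DOCS_ENABLED"] = "true"
first = get_settings()

os.environ["API_DOCS_ENABLED"] = "false"
stale = get_settings()       # served from the cache; the env change is invisible

get_settings.cache_clear()
fresh = get_settings()       # re-reads the environment

print(stale["api_docs_enabled"], fresh["api_docs_enabled"])  # prints: True False
```

The same ordering applies in the integration tests: clear the cache first, then reload `app.main`, otherwise the reloaded module is built from the stale value.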
--- a/specs/012-api-docs-gate/quickstart.md
+++ b/specs/012-api-docs-gate/quickstart.md
@@ -0,0 +1,42 @@
+# Quickstart: API Documentation Visibility Gate
+
+## Verify docs are disabled
+
+```bash
+# Start the API with docs disabled, then run the curl checks from a second terminal
+API_DOCS_ENABLED=false uvicorn app.main:app --reload
+
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/docs        # → 404
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/redoc       # → 404
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/openapi.json # → 404
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/api/v1/health # → 200
+```
+
+## Verify docs are enabled (default)
+
+```bash
+# Start the API without the flag (or with it set to true), then run the curl checks from a second terminal
+uvicorn app.main:app --reload
+
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/docs         # → 200
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/redoc        # → 200
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/openapi.json # → 200
+```
+
+## Integration test scenarios
+
+### Scenario 1: flag disabled — all three docs endpoints return 404
+
+Start a test client with `API_DOCS_ENABLED=false` injected into settings. Assert each of the three endpoint paths returns 404. Assert `/api/v1/health` returns 200.
+
+### Scenario 2: flag enabled (default) — docs endpoints return 200
+
+Start a test client without the flag (or with `API_DOCS_ENABLED=true`). Assert each of the three endpoint paths returns 200.
+
+### Scenario 3: invalid flag value — app starts, docs enabled
+
+Set `API_DOCS_ENABLED=not-a-bool`. The app must start without error. Docs must be accessible (safe fallback to enabled).
+
+### Scenario 4: flag absent — docs enabled (backwards compatibility)
+
+Start the app with no `API_DOCS_ENABLED` variable set. Assert docs endpoints return 200 — identical to pre-feature behaviour.
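
Reviewer note: the curl checks in this quickstart can also be scripted. The following is a minimal sketch using only the standard library; `status_of`, `mismatches`, and the `DOCS_DISABLED` table are illustrative names, and the base URL in the usage note assumes the API is listening on `localhost:8000` as in the commands above.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def status_of(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on 404 etc.; the code is still what we want


# Expected statuses when the API runs with API_DOCS_ENABLED=false.
DOCS_DISABLED: Dict[str, int] = {
    "/docs": 404,
    "/redoc": 404,
    "/openapi.json": 404,
    "/api/v1/health": 200,
}


def mismatches(
    base_url: str,
    expected: Dict[str, int],
    fetch: Callable[[str], int] = status_of,
) -> Dict[str, int]:
    """Return {path: actual_status} for every path that deviates from `expected`."""
    wrong = {}
    for path, want in expected.items():
        got = fetch(base_url + path)
        if got != want:
            wrong[path] = got
    return wrong
```

Against a server started with `API_DOCS_ENABLED=false`, `mismatches("http://localhost:8000", DOCS_DISABLED)` should return an empty dict. The `fetch` parameter exists only so the comparison can be exercised without a running server.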
--- a/specs/012-api-docs-gate/research.md
+++ b/specs/012-api-docs-gate/research.md
@@ -0,0 +1,36 @@
+# Research: API Documentation Visibility Gate
+
+## Decision 1: Env var name
+
+**Decision**: `API_DOCS_ENABLED` (boolean, default `true`)
+**Rationale**: Consistent with the existing `API_BASE_URL` naming convention in the project. The positive-phrasing default (`true` = enabled) preserves backwards compatibility — existing deployments that don't set the variable get the same behaviour as today.
+**Alternatives considered**: `HIDE_API_DOCS=false` (negative phrasing) — inverted booleans are error-prone and confusing in `.env` files; `DOCS_ENABLED` — too generic, could collide with other tools in a multi-service env file.
+
+## Decision 2: FastAPI docs suppression mechanism
+
+**Decision**: Pass `docs_url=None`, `redoc_url=None`, `openapi_url=None` to the `FastAPI()` constructor when the flag is disabled.
+**Rationale**: This is the official FastAPI-supported mechanism. Setting these to `None` causes FastAPI to register no routes for those paths — requests to them fall through to the default 404 handler. The internal OpenAPI schema is still generated in memory (for request validation), but no HTTP route exposes it.
+**Alternatives considered**: Route-level middleware that intercepts and returns 404 (more complex, not the canonical approach); removing routes at runtime (fragile, since routes are registered when the app is constructed at import time).
+
+## Decision 3: Settings read at module level
+
+**Decision**: Read `get_settings()` once at module import time in `main.py` to configure the `FastAPI()` constructor.
+**Rationale**: `FastAPI()` is instantiated at module level; the docs URL parameters must be known at that point. `get_settings()` is already wrapped in `@lru_cache`, so calling it at module level is cheap and returns the same object as the later call inside `lifespan`. Tests that need to change the flag must reload the module or override `get_settings`.
+**Alternatives considered**: Lazy initialisation of `app` inside a factory function — would require restructuring `main.py` and all imports; not worth the complexity for this change.
+
+## Decision 4: Graceful fallback for invalid flag values (FR-007)
+
+**Decision**: Add a `@field_validator('api_docs_enabled', mode='before')` in `Settings` that wraps Pydantic's bool coercion in a try/except and returns `True` on any exception (which covers Pydantic's `ValidationError`).
+**Rationale**: Pydantic v2 raises `ValidationError` for unrecognised boolean strings (e.g., `API_DOCS_ENABLED=maybe`). FR-007 requires the app to start rather than fail. The validator intercepts the invalid value before Pydantic's own coercion and returns the safe default.
+**Alternatives considered**: Using `Optional[bool] = True` without a validator — Pydantic would still raise on invalid input; using `str` field with manual parsing — duplicates Pydantic's boolean parsing logic unnecessarily.
+
+## Decision 5: Integration test approach
+
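+**Decision**: Test both enabled and disabled states by reloading `app.main` with the env var pre-set (`monkeypatch.setenv` + `get_settings.cache_clear()` + `importlib.reload`), or alternatively by constructing a local `FastAPI` app instance with the appropriate `docs_url`/`redoc_url`/`openapi_url` values. Overriding `get_settings` through `app.dependency_overrides` is not sufficient on its own, because the docs routes are fixed when `FastAPI()` is constructed.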
+**Rationale**: The `app` in `app.main` is created at import time. Since the unit tests already use `monkeypatch` + `importlib.reload` for config changes, the integration tests for docs visibility can follow the same pattern — reload `app.main` with the env var set before importing `app`. Alternatively, test the URL routing behaviour directly by constructing a minimal test app.
+**Alternatives considered**: Patching `app.docs_url` after import — FastAPI does not re-register routes when these attributes are changed post-construction; no effect on routing.
+
+## Decision 6: Production documentation
+
+**Decision**: Update `.env.example` to include `API_DOCS_ENABLED=true` with a comment recommending `false` for production. No changes to `api/Dockerfile.prod` (env vars are supplied by the deployment environment, not the image).
+**Rationale**: The Dockerfile intentionally contains no runtime secrets or config. The `.env.example` is the canonical documentation for operators. A comment is sufficient; the production Dockerfile.prod already has no docs-related config.
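
Reviewer note on Decision 4: the fallback behaviour can be stated without Pydantic. The sketch below is not the project's validator; it hand-rolls the coercion, assuming Pydantic v2's documented set of accepted boolean strings (compared case-insensitively), to show which raw values end up enabled. `parse_docs_flag` is a hypothetical name.

```python
from typing import Optional

# Assumption: these mirror the strings Pydantic v2 accepts for bool fields.
_TRUE = {"1", "on", "t", "true", "y", "yes"}
_FALSE = {"0", "off", "f", "false", "n", "no"}


def parse_docs_flag(raw: Optional[str]) -> bool:
    """Map a raw env value to the docs flag; anything unrecognised means enabled."""
    if raw is None:
        return True   # FR-004: absent flag keeps docs enabled
    value = raw.lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return True       # FR-007: invalid value falls back to enabled
```

The design consequence worth noting is that a typo such as `API_DOCS_ENABLED=flase` leaves the docs exposed, so production deployments may want to verify the 404s explicitly rather than rely on the variable alone.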
--- a/specs/012-api-docs-gate/spec.md
+++ b/specs/012-api-docs-gate/spec.md
@@ -0,0 +1,80 @@
+# Feature Specification: API Documentation Visibility Gate
+
+**Feature Branch**: `012-api-docs-gate`
+**Created**: 2026-05-07
+**Status**: Draft
+**Input**: User description: "Add an environment variable flag to disable the FastAPI Swagger and ReDoc documentation endpoints (and the raw OpenAPI schema) in production. When disabled, all three endpoints return 404. When enabled (the default), behaviour is unchanged. The flag should be off by default in production and on by default in development."
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Documentation Hidden in Production (Priority: P1)
+
+An operator deploys the API to a production environment and wants to ensure that the interactive documentation UI and the raw API schema are not publicly reachable. Setting a configuration flag causes all three documentation endpoints to return "not found", as if they do not exist.
+
+**Why this priority**: Exposing the full API schema and interactive console to anonymous users in production reveals the attack surface of the application. Hiding it is a low-effort, high-value hardening step.
+
+**Independent Test**: Start the API with the flag set to disabled. Request each of the three documentation endpoints. All three must return 404.
+
+**Acceptance Scenarios**:
+
+1. **Given** the API is started with documentation disabled, **When** a client requests the interactive documentation UI, **Then** the response is 404 Not Found.
+2. **Given** the API is started with documentation disabled, **When** a client requests the alternative documentation UI, **Then** the response is 404 Not Found.
+3. **Given** the API is started with documentation disabled, **When** a client requests the raw OpenAPI schema endpoint, **Then** the response is 404 Not Found.
+4. **Given** the API is started with documentation disabled, **When** a client requests any other API endpoint (e.g., the health check), **Then** the response is unaffected — normal behaviour continues.
+
+---
+
+### User Story 2 - Documentation Available in Development (Priority: P2)
+
+A developer runs the API locally without setting the flag. The documentation endpoints remain fully accessible — no change in behaviour from before this feature.
+
+**Why this priority**: Developer productivity depends on the interactive docs being available during local development. The default must not break existing workflows.
+
+**Independent Test**: Start the API without the flag set (or with it explicitly enabled). Request each of the three documentation endpoints. All three must respond successfully with their normal content.
+
+**Acceptance Scenarios**:
+
+1. **Given** the API is started without the flag set, **When** a client requests any documentation endpoint, **Then** the response is the same as it was before this feature was introduced.
+2. **Given** the API is started with the flag explicitly set to enabled, **When** a client requests any documentation endpoint, **Then** the response is the same as it was before this feature was introduced.
+3. **Given** the flag is changed from enabled to disabled (or vice versa), **When** the API is restarted, **Then** the new state takes effect immediately with no other changes required.
+
+---
+
+### Edge Cases
+
+- What happens if the flag is set to an unrecognised value (e.g., a typo)?
+- What happens if the flag is absent entirely — is the default enabled or disabled?
+- Does disabling documentation affect any other behaviour (e.g., internal schema generation used for validation)?
+- If a monitoring tool scrapes the schema endpoint for API drift detection, does disabling break it?
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The system MUST support a configuration flag that controls whether the API documentation endpoints are reachable.
+- **FR-002**: When the flag is set to disabled, all three documentation endpoints (interactive UI, alternative UI, and raw schema) MUST return 404 Not Found.
+- **FR-003**: When the flag is set to enabled, the behaviour of all three documentation endpoints MUST be identical to the behaviour before this feature was introduced.
+- **FR-004**: The flag MUST default to **enabled** when not explicitly set (preserving backwards compatibility for existing deployments).
+- **FR-005**: Disabling documentation MUST NOT affect any other API endpoint, including the health check, authentication, and all resource endpoints.
+- **FR-006**: The flag MUST be configurable via an environment variable without requiring a code change or rebuild.
+- **FR-007**: An unrecognised or missing flag value MUST fall back to the enabled default rather than causing a startup failure.
+- **FR-008**: The existing `.env.example` file MUST be updated to document the flag and its default value.
+- **FR-009**: The production environment configuration MUST set the flag to disabled by default.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: With the flag disabled, all three documentation endpoints return 404, confirmed by automated test.
+- **SC-002**: With the flag enabled (or absent), all three documentation endpoints respond successfully, confirmed by automated test.
+- **SC-003**: All existing tests continue to pass — zero regressions introduced.
+- **SC-004**: The flag takes effect on restart with no other intervention required.
+- **SC-005**: The `.env.example` file documents the flag so any developer setting up the project discovers it without reading source code.
+
+## Assumptions
+
+- There are exactly three documentation-related endpoints to gate: the primary interactive UI, the alternative documentation UI, and the raw OpenAPI schema JSON. No other endpoints are affected.
+- The flag is read once at application startup; a running process does not need to respond to live changes.
+- Internal schema generation (used by the framework for request validation) is not affected by hiding the documentation endpoints — only the public-facing HTTP routes are removed.
+- The production Dockerfile (`api/Dockerfile.prod`) does not hardcode the flag; it is supplied via the deployment environment (docker-compose, Kubernetes secret, etc.).
+- "Off by default in production" means the recommended value for production is disabled, documented in `.env.example` and in the production docker-compose or deployment config; it does not mean the application auto-detects its environment.
--- a/specs/012-api-docs-gate/tasks.md
+++ b/specs/012-api-docs-gate/tasks.md
@@ -0,0 +1,100 @@
+# Tasks: API Documentation Visibility Gate
+
+**Input**: Design documents from `specs/012-api-docs-gate/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/docs-endpoints.md ✅, quickstart.md ✅
+
+**Tests**: TDD is non-negotiable (§5.1). Failing tests are written before implementation code in each phase.
+
+**Organization**: No setup or foundational phases: this feature modifies three existing files, extends one existing test file, and adds one new test file. Phase 3 (US1) covers the disable path; Phase 4 (US2) verifies the enable/default path using the same implementation; Phase 5 polishes.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel with other [P] tasks in the same phase
+- **[Story]**: Which user story this task belongs to
+- Exact file paths included in every task description
+
+---
+
+## Phase 3: User Story 1 — Documentation Hidden in Production (Priority: P1) 🎯 MVP
+
+**Goal**: When `API_DOCS_ENABLED=false`, all three documentation endpoints (`/docs`, `/redoc`, `/openapi.json`) return 404. All other endpoints are unaffected.
+
+**Independent Test**: `make test-unit` passes the new settings tests; `make test-integration` passes the new `test_docs_hidden_when_flag_disabled` integration test.
+
+### Tests for User Story 1 (TDD — write first, confirm failure before T003)
+
+- [X] T001 [US1] Add three failing unit tests to `api/tests/unit/test_config.py` using the existing `_apply_env`/`_BASE_ENV` pattern:
+  1. `test_api_docs_enabled_default` — call `Settings()` with `_BASE_ENV` only (no `API_DOCS_ENABLED`); assert `s.api_docs_enabled is True`
+  2. `test_api_docs_enabled_false` — call `Settings()` with `_BASE_ENV` + `{"API_DOCS_ENABLED": "false"}`; assert `s.api_docs_enabled is False`
+  3. `test_api_docs_invalid_value_defaults_to_enabled` — call `Settings()` with `_BASE_ENV` + `{"API_DOCS_ENABLED": "not-a-bool"}`; assert `s.api_docs_enabled is True` (graceful fallback, FR-007)
+  All three tests fail before T003 because `api_docs_enabled` does not yet exist on `Settings`.
+
+- [X] T002 [US1] Create `api/tests/integration/test_docs_gate.py` with two failing integration tests; the file MUST set up a minimal app client using `from starlette.testclient import TestClient` and the `importlib.reload` + `get_settings.cache_clear()` pattern shown in plan.md:
+  1. `test_docs_hidden_when_flag_disabled(monkeypatch)` — set `API_DOCS_ENABLED=false` via monkeypatch + all required env vars (`DATABASE_URL`, `JWT_SECRET_KEY`, `OWNER_USERNAME`, `OWNER_PASSWORD`, `S3_ENDPOINT_URL`, `S3_BUCKET_NAME`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`); call `get_settings.cache_clear()`; `importlib.reload(app.main)`; create `TestClient(app.main.app)`; assert `/docs` → 404, `/redoc` → 404, `/openapi.json` → 404, `/api/v1/health` → 200; after test, call `get_settings.cache_clear()` again as cleanup
+  2. `test_docs_visible_when_flag_enabled(monkeypatch)` — same setup but with `API_DOCS_ENABLED=true` (or omit it); assert `/docs` → 200, `/redoc` → 200, `/openapi.json` → 200
+  Both tests fail before T003/T004 because `api_docs_enabled` does not exist on `Settings`.
+
+### Implementation for User Story 1
+
+- [X] T003 [US1] Add `api_docs_enabled: bool = True` field and a `coerce_docs_enabled` field validator to the `Settings` class in `api/app/config.py`: the validator MUST use `mode='before'`, be a `@classmethod`, and wrap Pydantic bool coercion in a try/except that returns `True` on any exception (implements FR-007); import `field_validator` from `pydantic` at the top of the file; the field goes after the existing `login_trusted_proxy_ips` field.
+
+- [X] T004 [US1] Update `api/app/main.py`: before the `app = FastAPI(...)` call, add `_settings = get_settings()`; add `docs_url="/docs" if _settings.api_docs_enabled else None`, `redoc_url="/redoc" if _settings.api_docs_enabled else None`, and `openapi_url="/openapi.json" if _settings.api_docs_enabled else None` as keyword arguments to the `FastAPI()` constructor; the existing module-level defaults for `app.state` (after the `app = FastAPI(...)` line) are unchanged.
+
+- [X] T005 [US1] Verify TDD green for US1: run `cd api && python -m pytest tests/unit/ -v -k "docs"` and confirm all three new unit tests pass; then run `cd api && python -m pytest tests/unit/ -v` to confirm no regressions in the full 102-test unit suite.
+
+**Checkpoint**: US1 is complete. With `API_DOCS_ENABLED=false` the three docs endpoints return 404; all other endpoints are unaffected.
+
+---
+
+## Phase 4: User Story 2 — Documentation Available in Development (Priority: P2)
+
+**Goal**: Without the flag set (or with it set to `true`), docs endpoints behave identically to before this feature. Default is backwards compatible.
+
+**Independent Test**: `make test-integration` — the `test_docs_visible_when_flag_enabled` test written in T002 passes, confirming the enabled/default path.
+
+- [X] T006 [US2] Verify TDD green for US2: run `make test-integration` from `/workspace` and confirm all integration tests pass, including `test_docs_gate.py::test_docs_visible_when_flag_enabled` and the full existing suite (102 tests + 2 new = 104 total).
+
+**Checkpoint**: Both user stories verified. Flag disabled → 404; flag enabled or absent → unchanged behaviour.
+
+---
+
+## Phase 5: Polish & Cross-Cutting Concerns
+
+- [X] T007 Add documentation for `API_DOCS_ENABLED` to `/workspace/.env.example`: insert a new section after the `LOGIN_TRUSTED_PROXY_IPS` block with a comment and `API_DOCS_ENABLED=true`; the comment MUST note that this should be set to `false` in production to avoid publicly exposing the API schema
+
+- [X] T008 Run `ruff check api/app/config.py api/app/main.py api/tests/integration/test_docs_gate.py` from `/workspace/api` and fix any lint violations; then run `ruff check api/` to confirm the full API directory is clean
+
+---
+
+## Dependencies & Execution Order
+
+- T001 and T002 can run in parallel (different files, both TDD-red before implementation)
+- T003 must complete before T004 (main.py reads from config.py)
+- T005 after T003 and T004
+- T006 after T005
+- T007 and T008 can run in parallel (different files, after all tests pass)
+
+### Execution Order Summary
+
+```
+Step 1: T001 ∥ T002   (write failing tests — TDD red)
+Step 2: T003          (implement config.py — turns T001 green)
+Step 3: T004          (implement main.py — turns T002 green)
+Step 4: T005          (verify unit tests green)
+Step 5: T006          (verify integration tests green — regression gate)
+Step 6: T007 ∥ T008   (polish — .env.example + ruff)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 + US2 — one implementation covers both)
+
+1. Write failing tests (T001, T002)
+2. Add `api_docs_enabled` to `config.py` (T003)
+3. Update `FastAPI()` constructor in `main.py` (T004)
+4. Verify all tests green (T005, T006)
+5. Polish (T007, T008)
+
+US1 and US2 share the same implementation — the flag controls both paths. There is no separate implementation for US2; the default value of `true` is the entire implementation of US2.
--- a/specs/013-k8s-manifests/checklists/requirements.md
+++ b/specs/013-k8s-manifests/checklists/requirements.md
@@ -0,0 +1,35 @@
+# Specification Quality Checklist: Kubernetes Production Manifests
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-07
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- FR-014 (migration files in production image) is a prerequisite code change to `Dockerfile.prod`, not a manifest. Included in scope as it is required for the init container to function.
+- Image tag placeholder strategy is documented in Assumptions; the specifics of tag substitution (kustomize, sed, etc.) are left to planning.
--- a/specs/013-k8s-manifests/contracts/operator-deploy.md
+++ b/specs/013-k8s-manifests/contracts/operator-deploy.md
@@ -0,0 +1,59 @@
+# Contract: Operator Deployment Interface
+
+The manifests in `k8s/` define the operator's deployment interface — the inputs required before applying and the observable outputs after applying.
+
+## Pre-deployment Prerequisites (Operator-supplied)
+
+| Prerequisite | Details |
+|---|---|
+| Vault KV v2 secret at `reactbin/api/config` | Must contain keys: `DATABASE_URL`, `JWT_SECRET_KEY`, `OWNER_USERNAME`, `OWNER_PASSWORD`, `S3_ENDPOINT_URL`, `S3_BUCKET_NAME`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `API_BASE_URL` |
+| Vault KV v2 secret at `reactbin/minio/credentials` | Must contain keys: `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` |
+| Vault Kubernetes auth role | A role in the Vault Kubernetes auth mount bound to the `default` service account in the `reactbin` namespace with read access to both paths above |
+| `VaultConnection` resource | Named `default` in the operator's VSO namespace pointing to the Vault server address |
+| External PostgreSQL database | A dedicated database and user created; `DATABASE_URL` in Vault reflects the credentials |
+| DNS | The production domain resolves to the cluster ingress IP |
+| `ClusterIssuer` | A cert-manager `ClusterIssuer` named `letsencrypt-prod` exists in the cluster |
+| Image tags | The operator substitutes the `latest` placeholder in `k8s/api/deployment.yaml` and `k8s/ui/deployment.yaml` with the real image tag before applying |
+
+## Apply Command
+
+```bash
+# Substitute image tags
+sed -i 's|reactbin-api:latest|reactbin-api:<tag>|g' k8s/api/deployment.yaml
+sed -i 's|reactbin-ui:latest|reactbin-ui:<tag>|g' k8s/ui/deployment.yaml
+
+# Apply all manifests
+kubectl apply -f k8s/
+```
+
+Applying is idempotent — safe to re-run on every deployment.
+
+## Observable Outputs (Post-apply)
+
+| Resource | Expected State |
+|---|---|
+| `Namespace/reactbin` | Active |
+| `Deployment/api` in `reactbin` | 1/1 Ready (init container completes first) |
+| `Deployment/ui` in `reactbin` | 1/1 Ready |
+| `StatefulSet/minio` in `reactbin` | 1/1 Ready |
+| `Job/minio-init-bucket` in `reactbin` | Completed |
+| `Secret/api-env` in `reactbin` | Created by VSO, populated with all API env keys |
+| `Secret/minio-credentials` in `reactbin` | Created by VSO, populated with MinIO root keys |
+| `Certificate/reactbin-tls` in `reactbin` | Issued (may take up to 2 minutes on first apply) |
+| `Ingress/reactbin` in `reactbin` | Address populated with cluster ingress IP |
+
+## Verification Commands
+
+```bash
+# All pods running
+kubectl get pods -n reactbin
+
+# API health
+curl -sf https://<domain>/api/v1/health
+
+# UI reachable
+curl -sf https://<domain>/
+
+# Docs correctly gated (should return 404)
+curl -s -o /dev/null -w "%{http_code}\n" https://<domain>/docs
+```
--- a/specs/013-k8s-manifests/plan.md
+++ b/specs/013-k8s-manifests/plan.md
@@ -0,0 +1,238 @@
+# Implementation Plan: Kubernetes Production Manifests
+
+**Branch**: `013-k8s-manifests` | **Date**: 2026-05-07 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `specs/013-k8s-manifests/spec.md`
+
+## Summary
+
+Write Kubernetes manifests deploying Reactbin to k3s: a `Namespace`, API `Deployment` (with Alembic init container) + `Service`, UI `Deployment` + `Service`, a shared `Ingress` with Let's Encrypt TLS, a MinIO `StatefulSet` + `Service` + bucket-init `Job`, and three VSO resources (one `VaultAuth` and two `VaultStaticSecret`s) to sync secrets from Vault. The `VaultConnection` they rely on is an operator-supplied prerequisite and is not part of `k8s/`. A small update to `api/Dockerfile.prod` includes Alembic migration files in the production image so the init container can run them.
+
+## Technical Context
+
+**Language/Version**: YAML (Kubernetes manifests); Python 3.12 (Dockerfile.prod touch)
+**Primary Dependencies**: Kubernetes 1.29+ API, nginx Ingress controller, cert-manager (ClusterIssuer `letsencrypt-prod`), Vault Secrets Operator (`secrets.hashicorp.com/v1beta1`), MinIO
+**Storage**: MinIO StatefulSet with ReadWriteOnce PVC (cluster default storage class); external PostgreSQL (operator-provisioned)
+**Testing**: `kubectl apply --dry-run=client` for schema validation; `yamllint` for formatting
+**Target Platform**: k3s cluster (Kubernetes 1.29+, Linux)
+**Performance Goals**: No measurable impact — manifests are declarative config, not runtime code
+**Constraints**: All secrets must come from Vault (no plaintext in manifests); all containers run non-root; MinIO is ClusterIP-only (no external Ingress)
+**Scale/Scope**: 12 YAML files across `k8s/`; one Dockerfile.prod change; one Makefile target
+
+## Constitution Check
+
+| Principle | Requirement | Status |
+|-----------|-------------|--------|
+| §5.1 TDD | Failing tests before implementation | ✅ Dry-run validation script written before manifests |
+| §5.4 CI before done | All tests pass before task marked done | ✅ kubectl dry-run + yamllint gate |
+| §7.2 Env config | No hardcoded secrets or hostnames | ✅ All secrets via VSO; domain is operator-substituted placeholder |
+| §7.3 Linting | `ruff` / linting passes | ✅ `yamllint` on all manifests |
+| §2.6 No speculative abstraction | No Kustomize overlays or Helm chart | ✅ Plain YAML, single environment |
+| §8 Scope boundaries | No multi-user, no OIDC, no OR/NOT tags | ✅ Not affected |
+
+**No violations. All gates pass.**
+
+*Post-design re-check*: The Dockerfile.prod change (FR-014) adds `alembic/` to the runtime stage only — no builder-stage change, no new dependencies, no behaviour change to the running API. Constitution unchanged.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/013-k8s-manifests/
+├── plan.md                          ← this file
+├── research.md                      ← 8 decisions
+├── contracts/
+│   └── operator-deploy.md           ← prerequisites + verification commands
+├── quickstart.md                    ← deploy + verify + scenario walkthroughs
+└── tasks.md                         ← generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+k8s/                                 ← NEW directory
+├── namespace.yaml                   ← Namespace: reactbin
+├── api/
+│   ├── deployment.yaml              ← Deployment: api (with alembic init container)
+│   └── service.yaml                 ← Service: api (ClusterIP, port 8000)
+├── ui/
+│   ├── deployment.yaml              ← Deployment: ui
+│   └── service.yaml                 ← Service: ui (ClusterIP, port 8080)
+├── ingress.yaml                     ← Ingress: /api/ → api, / → ui, TLS via cert-manager
+├── minio/
+│   ├── statefulset.yaml             ← StatefulSet: minio (volumeClaimTemplates)
+│   ├── service.yaml                 ← Service: minio (ClusterIP, port 9000)
+│   └── init-job.yaml                ← Job: minio-init-bucket (mc mb --ignore-existing)
+└── vault/
+    ├── vault-auth.yaml              ← VaultAuth: kubernetes method, default SA in reactbin namespace
+    ├── api-secret.yaml              ← VaultStaticSecret → K8s Secret: api-env
+    └── minio-secret.yaml            ← VaultStaticSecret → K8s Secret: minio-credentials
+
+api/Dockerfile.prod                  ← MODIFIED: add alembic/ and alembic.ini to runtime stage
+Makefile                             ← MODIFIED: add dry-run validation target
+```
+
+## Implementation Design
+
+### `api/Dockerfile.prod` — runtime stage addition
+
+```dockerfile
+# In the runtime stage, after copying app/:
+COPY --chown=appuser:appgroup alembic/ ./alembic/
+COPY --chown=appuser:appgroup alembic.ini .
+```
+
+No builder-stage change. No new base image. The init container uses the same image and `workingDir: /app`.
+
+### `k8s/namespace.yaml`
+
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: reactbin
+```
+
+### `k8s/vault/vault-auth.yaml`
+
+```yaml
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultAuth
+metadata:
+  name: reactbin-auth
+  namespace: reactbin
+spec:
+  method: kubernetes
+  mount: kubernetes
+  kubernetes:
+    role: reactbin
+    serviceAccount: default
+    audiences:
+      - https://kubernetes.default.svc
+```
+
+Note: `VaultConnection` is not included in the `k8s/` tree — it lives in the VSO operator's namespace and is operator-managed infrastructure, not application manifests.
+
+### `k8s/vault/api-secret.yaml`
+
+```yaml
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: api-secret
+  namespace: reactbin
+spec:
+  vaultAuthRef: reactbin-auth
+  mount: secret
+  type: kv-v2
+  path: reactbin/api/config
+  refreshAfter: 1h
+  destination:
+    name: api-env
+    create: true
+```
+
+The API Deployment then uses `envFrom: [{secretRef: {name: api-env}}]`.
+
+### `k8s/vault/minio-secret.yaml`
+
+Same pattern, path `reactbin/minio/credentials`, destination `minio-credentials`.
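+
+Spelled out, the second secret might look like the sketch below. The resource name `minio-secret` is an assumption that mirrors `api-secret`; every other field is copied from the API secret above with only the path and destination changed.
+
+```yaml
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: minio-secret            # assumed name, mirroring api-secret
+  namespace: reactbin
+spec:
+  vaultAuthRef: reactbin-auth
+  mount: secret
+  type: kv-v2
+  path: reactbin/minio/credentials
+  refreshAfter: 1h
+  destination:
+    name: minio-credentials     # K8s Secret consumed by the MinIO StatefulSet and init Job
+    create: true
+```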
+
+### `k8s/api/deployment.yaml` — init container
+
+```yaml
+initContainers:
+  - name: alembic-migrate
+    image: reactbin-api:latest          # same tag as main container
+    command: ["alembic", "upgrade", "head"]
+    workingDir: /app
+    envFrom:
+      - secretRef:
+          name: api-env
+containers:
+  - name: api
+    image: reactbin-api:latest
+    ports:
+      - containerPort: 8000
+    envFrom:
+      - secretRef:
+          name: api-env
+    env:
+      - name: API_DOCS_ENABLED
+        value: "false"
+    livenessProbe:
+      httpGet: {path: /api/v1/health, port: 8000}
+      initialDelaySeconds: 10
+      periodSeconds: 30
+    readinessProbe:
+      httpGet: {path: /api/v1/health, port: 8000}
+      initialDelaySeconds: 5
+      periodSeconds: 10
+    securityContext:
+      runAsNonRoot: true
+      runAsUser: 1001
+```
+
+### `k8s/ingress.yaml`
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: reactbin
+  namespace: reactbin
+  annotations:
+    cert-manager.io/cluster-issuer: letsencrypt-prod
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+spec:
+  ingressClassName: nginx
+  tls:
+    - hosts: [<your-domain>]
+      secretName: reactbin-tls
+  rules:
+    - host: <your-domain>
+      http:
+        paths:
+          - path: /api/
+            pathType: Prefix
+            backend:
+              service: {name: api, port: {number: 8000}}
+          - path: /
+            pathType: Prefix
+            backend:
+              service: {name: ui, port: {number: 8080}}
+```
+
+`/api/` must be listed before `/`.
+
+### `k8s/minio/statefulset.yaml` — StatefulSet (not Deployment)
+
+StatefulSet gives stable pod name `minio-0` and automatic PVC reattachment via `volumeClaimTemplates`. ReadWriteOnce, default storage class.
+
+Health probes: `GET /minio/health/live` (liveness) and `GET /minio/health/ready` (readiness), both on port 9000.
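+
+A minimal sketch of how these pieces could fit together in one manifest. The `app: minio` label, the `minio/minio:latest` image, the `/data` mount path, and the `10Gi` request are assumptions for illustration; the security context required by the non-root constraint is omitted for brevity.
+
+```yaml
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: minio
+  namespace: reactbin
+spec:
+  serviceName: minio
+  replicas: 1
+  selector:
+    matchLabels: {app: minio}          # assumed label
+  template:
+    metadata:
+      labels: {app: minio}
+    spec:
+      containers:
+        - name: minio
+          image: minio/minio:latest    # assumed image; pin a release tag in practice
+          args: ["server", "/data"]
+          ports:
+            - containerPort: 9000
+          envFrom:
+            - secretRef:
+                name: minio-credentials
+          livenessProbe:
+            httpGet: {path: /minio/health/live, port: 9000}
+          readinessProbe:
+            httpGet: {path: /minio/health/ready, port: 9000}
+          volumeMounts:
+            - name: data
+              mountPath: /data         # assumed data directory
+  volumeClaimTemplates:
+    - metadata:
+        name: data
+      spec:
+        accessModes: ["ReadWriteOnce"]
+        resources:
+          requests:
+            storage: 10Gi              # assumed size; no storageClassName, so the cluster default is used
+```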
+
+### `k8s/minio/init-job.yaml`
+
+```yaml
+command: ["sh", "-c", "mc alias set local http://minio:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD && mc mb --ignore-existing local/reactbin"]
+```
+
+`restartPolicy: OnFailure`. `--ignore-existing` makes the job idempotent.
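+
+Put together, the Job might look like the following sketch. The Job name comes from the operator contract and the `minio/mc:latest` image from research.md; the container name `mc` and the use of `envFrom` to inject the root credentials are assumptions.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: minio-init-bucket
+  namespace: reactbin
+spec:
+  template:
+    spec:
+      restartPolicy: OnFailure         # retry until MinIO is reachable
+      containers:
+        - name: mc                     # assumed container name
+          image: minio/mc:latest
+          command: ["sh", "-c", "mc alias set local http://minio:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD && mc mb --ignore-existing local/reactbin"]
+          envFrom:
+            - secretRef:
+                name: minio-credentials
+```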
+
+### Makefile addition
+
+```makefile
+.PHONY: validate-k8s
+validate-k8s:
+	yamllint k8s/
+	kubectl apply --dry-run=client -f k8s/
+```
+
+## Dependencies & Risks
+
+| Item | Risk | Mitigation |
+|------|------|------------|
+| `VaultConnection` not in `k8s/` | Operator may not have it pre-created | Documented as prerequisite in contracts/operator-deploy.md |
+| `letsencrypt-prod` ClusterIssuer name | May differ in operator's cluster | Documented as prerequisite; easy to sed-replace |
+| Image tag placeholder `latest` | Operator forgets to substitute | The `validate-k8s` dry-run cannot detect this (it still succeeds), so quickstart.md and the task descriptions warn explicitly |
+| MinIO PVC storage class | Default may be unsuitable (e.g., ephemeral) | Noted in Assumptions; operator can patch `storageClassName` |
+| `<your-domain>` placeholder in Ingress | `kubectl apply --dry-run=client` validates everything except host value | Noted in quickstart; hostname must be substituted before applying |
--- a/specs/013-k8s-manifests/quickstart.md
+++ b/specs/013-k8s-manifests/quickstart.md
@@ -0,0 +1,92 @@
+# Quickstart: Kubernetes Production Deployment
+
+## Before You Apply
+
+1. Store API secrets in Vault at `reactbin/api/config` (KV v2):
+   ```
+   DATABASE_URL          = postgresql+asyncpg://reactbin:<pw>@<host>:5432/reactbin
+   JWT_SECRET_KEY        = <long-random-string>
+   OWNER_USERNAME        = <your-username>
+   OWNER_PASSWORD        = <your-password>
+   S3_ENDPOINT_URL       = http://minio.reactbin.svc.cluster.local:9000
+   S3_BUCKET_NAME        = reactbin
+   S3_ACCESS_KEY_ID      = <same as MINIO_ROOT_USER>
+   S3_SECRET_ACCESS_KEY  = <same as MINIO_ROOT_PASSWORD>
+   API_BASE_URL          = https://<your-domain>
+   API_DOCS_ENABLED      = false
+   ```
+
+2. Store MinIO credentials in Vault at `reactbin/minio/credentials` (KV v2):
+   ```
+   MINIO_ROOT_USER     = <choose a strong username>
+   MINIO_ROOT_PASSWORD = <choose a strong password>
+   ```
+
+3. Create a Vault Kubernetes auth role bound to the `default` service account in the `reactbin` namespace with read access to both paths above.
+
+4. Confirm DNS resolves to the cluster ingress IP and the `letsencrypt-prod` ClusterIssuer exists.
+
+## Deploy
+
+```bash
+# Substitute the real image tags
+sed -i 's|reactbin-api:latest|reactbin-api:v1.0.0|g' k8s/api/deployment.yaml
+sed -i 's|reactbin-ui:latest|reactbin-ui:v1.0.0|g' k8s/ui/deployment.yaml
+
+# Apply everything
+kubectl apply -f k8s/
+```
+
+## Verify
+
+```bash
+# Watch pods come up (init container runs first on the API pod)
+kubectl get pods -n reactbin -w
+
+# API health
+curl -sf https://<your-domain>/api/v1/health && echo "API OK"
+
+# UI reachable
+curl -sf -o /dev/null -w "%{http_code}\n" https://<your-domain>/
+
+# Docs correctly gated
+curl -s -o /dev/null -w "%{http_code}\n" https://<your-domain>/docs    # → 404
+curl -s -o /dev/null -w "%{http_code}\n" https://<your-domain>/redoc   # → 404
+
+# Check migration init container ran
+kubectl logs -n reactbin -l app=api -c alembic-migrate
+```
+
+## Scenario: Migration fails on deploy
+
+```bash
+# Pod will be stuck in Init state
+kubectl get pods -n reactbin
+# NAME          READY   STATUS                  RESTARTS
+# api-xxx-yyy   0/1     Init:CrashLoopBackOff   2
+
+# See why
+kubectl logs -n reactbin <pod-name> -c alembic-migrate
+
+# Fix the issue (e.g. correct DATABASE_URL in Vault, wait for VSO to resync)
+# Then restart the Deployment to force a fresh rollout
+kubectl rollout restart deployment/api -n reactbin
+```
+
+## Scenario: Update to a new image version
+
+```bash
+kubectl set image deployment/api api=reactbin-api:v1.1.0 alembic-migrate=reactbin-api:v1.1.0 -n reactbin
+kubectl set image deployment/ui ui=reactbin-ui:v1.1.0 -n reactbin
+# Kubernetes rolls out new pods; init container runs migrations before traffic switches
+```
+
+## Scenario: Restore after MinIO pod restart
+
+MinIO uses a PersistentVolumeClaim. Pod restarts do not affect stored data. Verify:
+
+```bash
+kubectl delete pod -n reactbin minio-0
+kubectl get pods -n reactbin -w   # minio-0 restarts, PVC reattaches
+# Previously uploaded images should still be accessible via the API
+```
--- a/specs/013-k8s-manifests/research.md
+++ b/specs/013-k8s-manifests/research.md
@@ -0,0 +1,63 @@
+# Research: Kubernetes Production Manifests
+
+## Decision 1: VSO CRD chain (VaultConnection → VaultAuth → VaultStaticSecret)
+
+**Decision**: Use three CRDs — `VaultConnection`, `VaultAuth`, and `VaultStaticSecret` — all under `apiVersion: secrets.hashicorp.com/v1beta1`.
+**Rationale**: This is the required VSO resource chain. `VaultConnection` points to the Vault server address. `VaultAuth` declares the Kubernetes auth method (role, service account, mount path). `VaultStaticSecret` references a `VaultAuth` via `vaultAuthRef` and declares the Vault KV path and the destination K8s Secret name. VSO syncs all Vault keys to the K8s Secret 1:1 by default — no explicit key mapping needed.
+**Alternatives considered**: `VaultAuthGlobal` for cross-namespace sharing — not needed; all resources are in the same `reactbin` namespace.
+
+Key fields:
+- `VaultStaticSecret.spec.type`: `kv-v2` (standard for modern Vault)
+- `VaultStaticSecret.spec.refreshAfter`: `1h` (Go duration string)
+- `VaultStaticSecret.spec.destination.create: true` — VSO creates the K8s Secret if absent
+- `VaultAuth.spec.kubernetes.role` — a Vault role the operator must pre-create and bind to the `default` service account in the `reactbin` namespace
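+
+For orientation, the operator-managed head of the chain could look like the sketch below. It is not part of `k8s/`; the namespace and the Vault address are placeholders, not values from this project.
+
+```yaml
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultConnection
+metadata:
+  name: default
+  namespace: vault-secrets-operator         # assumed VSO namespace
+spec:
+  address: https://vault.example.com:8200   # placeholder Vault server address
+```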
+
+## Decision 2: MinIO as StatefulSet (not Deployment)
+
+**Decision**: Run MinIO as a `StatefulSet` with `volumeClaimTemplates`.
+**Rationale**: StatefulSet gives the pod a stable name (`minio-0`) and automatically reattaches its PVC on pod recreation. A Deployment would require a manually-created PVC and is prone to PVC binding issues on reschedule. The marginal complexity of a StatefulSet over a Deployment is acceptable. `ReadWriteOnce` PVC is correct for single-replica MinIO.
+**Alternatives considered**: Deployment with explicit PVC — works but PVC lifecycle is decoupled from the pod, creating operational risk.
+
+MinIO health probes:
+- Liveness: `GET /minio/health/live` on port 9000
+- Readiness: `GET /minio/health/ready` on port 9000
+
+MinIO env vars: `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` (injected from a K8s Secret synced by VSO).
+
+## Decision 3: Bucket initialisation via Kubernetes Job with `minio/mc`
+
+**Decision**: A one-off `Job` using `minio/mc:latest` runs `mc mb --ignore-existing` to create the bucket idempotently.
+**Rationale**: This is the standard in-cluster pattern. `--ignore-existing` makes the job safe to re-apply (exits 0 if bucket already exists). `restartPolicy: OnFailure` retries transient failures (e.g. MinIO not yet ready).
+**Alternatives considered**: Init container on the API pod — tightly couples bucket creation to API startup; a Job is cleaner and independently rerunnable.
+
+## Decision 4: Ingress — single resource, `/api/` path before `/`
+
+**Decision**: One `Ingress` resource with `ingressClassName: nginx`, two path entries in a single rule: `/api/` (Prefix) → API Service, `/` (Prefix) → UI Service; `/api/` must be listed first.
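+**Rationale**: Listing the more specific `/api/` prefix before `/` keeps the routing intent obvious when reading the manifest. Strictly, the Kubernetes Ingress spec gives precedence to the longest matching `Prefix` path, and ingress-nginx sorts paths by descending length, so `/api/` requests reach the API rather than the UI regardless of declaration order. No path rewriting annotation is needed because the API already handles full `/api/v1/...` paths.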
+**TLS**: cert-manager annotation `cert-manager.io/cluster-issuer: letsencrypt-prod` triggers automatic certificate provisioning into a K8s Secret named in `spec.tls[].secretName`. HTTP→HTTPS redirect is on by default when TLS is configured (`nginx.ingress.kubernetes.io/ssl-redirect: "true"` is explicit but redundant).
+**Alternatives considered**: Two separate Ingress resources (one per service) — works but harder to reason about routing order; single Ingress is canonical.
+
+## Decision 5: Alembic init container — same image, workdir `/app`
+
+**Decision**: The API Deployment includes an init container with the same image as the main container, `command: ["alembic", "upgrade", "head"]`, and `workingDir: /app`. It shares the API's env secret via `envFrom` so it can read `DATABASE_URL`.
+**Rationale**: Alembic needs `DATABASE_URL` to connect and `alembic.ini` + `alembic/` to find migrations. Both are available in the production image once `Dockerfile.prod` is updated. Using the same image guarantees the migration files match the running version.
+**Dockerfile.prod update required**: Add `COPY --chown=appuser:appgroup alembic/ ./alembic/` and `COPY --chown=appuser:appgroup alembic.ini .` in the runtime stage (not the builder stage — no compilation needed).
+**Alternatives considered**: Separate migration image — adds a second image to build and push on every release; unnecessary when the source image already has everything.
+
+## Decision 6: Image tag strategy — placeholder `latest`, substituted at deploy time
+
+**Decision**: Manifests reference image tags using `latest` as a documented placeholder. The operator substitutes the real tag with a `sed` one-liner before applying, or with `kubectl set image` after the first apply.
+**Rationale**: Kustomize's `images` transformer is the clean alternative, but it introduces a tooling dependency. For a personal single-operator deployment, `sed` before `kubectl apply` (or `kubectl set image` afterwards) is simpler and requires no additional setup. The placeholder is documented in the operator guide (quickstart.md).
+**Alternatives considered**: Kustomize overlays — appropriate for multi-environment setups; over-engineered for one environment.
+
+## Decision 7: Two VaultStaticSecrets (API env and MinIO credentials)
+
+**Decision**: Separate VaultStaticSecret resources for API env vars and MinIO root credentials, syncing into `api-env` and `minio-credentials` K8s Secrets respectively.
+**Rationale**: The API's env secret contains database, JWT, and S3 access credentials. MinIO's root credentials are a different concern with a different rotation lifecycle. Keeping them separate makes Vault policies simpler (least privilege) and avoids giving the API pod access to MinIO's root password.
+**Vault paths assumed**: `reactbin/api/config` (KV v2) for API env; `reactbin/minio/credentials` (KV v2) for MinIO root credentials.
+
+## Decision 8: Namespace manifest included in `k8s/`
+
+**Decision**: `k8s/namespace.yaml` creates the `reactbin` namespace as part of the manifest set.
+**Rationale**: Makes the full deployment self-contained — operator runs `kubectl apply -f k8s/` without a prerequisite namespace creation step.
+**Note**: If the namespace already exists, `kubectl apply` is idempotent.
--- a/specs/013-k8s-manifests/spec.md
+++ b/specs/013-k8s-manifests/spec.md
@@ -0,0 +1,124 @@
+# Feature Specification: Kubernetes Production Manifests
+
+**Feature Branch**: `013-k8s-manifests`
+**Created**: 2026-05-07
+**Status**: Draft
+**Input**: User description: "Kubernetes manifests for production deployment to k3s: Deployment, Service, and Ingress for the API and UI; VaultStaticSecret CRDs to sync secrets from HashiCorp Vault; Alembic init container on the API Deployment for schema migrations. The cluster uses an nginx ingress controller with Let's Encrypt TLS, a shared external Postgres instance, MinIO running in-cluster, and VSO (Vault Secrets Operator) for secret management."
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 — Application Reachable in Production (Priority: P1)
+
+As an operator, I can apply the manifests to my k3s cluster and have both the API and UI reachable at the production domain over HTTPS, with all health checks passing.
+
+**Why this priority**: This is the core deployment goal. Nothing else matters if the application is not reachable.
+
+**Independent Test**: Apply the API and UI manifests with a manually-created K8s Secret (bypassing Vault). Confirm the UI loads at the domain root and the API health endpoint returns 200 at `/api/v1/health`. Confirm HTTPS is enforced and HTTP redirects to HTTPS.
+
+**Acceptance Scenarios**:
+
+1. **Given** the manifests are applied to the cluster, **When** a browser navigates to `https://<domain>/`, **Then** the UI loads successfully with a valid TLS certificate.
+2. **Given** the manifests are applied, **When** a request is made to `https://<domain>/api/v1/health`, **Then** a 200 response is returned.
+3. **Given** the API docs flag is disabled, **When** a request is made to `https://<domain>/docs`, **Then** a 404 is returned.
+4. **Given** the API pod is restarted, **When** it comes back up, **Then** it passes readiness checks before receiving traffic.
+5. **Given** a request for an unknown path, **When** it is made to the UI, **Then** the SPA serves the index page (client-side routing is preserved).
+
+---
+
+### User Story 2 — Secrets Sourced from Vault (Priority: P2)
+
+As an operator, no secrets are stored in version-controlled manifest files. All sensitive values are declared in Vault and synced automatically into the cluster as Kubernetes Secrets by the Vault Secrets Operator.
+
+**Why this priority**: Security prerequisite for production. Hardcoded secrets in manifests are a material risk.
+
+**Independent Test**: Run `git grep` for known secret patterns across `k8s/` and confirm zero matches. Confirm VaultStaticSecret CRDs reference a Vault path and that the synced K8s Secret is created and the API pod's environment is populated from it.
+
+**Acceptance Scenarios**:
+
+1. **Given** Vault contains the required secret values at the declared path, **When** VSO is running, **Then** a K8s Secret is created in the cluster namespace with the declared keys.
+2. **Given** the K8s Secret exists, **When** the API pod starts, **Then** its environment variables are populated from that secret.
+3. **Given** a `git grep` for plaintext credentials across `k8s/`, **When** run against the committed manifests, **Then** no plaintext secrets are found.
+
+---
+
+### User Story 3 — Schema Migrations Run Before API Starts (Priority: P3)
+
+As an operator, every time the API is deployed, database migrations run automatically in an init container before the main application container starts. A failed migration prevents the pod from starting, protecting against schema drift.
+
+**Why this priority**: Prevents the API from serving requests against a stale or incompatible schema. Safe deployment ordering is essential for production.
+
+**Independent Test**: Deploy with the init container pointing at a valid database. Confirm migrations run and the API starts. Simulate a failing migration by pointing the init container at an unreachable database and confirm the pod stays in init state and does not serve traffic.
+
+**Acceptance Scenarios**:
+
+1. **Given** the API Deployment is applied, **When** the pod starts, **Then** the init container completes `alembic upgrade head` before the main container starts.
+2. **Given** the schema is already current, **When** the pod starts, **Then** the migration init container exits successfully with no changes applied.
+3. **Given** the migration fails, **When** the pod starts, **Then** the init container exits non-zero, the main container does not start, and the pod enters a visible error state.
+
+---
+
+### User Story 4 — MinIO Runs In-Cluster with Persistent Storage (Priority: P4)
+
+As an operator, MinIO runs inside the cluster with a PersistentVolumeClaim for durable storage, is not externally reachable, and has the required bucket initialised on first deployment.
+
+**Why this priority**: Required for image storage, but decoupled from the other manifests — the S3 endpoint is just a config value the API reads.
+
+**Independent Test**: Confirm the MinIO pod is running and has no external Ingress. Confirm the required bucket exists. Restart the MinIO pod and confirm previously stored objects are still accessible.
+
+**Acceptance Scenarios**:
+
+1. **Given** the MinIO manifests are applied, **When** the MinIO pod starts, **Then** the required bucket is created and the API can store and retrieve images.
+2. **Given** the MinIO pod is restarted, **When** it comes back up, **Then** all previously stored objects remain accessible (PVC-backed storage persists).
+3. **Given** no Ingress is defined for MinIO, **When** a connection is attempted from outside the cluster, **Then** MinIO is not reachable.
+
+---
+
+### Edge Cases
+
+- What if Vault is unavailable when VSO tries to sync? VSO retries on a configurable interval; the pod will not start until the K8s Secret exists.
+- What if the database is unreachable during migration? The init container exits non-zero; the pod does not start and Kubernetes retries with backoff.
+- What if the MinIO PVC runs out of space? MinIO will fail writes; the API will return upload errors. Capacity monitoring is out of scope for this feature.
+- What if migrations and the main container use different image tags? They use the same tag in the same Deployment spec, so they are always in sync.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: All manifests MUST target a single configurable namespace (default: `reactbin`).
+- **FR-002**: The API MUST be deployed as a Deployment with liveness and readiness probes on `/api/v1/health`.
+- **FR-003**: The API Deployment MUST include an init container using the same image that runs database schema migrations before the main container starts.
+- **FR-004**: The API Deployment MUST set `API_DOCS_ENABLED=false`.
+- **FR-005**: The UI MUST be deployed as a Deployment with a liveness probe confirming the nginx process is serving.
+- **FR-006**: A single Ingress MUST route `https://<domain>/api/` to the API Service and all other paths to the UI Service, with TLS termination via a cert-manager Let's Encrypt certificate.
+- **FR-007**: HTTP requests MUST be redirected to HTTPS via the Ingress.
+- **FR-008**: All API secrets MUST be declared in a VaultStaticSecret CRD and synced into a K8s Secret; secret values MUST NOT appear as plaintext in any manifest file.
+- **FR-009**: The API Deployment MUST source all secret-backed environment variables from the synced K8s Secret via `envFrom`; non-secret settings such as `API_DOCS_ENABLED` (FR-004) MAY be set inline.
+- **FR-010**: MinIO MUST be deployed as a StatefulSet with a PersistentVolumeClaim using the cluster's default storage class.
+- **FR-011**: A Kubernetes Job MUST create the required S3 bucket in MinIO on first deployment and MUST be idempotent on re-apply.
+- **FR-012**: MinIO MUST have no Ingress; it MUST only be accessible within the cluster via ClusterIP.
+- **FR-013**: All containers MUST run as non-root users.
+- **FR-014**: The API production image MUST include migration files so the init container can run migrations without a separate image.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: The application is accessible at the production domain within 120 seconds of `kubectl apply`.
+- **SC-002**: Schema migrations complete and the API begins serving traffic without manual operator intervention on every deployment.
+- **SC-003**: A `git grep` across `k8s/` finds zero plaintext secret values in committed files.
+- **SC-004**: A simulated migration failure holds the pod in init state and the application never serves traffic.
+- **SC-005**: Restarting the MinIO pod does not result in data loss — previously uploaded images remain accessible.
+
+## Assumptions
+
+- The k3s cluster is running with the nginx ingress controller installed.
+- cert-manager is installed and a `ClusterIssuer` named `letsencrypt-prod` is already configured.
+- The Vault Secrets Operator is installed in the cluster.
+- A HashiCorp Vault instance is accessible from the cluster and the required secret values are stored at the declared Vault path before deployment.
+- A shared external PostgreSQL instance is available; the operator creates a dedicated database and user before deploying.
+- DNS for the production domain is already pointing at the cluster ingress IP.
+- Manifests are stored in a `k8s/` directory at the repository root.
+- The cluster's default storage class supports ReadWriteOnce (sufficient for single-replica MinIO).
+- All Deployments run a single replica (personal tool, no HA requirement).
+- Image tags are managed externally; manifests use a placeholder tag that the operator substitutes at deploy time.
+- The `API_DOCS_ENABLED` flag exists on the API (implemented in feature 012).
--- a/specs/013-k8s-manifests/tasks.md
+++ b/specs/013-k8s-manifests/tasks.md
@@ -0,0 +1,174 @@
+# Tasks: Kubernetes Production Manifests
+
+**Input**: Design documents from `specs/013-k8s-manifests/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/operator-deploy.md ✅, quickstart.md ✅
+
+**Tests**: K8s manifests have no unit test framework. Validation is via `yamllint` (format) and `kubectl apply --dry-run=client` (schema). Each phase ends with a validation step. The TDD analogue is: write the validate-k8s Makefile target (Phase 1) before any manifest exists, so it immediately fails — then manifests are written to make it pass.
+
+**Organization**: Phase 1 creates the directory structure and validation target. Phase 2 creates the namespace and Vault CRDs (foundational — required by all user story deployments). Phases 3–6 implement user stories. Phase 7 polishes.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel with other [P] tasks in the same phase
+- **[Story]**: Which user story this task belongs to
+- Exact file paths included in every task description
+
+---
+
+## Phase 1: Setup
+
+**Goal**: Create the `k8s/` directory structure and the validation Makefile target before any manifests exist.
+
+- [X] T001 Create the `k8s/` directory tree: `mkdir -p k8s/api k8s/ui k8s/minio k8s/vault` from the repository root; confirm the four subdirectories exist
+
+- [X] T002 Add a `validate-k8s` target to `Makefile` immediately after the existing `verify-ui-prod` target: the target MUST run `yamllint -d relaxed k8s/` then `kubectl apply --dry-run=client -R -f k8s/` (`-R` is required because most manifests live in subdirectories of `k8s/`); add `validate-k8s` to the `.PHONY` line; note in a comment that `kubectl apply --dry-run=client` requires a kubeconfig with cluster access, so offline validation uses `yamllint` only
+
+---
+
+## Phase 2: Foundational (Namespace + Vault CRDs)
+
+**Goal**: Namespace and Vault secret-sync resources that every other manifest depends on.
+
+**⚠️ CRITICAL**: No user story manifest can be applied until this phase is complete — the namespace must exist before any namespaced resource, and the Vault CRDs must exist before the API or MinIO pods can start.
+
+- [X] T003 Create `k8s/namespace.yaml`: a single `Namespace` resource with `name: reactbin` and no additional labels
+
+- [X] T004 [P] Create `k8s/vault/vault-auth.yaml`: a `VaultAuth` resource (`apiVersion: secrets.hashicorp.com/v1beta1`) with `name: reactbin-auth`, `namespace: reactbin`, `spec.method: kubernetes`, `spec.mount: kubernetes`, `spec.kubernetes.role: reactbin`, `spec.kubernetes.serviceAccount: default`, `spec.kubernetes.audiences: [https://kubernetes.default.svc]`; add a comment noting the operator must create the Vault role and bind it to the `default` SA in the `reactbin` namespace with read access to both secret paths
+
+- [X] T005 [P] Create `k8s/vault/api-secret.yaml`: a `VaultStaticSecret` resource with `name: api-secret`, `namespace: reactbin`, `spec.vaultAuthRef: reactbin-auth`, `spec.mount: secret`, `spec.type: kv-v2`, `spec.path: reactbin/api/config`, `spec.refreshAfter: 1h`, `spec.destination.name: api-env`, `spec.destination.create: true`; add a comment listing all required Vault keys: `DATABASE_URL`, `JWT_SECRET_KEY`, `OWNER_USERNAME`, `OWNER_PASSWORD`, `S3_ENDPOINT_URL`, `S3_BUCKET_NAME`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `API_BASE_URL`
+
+- [X] T006 [P] Create `k8s/vault/minio-secret.yaml`: same structure as T005 but `name: minio-secret`, `spec.path: reactbin/minio/credentials`, `spec.destination.name: minio-credentials`; comment listing required Vault keys: `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD`
+
+**Checkpoint**: Foundational resources complete. User story implementation can now begin.
+
+---
+
+## Phase 3: User Story 1 — Application Reachable in Production (Priority: P1) 🎯 MVP
+
+**Goal**: API and UI are deployed and reachable at the production domain via HTTPS with TLS from cert-manager.
+
+**Independent Test**: Apply all Phase 2 + Phase 3 manifests. Confirm `kubectl get pods -n reactbin` shows api and ui pods Running. Confirm `curl https://<domain>/api/v1/health` returns 200 and `curl https://<domain>/` returns 200.
+
+- [X] T007 [P] [US1] Create `k8s/api/service.yaml`: `Service`, `name: api`, `namespace: reactbin`, `type: ClusterIP`, `selector: {app: api}`, `ports: [{port: 8000, targetPort: 8000, name: http}]`
+
+- [X] T008 [P] [US1] Create `k8s/ui/service.yaml`: `Service`, `name: ui`, `namespace: reactbin`, `type: ClusterIP`, `selector: {app: ui}`, `ports: [{port: 8080, targetPort: 8080, name: http}]`
+
+- [X] T009 [P] [US1] Create `k8s/ui/deployment.yaml`: `Deployment`, `name: ui`, `namespace: reactbin`, 1 replica, `selector.matchLabels: {app: ui}`; container `name: ui`, `image: reactbin-ui:latest` (placeholder — operator substitutes real tag), `ports: [{containerPort: 8080}]`; `livenessProbe: {httpGet: {path: /, port: 8080}, initialDelaySeconds: 10, periodSeconds: 30}`; `securityContext: {runAsNonRoot: true, runAsUser: 101}` (UID 101 is the nginxinc/nginx-unprivileged user); add comment: `# Replace 'latest' with the real image tag before applying`
+
+- [X] T010 [US1] Create `k8s/api/deployment.yaml`: `Deployment`, `name: api`, `namespace: reactbin`, 1 replica, `selector.matchLabels: {app: api}`; container `name: api`, `image: reactbin-api:latest` (placeholder), `ports: [{containerPort: 8000}]`; `envFrom: [{secretRef: {name: api-env}}]`; `env: [{name: API_DOCS_ENABLED, value: "false"}]`; `livenessProbe: {httpGet: {path: /api/v1/health, port: 8000}, initialDelaySeconds: 10, periodSeconds: 30}`; `readinessProbe: {httpGet: {path: /api/v1/health, port: 8000}, initialDelaySeconds: 5, periodSeconds: 10}`; `securityContext: {runAsNonRoot: true, runAsUser: 1001}`; add comment: `# initContainers block added in US3 (T015)`; add comment: `# Replace 'latest' with the real image tag before applying`
+
+- [X] T011 [US1] Create `k8s/ingress.yaml`: `Ingress`, `name: reactbin`, `namespace: reactbin`; `annotations: {"cert-manager.io/cluster-issuer": "letsencrypt-prod", "nginx.ingress.kubernetes.io/ssl-redirect": "true"}`; `spec.ingressClassName: nginx`; `spec.tls: [{hosts: ["<your-domain>"], secretName: reactbin-tls}]`; `spec.rules: [{host: "<your-domain>", http: {paths: [{path: /api/, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}, {path: /, pathType: Prefix, backend: {service: {name: ui, port: {number: 8080}}}}]}}]`; list the `/api/` path entry before `/` in the YAML for readability (with `pathType: Prefix` the longest matching path takes precedence, so routing does not depend on declaration order); add comment: `# Replace <your-domain> with the real domain before applying`
+
+- [X] T012 [US1] Verify US1: run `yamllint -d relaxed k8s/` from the repository root and confirm no errors; run `kubectl apply --dry-run=client -R -f k8s/` (requires cluster kubeconfig; `-R` recurses into the subdirectories) and confirm all resources in namespace.yaml, vault/, api/, ui/, and ingress.yaml are accepted; if no cluster is available, yamllint passing is sufficient for this checkpoint
+
+**Checkpoint**: US1 complete. API and UI manifests are schema-valid and ready to apply.
+
+---
+
+## Phase 4: User Story 2 — Secrets Sourced from Vault (Priority: P2)
+
+**Goal**: Confirm that no plaintext secret values appear in any committed manifest file. The implementation (VaultAuth + VaultStaticSecret × 2) was completed in Phase 2.
+
+**Independent Test**: `git grep` across `k8s/` finds no plaintext credential values.
+
+- [X] T013 [US2] Verify US2: run `git grep -rn "password\|secret_key\|access_key\|DATABASE_URL" k8s/` and confirm that only key names (in comments) and Vault path references appear — no actual values; also confirm that `k8s/vault/api-secret.yaml` and `k8s/vault/minio-secret.yaml` reference Vault paths under `spec.path` and that `spec.destination.create: true` is set so VSO creates the K8s Secrets
+
+**Checkpoint**: US2 complete. Zero plaintext secrets in manifests; all secrets flow through Vault.
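+
+The `git grep` check in T013 can also be approximated in a script. The following is a rough heuristic sketch, not part of the repository: the function name `find_plaintext_secrets` and the regular expression are assumptions, and the key names are taken from the T013 grep pattern. It flags lines that assign a literal value to a secret-looking key and ignores comment lines, where key names are allowed to appear.
+
+```python
+import re
+
+# Key names come from the T013 grep pattern; the regex itself is an assumption.
+SECRET_KEY_RE = re.compile(
+    r"^\s*[\w.-]*(password|secret_key|access_key|database_url)[\w.-]*\s*[:=]\s*(\S.*)$",
+    re.IGNORECASE,
+)
+
+
+def find_plaintext_secrets(manifest_text: str) -> list[str]:
+    """Return lines that assign a literal value to a secret-looking key."""
+    hits = []
+    for line in manifest_text.splitlines():
+        if line.lstrip().startswith("#"):
+            continue  # key names listed in comments are allowed
+        if SECRET_KEY_RE.match(line):
+            hits.append(line.strip())
+    return hits
+
+
+sample = "password: hunter2\n# password: listed in a comment\n  key: MINIO_ROOT_PASSWORD\n"
+print(find_plaintext_secrets(sample))
+```
+
+A `secretKeyRef` entry such as `key: MINIO_ROOT_PASSWORD` is not flagged because the secret-looking name is the value, not the key. The heuristic can still miss secrets stored under unrelated key names, so it complements the manual review rather than replacing it.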
+
+---
+
+## Phase 5: User Story 3 — Schema Migrations Run Before API Starts (Priority: P3)
+
+**Goal**: The API Deployment includes an Alembic init container. `api/Dockerfile.prod` is updated to include migration files.
+
+**Independent Test**: `docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:test` succeeds and `docker run --rm reactbin-api-prod:test ls /app/alembic` shows migration files. `make validate-k8s` confirms the init container spec is accepted by the Kubernetes schema.
+
+- [X] T014 [US3] Update `api/Dockerfile.prod`: in the **runtime stage** (the `FROM python:3.12-slim` stage), after the line `COPY --chown=appuser:appgroup app/ ./app/`, add two new lines: `COPY --chown=appuser:appgroup alembic/ ./alembic/` and `COPY --chown=appuser:appgroup alembic.ini .`; the builder stage is unchanged; verify with `docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:test && docker run --rm reactbin-api-prod:test ls /app/alembic /app/alembic.ini`
+
+- [X] T015 [US3] Update `k8s/api/deployment.yaml`: add an `initContainers` block to the pod spec (before the `containers` block) containing one init container: `name: alembic-migrate`, `image: reactbin-api:latest` (same placeholder tag as the main container), `command: ["alembic", "upgrade", "head"]`, `workingDir: /app`, `envFrom: [{secretRef: {name: api-env}}]`, `securityContext: {runAsNonRoot: true, runAsUser: 1001}`; remove the `# initContainers block added in US3 (T015)` comment added in T010
+
+- [X] T016 [US3] Verify US3: run `make validate-k8s` (or `yamllint -d relaxed k8s/`) and confirm the updated deployment.yaml with the init container passes validation; run `docker build -f api/Dockerfile.prod api/ -t reactbin-api-prod:test` and confirm it succeeds; run `docker run --rm reactbin-api-prod:test ls /app/alembic.ini` and confirm the file is present
+
+**Checkpoint**: US3 complete. API Deployment includes Alembic init container; production image includes migration files.
+
+---
+
+## Phase 6: User Story 4 — MinIO In-Cluster with Persistent Storage (Priority: P4)
+
+**Goal**: MinIO runs as a StatefulSet with a PVC, is accessible only within the cluster, and has the required bucket created by a Job.
+
+**Independent Test**: `make validate-k8s` confirms all MinIO manifests pass schema validation. On a live cluster: MinIO pod reaches Running state, bucket exists, no external Ingress for MinIO.
+
+- [X] T017 [P] [US4] Create `k8s/minio/service.yaml`: `Service`, `name: minio`, `namespace: reactbin`, `type: ClusterIP`, `selector: {app: minio}`, `ports: [{port: 9000, targetPort: 9000, name: s3}]`; add comment: `# No Ingress for MinIO — internal access only (FR-012)`
+
+- [X] T018 [US4] Create `k8s/minio/statefulset.yaml`: `StatefulSet` (NOT Deployment — StatefulSet ensures stable PVC binding on pod recreation), `name: minio`, `namespace: reactbin`, `replicas: 1`, `selector.matchLabels: {app: minio}`, `serviceName: minio`; pod `securityContext: {runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000}`; container `name: minio`, `image: minio/minio:latest`, `args: ["server", "/data", "--console-address", ":9001"]`, `ports: [{containerPort: 9000, name: s3}]`; `env: [{name: MINIO_ROOT_USER, valueFrom: {secretKeyRef: {name: minio-credentials, key: MINIO_ROOT_USER}}}, {name: MINIO_ROOT_PASSWORD, valueFrom: {secretKeyRef: {name: minio-credentials, key: MINIO_ROOT_PASSWORD}}}]`; `livenessProbe: {httpGet: {path: /minio/health/live, port: 9000}, initialDelaySeconds: 30, periodSeconds: 20}`; `readinessProbe: {httpGet: {path: /minio/health/ready, port: 9000}, initialDelaySeconds: 15, periodSeconds: 10}`; `volumeMounts: [{name: minio-data, mountPath: /data}]`; `volumeClaimTemplates: [{metadata: {name: minio-data}, spec: {accessModes: [ReadWriteOnce], resources: {requests: {storage: 10Gi}}}}]`; add comment: `# storageClassName omitted — uses cluster default; override if needed`
+
+- [X] T019 [US4] Create `k8s/minio/init-job.yaml`: `Job`, `name: minio-init-bucket`, `namespace: reactbin`; `spec.template.spec.restartPolicy: OnFailure`; container `name: mc`, `image: minio/mc:latest`, `command: ["sh", "-c"]`, `args: ["mc alias set local http://minio.reactbin.svc.cluster.local:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD && mc mb --ignore-existing local/reactbin"]`; `env: [{name: MINIO_ROOT_USER, valueFrom: {secretKeyRef: {name: minio-credentials, key: MINIO_ROOT_USER}}}, {name: MINIO_ROOT_PASSWORD, valueFrom: {secretKeyRef: {name: minio-credentials, key: MINIO_ROOT_PASSWORD}}}]`; `securityContext: {runAsNonRoot: false}` with comment `# minio/mc runs as root by default; FR-013 exception for this one-off init Job`; add comment: `# --ignore-existing makes this Job idempotent — safe to re-apply`
+
+- [X] T020 [US4] Verify US4: run `make validate-k8s` (or `yamllint -d relaxed k8s/`) and confirm all three MinIO manifests (statefulset.yaml, service.yaml, init-job.yaml) pass validation; confirm no Ingress resource references MinIO
+
+**Checkpoint**: All four user stories complete.
+
+---
+
+## Phase 7: Polish & Cross-Cutting Concerns
+
+- [X] T021 [P] Run `yamllint -d relaxed k8s/` from the repository root and fix any YAML formatting violations across all 12 manifest files; confirm output shows no errors
+
+- [X] T022 [P] Add `.yamllint.yml` at the repository root (if not already present) with `extends: relaxed` and `rules: {line-length: {max: 120}}` to keep line length reasonable for verbose K8s YAML
+
+- [X] T023 Run `make build-prod` to confirm `api/Dockerfile.prod` still builds cleanly after the T014 addition; run `docker run --rm reactbin-api-prod:latest ls /app/alembic.ini /app/alembic/` and confirm both are present in the production image
+
+---
+
+## Dependencies & Execution Order
+
+- T001 and T002 can run in parallel (directory creation vs Makefile edit)
+- T003, T004, T005, T006 can run in parallel after T001 (different files, same phase)
+- T007, T008, T009 can run in parallel after Phase 2 completes
+- T010 after T007 is a convention rather than a hard dependency: the two are separate files and could be written in parallel, but writing the Service first fixes the `app: api` selector and port that the Deployment must match; keep sequential for clarity
+- T011 after T007 and T008 (Ingress references both service names)
+- T012 after T007–T011
+- T013 after Phase 2 (Vault CRDs exist to inspect)
+- T014 and T015 can run in parallel (different files: Dockerfile.prod vs deployment.yaml)
+- T016 after T014 and T015
+- T017, T018, T019 can run in parallel after Phase 2 completes
+- T020 after T017–T019
+- T021, T022, T023 can run in parallel
+
+### Execution Order Summary
+
+```
+Step 1: T001 ∥ T002                          (setup)
+Step 2: T003 ∥ T004 ∥ T005 ∥ T006           (foundational: namespace + Vault CRDs)
+Step 3: T007 ∥ T008 ∥ T009                  (US1: services + UI deployment)
+Step 4: T010                                  (US1: API deployment)
+Step 5: T011                                  (US1: Ingress)
+Step 6: T012                                  (US1: validate)
+Step 7: T013                                  (US2: verify no plaintext secrets)
+Step 8: T014 ∥ T015                          (US3: Dockerfile.prod + init container)
+Step 9: T016                                  (US3: verify)
+Step 10: T017 ∥ T018 ∥ T019                 (US4: MinIO manifests)
+Step 11: T020                                 (US4: validate MinIO)
+Step 12: T021 ∥ T022 ∥ T023                 (polish)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 + US2 — application is reachable with Vault-backed secrets)
+
+1. Phase 1 (Setup) + Phase 2 (Foundational)
+2. Phase 3 (US1 — API, UI, Ingress)
+3. Phase 4 (US2 — verify no plaintext secrets)
+4. **STOP and VALIDATE**: apply to cluster, confirm `https://<domain>/` and `/api/v1/health` return 200
+5. Deploy MVP
+
+### Incremental Delivery
+
+1. Setup + Foundational → Apply → namespace and Vault sync ready
+2. Add US1 (API + UI + Ingress) → Deploy → application reachable at domain
+3. Add US3 (Alembic init container) → Deploy → migrations run automatically on rollout
+4. Add US4 (MinIO) → Deploy → persistent image storage in-cluster
+5. Polish → clean YAML, confirmed builds
--- a/specs/014-r2-cdn-serving/checklists/requirements.md
+++ b/specs/014-r2-cdn-serving/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: CDN Image Serving
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-08
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Ready for `/speckit-plan`.
--- a/specs/014-r2-cdn-serving/contracts/image-response.md
+++ b/specs/014-r2-cdn-serving/contracts/image-response.md
@@ -0,0 +1,54 @@
+# Contract: Image Metadata Response
+
+**Version**: 2.0 (adds `file_url`, `thumbnail_url`)
+**Endpoints affected**: `GET /api/v1/images`, `GET /api/v1/images/{id}`, `POST /api/v1/images`, `PATCH /api/v1/images/{id}/tags`
+
+## Response Schema
+
+```json
+{
+  "id": "550e8400-e29b-41d4-a716-446655440000",
+  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
+  "filename": "reaction.gif",
+  "mime_type": "image/gif",
+  "size_bytes": 204800,
+  "width": 480,
+  "height": 270,
+  "storage_key": "e3b0c44298fc1c149afbf4c8996fb924",
+  "thumbnail_key": "e3b0c44298fc1c149afbf4c8996fb924.thumb",
+  "file_url": "https://cdn.reactbin.juggalol.com/e3b0c44298fc1c149afbf4c8996fb924",
+  "thumbnail_url": "https://cdn.reactbin.juggalol.com/e3b0c44298fc1c149afbf4c8996fb924.thumb",
+  "created_at": "2026-05-08T12:00:00.000000",
+  "tags": ["funny", "reaction"]
+}
+```
+
+## Field Descriptions
+
+| Field | Type | Nullable | Notes |
+|-------|------|----------|-------|
+| `id` | string (UUID) | No | Stable image identifier |
+| `hash` | string (hex) | No | SHA-256 of file content; deduplication key |
+| `filename` | string | No | Original upload filename |
+| `mime_type` | string | No | One of: `image/jpeg`, `image/png`, `image/gif`, `image/webp` |
+| `size_bytes` | integer | No | File size in bytes |
+| `width` | integer | No | Image width in pixels |
+| `height` | integer | No | Image height in pixels |
+| `storage_key` | string | No | Object storage key (retained for backward compat) |
+| `thumbnail_key` | string | Yes | Thumbnail object storage key; null if generation failed |
+| `file_url` | string | No | URL to fetch the image file: absolute CDN URL in production, relative API proxy path in local dev |
+| `thumbnail_url` | string | Yes | URL to fetch the thumbnail: absolute CDN URL in production, relative API proxy path in local dev; null if no thumbnail |
+| `created_at` | string (ISO 8601) | No | Upload timestamp |
+| `tags` | string[] | No | Lowercase normalised tag list |
+| `duplicate` | boolean | Yes | Present only on upload responses; true if hash matched an existing image |
+
+## URL Behaviour
+
+| Configuration | `file_url` example | `thumbnail_url` example |
+|---------------|--------------------|------------------------|
+| `S3_PUBLIC_BASE_URL` set | `https://cdn.reactbin.juggalol.com/{storage_key}` | `https://cdn.reactbin.juggalol.com/{thumbnail_key}` |
+| `S3_PUBLIC_BASE_URL` not set | `/api/v1/images/{id}/file` | `/api/v1/images/{id}/thumbnail` |
+
+## UI Contract
+
+The UI MUST use `file_url` and `thumbnail_url` from the response to render images. The UI MUST NOT construct image URLs from `id`, `storage_key`, or `thumbnail_key` directly. The UI MUST treat `thumbnail_url: null` as "no thumbnail available" and fall back to `file_url` for display.
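+
+Because `file_url` may be either an absolute CDN URL or a relative proxy path, a script or other non-browser consumer has to resolve it against the API origin. A minimal sketch of the same contract from a client's point of view, assuming a hypothetical `display_url` helper and an `API_BASE` value for local dev (neither is part of the codebase):
+
+```python
+from urllib.parse import urljoin
+
+API_BASE = "http://localhost:8000"  # assumed API origin for local dev
+
+
+def display_url(record: dict, api_base: str = API_BASE) -> str:
+    """Pick the URL to render: the thumbnail if present, else the full file."""
+    chosen = record.get("thumbnail_url") or record["file_url"]
+    # urljoin leaves absolute CDN URLs untouched and resolves relative
+    # proxy paths against the API origin.
+    return urljoin(api_base, chosen)
+
+
+print(display_url({"file_url": "/api/v1/images/abc/file", "thumbnail_url": None}))
+```
+
+The helper never builds a URL from `id`, `storage_key`, or `thumbnail_key`; it only consumes the two URL fields, as the contract requires.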
--- a/specs/014-r2-cdn-serving/plan.md
+++ b/specs/014-r2-cdn-serving/plan.md
@@ -0,0 +1,137 @@
+# Implementation Plan: CDN Image Serving
+
+**Branch**: `014-r2-cdn-serving` | **Date**: 2026-05-08 | **Spec**: [spec.md](spec.md)
+**Input**: Feature specification from `specs/014-r2-cdn-serving/spec.md`
+
+## Summary
+
+Extend the image metadata API response to include `file_url` and `thumbnail_url` fields. When `S3_PUBLIC_BASE_URL` is configured, these fields contain CDN URLs pointing directly to Cloudflare R2. When unconfigured, they fall back to the existing API proxy paths so local development requires no setup changes. The UI is updated to use these response fields instead of constructing proxy URLs client-side. Proxy endpoints are retained unchanged.
+
+## Technical Context
+
+**Language/Version**: Python 3.12 (API), TypeScript strict mode (UI)
+**Primary Dependencies**: FastAPI, SQLAlchemy 2.x async, Angular (latest stable), pydantic-settings
+**Storage**: PostgreSQL (image metadata), S3-compatible object storage (R2 in production, MinIO in dev)
+**Testing**: pytest (unit + integration), Angular component tests
+**Target Platform**: Linux (k3s), local Docker Compose
+**Project Type**: Web service (API) + SPA (UI)
+**Performance Goals**: No additional latency on API responses; image load latency reduced by eliminating API proxy hop in production
+**Constraints**: No breaking changes to existing API response fields; proxy endpoints must remain functional
+**Scale/Scope**: Single-owner app; ~100 existing images migrated to R2 prior to this feature
+
+## Constitution Check
+
+| Principle | Status | Notes |
+|-----------|--------|-------|
+| §2.1 Strict separation of concerns | PASS | URL construction stays in router layer; storage backend unchanged |
+| §2.3 Storage abstraction | PASS | No changes to `StorageBackend` interface or `S3StorageBackend` |
+| §2.6 No speculative abstraction | PASS | No new interfaces introduced; URL logic is a simple helper |
+| §3.1 API versioning (`/api/v1/`) | PASS | Adding fields to response is non-breaking per §3.1 |
+| §3.2 OpenAPI as contract | PASS | New fields documented in contracts/image-response.md |
+| §5.1 Tests alongside implementation | REQUIRED | Unit tests for URL construction; integration tests for response fields |
+| §7.2 Environment configuration | PASS | `S3_PUBLIC_BASE_URL` via env var; no hardcoded URLs |
+
+No constitution violations. All gates pass.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/014-r2-cdn-serving/
+├── plan.md                     # This file
+├── research.md                 # Technical decisions
+├── contracts/
+│   └── image-response.md       # Updated image response schema
+├── quickstart.md               # Integration test scenarios
+└── tasks.md                    # Phase 2 output (speckit-tasks)
+```
+
+### Source Code Changes
+
+```text
+api/
+├── app/
+│   ├── config.py               # Add: s3_public_base_url: str | None = None
+│   └── routers/
+│       └── images.py           # Update: _image_to_dict gains cdn_base param;
+│                               #         add file_url + thumbnail_url to response;
+│                               #         pass cdn_base from get_settings() at endpoint level
+├── tests/
+│   ├── unit/
+│   │   └── test_url_construction.py   # New: pure unit tests for URL logic
+│   └── integration/
+│       └── test_images.py      # Update: assert file_url + thumbnail_url present in responses
+
+ui/src/app/
+├── services/
+│   └── image.service.ts        # Update: add file_url/thumbnail_url to ImageRecord;
+│                               #         remove getFileUrl()/getThumbnailUrl() methods
+├── library/
+│   └── library.component.ts    # Update: use img.thumbnail_url instead of getThumbnailUrl(img.id)
+└── detail/
+    └── detail.component.ts     # Update: use img.file_url instead of getFileUrl(img.id)
+
+.env.example                    # Add: S3_PUBLIC_BASE_URL= (empty = local dev proxy fallback)
+```
+
+## Key Implementation Details
+
+### URL construction logic (`api/app/routers/images.py`)
+
+`_image_to_dict` gains a `cdn_base: str | None` parameter:
+
+```python
+def _image_to_dict(image: Image, *, cdn_base: str | None = None, duplicate: bool | None = None):
+    base = cdn_base.rstrip("/") if cdn_base else None
+    file_url = f"{base}/{image.storage_key}" if base else f"/api/v1/images/{image.id}/file"
+    thumbnail_url = (
+        (f"{base}/{image.thumbnail_key}" if base else f"/api/v1/images/{image.id}/thumbnail")
+        if image.thumbnail_key else None
+    )
+    return {
+        # ... existing fields unchanged (elided)
+        "file_url": file_url,
+        "thumbnail_url": thumbnail_url,
+    }
+```
+
+Each endpoint calls `get_settings()` once and passes `settings.s3_public_base_url` as `cdn_base`.
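+
+A minimal, self-contained sketch of the unit tests this logic calls for. It assumes a stand-in `FakeImage` dataclass and a `build_urls` helper (both hypothetical names, not part of the codebase) that mirror the URL rules above without importing the real model:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass
+class FakeImage:
+    """Hypothetical stand-in for the ORM Image model (illustration only)."""
+    id: str
+    storage_key: str
+    thumbnail_key: str | None = None
+
+
+def build_urls(image: FakeImage, cdn_base: str | None = None) -> tuple[str, str | None]:
+    """Mirror the file_url / thumbnail_url rules of _image_to_dict."""
+    base = cdn_base.rstrip("/") if cdn_base else None
+    file_url = f"{base}/{image.storage_key}" if base else f"/api/v1/images/{image.id}/file"
+    if not image.thumbnail_key:
+        return file_url, None
+    thumb = f"{base}/{image.thumbnail_key}" if base else f"/api/v1/images/{image.id}/thumbnail"
+    return file_url, thumb
+
+
+img = FakeImage(id="abc", storage_key="k1", thumbnail_key="k1.thumb")
+
+# CDN configured: a trailing slash on the base must not produce "//".
+assert build_urls(img, "https://cdn.example.com/") == (
+    "https://cdn.example.com/k1",
+    "https://cdn.example.com/k1.thumb",
+)
+# CDN unset or empty: fall back to the API proxy paths.
+assert build_urls(img, None) == ("/api/v1/images/abc/file", "/api/v1/images/abc/thumbnail")
+assert build_urls(img, "") == build_urls(img, None)
+# No thumbnail key: thumbnail_url is None in both modes.
+assert build_urls(FakeImage(id="abc", storage_key="k1"), "https://cdn.example.com")[1] is None
+```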
+
+### Config addition (`api/app/config.py`)
+
+```python
+s3_public_base_url: str | None = None
+```
+
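+No validator needed: `None` is the valid "not configured" state, and an empty string (as in `.env.example`) is also treated as unconfigured because `_image_to_dict` checks `cdn_base` for truthiness.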
+
+### UI changes (`ui/src/app/services/image.service.ts`)
+
+`ImageRecord` gains two new fields:
+```typescript
+file_url: string;
+thumbnail_url: string | null;
+```
+
+`getFileUrl(id)` and `getThumbnailUrl(id)` methods are removed. Components use `image.file_url` and `image.thumbnail_url` directly.
+
+## Phase Breakdown
+
+### Phase 1: API — config + URL construction (US1 foundation)
+- Add `s3_public_base_url` to config
+- Update `_image_to_dict` with `cdn_base` parameter
+- Update all call sites to pass `cdn_base` from settings
+- Unit tests for URL construction (both CDN and fallback paths)
+- Integration tests verifying `file_url`/`thumbnail_url` in all image responses
+
+### Phase 2: UI — consume response URLs (US1 + US2)
+- Update `ImageRecord` interface
+- Remove `getFileUrl`/`getThumbnailUrl` methods from service
+- Update library component
+- Update detail component
+- Update service tests
+
+### Phase 3: Config + docs
+- Add `S3_PUBLIC_BASE_URL` to `.env.example`
+- Manual end-to-end verification (local dev + production)
--- a/specs/014-r2-cdn-serving/quickstart.md
+++ b/specs/014-r2-cdn-serving/quickstart.md
@@ -0,0 +1,66 @@
+# Quickstart: CDN Image Serving
+
+## Local development (no CDN)
+
+No configuration change required. `S3_PUBLIC_BASE_URL` is unset by default.
+
+```bash
+docker compose up
+```
+
+Upload an image and inspect the API response:
+
+```bash
+curl -s http://localhost:8000/api/v1/images | jq '.items[0] | {file_url, thumbnail_url}'
+```
+
+Expected (local dev — relative proxy paths):
+```json
+{
+  "file_url": "/api/v1/images/550e8400-.../file",
+  "thumbnail_url": "/api/v1/images/550e8400-.../thumbnail"
+}
+```
+
+The UI loads images via these relative paths, which hit the API proxy as before.
+
+---
+
+## Production (CDN configured)
+
+Add `S3_PUBLIC_BASE_URL` to the Vault secret bundle at `reactbin/api/config`:
+
+```
+S3_PUBLIC_BASE_URL = https://cdn.reactbin.juggalol.com
+```
+
+Force VSO sync and restart:
+
+```bash
+kubectl annotate vaultstaticsecret api-secret -n reactbin \
+  secrets.hashicorp.com/force-sync=$(date +%s) --overwrite
+
+kubectl rollout restart deployment/api -n reactbin
+```
+
+Upload a test image and inspect the response:
+
+```bash
+curl -s https://reactbin.juggalol.com/api/v1/images | jq '.items[0] | {file_url, thumbnail_url}'
+```
+
+Expected (production — CDN URLs):
+```json
+{
+  "file_url": "https://cdn.reactbin.juggalol.com/e3b0c44...",
+  "thumbnail_url": "https://cdn.reactbin.juggalol.com/e3b0c44....thumb"
+}
+```
+
+Open the browser network panel on the library page and confirm image requests go to `cdn.reactbin.juggalol.com`, not `/api/`.
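The API-side half of this check could also be scripted. The following is a hedged sketch, not part of the quickstart: `urls_outside_cdn` is a hypothetical helper that takes the already-parsed JSON body of `GET /api/v1/images` and lists any `file_url` or `thumbnail_url` that does not start with the expected CDN prefix.

```python
from typing import List


def urls_outside_cdn(payload: dict, cdn_prefix: str) -> List[str]:
    """Return every file_url/thumbnail_url that does not start with cdn_prefix."""
    offenders = []
    for item in payload.get("items", []):
        for field in ("file_url", "thumbnail_url"):
            url = item.get(field)
            # A null thumbnail_url is valid (no thumbnail), so skip it.
            if url is not None and not url.startswith(cdn_prefix):
                offenders.append(url)
    return offenders
```

An empty result means every returned URL points at the CDN. Any relative `/api/v1/images/...` path in the result suggests the API is still running without `S3_PUBLIC_BASE_URL`.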
+
+---
+
+## Verifying existing images after migration
+
+All existing images were migrated to R2 with the same object keys before this feature was deployed. Once `S3_PUBLIC_BASE_URL` is configured, the API will return CDN URLs for all images immediately — no per-image migration step is needed.
--- a/specs/014-r2-cdn-serving/research.md
+++ b/specs/014-r2-cdn-serving/research.md
@@ -0,0 +1,51 @@
+# Research: CDN Image Serving
+
+## Decision 1: Where does URL construction logic live?
+
+**Decision**: In the image router's `_image_to_dict` helper, not in the `StorageBackend`.
+
+**Rationale**: The `StorageBackend` interface is responsible for put/get/delete of object bytes. Adding URL construction there conflates two concerns — storage operations and HTTP URL generation — and would require the storage abstraction to know about CDN configuration. The router already has access to application settings via `get_settings()` and knows the image ID and storage key, making it the natural place to construct URLs.
+
+**Alternatives considered**: Adding a `get_url(key)` method to `StorageBackend` — rejected because it leaks HTTP/CDN concerns into the storage abstraction, violating §2.3.
+
+---
+
+## Decision 2: Fallback URL format in local development
+
+**Decision**: Relative paths (`/api/v1/images/{id}/file`, `/api/v1/images/{id}/thumbnail`) when `S3_PUBLIC_BASE_URL` is not set.
+
+**Rationale**: Relative paths work regardless of the host the app is running on, require no additional configuration, and match how the UI currently constructs these URLs via `getFileUrl(id)` and `getThumbnailUrl(id)`. An absolute fallback would require `API_BASE_URL` to be set in local dev, adding unnecessary setup friction.
+
+**Alternatives considered**: Absolute URL fallback using `API_BASE_URL` — rejected because it adds a mandatory config dependency where none exists today.
+
+---
+
+## Decision 3: Trailing slash normalisation
+
+**Decision**: Strip any trailing slash from `S3_PUBLIC_BASE_URL` with `rstrip('/')` at the point of use in `_image_to_dict`; the plan adds no config validator.
+
+**Rationale**: Prevents double-slash URLs (`https://cdn.example.com//key`) if the operator includes a trailing slash in the configured value. Simple, defensive, zero-cost.
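A few edge cases of this normalisation, sketched under assumptions: `normalise_cdn_base` is a hypothetical helper, not a function in the codebase. It also trims surrounding whitespace, which the commit log mentions as a later fix; the actual fix may be implemented differently.

```python
from typing import Optional


def normalise_cdn_base(raw: Optional[str]) -> Optional[str]:
    """Return the base URL without trailing slashes, or None if nothing usable remains."""
    if not raw:
        return None
    # strip() drops stray whitespace; rstrip("/") removes every trailing slash.
    stripped = raw.strip().rstrip("/")
    return stripped or None
```

Returning `None` for an empty or slash-only value lets callers keep a single truthiness check to choose between the CDN URL and the proxy path.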
+
+---
+
+## Decision 4: Proxy endpoints retained or removed?
+
+**Decision**: Retained, fully functional, unchanged.
+
+**Rationale**: Spec FR-005 explicitly requires them. They serve as the local dev fallback and a safety net if the CDN is temporarily unavailable or misconfigured. Removing them would break local development immediately.
+
+---
+
+## Decision 5: `storage_key` and `thumbnail_key` in API response
+
+**Decision**: Keep both fields in the response alongside the new `file_url` and `thumbnail_url`.
+
+**Rationale**: Removing them is a breaking API change. The UI currently reads `thumbnail_key` to decide whether a thumbnail exists. After this change the UI will use `thumbnail_url` (null when no thumbnail), but the keys remain in the response for backward compatibility with any tooling.
+
+---
+
+## Decision 6: Settings access in `_image_to_dict`
+
+**Decision**: Pre-compute the CDN base URL string once per request at the endpoint level and pass it into `_image_to_dict` as a parameter, rather than calling `get_settings()` inside the helper.
+
+**Rationale**: Keeps `_image_to_dict` a pure function (easier to test), avoids calling `get_settings()` inside a helper that is called in a loop (image list endpoint), and makes the dependency explicit.
--- a/specs/014-r2-cdn-serving/spec.md
+++ b/specs/014-r2-cdn-serving/spec.md
@@ -0,0 +1,93 @@
+# Feature Specification: CDN Image Serving
+
+**Feature Branch**: `014-r2-cdn-serving`
+**Created**: 2026-05-08
+**Status**: Draft
+**Input**: User description: "R2 CDN image serving with local dev fallback to API proxy"
+
+## Overview
+
+Images and thumbnails are currently served by proxying bytes through the API. This feature changes image delivery so that clients receive direct URLs pointing to a CDN edge network, eliminating the API as a middleman for image content. In local development, where no CDN is available, the API proxy endpoints remain as a fallback so the developer experience is unchanged.
+
+---
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Images Load Directly from CDN (Priority: P1)
+
+When a visitor views the image library or opens an image detail page, images and thumbnails are fetched directly from the CDN rather than through the application server. The page loads faster because image bytes no longer pass through the API.
+
+**Why this priority**: Core value of the feature. Reduces API load and improves image load speed for all users.
+
+**Independent Test**: Upload an image, open the library page, and inspect the network requests. Image and thumbnail requests should go directly to the CDN domain, not to `/api/`. The API response for the image list should include direct CDN URLs for each image and thumbnail.
+
+**Acceptance Scenarios**:
+
+1. **Given** a published image, **When** the visitor loads the image library, **Then** each thumbnail `src` URL points to the CDN domain and loads without passing through the API
+2. **Given** a published image, **When** the visitor opens the detail page, **Then** the full image `src` URL points to the CDN domain
+3. **Given** the API returns image metadata, **When** the response is inspected, **Then** it includes `file_url` and `thumbnail_url` fields containing full CDN URLs
+
+---
+
+### User Story 2 - Local Development Works Without CDN (Priority: P2)
+
+In local development, where no CDN is configured, images continue to load via the existing API proxy endpoints. No additional setup is required to run the application locally.
+
+**Why this priority**: Developer experience must not regress. The proxy endpoints must remain functional and be used automatically when no CDN is configured.
+
+**Independent Test**: Run the application locally without setting a public base URL. Upload an image. Verify the library and detail pages load images correctly via the API proxy endpoints, with no errors or broken images.
+
+**Acceptance Scenarios**:
+
+1. **Given** no CDN base URL is configured, **When** the API returns image metadata, **Then** `file_url` and `thumbnail_url` point to the API proxy paths (e.g. `/api/v1/images/{id}/file`)
+2. **Given** no CDN base URL is configured, **When** a visitor views the library, **Then** thumbnails load via the API proxy with no broken images
+3. **Given** a CDN base URL is configured, **When** the application starts, **Then** all image URLs use the CDN domain instead of the proxy paths
+
+---
+
+### Edge Cases
+
+- What happens when the CDN base URL is set but the object does not exist in CDN storage? The browser receives a 404 from the CDN — the API does not re-proxy the content.
+- What happens if an image has no thumbnail (thumbnail generation failed)? The `thumbnail_url` field is null; the UI falls back to the full image URL as it does today.
+- What happens if the CDN base URL has a trailing slash? The system normalises the URL to avoid double slashes in constructed paths.
+
+---
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The API MUST include a `file_url` field in all image metadata responses, containing the full URL from which the image file can be fetched
+- **FR-002**: The API MUST include a `thumbnail_url` field in all image metadata responses when a thumbnail exists, containing the full URL from which the thumbnail can be fetched
+- **FR-003**: When a CDN base URL is configured, `file_url` and `thumbnail_url` MUST point to the CDN domain
+- **FR-004**: When no CDN base URL is configured, `file_url` and `thumbnail_url` MUST point to the existing API proxy endpoints so local development continues to work without additional setup
+- **FR-005**: The existing API proxy endpoints (`/images/{id}/file`, `/images/{id}/thumbnail`) MUST remain functional regardless of whether a CDN base URL is configured
+- **FR-006**: The UI MUST use `file_url` and `thumbnail_url` from the API response to render images, rather than constructing proxy URLs client-side
+- **FR-007**: The CDN base URL MUST be configurable via environment variable; no value is required in local development
+- **FR-008**: A trailing slash in the configured CDN base URL MUST NOT result in double slashes in constructed image URLs
+- **FR-009**: When `thumbnail_url` is null, the UI MUST fall back to `file_url` for thumbnail display rather than rendering a broken image
+
+### Key Entities
+
+- **Image metadata response**: Extended to include `file_url` and `thumbnail_url` fields alongside existing fields (`id`, `filename`, `tags`, `width`, `height`, `mime_type`, etc.)
+
+---
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: In production, zero image or thumbnail requests pass through the API server — all are served directly by the CDN
+- **SC-002**: Local development requires no additional configuration beyond what is already required — `docker compose up` continues to work with images loading correctly
+- **SC-003**: All existing image-related API integration tests continue to pass after the change
+- **SC-004**: Image metadata responses include `file_url` and `thumbnail_url` fields for 100% of images that have been successfully stored
+
+---
+
+## Assumptions
+
+- The CDN storage bucket and public domain are already configured and operational before this feature is deployed — this feature only changes how URLs are constructed and served, not how objects are stored
+- Object keys in CDN storage are identical to those used in the existing storage backend — no key remapping is needed
+- The CDN serves objects publicly without authentication — no signed URL generation is required
+- The existing API proxy endpoints are retained as functional fallbacks; the UI stops calling them in production but they are not removed
+- Local development uses the existing MinIO-backed proxy and does not require a locally running CDN
--- a/specs/014-r2-cdn-serving/tasks.md
+++ b/specs/014-r2-cdn-serving/tasks.md
@@ -0,0 +1,116 @@
+# Tasks: CDN Image Serving
+
+**Input**: Design documents from `specs/014-r2-cdn-serving/`
+**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, contracts/image-response.md ✅, quickstart.md ✅
+
+**Tests**: Unit tests for URL construction logic; integration tests asserting `file_url` and `thumbnail_url` in all image responses. Tests accompany each implementation task per §5.1.
+
+**Organization**: Phase 1 adds the config value (foundational — blocks everything). Phase 2 implements US1 (CDN URL serving in API + UI consumption). Phase 3 verifies US2 (local dev fallback). Polish runs the full suite and manual end-to-end check.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies)
+- **[Story]**: Which user story this task belongs to
+
+---
+
+## Phase 1: Foundational (Config)
+
+**Goal**: Add `s3_public_base_url` to config and `.env.example`. All US1 and US2 tasks depend on this.
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete.
+
+- [X] T001 Add `s3_public_base_url: str | None = None` to the `Settings` class in `api/app/config.py` (after `api_base_url`); add `S3_PUBLIC_BASE_URL=` with comment "# CDN base URL for serving images (e.g. https://cdn.example.com). Leave empty in local dev to use API proxy fallback." to `.env.example` after the `API_BASE_URL` line
+
+**Checkpoint**: Config in place — user story work can begin.
+
+---
+
+## Phase 2: User Story 1 — Images Load Directly from CDN (Priority: P1) 🎯 MVP
+
+**Goal**: API returns `file_url` and `thumbnail_url` in all image responses; UI uses those fields to render images rather than constructing proxy URLs client-side.
+
+**Independent Test**: With `S3_PUBLIC_BASE_URL=https://cdn.reactbin.juggalol.com` set, call `GET /api/v1/images` and confirm each item has `file_url` starting with `https://cdn.reactbin.juggalol.com/` and `thumbnail_url` starting with `https://cdn.reactbin.juggalol.com/` (or null). Open the library page in a browser and confirm image requests go to the CDN domain in the network panel.
+
+- [X] T002 [US1] Write unit tests in `api/tests/unit/test_url_construction.py` covering four cases: (1) CDN base set, image has thumbnail: `file_url` and `thumbnail_url` are CDN URLs; (2) CDN base set, image has no thumbnail: `thumbnail_url` is None; (3) CDN base not set, image has thumbnail: `file_url` is `/api/v1/images/{id}/file` and `thumbnail_url` is `/api/v1/images/{id}/thumbnail`; (4) CDN base not set, no thumbnail: `thumbnail_url` is None. Add a fifth test for trailing-slash normalisation (a CDN base with a trailing slash must not produce a double slash). Import and call `_image_to_dict` directly with a mock `Image` object.
+
+- [X] T003 [US1] Update `_image_to_dict` in `api/app/routers/images.py`: add `cdn_base: str | None = None` keyword parameter; compute `_base = cdn_base.rstrip("/") if cdn_base else None`; set `file_url = f"{_base}/{image.storage_key}" if _base else f"/api/v1/images/{image.id}/file"`; set `thumbnail_url = (f"{_base}/{image.thumbnail_key}" if _base else f"/api/v1/images/{image.id}/thumbnail") if image.thumbnail_key else None`; add `"file_url": file_url` and `"thumbnail_url": thumbnail_url` to the returned dict. Run `make test-unit` and confirm T002 tests pass.
+
+- [X] T004 [US1] Update every `_image_to_dict(...)` call site in `api/app/routers/images.py`: at the top of each endpoint function that calls `_image_to_dict`, add `_cdn_base = get_settings().s3_public_base_url` (the `get_settings` import is already present) and pass `cdn_base=_cdn_base` to every `_image_to_dict` call in that endpoint. Affected endpoints: `upload_image`, `list_images`, `get_image`, `patch_image_tags`. Confirm `get_settings()` is called once per endpoint, not once per image in a loop (for `list_images`, call it before the list comprehension).
+
+- [X] T005 [US1] Update integration tests: in `api/tests/integration/test_upload.py`, add assertions after existing response checks that `"file_url"` is present in the response body and starts with `/api/v1/images/` (since no CDN is configured in test env); add the same assertion for `"thumbnail_url"` in `test_upload_returns_thumbnail_key`; add assertion that `thumbnail_url` is None in the test that expects `thumbnail_key` to be None. Run `make test-integration` and confirm all pass.
+
+- [X] T006 [P] [US1] Update `ui/src/app/services/image.service.ts`: add `file_url: string` and `thumbnail_url: string | null` to the `ImageRecord` interface; remove the `getFileUrl(id: string): string` method; remove the `getThumbnailUrl(id: string): string` method.
+
+- [X] T007 [P] [US1] Update `ui/src/app/library/library.component.ts`: replace `[src]="imageService.getThumbnailUrl(img.id)"` (line 77) with `[src]="img.thumbnail_url ?? img.file_url"` — fall back to `file_url` when thumbnail is absent (FR-009); update `ui/src/app/library/library.component.spec.ts` to add `file_url` and `thumbnail_url` to any mock `ImageRecord` objects and remove any references to `getThumbnailUrl()`.
+
+- [X] T008 [P] [US1] Update `ui/src/app/detail/detail.component.ts`: replace `[src]="imageService.getFileUrl(image.id)"` (line 52) with `[src]="image.file_url"`; update `ui/src/app/detail/detail.component.spec.ts` to add `file_url` and `thumbnail_url` to any mock `ImageRecord` objects and remove any references to `getFileUrl()`.
+
+- [X] T009 [US1] Update `ui/src/app/services/image.service.spec.ts`: add `file_url` and `thumbnail_url` fields to any mock `ImageRecord` objects used in tests; remove any test cases that test `getFileUrl()` or `getThumbnailUrl()` (these methods no longer exist). Run UI tests and confirm they pass.
+
+**Checkpoint**: US1 complete. API returns CDN URLs when configured; UI uses response fields to render images.
+
+---
+
+## Phase 3: User Story 2 — Local Development Works Without CDN (Priority: P2)
+
+**Goal**: Confirm that with no `S3_PUBLIC_BASE_URL` configured, `file_url` and `thumbnail_url` fall back to API proxy paths and images load correctly in local dev.
+
+**Independent Test**: Run `make test-unit && make test-integration` with no `S3_PUBLIC_BASE_URL` set (the default). Confirm all tests pass and that `file_url` values in integration test responses begin with `/api/v1/images/`.
+
+- [X] T010 [US2] Verify US2: run `make test-unit` and confirm the url-construction unit tests for the "no CDN base" case (T002 cases 3 and 4) pass; run `make test-integration` and confirm the updated upload tests (T005) pass — they already assert relative proxy paths since the test environment has no `S3_PUBLIC_BASE_URL`. Confirm `docker compose up` starts cleanly and images load in the browser via the proxy paths with no console errors.
+
+**Checkpoint**: US2 verified. Local development requires no additional configuration.
+
+---
+
+## Phase 4: Polish & Cross-Cutting Concerns
+
+- [X] T011 [P] Run `ruff check api/app/routers/images.py api/app/config.py` and fix any lint issues; run `ruff format --check` and format if needed.
+
+- [X] T012 Run end-to-end verification per `specs/014-r2-cdn-serving/quickstart.md`: in production with `S3_PUBLIC_BASE_URL` set, call `GET /api/v1/images` and confirm `file_url` and `thumbnail_url` begin with `https://cdn.reactbin.juggalol.com/`; open the library page in a browser and confirm image requests in the network panel go to `cdn.reactbin.juggalol.com`, not `/api/`.
+
+---
+
+## Dependencies & Execution Order
+
+- T001 must complete before any other task
+- T002 before T003 (tests before implementation — unit test first)
+- T003 before T004 (update helper before call sites)
+- T004 before T005 (implementation before integration tests)
+- T006, T007, T008 can run in parallel after T001 (different files)
+- T009 after T006 (spec depends on updated interface)
+- T010 after T003–T009 (verification requires full implementation)
+- T011 after T003–T004 (lint the changed files)
+- T012 last (manual end-to-end)
+
+### Execution Order Summary
+
+```
+Step 1: T001                          (foundational: config)
+Step 2: T002                          (US1: unit tests first)
+Step 3: T003                          (US1: implement _image_to_dict)
+Step 4: T004 ∥ T006 ∥ T007 ∥ T008    (US1: call sites + UI in parallel)
+Step 5: T005 ∥ T009                   (US1: integration tests + service spec)
+Step 6: T010                          (US2: verify local dev fallback)
+Step 7: T011                          (polish: lint)
+Step 8: T012                          (polish: manual end-to-end)
+```
+
+---
+
+## Implementation Strategy
+
+### MVP (US1 only — CDN URLs in API + UI)
+
+1. T001 — config
+2. T002–T005 — API implementation and tests
+3. T006–T009 — UI updates
+4. **STOP and VALIDATE**: `make test-unit && make test-integration`, check browser network panel
+
+### Incremental Delivery
+
+1. T001–T005 (API only) → deploy → verify CDN URLs appear in API responses
+2. T006–T009 (UI) → deploy → verify browser fetches images from CDN
+3. T010 (local dev verification) → confirm fallback intact
+4. T011–T012 (polish + end-to-end) → ship
--- a/specs/015-library-pagination/checklists/requirements.md
+++ b/specs/015-library-pagination/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Library Pagination UI
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-05-09
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [X] No implementation details (languages, frameworks, APIs)
+- [X] Focused on user value and business needs
+- [X] Written for non-technical stakeholders
+- [X] All mandatory sections completed
+
+## Requirement Completeness
+
+- [X] No [NEEDS CLARIFICATION] markers remain
+- [X] Requirements are testable and unambiguous
+- [X] Success criteria are measurable
+- [X] Success criteria are technology-agnostic (no implementation details)
+- [X] All acceptance scenarios are defined
+- [X] Edge cases are identified
+- [X] Scope is clearly bounded
+- [X] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [X] All functional requirements have clear acceptance criteria
+- [X] User scenarios cover primary flows
+- [X] Feature meets measurable outcomes defined in Success Criteria
+- [X] No implementation details leak into specification
+
+## Notes
+
+- All items pass. Ready for `/speckit-plan`.
--- a/specs/015-library-pagination/contracts/pagination-query.md
+++ b/specs/015-library-pagination/contracts/pagination-query.md
@@ -0,0 +1,52 @@
+# Contract: Image List Pagination Query
+
+No new API endpoints are introduced. This document records the existing API contract the UI relies on for pagination.
+
+## Endpoint
+
+```
+GET /api/v1/images?limit={limit}&offset={offset}&tags={tags}
+```
+
+## Parameters
+
+| Parameter | Type    | Required | Description                                      |
+|-----------|---------|----------|--------------------------------------------------|
+| `limit`   | integer | No       | Images per page. UI sends `24`. Max is 100.      |
+| `offset`  | integer | No       | Number of images to skip. UI computes `(page-1) * 24`. |
+| `tags`    | string  | No       | Comma-separated tag names for AND-filter.        |
+
+## Response
+
+```json
+{
+  "items": [ /* ImageRecord[] */ ],
+  "total": 143,
+  "limit": 24,
+  "offset": 48
+}
+```
+
+| Field    | Type    | Description                                      |
+|----------|---------|--------------------------------------------------|
+| `total`  | integer | Total images matching the filter (all pages).    |
+| `limit`  | integer | Page size echoed back.                           |
+| `offset` | integer | Offset echoed back.                              |
+| `items`  | array   | Images for this page only.                       |
+
+## UI-Computed Values
+
+```
+totalPages = Math.ceil(total / limit)       // e.g. ceil(143 / 24) = 6
+currentPage = offset / limit + 1            // e.g. 48 / 24 + 1 = 3
+offset = (page - 1) * limit                // e.g. (3 - 1) * 24 = 48
+```
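The arithmetic above can be checked with a short sketch. The UI itself is TypeScript; this Python version only illustrates the formulas. All function names are hypothetical, and `clamp_page` is an assumption based on the commit log's note that out-of-range page params are clamped.

```python
import math

PAGE_SIZE = 24  # the page size the UI sends as `limit`


def total_pages(total: int, limit: int = PAGE_SIZE) -> int:
    """Number of pages needed to show `total` images."""
    return math.ceil(total / limit)


def current_page(offset: int, limit: int = PAGE_SIZE) -> int:
    """1-based page number for a given offset (integer division)."""
    return offset // limit + 1


def page_offset(page: int, limit: int = PAGE_SIZE) -> int:
    """Offset to request for a 1-based page number."""
    return (page - 1) * limit


def clamp_page(page: int, total: int, limit: int = PAGE_SIZE) -> int:
    # Assumption: an empty library still shows page 1.
    last = max(1, total_pages(total, limit))
    return min(max(1, page), last)
```

With the contract's example values, `total_pages(143)` is 6, `current_page(48)` is 3, and `page_offset(3)` is 48.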
+
+## URL State
+
+| Query Param | Source              | Example          |
+|-------------|---------------------|------------------|
+| `page`      | current page number | `?page=3`        |
+| `tags`      | active tag filters  | `?tags=cat,funny` |
+
+Both params coexist: `/?page=3&tags=cat,funny`
Author	SHA1	Message	Date
agatha	c210978261	Chore: Revert initContainer command after successful migration	2026-05-09 20:39:22 -04:00
agatha	a61c67614f	Chore: Bump manifests and add migration init container sequence	2026-05-09 20:26:51 -04:00
agatha	27425889b3	Fix: Include scripts/ in production Docker image Dockerfile.prod explicitly listed copied directories and omitted scripts/, so the migration script was absent from the prod image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 00:18:48 +00:00
agatha	61d923d5be	Feat: Replace UUID image identifiers with 8-character base62 short IDs Short IDs become the canonical identifier in URLs (/i/:short_id), MinIO/R2 storage keys, and all API responses. Hash-based deduplication is preserved. Includes two-phase Alembic migration (003 adds nullable column, 004 enforces NOT NULL) with a backfill script to copy storage objects and populate short_id for existing images. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 00:13:55 +00:00
agatha	87eb2703f5	Chore: Bump manifests for v1.3.1	2026-05-09 18:43:33 -04:00
agatha	bc0f5173c0	Feat: Substring tag search — match anywhere in tag name Changes prefix-only LIKE to case-insensitive ILIKE with leading wildcard so queries like "at" now match "cat", "scatter", etc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 22:42:23 +00:00
agatha	309cfce71c	Chore: Bump manifests for v1.3.0 release	2026-05-09 18:34:26 -04:00
agatha	b094389131	Fix: Await second microtask tick in copyUrl reject test The .catch() handler on a rejected promise resolves on the second microtask tick, not the first — one extra await Promise.resolve() is needed before the assertion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 22:31:58 +00:00
agatha	7d49c12ce2	Feat: Add Copy URL button and reusable toast notification system Detail page now has a "Copy URL" button that copies the image's direct file URL to the clipboard. A toast service (BehaviorSubject-backed, auto-dismissing after 3s) confirms success or failure. ToastComponent is registered at the app root and available to all future features. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 22:21:48 +00:00
agatha	443887ea93	Chore: Bump manifests for v1.2.1	2026-05-09 17:31:28 -04:00
agatha	e4bfe13072	Feat: Add gradient fade on truncated tag rows Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 21:30:18 +00:00
agatha	0a76bb03b5	Fix: Prevent partial second tag row on image cards Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 21:27:39 +00:00
agatha	8cbf1e527a	Fix: React to external URL changes and cap tag-row height in library Clicking the Reactbin home link (or any navigation to / that removes ?page=) now resets the displayed page by subscribing to queryParamMap for post-init URL changes. Cards with many tags no longer push the pagination bar down since the tag row is clamped to one line. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 21:24:44 +00:00
agatha	a280d8c761	Chore: Bump manifests for v1.2.0 release	2026-05-09 17:10:03 -04:00
agatha	781be909bc	Feat: Replace Load More with Previous/Next pagination in library Page size changes from 50 to 24. Library now shows discrete page navigation with a "Page N of M" indicator, total image count, and URL state (?page=N) so pages are bookmarkable and the browser Back button works. Tag filter resets to page 1. Out-of-range page params are clamped silently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 21:08:42 +00:00
agatha	e5e1acb533	Chore: Bump manifests after adding previews	2026-05-09 16:18:50 -04:00
agatha	c9bfdaf241	Feat: Add Open Graph and Twitter Card meta tags Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 20:17:35 +00:00
agatha	75a1449354	Chore: Bump manifests for v1.1.1 release	2026-05-09 13:55:44 -04:00
agatha	68881b30f1	Ops: Add script to test lockout with spoofed X-Forwarded-For headers	2026-05-09 13:54:49 -04:00
agatha	9021f4816a	Fix: Prefer X-Real-IP over XFF[0] in get_client_ip to close spoof bypass XFF[0] is attacker-controllable; a crafted X-Forwarded-For header could attribute login failures to a victim IP, triggering their lockout while the attacker accumulates none. ingress-nginx sets X-Real-IP via its realip module using an authoritative CIDR allowlist and overwrites any client-supplied value, making it spoof-resistant. Fallback to XFF[0] is retained for defence in depth but now emits a warning if reached. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:52:05 +00:00
agatha	35d21dafa4	Fix: Strip whitespace from S3_PUBLIC_BASE_URL before building CDN URLs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 00:35:22 +00:00
agatha	34d8c3848b	Ops: Bump manifests for v1.1.0 release	2026-05-08 20:25:32 -04:00
agatha	aaacfae653	Feat: Serve images directly from Cloudflare R2 CDN API responses now include file_url and thumbnail_url fields. When S3_PUBLIC_BASE_URL is configured, these point to the CDN domain; when unset, they fall back to the existing API proxy paths so local dev requires no additional setup. UI updated to use response URL fields directly instead of constructing proxy URLs client-side. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 00:17:22 +00:00
agatha	728efeaa48	Ops: Bump manifests for v1.0.1	2026-05-08 14:49:40 -04:00
agatha	c858e47daa	Feat: Add favicon and web manifest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 18:43:45 +00:00
agatha	9db20fdf90	Fix: Raise nginx ingress body size limit to 52m for image uploads Default client_max_body_size of 1MB was rejecting uploads larger than 1MB with a 413 before the request reached the API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 17:34:08 +00:00
agatha	9b66fe1918	Docs: Update constitution to v1.4.0 Aligns principles with actual project state: soften TDD wording to allow tests alongside implementation, replace CI gate with concrete local test suite gate, add production infrastructure to tech stack (k3s, nginx, Vault + VSO), and document plaintext password storage as a known gap that must be resolved before further auth work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 16:01:48 +00:00
agatha	e9a2e9f014	Docs: Update example image for README.md	2026-05-08 11:54:36 -04:00
agatha	7b3d4a9257	Docs: Add comprehensive README with local dev and production deployment guide Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 15:51:32 +00:00
agatha	7c57629941	Fix: Add correct annotation to ingress	2026-05-07 18:36:24 -04:00
agatha	4fe8b19d19	Fix: Adjust Minio security context	2026-05-07 18:29:36 -04:00
agatha	e34c9f7b7f	Chore: Set image pull policy	2026-05-07 18:21:43 -04:00
agatha	551ddbec3b	Ops: Adjust deployment manifests for environment	2026-05-07 17:49:48 -04:00
agatha	666c32cd69	Ops: Point manifests at Juggalol container registry	2026-05-07 17:38:28 -04:00
agatha	bf27c97deb	Feat: Add Kubernetes manifests for k3s production deployment Adds complete k8s/ manifest tree: Namespace, VaultAuth + VaultStaticSecret CRDs (VSO secret sync from Vault KV v2), API and UI Deployments and Services, nginx Ingress with cert-manager TLS, MinIO StatefulSet with PVC and init Job, and Alembic init container on the API Deployment for automatic schema migrations. Includes .yamllint.yml config and validate-k8s Makefile target. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 21:19:09 +00:00
agatha	ce279e6121	Chore: Update speckit context to feature 012 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 20:43:03 +00:00
agatha	b14508e4cf	Chore: Rebuild api-test image before running integration tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 20:42:16 +00:00
agatha	602648ef56	Feat: Gate API docs endpoints behind API_DOCS_ENABLED env var. When API_DOCS_ENABLED=false, FastAPI registers no routes for /docs, /redoc, or /openapi.json, so all three return 404. Default is true for backwards compatibility. Invalid values fall back to true (FR-007). Fix: Remove tests/ and alembic/ from api/.dockerignore so the test Dockerfile (which uses COPY . .) includes the test suite; Dockerfile.prod is unaffected as it only copies app/ explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 20:40:48 +00:00
agatha	1b3468b72d	Feat: Add production-grade multi-stage container image for UI. Two-stage build (node:22-slim builder + nginxinc/nginx-unprivileged:alpine runtime) with SPA fallback routing, long-lived cache headers for fingerprinted assets, non-root user (UID 101), and no Node.js toolchain in the runtime image (82 MB vs 329 MB+ single-stage). Verified by ui/tests/build/verify_production_image.sh covering build, health, SPA routing, non-root, stdout logging, cache-control headers, SIGTERM exit 0, Node.js absent, secret-free layers, and dep-layer cache hit. All 102 integration tests still pass; shellcheck clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 20:18:55 +00:00
agatha	12176471e1	Feat: Add production-grade multi-stage container image for API. Two-stage build (uv builder + python:3.12-slim runtime) with non-root user (UID 1001), no dev deps, layer-cache-optimised dep install, and graceful SIGTERM shutdown. Verified by api/tests/build/verify_production_image.sh covering build, health endpoint, non-root, stdout logging, secret-free layers, missing-env-var exit, and dep-layer cache hit. All 102 integration tests still pass; shellcheck clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 19:59:29 +00:00
agatha	7a835d3172	Feat: Rate-limit login endpoint to block brute-force attacks. After LOGIN_MAX_FAILURES consecutive failed attempts from the same source IP within LOGIN_WINDOW_SECONDS, POST /api/v1/auth/token returns HTTP 429 with a Retry-After header for LOGIN_COOLDOWN_SECONDS. A successful login resets the counter. Trusted upstream proxy IPs/CIDRs can be declared via LOGIN_TRUSTED_PROXY_IPS so X-Forwarded-For is honoured correctly behind nginx ingress or similar reverse proxies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 21:01:37 +00:00
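
The login rate limit described in commit 7a835d3172 could be sketched roughly as below. This is an illustrative in-memory version, not the repository's actual code: the class name `LoginRateLimiter` and its methods are hypothetical, the defaults mirror the values in `.env.example`, and per-IP state is assumed to live in a single process.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class LoginRateLimiter:
    """Hypothetical sketch of per-IP login throttling (names are assumptions)."""

    max_failures: int = 5        # mirrors LOGIN_MAX_FAILURES
    window_seconds: int = 300    # mirrors LOGIN_WINDOW_SECONDS
    cooldown_seconds: int = 900  # mirrors LOGIN_COOLDOWN_SECONDS
    _failures: dict = field(default_factory=dict)       # ip -> failure timestamps
    _blocked_until: dict = field(default_factory=dict)  # ip -> time the block ends

    def retry_after(self, ip, now=None):
        """Seconds to send in Retry-After with a 429, or 0 if the IP may try."""
        now = time.monotonic() if now is None else now
        until = self._blocked_until.get(ip)
        if until is None:
            return 0
        if now >= until:
            # Cooldown has elapsed, so forget the block.
            del self._blocked_until[ip]
            return 0
        return math.ceil(until - now)

    def record_failure(self, ip, now=None):
        """Count a failed login and start a cooldown once the limit is hit."""
        now = time.monotonic() if now is None else now
        # Keep only failures that are still inside the counting window.
        recent = [t for t in self._failures.get(ip, []) if now - t < self.window_seconds]
        recent.append(now)
        if len(recent) >= self.max_failures:
            self._blocked_until[ip] = now + self.cooldown_seconds
            self._failures.pop(ip, None)
        else:
            self._failures[ip] = recent

    def record_success(self, ip):
        """A successful login resets the counter for that IP."""
        self._failures.pop(ip, None)
        self._blocked_until.pop(ip, None)
```

A handler would call `retry_after(ip)` first and answer 429 with that value when it is non-zero, then call `record_failure` or `record_success` depending on the credential check. Resolving `ip` from X-Forwarded-For for proxies listed in LOGIN_TRUSTED_PROXY_IPS is left out of this sketch.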