Adds complete k8s/ manifest tree: Namespace, VaultAuth + VaultStaticSecret CRDs (VSO secret sync from Vault KV v2), API and UI Deployments and Services, nginx Ingress with cert-manager TLS, MinIO StatefulSet with PVC and init Job, and Alembic init container on the API Deployment for automatic schema migrations. Includes .yamllint.yml config and validate-k8s Makefile target. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Feature Specification: Kubernetes Production Manifests
Feature Branch: 013-k8s-manifests
Created: 2026-05-07
Status: Draft
Input: User description: "Kubernetes manifests for production deployment to k3s: Deployment, Service, and Ingress for the API and UI; VaultStaticSecret CRDs to sync secrets from HashiCorp Vault; Alembic init container on the API Deployment for schema migrations. The cluster uses an nginx ingress controller with Let's Encrypt TLS, a shared external Postgres instance, MinIO running in-cluster, and VSO (Vault Secrets Operator) for secret management."
User Scenarios & Testing (mandatory)
User Story 1 — Application Reachable in Production (Priority: P1)
As an operator, I can apply the manifests to my k3s cluster and have both the API and UI reachable at the production domain over HTTPS, with all health checks passing.
Why this priority: This is the core deployment goal. Nothing else matters if the application is not reachable.
Independent Test: Apply the API and UI manifests with a manually created K8s Secret (bypassing Vault). Confirm the UI loads at the domain root and the API health endpoint returns 200 at /api/v1/health. Confirm HTTPS is enforced and HTTP redirects to HTTPS.
Acceptance Scenarios:
- Given the manifests are applied to the cluster, When a browser navigates to https://<domain>/, Then the UI loads successfully with a valid TLS certificate.
- Given the manifests are applied, When a request is made to https://<domain>/api/v1/health, Then a 200 response is returned.
- Given the API docs flag is disabled, When a request is made to https://<domain>/docs, Then a 404 is returned.
- Given the API pod is restarted, When it comes back up, Then it passes readiness checks before receiving traffic.
- Given a request for an unknown path, When it is made to the UI, Then the SPA serves the index page (client-side routing is preserved).
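The routing and TLS behaviour in these scenarios could be expressed with a single Ingress along the following lines. This is a minimal sketch, not the committed manifest: the host example.com, the Service names reactbin-api and reactbin-ui, the ports, the TLS Secret name, and the IngressClass name nginx are all assumptions made for illustration. Only the letsencrypt-prod issuer name and the reactbin namespace come from this spec.

```yaml
# Sketch only: host, Service names, ports, and Secret name are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reactbin
  namespace: reactbin
  annotations:
    # Ask cert-manager to issue the certificate from the existing ClusterIssuer.
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # Redirect plain HTTP to HTTPS at the nginx ingress controller.
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx        # assumed IngressClass name
  tls:
    - hosts:
        - example.com            # placeholder for <domain>
      secretName: reactbin-tls   # cert-manager stores the issued certificate here
  rules:
    - host: example.com
      http:
        paths:
          - path: /api/
            pathType: Prefix
            backend:
              service:
                name: reactbin-api   # hypothetical Service name
                port:
                  number: 8000       # assumed API port
          - path: /
            pathType: Prefix
            backend:
              service:
                name: reactbin-ui    # hypothetical Service name
                port:
                  number: 80         # assumed UI Service port
```

With this shape, only /api/ traffic reaches the API Service; every other path, including unknown ones, lands on the UI Service, where the SPA fallback applies.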
User Story 2 — Secrets Sourced from Vault (Priority: P2)
As an operator, no secrets are stored in version-controlled manifest files. All sensitive values are declared in Vault and synced automatically into the cluster as Kubernetes Secrets by the Vault Secrets Operator.
Why this priority: Security prerequisite for production. Hardcoded secrets in manifests are a material risk.
Independent Test: Run git grep for known secret patterns across k8s/ and confirm zero matches. Confirm VaultStaticSecret CRDs reference a Vault path and that the synced K8s Secret is created and the API pod's environment is populated from it.
Acceptance Scenarios:
- Given Vault contains the required secret values at the declared path, When VSO is running, Then a K8s Secret is created in the cluster namespace with the declared keys.
- Given the K8s Secret exists, When the API pod starts, Then its environment variables are populated from that secret.
- Given a git grep for plaintext credentials across k8s/, When run against the committed manifests, Then no plaintext secrets are found.
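A hedged sketch of the two VSO resources that would produce this behaviour is shown below. The resource names, the Vault role, the KV mount secret, the path reactbin/api, the refresh interval, and the destination Secret name are hypothetical; the sketch also assumes Vault's Kubernetes auth method is mounted at kubernetes and that a default VaultConnection is already configured for the operator.

```yaml
# Sketch only: names, role, mount, and path are illustrative assumptions.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: reactbin-auth
  namespace: reactbin
spec:
  method: kubernetes
  mount: kubernetes            # assumed auth mount in Vault
  kubernetes:
    role: reactbin             # hypothetical Vault role
    serviceAccount: default
    audiences:
      - vault
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: reactbin-api
  namespace: reactbin
spec:
  vaultAuthRef: reactbin-auth
  type: kv-v2
  mount: secret                # assumed KV v2 mount
  path: reactbin/api           # hypothetical Vault path
  refreshAfter: 60s            # how often VSO re-reads the secret
  destination:
    name: reactbin-api-secrets # K8s Secret that VSO creates and keeps in sync
    create: true
```

Neither resource contains a secret value; the manifests carry only the Vault location and the name of the Secret to create, which is what keeps the git grep check clean.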
User Story 3 — Schema Migrations Run Before API Starts (Priority: P3)
As an operator, every time the API is deployed, database migrations run automatically in an init container before the main application container starts. A failed migration prevents the pod from starting, protecting against schema drift.
Why this priority: Prevents the API from serving requests against a stale or incompatible schema. Safe deployment ordering is essential for production.
Independent Test: Deploy with the init container pointing at a valid database. Confirm migrations run and the API starts. Simulate a failing migration by pointing the init container at an unreachable database and confirm the pod stays in init state and does not serve traffic.
Acceptance Scenarios:
- Given the API Deployment is applied, When the pod starts, Then the init container completes alembic upgrade head before the main container starts.
- Given the schema is already current, When the pod starts, Then the migration init container exits successfully with no changes applied.
- Given the migration fails, When the pod starts, Then the init container exits non-zero, the main container does not start, and the pod enters a visible error state.
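The ordering described in these scenarios follows from how Kubernetes runs init containers: each must exit zero before any main container starts. A minimal sketch of an API Deployment with that shape follows. The image reference, the IMAGE_TAG placeholder, the Secret name, the container port, and the user ID are assumptions for illustration, and the sketch assumes alembic is on the image's PATH with its configuration in the working directory.

```yaml
# Sketch only: image, Secret name, port, and UID are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reactbin-api
  namespace: reactbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reactbin-api
  template:
    metadata:
      labels:
        app: reactbin-api
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000          # assumed non-root UID present in the image
      initContainers:
        - name: migrate
          # Same image and tag as the main container, so code and migrations match.
          image: registry.example.com/reactbin-api:IMAGE_TAG
          command: ["alembic", "upgrade", "head"]
          envFrom:
            - secretRef:
                name: reactbin-api-secrets
      containers:
        - name: api
          image: registry.example.com/reactbin-api:IMAGE_TAG
          envFrom:
            - secretRef:
                name: reactbin-api-secrets
          env:
            # An explicit env entry overrides an envFrom key of the same name.
            - name: API_DOCS_ENABLED
              value: "false"
          ports:
            - containerPort: 8000
          readinessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
```

If the migrate container exits non-zero, the pod stays in an init error state and the api container is never started, so the readiness probe never passes and no traffic is routed to it.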
User Story 4 — MinIO Runs In-Cluster with Persistent Storage (Priority: P4)
As an operator, MinIO runs inside the cluster with a PersistentVolumeClaim for durable storage, is not externally reachable, and has the required bucket initialised on first deployment.
Why this priority: Required for image storage, but decoupled from the other manifests — the S3 endpoint is just a config value the API reads.
Independent Test: Confirm the MinIO pod is running and has no external Ingress. Confirm the required bucket exists. Restart the MinIO pod and confirm previously stored objects are still accessible.
Acceptance Scenarios:
- Given the MinIO manifests are applied, When the MinIO pod starts, Then the required bucket is created and the API can store and retrieve images.
- Given the MinIO pod is restarted, When it comes back up, Then all previously stored objects remain accessible (PVC-backed storage persists).
- Given no Ingress is defined for MinIO, When a connection is attempted from outside the cluster, Then MinIO is not reachable.
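One way to satisfy these scenarios is a single-replica StatefulSet with a volume claim template, fronted by a ClusterIP Service and no Ingress. The sketch below is an assumption-laden illustration: the Secret name minio-credentials (expected to carry MINIO_ROOT_USER and MINIO_ROOT_PASSWORD), the 10Gi size, the user ID, and the IMAGE_TAG placeholder are not taken from this spec.

```yaml
# Sketch only: Secret name, storage size, UID, and image tag are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: reactbin
spec:
  type: ClusterIP              # in-cluster only; no Ingress is defined for MinIO
  selector:
    app: minio
  ports:
    - name: s3
      port: 9000
      targetPort: 9000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
  namespace: reactbin
spec:
  serviceName: minio
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000          # lets the non-root user write to the volume
      containers:
        - name: minio
          image: minio/minio:IMAGE_TAG
          args: ["server", "/data"]
          envFrom:
            - secretRef:
                name: minio-credentials   # hypothetical Secret with root credentials
          ports:
            - containerPort: 9000
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        # storageClassName is omitted so the cluster's default class is used.
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Because the claim is created from the template and bound to the pod's stable identity, a restarted minio-0 pod reattaches to the same volume and previously stored objects remain available.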
Edge Cases
- What if Vault is unavailable when VSO tries to sync? VSO retries on a configurable interval; the pod will not start until the K8s Secret exists.
- What if the database is unreachable during migration? The init container exits non-zero; the pod does not start and Kubernetes retries with backoff.
- What if the MinIO PVC runs out of space? MinIO will fail writes; the API will return upload errors. Capacity monitoring is out of scope for this feature.
- What if migrations and the main container use different image tags? They use the same tag in the same Deployment spec, so they are always in sync.
Requirements (mandatory)
Functional Requirements
- FR-001: All manifests MUST target a single configurable namespace (default: reactbin).
- FR-002: The API MUST be deployed as a Deployment with liveness and readiness probes on /api/v1/health.
- FR-003: The API Deployment MUST include an init container using the same image that runs database schema migrations before the main container starts.
- FR-004: The API Deployment MUST set API_DOCS_ENABLED=false.
- FR-005: The UI MUST be deployed as a Deployment with a liveness probe confirming the nginx process is serving.
- FR-006: A single Ingress MUST route https://<domain>/api/ to the API Service and all other paths to the UI Service, with TLS termination via a cert-manager Let's Encrypt certificate.
- FR-007: HTTP requests MUST be redirected to HTTPS via the Ingress.
- FR-008: All API secrets MUST be declared in a VaultStaticSecret CRD and synced into a K8s Secret; secret values MUST NOT appear as plaintext in any manifest file.
- FR-009: The API Deployment MUST source all environment variables from the synced K8s Secret via envFrom.
- FR-010: MinIO MUST be deployed as a StatefulSet with a PersistentVolumeClaim using the cluster's default storage class.
- FR-011: A Kubernetes Job MUST create the required S3 bucket in MinIO on first deployment and MUST be idempotent on re-apply.
- FR-012: MinIO MUST have no Ingress; it MUST only be accessible within the cluster via ClusterIP.
- FR-013: All containers MUST run as non-root users.
- FR-014: The API production image MUST include migration files so the init container can run migrations without a separate image.
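As an illustration of FR-011, a bucket-creation Job could look roughly like the sketch below. The bucket name reactbin-images, the Service address http://minio:9000, the Secret name minio-credentials, the user ID, and the IMAGE_TAG placeholder are hypothetical. The sketch also assumes the minio/mc image provides /bin/sh, and it points the client's config directory at /tmp so that a non-root user can write to it.

```yaml
# Sketch only: bucket name, endpoint, Secret name, and UID are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: minio-create-bucket
  namespace: reactbin
spec:
  backoffLimit: 5                # retry while MinIO is still starting
  template:
    spec:
      restartPolicy: OnFailure
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: mc
          image: minio/mc:IMAGE_TAG
          envFrom:
            - secretRef:
                name: minio-credentials   # hypothetical Secret with root credentials
          command:
            - /bin/sh
            - -c
            - |
              set -e
              # Register the in-cluster MinIO endpoint under the alias "local".
              mc --config-dir /tmp/.mc alias set local http://minio:9000 \
                "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
              # --ignore-existing makes the command succeed if the bucket already exists.
              mc --config-dir /tmp/.mc mb --ignore-existing local/reactbin-images
```

Idempotency here rests on two things: re-applying an unchanged Job manifest is a no-op, and if the Job is deleted and re-created, the --ignore-existing flag makes the bucket creation succeed whether or not the bucket is already present.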
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: The application is accessible at the production domain within 120 seconds of kubectl apply.
- SC-002: Schema migrations complete and the API begins serving traffic without manual operator intervention on every deployment.
- SC-003: A git grep across k8s/ finds zero plaintext secret values in committed files.
- SC-004: A simulated migration failure holds the pod in init state and the application never serves traffic.
- SC-005: Restarting the MinIO pod does not result in data loss — previously uploaded images remain accessible.
Assumptions
- The k3s cluster is running with the nginx ingress controller installed.
- cert-manager is installed and a ClusterIssuer named letsencrypt-prod is already configured.
- The Vault Secrets Operator is installed in the cluster.
- A HashiCorp Vault instance is accessible from the cluster and the required secret values are stored at the declared Vault path before deployment.
- A shared external PostgreSQL instance is available; the operator creates a dedicated database and user before deploying.
- DNS for the production domain is already pointing at the cluster ingress IP.
- Manifests are stored in a k8s/ directory at the repository root.
- The cluster's default storage class supports ReadWriteOnce (sufficient for single-replica MinIO).
- All Deployments run a single replica (personal tool, no HA requirement).
- Image tags are managed externally; manifests use a placeholder tag that the operator substitutes at deploy time.
- The API_DOCS_ENABLED flag exists on the API (implemented in feature 012).