# Feature Specification: Kubernetes Production Manifests

**Feature Branch**: `013-k8s-manifests`

**Created**: 2026-05-07

**Status**: Draft

**Input**: User description: "Kubernetes manifests for production deployment to k3s: Deployment, Service, and Ingress for the API and UI; VaultStaticSecret CRDs to sync secrets from HashiCorp Vault; Alembic init container on the API Deployment for schema migrations. The cluster uses an nginx ingress controller with Let's Encrypt TLS, a shared external Postgres instance, MinIO running in-cluster, and VSO (Vault Secrets Operator) for secret management."

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Application Reachable in Production (Priority: P1)

As an operator, I can apply the manifests to my k3s cluster and have both the API and UI reachable at the production domain over HTTPS, with all health checks passing.

**Why this priority**: This is the core deployment goal. Nothing else matters if the application is not reachable.

**Independent Test**: Apply the API and UI manifests with a manually-created K8s Secret (bypassing Vault). Confirm the UI loads at the domain root and the API health endpoint returns 200 at `/api/v1/health`. Confirm HTTPS is enforced and HTTP redirects to HTTPS.

**Acceptance Scenarios**:

1. **Given** the manifests are applied to the cluster, **When** a browser navigates to `https:///`, **Then** the UI loads successfully with a valid TLS certificate.
2. **Given** the manifests are applied, **When** a request is made to `https:///api/v1/health`, **Then** a 200 response is returned.
3. **Given** the API docs flag is disabled, **When** a request is made to `https:///docs`, **Then** a 404 is returned.
4. **Given** the API pod is restarted, **When** it comes back up, **Then** it passes readiness checks before receiving traffic.
5. **Given** a request for an unknown path, **When** it is made to the UI, **Then** the SPA serves the index page (client-side routing is preserved).

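
The routing and HTTPS behaviour in the scenarios above could be expressed in a single Ingress resource. The following is a minimal sketch, not the final manifest. It assumes the community ingress-nginx controller with an ingress class named `nginx`; the host `reactbin.example.com`, the Service names `reactbin-api` and `reactbin-ui`, the TLS Secret name `reactbin-tls`, and both port numbers are hypothetical placeholders. Only the `reactbin` namespace and the `letsencrypt-prod` ClusterIssuer come from this specification.

```yaml
# Sketch only: host, Service names, Secret name, and ports are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reactbin
  namespace: reactbin
  annotations:
    # Ask cert-manager to issue the certificate from the existing ClusterIssuer.
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx annotation: redirect plain HTTP requests to HTTPS.
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - reactbin.example.com
      # cert-manager stores the issued certificate in this Secret.
      secretName: reactbin-tls
  rules:
    - host: reactbin.example.com
      http:
        paths:
          # API traffic goes to the API Service.
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: reactbin-api
                port:
                  number: 8000
          # Every other path goes to the UI Service, which serves the SPA.
          - path: /
            pathType: Prefix
            backend:
              service:
                name: reactbin-ui
                port:
                  number: 80
```

With `Prefix` path types the longer `/api` match takes precedence over `/`, so the order of the two entries does not affect routing.
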

---

### User Story 2 - Secrets Sourced from Vault (Priority: P2)

As an operator, I store no secrets in version-controlled manifest files. All sensitive values are declared in Vault and synced automatically into the cluster as Kubernetes Secrets by the Vault Secrets Operator.

**Why this priority**: Security prerequisite for production. Hardcoded secrets in manifests are a material risk.

**Independent Test**: Run `git grep` for known secret patterns across `k8s/` and confirm zero matches. Confirm that the VaultStaticSecret CRDs reference a Vault path, that the synced K8s Secret is created, and that the API pod's environment is populated from it.

**Acceptance Scenarios**:

1. **Given** Vault contains the required secret values at the declared path, **When** VSO is running, **Then** a K8s Secret is created in the cluster namespace with the declared keys.
2. **Given** the K8s Secret exists, **When** the API pod starts, **Then** its environment variables are populated from that secret.
3. **Given** a `git grep` for plaintext credentials across `k8s/`, **When** run against the committed manifests, **Then** no plaintext secrets are found.

---

### User Story 3 - Schema Migrations Run Before API Starts (Priority: P3)

As an operator, I can rely on database migrations running automatically in an init container before the main application container starts, every time the API is deployed. A failed migration prevents the pod from starting, protecting against schema drift.

**Why this priority**: Prevents the API from serving requests against a stale or incompatible schema. Safe deployment ordering is essential for production.

**Independent Test**: Deploy with the init container pointing at a valid database. Confirm migrations run and the API starts. Simulate a failing migration by pointing the init container at an unreachable database and confirm the pod stays in init state and does not serve traffic.

**Acceptance Scenarios**:

1. **Given** the API Deployment is applied, **When** the pod starts, **Then** the init container completes `alembic upgrade head` before the main container starts.
2. **Given** the schema is already current, **When** the pod starts, **Then** the migration init container exits successfully with no changes applied.
3. **Given** the migration fails, **When** the pod starts, **Then** the init container exits non-zero, the main container does not start, and the pod enters a visible error state.

---

### User Story 4 - MinIO Runs In-Cluster with Persistent Storage (Priority: P4)

As an operator, I run MinIO inside the cluster with a PersistentVolumeClaim for durable storage. It is not externally reachable and has the required bucket initialised on first deployment.

**Why this priority**: Required for image storage, but decoupled from the other manifests: the S3 endpoint is just a config value the API reads.

**Independent Test**: Confirm the MinIO pod is running and has no external Ingress. Confirm the required bucket exists. Restart the MinIO pod and confirm previously stored objects are still accessible.

**Acceptance Scenarios**:

1. **Given** the MinIO manifests are applied, **When** the MinIO pod starts, **Then** the required bucket is created and the API can store and retrieve images.
2. **Given** the MinIO pod is restarted, **When** it comes back up, **Then** all previously stored objects remain accessible (PVC-backed storage persists).
3. **Given** no Ingress is defined for MinIO, **When** a connection is attempted from outside the cluster, **Then** MinIO is not reachable.

---

### Edge Cases

- What if Vault is unavailable when VSO tries to sync? VSO retries on a configurable interval; the pod will not start until the K8s Secret exists.
- What if the database is unreachable during migration? The init container exits non-zero; the pod does not start and Kubernetes retries with backoff.
- What if the MinIO PVC runs out of space? MinIO will fail writes; the API will return upload errors. Capacity monitoring is out of scope for this feature.
- What if migrations and the main container use different image tags? They use the same tag in the same Deployment spec, so they are always in sync.

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: All manifests MUST target a single configurable namespace (default: `reactbin`).
- **FR-002**: The API MUST be deployed as a Deployment with liveness and readiness probes on `/api/v1/health`.
- **FR-003**: The API Deployment MUST include an init container, using the same image as the main container, that runs database schema migrations before the main container starts.
- **FR-004**: The API Deployment MUST set `API_DOCS_ENABLED=false`.
- **FR-005**: The UI MUST be deployed as a Deployment with a liveness probe confirming the nginx process is serving.
- **FR-006**: A single Ingress MUST route `https:///api/` to the API Service and all other paths to the UI Service, with TLS termination via a cert-manager Let's Encrypt certificate.
- **FR-007**: HTTP requests MUST be redirected to HTTPS via the Ingress.
- **FR-008**: All API secrets MUST be declared in a VaultStaticSecret CRD and synced into a K8s Secret; secret values MUST NOT appear as plaintext in any manifest file.
- **FR-009**: The API Deployment MUST source all environment variables from the synced K8s Secret via `envFrom`.
- **FR-010**: MinIO MUST be deployed as a StatefulSet with a PersistentVolumeClaim using the cluster's default storage class.
- **FR-011**: A Kubernetes Job MUST create the required S3 bucket in MinIO on first deployment and MUST be idempotent on re-apply.
- **FR-012**: MinIO MUST have no Ingress; it MUST only be accessible within the cluster via ClusterIP.
- **FR-013**: All containers MUST run as non-root users.
- **FR-014**: The API production image MUST include migration files so the init container can run migrations without a separate image.

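
To make FR-002, FR-003, FR-004, FR-009, and FR-013 concrete, here is a minimal sketch of how the API Deployment might be shaped. It is an illustration under stated assumptions, not the final manifest: the image reference `registry.example.com/reactbin-api:IMAGE_TAG`, the Secret name `reactbin-api-secrets`, the container port `8000`, the UID `1000`, and the probe timings are all hypothetical. The namespace, the health path, the `alembic upgrade head` command, and the `API_DOCS_ENABLED` flag come from this specification.

```yaml
# Sketch only: image, Secret name, port, UID, and probe timings are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reactbin-api
  namespace: reactbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reactbin-api
  template:
    metadata:
      labels:
        app: reactbin-api
    spec:
      # FR-013: refuse to start any container in this pod as root.
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      initContainers:
        # FR-003: same image and tag as the main container, so the
        # migration files always match the application code.
        - name: migrate
          image: registry.example.com/reactbin-api:IMAGE_TAG
          command: ["alembic", "upgrade", "head"]
          envFrom:
            - secretRef:
                name: reactbin-api-secrets
      containers:
        - name: api
          image: registry.example.com/reactbin-api:IMAGE_TAG
          ports:
            - containerPort: 8000
          # FR-009: environment populated from the synced Secret.
          envFrom:
            - secretRef:
                name: reactbin-api-secrets
          # FR-004: shown here as an explicit env entry for visibility.
          env:
            - name: API_DOCS_ENABLED
              value: "false"
          # FR-002: the pod receives traffic only once this succeeds.
          readinessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
            periodSeconds: 10
          # FR-002: the container is restarted if this keeps failing.
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 20
```

Because the main container starts only after every init container exits with status zero, a failed migration keeps the pod in its init phase. Setting `API_DOCS_ENABLED` as an explicit `env` entry is one possible reading of FR-004; storing the flag in the Vault-backed Secret instead would keep FR-009 strictly satisfied.
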

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: The application is accessible at the production domain within 120 seconds of `kubectl apply`.
- **SC-002**: Schema migrations complete and the API begins serving traffic without manual operator intervention on every deployment.
- **SC-003**: A `git grep` across `k8s/` finds zero plaintext secret values in committed files.
- **SC-004**: A simulated migration failure holds the pod in init state and the application never serves traffic.
- **SC-005**: Restarting the MinIO pod does not result in data loss: previously uploaded images remain accessible.

## Assumptions

- The k3s cluster is running with the nginx ingress controller installed.
- cert-manager is installed and a `ClusterIssuer` named `letsencrypt-prod` is already configured.
- The Vault Secrets Operator is installed in the cluster.
- A HashiCorp Vault instance is accessible from the cluster and the required secret values are stored at the declared Vault path before deployment.
- A shared external PostgreSQL instance is available; the operator creates a dedicated database and user before deploying.
- DNS for the production domain is already pointing at the cluster ingress IP.
- Manifests are stored in a `k8s/` directory at the repository root.
- The cluster's default storage class supports ReadWriteOnce (sufficient for single-replica MinIO).
- All Deployments run a single replica (personal tool, no HA requirement).
- Image tags are managed externally; manifests use a placeholder tag that the operator substitutes at deploy time.
- The `API_DOCS_ENABLED` flag exists on the API (implemented in feature 012).
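
Given the Vault and VSO assumptions above, the secret sync required by FR-008 might be declared roughly as follows. This is a sketch under assumptions rather than the final manifest: the `VaultAuth` resource name `reactbin-vault-auth`, the KV v2 mount `secret`, the path `reactbin/api`, the refresh interval, and the destination Secret name `reactbin-api-secrets` are hypothetical and would need to match the real Vault layout.

```yaml
# Sketch only: auth reference, mount, path, interval, and Secret name are hypothetical.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: reactbin-api
  namespace: reactbin
spec:
  # Reference to a VaultAuth resource describing how VSO logs in to Vault.
  vaultAuthRef: reactbin-vault-auth
  # KV engine version, mount point, and path of the secret inside Vault.
  type: kv-v2
  mount: secret
  path: reactbin/api
  # How often VSO re-reads the secret from Vault.
  refreshAfter: 60s
  destination:
    # Name of the Kubernetes Secret that VSO creates and keeps in sync.
    name: reactbin-api-secrets
    create: true
```

The manifest contains only the location of the secret in Vault, never its values, so it can be committed to `k8s/` without violating SC-003.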