Files
agatha cd89ba5dea Feat: Proxy image content through the API instead of redirecting to MinIO
Replace the presigned-URL redirect (302) in GET /api/v1/images/{id}/file
with a direct proxy that fetches bytes from S3 server-side and returns them
to the client. The browser never contacts the storage backend, eliminating
the /etc/hosts workaround needed in local development.

- StorageBackend: swap get_presigned_url for get(key) -> bytes
- S3StorageBackend: implement get() via aiobotocore get_object
- serve_image_file: return Response with ETag + Cache-Control: immutable
- test_serving: assert 200 + content-type + ETag; add no-storage-details test
- Spec Kit artifacts for feature 002-api-image-proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:36:43 +00:00

7.8 KiB
Raw Permalink Blame History

Tasks: API Image Proxy

Input: Design documents from specs/002-api-image-proxy/ Prerequisites: plan.md , spec.md , research.md , data-model.md , contracts/api.md

TDD: Tests are non-negotiable per constitution §5.1. Every test task MUST be written and confirmed failing before its implementation task runs.

Organization: Tasks are grouped by user story to enable independent implementation and testing.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel (different files, no dependencies on incomplete tasks)
  • [Story]: Which user story this task belongs to (US1, US2)

Phase 1: Setup

No setup required. The project is already initialized; the feature branch 002-api-image-proxy is active.


Phase 2: Foundational (Blocking Prerequisites)

Purpose: Update the StorageBackend interface before any route or test change can reference the new method. These are pure interface/plumbing tasks with no callers yet — no preceding failing test is required because the tasks themselves introduce no observable behaviour change.

⚠️ CRITICAL: Both T001 and T002 must be complete before US1 work begins.

  • T001 Remove abstract method get_presigned_url and add async def get(self, key: str) -> bytes abstract method (with docstring "Return object content as bytes.") to StorageBackend in api/app/storage/backend.py
  • T002 [P] Remove get_presigned_url implementation; add async def get(self, key: str) -> bytes that opens an aiobotocore client, calls client.get_object(Bucket=self._settings.s3_bucket_name, Key=key), and returns await response["Body"].read() in api/app/storage/s3_backend.py

Checkpoint: ruff check api/ passes; pytest api/ passes (all existing tests green — serve_image_file route still compiles but will change in Phase 3).


Phase 3: User Story 1 — View Images Without Storage Configuration (Priority: P1) 🎯 MVP

Goal: Replace the 302 presigned-URL redirect with a 200 binary response so images load without any DNS or /etc/hosts configuration.

Independent Test: Start the application with no custom host entries. Open the library and detail page. All images load. Browser network tab shows only requests to /api/v1/ — none to the storage hostname.

Tests for User Story 1 (write FIRST — must FAIL before T004) ⚠️

  • T003 [US1] In api/tests/integration/test_serving.py: rename test_file_redirect_returns_302 to test_file_returns_200_with_content; remove follow_redirects=False; replace the 302/Location assertions with: response.status_code == 200, response.headers["content-type"].startswith("image/"), response.headers["etag"] == f'"{uploaded_image_hash}"' (extract hash from upload response), "immutable" in response.headers["cache-control"], and len(response.content) > 0; run pytest api/tests/integration/test_serving.py::test_file_returns_200_with_content and confirm it fails

Implementation for User Story 1

  • T004 [US1] In api/app/routers/images.py: remove RedirectResponse from the fastapi.responses import; update serve_image_file to (a) call data = await storage.get(image.storage_key) inside a try/except Exception that raises an HTTPException with status 500 and code "storage_error", then (b) return Response(content=data, media_type=image.mime_type, headers={"ETag": f'"{image.hash}"', "Cache-Control": "public, max-age=31536000, immutable"})
  • T005 [US1] Run pytest api/tests/integration/test_serving.py and confirm both test_file_returns_200_with_content and test_file_unknown_id_returns_404 pass; then run pytest api/ to confirm no regressions across all test files

Checkpoint: User Story 1 is fully functional. Images load via the API proxy with correct caching headers. No redirect to storage occurs.


Phase 4: User Story 2 — Storage Backend Remains Private (Priority: P1)

Goal: Verify, with an explicit test, that no storage-specific detail (hostname, presigned URL, bucket name) appears anywhere in any API response after the proxy change.

Independent Test: Run the new test suite. Inspect browser DevTools network tab: zero requests to the storage host; zero occurrences of "minio", "s3://", or "amazonaws.com" in any response header or body.

Tests for User Story 2 (write FIRST — must FAIL on pre-T004 code) ⚠️

  • T006 [US2] In api/tests/integration/test_serving.py: add test test_file_response_exposes_no_storage_details that uploads an image, calls GET /api/v1/images/{id}/file, then asserts: (a) "location" not in response.headers (no redirect target), (b) "minio" not in response.text.lower(), (c) "s3://" not in response.text.lower(), (d) "amazonaws.com" not in response.text.lower(); run the test against the pre-T004 code and confirm it fails on assertion (a) (Location header is present in the redirect response)

Implementation for User Story 2

No new implementation — US2 is fully satisfied by the T004 router change. The test written in T006 must pass once T004 is complete.

  • T007 [US2] Run pytest api/tests/integration/test_serving.py and confirm test_file_response_exposes_no_storage_details passes; this confirms US2 is satisfied by the US1 implementation

Checkpoint: Both user stories verified. Storage backend is provably private. Full proxy behaviour confirmed with explicit assertions.


Phase 5: Polish & Cross-Cutting Concerns

  • T008 [P] Run ruff check api/ && ruff format --check api/ and fix any lint or formatting issues in api/app/storage/backend.py, api/app/storage/s3_backend.py, and api/app/routers/images.py
  • T009 Run full test suite pytest api/ -v and confirm all tests pass (expected: existing 45 tests + 2 new/updated tests = 47 total); record final count

Dependencies & Execution Order

Phase Dependencies

  • Foundational (Phase 2): No dependencies — start immediately
  • US1 (Phase 3): Depends on Phase 2 complete (T001 + T002 done)
  • US2 (Phase 4): Depends on Phase 3 complete (T004 done, since T006 tests against pre-T004 code first)
  • Polish (Phase 5): Depends on Phases 3 and 4 complete

User Story Dependencies

  • US1: Independent — requires only Phase 2
  • US2: Logically depends on US1 (its implementation task is zero — it reuses T004's work; T006 test is written pre-T004 to confirm it fails, then passes after T004)

Within Each User Story

  • Test tasks MUST be written and confirmed FAILING before implementation tasks run (§5.1)
  • T001 and T002 can run in parallel (different files)
  • T003 must precede T004
  • T005 validates T004
  • T006 is written pre-T004 but validated post-T004

Parallel Example: Phase 2 (Foundational)

# T001 and T002 touch different files — run in parallel:
Task: "Remove get_presigned_url and add get() abstract method in api/app/storage/backend.py"
Task: "Remove get_presigned_url impl and add get() body in api/app/storage/s3_backend.py"

Implementation Strategy

MVP (Full feature — both user stories together)

  1. Complete Phase 2: T001, T002 (parallel)
  2. Complete Phase 3: T003 (fail), T004 (implement), T005 (verify)
  3. Complete Phase 4: T006 (fail → passes), T007 (verify)
  4. Complete Phase 5: T008 (lint), T009 (final count)
  5. STOP and VALIDATE: Confirm zero browser requests to storage host in DevTools

Total tasks: 9 (T001T009)


Notes

  • [P] tasks touch different files and have no mutual dependencies
  • T006 must be run against pre-T004 code to confirm it fails — this is the TDD "red" step for US2
  • storage_key in image metadata responses is a SHA-256 hex string (not a URL); it does not constitute an exposed storage credential and is out of scope for this feature per research.md Decision 5
  • After this feature, get_presigned_url has no callers anywhere in the codebase — its removal is safe