agatha 61d923d5be Feat: Replace UUID image identifiers with 8-character base62 short IDs
Short IDs become the canonical identifier in URLs (/i/:short_id),
MinIO/R2 storage keys, and all API responses. Hash-based deduplication
is preserved. Includes two-phase Alembic migration (003 adds nullable
column, 004 enforces NOT NULL) with a backfill script to copy storage
objects and populate short_id for existing images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 00:13:55 +00:00

# Research: Short Image IDs
## Short ID Generation
**Decision**: Use `secrets.choice` over `string.ascii_letters + string.digits` (base62, 62 characters), 8 characters long.
**Rationale**: `secrets.choice` draws from the operating system's CSPRNG and selects uniformly, which avoids the modulo bias of approaches that reduce raw random bytes with `% 62`. Base62 (a-z, A-Z, 0-9) is URL-safe without percent-encoding. 8 characters gives 62⁸ ≈ 218 trillion (2.18 × 10¹⁴) combinations, so the chance that a newly generated ID collides with an existing one stays negligible even with millions of images stored.
**Alternatives considered**:
- `secrets.token_urlsafe(6)`: yields 8 characters but includes `-` and `_`, so it is not pure alphanumeric
- UUID truncation (first 8 chars of hex): only 16 symbols in the alphabet, so 16⁸ ≈ 4.3 billion combinations, roughly 50,000 times fewer than base62
- nanoid (npm): JavaScript library; using it from Python would require a separate dependency
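The decision above can be sketched in a few lines of standard-library Python. The names `ALPHABET`, `SHORT_ID_LENGTH`, and `generate_short_id` are illustrative and not taken from the codebase; the two keyspace constants simply restate the arithmetic for the chosen scheme and the hex-truncation alternative.

```python
import secrets
import string

# Base62 alphabet: a-z, A-Z, 0-9 (62 symbols, all URL-safe).
ALPHABET = string.ascii_letters + string.digits
SHORT_ID_LENGTH = 8


def generate_short_id(length: int = SHORT_ID_LENGTH) -> str:
    """Return a random base62 string; each character is drawn via the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Keyspace of the chosen scheme versus 8 hex characters from a UUID.
BASE62_KEYSPACE = len(ALPHABET) ** SHORT_ID_LENGTH  # 62**8
HEX_KEYSPACE = 16 ** SHORT_ID_LENGTH                # 16**8
```

Here `BASE62_KEYSPACE` evaluates to 218,340,105,584,896 and `HEX_KEYSPACE` to 4,294,967,296.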
**Collision retry**: On insert, if the unique constraint on `short_id` is violated (surfaced as an `IntegrityError` in SQLAlchemy), generate a new ID and retry, up to a configurable limit (e.g., 10 attempts). At 10,000 images the per-attempt collision probability is 10⁴ / 62⁸ ≈ 4.6 × 10⁻¹¹; retries are a pure safety measure.
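A minimal sketch of the retry loop follows. It uses the standard-library `sqlite3` module so that it runs without a database server; the `images` table layout, the `insert_image` helper, and the `make_id` parameter are assumptions made for illustration, and the real code would catch the equivalent error from its own database driver.

```python
import secrets
import sqlite3
import string
from typing import Callable

ALPHABET = string.ascii_letters + string.digits
MAX_ATTEMPTS = 10  # the configurable limit mentioned in the text


def generate_short_id() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def collision_probability(existing_rows: int) -> float:
    """Chance that one freshly drawn ID matches an existing row."""
    return existing_rows / 62**8


def insert_image(
    conn: sqlite3.Connection,
    content_hash: str,
    make_id: Callable[[], str] = generate_short_id,
    max_attempts: int = MAX_ATTEMPTS,
) -> str:
    """Insert a row, drawing a new short_id whenever the unique constraint fires."""
    for _ in range(max_attempts):
        short_id = make_id()
        try:
            with conn:  # commits on success, rolls back on exception
                conn.execute(
                    "INSERT INTO images (short_id, content_hash) VALUES (?, ?)",
                    (short_id, content_hash),
                )
            return short_id
        except sqlite3.IntegrityError:
            # In this demo table short_id is the only constrained column,
            # so the error can only mean a short_id collision.
            continue
    raise RuntimeError(f"no free short_id after {max_attempts} attempts")


def demo() -> list[str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (short_id TEXT NOT NULL UNIQUE, content_hash TEXT)"
    )
    # Force one collision: the stub yields "AAAAAAAA" twice, then "BBBBBBBB".
    forced = iter(["AAAAAAAA", "AAAAAAAA", "BBBBBBBB"])
    first = insert_image(conn, "hash-1", make_id=lambda: next(forced))
    second = insert_image(conn, "hash-2", make_id=lambda: next(forced))
    return [first, second]
```

If the real table also carries a unique constraint on the content hash (for deduplication), the handler would need to tell the two violations apart before retrying.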
---
## Alembic Two-Phase Migration Strategy
**Decision**: Two separate Alembic migrations (003 + 004), with the Python migration script run between them.
**Rationale**: The `short_id` column must start nullable because existing rows have no value for it when the column is added. The backfill script then populates every existing row. Once the backfill is confirmed complete, a second migration adds the NOT NULL constraint. Running both as one migration would require a complex inline Python script in Alembic (fragile, untestable). Two migrations with a script in between is a common pattern for a backfill followed by a constraint change.
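**Migration 003**: `ADD COLUMN short_id VARCHAR(8) NULL UNIQUE`. In PostgreSQL the unique constraint is backed by a B-tree index, so no separate index is needed (GiST does not support unique indexes).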
**Script**: Fill all rows, idempotent (skip rows where `short_id IS NOT NULL`).
**Migration 004**: `ALTER COLUMN short_id SET NOT NULL`.
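The idempotent backfill step between the two migrations might look like the sketch below. It uses an in-memory SQLite table as a stand-in for the real database, and the table layout, `backfill_short_ids`, and `remaining_nulls` are hypothetical names. The storage-object copy that the real script also performs is omitted here, as is collision retry.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def generate_short_id() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def backfill_short_ids(conn: sqlite3.Connection) -> int:
    """Assign a short_id to every row that lacks one; return rows processed."""
    rows = conn.execute("SELECT id FROM images WHERE short_id IS NULL").fetchall()
    for (image_id,) in rows:
        with conn:  # one transaction per row, so an interruption loses little work
            conn.execute(
                "UPDATE images SET short_id = ? WHERE id = ? AND short_id IS NULL",
                (generate_short_id(), image_id),
            )
    return len(rows)


def remaining_nulls(conn: sqlite3.Connection) -> int:
    """Rows still missing a short_id; must be 0 before migration 004 runs."""
    query = "SELECT COUNT(*) FROM images WHERE short_id IS NULL"
    return conn.execute(query).fetchone()[0]


def demo() -> tuple[int, int, int]:
    conn = sqlite3.connect(":memory:")
    # State after migration 003: nullable short_id with a unique index.
    conn.execute("CREATE TABLE images (id TEXT PRIMARY KEY, short_id TEXT NULL)")
    conn.execute("CREATE UNIQUE INDEX uq_images_short_id ON images (short_id)")
    conn.executemany("INSERT INTO images (id) VALUES (?)", [("a",), ("b",), ("c",)])
    conn.commit()
    first_run = backfill_short_ids(conn)
    second_run = backfill_short_ids(conn)  # idempotent: nothing left to fill
    return first_run, second_run, remaining_nulls(conn)
```

A check like `remaining_nulls` is one way to confirm the backfill before applying migration 004, which would otherwise fail on any row still holding NULL.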
---
## Storage Object Copy Strategy
**Decision**: Copy-then-verify-then-delete (not an atomic rename), using the MinIO/S3 `copy_object` API followed by a `delete_object` call.
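**Rationale**: S3-compatible object stores do not support atomic renames. The safe approach is: copy to the new key, verify that the new object exists (`head_object`), update the DB, then delete the old key. If the script is interrupted after the copy but before the delete, the old object remains: wasted storage, but no data loss. The migration is idempotent: if `short_id` is already set on a row, the script skips it. One consequence is that an old object left behind after the DB update is not removed by a rerun and would need a separate cleanup pass.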
**Alternatives considered**:
- `mc mv` (MinIO client CLI) — simpler but harder to script transactionally with DB updates
- Direct Python with `aiobotocore` — chosen; same library already used by the storage backend
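The ordering of the steps can be illustrated with a small synchronous sketch. `InMemoryStore` is a dict-backed stand-in invented for this example, with simplified signatures; the real script would call the asynchronous `copy_object`, `head_object`, and `delete_object` operations of the aiobotocore S3 client. The assumption that the storage key equals the image identifier is also illustrative.

```python
from typing import Optional


class InMemoryStore:
    """Dict-backed stand-in for a bucket, used only for this demo."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = dict(objects)

    def copy_object(self, src_key: str, dst_key: str) -> None:
        self.objects[dst_key] = self.objects[src_key]

    def head_object(self, key: str) -> Optional[int]:
        """Return the object size, or None if the key does not exist."""
        data = self.objects.get(key)
        return None if data is None else len(data)

    def delete_object(self, key: str) -> None:
        self.objects.pop(key, None)


def migrate_object(store: InMemoryStore, row: dict, new_short_id: str) -> bool:
    """Move one image from its UUID key to a short-ID key; False if skipped."""
    if row.get("short_id") is not None:
        return False  # already migrated, keeps the script idempotent

    old_key, new_key = row["id"], new_short_id
    store.copy_object(old_key, new_key)            # 1. copy to the new key
    if store.head_object(new_key) is None:         # 2. verify the copy exists
        raise RuntimeError(f"copy of {old_key} to {new_key} not found")
    row["short_id"] = new_short_id                 # 3. update the DB (stand-in)
    store.delete_object(old_key)                   # 4. delete the old key last
    return True


def demo() -> tuple[bool, bool, dict]:
    store = InMemoryStore({"uuid-1": b"png-bytes"})
    row = {"id": "uuid-1", "short_id": None}
    first = migrate_object(store, row, "aB3dE5gH")
    second = migrate_object(store, row, "zZ9yY8xX")  # skipped on rerun
    return first, second, store.objects
```

Deleting last means every failure point leaves at least one readable copy of the object.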
---
## API Route Parameter Change
**Decision**: Change all image route parameters from `image_id: uuid.UUID` to `short_id: str` with manual length/charset validation.
**Rationale**: A path parameter annotated as `uuid.UUID` makes FastAPI reject non-UUID strings during request validation, so the existing routes cannot accept short IDs without a type change. Switching to `str` with a custom validator (exactly 8 ASCII alphanumeric characters) is minimal and clear.
**Impact**: All routes under `/api/v1/images/{id}` change to accept an 8-char string. The `id` field in API responses is retained as the UUID; `short_id` is added as a new field. The UI switches to using `short_id` for all navigation and API calls.
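The manual length/charset check could be as small as the following sketch. `SHORT_ID_RE` and `is_valid_short_id` are hypothetical names; how the check is wired into the route handlers is left open here.

```python
import re

# Exactly 8 characters from the base62 alphabet, ASCII only.
SHORT_ID_RE = re.compile(r"[A-Za-z0-9]{8}")


def is_valid_short_id(value: str) -> bool:
    """True only for an 8-character ASCII alphanumeric string."""
    return SHORT_ID_RE.fullmatch(value) is not None
```

An explicit character class with `fullmatch` is used rather than `str.isalnum()`, because `isalnum()` also accepts non-ASCII letters and digits, and `fullmatch` rejects a trailing newline that a `$` anchor would let through.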
---
## Response Schema: Additive Change
**Decision**: Add `short_id` as a new field to the image response dict. The existing `id` (UUID) field is retained.
**Rationale**: Adding a field is non-breaking per §3.1. Removing `id` would be a breaking change. Retaining both allows any internal tooling or API consumers that already use `id` to continue working. The UI transitions to using `short_id` for routing and API calls, but the UUID remains queryable if needed.
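As an illustration of the additive change, a response builder might carry both identifiers side by side. The `Image` dataclass and `image_to_response` function are assumptions for this sketch, and the real response dict presumably contains further fields that are not shown.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Image:
    id: uuid.UUID
    short_id: str


def image_to_response(image: Image) -> dict:
    """Build the response dict with the UUID kept and short_id added."""
    return {
        "id": str(image.id),         # retained so existing consumers keep working
        "short_id": image.short_id,  # new field, used by the UI for routing
    }
```

A consumer that reads only `id` sees no difference, which is what makes the change non-breaking.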