agatha 61d923d5be Feat: Replace UUID image identifiers with 8-character base62 short IDs
Short IDs become the canonical identifier in URLs (/i/:short_id),
MinIO/R2 storage keys, and all API responses. Hash-based deduplication
is preserved. Includes two-phase Alembic migration (003 adds nullable
column, 004 enforces NOT NULL) with a backfill script to copy storage
objects and populate short_id for existing images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 00:13:55 +00:00


Research: Short Image IDs

Short ID Generation

Decision: Use secrets.choice over string.ascii_letters + string.digits (base62, 62 characters), 8 characters long.

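Rationale: secrets.choice draws each character uniformly from the alphabet using the operating system's cryptographic random source, which avoids the modulo bias that appears when raw random bytes are reduced with % 62. Base62 (a-z, A-Z, 0-9) is URL-safe without percent-encoding. 8 characters gives 62⁸ ≈ 218 trillion combinations, so the per-insert collision probability stays negligible even at millions of images (about 4.6 × 10⁻⁹ with one million existing IDs).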
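
The generation step can be sketched as follows. This is a minimal illustration, not the project's actual code; the names ALPHABET, SHORT_ID_LENGTH, generate_short_id, and ID_SPACE are assumptions made for the example.

```python
import secrets
import string

# Base62 alphabet: a-z, A-Z, 0-9 (62 characters, all URL-safe).
ALPHABET = string.ascii_letters + string.digits
SHORT_ID_LENGTH = 8


def generate_short_id(length: int = SHORT_ID_LENGTH) -> str:
    """Return a random base62 string; each character is drawn with secrets.choice."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Size of the ID space: 62 ** 8 = 218,340,105,584,896 (about 2.18e14).
ID_SPACE = len(ALPHABET) ** SHORT_ID_LENGTH
```

Dividing the number of existing IDs by ID_SPACE gives the chance that a single new ID collides with one already stored.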

Alternatives considered:

  • secrets.token_urlsafe(6): output can include - and _, so it is not purely alphanumeric
  • UUID truncation (first 8 chars of hex): the alphabet has only 16 characters, so 16⁸ ≈ 4.3 billion combinations, far fewer than base62 at the same length
  • nanoid (npm): a JavaScript library; using it from Python would require a separate dependency

Collision retry: On insert, if the database reports a unique-constraint violation on short_id, generate a new ID and retry (up to a configurable limit, e.g., 10 attempts). At 10,000 images the per-attempt collision probability is ~4.6 × 10⁻¹¹, so retries are purely a safety measure.
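
A minimal sketch of the retry loop is shown below. It uses the standard-library sqlite3 module as a stand-in for the real database so that it runs on its own; the SCHEMA, the insert_image function, and the make_id parameter are hypothetical names introduced for the example. In this sketch short_id is the only unique column, so any IntegrityError is treated as a short_id collision; a real table with other constraints would need to distinguish them.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits
MAX_ATTEMPTS = 10  # configurable retry limit

# Hypothetical minimal schema; short_id is the only unique column here.
SCHEMA = "CREATE TABLE images (short_id TEXT NOT NULL UNIQUE, content_hash TEXT)"


def generate_short_id() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def insert_image(conn, content_hash, make_id=generate_short_id,
                 max_attempts=MAX_ATTEMPTS):
    """Insert a row, generating a fresh short_id whenever the insert collides."""
    for _ in range(max_attempts):
        short_id = make_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO images (short_id, content_hash) VALUES (?, ?)",
                    (short_id, content_hash),
                )
        except sqlite3.IntegrityError:
            continue  # unique violation: try again with a new ID
        return short_id
    raise RuntimeError(f"no free short_id after {max_attempts} attempts")
```

Passing the generator in as make_id keeps the loop testable: a test can feed it a fixed sequence of IDs to force a collision.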


Alembic Two-Phase Migration Strategy

Decision: Two separate Alembic migrations (003 + 004), with the Python migration script run between them.

Rationale: The short_id column must start nullable because existing rows have no value for it at the moment the column is added. The migration script then fills all existing rows. Once the backfill is confirmed complete, a second migration adds the NOT NULL constraint. Running both as one migration would require a complex inline Python script in Alembic (fragile, untestable). Two migrations with a script in between is the standard approach for a backfill followed by a constraint change.

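  • Migration 003: ADD COLUMN short_id VARCHAR(8) NULL UNIQUE. Assuming PostgreSQL, the UNIQUE constraint is enforced by an automatically created B-tree index (GiST indexes cannot enforce uniqueness), so no separate index is needed.
  • Script: Fill all rows; idempotent (skips rows where short_id IS NOT NULL).
  • Migration 004: ALTER COLUMN short_id SET NOT NULL.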
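
The backfill step that runs between the two migrations can be sketched as follows. This is an illustration under stated assumptions rather than the actual script: it uses an in-memory-friendly sqlite3 connection instead of the production database, omits the storage copy, and assumes a table named images with an id primary key. The function names backfill_short_ids and ready_for_not_null are hypothetical.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def generate_short_id() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(8))


def backfill_short_ids(conn, max_attempts=10):
    """Fill short_id for rows that lack one; return the number of rows filled.

    Rows that already have a short_id are never selected, so running the
    function a second time is a no-op.
    """
    pending = conn.execute(
        "SELECT id FROM images WHERE short_id IS NULL"
    ).fetchall()
    filled = 0
    for (image_id,) in pending:
        for _ in range(max_attempts):
            try:
                with conn:  # one transaction per row
                    conn.execute(
                        "UPDATE images SET short_id = ? "
                        "WHERE id = ? AND short_id IS NULL",
                        (generate_short_id(), image_id),
                    )
                break
            except sqlite3.IntegrityError:
                continue  # collided with an existing short_id; regenerate
        else:
            raise RuntimeError(f"could not assign a short_id to {image_id}")
        filled += 1
    return filled


def ready_for_not_null(conn) -> bool:
    """True when no NULL short_id remains, i.e. the NOT NULL step is safe."""
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM images WHERE short_id IS NULL"
    ).fetchone()
    return remaining == 0
```

A check like ready_for_not_null is one way to confirm the backfill before applying migration 004, since SET NOT NULL fails if any NULL remains.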


Storage Object Copy Strategy

Decision: Copy-then-verify-then-delete (not atomic rename), using the MinIO/S3 copy_object API followed by a delete_object call.

Rationale: S3-compatible object stores do not support atomic renames. The safe approach is: copy to new key, verify new object exists (head_object), update DB, delete old key. If interrupted after copy but before delete, the old object remains — wasted storage but no data loss. The migration is idempotent: if short_id is already set on a row, the script skips it.

Alternatives considered:

  • mc mv (MinIO client CLI) — simpler but harder to script transactionally with DB updates
  • Direct Python with aiobotocore — chosen; same library already used by the storage backend
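
The ordering of the copy-verify-update-delete steps can be sketched as follows. To stay runnable without a storage server, the sketch uses a hypothetical in-memory InMemoryStore whose methods borrow the S3 operation names but have simplified, synchronous signatures; they do not match the real aiobotocore client, which is asynchronous, takes keyword arguments such as a bucket name, and signals a missing key from head_object with an error rather than a boolean. The db dictionary and migrate_object function are likewise assumptions for illustration.

```python
class InMemoryStore:
    """Hypothetical stand-in for an S3-compatible bucket (not a real client)."""

    def __init__(self):
        self.objects = {}  # key -> bytes

    def copy_object(self, src_key, dst_key):
        self.objects[dst_key] = self.objects[src_key]

    def head_object(self, key):
        return key in self.objects  # simplified: real clients raise on a missing key

    def delete_object(self, key):
        self.objects.pop(key, None)


def migrate_object(store, db, image_id, old_key, new_key, short_id):
    """Move one object to its short-ID key. Returns False if already migrated."""
    row = db[image_id]
    if row.get("short_id") is not None:
        return False  # idempotent: rows that already have a short_id are skipped

    store.copy_object(old_key, new_key)        # 1. copy to the new key
    if not store.head_object(new_key):         # 2. verify the copy exists
        raise RuntimeError(f"copy of {old_key} to {new_key} not found")
    row["short_id"] = short_id                 # 3. update the database row
    row["storage_key"] = new_key
    store.delete_object(old_key)               # 4. delete the old key last
    return True
```

Because the delete comes last, an interruption at any earlier step leaves the original object in place.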

API Route Parameter Change

Decision: Change all image route parameters from image_id: uuid.UUID to short_id: str with manual length/charset validation.

Rationale: FastAPI's uuid.UUID type annotation rejects non-UUID strings at the path-parsing stage, so the existing routes cannot accept short IDs without a type change. Switching to str with a custom validator (8 alphanumeric chars) is minimal and clear.
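
The length and charset check can be sketched as a plain function that a route handler could call before querying the database. The names SHORT_ID_RE and is_valid_short_id are assumptions for the example, and how the result is turned into an HTTP error response is left to the route.

```python
import re

# Exactly 8 ASCII letters or digits. str.isalnum() is avoided on purpose:
# it also accepts non-ASCII letters and digits.
SHORT_ID_RE = re.compile(r"[A-Za-z0-9]{8}")


def is_valid_short_id(value: str) -> bool:
    """Return True only if the whole string is 8 base62 characters."""
    return SHORT_ID_RE.fullmatch(value) is not None
```

Using fullmatch rather than match means trailing characters, including a trailing newline, cause the value to be rejected.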

Impact: All routes under /api/v1/images/{id} change to accept an 8-char string. The id field in API responses is retained as the UUID; short_id is added as a new field. The UI switches to using short_id for all navigation and API calls.


Response Schema: Additive Change

Decision: Add short_id as a new field to the image response dict. The existing id (UUID) field is retained.

Rationale: Adding a field is non-breaking per §3.1. Removing id would be a breaking change. Retaining both allows any internal tooling or API consumers that already use id to continue working. The UI transitions to using short_id for routing and API calls, but the UUID remains queryable if needed.