Short IDs become the canonical identifier in URLs (/i/:short_id), MinIO/R2 storage keys, and all API responses. Hash-based deduplication is preserved. Includes two-phase Alembic migration (003 adds nullable column, 004 enforces NOT NULL) with a backfill script to copy storage objects and populate short_id for existing images. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
57 lines
3.7 KiB
Markdown
57 lines
3.7 KiB
Markdown
# Research: Short Image IDs
|
||
|
||
## Short ID Generation
|
||
|
||
**Decision**: Use `secrets.choice` over `string.ascii_letters + string.digits` (base62, 62 characters), 8 characters long.
|
||
|
||
**Rationale**: `secrets.choice` is cryptographically random, eliminating any bias from modular reduction that affects simpler approaches. Base62 (a–z, A–Z, 0–9) is URL-safe without percent-encoding. 8 characters gives 62⁸ ≈ 218 trillion combinations — negligible collision probability even at millions of images.
|
||
|
||
**Alternatives considered**:
|
||
- `secrets.token_urlsafe(6)` — includes `-` and `_`, not pure alphanumeric
|
||
- UUID truncation (first 8 chars of hex) — only 16 chars of alphabet (hex), dramatically fewer combinations than base62
|
||
- nanoid (npm) — JavaScript library, requires a separate dependency for Python
|
||
|
||
**Collision retry**: On insert, if a `UniqueConstraint` violation is raised on `short_id`, generate a new one and retry (up to a configurable limit, e.g., 10 attempts). At 10,000 images the per-attempt collision probability is ~4.6 × 10⁻¹¹; retries are a pure safety measure.
|
||
|
||
---
|
||
|
||
## Alembic Two-Phase Migration Strategy
|
||
|
||
**Decision**: Two separate Alembic migrations (003 + 004), with the Python migration script run between them.
|
||
|
||
**Rationale**: The `short_id` column must start nullable so existing rows can be inserted without a value. The migration script fills all existing rows. Once confirmed, a second migration adds the NOT NULL constraint. Running both as one migration would require a complex inline Python script in Alembic (fragile, untestable). Two migrations with a script in between is the standard approach for backfill + constraint change.
|
||
|
||
**Migration 003**: `ADD COLUMN short_id VARCHAR(8) NULL UNIQUE` + GiST/B-tree index.
|
||
**Script**: Fill all rows, idempotent (skip rows where `short_id IS NOT NULL`).
|
||
**Migration 004**: `ALTER COLUMN short_id SET NOT NULL`.
|
||
|
||
---
|
||
|
||
## Storage Object Copy Strategy
|
||
|
||
**Decision**: Copy-then-verify-then-delete (not atomic rename). Using the MinIO/S3 `copy_object` API followed by a `delete_object` call.
|
||
|
||
**Rationale**: S3-compatible object stores do not support atomic renames. The safe approach is: copy to new key, verify new object exists (head_object), update DB, delete old key. If interrupted after copy but before delete, the old object remains — wasted storage but no data loss. The migration is idempotent: if `short_id` is already set on a row, the script skips it.
|
||
|
||
**Alternatives considered**:
|
||
- `mc mv` (MinIO client CLI) — simpler but harder to script transactionally with DB updates
|
||
- Direct Python with `aiobotocore` — chosen; same library already used by the storage backend
|
||
|
||
---
|
||
|
||
## API Route Parameter Change
|
||
|
||
**Decision**: Change all image route parameters from `image_id: uuid.UUID` to `short_id: str` with manual length/charset validation.
|
||
|
||
**Rationale**: FastAPI's `uuid.UUID` type annotation rejects non-UUID strings at the path-parsing stage, so the existing routes cannot accept short IDs without a type change. Switching to `str` with a custom validator (8 alphanumeric chars) is minimal and clear.
|
||
|
||
**Impact**: All routes under `/api/v1/images/{id}` change to accept an 8-char string. The `id` field in API responses is retained as the UUID; `short_id` is added as a new field. The UI switches to using `short_id` for all navigation and API calls.
|
||
|
||
---
|
||
|
||
## Response Schema: Additive Change
|
||
|
||
**Decision**: Add `short_id` as a new field to the image response dict. The existing `id` (UUID) field is retained.
|
||
|
||
**Rationale**: Adding a field is non-breaking per §3.1. Removing `id` would be a breaking change. Retaining both allows any internal tooling or API consumers that already use `id` to continue working. The UI transitions to using `short_id` for routing and API calls, but the UUID remains queryable if needed.
|