Database schema reference
Overview
All tables use UUID primary keys (generated client-side via uuid4()), timestamptz for datetime columns, and follow a consistent naming convention: snake_case table names, singular for join/config tables, plural for entity tables.
The schema is managed by Alembic. Never modify tables directly — always create a migration.
Proxy domain tables
proxy_sources
Configurable scrape targets. Each record defines a URL to fetch, a parser to use, and a schedule.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| url | varchar(2048) | UNIQUE, NOT NULL | The URL to scrape |
| parser_name | varchar(64) | NOT NULL | Maps to a registered SourceParser.name |
| cron_schedule | varchar(64) | nullable | Cron expression for scrape frequency. Falls back to the parser's default_schedule() if NULL |
| default_protocol | enum(proxy_protocol) | NOT NULL, default http | Protocol to assign when the parser can't determine it from the source |
| is_active | boolean | NOT NULL, default true | Inactive sources are skipped by the scrape task |
| last_scraped_at | timestamptz | nullable | Timestamp of the last successful scrape |
| created_at | timestamptz | NOT NULL, server default now() | |
Rationale: Storing the parser name rather than auto-detecting every time allows explicit control. A source might look like a plain text file but actually need a custom parser.
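The schedule fallback described for cron_schedule can be sketched as follows. This is a minimal illustration, not the project's actual code: the PlainTextParser class, the PARSERS registry, the ProxySource dataclass, and the hourly default are all assumptions; only parser_name, cron_schedule, SourceParser.name, and default_schedule() come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


class PlainTextParser:
    """Hypothetical parser; only `name` and `default_schedule()` are named in the schema."""

    name = "plain_text"

    def default_schedule(self) -> str:
        return "0 * * * *"  # assumed default: once per hour


@dataclass
class ProxySource:
    url: str
    parser_name: str
    cron_schedule: Optional[str] = None  # NULL in the database


# Hypothetical registry keyed by SourceParser.name.
PARSERS = {PlainTextParser.name: PlainTextParser()}


def effective_schedule(source: ProxySource) -> str:
    """Return the source's own cron expression, or the parser default if it is NULL."""
    if source.cron_schedule is not None:
        return source.cron_schedule
    return PARSERS[source.parser_name].default_schedule()
```

Looking the parser up by its stored name keeps the choice explicit per source, which is the point of the rationale above.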
proxies
The core proxy table. Each record represents a unique (ip, port, protocol) combination.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| ip | inet | NOT NULL | IPv4 or IPv6 address |
| port | integer | NOT NULL | Port number (1–65535) |
| protocol | enum(proxy_protocol) | NOT NULL | http, https, socks4, socks5 |
| source_id | uuid | FK → proxy_sources.id, NOT NULL | Which source discovered this proxy |
| status | enum(proxy_status) | NOT NULL, default unchecked | unchecked, active, dead |
| anonymity | enum(anonymity_level) | nullable | transparent, anonymous, elite |
| exit_ip | inet | nullable | The IP address seen by the target when using this proxy |
| country | varchar(2) | nullable | ISO 3166-1 alpha-2 country code of the exit IP |
| score | float | NOT NULL, default 0.0 | Composite quality score (0.0–1.0) |
| avg_latency_ms | float | nullable | Rolling average latency across recent checks |
| uptime_pct | float | nullable | Percentage of checks that passed (0.0–100.0) |
| first_seen_at | timestamptz | NOT NULL, server default now() | When this proxy was first discovered |
| last_checked_at | timestamptz | nullable | When the last validation check completed |
| created_at | timestamptz | NOT NULL, server default now() | |
Indexes:
| Name | Columns | Type | Purpose |
|---|---|---|---|
| ix_proxies_ip_port_proto | (ip, port, protocol) | UNIQUE | Deduplication on upsert |
| ix_proxies_status_score | (status, score) | B-tree | Fast filtering for "active proxies sorted by score" |
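Design note: The same IP can appear multiple times, either on different ports (e.g., HTTP on port 8080 and SOCKS5 on port 1080) or on the same port when that port serves more than one protocol. The composite unique index allows both cases while rejecting exact (ip, port, protocol) duplicates.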
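The deduplication behaviour of the composite unique index can be tried out with a small sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, so column types are simplified and INSERT OR IGNORE takes the place of a PostgreSQL INSERT ... ON CONFLICT; the insert_proxy helper is hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.execute(
    "CREATE TABLE proxies (id TEXT PRIMARY KEY, ip TEXT NOT NULL, "
    "port INTEGER NOT NULL, protocol TEXT NOT NULL)"
)
conn.execute(
    "CREATE UNIQUE INDEX ix_proxies_ip_port_proto ON proxies (ip, port, protocol)"
)


def insert_proxy(ip: str, port: int, protocol: str) -> bool:
    """Insert a proxy; return False if the (ip, port, protocol) triple already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO proxies (id, ip, port, protocol) VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), ip, port, protocol),
    )
    return cur.rowcount == 1


results = [
    insert_proxy("203.0.113.7", 1080, "http"),
    insert_proxy("203.0.113.7", 1080, "socks5"),  # same ip and port, different protocol
    insert_proxy("203.0.113.7", 1080, "http"),    # exact duplicate, ignored
]
```

Only the third insert is rejected, because it repeats all three indexed columns.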
Computed columns: score, avg_latency_ms, and uptime_pct are denormalized from proxy_checks. They are recomputed by the validation pipeline after each check run and by a periodic rollup task. This avoids expensive aggregation queries on every proxy list request.
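As a rough illustration of the rollup, the sketch below derives uptime_pct and avg_latency_ms from a list of check results. The Check dataclass and rollup function are hypothetical names, and the window of "recent" checks is assumed to be chosen by the caller. The score formula is not specified in this document, so it is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Check:
    passed: bool
    latency_ms: Optional[float]  # NULL when the check produced no timing


def rollup(checks: Sequence[Check]) -> dict:
    """Derive uptime_pct and avg_latency_ms from a proxy's recent checks."""
    if not checks:
        return {"uptime_pct": None, "avg_latency_ms": None}
    passed = sum(1 for c in checks if c.passed)
    latencies = [c.latency_ms for c in checks if c.latency_ms is not None]
    return {
        "uptime_pct": 100.0 * passed / len(checks),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
    }
```

Writing these values back to proxies after each check run is what lets list requests avoid aggregating over proxy_checks.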
proxy_checks
Append-only log of every validation check attempt. This is the raw data behind the computed fields on proxies.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| proxy_id | uuid | FK → proxies.id ON DELETE CASCADE, NOT NULL | |
| checker_name | varchar(64) | NOT NULL | The ProxyChecker.name that ran this check |
| stage | integer | NOT NULL | Pipeline stage number |
| passed | boolean | NOT NULL | Whether the check succeeded |
| latency_ms | float | nullable | Time taken for this specific check |
| detail | text | nullable | Human-readable result description or error message |
| exit_ip | inet | nullable | Exit IP discovered during this check (if applicable) |
| created_at | timestamptz | NOT NULL, server default now() | |
Indexes:
| Name | Columns | Purpose |
|---|---|---|
| ix_checks_proxy_created | (proxy_id, created_at) | Efficient history queries per proxy |
Retention: This table grows fast. A periodic cleanup task (tasks_cleanup.prune_checks) deletes rows older than a configurable retention period (default: 7 days), keeping only the most recent N checks per proxy.
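One possible reading of the retention rule is sketched below. It assumes that a row is pruned only when it is older than the retention window and is not among the newest N checks for its proxy; the real tasks_cleanup.prune_checks may combine the two conditions differently. The select_prunable function and its keep_latest parameter are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def select_prunable(checks, now, retention=timedelta(days=7), keep_latest=1):
    """Return ids of checks older than the retention window, excluding the
    newest `keep_latest` checks of each proxy (an assumed interpretation).

    `checks` is an iterable of (check_id, proxy_id, created_at) tuples.
    """
    cutoff = now - retention
    by_proxy = defaultdict(list)
    for check_id, proxy_id, created_at in checks:
        by_proxy[proxy_id].append((created_at, check_id))

    prunable = []
    for rows in by_proxy.values():
        rows.sort(reverse=True)  # newest first
        for created_at, check_id in rows[keep_latest:]:
            if created_at < cutoff:
                prunable.append(check_id)
    return prunable
```

Under this reading a proxy that has not been checked recently still keeps its latest result, even when that result is older than the retention period.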
proxy_tags
Flexible key-value labels for proxies. Useful for user-defined categorization (e.g., datacenter: true, provider: aws, tested_site: google.com).
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| proxy_id | uuid | FK → proxies.id ON DELETE CASCADE, NOT NULL | |
| key | varchar(64) | NOT NULL | Tag name |
| value | varchar(256) | NOT NULL | Tag value |
Indexes:
| Name | Columns | Type | Purpose |
|---|---|---|---|
| ix_tags_proxy_key | (proxy_id, key) | UNIQUE | One value per key per proxy |
Accounts domain tables
users
User accounts. Minimal by design — the primary purpose is to own API keys and credits.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| email | varchar(320) | UNIQUE, NOT NULL | Used for notifications and account recovery |
| display_name | varchar(128) | nullable | |
| is_active | boolean | NOT NULL, default true | Inactive users cannot authenticate |
| created_at | timestamptz | NOT NULL, server default now() | |
api_keys
API keys for authentication. The raw key is shown once at creation; only the hash is stored.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| user_id | uuid | FK → users.id ON DELETE CASCADE, NOT NULL | |
| key_hash | varchar(128) | NOT NULL | SHA-256 hash of the raw API key |
| prefix | varchar(8) | NOT NULL | First 8 characters of the raw key, for quick lookup |
| label | varchar(128) | nullable | User-assigned label (e.g., "production", "testing") |
| is_active | boolean | NOT NULL, default true | Revoked keys have is_active = false |
| last_used_at | timestamptz | nullable | Updated on each authenticated request |
| expires_at | timestamptz | nullable | NULL means no expiration |
| created_at | timestamptz | NOT NULL, server default now() | |
Indexes:
| Name | Columns | Type | Purpose |
|---|---|---|---|
| ix_api_keys_hash | (key_hash) | UNIQUE | Uniqueness constraint on key hashes |
| ix_api_keys_prefix | (prefix) | B-tree | Fast prefix-based lookup before full hash comparison |
Auth flow: On each request, the middleware extracts the API key from the Authorization: Bearer <key> header, computes prefix = key[:8], queries api_keys WHERE prefix = ? AND is_active = true AND (expires_at IS NULL OR expires_at > now()), then verifies sha256(key) == key_hash. This two-step approach avoids computing a hash against every key in the database.
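The two-step lookup can be sketched in plain Python, with a list of dicts standing in for the api_keys table. The function names are hypothetical, the hash is assumed to be stored as a hex digest, and hmac.compare_digest is used for the final comparison as a precaution that the description above does not mention.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def hash_key(raw_key: str) -> str:
    """SHA-256 of the raw key, hex-encoded (storage encoding is an assumption)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def authenticate(raw_key, rows, now=None):
    """Return the matching api_keys row (a dict), or None if the key is rejected."""
    now = now or datetime.now(timezone.utc)
    prefix = raw_key[:8]
    # Step 1: narrow candidates by prefix, active flag, and expiry.
    candidates = [
        r for r in rows
        if r["prefix"] == prefix
        and r["is_active"]
        and (r["expires_at"] is None or r["expires_at"] > now)
    ]
    # Step 2: hash the presented key once and compare it with each candidate.
    digest = hash_key(raw_key)
    for row in candidates:
        if hmac.compare_digest(digest, row["key_hash"]):
            return row
    return None
```

Several keys can share a prefix, so step 1 may return more than one candidate; the hash comparison decides which one, if any, matches.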
credit_ledger
Append-only ledger of all credit transactions. Current balance is SELECT SUM(amount) FROM credit_ledger WHERE user_id = ?.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| user_id | uuid | FK → users.id ON DELETE CASCADE, NOT NULL | |
| amount | integer | NOT NULL | Positive = credit in, negative = debit |
| tx_type | enum(credit_tx_type) | NOT NULL | purchase, acquire, refund, admin_adjust |
| description | text | nullable | Human-readable note |
| reference_id | uuid | nullable | Links to the related entity (e.g., lease ID for acquire transactions) |
| created_at | timestamptz | NOT NULL, server default now() | |
Indexes:
| Name | Columns | Purpose |
|---|---|---|
| ix_ledger_user_created | (user_id, created_at) | Balance computation and history queries |
Caching: The computed balance is cached in Redis under credits:{user_id}. The cache is invalidated (DEL) whenever a new ledger entry is created. Cache miss triggers a SUM(amount) query.
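The cache-aside pattern described here can be sketched with in-memory stand-ins: a list plays the role of credit_ledger and a dict plays the role of Redis. The BalanceCache class and its sum_queries counter are hypothetical and exist only to make the behaviour visible.

```python
class BalanceCache:
    """Cache-aside balance lookup with in-memory stand-ins for PostgreSQL and Redis."""

    def __init__(self):
        self.ledger = []      # (user_id, amount) rows, stand-in for credit_ledger
        self.cache = {}       # stand-in for Redis
        self.sum_queries = 0  # counts how often the SUM fallback runs

    def add_entry(self, user_id, amount):
        self.ledger.append((user_id, amount))
        self.cache.pop(f"credits:{user_id}", None)  # equivalent of DEL

    def balance(self, user_id):
        key = f"credits:{user_id}"
        if key not in self.cache:
            # Cache miss: recompute with the equivalent of SUM(amount).
            self.sum_queries += 1
            self.cache[key] = sum(a for u, a in self.ledger if u == user_id)
        return self.cache[key]
```

Repeated reads between ledger writes are served from the cache, and the first read after a write recomputes the sum.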
Concurrency: Because balance is derived from a SUM, concurrent inserts don't cause race conditions on the balance itself. The acquire endpoint uses SELECT ... FOR UPDATE on the user row to serialize credit checks, preventing double-spending under high concurrency.
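The effect of serializing the credit check can be illustrated with threads. In this sketch a per-user threading.Lock stands in for the row lock taken by SELECT ... FOR UPDATE, and a dict of lists stands in for the ledger; the acquire function and its cost parameter are hypothetical.

```python
import threading

ledger = {"u1": [100]}           # user_id -> ledger amounts (stand-in for credit_ledger)
_locks = {}                      # user_id -> lock (stand-in for the users row lock)
_locks_guard = threading.Lock()


def _lock_for(user_id):
    with _locks_guard:
        return _locks.setdefault(user_id, threading.Lock())


def acquire(user_id: str, cost: int) -> bool:
    """Debit `cost` credits if the balance covers it; return whether it succeeded."""
    with _lock_for(user_id):     # plays the role of SELECT ... FOR UPDATE
        if sum(ledger[user_id]) < cost:
            return False
        ledger[user_id].append(-cost)
        return True


# Ten concurrent requests of 30 credits each against a balance of 100.
results = []
threads = [
    threading.Thread(target=lambda: results.append(acquire("u1", 30)))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the balance check and the debit happen under the same lock, exactly three requests succeed and the balance never goes negative.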
proxy_leases
Tracks which proxies are currently checked out by which users. Both Redis (for fast lookup) and PostgreSQL (for audit trail) maintain lease state.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default uuid4 | |
| user_id | uuid | FK → users.id, NOT NULL | |
| proxy_id | uuid | FK → proxies.id, NOT NULL | |
| acquired_at | timestamptz | NOT NULL, server default now() | |
| expires_at | timestamptz | NOT NULL | When the lease automatically releases |
| is_released | boolean | NOT NULL, default false | Set to true on explicit release or expiration cleanup |
Indexes:
| Name | Columns | Purpose |
|---|---|---|
| ix_leases_user | (user_id) | List a user's active leases |
| ix_leases_proxy_active | (proxy_id, is_released) | Check if a proxy is currently leased |
Dual state: Redis holds the lease as lease:{proxy_id} with a TTL matching expires_at. The proxy selection query excludes proxies with an active Redis lease key. The PostgreSQL record exists for audit, billing reconciliation, and cleanup if Redis state is lost.
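The Redis side of the lease state can be sketched with a dict that mimics keys with a TTL. LeaseStore and available are hypothetical names, and the injectable clock is there only so that expiry can be demonstrated without waiting.

```python
import time


class LeaseStore:
    """Dict-backed stand-in for Redis keys of the form lease:{proxy_id} with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def set_lease(self, proxy_id, ttl_seconds):
        self._expiry[f"lease:{proxy_id}"] = self._clock() + ttl_seconds

    def is_leased(self, proxy_id):
        expires = self._expiry.get(f"lease:{proxy_id}")
        return expires is not None and expires > self._clock()


def available(proxy_ids, leases):
    """Filter out proxies that currently hold an unexpired lease key."""
    return [p for p in proxy_ids if not leases.is_leased(p)]
```

A lease that reaches its TTL simply stops excluding the proxy from selection; the PostgreSQL row is what remains for auditing afterwards.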
Enum types
All enums are PostgreSQL native enums created via CREATE TYPE:
| Enum name | Values |
|---|---|
| proxy_protocol | http, https, socks4, socks5 |
| proxy_status | unchecked, active, dead |
| anonymity_level | transparent, anonymous, elite |
| credit_tx_type | purchase, acquire, refund, admin_adjust |
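On the application side these enums might be mirrored as Python enum classes along the following lines. The class names are assumptions; only the values are taken from the table above.

```python
import enum


class ProxyProtocol(str, enum.Enum):
    HTTP = "http"
    HTTPS = "https"
    SOCKS4 = "socks4"
    SOCKS5 = "socks5"


class ProxyStatus(str, enum.Enum):
    UNCHECKED = "unchecked"
    ACTIVE = "active"
    DEAD = "dead"


class AnonymityLevel(str, enum.Enum):
    TRANSPARENT = "transparent"
    ANONYMOUS = "anonymous"
    ELITE = "elite"


class CreditTxType(str, enum.Enum):
    PURCHASE = "purchase"
    ACQUIRE = "acquire"
    REFUND = "refund"
    ADMIN_ADJUST = "admin_adjust"
```

Keeping the member values identical to the database labels means a value read from a row can be converted directly, for example ProxyProtocol("socks5").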
Migration conventions
- One migration per logical change. Don't bundle unrelated schema changes.
- Migration filenames: NNN_descriptive_name.py (e.g., 001_initial_schema.py).
- Always include both upgrade() and downgrade() functions.
- Test migrations against a fresh database AND against a database with existing data.
- Use alembic revision --autogenerate -m "description" for model-driven changes, but always review the generated migration before applying.