# Operations guide

## Deployment

### Docker Compose (single-server)
The simplest deployment for small-to-medium workloads. All services run on a single machine.
```bash
# Clone and configure
git clone <repo-url> proxy-pool && cd proxy-pool
cp .env.example .env
# Edit .env with production values

# Build and start
docker compose build
docker compose --profile migrate up -d migrate  # Run migrations
docker compose up -d api worker                 # Start services
```
### Production considerations

**API scaling:** Run multiple API instances behind a load balancer. The API is stateless — any instance can handle any request. In Docker Compose, use `docker compose up -d --scale api=3`.

**Worker scaling:** Typically 1-2 worker instances are sufficient. ARQ deduplicates jobs via Redis, so multiple workers don't cause duplicate work. Scale workers if validation throughput is a bottleneck.

**Database:** Use a managed PostgreSQL service (AWS RDS, GCP Cloud SQL, etc.) for production. Enable connection pooling (PgBouncer) if running more than ~10 API instances.

**Redis:** A single Redis instance is sufficient for most workloads. Enable persistence (AOF or RDB snapshots) if you want lease state to survive Redis restarts. For high availability, use Redis Sentinel or a managed Redis service.
## Configuration reference

All configuration is via environment variables, parsed by `pydantic-settings`.
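The real app parses these with `pydantic-settings`; the following is a stdlib-only sketch of the same pattern (the class and field selection are illustrative, not the actual `Settings` class). Required variables fail fast if missing, while optional ones fall back to their documented defaults:

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Illustrative sketch of env-driven settings (the real app uses pydantic-settings)."""

    # Required: raises KeyError at startup if the variable is absent.
    database_url: str = field(default_factory=lambda: os.environ["DATABASE_URL"])
    redis_url: str = field(default_factory=lambda: os.environ["REDIS_URL"])
    # Optional: falls back to the documented default.
    log_level: str = field(default_factory=lambda: os.environ.get("LOG_LEVEL", "INFO"))
    pool_low_threshold: int = field(
        default_factory=lambda: int(os.environ.get("POOL_LOW_THRESHOLD", "100"))
    )


# Example values so the sketch runs standalone:
os.environ.setdefault("DATABASE_URL", "postgresql+asyncpg://user:pass@host:5432/db")
os.environ.setdefault("REDIS_URL", "redis://host:6379/0")
settings = Settings()
print(settings.pool_low_threshold)  # 100 unless POOL_LOW_THRESHOLD is set
```

The same fail-fast behavior applies in the real app: a missing required variable aborts startup rather than surfacing later as a runtime error.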
### Required

| Variable | Description | Example |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://user:pass@host:5432/db` |
| `REDIS_URL` | Redis connection string | `redis://host:6379/0` |
| `SECRET_KEY` | Used for internal signing (API key generation) | Random 64+ character string |
### Application

| Variable | Default | Description |
|---|---|---|
| `APP_NAME` | `proxy-pool` | Application name (appears in logs, OpenAPI docs) |
| `LOG_LEVEL` | `INFO` | Logging level: DEBUG, INFO, WARNING, ERROR |
| `CORS_ORIGINS` | `[]` | Comma-separated list of allowed CORS origins |
| `API_KEY_PREFIX` | `pp_` | Prefix for generated API keys |
### Proxy pipeline

| Variable | Default | Description |
|---|---|---|
| `SCRAPE_TIMEOUT_SECONDS` | `30` | HTTP timeout when fetching proxy sources |
| `SCRAPE_USER_AGENT` | `ProxyPool/0.1` | User-Agent header for scrape requests |
| `CHECK_TCP_TIMEOUT` | `5.0` | Timeout for TCP connect checks |
| `CHECK_HTTP_TIMEOUT` | `10.0` | Timeout for HTTP-level checks |
| `CHECK_PIPELINE_TIMEOUT` | `120` | Overall pipeline timeout per proxy |
| `JUDGE_URL` | `http://httpbin.org/ip` | URL used by the HTTP anonymity checker to determine exit IP |
| `REVALIDATE_ACTIVE_INTERVAL_MINUTES` | `10` | How often active proxies are re-checked |
| `REVALIDATE_DEAD_INTERVAL_HOURS` | `6` | How often dead proxies are re-checked |
| `REVALIDATE_BATCH_SIZE` | `200` | Max proxies per revalidation sweep |
| `POOL_LOW_THRESHOLD` | `100` | Emit `proxy.pool_low` event when active count drops below this |
### Accounts

| Variable | Default | Description |
|---|---|---|
| `DEFAULT_CREDITS` | `100` | Credits granted to new accounts |
| `MAX_LEASE_DURATION_SECONDS` | `3600` | Maximum allowed lease duration |
| `DEFAULT_LEASE_DURATION_SECONDS` | `300` | Default lease duration if not specified |
| `CREDIT_LOW_THRESHOLD` | `10` | Emit `credits.low_balance` when balance drops below this |
### Cleanup

| Variable | Default | Description |
|---|---|---|
| `PRUNE_DEAD_AFTER_DAYS` | `30` | Delete dead proxies older than this |
| `PRUNE_CHECKS_AFTER_DAYS` | `7` | Delete check history older than this |
| `PRUNE_CHECKS_KEEP_LAST` | `100` | Always keep at least this many checks per proxy |
### Notifications

| Variable | Default | Description |
|---|---|---|
| `SMTP_HOST` | (empty) | SMTP server. If empty, the SMTP notifier is disabled. |
| `SMTP_PORT` | `587` | SMTP port |
| `SMTP_USER` | (empty) | SMTP username |
| `SMTP_PASSWORD` | (empty) | SMTP password |
| `ALERT_EMAIL` | (empty) | Recipient for alert emails |
| `WEBHOOK_URL` | (empty) | Webhook URL. If empty, the webhook notifier is disabled. |
### Redis cache

| Variable | Default | Description |
|---|---|---|
| `CACHE_PROXY_LIST_TTL` | `60` | TTL in seconds for cached proxy query results |
| `CACHE_CREDIT_BALANCE_TTL` | `300` | TTL in seconds for cached credit balances |
## Monitoring

### Health check

```bash
curl http://localhost:8000/health
```

Returns 200 with connection status for PostgreSQL and Redis. Use this as a Docker/Kubernetes health check and load balancer target.
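The endpoint can back a Compose-level health check directly. A sketch, assuming the service is named `api` as in the commands above and that `curl` is available in the image:

```yaml
services:
  api:
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
```

With `curl -f`, a non-2xx response makes the check fail, so the container is marked unhealthy when PostgreSQL or Redis connectivity is lost.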
### Key metrics to watch

**Pool health** (`GET /stats/pool`):
- `by_status.active` — The number of working proxies. If this drops suddenly, investigate source failures or upstream blocks.
- `last_scrape_at` — If this is stale, the worker may be down or the scrape task is failing.
- `last_validation_at` — If this is stale, validation is backed up or the worker is stuck.
**Plugin health** (`GET /stats/plugins`):
- Check `notifiers[].healthy` — if a notifier is unhealthy, alerts won't be delivered.
**Worker job queue:** Monitor the Redis keys `arq:queue:default` (pending jobs) and `arq:result:*` (completed/failed jobs). A growing queue indicates the worker can't keep up.
### Log format

Logs are structured JSON in production (`LOG_LEVEL=INFO`):

```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "INFO",
  "message": "scrape_source completed",
  "source_id": "abc-123",
  "proxies_new": 23,
  "duration_ms": 1540
}
```
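Because each log line is a standalone JSON object, ad-hoc analysis takes only a few lines of stdlib Python. For example, pulling out slow tasks via the `duration_ms` field shown above (this helper is a sketch, not part of the codebase):

```python
import json


def slow_events(log_lines, threshold_ms=1000):
    """Return parsed log records whose duration_ms exceeds threshold_ms."""
    matches = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. startup banners)
        if record.get("duration_ms", 0) > threshold_ms:
            matches.append(record)
    return matches


lines = [
    '{"level": "INFO", "message": "scrape_source completed", "duration_ms": 1540}',
    '{"level": "INFO", "message": "check_proxy completed", "duration_ms": 120}',
    "not json",
]
print([r["message"] for r in slow_events(lines)])  # ['scrape_source completed']
```

The same one-record-per-line property makes the logs directly consumable by `jq` or any log aggregator that speaks JSON.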
## Alerting

The built-in notification system handles operational alerts:
- `proxy.pool_low` — Active proxy count below threshold. Action: add more sources or investigate why proxies are dying.
- `source.failed` — A scrape failed. Usually transient (upstream 503). Investigate if persistent.
- `source.stale` — A source hasn't produced results in N hours. The source may be dead or blocking your scraper.
- `credits.low_balance` / `credits.exhausted` — User account alerts. No operational action needed unless it's your own account.
## Troubleshooting

### Proxies are all dying

Symptoms: `by_status.active` dropping, `by_status.dead` increasing.

Possible causes:
- The judge URL (`JUDGE_URL`) is down or rate-limiting you. Check whether `httpbin.org/ip` is accessible from your server.
- Your server's IP is blocked by proxy providers. Try from a different IP or use a self-hosted judge endpoint.
- Proxy sources are returning stale lists. Check `last_scraped_at` on sources.
Fix: Self-host a simple judge endpoint (a Flask/FastAPI app that returns `{"ip": request.remote_addr}`) to eliminate the dependency on httpbin.
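A judge needs nothing beyond the standard library. The sketch below (the port is arbitrary; the `{"ip": ...}` response shape follows the note above) serves the caller's apparent IP as JSON:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class JudgeHandler(BaseHTTPRequestHandler):
    """Minimal judge endpoint: reports the caller's apparent IP as JSON."""

    def do_GET(self):
        body = json.dumps({"ip": self.client_address[0]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request stderr noise


if __name__ == "__main__":
    # Once deployed, point JUDGE_URL at http://<this-host>:9000/
    HTTPServer(("0.0.0.0", 9000), JudgeHandler).serve_forever()
```

A request arriving through a proxy reports the proxy's exit IP here, which is exactly what the anonymity check compares against your real address.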
### Worker is not processing jobs

Symptoms: `last_scrape_at` and `last_validation_at` are stale. The Redis queue is growing.

Check:

```bash
docker compose logs worker --tail=50
docker compose exec redis redis-cli LLEN arq:queue:default
```

Possible causes:
- Worker process crashed. Restart it: `docker compose restart worker`.
- Redis connection lost. Check Redis health: `docker compose exec redis redis-cli ping`.
- A task is stuck (infinite loop or hung network call). Check `CHECK_PIPELINE_TIMEOUT`.
### Database connections exhausted

Symptoms: `asyncpg.exceptions.TooManyConnectionsError` or slow queries.

Fix: Reduce the connection pool size in `DATABASE_URL` parameters, or deploy PgBouncer. The default asyncpg pool size is 10 connections per process — with 3 API instances and 1 worker, that's 40 connections. PostgreSQL's default limit is 100.

```bash
# In DATABASE_URL or via SQLAlchemy pool config
DATABASE_POOL_SIZE=5
DATABASE_MAX_OVERFLOW=10
```
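The sizing arithmetic above can be sanity-checked with a tiny helper (illustrative only; it assumes every process can fill both its pool and its overflow at once, the worst case):

```python
def connection_budget(processes: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case PostgreSQL connections if every process exhausts pool + overflow."""
    return processes * (pool_size + max_overflow)


# Defaults from the text: pool of 10 per process, 3 API instances + 1 worker.
print(connection_budget(4, 10, 0))   # 40 — within PostgreSQL's default limit of 100
# With the reduced settings above across the same 4 processes:
print(connection_budget(4, 5, 10))   # 60 worst case; steady state is only 4 * 5 = 20
```

Overflow connections are opened on demand and closed when idle, so the second configuration usually holds fewer connections than the first while still tolerating bursts.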
### Redis memory growing

Symptoms: Redis memory usage increasing over time.

Possible causes:
- ARQ job results not expiring. Check the `keep_result` setting.
- Proxy cache not being invalidated. Verify `CACHE_PROXY_LIST_TTL` is set.
- Lease keys not expiring (should auto-expire via TTL).

Fix: Set a Redis maxmemory policy:

```
maxmemory 256mb
maxmemory-policy allkeys-lru
```
### Migration failed

Symptoms: `alembic upgrade head` errors.

Steps:
1. Check the current state: `uv run alembic current`.
2. Look at the error — usually a constraint violation or type mismatch.
3. If the migration is partially applied, you may need to manually fix the state: `uv run alembic stamp <revision>`.
4. For production, always test migrations against a copy of the production database first.
## Backup and recovery

### Database backup

```bash
# Dump
docker compose exec postgres pg_dump -U proxypool proxypool > backup.sql

# Restore
docker compose exec -T postgres psql -U proxypool proxypool < backup.sql
```
### Redis

For proxy-pool, Redis data is ephemeral (cache + queue). Losing Redis state means:
- Cached proxy lists are rebuilt on the next query (minor latency spike).
- Active leases are lost (the `expire_leases` task will clean up PostgreSQL state).
- Pending ARQ jobs are lost (the next cron cycle will re-enqueue them).

If lease integrity is critical, enable Redis persistence (AOF recommended):

```
appendonly yes
appendfsync everysec
```