proxy-pool/docs/07-operations-guide.md


# Operations guide

## Deployment

### Docker Compose (single-server)

The simplest deployment for small-to-medium workloads. All services run on a single machine.

```bash
# Clone and configure
git clone <repo-url> proxy-pool && cd proxy-pool
cp .env.example .env
# Edit .env with production values

# Build and start
docker compose build
docker compose --profile migrate up -d migrate   # Run migrations
docker compose up -d api worker                  # Start services
```

### Production considerations

**API scaling:** Run multiple API instances behind a load balancer. The API is stateless — any instance can handle any request. In Docker Compose, use `docker compose up -d --scale api=3`.

**Worker scaling:** Typically 1-2 worker instances are sufficient. ARQ deduplicates jobs via Redis, so multiple workers don't cause duplicate work. Scale workers if validation throughput is a bottleneck.

**Database:** Use a managed PostgreSQL service (AWS RDS, GCP Cloud SQL, etc.) in production. Enable connection pooling (PgBouncer) if running more than ~10 API instances.

**Redis:** A single Redis instance is sufficient for most workloads. Enable persistence (AOF or RDB snapshots) if you want lease state to survive Redis restarts. For high availability, use Redis Sentinel or a managed Redis service.

## Configuration reference

All configuration is via environment variables, parsed by `pydantic-settings`.
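As an illustration of how those variables map onto typed settings, here is a minimal stdlib-only sketch — the real project defines a `pydantic-settings` class, so the class and field names below are assumptions drawn from the tables that follow:

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    """Illustrative subset of the app settings (not the real class)."""
    database_url: str
    redis_url: str
    log_level: str = "INFO"
    default_credits: int = 100

    @classmethod
    def from_env(cls) -> "Settings":
        # Required variables raise KeyError if missing, failing fast at startup.
        return cls(
            database_url=os.environ["DATABASE_URL"],
            redis_url=os.environ["REDIS_URL"],
            log_level=os.getenv("LOG_LEVEL", "INFO"),
            default_credits=int(os.getenv("DEFAULT_CREDITS", "100")),
        )
```

`pydantic-settings` adds validation and type coercion on top of this pattern, but the env-to-field mapping is the same idea.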

### Required

| Variable | Description | Example |
| --- | --- | --- |
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://user:pass@host:5432/db` |
| `REDIS_URL` | Redis connection string | `redis://host:6379/0` |
| `SECRET_KEY` | Used for internal signing (API key generation) | Random string of 64+ characters |
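`SECRET_KEY` just needs to be long and random; the stdlib `secrets` module is one way to generate a suitable value:

```python
import secrets

# 48 random bytes encode to a 64-character URL-safe string,
# which satisfies the 64+ character recommendation above.
key = secrets.token_urlsafe(48)
print(key)       # different every run
print(len(key))  # 64
```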

### Application

| Variable | Default | Description |
| --- | --- | --- |
| `APP_NAME` | `proxy-pool` | Application name (appears in logs, OpenAPI docs) |
| `LOG_LEVEL` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `CORS_ORIGINS` | `[]` | Comma-separated list of allowed CORS origins |
| `API_KEY_PREFIX` | `pp_` | Prefix for generated API keys |

### Proxy pipeline

| Variable | Default | Description |
| --- | --- | --- |
| `SCRAPE_TIMEOUT_SECONDS` | `30` | HTTP timeout when fetching proxy sources |
| `SCRAPE_USER_AGENT` | `ProxyPool/0.1` | User-Agent header for scrape requests |
| `CHECK_TCP_TIMEOUT` | `5.0` | Timeout (seconds) for TCP connect checks |
| `CHECK_HTTP_TIMEOUT` | `10.0` | Timeout (seconds) for HTTP-level checks |
| `CHECK_PIPELINE_TIMEOUT` | `120` | Overall pipeline timeout (seconds) per proxy |
| `JUDGE_URL` | `http://httpbin.org/ip` | URL used by the HTTP anonymity checker to determine the exit IP |
| `REVALIDATE_ACTIVE_INTERVAL_MINUTES` | `10` | How often active proxies are re-checked |
| `REVALIDATE_DEAD_INTERVAL_HOURS` | `6` | How often dead proxies are re-checked |
| `REVALIDATE_BATCH_SIZE` | `200` | Max proxies per revalidation sweep |
| `POOL_LOW_THRESHOLD` | `100` | Emit a `proxy.pool_low` event when the active count drops below this |

### Accounts

| Variable | Default | Description |
| --- | --- | --- |
| `DEFAULT_CREDITS` | `100` | Credits granted to new accounts |
| `MAX_LEASE_DURATION_SECONDS` | `3600` | Maximum allowed lease duration |
| `DEFAULT_LEASE_DURATION_SECONDS` | `300` | Default lease duration if not specified |
| `CREDIT_LOW_THRESHOLD` | `10` | Emit `credits.low_balance` when a balance drops below this |

### Cleanup

| Variable | Default | Description |
| --- | --- | --- |
| `PRUNE_DEAD_AFTER_DAYS` | `30` | Delete dead proxies older than this |
| `PRUNE_CHECKS_AFTER_DAYS` | `7` | Delete check history older than this |
| `PRUNE_CHECKS_KEEP_LAST` | `100` | Always keep at least this many checks per proxy |

### Notifications

| Variable | Default | Description |
| --- | --- | --- |
| `SMTP_HOST` | (empty) | SMTP server. If empty, the SMTP notifier is disabled. |
| `SMTP_PORT` | `587` | SMTP port |
| `SMTP_USER` | (empty) | SMTP username |
| `SMTP_PASSWORD` | (empty) | SMTP password |
| `ALERT_EMAIL` | (empty) | Recipient for alert emails |
| `WEBHOOK_URL` | (empty) | Webhook URL. If empty, the webhook notifier is disabled. |

### Redis cache

| Variable | Default | Description |
| --- | --- | --- |
| `CACHE_PROXY_LIST_TTL` | `60` | TTL in seconds for cached proxy query results |
| `CACHE_CREDIT_BALANCE_TTL` | `300` | TTL in seconds for cached credit balances |

## Monitoring

### Health check

```bash
curl http://localhost:8000/health
```

Returns 200 with connection status for PostgreSQL and Redis. Use this as a Docker/Kubernetes health check and load balancer target.
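For container healthchecks where `curl` may not be installed in the image, a tiny probe along these lines works too — a sketch, with the `/health` path and port taken from the example above:

```python
import urllib.request
from urllib.error import URLError


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout: all count as unhealthy.
        return False
```

A wrapper script can call `sys.exit(0 if is_healthy("http://localhost:8000/health") else 1)` so Docker or Kubernetes marks the container unhealthy on failure.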

### Key metrics to watch

**Pool health** (`GET /stats/pool`):

- `by_status.active` — the number of working proxies. If this drops suddenly, investigate source failures or upstream blocks.
- `last_scrape_at` — if this is stale, the worker may be down or the scrape task is failing.
- `last_validation_at` — if this is stale, validation is backed up or the worker is stuck.

**Plugin health** (`GET /stats/plugins`):

- Check `notifiers[].healthy` — if a notifier is unhealthy, alerts won't be delivered.

**Worker job queue:** Monitor the Redis keys `arq:queue:default` (pending jobs) and `arq:result:*` (completed/failed jobs). A growing queue indicates the worker can't keep up.

### Log format

Logs are structured JSON in production (`LOG_LEVEL=INFO`):

```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "INFO",
  "message": "scrape_source completed",
  "source_id": "abc-123",
  "proxies_new": 23,
  "duration_ms": 1540
}
```
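If you need the same shape in ad-hoc tooling, a stdlib `logging` formatter produces it — a sketch only; the project's actual formatter and the `extra_fields` convention here are assumptions:

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render records as single-line JSON, roughly matching the shape above."""

    converter = time.gmtime  # timestamps in UTC, matching the trailing "Z"

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields passed via logger.info(..., extra={"extra_fields": {...}})
        entry.update(getattr(record, "extra_fields", {}))
        return json.dumps(entry)
```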

## Alerting

The built-in notification system handles operational alerts:

- `proxy.pool_low` — active proxy count is below the threshold. Action: add more sources or investigate why proxies are dying.
- `source.failed` — a scrape failed. Usually transient (upstream 503). Investigate if persistent.
- `source.stale` — a source hasn't produced results in N hours. The source may be dead or blocking your scraper.
- `credits.low_balance` / `credits.exhausted` — user account alerts. No operational action needed unless it's your own account.

## Troubleshooting

### Proxies are all dying

**Symptoms:** `by_status.active` dropping, `by_status.dead` increasing.

Possible causes:

- The judge URL (`JUDGE_URL`) is down or rate-limiting you. Check whether `httpbin.org/ip` is accessible from your server.
- Your server's IP is blocked by proxy providers. Try from a different IP, or use a self-hosted judge endpoint.
- Proxy sources are returning stale lists. Check `last_scraped_at` on sources.

**Fix:** Self-host a simple judge endpoint (a Flask/FastAPI app that returns `{"ip": request.remote_addr}`) to eliminate the dependency on httpbin.
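A stdlib-only version of that judge is small enough to inline — a sketch that returns `{"ip": ...}` to match the `JUDGE_URL` contract described above (the port is an arbitrary choice):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class JudgeHandler(BaseHTTPRequestHandler):
    """Minimal judge endpoint: echo the caller's IP back as JSON."""

    def do_GET(self):
        body = json.dumps({"ip": self.client_address[0]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep stdout quiet; real deployments should log properly


def run(port: int = 8080) -> None:
    """Serve the judge forever on the given port."""
    HTTPServer(("0.0.0.0", port), JudgeHandler).serve_forever()
```

Point `JUDGE_URL` at wherever you host this; requests arriving through a proxy will then report the proxy's exit IP.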

### Worker is not processing jobs

**Symptoms:** `last_scrape_at` and `last_validation_at` are stale. The Redis queue is growing.

Check:

```bash
docker compose logs worker --tail=50
docker compose exec redis redis-cli LLEN arq:queue:default
```

Possible causes:

- Worker process crashed. Restart it: `docker compose restart worker`.
- Redis connection lost. Check Redis health: `docker compose exec redis redis-cli ping`.
- A task is stuck (infinite loop or hung network call). Check `CHECK_PIPELINE_TIMEOUT`.

### Database connections exhausted

**Symptoms:** `asyncpg.exceptions.TooManyConnectionsError` or slow queries.

**Fix:** Reduce the connection pool size in `DATABASE_URL` parameters, or deploy PgBouncer. The default asyncpg pool size is 10 connections per process — with 3 API instances and 1 worker, that's 40 connections. PostgreSQL's default limit is 100.

```bash
# In DATABASE_URL or via SQLAlchemy pool config
DATABASE_POOL_SIZE=5
DATABASE_MAX_OVERFLOW=10
```
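The arithmetic generalizes; a helper like this (illustrative, not part of the codebase) makes the worst case explicit so you can size it against PostgreSQL's `max_connections`:

```python
def max_pg_connections(api_instances: int, workers: int,
                       pool_size: int = 10, max_overflow: int = 0) -> int:
    """Worst-case PostgreSQL connections held by the app tier:
    every process can open up to pool_size + max_overflow connections."""
    return (api_instances + workers) * (pool_size + max_overflow)


# 3 API instances + 1 worker at the default pool size of 10:
print(max_pg_connections(3, 1))  # 40
```

Note that overflow connections count toward the worst case too, so a small `pool_size` with a large `max_overflow` only helps if bursts across processes don't coincide.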

### Redis memory growing

**Symptoms:** Redis memory usage increasing over time.

Possible causes:

- ARQ job results not expiring. Check the `keep_result` setting.
- Proxy cache not being invalidated. Verify `CACHE_PROXY_LIST_TTL` is set.
- Lease keys not expiring (they should auto-expire via TTL).

**Fix:** Set a Redis `maxmemory` policy:

```conf
maxmemory 256mb
maxmemory-policy allkeys-lru
```

Be aware that `allkeys-lru` can evict keys that have no TTL, including pending ARQ queue entries. If the same Redis instance holds the job queue, `volatile-lru` (or `noeviction` plus fixing the underlying leak) is the safer choice.

### Migration failed

**Symptoms:** `alembic upgrade head` errors.

Steps:

1. Check the current state: `uv run alembic current`.
2. Look at the error — usually a constraint violation or type mismatch.
3. If the migration is partially applied, you may need to manually fix the state: `uv run alembic stamp <revision>`.
4. For production, always test migrations against a copy of the production database first.

## Backup and recovery

### Database backup

```bash
# Dump
docker compose exec postgres pg_dump -U proxypool proxypool > backup.sql

# Restore
docker compose exec -T postgres psql -U proxypool proxypool < backup.sql
```
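When the dump runs from cron, a small rotation helper keeps disk usage bounded — a sketch; the filename pattern and retention count are assumptions, not project conventions:

```python
import datetime
import pathlib


def backup_filename(prefix: str = "proxypool") -> str:
    """Timestamped dump name, e.g. proxypool-20250115T103000Z.sql."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return f"{prefix}-{now:%Y%m%dT%H%M%SZ}.sql"


def prune_backups(directory: pathlib.Path, keep: int = 7) -> list[pathlib.Path]:
    """Delete all but the newest `keep` .sql dumps; return what was removed.
    Timestamped names sort lexicographically, newest first after reverse."""
    dumps = sorted(directory.glob("*.sql"), key=lambda p: p.name, reverse=True)
    removed = dumps[keep:]
    for path in removed:
        path.unlink()
    return removed
```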

### Redis

For proxy-pool, Redis data is ephemeral (cache + queue). Losing Redis state means:

- Cached proxy lists are rebuilt on the next query (minor latency spike).
- Active leases are lost (the `expire_leases` task will clean up PostgreSQL state).
- Pending ARQ jobs are lost (the next cron cycle will re-enqueue them).

If lease integrity is critical, enable Redis persistence (AOF recommended):

```conf
appendonly yes
appendfsync everysec
```