# Operations guide
## Deployment

### Docker Compose (single-server)

The simplest deployment for small-to-medium workloads. All services run on a single machine.

```bash
# Clone and configure
git clone <repo-url> proxy-pool && cd proxy-pool
cp .env.example .env
# Edit .env with production values

# Build and start
docker compose build
docker compose --profile migrate up -d migrate  # Run migrations
docker compose up -d api worker                 # Start services
```

### Production considerations

**API scaling**: Run multiple API instances behind a load balancer. The API is stateless — any instance can handle any request. In Docker Compose, use `docker compose up -d --scale api=3`.

**Worker scaling**: Typically 1-2 worker instances are sufficient. ARQ deduplicates jobs via Redis, so multiple workers don't cause duplicate work. Scale workers if validation throughput is a bottleneck.

**Database**: Use a managed PostgreSQL service (AWS RDS, GCP Cloud SQL, etc.) for production. Enable connection pooling (PgBouncer) if running more than ~10 API instances.

**Redis**: A single Redis instance is sufficient for most workloads. Enable persistence (AOF or RDB snapshots) if you want lease state to survive Redis restarts. For high availability, use Redis Sentinel or a managed Redis service.
## Configuration reference

All configuration is via environment variables, parsed by `pydantic-settings`.
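As a sketch of the pattern (using only the standard library here; the application itself declares these on a `pydantic-settings` `BaseSettings` subclass, and the helper name below is illustrative):

```python
import os

# Illustrative stand-in for the pydantic-settings pattern: required values
# must be present in the environment, optional ones fall back to the
# documented defaults. The real app declares these on a BaseSettings class.
def load_settings() -> dict:
    def require(name: str) -> str:
        value = os.environ.get(name)
        if value is None:
            raise RuntimeError(f"missing required setting: {name}")
        return value

    return {
        "database_url": require("DATABASE_URL"),
        "redis_url": require("REDIS_URL"),
        "secret_key": require("SECRET_KEY"),
        "app_name": os.environ.get("APP_NAME", "proxy-pool"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```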

### Required

| Variable | Description | Example |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection string | `postgresql+asyncpg://user:pass@host:5432/db` |
| `REDIS_URL` | Redis connection string | `redis://host:6379/0` |
| `SECRET_KEY` | Used for internal signing (API key generation) | Random 64+ character string |
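A `SECRET_KEY` of the required shape can be generated with the standard library (the function name is just for illustration; any source of strong randomness works):

```python
import secrets

# Generate a URL-safe random string suitable for SECRET_KEY.
# token_urlsafe(48) encodes 48 random bytes as exactly 64 base64url chars.
def generate_secret_key(num_bytes: int = 48) -> str:
    return secrets.token_urlsafe(num_bytes)
```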

### Application

| Variable | Default | Description |
|----------|---------|-------------|
| `APP_NAME` | `proxy-pool` | Application name (appears in logs, OpenAPI docs) |
| `LOG_LEVEL` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `CORS_ORIGINS` | `[]` | Comma-separated list of allowed CORS origins |
| `API_KEY_PREFIX` | `pp_` | Prefix for generated API keys |

### Proxy pipeline

| Variable | Default | Description |
|----------|---------|-------------|
| `SCRAPE_TIMEOUT_SECONDS` | `30` | HTTP timeout when fetching proxy sources |
| `SCRAPE_USER_AGENT` | `ProxyPool/0.1` | User-Agent header for scrape requests |
| `CHECK_TCP_TIMEOUT` | `5.0` | Timeout in seconds for TCP connect checks |
| `CHECK_HTTP_TIMEOUT` | `10.0` | Timeout in seconds for HTTP-level checks |
| `CHECK_PIPELINE_TIMEOUT` | `120` | Overall pipeline timeout per proxy, in seconds |
| `JUDGE_URL` | `http://httpbin.org/ip` | URL used by the HTTP anonymity checker to determine exit IP |
| `REVALIDATE_ACTIVE_INTERVAL_MINUTES` | `10` | How often active proxies are re-checked |
| `REVALIDATE_DEAD_INTERVAL_HOURS` | `6` | How often dead proxies are re-checked |
| `REVALIDATE_BATCH_SIZE` | `200` | Max proxies per revalidation sweep |
| `POOL_LOW_THRESHOLD` | `100` | Emit `proxy.pool_low` event when active count drops below this |

### Accounts

| Variable | Default | Description |
|----------|---------|-------------|
| `DEFAULT_CREDITS` | `100` | Credits granted to new accounts |
| `MAX_LEASE_DURATION_SECONDS` | `3600` | Maximum allowed lease duration |
| `DEFAULT_LEASE_DURATION_SECONDS` | `300` | Default lease duration if not specified |
| `CREDIT_LOW_THRESHOLD` | `10` | Emit `credits.low_balance` when balance drops below this |

### Cleanup

| Variable | Default | Description |
|----------|---------|-------------|
| `PRUNE_DEAD_AFTER_DAYS` | `30` | Delete dead proxies older than this |
| `PRUNE_CHECKS_AFTER_DAYS` | `7` | Delete check history older than this |
| `PRUNE_CHECKS_KEEP_LAST` | `100` | Always keep at least this many checks per proxy |

### Notifications

| Variable | Default | Description |
|----------|---------|-------------|
| `SMTP_HOST` | (empty) | SMTP server. If empty, SMTP notifier is disabled. |
| `SMTP_PORT` | `587` | SMTP port |
| `SMTP_USER` | (empty) | SMTP username |
| `SMTP_PASSWORD` | (empty) | SMTP password |
| `ALERT_EMAIL` | (empty) | Recipient for alert emails |
| `WEBHOOK_URL` | (empty) | Webhook URL. If empty, webhook notifier is disabled. |

### Redis cache

| Variable | Default | Description |
|----------|---------|-------------|
| `CACHE_PROXY_LIST_TTL` | `60` | TTL in seconds for cached proxy query results |
| `CACHE_CREDIT_BALANCE_TTL` | `300` | TTL in seconds for cached credit balances |

## Monitoring

### Health check

```bash
curl http://localhost:8000/health
```

Returns `200` with connection status for PostgreSQL and Redis. Use this as a Docker/Kubernetes health check and load balancer target.
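In deploy scripts it can be handy to block until this endpoint answers; a minimal stdlib sketch (URL and timings are just defaults, adjust to your setup):

```python
import time
import urllib.request

# Readiness probe: poll /health until it answers 200 or the deadline passes.
def wait_for_healthy(url: str = "http://localhost:8000/health",
                     timeout: float = 60.0, interval: float = 2.0) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet: connection refused, timeout, HTTP error, ...
        time.sleep(interval)
    return False
```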

### Key metrics to watch

**Pool health** (`GET /stats/pool`):

- `by_status.active` — The number of working proxies. If this drops suddenly, investigate source failures or upstream blocks.
- `last_scrape_at` — If this is stale, the worker may be down or the scrape task is failing.
- `last_validation_at` — If this is stale, validation is backed up or the worker is stuck.

**Plugin health** (`GET /stats/plugins`):

- Check `notifiers[].healthy` — if a notifier is unhealthy, alerts won't be delivered.

**Worker job queue**: Monitor Redis keys `arq:queue:default` (pending jobs) and `arq:result:*` (completed/failed jobs). A growing queue indicates the worker can't keep up.
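If you want to poll the queue depth from a monitoring script without `redis-cli` or extra dependencies, a raw RESP exchange is enough. A sketch (assumes the queue key named above and that `LLEN` applies to it, as the operational checks in this guide do):

```python
import socket

# Encode a Redis command as a RESP array, e.g. LLEN <key>.
def encode_command(*parts: str) -> bytes:
    out = f"*{len(parts)}\r\n"
    for part in parts:
        out += f"${len(part)}\r\n{part}\r\n"
    return out.encode()

# RESP integer replies look like b":17\r\n".
def parse_integer_reply(reply: bytes) -> int:
    if not reply.startswith(b":"):
        raise RuntimeError(f"unexpected Redis reply: {reply!r}")
    return int(reply[1:reply.index(b"\r\n")])

def queue_depth(host: str = "localhost", port: int = 6379,
                key: str = "arq:queue:default") -> int:
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(encode_command("LLEN", key))
        return parse_integer_reply(sock.recv(64))
```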

### Log format

Logs are structured JSON in production (`LOG_LEVEL=INFO`):

```json
{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "INFO",
  "message": "scrape_source completed",
  "source_id": "abc-123",
  "proxies_new": 23,
  "duration_ms": 1540
}
```
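Records like this are easy to post-process. For example, a sketch that flags slow scrapes from a stream of log lines (field names taken from the sample record above; the threshold is arbitrary):

```python
import json

# Yield (source_id, duration_ms) for completed scrapes slower than threshold.
def slow_scrapes(lines, threshold_ms: int = 5000):
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as tracebacks
        if (record.get("message") == "scrape_source completed"
                and record.get("duration_ms", 0) > threshold_ms):
            yield record["source_id"], record["duration_ms"]
```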

### Alerting

The built-in notification system handles operational alerts:

- `proxy.pool_low` — Active proxy count below threshold. Action: add more sources or investigate why proxies are dying.
- `source.failed` — A scrape failed. Usually transient (upstream 503). Investigate if persistent.
- `source.stale` — A source hasn't produced results in N hours. The source may be dead or blocking your scraper.
- `credits.low_balance` / `credits.exhausted` — User account alerts. No operational action needed unless it's your own account.

## Troubleshooting

### Proxies are all dying

**Symptoms**: `by_status.active` dropping, `by_status.dead` increasing.

**Possible causes**:

- The judge URL (`JUDGE_URL`) is down or rate-limiting you. Check if `httpbin.org/ip` is accessible from your server.
- Your server's IP is blocked by proxy providers. Try from a different IP or use a self-hosted judge endpoint.
- Proxy sources are returning stale lists. Check `last_scraped_at` on sources.

**Fix**: Self-host a simple judge endpoint (a Flask/FastAPI app that returns `{"ip": request.remote_addr}`) to eliminate the dependency on httpbin.
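A dependency-free variant of that judge endpoint, using only the Python standard library (the port and the `/ip` path are arbitrary choices; point `JUDGE_URL` at wherever you deploy it):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal judge: report the caller's IP as JSON, mirroring httpbin.org/ip.
class JudgeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ip":
            self.send_error(404)
            return
        body = json.dumps({"ip": self.client_address[0]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass

def serve(port: int = 9000) -> None:
    HTTPServer(("0.0.0.0", port), JudgeHandler).serve_forever()
```

Run it with a `serve()` call and set `JUDGE_URL` to the resulting `http://<host>:9000/ip`.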

### Worker is not processing jobs

**Symptooms**: `last_scrape_at` and `last_validation_at` are stale. The Redis queue is growing.

**Check**:

```bash
docker compose logs worker --tail=50
docker compose exec redis redis-cli LLEN arq:queue:default
```

**Possible causes**:

- The worker process crashed. Restart it: `docker compose restart worker`.
- The Redis connection was lost. Check Redis health: `docker compose exec redis redis-cli ping`.
- A task is stuck (infinite loop or hung network call). Check `CHECK_PIPELINE_TIMEOUT`.

### Database connections exhausted

**Symptoms**: `asyncpg.exceptions.TooManyConnectionsError` or slow queries.

**Fix**: Reduce the connection pool size in `DATABASE_URL` parameters, or deploy PgBouncer. The default asyncpg pool size is 10 connections per process — with 3 API instances and 1 worker, that's 40 connections. PostgreSQL's default limit is 100.

```env
# In DATABASE_URL or via SQLAlchemy pool config
DATABASE_POOL_SIZE=5
DATABASE_MAX_OVERFLOW=10
```
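The arithmetic above generalizes. A small helper for budgeting connections before scaling out (the function is illustrative, not part of the codebase): peak usage is processes × (pool size + overflow), and it must stay under PostgreSQL's `max_connections`.

```python
# Sanity-check connection budgeting across all API and worker processes.
def connection_budget_ok(processes: int, pool_size: int = 5,
                         max_overflow: int = 10,
                         pg_max_connections: int = 100) -> bool:
    peak = processes * (pool_size + max_overflow)
    return peak < pg_max_connections
```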

### Redis memory growing

**Symptoms**: Redis memory usage increasing over time.

**Possible causes**:

- ARQ job results are not expiring. Check the `keep_result` setting.
- The proxy cache is not being invalidated. Verify `CACHE_PROXY_LIST_TTL` is set.
- Lease keys are not expiring (they should auto-expire via TTL).

**Fix**: Set a Redis `maxmemory` policy:

```
maxmemory 256mb
maxmemory-policy allkeys-lru
```

Note that `allkeys-lru` can evict any key under memory pressure, including pending jobs and lease keys. If that risk matters, prefer `volatile-lru` (which only evicts keys that have a TTL) and fix the underlying growth instead.
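If job results are the culprit, retention is controlled on the worker. A sketch of where `keep_result` lives, assuming a standard ARQ `WorkerSettings` class (values illustrative; verify against your worker module):

```python
from arq.connections import RedisSettings

class WorkerSettings:
    redis_settings = RedisSettings(host="localhost", port=6379)
    functions: list = []   # your task functions go here
    keep_result = 300      # seconds to keep arq:result:* keys before expiry
```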

### Migration failed

**Symptoms**: `alembic upgrade head` errors.

**Steps**:

1. Check the current state: `uv run alembic current`.
2. Look at the error — usually a constraint violation or type mismatch.
3. If the migration is partially applied, you may need to manually fix the state: `uv run alembic stamp <revision>`.
4. For production, always test migrations against a copy of the production database first.

## Backup and recovery

### Database backup

```bash
# Dump (-T disables TTY allocation so the output stream isn't mangled)
docker compose exec -T postgres pg_dump -U proxypool proxypool > backup.sql

# Restore
docker compose exec -T postgres psql -U proxypool proxypool < backup.sql
```

### Redis

For proxy pool, Redis data is ephemeral (cache + queue). Losing Redis state means:

- Cached proxy lists are rebuilt on the next query (minor latency spike).
- Active leases are lost (the `expire_leases` task will clean up PostgreSQL state).
- Pending ARQ jobs are lost (the next cron cycle will re-enqueue them).

If lease integrity is critical, enable Redis persistence (AOF recommended):

```
appendonly yes
appendfsync everysec
```