docs: update README.md

agatha 2026-03-15 18:00:35 -04:00
parent 5082085c6d
commit c69336a823

# proxy-pool

An async proxy pool backend that discovers, validates, and serves free proxy servers. Built with FastAPI, SQLAlchemy 2.0, asyncpg, ARQ, and Redis.
## What it does
Proxy Pool automates the full lifecycle of free proxy management:
- **Discovers** proxies by scraping configurable sources on a schedule
- **Validates** proxies through a multi-stage pipeline (TCP liveness, HTTP anonymity detection, exit IP identification)
- **Serves** working proxies through a query API with filtering by protocol, country, anonymity level, score, and latency
- **Manages access** through API key authentication and a credit-based acquisition system with exclusive leases
## Architecture
The system runs as two processes sharing a PostgreSQL database and Redis instance:
- **API server** — FastAPI application handling all HTTP requests
- **ARQ worker** — Background task processor running scrape, validation, and cleanup jobs on cron schedules
A plugin system allows extending functionality without modifying core code. Plugins can add new proxy list parsers, validation methods, and notification channels.
See `docs/01-architecture.md` for the full architecture overview.
## Quick start
### Prerequisites
- Python 3.12+
- [uv](https://docs.astral.sh/uv/)
- Docker and Docker Compose
### 1. Clone and install
```bash
git clone <repo-url> proxy-pool
cd proxy-pool
uv sync --all-extras
```
### 2. Configure
```bash
cp .env.example .env
```
Edit `.env` with your settings. For local development with Docker infrastructure:
```env
DB_URL=postgresql+asyncpg://proxypool:proxypool@localhost:5432/proxypool
REDIS_URL=redis://localhost:6379/0
SECRET_KEY=change-me-to-something-random
```
### 3. Start infrastructure
```bash
docker compose up -d postgres redis
```
### 4. Run migrations
```bash
uv run alembic upgrade head
```
### 5. Start the application
```bash
# API server (with hot reload)
uv run uvicorn proxy_pool.app:create_app --factory --reload --port 8000
# In a separate terminal: background worker
uv run arq proxy_pool.worker.settings.WorkerSettings
```
The API is available at `http://localhost:8000`. Interactive docs are at `http://localhost:8000/docs`.
### 6. Add a proxy source
```bash
curl -X POST http://localhost:8000/sources \
-H "Content-Type: application/json" \
-d '{
"url": "https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt",
"parser_name": "plaintext"
}'
```
Trigger an immediate scrape:
```bash
curl -X POST http://localhost:8000/sources/{source_id}/scrape
```
### 7. Query the pool
```bash
curl "http://localhost:8000/proxies?status=active&min_score=0.5&limit=10"
```
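The same filters can be assembled from Python with only the standard library. A minimal sketch — parameter names are taken from the curl example above, and `BASE_URL` assumes the local dev server from the quick start:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local dev server from the quick start

def build_proxies_query(status="active", min_score=0.5, limit=10, **filters):
    """Build a /proxies query URL from filter keyword arguments."""
    params = {"status": status, "min_score": min_score, "limit": limit, **filters}
    return f"{BASE_URL}/proxies?{urlencode(params)}"

# e.g. restrict to US-based HTTP proxies
url = build_proxies_query(protocol="http", country="US")
```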
## Docker deployment
Run the full stack in Docker:
```bash
make docker-up
```
This builds the images, runs migrations, and starts the API and worker. Services are available at `http://localhost:8000`.
To stop:
```bash
docker compose down
```
## API overview
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check (Postgres + Redis status) |
| `/sources` | GET, POST | List and create proxy sources |
| `/sources/{id}` | GET, PATCH, DELETE | Manage individual sources |
| `/sources/{id}/scrape` | POST | Trigger immediate scrape |
| `/proxies` | GET | Query proxies with filtering and sorting |
| `/proxies/{id}` | GET | Get proxy details |
| `/proxies/acquire` | POST | Acquire a proxy with an exclusive lease (requires auth) |
| `/proxies/acquire/{id}/release` | POST | Release a lease early |
| `/auth/register` | POST | Create account and get API key |
| `/auth/keys` | GET, POST | List and create API keys |
| `/auth/keys/{id}` | DELETE | Revoke an API key |
| `/account` | GET | Account info |
| `/account/credits` | GET | Credit balance and history |
See `docs/04-api-reference.md` for full request/response documentation.
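As a sketch of the acquire/release round trip, the requests below are built with the standard library. The `X-API-Key` header name and the request body fields are illustrative assumptions, not confirmed here — see `docs/04-api-reference.md` for the real contract:

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"
API_KEY = "your-api-key"  # obtained from /auth/register

def acquire_request(api_key: str, **criteria) -> Request:
    """Build POST /proxies/acquire. X-API-Key is an assumed header name."""
    return Request(
        f"{BASE_URL}/proxies/acquire",
        data=json.dumps(criteria).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def release_request(api_key: str, lease_id: str) -> Request:
    """Build POST /proxies/acquire/{id}/release to end a lease early."""
    return Request(
        f"{BASE_URL}/proxies/acquire/{lease_id}/release",
        headers={"X-API-Key": api_key},
        method="POST",
    )

req = acquire_request(API_KEY, protocol="http", min_score=0.7)
```

Sending the request (e.g. with `urllib.request.urlopen` or httpx) and reading the lease id from the response is left to the caller.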
## Plugin system
Proxy Pool uses a plugin architecture with three extension points:
- **Source parsers** — Extract proxies from different list formats
- **Proxy checkers** — Validate proxies in a staged pipeline
- **Notifiers** — React to system events (pool health, credit alerts)
Plugins implement Python Protocol classes — no inheritance required. Drop a `.py` file in `plugins/contrib/` with a `create_plugin(settings)` function and it's auto-discovered at startup.
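As a sketch, a contributed parser plugin might look like the following. The `SourceParser` protocol shown here — its name and the `parse` signature — is an illustrative assumption; the real interfaces are documented in `docs/02-plugin-system.md`:

```python
# Hypothetical plugins/contrib/csv_parser.py -- names and signatures are
# illustrative, not the project's actual protocol definitions.
from typing import Iterator, Protocol

class SourceParser(Protocol):
    name: str
    def parse(self, body: str) -> Iterator[tuple[str, int]]: ...

class CsvParser:
    """Parses lines of the form `ip,port`, skipping blanks and comments."""
    name = "csv"

    def parse(self, body: str) -> Iterator[tuple[str, int]]:
        for line in body.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            ip, _, port = line.partition(",")
            if port.strip().isdigit():
                yield ip.strip(), int(port.strip())

def create_plugin(settings):
    # Called once at startup during plugin discovery.
    return CsvParser()
```

Because plugins are matched structurally against a Protocol, `CsvParser` needs no base class — it only has to provide the expected attributes.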
### Built-in plugins
| Plugin | Type | Description |
|--------|------|-------------|
| `plaintext` | Parser | One `ip:port` per line |
| `protocol_prefix` | Parser | `protocol://ip:port` format (HTTP, SOCKS4/5) |
| `tcp_connect` | Checker (stage 1) | TCP liveness check |
| `http_anonymity` | Checker (stage 2) | Exit IP detection, anonymity classification |
| `smtp` | Notifier | Email alerts for pool health and credit events |
See `docs/02-plugin-system.md` for the full plugin development guide.
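To make the stage-2 classification concrete: a judge endpoint echoes back the client IP and headers it saw, and the checker compares them against the known real IP. The scheme below is a common one, not necessarily the exact logic of the built-in `http_anonymity` plugin:

```python
def classify_anonymity(real_ip: str, exit_ip: str, echoed_headers: dict) -> str:
    """Classify a proxy from a judge endpoint's echo (illustrative scheme)."""
    # Headers that typically betray the presence of a proxy
    proxy_headers = {"X-Forwarded-For", "Via", "X-Real-Ip"}
    seen = {name.title() for name in echoed_headers}  # normalize casing
    if exit_ip == real_ip:
        return "transparent"   # proxy leaks the client's real IP
    if proxy_headers & seen:
        return "anonymous"     # real IP hidden, but headers reveal a proxy
    return "elite"             # indistinguishable from a direct request
```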
## Contributing
### Development setup
1. Fork and clone the repository.
2. Install dependencies:
```bash
uv sync --all-extras
```
3. Install pre-commit hooks:
```bash
uv tool install pre-commit
pre-commit install
pre-commit install --hook-type commit-msg
```
This sets up:
- **ruff** for linting and formatting on every commit
- **Conventional Commits** validation on commit messages
4. Start infrastructure:
```bash
docker compose up -d postgres redis
```
5. Run migrations:
```bash
uv run alembic upgrade head
```
### Commit conventions
This project uses [Conventional Commits](https://www.conventionalcommits.org/). Every commit message must follow the format:
```
type: description

Optional body explaining the change.
```
Valid types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `build`, `ci`.
Examples:
```
feat: add SOCKS5 proxy support to HTTP checker
fix: catch CancelledError in validation pipeline
docs: update API reference for acquire endpoint
test: add integration tests for credit ledger
refactor: extract proxy scoring into service layer
chore: bump httpx dependency to 0.28
```
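The commit-msg hook rejects subjects that don't follow the format. A rough Python equivalent of that check (the actual hook configuration may enforce more than this regex does):

```python
import re

VALID_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "build", "ci")
SUBJECT_RE = re.compile(rf"^({'|'.join(VALID_TYPES)}): \S.*$")

def is_valid_subject(subject: str) -> bool:
    """Check a commit subject line against `type: description`."""
    return SUBJECT_RE.match(subject) is not None
```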
### Branch workflow
All work happens on feature branches. Branch names mirror commit types:
```bash
git checkout -b feat/geoip-checker
git checkout -b fix/validation-timeout
git checkout -b docs/deployment-guide
```
When done, rebase onto master and fast-forward merge:
```bash
# On your feature branch, replay your commits on top of master
git checkout feat/your-feature
git rebase master

# Then fast-forward master to the rebased branch
git checkout master
git merge --ff-only feat/your-feature
git branch -d feat/your-feature
```
### Running tests
```bash
# Unit tests only (fast, no infrastructure needed)
make test-unit
# Full test suite (requires Postgres + Redis running)
make test
# Full suite in Docker (completely isolated, clean database)
make test-docker
```
Integration tests require Postgres and Redis. Either run them locally with Docker infrastructure or use the Docker test stack which provides an ephemeral database.
To reset your local database between test runs:
```bash
make reset-db
```
### Adding a database migration
```bash
# Auto-generate from model changes
uv run alembic revision --autogenerate -m "description of change"
# Always review the generated migration before applying
uv run alembic upgrade head
```
### Code quality
```bash
# Lint and format check
make lint
# Auto-fix lint issues
make lint-fix
# Type checking
make typecheck
```
Ruff is configured for Python 3.12 with rules for import sorting, naming conventions, bugbear checks, and code simplification. Mypy runs in strict mode with the Pydantic plugin.
### Project structure
```
proxy-pool/
├── src/proxy_pool/ # Application source code
│ ├── app.py # FastAPI app factory
│ ├── config.py # Settings (env-driven, grouped)
│ ├── common/ # Shared dependencies, schemas
│ ├── db/ # SQLAlchemy base, session factory
│ ├── proxy/ # Proxy domain (models, routes, service)
│ ├── accounts/ # Accounts domain (auth, credits, keys)
│ ├── plugins/ # Plugin system + built-in plugins
│ └── worker/ # ARQ task definitions
├── tests/ # Test suite (unit, integration, plugins)
├── alembic/ # Database migrations
├── docs/ # Architecture and reference documentation
├── Dockerfile # Production image
├── Dockerfile.test # Test image with dev dependencies
├── docker-compose.yml # Full stack (API, worker, Postgres, Redis)
└── docker-compose.test.yml # Test override (ephemeral database)
```
### Makefile reference
| Command | Description |
|---------|-------------|
| `make dev` | Start API with hot reload |
| `make worker` | Start ARQ worker |
| `make test` | Run full test suite locally |
| `make test-unit` | Run unit tests only |
| `make test-docker` | Run tests in Docker (clean DB) |
| `make lint` | Check linting and formatting |
| `make lint-fix` | Auto-fix lint issues |
| `make typecheck` | Run mypy |
| `make migrate` | Apply database migrations |
| `make reset-db` | Drop and recreate local database |
| `make docker-up` | Build and start full Docker stack |
| `make docker-down` | Stop Docker stack |
| `make docker-logs` | Tail API and worker logs |
## Documentation
Detailed documentation lives in the `docs/` directory:
| Document | Contents |
|----------|----------|
| `01-architecture.md` | System overview, components, data flow, deployment |
| `02-plugin-system.md` | Plugin protocols, registry, discovery, how to write plugins |
| `03-database-schema.md` | Every table, column, index with design rationale |
| `04-api-reference.md` | All endpoints with request/response examples |
| `05-worker-tasks.md` | Background tasks, schedules, retry behavior |
| `06-development-guide.md` | Setup, workflow, common tasks |
| `07-operations-guide.md` | Configuration, monitoring, troubleshooting |
## License
TBD