Feat: Rate-limit login endpoint to block brute-force attacks

After LOGIN_MAX_FAILURES consecutive failed attempts from the same source IP within LOGIN_WINDOW_SECONDS, POST /api/v1/auth/token returns HTTP 429 with a Retry-After header for LOGIN_COOLDOWN_SECONDS. A successful login resets the counter. Trusted upstream proxy IPs/CIDRs can be declared via LOGIN_TRUSTED_PROXY_IPS so X-Forwarded-For is honoured correctly behind nginx ingress or similar reverse proxies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:01:37 +00:00
parent f3e0021ee8
commit 7a835d3172
18 changed files with 1320 additions and 7 deletions
--- a/specs/009-login-rate-limiting/plan.md
+++ b/specs/009-login-rate-limiting/plan.md
@@ -0,0 +1,388 @@
+# Implementation Plan: Login Brute-Force Protection
+
+**Branch**: `009-login-rate-limiting` | **Date**: 2026-05-06 | **Spec**: [spec.md](spec.md)  
+**Input**: Feature specification from `specs/009-login-rate-limiting/spec.md`
+
+## Summary
+
+Add failure-counting brute-force protection to the login endpoint (`POST /api/v1/auth/token`).
+After a configurable number of consecutive failed attempts from the same resolved client IP,
+the endpoint returns HTTP 429 with a `Retry-After` header for a configurable cooldown period.
+A successful login resets the counter. All thresholds are configurable via environment variables.
+When deployed behind a reverse proxy (nginx, Kubernetes ingress), a `LOGIN_TRUSTED_PROXY_IPS`
+setting enables extraction of the real client IP from `X-Forwarded-For`. No new infrastructure
+(no Redis, no new DB table) — counters live in process memory.
+
+---
+
+## Technical Context
+
+**Language/Version**: Python 3.12+  
+**Primary Dependencies**: FastAPI, pydantic-settings (already in use); no new dependencies added  
+**Storage**: In-memory `dict` (no persistence across restarts — intentional)  
+**Testing**: pytest + pytest-asyncio (existing test infrastructure)  
+**Target Platform**: Linux server (Docker)  
+**Project Type**: Web service (API only — this feature has no UI surface)  
+**Performance Goals**: Rate limiter adds negligible overhead (dict lookup + lock acquisition; sub-millisecond)  
+**Constraints**: Must not add new runtime service dependencies; must not change any auth behaviour for non-blocked sources  
+**Scale/Scope**: Single process, single user; in-memory store is sufficient
+
+---
+
+## Constitution Check
+
+| Principle | Status | Notes |
+|-----------|--------|-------|
+| §2.4 Auth abstraction (AuthProvider interface) | ✅ Pass | Rate limiter is a guard *before* `JWTAuthProvider.verify_credentials()`, not a bypass of the interface |
+| §2.5 DB abstraction (repository layer) | ✅ Pass | No database access; in-memory only |
+| §2.6 No speculative abstraction | ✅ Pass | Concrete `LoginRateLimiter` class, no interface; only one implementation planned |
+| §3.3 Error envelope (`detail` + `code`) | ✅ Pass | 429 response uses `{"detail": "...", "code": "login_rate_limited"}` |
+| §5.1 TDD | ✅ Required | Tasks follow red → green order |
+| §5.2 Integration tests against PostgreSQL | ✅ Pass | Integration test for the login endpoint will run against the Docker PostgreSQL stack |
+| §7.2 Environment configuration | ✅ Pass | `LOGIN_MAX_FAILURES`, `LOGIN_WINDOW_SECONDS`, `LOGIN_COOLDOWN_SECONDS`, `LOGIN_TRUSTED_PROXY_IPS` from env vars |
+| §7.3 Linting (ruff) | ✅ Required | All new files must pass `ruff check` |
+
+**Gate result**: No violations. Cleared to proceed.
+
+---
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/009-login-rate-limiting/
+├── plan.md              ← this file
+├── research.md          ← decisions on approach
+├── data-model.md        ← rate-limit record entity
+├── quickstart.md        ← curl runbook
+├── contracts/
+│   └── auth.md          ← updated POST /api/v1/auth/token with 429
+└── tasks.md             ← generated by /speckit-tasks
+```
+
+### Source Code Changes
+
+```text
+api/
+├── app/
+│   ├── auth/
+│   │   ├── rate_limiter.py          ← NEW: LoginRateLimiter class
+│   │   ├── jwt_provider.py          (unchanged)
+│   │   ├── noop.py                  (unchanged)
+│   │   └── provider.py              (unchanged)
+│   ├── config.py                    ← add login_max_failures, login_window_seconds, login_cooldown_seconds, login_trusted_proxy_ips
+│   ├── main.py                      ← init LoginRateLimiter in lifespan, attach to app.state
+│   └── routers/
+│       └── auth.py                  ← check rate limit before auth, record outcome
+└── tests/
+    ├── unit/
+    │   └── test_rate_limiter.py     ← NEW: unit tests for LoginRateLimiter logic
+    └── integration/
+        └── test_login_rate_limit.py ← NEW: integration tests for 429 behaviour via HTTP
+```
+
+---
+
+## Implementation Detail
+
+### `api/app/auth/rate_limiter.py`
+
+```python
+import ipaddress
+import logging
+import time
+from dataclasses import dataclass, field
+from ipaddress import IPv4Network, IPv6Network
+from threading import Lock
+
+from starlette.requests import Request
+
+logger = logging.getLogger(__name__)
+
+
+def get_client_ip(
+    request: Request,
+    trusted_networks: list[IPv4Network | IPv6Network],
+) -> str:
+    """Return the resolved client IP, honouring X-Forwarded-For when the
+    TCP peer is a trusted upstream proxy. Falls back to the TCP peer address
+    when no trusted networks are configured or the peer is not in the list."""
+    peer = request.client.host if request.client else "unknown"
+    if trusted_networks and peer != "unknown":
+        try:
+            peer_addr = ipaddress.ip_address(peer)
+            if any(peer_addr in net for net in trusted_networks):
+                xff = request.headers.get("X-Forwarded-For", "").split(",")[0].strip()
+                if xff:
+                    return xff
+                real_ip = request.headers.get("X-Real-IP", "").strip()
+                if real_ip:
+                    return real_ip
+        except ValueError:
+            pass
+    return peer
+
+
+@dataclass
+class _Record:
+    failures: int = 0
+    window_start: float = field(default_factory=time.time)
+    blocked_until: float = 0.0
+
+
+class LoginRateLimiter:
+    def __init__(
+        self,
+        max_failures: int = 5,
+        window_seconds: int = 300,
+        cooldown_seconds: int = 900,
+    ) -> None:
+        self._max = max_failures
+        self._window = window_seconds
+        self._cooldown = cooldown_seconds
+        self._store: dict[str, _Record] = {}
+        self._lock = Lock()
+
+    @property
+    def cooldown_seconds(self) -> int:
+        return self._cooldown
+
+    def is_blocked(self, ip: str) -> bool:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                return False
+            if rec.blocked_until > now:
+                return True
+            if rec.blocked_until > 0:
+                del self._store[ip]
+            return False
+
+    def record_failure(self, ip: str) -> None:
+        now = time.time()
+        with self._lock:
+            rec = self._store.get(ip)
+            if rec is None:
+                rec = _Record(window_start=now)
+                self._store[ip] = rec
+            if now - rec.window_start > self._window:
+                rec.failures = 0
+                rec.window_start = now
+            rec.failures += 1
+            if rec.failures >= self._max:
+                rec.blocked_until = now + self._cooldown
+                logger.warning(
+                    "Login blocked for %s after %d failures", ip, rec.failures
+                )
+
+    def record_success(self, ip: str) -> None:
+        with self._lock:
+            self._store.pop(ip, None)
+```
+
+### `api/app/config.py` additions
+
+```python
+login_max_failures: int = 5
+login_window_seconds: int = 300
+login_cooldown_seconds: int = 900
+login_trusted_proxy_ips: str = ""  # comma-separated IPs/CIDRs; empty = disabled
+```
+
+### `api/app/main.py` lifespan update
+
+```python
+import ipaddress
+
+from app.auth.rate_limiter import LoginRateLimiter
+
+@asynccontextmanager
+async def lifespan(application: FastAPI):
+    settings = get_settings()
+    application.state.login_rate_limiter = LoginRateLimiter(
+        max_failures=settings.login_max_failures,
+        window_seconds=settings.login_window_seconds,
+        cooldown_seconds=settings.login_cooldown_seconds,
+    )
+    trusted_networks = []
+    for part in settings.login_trusted_proxy_ips.split(","):
+        part = part.strip()
+        if part:
+            try:
+                trusted_networks.append(ipaddress.ip_network(part, strict=False))
+            except ValueError:
+                pass  # invalid entry — skip silently
+    application.state.login_trusted_networks = trusted_networks
+    # ... existing DB setup unchanged
+    engine = get_engine()
+    async with engine.begin() as conn:
+        await conn.run_sync(Base.metadata.create_all)
+    yield
+    await engine.dispose()
+```
+
+### `api/app/routers/auth.py` update
+
+```python
+from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel
+
+from app.auth.jwt_provider import JWTAuthProvider
+from app.auth.rate_limiter import LoginRateLimiter, get_client_ip
+from app.dependencies import get_jwt_auth
+
+router = APIRouter(tags=["auth"])
+
+
+class LoginRequest(BaseModel):
+    username: str
+    password: str
+
+
+class TokenResponse(BaseModel):
+    access_token: str
+    token_type: str = "bearer"
+    expires_in: int
+
+
+@router.post("/auth/token", response_model=TokenResponse)
+async def login(
+    request: Request,
+    body: LoginRequest,
+    auth: JWTAuthProvider = Depends(get_jwt_auth),
+):
+    limiter: LoginRateLimiter = request.app.state.login_rate_limiter
+    ip: str = get_client_ip(request, request.app.state.login_trusted_networks)
+
+    if limiter.is_blocked(ip):
+        return JSONResponse(
+            status_code=429,
+            content={
+                "detail": "Too many failed login attempts. Please try again later.",
+                "code": "login_rate_limited",
+            },
+            headers={"Retry-After": str(limiter.cooldown_seconds)},
+        )
+
+    if not auth.verify_credentials(body.username, body.password):
+        limiter.record_failure(ip)
+        raise HTTPException(
+            status_code=401,
+            detail={"detail": "Invalid credentials", "code": "invalid_credentials"},
+        )
+
+    limiter.record_success(ip)
+    token = auth.create_token()
+    return TokenResponse(
+        access_token=token,
+        token_type="bearer",
+        expires_in=auth._expiry_seconds,
+    )
+```
+
+### `api/tests/unit/test_rate_limiter.py` (representative cases)
+
+```python
+import time
+import pytest
+from app.auth.rate_limiter import LoginRateLimiter
+
+
+def test_not_blocked_initially():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    assert limiter.is_blocked("1.2.3.4") is False
+
+
+def test_blocked_after_threshold():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    for _ in range(3):
+        limiter.record_failure("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is True
+
+
+def test_success_clears_failures():
+    limiter = LoginRateLimiter(max_failures=3, window_seconds=60, cooldown_seconds=300)
+    limiter.record_failure("1.2.3.4")
+    limiter.record_failure("1.2.3.4")
+    limiter.record_success("1.2.3.4")
+    assert limiter.is_blocked("1.2.3.4") is False
+
+
+def test_ips_are_isolated():
+    limiter = LoginRateLimiter(max_failures=2, window_seconds=60, cooldown_seconds=300)
+    limiter.record_failure("1.1.1.1")
+    limiter.record_failure("1.1.1.1")
+    assert limiter.is_blocked("2.2.2.2") is False
+```
+
+### `api/tests/integration/test_login_rate_limit.py` (representative cases)
+
+```python
+import pytest
+from httpx import AsyncClient
+
+# Uses the 'client' fixture (NoOpAuthProvider) from conftest — sufficient for this
+# endpoint since we're testing the rate-limit layer, not auth correctness.
+# The login endpoint instantiates its own limiter via app.state, so we need
+# the full ASGI app.
+
+BAD_CREDS = {"username": "attacker", "password": "wrong"}
+
+
+@pytest.mark.asyncio
+async def test_repeated_failures_trigger_429(client: AsyncClient):
+    # Use a custom limiter with low threshold to avoid slow tests
+    # (the app.state.login_rate_limiter is set in lifespan; override for test)
+    from app.auth.rate_limiter import LoginRateLimiter
+    from app.main import app
+    original = app.state.login_rate_limiter
+    app.state.login_rate_limiter = LoginRateLimiter(
+        max_failures=3, window_seconds=60, cooldown_seconds=30
+    )
+    try:
+        for _ in range(3):
+            await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        resp = await client.post("/api/v1/auth/token", json=BAD_CREDS)
+        assert resp.status_code == 429
+        assert resp.json()["code"] == "login_rate_limited"
+        assert "Retry-After" in resp.headers
+    finally:
+        app.state.login_rate_limiter = original
+```
+
+---
+
+## Implementation Phases
+
+### Phase 1 (MVP — P1): Blocking after repeated failures
+
+1. Add `login_max_failures`, `login_window_seconds`, `login_cooldown_seconds`, `login_trusted_proxy_ips` to `api/app/config.py`
+2. Create `api/app/auth/rate_limiter.py` with `LoginRateLimiter` and `get_client_ip()`
+3. Initialize rate limiter and parse trusted networks in `api/app/main.py` lifespan; attach both to `app.state`
+4. Update `api/app/routers/auth.py` to resolve client IP via `get_client_ip()`, then check + record outcomes
+5. Unit tests: `api/tests/unit/test_rate_limiter.py`
+6. Integration tests: `api/tests/integration/test_login_rate_limit.py`
+
+### Phase 2 (US2 — observability): Logging and response hints
+
+Delivered as part of Phase 1 (the `logger.warning(...)` call and `Retry-After` header
+are embedded in the same implementation). No separate phase needed.
+
+---
+
+## Environment Variables to Add to `.env.example`
+
+```dotenv
+# Login brute-force protection
+LOGIN_MAX_FAILURES=5
+LOGIN_WINDOW_SECONDS=300
+LOGIN_COOLDOWN_SECONDS=900
+# Comma-separated IPs/CIDRs of trusted upstream proxies (e.g. nginx ingress pod CIDR).
+# Leave empty when not behind a reverse proxy.
+LOGIN_TRUSTED_PROXY_IPS=
+```
+
+These are optional (have defaults) so existing `.env` files without them continue working.