Feat: Rate-limit login endpoint to block brute-force attacks

After LOGIN_MAX_FAILURES consecutive failed attempts from the same source IP within LOGIN_WINDOW_SECONDS, POST /api/v1/auth/token returns HTTP 429 with a Retry-After header for LOGIN_COOLDOWN_SECONDS. A successful login resets the counter. Trusted upstream proxy IPs/CIDRs can be declared via LOGIN_TRUSTED_PROXY_IPS so X-Forwarded-For is honoured correctly behind nginx ingress or similar reverse proxies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:01:37 +00:00
parent f3e0021ee8
commit 7a835d3172
18 changed files with 1320 additions and 7 deletions
--- a/specs/009-login-rate-limiting/spec.md
+++ b/specs/009-login-rate-limiting/spec.md
@@ -0,0 +1,84 @@
+# Feature Specification: Login Brute-Force Protection
+
+**Feature Branch**: `009-login-rate-limiting`  
+**Created**: 2026-05-06  
+**Status**: Draft  
+**Input**: User description: "Login API endpoints should be rate limited or otherwise protected against brute force attacks"
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Repeated failed logins are blocked (Priority: P1)
+
+An attacker (or misconfigured client) sending many rapid login attempts with the wrong password is slowed or blocked before they can exhaustively guess credentials. After a threshold number of consecutive failures from the same source, the system refuses further attempts for a cooldown period and returns a clear, non-leaking error.
+
+**Why this priority**: Directly prevents credential-stuffing and brute-force attacks against the sole privileged account. Without this, the owner account is exposed to automated password guessing with no friction.
+
+**Independent Test**: Send more than the allowed number of failed login requests in quick succession and confirm that subsequent attempts are rejected with a rate-limit or lockout response — without knowing or changing the real password.
+
+**Acceptance Scenarios**:
+
+1. **Given** an attacker sends N+1 failed login attempts within the configured window, **When** the (N+1)th request arrives, **Then** the system returns an error response indicating the request is blocked (not the normal "invalid credentials" error) and does not process the login attempt.
+2. **Given** a legitimate user has been temporarily blocked after too many failures, **When** the cooldown period elapses and they retry with the correct password, **Then** they are logged in successfully.
+3. **Given** a legitimate user makes a few failed attempts and then waits beyond the cooldown window, **When** they retry within the next window, **Then** their failure counter resets and they are not blocked.
+
+---
+
+### User Story 2 - Operators can observe and reason about blocking activity (Priority: P2)
+
+When the protection triggers, the system produces enough observable signal (log entries, response metadata) that an operator can confirm the feature is working, diagnose false positives, and tune thresholds — without exposing sensitive details to the client.
+
+**Why this priority**: Invisible security controls are unmanageable. Operators need to know the system is doing what it claims, and blocked legitimate users need a clear (but not exploitable) explanation.
+
+**Independent Test**: Trigger the rate limiter and confirm that: (a) the response body or headers communicate that the request was blocked and when the client may retry; (b) the server logs an entry identifying the blocked source and the reason.
+
+**Acceptance Scenarios**:
+
+1. **Given** a source is blocked, **When** they receive the rejection response, **Then** the response indicates they should wait before retrying (e.g., a `Retry-After` hint) without disclosing the exact threshold values.
+2. **Given** the rate limiter fires, **When** an operator inspects server logs, **Then** there is a log entry at WARNING level or above recording the blocked source and timestamp.
+
+---
+
+### Edge Cases
+
+- What happens when a distributed attacker rotates IPs to avoid per-IP limits?
+- How does the system behave if the backing store for rate-limit counters is temporarily unavailable — does it fail open (allow all) or fail closed (block all)?
+- Are IPv6 addresses and IPv4-mapped-IPv6 addresses treated consistently?
+- Does a successful login reset the failure counter for that source?
+- What happens if many legitimate users share a NAT/proxy IP (e.g., corporate network)?
+- What if `TRUSTED_PROXY_IPS` is configured to include an IP that an external attacker controls? (An attacker could then spoof `X-Forwarded-For` and rotate fake source IPs to bypass the rate limiter — operators must only list genuinely trusted upstream infrastructure.)
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The system MUST enforce a maximum number of failed login attempts per source identifier (the resolved client IP address) within a rolling time window before blocking further attempts.
+- **FR-002**: Once a source exceeds the failure threshold, the system MUST reject subsequent login requests for a configurable cooldown period, returning a distinct response (not the normal invalid-credentials response).
+- **FR-003**: After the cooldown period expires, the system MUST permit the source to attempt login again, resetting its failure count.
+- **FR-004**: A successful login MUST reset the failure counter for that source, preventing accumulation of old failures from blocking future legitimate access.
+- **FR-005**: The rejection response MUST NOT reveal the specific threshold values or remaining lockout duration in a way that aids an attacker in timing their attempts, but MUST provide enough information (e.g., "try again later") for a legitimate user to understand the situation.
+- **FR-006**: The system MUST log a structured warning event whenever a source is blocked, including the source identifier and timestamp.
+- **FR-007**: Rate-limit thresholds (maximum attempts, window duration, cooldown duration) MUST be configurable without code changes.
+- **FR-008**: The system MUST support a configurable list of trusted upstream proxy IP addresses and CIDR ranges. When the TCP peer address matches a trusted proxy, the resolved client IP MUST be extracted from the `X-Forwarded-For` request header (first entry) or, if absent, `X-Real-IP`. When no trusted proxies are configured, the TCP peer address MUST be used directly and forwarded-IP headers MUST be ignored.
+
+### Key Entities
+
+- **Rate-limit record**: Tracks the number of consecutive failures and the window start time for a given source identifier; expires automatically after the cooldown period.
+- **Source identifier**: The resolved client IP address used to key rate-limit records. When `LOGIN_TRUSTED_PROXY_IPS` is empty (default), this is the TCP peer address. When one or more proxy IPs/CIDRs are configured and the TCP peer matches, the first `X-Forwarded-For` entry (or `X-Real-IP`) is used instead.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: An automated script sending 100 consecutive failed login requests completes with at least 90 of those requests rejected after the threshold is crossed — verified in a controlled test environment.
+- **SC-002**: A legitimate user who has been temporarily blocked can successfully log in within 5 minutes of the cooldown period expiring without any manual intervention.
+- **SC-003**: Zero information about threshold values or exact lockout expiry is present in blocked response bodies or headers.
+- **SC-004**: Every blocking event produces a corresponding log entry; 100% of triggered blocking events are observable in logs during testing.
+
+## Assumptions
+
+- The application has a single login endpoint used by all clients (the owner login introduced in feature 004).
+- Source identification uses the resolved client IP address. By default (when `LOGIN_TRUSTED_PROXY_IPS` is empty) this is the TCP peer address. When one or more proxy IPs/CIDRs are configured, the first entry of `X-Forwarded-For` (or `X-Real-IP`) is used instead — but only when the TCP peer is in the trusted list, preventing header spoofing by external clients.
+- If the rate-limit backing store is unavailable, the system fails open (allows the attempt through) rather than blocking all logins — this preserves the owner's access, which is critical for a single-user admin application.
+- No CAPTCHA or multi-factor step is in scope; protection is purely count/time-based.
+- The feature targets the login endpoint only; other endpoints are out of scope.
+- The single-user nature of the app means IP-based identification is sufficient — there is no need for per-username lockout, and using IP (rather than username) avoids contributing to username enumeration risk.