Your API is open to the world. Without rate limiting, one bad actor can degrade service for everyone: a single user hammering your endpoints exhausts database connections, starves other users, and eventually brings the whole service down. For AI agents, the stakes are higher: LLM calls cost money. A runaway loop hitting your GPT-4 endpoint can generate a $10,000 surprise bill in hours.
BackendTrafficPolicy is Envoy Gateway's extension for protecting services. It controls how traffic flows to your backends—limiting request rates, breaking circuits when services fail, and retrying transient errors. This lesson teaches you to configure these protections so your Task API survives abuse, controls costs, and recovers gracefully from failures.
By the end, you will protect your services with rate limits, configure per-user quotas using headers, implement circuit breakers that exclude failing backends, and understand when to use local versus global rate limiting.
BackendTrafficPolicy is an Envoy Gateway extension CRD that configures traffic behavior between Envoy proxies and your backend services. While HTTPRoute controls where traffic goes, BackendTrafficPolicy controls how that traffic behaves.
BackendTrafficPolicy uses targetRefs to specify which resources it applies to:
Target options: a Gateway (the policy applies to every route the Gateway serves) or an HTTPRoute (the policy applies only to that route's backends).
Important constraint: A BackendTrafficPolicy can only target resources in the same namespace as the policy itself.
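A minimal policy showing the targetRefs shape might look like this (the resource names here are hypothetical):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: tasks-api-policy
  namespace: default
spec:
  # Target an HTTPRoute to scope the policy to one route;
  # target a Gateway to cover every route it serves.
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
```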
Local rate limiting applies limits per Envoy proxy instance. If you have 3 proxy replicas and set a limit of 100 requests/minute, each replica allows 100 requests/minute independently—total cluster capacity is 300 requests/minute.
Apply a simple rate limit to all requests:
Apply and test:
Output:
Generate load to test the limit:
Output:
The first 100 requests succeed (200). Requests 101-120 are rate limited (429 Too Many Requests).
The unit field accepts four values: Second, Minute, Hour, and Day.
Choose based on your use case: Second for smoothing bursts, Minute for typical per-client API quotas, and Hour or Day for billing-style quotas on expensive operations such as LLM calls.
A single shared limit protects your infrastructure, but it punishes all users equally. When one user hits the limit, everyone gets blocked. Per-user rate limits give each user their own quota.
Rate limit based on the x-user-id header:
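One way to express this in Envoy Gateway is the Distinct header match type, which buckets requests per unique header value; this sketch assumes an HTTPRoute named tasks-api:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: per-user-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  rateLimit:
    type: Local
    local:
      rules:
        # Each distinct x-user-id value gets its own 50/min bucket.
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 50
            unit: Minute
```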
[!NOTE] The value: "*" matches any value of the header. Each unique header value gets its own rate limit bucket.
Test with different users:
Output for alice:
Output for bob:
Alice hits her 50-request limit. Bob's quota is independent—he can still make requests.
Rate limit a specific user more aggressively:
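A sketch using an Exact header match for the specific user (the user ID and route name are illustrative):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: heavy-user-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  rateLimit:
    type: Local
    local:
      rules:
        # Stricter bucket for one known-heavy user.
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Exact
                  value: heavy-user-123
          limit:
            requests: 10
            unit: Minute
```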
This limits user heavy-user-123 to 10 requests/minute while other users get the default limit.
Anonymous users (no x-user-id header) should get lower limits than authenticated users. Use the invert field to match requests without a header.
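One possible shape, assuming invert: true on the header match selects requests where the x-user-id header is absent, as described above:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: anonymous-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  rateLimit:
    type: Local
    local:
      rules:
        # Requests without an x-user-id header: 10/min total.
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
                  invert: true
          limit:
            requests: 10
            unit: Minute
        # Authenticated users: 100/min per unique user.
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 100
            unit: Minute
```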
Test anonymous access:
Output:
Anonymous users hit the limit at 10 requests. Authenticated users get 100.
Local rate limiting has a limitation: limits are per proxy instance. With 5 replicas at 100 requests/minute each, your actual cluster limit is 500 requests/minute. If you need strict organization-wide quotas, use global rate limiting.
All proxies query the same Redis instance. When one proxy increments the counter, all proxies see the updated value.
First, configure Envoy Gateway to use Redis:
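The rate limit service is enabled in the Envoy Gateway system configuration (typically the envoy-gateway ConfigMap); the Redis URL below is an assumption about where Redis runs in your cluster:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
provider:
  type: Kubernetes
rateLimit:
  backend:
    type: Redis
    redis:
      # Assumed in-cluster Redis address; adjust for your deployment.
      url: redis.redis-system.svc.cluster.local:6379
```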
Then create a BackendTrafficPolicy with global rate limiting:
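A sketch, identical in shape to the local policy except for the type and the nested key:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: global-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  rateLimit:
    type: Global
    global:
      rules:
        # One shared counter across all proxy replicas, via Redis.
        - limit:
            requests: 100
            unit: Minute
```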
[!IMPORTANT] Key difference: Use rateLimit.global instead of rateLimit.local.
Rate limiting protects against too many requests. Circuit breakers protect against failing backends. When a backend becomes unhealthy, the circuit breaker stops sending traffic—preventing cascade failures and giving the backend time to recover.
Limit concurrent connections and pending requests:
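A sketch with deliberately low thresholds for demonstration (route name hypothetical):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: tasks-circuit-breaker
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  circuitBreaker:
    maxConnections: 100      # TCP connections to the backend
    maxPendingRequests: 50   # requests queued awaiting a connection
    maxParallelRequests: 10  # in-flight requests; excess fails fast
```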
Field meanings:

- maxConnections: the maximum TCP connections Envoy opens to the backend.
- maxPendingRequests: the maximum requests queued while waiting for a connection.
- maxParallelRequests: the maximum in-flight requests to the backend; excess requests fail fast with 503 instead of being forwarded.
Use hey to generate concurrent load:
Output (with circuit breaker at maxParallelRequests=10):
Only 10 requests reached the backend (matching maxParallelRequests). The other 90 failed fast with 503—protecting your backend from overload.
Envoy's default thresholds (1024 connections, 1024 pending) may be too high or too low for your workload. Size based on your backend's capacity:
Transient failures happen—network blips, brief pod restarts, temporary overload. Retry policies automatically retry failed requests so clients do not see every hiccup.
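A sketch of a retry policy for a read-heavy route; the trigger names follow Envoy's retry-on conventions, but treat the exact values as assumptions to verify against your Envoy Gateway version:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: tasks-retry
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-read-api
  retry:
    numRetries: 3
    retryOn:
      triggers:
        - connect-failure
        - retriable-status-codes
      httpStatusCodes:
        - 503
    perRetry:
      timeout: 250ms      # bound each attempt
      backOff:
        baseInterval: 100ms
        maxInterval: 1s
```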
Field meanings:

- numRetries: the maximum retry attempts per request.
- retryOn: the triggers (for example, connect-failure) and HTTP status codes that make a response retryable.
- perRetry: per-attempt settings, including a timeout for each try and exponential backOff between tries.
[!CAUTION] Only retry idempotent operations. Retrying a POST that creates a record may create duplicates. Configure retry policies on read-heavy routes, not write routes.
You can combine rate limiting, circuit breaking, and retries in a single BackendTrafficPolicy:
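A combined sketch; all three protections live under one spec:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: tasks-traffic-policy
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: tasks-api
  rateLimit:
    type: Local
    local:
      rules:
        - limit:
            requests: 100
            unit: Minute
  circuitBreaker:
    maxParallelRequests: 10
  retry:
    numRetries: 3
    retryOn:
      triggers:
        - connect-failure
```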
Order of evaluation: rate limits are checked first, so throttled requests are rejected with 429 before they ever reach the backend; circuit breaker thresholds are checked as the request is forwarded upstream, failing fast with 503 on overflow; retries apply last, re-attempting requests that fail after passing both checks.
When policies target different levels (Gateway vs HTTPRoute), they merge with specific precedence.
Example: Default plus override
Gateway-level default (applies to all routes):
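A sketch of the default, assuming a Gateway named eg:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: default-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: eg
  rateLimit:
    type: Local
    local:
      rules:
        - limit:
            requests: 100
            unit: Minute
```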
HTTPRoute-level override (applies only to expensive operations):
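A sketch of the override, assuming the LLM route is named llm-inference:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-inference
  rateLimit:
    type: Local
    local:
      rules:
        - limit:
            requests: 10
            unit: Minute
```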
The LLM inference route gets 10 requests/minute (override). All other routes get 100 requests/minute (default).
Monitor rate limiting effectiveness with Prometheus metrics.
Rate limited requests per route:
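A PromQL sketch; the metric name matches the one called out at the end of this lesson, while the label you aggregate by depends on how your Envoy metrics are scraped:

```promql
# 429s produced by the rate limit filter over the last 5 minutes.
# Add a `by (...)` clause with your route label to split per route.
sum(rate(envoy_http_ratelimit_over_limit[5m]))
```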
Circuit breaker activations:
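One way to see circuit breaker rejections is Envoy's pending-overflow counter; the exported name below assumes Envoy's standard Prometheus mapping:

```promql
# Requests rejected by circuit breaker overflow, per upstream cluster.
sum by (envoy_cluster_name) (
  rate(envoy_cluster_upstream_rq_pending_overflow[5m])
)
```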
Create a panel showing rate limit vs successful requests:
Apply a rate limit and observe 429 responses:
Test:
Expected Output:
Add per-user limits using x-user-id header:
Test with different users:
Expected Output:
Configure circuit breaker and observe 503 responses under load:
Generate concurrent load:
Expected: Some 200 responses (up to maxParallelRequests), rest 503.
Query Prometheus for rate limiting data:
You built a traffic-engineer skill in Lesson 0. Based on what you learned about rate limiting and circuit breaking:
Your skill should ask:
Local rate limiting template:
Circuit breaker template:
Ask your traffic-engineer skill to generate configuration:
What you're learning: AI generates multi-rule rate limiting. Review the output—did AI use invert: true correctly for anonymous users? Are both rules in the same policy?
Check AI's output:
If something is missing, provide feedback:
Extend the configuration:
What you're learning: AI adapts existing configurations. Verify the circuit breaker fields are correct and added to the same BackendTrafficPolicy resource.
Before applying AI's configuration: verify that targetRefs name real resources in the policy's own namespace, check that every limit and unit matches your intent, confirm retry policies are attached only to idempotent routes, and validate with `kubectl apply --dry-run=server` before the real apply.
This iteration—specifying requirements, evaluating output, refining with constraints—builds production configurations safely.
Rate limiting and circuit breakers affect all traffic to your service. Test in development before production. Start with generous rate limits and circuit breaker thresholds; you can always tighten them based on observed behavior. Monitor envoy_http_ratelimit_over_limit metrics to ensure you are not blocking legitimate traffic.