Rate Limiting & Circuit Breaking

Your API is open to the world. Without rate limiting, one bad actor can ruin it for everyone: a single user hammering your endpoints exhausts database connections, starves other users, and eventually brings the service down. For AI agents, the stakes are higher because LLM calls cost money. A runaway loop hitting your GPT-4 endpoint can generate a $10,000 surprise bill in hours.

BackendTrafficPolicy is Envoy Gateway's extension for protecting services. It controls how traffic flows to your backends—limiting request rates, breaking circuits when services fail, and retrying transient errors. This lesson teaches you to configure these protections so your Task API survives abuse, controls costs, and recovers gracefully from failures.

By the end, you will protect your services with rate limits, configure per-user quotas using headers, implement circuit breakers that exclude failing backends, and understand when to use local versus global rate limiting.


Understanding BackendTrafficPolicy

BackendTrafficPolicy is an Envoy Gateway extension CRD that configures traffic behavior between Envoy proxies and your backend services. While HTTPRoute controls where traffic goes, BackendTrafficPolicy controls how that traffic behaves.

What BackendTrafficPolicy Controls

| Feature | Purpose |
|---|---|
| Rate Limiting | Limit requests per time unit |
| Circuit Breaker | Stop sending to failing backends |
| Retry Policy | Automatically retry failed requests |
| Timeouts | Fail requests that take too long |
| Load Balancing | Control backend selection strategy |

How Policy Targeting Works

BackendTrafficPolicy uses targetRefs to specify which resources it applies to:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: my-policy
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
```

Target options:

| Target Kind | Effect |
|---|---|
| HTTPRoute | Policy applies to that specific route |
| Gateway | Policy applies to all routes through that Gateway |
| GRPCRoute | Policy applies to gRPC traffic |

Important constraint: A BackendTrafficPolicy can only target resources in the same namespace as the policy itself.
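
After applying a policy, confirm that Envoy Gateway accepted it before testing behavior. A minimal check, assuming the policy reports standard Gateway API status conditions (the exact status layout can vary by Envoy Gateway version):

```bash
# Look for an Accepted=True condition in the policy status
kubectl get backendtrafficpolicy my-policy -n task-api -o yaml | grep -B 2 -A 4 "conditions:"
```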


Local Rate Limiting

Local rate limiting applies limits per Envoy proxy instance. If you have 3 proxy replicas and set a limit of 100 requests/minute, each replica allows 100 requests/minute independently—total cluster capacity is 300 requests/minute.

Basic Local Rate Limit

Apply a simple rate limit to all requests:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-ratelimit
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        - limit:
            requests: 100
            unit: Minute
```

Apply and test:

```bash
kubectl apply -f task-api-ratelimit.yaml
```

Output:

```text
backendtrafficpolicy.gateway.envoyproxy.io/task-api-ratelimit created
```

Generate load to test the limit:

```bash
for i in {1..120}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/tasks
done | sort | uniq -c
```

Output:

```text
    100 200
     20 429
```

The first 100 requests succeed (200). Requests 101-120 are rate limited (429 Too Many Requests).
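
To see more than the status code, inspect a single over-limit response directly. Whether Envoy includes rate-limit headers in the 429 depends on your configuration, so treat the exact headers as version-specific:

```bash
# With the quota already exhausted, show the status line and response headers
curl -i http://localhost:8080/api/tasks
```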

Understanding Rate Limit Units

The unit field accepts these values:

| Unit | Meaning |
|---|---|
| Second | Requests per second |
| Minute | Requests per minute |
| Hour | Requests per hour |

Choose based on your use case:

  • API endpoints: Minute (smooth traffic over time)
  • Health checks: Second (low volume, quick reset)
  • Expensive operations: Hour (strict hourly quotas)

Per-User Rate Limits

Global rate limits protect your infrastructure, but they punish all users equally. When one user hits the limit, everyone gets blocked. Per-user rate limits give each user their own quota.

Using Header-Based Descriptors

Rate limit based on the x-user-id header:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-per-user
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
          limit:
            requests: 50
            unit: Minute
```

> [!NOTE]
> The value: "*" matches any value of the header. Each unique header value gets its own rate limit bucket.

Test with different users:

```bash
# User "alice" makes 60 requests
for i in {1..60}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "x-user-id: alice" http://localhost:8080/api/tasks
done | tail -15

# User "bob" makes 60 requests (independent quota)
for i in {1..60}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "x-user-id: bob" http://localhost:8080/api/tasks
done | tail -15
```

Output for alice:

```text
200
200
200
429
429
...
```

Output for bob:

```text
200
200
200
200
200
...
```

Alice hits her 50-request limit. Bob's quota is independent—he can still make requests.

Matching Specific Header Values

Rate limit a specific user more aggressively:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: heavy-user-limit
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "heavy-user-123"
          limit:
            requests: 10
            unit: Minute
```

This limits user heavy-user-123 to 10 requests/minute while other users get the default limit.
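
In practice you would usually carry both rules in one policy rather than two competing policies. A sketch under that assumption (the policy name is hypothetical; verify rule precedence against your Envoy Gateway version before relying on it):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-mixed-limits   # hypothetical name
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        # Specific heavy user: stricter limit
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "heavy-user-123"
          limit:
            requests: 10
            unit: Minute
        # Any other user id: default limit
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
          limit:
            requests: 50
            unit: Minute
```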


Anonymous vs Authenticated Users

Anonymous users (no x-user-id header) should get lower limits than authenticated users. Use the invert field to match requests without a header.

Different Limits by Authentication Status

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-tiered
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        # Authenticated users: 100 requests/minute
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
          limit:
            requests: 100
            unit: Minute
        # Anonymous users (no x-user-id header): 10 requests/minute
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
                  invert: true
          limit:
            requests: 10
            unit: Minute
```

Test anonymous access:

```bash
# No x-user-id header
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/tasks
done
```

Output:

```text
200
200
200
200
200
200
200
200
200
200
429
429
429
429
429
```

Anonymous users hit the limit at 10 requests. Authenticated users get 100.


Global Rate Limiting

Local rate limiting has a limitation: limits are per proxy instance. With 5 replicas at 100 requests/minute each, your actual cluster limit is 500 requests/minute. If you need strict organization-wide quotas, use global rate limiting.

How Global Rate Limiting Works

(Diagram: several Envoy proxy replicas all checking and incrementing a shared rate-limit counter in Redis.)

All proxies query the same Redis instance. When one proxy increments the counter, all proxies see the updated value.

Configuring Global Rate Limiting

First, configure Envoy Gateway to use Redis:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    rateLimit:
      backend:
        type: Redis
        redis:
          url: redis.redis-system.svc.cluster.local:6379
```
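
The url above assumes a Redis instance is already reachable in the cluster. If you need one for experimentation, a minimal non-production sketch (single replica, no persistence or authentication; names chosen to match the URL in the ConfigMap):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: redis-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: redis-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: redis-system
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
```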

Then create a BackendTrafficPolicy with global rate limiting:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-global
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
          limit:
            requests: 100
            unit: Minute
```

> [!IMPORTANT]
> Key difference: Use rateLimit.global instead of rateLimit.local.

When to Use Global vs Local

| Scenario | Recommendation |
|---|---|
| Development/testing | Local (simpler, no Redis needed) |
| Strict API quotas | Global (exact limits across cluster) |
| High availability priority | Local (no Redis dependency) |
| Billing-based limits | Global (accurate usage tracking) |
| Cost protection for LLM APIs | Global (prevent runaway spending) |

Circuit Breaker Pattern

Rate limiting protects against too many requests. Circuit breakers protect against failing backends. When a backend becomes unhealthy, the circuit breaker stops sending traffic—preventing cascade failures and giving the backend time to recover.

How Circuit Breakers Work

| State | Condition | Action |
|---|---|---|
| Circuit Closed | Normal operations; backend healthy | Requests forwarded to backend |
| Circuit Open | Failure threshold exceeded | Requests fail fast with 503; backend excluded |
| Recovery | Timeout passed; probing health | Limited requests allowed to verify health |

Configuring Circuit Breaker

Limit concurrent connections and pending requests:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-circuitbreaker
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  circuitBreaker:
    maxConnections: 100
    maxParallelRequests: 50
    maxPendingRequests: 10
```

Field meanings:

| Field | Description | When Exceeded |
|---|---|---|
| maxConnections | Maximum TCP connections to backend | New connections rejected |
| maxParallelRequests | Maximum concurrent HTTP requests | Requests wait or fail |
| maxPendingRequests | Maximum queued requests | Returns 503 immediately |

Testing Circuit Breaker Behavior

Use hey to generate concurrent load:

```bash
# Install hey load testing tool
go install github.com/rakyll/hey@latest

# Generate 100 concurrent requests with 10 second delay
hey -n 100 -c 100 -host "www.example.com" \
  http://localhost:8080/api/tasks?delay=10s
```

Output (with circuit breaker at maxParallelRequests=10):

```text
Summary:
  Total:    10.5 secs
  Slowest:  10.2 secs
  Fastest:  0.001 secs
  Average:  1.0 secs

Status code distribution:
  [200] 10 responses
  [503] 90 responses
```

Only 10 requests reached the backend (matching maxParallelRequests). The other 90 failed fast with 503—protecting your backend from overload.

Circuit Breaker Sizing

Envoy's default thresholds (1024 connections, 1024 pending requests) rarely match a specific workload. Size based on your backend's capacity:

| Backend Type | Recommended maxConnections | Recommended maxParallelRequests |
|---|---|---|
| Single container | 50-100 | 25-50 |
| Replicated (3 pods) | 150-300 | 75-150 |
| Database-backed API | 20-50 | 10-25 |
| LLM inference endpoint | 5-20 | 5-10 |
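
Applying the last row, a sketch of a conservative policy for an expensive inference route (the policy name is hypothetical; llm-inference-route reappears later in this lesson):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-circuitbreaker   # hypothetical name
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-inference-route
  circuitBreaker:
    maxConnections: 10        # within the 5-20 band above
    maxParallelRequests: 5    # within the 5-10 band above
    maxPendingRequests: 2     # fail fast rather than queue expensive calls
```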

Retry Policy

Transient failures happen—network blips, brief pod restarts, temporary overload. Retry policies automatically retry failed requests so clients do not see every hiccup.

Configuring Automatic Retries

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-retry
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  retry:
    numRetries: 3
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 1s
      timeout: 500ms
    retryOn:
      httpStatusCodes:
        - 500
        - 502
        - 503
        - 504
      triggers:
        - connect-failure
        - retriable-status-codes
```

Field meanings:

| Field | Description |
|---|---|
| numRetries | Maximum retry attempts |
| perRetry.backOff.baseInterval | Initial delay between retries |
| perRetry.backOff.maxInterval | Maximum delay (exponential backoff caps here) |
| perRetry.timeout | Timeout for each individual retry attempt |
| retryOn.httpStatusCodes | Which status codes trigger a retry |
| retryOn.triggers | Which failure types trigger a retry |

Retry Triggers

| Trigger | Meaning |
|---|---|
| connect-failure | TCP connection failed |
| retriable-status-codes | Status code matched the httpStatusCodes list |
| reset | Connection reset by backend |
| refused-stream | HTTP/2 stream refused |

Retry Safety

> [!CAUTION]
> Only retry idempotent operations. Retrying a POST that creates a record may create duplicates. Configure retry policies on read-heavy routes, not write routes.

| HTTP Method | Safe to Retry | Reason |
|---|---|---|
| GET | Yes | Read-only; no state change |
| HEAD | Yes | Read-only |
| PUT | Yes | Idempotent full replacement |
| POST | No | May create duplicate records |
| DELETE | No | May fail on second attempt |
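
One way to honor the caution above is to give read traffic its own route and attach the retry policy only there. A minimal sketch, assuming your Gateway API version supports method matching; the route, gateway, and backend names and port are illustrative:

```yaml
# GET-only route so retries never touch writes (names are hypothetical)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: task-api-read-route
  namespace: task-api
spec:
  parentRefs:
    - name: task-api-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/tasks
          method: GET
      backendRefs:
        - name: task-api
          port: 8000
---
# Retry policy scoped to the read-only route
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-read-retry
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-read-route
  retry:
    numRetries: 3
    retryOn:
      triggers:
        - connect-failure
```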

Combining Policies

You can combine rate limiting, circuit breaking, and retries in a single BackendTrafficPolicy:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-comprehensive
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  # Rate limiting
  rateLimit:
    local:
      rules:
        - limit:
            requests: 100
            unit: Minute
  # Circuit breaker
  circuitBreaker:
    maxConnections: 100
    maxParallelRequests: 50
    maxPendingRequests: 10
  # Retry policy
  retry:
    numRetries: 2
    perRetry:
      backOff:
        baseInterval: 50ms
        maxInterval: 500ms
      timeout: 250ms
    retryOn:
      httpStatusCodes:
        - 503
      triggers:
        - connect-failure
```

Order of evaluation:

  1. Rate limit check (429 if exceeded)
  2. Circuit breaker check (503 if circuit open)
  3. Forward to backend
  4. Retry on failure (if policy matches)

Policy Merging

When policies target different levels (Gateway vs HTTPRoute), they merge with specific precedence.

Merging Behavior

| Policy Level | Precedence | Use Case |
|---|---|---|
| HTTPRoute | Higher (overrides) | Route-specific settings |
| Gateway | Lower (defaults) | Cluster-wide defaults |

Example: Default plus override

Gateway-level default (applies to all routes):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: gateway-defaults
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: task-api-gateway
  rateLimit:
    local:
      rules:
        - limit:
            requests: 100
            unit: Minute
```

HTTPRoute-level override (applies only to expensive operations):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-route-override
  namespace: task-api
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-inference-route
  rateLimit:
    local:
      rules:
        - limit:
            requests: 10
            unit: Minute
```

The LLM inference route gets 10 requests/minute (override). All other routes get 100 requests/minute (default).


Observing Rate Limiting

Monitor rate limiting effectiveness with Prometheus metrics.

Key Metrics

| Metric | Meaning |
|---|---|
| envoy_http_ratelimit_over_limit | Requests that exceeded the rate limit |
| envoy_cluster_circuit_breakers_cx_open | Circuit breaker open (connections) |
| envoy_cluster_circuit_breakers_rq_pending_open | Circuit breaker open (pending requests) |

Prometheus Query Examples

Rate limited requests per route:

```promql
sum(rate(envoy_http_ratelimit_over_limit[5m])) by (route_name)
```

Circuit breaker activations:

```promql
sum(rate(envoy_cluster_circuit_breakers_cx_open[5m])) by (envoy_cluster_name)
```

Grafana Dashboard Panel

Create a panel showing rate limit vs successful requests:

```promql
# Successful requests
sum(rate(envoy_http_downstream_rq_completed{response_code="200"}[5m]))

# Rate limited requests
sum(rate(envoy_http_ratelimit_over_limit[5m]))
```
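
A useful derived panel is the share of traffic being rejected. A sketch, assuming envoy_http_downstream_rq_total is exported in your setup:

```promql
# Fraction of requests rejected by rate limiting (0 to 1)
sum(rate(envoy_http_ratelimit_over_limit[5m]))
/
sum(rate(envoy_http_downstream_rq_total[5m]))
```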

Exercises

Exercise 1: Apply Basic Rate Limit

Apply a rate limit and observe 429 responses:

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: exercise-ratelimit
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        - limit:
            requests: 5
            unit: Minute
EOF
```

Test:

```bash
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code} " http://localhost:8080/api/tasks
done
echo
```

Expected Output:

```text
200 200 200 200 200 429 429 429 429 429
```

Exercise 2: Per-User Rate Limits

Add per-user limits using x-user-id header:

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: exercise-peruser
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  rateLimit:
    local:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  value: "*"
          limit:
            requests: 3
            unit: Minute
EOF
```

Test with different users:

```bash
# User alice
for i in {1..5}; do
  curl -s -o /dev/null -w "%{http_code} " \
    -H "x-user-id: alice" http://localhost:8080/api/tasks
done
echo "alice"

# User bob (independent quota)
for i in {1..5}; do
  curl -s -o /dev/null -w "%{http_code} " \
    -H "x-user-id: bob" http://localhost:8080/api/tasks
done
echo "bob"
```

Expected Output:

```text
200 200 200 429 429 alice
200 200 200 429 429 bob
```

Exercise 3: Circuit Breaker Under Load

Configure circuit breaker and observe 503 responses under load:

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: exercise-circuit
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: task-api-route
  circuitBreaker:
    maxParallelRequests: 5
    maxPendingRequests: 0
EOF
```

Generate concurrent load:

```bash
# Run 20 concurrent requests
for i in {1..20}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/api/tasks?delay=1s &
done
wait
```

Expected: Some 200 responses (up to maxParallelRequests), rest 503.

Exercise 4: View Rate Limit Metrics

Query Prometheus for rate limiting data:

```bash
# Port-forward to Prometheus
kubectl port-forward -n monitoring svc/prometheus 9090:9090 &

# Query rate limited requests
curl -s "http://localhost:9090/api/v1/query?query=envoy_http_ratelimit_over_limit" | jq
```

Reflect on Your Skill

You built a traffic-engineer skill in Lesson 0. Based on what you learned about rate limiting and circuit breaking:

Add Rate Limiting Decision Logic

Your skill should ask:

| Question | If Yes | If No |
|---|---|---|
| Need strict cluster-wide quota? | Use global (Redis) | Use local |
| Different user tiers? | Add per-user limits with headers | Use a uniform limit |
| Protecting an expensive LLM endpoint? | Aggressive limits (10/min) | Standard limits (100/min) |

Add Policy Templates

Local rate limiting template:

```yaml
# Template: local-ratelimit
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: {{ service }}-ratelimit
  namespace: {{ namespace }}
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: {{ route }}
  rateLimit:
    local:
      rules:
        - limit:
            requests: {{ requests }}
            unit: {{ unit }}
```

Circuit breaker template:

```yaml
# Template: circuit-breaker
circuitBreaker:
  maxConnections: {{ max_connections }}
  maxParallelRequests: {{ max_parallel }}
  maxPendingRequests: {{ max_pending }}
```
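
Since this lesson also covered retries, the skill could carry a matching snippet. A sketch in the same template style (placeholder names are illustrative):

```yaml
# Template: retry-policy
retry:
  numRetries: {{ num_retries }}
  perRetry:
    backOff:
      baseInterval: {{ base_interval }}
      maxInterval: {{ max_interval }}
    timeout: {{ per_retry_timeout }}
  retryOn:
    httpStatusCodes:
      - 503
    triggers:
      - connect-failure
```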

Update Troubleshooting Guidance

| Symptom | Likely Cause | Fix |
|---|---|---|
| 429 on all requests | Rate limit too low | Increase the limit |
| 503 under normal load | Circuit breaker too aggressive | Increase maxParallelRequests |
| No rate limiting | Policy not attached | Check targetRefs |
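
For the last row, a quick cross-check is to compare the policy's targetRefs against the routes that actually exist:

```bash
# targetRefs.name in the policy must match an HTTPRoute in the same namespace
kubectl get backendtrafficpolicy -n task-api
kubectl get httproute -n task-api

# Inspect targetRefs and status conditions on a specific policy
kubectl describe backendtrafficpolicy task-api-ratelimit -n task-api
```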

Try With AI

Generate Rate Limiting Configuration

Ask your traffic-engineer skill to generate configuration:

```text
Using my traffic-engineer skill, generate BackendTrafficPolicy for my Task API:
- 100 requests/minute per user (based on x-user-id header)
- 10 requests/minute for anonymous users (no header)
- Apply to the task-api-route HTTPRoute in task-api namespace
```

What you're learning: AI generates multi-rule rate limiting. Review the output—did AI use invert: true correctly for anonymous users? Are both rules in the same policy?

Evaluate the Configuration

Check AI's output:

  • Does it use clientSelectors with headers?
  • Is the invert: true field on the anonymous user rule?
  • Are the limits in the correct order (authenticated first, anonymous second)?

If something is missing, provide feedback:

```text
The anonymous user rule needs invert: true on the x-user-id header to match requests WITHOUT that header. Please fix.
```

Add Circuit Breaker Protection

Extend the configuration:

```text
Add circuit breaker protection to this policy:
- Maximum 50 concurrent connections
- Maximum 25 parallel requests
- Maximum 5 pending requests (fail fast after that)
```

What you're learning: AI adapts existing configurations. Verify the circuit breaker fields are correct and added to the same BackendTrafficPolicy resource.

Validate Before Applying

Before applying AI's configuration:

```bash
# Validate YAML syntax
kubectl apply --dry-run=client -f policy.yaml

# Check for common errors
kubectl apply --dry-run=server -f policy.yaml
```

This iteration—specifying requirements, evaluating output, refining with constraints—builds production configurations safely.

Safety Note

Rate limiting and circuit breakers affect all traffic to your service. Test in development before production. Start with generous rate limits and circuit breaker thresholds; you can always tighten them based on observed behavior. Monitor the envoy_http_ratelimit_over_limit metric to ensure you are not blocking legitimate traffic.