Your agent running in Kubernetes handles thousands of requests. But how does traffic reach it? In Lesson 5, you learned about Services—the stable interfaces to your Pods. LoadBalancer Services work well for simple cases, but production systems need something more powerful: Ingress.
Ingress lets you expose HTTP and HTTPS routes from outside the cluster to services within it. Unlike LoadBalancer, which creates a cloud load balancer for every service (expensive), Ingress shares one load balancer across multiple services. You can route requests based on hostname, URL path, or both. You can terminate TLS for HTTPS. You can even implement A/B testing by routing different traffic percentages to different versions of your agent.
Think of it this way: LoadBalancer is a direct tunnel to one service. Ingress is an intelligent receptionist—it looks at your request, reads the address and directions, and routes you to the right department.
Every LoadBalancer Service you create provisions a cloud load balancer. On AWS, that's $16/month per load balancer. With 10 services, you're paying $160/month just for load balancers. Ingress shares one load balancer across all services—one $16/month bill instead of ten.
LoadBalancer gives you Layer 4 routing (TCP/UDP, based on port). It knows nothing about HTTP. Ingress gives you Layer 7 routing: decisions based on hostname, URL path, and HTTP headers.
With LoadBalancer, every service is exposed separately. You're managing multiple external IPs, each with different security rules. With Ingress, you have one entrypoint—one place to manage TLS certificates, one place to enforce security policies.
Ingress has two parts: a resource (declarative specification) and a controller (the process that implements it).
The Ingress resource is a Kubernetes API object—like Deployment or Service. It specifies your desired routing rules:
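A sketch of such a resource, using the hostname and service names from the scenario described next:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-routing
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1/
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
      - path: /api/v2/beta/
        pathType: Prefix
        backend:
          service:
            name: agent-experimental
            port:
              number: 80
```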
This says: "Listen on api.example.com, route requests to /api/v1/ to the agent-stable service, route requests to /api/v2/beta/ to the agent-experimental service."
The resource is just a specification. It does nothing by itself.
The Ingress controller reads Ingress resources and implements them. It's a Pod running in your cluster that watches for Ingress resources, then configures a real load balancer (nginx, HAProxy, cloud provider's LB) to actually route traffic.
Popular controllers include nginx-ingress (the ingress-nginx project), Traefik, HAProxy Ingress, and cloud-managed options such as the AWS Load Balancer Controller and GKE Ingress.
For this chapter, we'll use nginx-ingress because it works on Docker Desktop, cloud clusters, and on-premises Kubernetes.
Install nginx-ingress by applying the official manifest with kubectl (check the ingress-nginx GitHub releases page for the current version), for example: `kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml`
Output:
Verify the controller is running:
Output:
The ingress-nginx-controller is the daemon watching your cluster for Ingress resources. When you create an Ingress, this controller reads it and configures nginx to route traffic accordingly.
Check which IngressClass is available:
Output:
The nginx IngressClass is your controller. When you create an Ingress with ingressClassName: nginx, this controller takes responsibility for implementing it.
Your agent has evolved. You have a stable /api/v1/ endpoint that clients rely on, and a new /api/v2/beta/ endpoint with experimental features. Different Deployments run each version:
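A minimal sketch of the two Deployments (the image names are placeholders; substitute your own):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-stable
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent-stable
  template:
    metadata:
      labels:
        app: agent-stable
    spec:
      containers:
      - name: agent
        image: my-registry/agent:v1        # placeholder image
        ports:
        - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-experimental
spec:
  replicas: 1
  selector:
    matchLabels:
      app: agent-experimental
  template:
    metadata:
      labels:
        app: agent-experimental
    spec:
      containers:
      - name: agent
        image: my-registry/agent:v2-beta   # placeholder image
        ports:
        - containerPort: 8000
```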
Output:
Expose each as a Service:
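A sketch of the two Services, assuming the containers listen on port 8000 and each selector matches the corresponding Deployment's labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: agent-stable
spec:
  selector:
    app: agent-stable
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: agent-experimental
spec:
  selector:
    app: agent-experimental
  ports:
  - port: 80
    targetPort: 8000
```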
Output:
Verify both services exist:
Output:
Now create an Ingress to route traffic to both:
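One possible manifest (with no host field, the rule matches any hostname, including localhost on Docker Desktop):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-path-routing
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
      - path: /api/v2/beta
        pathType: Prefix
        backend:
          service:
            name: agent-experimental
            port:
              number: 80
```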
Save as agent-path-routing.yaml and apply:
Output:
Verify the Ingress is configured:
Output:
On Docker Desktop, the ADDRESS resolves to localhost. Wait a moment for nginx to configure, then test:
Output:
Output:
The same Ingress gateway routes /api/v1 and /api/v2/beta to different backend services. This is the power of path-based routing—one IP, multiple APIs, each backed by independent Deployments.
Your team operates multiple services from one cluster:
Each has its own Service. Host-based Ingress routes traffic to the right Service based on the hostname:
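A sketch with three illustrative hostnames and service names (substitute your own):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-host-routing
spec:
  ingressClassName: nginx
  rules:
  - host: chat.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: chat-service
            port:
              number: 80
  - host: search.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: search-service
            port:
              number: 80
  - host: metrics.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: metrics-service
            port:
              number: 80
```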
Apply this configuration:
Output:
Verify the Ingress configuration:
Output:
Test locally by pointing each hostname at your Ingress IP in /etc/hosts (or by adding entries to your DNS):
Output:
Then test each hostname:
Output:
Output:
The same Ingress controller routes three different hostnames to three different services. This is the foundation of multi-tenant deployments—one gateway, many applications.
Internet traffic should be encrypted. TLS termination at the Ingress means the Ingress handles encryption and decryption, so backend services communicate over plain HTTP internally (they're reachable only on the cluster network).
Output:
Kubernetes stores certificates in Secrets. Create one with your TLS key and certificate:
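You can create it imperatively with `kubectl create secret tls agent-tls --cert=tls.crt --key=tls.key` (for a self-signed test certificate, first run `openssl req -x509 -newkey rsa:2048 -nodes -keyout tls.key -out tls.crt -subj "/CN=api.example.com" -days 365`), or declaratively with a manifest like this sketch, where the base64 values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: agent-tls
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```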
Output:
Verify the secret:
Output:
Modify your Ingress to reference the TLS secret:
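A sketch of the tls section added to the earlier path-routing Ingress (the host should match the certificate's common name):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-path-routing
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: agent-tls     # the Secret created above
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
```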
Apply:
Output:
Test HTTPS:
Output:
(We use --insecure because the self-signed certificate isn't in your system's trust store. In production, you'd use a certificate from a trusted CA like Let's Encrypt.)
The Ingress controller (nginx) terminates TLS—it decrypts incoming HTTPS traffic and routes requests to your services over plain HTTP. This simplifies certificate management (one place to update certs) and reduces computational burden on your agent services.
Verify the controller has picked up the TLS secret:
Output:
The TLS configuration is active. The nginx controller has loaded your certificate and key from the agent-tls secret.
Your team wants to validate a new agent version with 10% of traffic while keeping 90% on the stable version. You can't do this with basic routing—you need weighted traffic splitting.
Create two Services:
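For example (the names are illustrative; each selector must match the corresponding Deployment's labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: agent-stable
spec:
  selector:
    app: agent-stable
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: agent-canary
spec:
  selector:
    app: agent-canary
  ports:
  - port: 80
    targetPort: 8000
```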
Output:
With nginx-ingress, weighted splitting uses the canary annotations: you create a second Ingress marked nginx.ingress.kubernetes.io/canary: "true", and nginx.ingress.kubernetes.io/canary-weight sets the percentage of traffic it receives:
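A sketch of this pattern: the main Ingress routes to the stable Service, and a second Ingress annotated as a canary receives roughly 10% of matching requests (service names follow the earlier examples):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-main
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # ~10% of traffic
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: agent-canary
            port:
              number: 80
```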
Apply:
Output:
Send 100 requests and observe traffic distribution:
Output:
Roughly 90% reach the stable version, 10% reach the test version. This lets you validate new code with real traffic before full rollout.
Annotations let you customize Ingress behavior without changing the core specification. Common annotations for nginx-ingress include nginx.ingress.kubernetes.io/ssl-redirect, rewrite-target, proxy-body-size, limit-rps, and enable-cors.
Protect your agent from being overwhelmed by limiting requests per client:
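A sketch using ingress-nginx's rate-limiting annotations (the limits and service name are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-rate-limited
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"         # requests per second per client IP
    nginx.ingress.kubernetes.io/limit-connections: "5"  # concurrent connections per client IP
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
```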
Apply and test:
Output:
Verify the annotations are applied:
Output:
Allow browser clients from specific origins to call your agent:
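A sketch using ingress-nginx's CORS annotations (the origin and service name are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-cors
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.example.com"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: agent-stable
            port:
              number: 80
```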
Apply:
Output:
When routing fails, use kubectl to diagnose. Start with `kubectl get ingress`:
Output:
If ADDRESS is <none>, the controller hasn't assigned an address yet. Check that ingressClassName matches an installed IngressClass and that the controller Pod is running.
Output:
`kubectl describe ingress <name>` shows exactly what rules are configured and which backends they target.
Output (example):
The controller logs (`kubectl logs -n ingress-nginx deployment/ingress-nginx-controller`) show when the controller detects new Ingress resources and updates its configuration.
If the Ingress won't route traffic, bypass it and verify the backend Service directly with a port-forward, e.g. `kubectl port-forward svc/agent-stable 8080:80`:
In another terminal, request the forwarded port (e.g. `curl http://localhost:8080/`):
Output:
If this works but Ingress routing doesn't, the problem is in the Ingress controller configuration, not the Service.
Issue: "Service not found" errors in Ingress
Check the Service exists in the correct namespace (`kubectl get svc -n <namespace>`):
Output:
Issue: 503 Service Unavailable
The Ingress exists but has no healthy backends. Check the Service's endpoints with `kubectl get endpoints <service-name>`:
Output:
If the Endpoints list is empty, Pods aren't running or their labels don't match the Service's selector.
Setup: You have two agent services running in your cluster: chat-agent and tool-agent. Both listen on port 8000. You want to expose them via Ingress so:
Part 1: Ask AI for the Ingress design
Prompt AI:
Part 2: Evaluate the design
Review AI's response. Ask yourself:
Part 3: Test the design
Create the services (if not already running):
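A sketch of the two Services (this assumes Deployments labeled app: chat-agent and app: tool-agent already exist; adjust the selectors to match yours):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: chat-agent
spec:
  selector:
    app: chat-agent
  ports:
  - port: 8000
    targetPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: tool-agent
spec:
  selector:
    app: tool-agent
  ports:
  - port: 8000
    targetPort: 8000
```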
Create the TLS secret:
Apply AI's Ingress:
Test routing:
Part 4: Refinement
If routing works:
If routing fails:
Part 5: Compare to your design
When you started, you might have created separate Ingresses per service. Look at AI's consolidated design: