Every production API needs HTTPS. Without TLS, credentials travel in plaintext, session tokens can be intercepted, and users see browser warnings that destroy trust. For AI agents, the stakes are higher: API keys for LLM providers, user authentication tokens, and sensitive business data all flow through your endpoints. A single intercepted request could expose thousands of dollars in API credits or compromise user accounts.
Manual certificate management does not scale. Let's Encrypt certificates expire after 90 days; certificates from traditional CAs typically last about a year. A forgotten renewal means clients start rejecting connections to your production service, often at 3 AM. cert-manager automates the entire lifecycle: issuing certificates when you create TLS listeners, renewing them before expiration, and updating secrets without downtime.
This lesson installs cert-manager, configures it to issue certificates from Let's Encrypt, and connects it to your Gateway API infrastructure. By the end, your Task API will serve HTTPS traffic with automatically renewed certificates, and you will understand how the ACME protocol proves domain ownership without manual intervention.
Before installing anything, understand the flow from Gateway creation to HTTPS traffic. Three components collaborate: cert-manager (certificate lifecycle), Let's Encrypt (certificate authority), and Envoy Gateway (TLS termination).
Let's Encrypt uses ACME (Automatic Certificate Management Environment) to verify you control the domain before issuing certificates. The HTTP-01 challenge works like this:
This entire process takes 30-90 seconds and requires no manual intervention.
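As a rough illustration of what the CA checks during HTTP-01: it fetches a token over plain HTTP from a fixed, well-known path on the domain being validated. The sketch below only builds that URL. The domain and token are placeholder assumptions (real tokens are issued by the ACME server), and cert-manager normally serves the response for you.

```shell
# Hypothetical values for illustration; real tokens come from the ACME server.
domain="api.example.com"
token="example-token"

# HTTP-01 validation uses plain HTTP on port 80 and this fixed path prefix.
challenge_url="http://${domain}/.well-known/acme-challenge/${token}"
echo "$challenge_url"
```

While a challenge is pending, requesting this URL from outside the cluster is a quick way to check that port 80 on your domain actually reaches the Gateway.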
Certificates end up in Kubernetes Secrets:
Envoy Gateway watches these secrets. When cert-manager updates the secret with a renewed certificate, Envoy proxies pick up the new certificate within seconds—no pod restarts required.
cert-manager is distributed as a Helm chart. The installation includes CRDs for Certificate, ClusterIssuer, and other resources, plus the controller that manages the certificate lifecycle.
Add the Jetstack Helm repository:
Output:
Install cert-manager with Gateway API support:
Output:
The --set config.enableGatewayAPI=true flag is critical. Without it, cert-manager ignores Gateway resources and only watches Ingress resources.
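The installation steps can be sketched as one shell function. This is a sketch under assumptions, not the only valid invocation: the release name and namespace (`cert-manager`) are conventional choices, `crds.enabled=true` applies to recent chart versions (older charts used `installCRDs=true`), and the `config.apiVersion` and `config.kind` values follow cert-manager's documented way of passing controller configuration through Helm.

```shell
# Sketch only: nothing runs until you call install_cert_manager.
install_cert_manager() {
  helm repo add jetstack https://charts.jetstack.io &&
  helm repo update &&
  # Install (or upgrade) the chart with CRDs and Gateway API support enabled.
  helm upgrade --install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --set crds.enabled=true \
    --set config.apiVersion="controller.config.cert-manager.io/v1alpha1" \
    --set config.kind="ControllerConfiguration" \
    --set config.enableGatewayAPI=true &&
  # Block until every cert-manager Deployment reports Available.
  kubectl wait --for=condition=Available deployment --all \
    --namespace cert-manager --timeout=180s
}
```

The Gateway API CRDs should already be present in the cluster before cert-manager starts, since the controller checks for them at startup.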
Wait for cert-manager to become available:
Output:
Verify all cert-manager components are running:
Output:
Three pods run the cert-manager components: the cert-manager controller, the cainjector, and the webhook.
Before cert-manager can issue certificates, it needs to know how. A ClusterIssuer defines the certificate authority and authentication method. For Let's Encrypt, this means configuring the ACME protocol.
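Let's Encrypt provides two endpoints: staging (https://acme-staging-v02.api.letsencrypt.org/directory), which issues untrusted test certificates under generous rate limits, and production (https://acme-v02.api.letsencrypt.org/directory), which issues publicly trusted certificates under strict rate limits.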
Always test with staging first. Production rate limits can lock you out for a week if you misconfigure and retry repeatedly.
Start with staging to validate your configuration. Create clusterissuer-staging.yaml:
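One way to create the file is a shell heredoc. This is a minimal sketch rather than a definitive manifest: the issuer name, email address, account-key Secret name, and the Gateway name and namespace in `parentRefs` are all assumptions to replace with your own values.

```shell
cat > clusterissuer-staging.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging            # assumed name; the Gateway annotation must match it
spec:
  acme:
    # Let's Encrypt staging directory (untrusted certificates, relaxed rate limits)
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com            # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # Secret that stores the ACME account key
    solvers:
      - http01:
          gatewayHTTPRoute:
            parentRefs:
              - name: task-api-gateway  # assumed Gateway name
                namespace: default      # assumed namespace
                kind: Gateway
EOF
```

The `gatewayHTTPRoute` solver tells cert-manager to answer HTTP-01 challenges by attaching a temporary HTTPRoute to the referenced Gateway.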
Field meanings:
Apply the ClusterIssuer:
Output:
Verify the ClusterIssuer is ready:
Output:
Check detailed status:
Output:
The "ACME account was registered" message confirms that cert-manager successfully registered an account with Let's Encrypt.
Once staging works, create clusterissuer-production.yaml:
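A minimal sketch of the production issuer, again written with a heredoc. It points at the Let's Encrypt production directory and uses its own account-key Secret. The issuer name, email address, and Gateway reference are assumptions; adjust them to your cluster.

```shell
cat > clusterissuer-production.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production         # assumed name
spec:
  acme:
    # Let's Encrypt production directory (trusted certificates, strict rate limits)
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-production-account-key   # separate account key from staging
    solvers:
      - http01:
          gatewayHTTPRoute:
            parentRefs:
              - name: task-api-gateway  # assumed Gateway name
                namespace: default      # assumed namespace
                kind: Gateway
EOF
```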
Apply when ready for production:
Output:
With cert-manager installed and ClusterIssuer configured, update your Gateway to request certificates automatically. This requires two changes: adding cert-manager annotations and configuring a TLS listener.
Create task-api-gateway-tls.yaml:
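A sketch of what such a Gateway could look like, under stated assumptions: the GatewayClass name `eg`, the hostname `api.example.com`, the Secret name `task-api-tls`, and the issuer name `letsencrypt-staging` are placeholders, not values confirmed by this lesson.

```shell
cat > task-api-gateway-tls.yaml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: task-api-gateway
  namespace: default
  annotations:
    # Tells cert-manager which ClusterIssuer should sign this Gateway's certificates
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  gatewayClassName: eg                  # assumed Envoy Gateway class name
  listeners:
    - name: http                        # port 80 must stay open for HTTP-01 challenges
      protocol: HTTP
      port: 80
    - name: https
      protocol: HTTPS
      port: 443
      hostname: api.example.com         # placeholder; cert-manager needs a hostname here
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: task-api-tls          # cert-manager creates and renews this Secret
EOF
```

cert-manager only acts on listeners that set a hostname, use `mode: Terminate`, and reference a Secret in `certificateRefs`.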
Key configuration points:
Apply the Gateway:
Output:
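When you apply the annotated Gateway, cert-manager's gateway-shim detects the annotation and creates a Certificate resource for each TLS listener, using the listener's hostname as the DNS name and the Secret named in certificateRefs as the target. cert-manager then runs the ACME order and challenge and stores the issued certificate in that Secret.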
Watch the certificate creation:
Output (over 30-60 seconds):
The certificate transitions from READY: False to READY: True when issuance completes.
After the certificate is issued, verify HTTPS works end-to-end.
Output:
Key status fields:
Output:
View certificate details:
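One way to inspect the issued certificate is to decode it straight from the Secret. The helper below is a sketch: the function name is hypothetical, and it assumes `kubectl`, `base64`, and `openssl` are available locally.

```shell
# Sketch only: nothing runs until you call show_tls_cert.
# Usage: show_tls_cert <secret-name> [namespace]
show_tls_cert() {
  cert_secret="$1"
  cert_ns="${2:-default}"
  # tls.crt is stored base64-encoded; the dot in the key must be escaped in JSONPath.
  kubectl get secret "$cert_secret" -n "$cert_ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -issuer -dates
}
```

For example, `show_tls_cert task-api-tls default` prints the subject, issuer, and validity dates of the first certificate in the Secret, assuming that is the Secret your Gateway references.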
Output:
The (STAGING) in the issuer confirms this is a staging certificate. Production certificates show R10 or similar without the staging prefix.
For local testing with port-forward:
Test with curl (staging cert is not trusted by default):
Output:
The -k flag skips certificate verification (needed for staging certs). Production certs from Let's Encrypt are trusted by default.
Certificate issuance can fail for several reasons. The troubleshooting workflow follows the certificate lifecycle.
Look for status conditions:
Common status reasons:
Output:
If not ready, describe for details:
Output:
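Order states: pending (waiting for challenges to complete), ready (all challenges passed), processing (the CA is issuing the certificate), valid (the certificate was issued), and invalid (a challenge failed).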
If Order is stuck at pending, check the Challenge:
Output:
Describe the challenge:
Common failure messages:
Look for errors related to your certificate:
This error means the challenge HTTPRoute is not serving the token correctly.
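The troubleshooting workflow above can be sketched as a single helper that walks the chain from Certificate to controller logs. The function name is hypothetical, and the log command assumes the controller Deployment is named `cert-manager` in the `cert-manager` namespace, which is the case for a Helm release with that name.

```shell
# Sketch only: nothing runs until you call debug_certificate.
# Usage: debug_certificate <certificate-name> [namespace]
debug_certificate() {
  dbg_name="$1"
  dbg_ns="${2:-default}"
  echo "== Certificate =="
  kubectl describe certificate "$dbg_name" -n "$dbg_ns"
  echo "== CertificateRequests =="
  kubectl get certificaterequest -n "$dbg_ns"
  echo "== Orders =="
  kubectl get order -n "$dbg_ns"
  echo "== Challenges =="
  kubectl get challenge -n "$dbg_ns"
  kubectl describe challenge -n "$dbg_ns"
  echo "== Controller logs (last 50 lines) =="
  kubectl logs -n cert-manager deployment/cert-manager --tail=50
}
```

Reading the output top to bottom follows the same order as the certificate lifecycle, so the first section that reports a failure is usually the place to look.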
Production deployments should redirect HTTP traffic to HTTPS. Create http-redirect.yaml:
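A sketch of such a redirect using the Gateway API `RequestRedirect` filter. The route name, hostname, Gateway name, and the `http` listener name in `sectionName` are assumptions; they need to match your own Gateway.

```shell
cat > http-redirect.yaml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-to-https-redirect          # assumed name
  namespace: default
spec:
  parentRefs:
    - name: task-api-gateway            # assumed Gateway name
      sectionName: http                 # attach only to the port-80 listener
  hostnames:
    - api.example.com                   # placeholder hostname
  rules:
    # No matches block: the rule applies to every path on this listener.
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
EOF
```

Binding the route to the HTTP listener through `sectionName` keeps the redirect from applying to traffic that already arrived over HTTPS.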
Apply the redirect:
Output:
Test the redirect:
Output:
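Note: The redirect should not interfere with ACME challenges. For each challenge, cert-manager creates a temporary HTTPRoute that matches the exact token path under /.well-known/acme-challenge/, and Gateway API gives an exact path match precedence over a broader prefix match such as the redirect's.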
For local development without DNS or when you cannot reach Let's Encrypt, use a self-signed ClusterIssuer. Create clusterissuer-selfsigned.yaml:
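A self-signed issuer needs very little configuration. In this sketch the issuer name `selfsigned` is an assumption; whatever name you choose is the one the Gateway annotation must reference.

```shell
cat > clusterissuer-selfsigned.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned                      # assumed name
spec:
  selfSigned: {}                        # each certificate is signed with its own private key
EOF
```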
Apply:
Output:
Update Gateway annotation:
Self-signed certificates are not trusted by browsers but work for development and testing.
Install cert-manager with Gateway API support:
Expected: Three pods running (cert-manager, cainjector, webhook)
Create a ClusterIssuer for Let's Encrypt staging:
Expected: ClusterIssuer shows READY: True
Update your Gateway with a TLS listener:
Expected: Gateway shows PROGRAMMED: True
Watch the certificate lifecycle:
Expected: Certificate transitions to READY: True (or stays pending if DNS is not configured for the hostname)
You built a traffic-engineer skill in Lesson 0. Based on what you learned about TLS and cert-manager:
Your skill should ask:
Ask your traffic-engineer skill to generate complete TLS setup:
What you're learning: AI generates ClusterIssuer and Gateway configurations together. Review the output: Does the ClusterIssuer reference the correct Gateway? Does the Gateway annotation reference the correct ClusterIssuer name?
Check AI's output for common mistakes:
If something is missing or incorrect:
Request troubleshooting guidance:
What you're learning: AI can explain the troubleshooting workflow. Compare with the troubleshooting section in this lesson. Did AI include checking the Challenge status and cert-manager logs?
Before applying AI-generated configuration:
This iteration refines production configurations safely without impacting live systems.
Test TLS configuration in staging before production. Let's Encrypt production has rate limits: 50 certificates per registered domain per week. If your configuration is wrong and you retry repeatedly, you may hit these limits and be unable to issue certificates for a week. Always validate with the staging endpoint first, then switch the ClusterIssuer to production only after confirming the full certificate lifecycle works.