Your development cluster works perfectly on Docker Desktop. One broker, ephemeral storage, no authentication. But production is a different world.
In production, Kafka clusters must survive broker failures without data loss. They must encrypt traffic to prevent eavesdropping. They must authenticate clients to prevent unauthorized access. And they must have enough resources to handle peak load without throttling.
The gap between your development setup and production readiness is significant. Chapter 4 got Kafka running quickly. This chapter makes it production-grade.
Before diving into configuration, understand what changes between environments:

- Replication factor moves from 1 to 3 so data survives broker loss.
- The plain listener on port 9092 gives way to a TLS listener on 9093.
- Anonymous access is replaced by SCRAM-SHA-512 authentication and ACLs.
- Ephemeral storage becomes persistent volumes that outlive pod restarts.
- The combined controller/broker node splits into dedicated node pools.
- Unbounded pods get explicit CPU, memory, and JVM heap limits.

Every difference addresses a specific production failure mode. Let's configure each one.
In Chapter 4, you deployed a single node running both controller and broker roles. This works for development but creates problems in production: the node is a single point of failure, there is no metadata quorum to survive a crash, and controller duties compete with client traffic for the same CPU, memory, and disk.
Production clusters separate these roles into dedicated node pools.
Controllers manage cluster metadata through Raft consensus. They don't handle client traffic.
Key production settings:
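A sketch of a dedicated controller pool using Strimzi's `KafkaNodePool` resource; the cluster name `my-cluster` and the sizes are assumptions, not requirements:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  replicas: 3                        # odd count so Raft can form a quorum
  roles:
    - controller
  storage:
    type: persistent-claim           # metadata must survive restarts
    size: 20Gi
    deleteClaim: false
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
```

Three controllers tolerate one failure; five tolerate two. Going beyond five rarely helps and slows consensus.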
Brokers handle producer/consumer traffic and store message data. They scale based on throughput requirements.
Key production settings:
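A matching broker pool, again a sketch with assumed sizes:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  replicas: 3                        # minimum for replication factor 3
  roles:
    - broker
  storage:
    type: persistent-claim
    size: 500Gi                      # from your throughput/retention math
    deleteClaim: false
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 8Gi
```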
The separation creates independent failure domains: losing a broker cannot break the metadata quorum, and a controller election never interrupts client traffic.

Benefits:

- Brokers scale out with throughput while the controller count stays at a fixed odd number.
- Rolling updates proceed pool by pool, limiting the blast radius of a bad change.
- Each role gets resources sized for its own workload profile.
In development, you used the plain listener on port 9092. Production traffic should be encrypted.
Key security settings:
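A sketch of the listener block under `spec.kafka.listeners` in the `Kafka` resource:

```yaml
listeners:
  - name: tls
    port: 9093
    type: internal      # cluster-internal clients; use loadbalancer/route for external
    tls: true           # Strimzi provisions and rotates the certificates
    authentication:
      type: scram-sha-512
```

Dropping the plain 9092 listener entirely forces every client onto the encrypted path.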
Strimzi automatically manages TLS certificates: the Cluster Operator generates a cluster CA, uses it to sign per-broker certificates, and renews them before they expire.

You don't need to manually create certificates. The operators handle the PKI lifecycle end to end.
To extract the CA certificate for clients:
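Strimzi stores the CA in a Secret named `<cluster>-cluster-ca-cert`; assuming the cluster is `my-cluster` in the `kafka` namespace:

```shell
kubectl get secret my-cluster-cluster-ca-cert -n kafka \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
```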
Clients use this CA certificate to verify they're connecting to the real Kafka cluster, not an impersonator.
TLS encrypts traffic but doesn't identify clients. Add authentication so only authorized clients can connect.
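A sketch of a `KafkaUser` for a hypothetical producer writing to an `orders` topic; the user and topic names are illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: orders-producer              # illustrative user name
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  authentication:
    type: scram-sha-512              # password-based; Strimzi generates the secret
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
        operations:
          - Write                    # produce messages
          - Describe                 # fetch topic metadata
```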
ACL breakdown: the rules follow least privilege. A producer needs only `Write` and `Describe` on the topics it publishes to; a consumer additionally needs `Read` on the topic and `Read` on its consumer group. Grant nothing beyond what the client actually uses.
Apply the user manifest and confirm that Strimzi accepts it:
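Assuming the manifest for a user named `orders-producer` is saved as `kafka-user.yaml` (both names illustrative):

```shell
kubectl apply -f kafka-user.yaml
# kafkauser.kafka.strimzi.io/orders-producer created
```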
Strimzi stores the generated password in a Kubernetes Secret:
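The Secret shares the `KafkaUser`'s name; to read the password (user name assumed from the earlier example):

```shell
kubectl get secret orders-producer -n kafka \
  -o jsonpath='{.data.password}' | base64 -d
```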
For a Python producer, you'd configure authentication like this:
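A minimal sketch assuming the `confluent-kafka` client; the bootstrap address follows Strimzi's `<cluster>-kafka-bootstrap` service naming, and the user name, password, and file path are illustrative placeholders:

```python
# Producer configuration for a TLS + SCRAM listener (values are illustrative).
password = "read-from-the-kubernetes-secret"  # placeholder, not a real credential

conf = {
    "bootstrap.servers": "my-cluster-kafka-bootstrap:9093",  # TLS listener port
    "security.protocol": "SASL_SSL",        # encrypt AND authenticate
    "sasl.mechanism": "SCRAM-SHA-512",      # must match the KafkaUser spec
    "sasl.username": "orders-producer",
    "sasl.password": password,
    "ssl.ca.location": "ca.crt",            # cluster CA extracted earlier
}

# With librdkafka installed, the config plugs straight into the client:
# from confluent_kafka import Producer
# producer = Producer(conf)
# producer.produce("orders", value=b"order-123")
# producer.flush()
```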
Critical settings for authenticated connections: `security.protocol` must be `SASL_SSL` (never `SASL_PLAINTEXT`, which would send the password unencrypted), the SASL mechanism must match the `KafkaUser` authentication type exactly, and the client must trust the cluster CA certificate you extracted earlier.
Production clusters need explicit resource boundaries. Without them: a busy broker can starve its node neighbors of CPU, the scheduler can place pods on nodes that cannot sustain them, and the JVM can grow past what the container allows, triggering OOM kills.
JVM heap sizing rules: set `-Xms` equal to `-Xmx` so the heap never resizes under load, and cap the heap at roughly half the container's memory limit. Kafka leans heavily on the OS page cache, so the other half is not waste, it's your read performance.
Example for a broker with 8Gi memory limit:
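A sketch of the resource and heap settings in the broker pool (or `spec.kafka`) definition, following the half-for-heap rule:

```yaml
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 8Gi
jvmOptions:
  -Xms: 4096m    # equal min/max: no resize pauses
  -Xmx: 4096m    # half the limit; the rest feeds the page cache
```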
Production data must survive pod restarts. Configure storage classes that match your cloud provider:
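A sketch of a persistent storage block; the class names are provider-specific examples:

```yaml
storage:
  type: persistent-claim
  size: 500Gi
  class: gp3            # AWS EBS; e.g. premium-rwo on GKE, managed-csi on AKS
  deleteClaim: false    # keep volumes even if the Kafka resource is deleted
```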
Key settings: `type: persistent-claim` keeps data through pod rescheduling, `deleteClaim: false` preserves volumes even if the Kafka resource is deleted, and the storage class should map to SSD-backed volumes on your provider.
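The per-broker disk sizing arithmetic can be sketched directly; every input here is an assumption for illustration:

```python
# Illustrative capacity math: disk per broker from throughput and retention.
ingest_mb_per_s = 10            # average producer throughput (assumed)
retention_s = 7 * 24 * 3600     # 7-day retention window
replication = 3                 # every byte is stored three times
brokers = 6                     # brokers sharing the load

raw_mb = ingest_mb_per_s * retention_s       # data written in one retention window
total_mb = raw_mb * replication              # replication multiplies storage
per_broker_gb = total_mb / brokers / 1024    # spread across the broker pool
print(round(per_broker_gb))                  # → 2953
```

So under these assumptions each broker needs roughly 3 TB before headroom.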
Round up and add 20% headroom for operational flexibility.
Here's the full production deployment combining all security and reliability settings:
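A sketch of the `Kafka` resource pulling the pieces together, assuming KRaft mode with node pools and the illustrative names and sizes used throughout this chapter; the Kafka version is a placeholder for whatever your Strimzi release supports:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster               # assumed cluster name
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled   # replicas/storage live in KafkaNodePools
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.7.0               # illustrative
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: scram-sha-512
    authorization:
      type: simple               # enables the KafkaUser ACLs
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
    jvmOptions:
      -Xms: 4096m
      -Xmx: 4096m
  entityOperator:
    userOperator: {}             # reconciles KafkaUser resources
```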
After applying the production configuration, verify that the cluster comes up healthy and reports `Ready`:
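Assuming the cluster name `my-cluster` in the `kafka` namespace:

```shell
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
kubectl get pods -n kafka
```

You should see the controller and broker pods `Running` and the `kubectl wait` command return once the Kafka resource's `Ready` condition is met.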
If you have an existing development cluster, here's the migration approach:
Strimzi does not redistribute existing partitions automatically when you add brokers; new brokers receive partitions only for newly created topics. Enable Cruise Control and apply a `KafkaRebalance` resource to move existing data onto them. Done this way, the migration requires zero downtime.
You built a kafka-events skill in Chapter 1. Test and improve it based on what you learned.
Ask yourself:
If you found gaps:
You've configured production Kafka with security and reliability features. Now explore how to validate and optimize your configuration.
What you're learning: Security configuration involves tradeoffs between usability and protection. AI can help you understand your threat model and prioritize hardening efforts.
What you're learning: Capacity planning requires understanding the relationship between throughput, retention, replication, and resources. AI can teach you the formulas while applying them to your specific scenario.
What you're learning: Production debugging requires correlating symptoms with root causes. AI can help you develop a systematic troubleshooting methodology for distributed systems.
Safety note: Always test configuration changes in a staging environment before production. Incorrect authentication or replication settings can cause client failures or data loss. Keep your development cluster configuration separate so you can iterate quickly without risking production.