You've built a producer that reliably publishes events to Kafka. Now you need something to receive those events. In the previous lesson, you verified message delivery by checking broker acknowledgments. But real systems need consumers that process messages, handle failures gracefully, and track their position in the event stream.
Consuming from Kafka is fundamentally different from calling an API. With an API, you request data and get an immediate response. With Kafka, you poll for messages continuously, process whatever arrives, and tell Kafka you're done. This poll-process-commit loop is the heart of every Kafka consumer.
The challenge is reliability. What happens if your consumer crashes after reading a message but before processing it? What if it processes successfully but crashes before confirming? These edge cases determine whether your system loses messages, processes them twice, or handles each one exactly once. Your commit strategy decides which.
Before writing code, understand what a consumer actually does:
Every Kafka consumer follows this structure:
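A minimal sketch of that structure, assuming the confluent-kafka Python client (the broker address, topic, and group names here are placeholders; running this requires a reachable broker):

```python
from confluent_kafka import Consumer

def handle(msg):
    """Hypothetical business logic -- replace with your own."""
    print(msg.value())

# All names below are illustrative -- adjust to your setup
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'notification-service',
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['task-events'])

try:
    while True:
        msg = consumer.poll(1.0)          # 1. poll: wait up to 1 s for a message
        if msg is None:
            continue                      #    nothing arrived; poll again
        if msg.error():
            continue                      #    error handling covered below
        handle(msg)                       # 2. process: your business logic
        consumer.commit(message=msg)      # 3. commit: record your position
finally:
    consumer.close()
```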
The poll() call:
Install the same library you used for the producer:
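Assuming the producer lessons used the confluent-kafka Python client, the consumer API ships in the same package:

```shell
pip install confluent-kafka
```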
Like in Chapter 6, use the NodePort to connect from your local machine:
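A quick connectivity check might look like this sketch; `localhost:30092` is a stand-in for whatever NodePort your Service actually exposes:

```python
from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': 'localhost:30092',  # hypothetical NodePort
    'group.id': 'connectivity-check',
    'auto.offset.reset': 'earliest',
}

consumer = Consumer(conf)
# list_topics() round-trips to the broker, so success proves the connection
metadata = consumer.list_topics(timeout=5.0)
print(f"Connected. Topics: {sorted(metadata.topics.keys())}")
consumer.close()
```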
Let's understand each configuration:
When a consumer group first subscribes to a topic (or when its committed offsets have expired), Kafka needs to know where to start reading:
Example scenario: Your notification service starts for the first time. With earliest, it processes all past task-created events (potentially thousands). With latest, it ignores history and only notifies for new tasks.
Choose based on your business requirements, not technical preference.
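In configuration terms, the two choices are fragments you merge into the full consumer config:

```python
# Backfill: on first start, process all retained history
backfill = {'auto.offset.reset': 'earliest'}

# Live-only: skip history, react to new events only (the broker's default)
live = {'auto.offset.reset': 'latest'}
```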
Here's a production-ready consumer with proper error handling:
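A sketch of such a consumer, assuming the confluent-kafka client; addresses and names are placeholders, and the `enable.partition.eof` setting is needed with recent librdkafka versions to receive end-of-partition events at all:

```python
import sys
from confluent_kafka import Consumer, KafkaError, KafkaException

# Addresses and names are placeholders -- adapt to your cluster
conf = {
    'bootstrap.servers': 'localhost:30092',
    'group.id': 'notification-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
    'enable.partition.eof': True,   # ask the client for end-of-partition events
}

consumer = Consumer(conf)
consumer.subscribe(['task-events'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # Caught up with this partition -- informational, keep polling
                continue
            # Anything else is a real problem worth surfacing
            raise KafkaException(msg.error())
        print(f"partition={msg.partition()} offset={msg.offset()} "
              f"value={msg.value().decode('utf-8')}")
        consumer.commit(message=msg, asynchronous=False)
except KeyboardInterrupt:
    sys.stderr.write("Interrupted, shutting down\n")
finally:
    consumer.close()
```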
The msg.error() check catches several conditions:
The _PARTITION_EOF error is not really an error. It means "you've caught up with this partition." Continue polling for new messages.
Always call consumer.close() when shutting down:
Without close(), Kafka waits for session timeout (default 45 seconds) before reassigning partitions.
This is the most important decision for your consumer. It determines your message delivery guarantee.
With auto-commit enabled, Kafka periodically commits offsets in the background:
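The relevant settings (these are also the defaults, so you get this behavior without asking for it):

```python
conf = {
    'bootstrap.servers': 'localhost:30092',  # hypothetical
    'group.id': 'notification-service',
    'enable.auto.commit': True,        # the default
    'auto.commit.interval.ms': 5000,   # background commit every 5 s (the default)
}
```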
The Problem: Consider this timeline:
Auto-commit committed the offset before processing completed. If processing fails, that message is never retried.
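The failure mode can be simulated without a broker; this toy model treats a partition as a list and the committed offset as a plain integer:

```python
messages = ["msg-0", "msg-1", "msg-2"]   # a partition, as a toy list
processed = []                           # what the handler actually finished
committed_offset = 0

# The consumer polls msg-0 and msg-1. Before the handler runs, the
# background auto-commit timer fires and records the next position:
committed_offset = 2

# Crash! The process dies with msg-0 and msg-1 still unprocessed.
# On restart, the consumer resumes from the committed offset:
resumed = messages[committed_offset:]

print(processed)  # []         -- msg-0 and msg-1 were never handled
print(resumed)    # ['msg-2']  -- and they will never be redelivered
```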
When auto-commit is acceptable:
Manual commit lets you commit AFTER successful processing:
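The configuration change is one flag; once auto-commit is off, the commit call moves into your loop (a sketch, assuming the confluent-kafka client):

```python
conf = {
    'bootstrap.servers': 'localhost:30092',  # hypothetical NodePort
    'group.id': 'notification-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,   # you now own the commit
}

# ...inside the poll loop, only after processing succeeds:
# consumer.commit(message=msg, asynchronous=False)
```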
Timeline with manual commit:
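The same toy model shows what manual commit buys you: a crash between processing and committing produces a duplicate, never a loss:

```python
messages = ["msg-0", "msg-1"]
processed = []
committed_offset = 0

# Process msg-0 fully...
processed.append(messages[0])
# ...then crash BEFORE the commit runs: committed_offset is still 0.

# On restart, the consumer resumes from offset 0 and replays msg-0:
for m in messages[committed_offset:]:
    processed.append(m)

print(processed)  # ['msg-0', 'msg-0', 'msg-1'] -- a duplicate, but no loss
```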
For most applications, at-least-once is the right choice. Process, then commit:
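The pattern can be packaged as a small loop. `handle` is a hypothetical stand-in for your business logic, and in real use `consumer` would be a `confluent_kafka.Consumer`:

```python
def consume_at_least_once(consumer, handle):
    """Poll -> process -> commit, committing only after handle() succeeds."""
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue                 # nothing arrived this round
        if msg.error():
            continue                 # see the error-handling section above
        try:
            handle(msg.value())      # process first...
        except Exception:
            # No commit: the offset stays put, so this message will be
            # redelivered after a restart or rebalance instead of lost.
            continue
        consumer.commit(message=msg, asynchronous=False)  # ...then commit
```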
Your processing code must be idempotent. If the same message is processed twice (consumer crashed after processing but before commit), the result should be the same.
Idempotent processing example:
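One common approach is to track processed event IDs; the set here is an in-memory stand-in for what would be a database table or Redis set in production:

```python
processed_ids = set()   # in production: a database table or Redis set

def handle_task_created(event):
    """Process a task-created event at most once per event_id."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return "skipped"          # duplicate delivery -- safe no-op
    # ...real side effect goes here (send the notification, update a row)...
    processed_ids.add(event_id)
    return "processed"

print(handle_task_created({"event_id": "evt-1"}))  # processed
print(handle_task_created({"event_id": "evt-1"}))  # skipped
```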
The commit() method has two modes:
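In the confluent-kafka client the mode is selected per call (`consumer` and `msg` as in the loop above):

```python
# Synchronous: block until the broker confirms; raises on failure
consumer.commit(message=msg, asynchronous=False)

# Asynchronous (the library's default): return immediately; faster, but a
# failed commit can go unnoticed unless you watch the logs or a callback
consumer.commit(message=msg, asynchronous=True)
```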
For critical data, use synchronous. For high-throughput scenarios where occasional reprocessing is acceptable, use asynchronous.
Create a consumer deployment that runs alongside your producer:
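One possible shape for that Deployment; the image name, labels, and the in-cluster bootstrap address are all assumptions about your setup:

```yaml
# Illustrative only -- substitute your own image and service names
apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: task-consumer
  template:
    metadata:
      labels:
        app: task-consumer
    spec:
      containers:
        - name: consumer
          image: task-consumer:latest          # hypothetical image
          env:
            - name: KAFKA_BOOTSTRAP_SERVERS
              value: "kafka:9092"              # in-cluster Service name (assumed)
```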
Use the Kafka CLI to check consumer group status:
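The pod name and group name below are placeholders for your setup; the commented lines sketch the shape of the output:

```shell
kubectl exec -it kafka-0 -- \
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group notification-service

# Illustrative output (columns abridged):
# GROUP                 TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
# notification-service  task-events  0          42              42              0
```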
A lag of 0 means your consumer is caught up. A growing lag means the consumer can't keep up with the producers.
You built a kafka-events skill in Chapter 1. Test and improve it based on what you learned.
Ask yourself:
If you found gaps:
What you're learning: Diagnosing the auto-commit timing problem and implementing proper at-least-once semantics.
What you're learning: Designing idempotent consumers that handle duplicates gracefully using state tracking.
What you're learning: Matching commit strategies to business requirements, understanding that different consumers of the same topic may need different configurations.
Important: When working with Kafka consumers, always test your error handling by simulating failures. Kill your consumer mid-processing and verify it correctly resumes from the last committed offset.