This is where Kafka Consumers come into the picture.
A consumer is responsible for reading records from Kafka topics and processing them. Whether you are sending emails after an order is placed, updating inventory, generating invoices, performing fraud detection, or feeding data into analytics systems, consumers are the components performing the actual work.
At first glance, consuming messages appears simple: a consumer subscribes to a topic and reads records. Production systems, however, raise harder questions:
- How does Kafka scale consumers horizontally?
- How does it track progress?
- What happens when a consumer crashes?
- How does rebalancing work?
- Can a message be processed more than once?
Understanding these concepts is essential for designing reliable event-driven systems and is frequently tested during senior engineering interviews.
In this article, we will explore Kafka consumers in depth, focusing on Consumer Groups, Offsets, Rebalancing, and Delivery Guarantees.
Role of a Kafka Consumer
A Consumer is a client application that reads records from Kafka topics. Consider an e-commerce application. When an order is placed, the producer publishes an event:
{
  "orderId": 1001,
  "customerId": 500,
  "amount": 2500
}
Several independent systems may need this event.
- An email service may send order confirmation emails.
- An inventory service may reserve stock.
- An analytics service may update reporting dashboards.
Each of these applications acts as a Kafka consumer.
Kafka allows multiple consumers to read the same data independently without affecting each other. This capability is one of Kafka's most powerful features.
Creating a Kafka Consumer
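A consumer requires a set of configuration properties. A basic consumer configuration looks like:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
// Broker address used for the initial connection to the cluster
props.put(
    ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
    "localhost:9092"
);
// Consumer group this instance joins
props.put(
    ConsumerConfig.GROUP_ID_CONFIG,
    "order-processing-group"
);
// Record keys and values are deserialized from bytes into strings
props.put(
    ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    StringDeserializer.class.getName()
);
props.put(
    ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    StringDeserializer.class.getName()
);
KafkaConsumer<String, String> consumer =
    new KafkaConsumer<>(props);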
Notice the presence of the group.id property. This property is one of the most important concepts in Kafka consumption.
A Consumer Group is a collection of consumers working together to process data from a topic. Every consumer that subscribes to a topic does so as a member of a group. Example:
Group: order-processing-group
Consumer-1
Consumer-2
Consumer-3
Kafka distributes partitions among consumers within the same group. Consider a topic:
orders
├── Partition-0
├── Partition-1
├── Partition-2
└── Partition-3
Kafka may assign partitions as:
Consumer-1
├── Partition-0
└── Partition-1
Consumer-2
└── Partition-2
Consumer-3
└── Partition-3
Each partition is assigned to exactly one consumer within a group. This guarantees ordering while enabling parallel processing.
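The distribution above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: it does not use the Kafka client, the class name GroupAssignmentSketch and the method assign are hypothetical, and it hands out contiguous blocks of partitions, which is one possible strategy rather than a description of what a given cluster is configured to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of spreading one topic's partitions over the members of a group.
// Hypothetical names; this is not part of the Kafka client library.
final class GroupAssignmentSketch {

    // Gives each consumer a contiguous block of partitions. The first
    // (partitionCount % consumers.size()) consumers receive one extra.
    // Assumes the consumer list is not empty.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int base = partitionCount / consumers.size();
        int extra = partitionCount % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int share = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < share; j++) {
                owned.add(next++); // each partition number is handed out exactly once
            }
            assignment.put(consumers.get(i), owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // prints {Consumer-1=[0, 1], Consumer-2=[2], Consumer-3=[3]}
        System.out.println(assign(4, List.of("Consumer-1", "Consumer-2", "Consumer-3")));
    }
}
```

Because every partition number is handed out once, no two consumers in the group ever share a partition, and a consumer whose share is zero simply receives an empty list.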
Without consumer groups, every consumer would process every message. Imagine an order processing application with three instances. Without coordination:
Order-1
|
+----→ Consumer-1
+----→ Consumer-2
+----→ Consumer-3
The same order would be processed three times. Consumer groups solve this problem: within a group, each message is processed by only one consumer. This provides scalability and prevents duplicate work.
Consumer Parallelism
"How many consumers can actively process messages in a consumer group?" The answer depends on the partition count. Consider:
orders
└── Partition-0
Only one consumer can process messages. Even if you have:
Consumer-1
Consumer-2
Consumer-3
Kafka assigns:
Partition-0 → Consumer-1
Consumer-2 → Idle
Consumer-3 → Idle
Why? Because Kafka guarantees ordering within a partition. Allowing multiple consumers to process the same partition simultaneously would break ordering guarantees.
A useful rule: within a single consumer group, Maximum Consumer Parallelism = Number of Partitions. This is one of the most important Kafka design principles.
Offsets
Kafka does not track which messages have been consumed. Instead, Kafka tracks positions within partitions. Each record receives an Offset. Example:
Partition-0
Offset 0 → Order-1
Offset 1 → Order-2
Offset 2 → Order-3
Offset 3 → Order-4
Offsets uniquely identify records within a partition. Consumers maintain their own reading position using offsets. If a consumer has processed:
Offset 0
Offset 1
Offset 2
Its next read begins at: Offset 3
Offsets make Kafka fundamentally different from traditional message queues. Traditional queues remove messages after consumption. Kafka does not. Even after processing, records remain stored until the topic's retention policy removes them. Consumers simply advance their offsets.
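To make the difference from a queue concrete, here is a minimal in-memory sketch of a single partition. It is not Kafka code: PartitionLogSketch and the group names are hypothetical, and retention is ignored. Reading only moves a per-group position forward, so the records stay available to every other group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-partition log. Reading never deletes records; each group only
// advances its own position. Hypothetical names, not a Kafka API.
final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> positionByGroup = new HashMap<>();

    // Appends a record and returns the offset it was stored at.
    int append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Returns the next unread record for the group, or null when caught up.
    String readNext(String groupId) {
        int position = positionByGroup.getOrDefault(groupId, 0);
        if (position >= records.size()) {
            return null;
        }
        positionByGroup.put(groupId, position + 1);
        return records.get(position);
    }

    // Offset the group will read next.
    int position(String groupId) {
        return positionByGroup.getOrDefault(groupId, 0);
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        for (int i = 1; i <= 4; i++) {
            log.append("Order-" + i);
        }
        for (int i = 0; i < 3; i++) {
            log.readNext("email-group");
        }
        System.out.println(log.position("email-group"));     // 3: next read begins at Offset 3
        System.out.println(log.readNext("analytics-group")); // Order-1: a second group starts from its own position
        System.out.println(log.size());                      // 4: nothing was removed by reading
    }
}
```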
Kafka can automatically commit offsets. Example:
props.put(
    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
    true
);
With auto commit enabled, the consumer commits offsets periodically in the background. The default interval (auto.commit.interval.ms) is 5 seconds.
This simplifies development but introduces risks. A message may be marked as processed before business logic actually completes. Many production systems therefore disable auto commit.
In Apache Kafka, committing means saving your current position (the "offset") in a topic partition so Kafka knows your consumer group has successfully processed those messages. If your consumer restarts, crashes, or rebalances, it will resume reading from the last committed offset.
Kafka stores these bookmarks in an internal topic named __consumer_offsets.
Manual Offset Commit
Production applications often use manual commits:
props.put(
    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
    false
);
After successful processing, the application commits explicitly:
consumer.commitSync();
This gives applications complete control. The committed offset advances only after processing completes successfully, which significantly improves reliability.
What Happens When a Consumer Crashes?
Consider:
Partition-0
Offset 0
Offset 1
Offset 2
Offset 3
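The consumer has successfully processed and committed:
Offset 0
Offset 1
Offset 2 is being processed when the application crashes. Because Offset 2 was never committed, the last record covered by a commit is Offset 1. Kafka stores the committed position as the next offset to read, so the stored value is 2. After a restart, the consumer resumes processing from Offset 2. This prevents data loss.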
However, Offset 2 may be processed again. This behavior leads us to delivery guarantees.
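The crash scenario can be replayed with a toy model. The sketch below is an assumption-laden illustration, not Kafka code: CrashRecoverySketch and its methods are hypothetical, the "crash" is just an early return, and the commit is a plain field update. It shows why the record at Offset 2 is handled twice when its side effects ran before the crash.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of process-then-commit with a crash before the commit.
// Hypothetical names; no Kafka client is involved.
final class CrashRecoverySketch {
    private int committedOffset = 0;                          // next offset to read after a restart
    private final List<Integer> handled = new ArrayList<>();  // offsets whose business logic ran

    // Processes offsets from the committed position up to endOffset (exclusive).
    // When an offset equals crashAfterHandling, the run stops after handling it
    // but before committing it. Pass -1 for a run without a crash.
    void run(int endOffset, int crashAfterHandling) {
        for (int offset = committedOffset; offset < endOffset; offset++) {
            handled.add(offset);              // business logic runs first
            if (offset == crashAfterHandling) {
                return;                       // simulated crash: the commit never happens
            }
            committedOffset = offset + 1;     // commit points at the next record
        }
    }

    int committedOffset() {
        return committedOffset;
    }

    List<Integer> handled() {
        return handled;
    }

    public static void main(String[] args) {
        CrashRecoverySketch consumer = new CrashRecoverySketch();
        consumer.run(4, 2);                              // crashes while working on Offset 2
        System.out.println(consumer.committedOffset());  // 2
        consumer.run(4, -1);                             // restart, no crash this time
        System.out.println(consumer.handled());          // [0, 1, 2, 2, 3]
    }
}
```

Nothing is skipped after the restart, but Offset 2 appears twice in the handled list, which is why downstream logic usually needs to tolerate duplicates.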
Understanding Rebalancing
Consumer groups are dynamic:
- Consumers may join.
- Consumers may leave.
- Consumers may crash.
When group membership changes, Kafka performs a Rebalance. Example:
Initial state:
Consumer-1 → Partition-0
Consumer-2 → Partition-1, Partition-2
A new consumer joins. Kafka redistributes partitions:
Consumer-1 → Partition-0
Consumer-2 → Partition-1
Consumer-3 → Partition-2
This process is called rebalancing.
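A rough way to picture the redistribution is the sketch below. It is a simplified model, not the algorithm Kafka runs: in a real group the result depends on the configured partition assignment strategy. The class RebalanceSketch and the method join are hypothetical names. The model moves partitions from the most loaded member to the least loaded one until the loads differ by at most one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy rebalance triggered by a new member joining. Hypothetical names;
// real assignments are computed by the configured assignment strategy.
final class RebalanceSketch {

    // Returns a new assignment that includes newConsumer. The input map is
    // copied, not modified.
    static Map<String, List<Integer>> join(Map<String, List<Integer>> current, String newConsumer) {
        Map<String, List<Integer>> next = new LinkedHashMap<>();
        current.forEach((member, owned) -> next.put(member, new ArrayList<>(owned)));
        next.putIfAbsent(newConsumer, new ArrayList<>());
        while (true) {
            String most = null;
            String least = null;
            for (String consumer : next.keySet()) {
                if (most == null || next.get(consumer).size() > next.get(most).size()) {
                    most = consumer;
                }
                if (least == null || next.get(consumer).size() < next.get(least).size()) {
                    least = consumer;
                }
            }
            if (next.get(most).size() - next.get(least).size() <= 1) {
                return next; // balanced: loads differ by at most one partition
            }
            // Move the donor's last partition to the least loaded member.
            List<Integer> donor = next.get(most);
            next.get(least).add(donor.remove(donor.size() - 1));
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = new LinkedHashMap<>();
        before.put("Consumer-1", List.of(0));
        before.put("Consumer-2", List.of(1, 2));
        // prints {Consumer-1=[0], Consumer-2=[1], Consumer-3=[2]}
        System.out.println(join(before, "Consumer-3"));
    }
}
```

Only Partition-2 changes owner in this run. Moving as few partitions as possible is a deliberate choice in the sketch, since every moved partition means another consumer has to pick up where the previous owner stopped.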
Why Rebalancing Can Be Expensive
During rebalancing:
- Consumers pause processing while partitions are reassigned.
- Partition ownership changes.
- The new owner of a partition must look up its last committed offset before resuming.
- State may need rebuilding.
- Large consumer groups can experience noticeable pauses.
This is why excessive rebalancing should be avoided in production.
Consumer Heartbeats
Kafka must determine whether consumers are alive. Consumers periodically send heartbeats to the group coordinator:
Consumer → Heartbeat → Coordinator
If heartbeats stop arriving:
Consumer Timeout → Rebalance Triggered
Kafka assumes the consumer failed and reassigns partitions. This ensures continuous processing.
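The liveness check can be sketched as a small bookkeeping class. This is an illustrative model only: HeartbeatTrackerSketch is a hypothetical name, timestamps are passed in explicitly so the behavior is deterministic, and the timeout plays the role that the session.timeout.ms setting plays for a real consumer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy coordinator-side bookkeeping for consumer liveness.
// Hypothetical names; not a Kafka API.
final class HeartbeatTrackerSketch {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    HeartbeatTrackerSketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Records that the consumer was alive at nowMs.
    void heartbeat(String consumerId, long nowMs) {
        lastHeartbeatMs.put(consumerId, nowMs);
    }

    // Removes and returns the members whose last heartbeat is older than the
    // timeout. A non-empty result is the point where a rebalance would start.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
            if (nowMs - entry.getValue() > sessionTimeoutMs) {
                expired.add(entry.getKey());
            }
        }
        lastHeartbeatMs.keySet().removeAll(expired);
        return expired;
    }

    public static void main(String[] args) {
        HeartbeatTrackerSketch tracker = new HeartbeatTrackerSketch(10_000);
        tracker.heartbeat("Consumer-1", 0);
        tracker.heartbeat("Consumer-2", 0);
        tracker.heartbeat("Consumer-1", 8_000);      // Consumer-2 goes silent
        System.out.println(tracker.expire(12_000));  // [Consumer-2]
    }
}
```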
Delivery Guarantees in Kafka
One of the most important Kafka concepts is delivery guarantees. A delivery guarantee defines how many times a record may be processed. Kafka supports three primary models.
1. At Most Once
Messages are processed zero or one time. Offsets are committed before processing. Example: Commit Offset → Process Message.
If processing fails after the commit, the message is lost. No duplicates occur, but data loss is possible.
Achieving At Most Once
Enable auto-commit: Set enable.auto.commit=true.
Use short intervals: Keep auto.commit.interval.ms low.
Commit immediately: The consumer commits offsets before handling data. For a strict guarantee, commit explicitly right after poll() returns and before processing starts.
Accept data loss: If processing fails, the offset has already moved forward.
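The commit-before-process ordering can be modelled the same way as a crash scenario. The sketch below is a toy, not Kafka code: AtMostOnceSketch and its methods are hypothetical, the commit is a field update, and the failure is an early return. It shows the record being skipped after a restart because its offset was already committed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of commit-then-process. Hypothetical names; no Kafka client.
final class AtMostOnceSketch {
    private int committedOffset = 0;                          // next offset to read after a restart
    private final List<Integer> handled = new ArrayList<>();  // offsets whose business logic ran

    // Processes offsets from the committed position up to endOffset (exclusive).
    // When an offset equals failAt, the run stops after the commit but before
    // the business logic. Pass -1 for a run without a failure.
    void run(int endOffset, int failAt) {
        for (int offset = committedOffset; offset < endOffset; offset++) {
            committedOffset = offset + 1;   // commit first
            if (offset == failAt) {
                return;                     // simulated crash: this record is never handled
            }
            handled.add(offset);            // then process
        }
    }

    int committedOffset() {
        return committedOffset;
    }

    List<Integer> handled() {
        return handled;
    }

    public static void main(String[] args) {
        AtMostOnceSketch consumer = new AtMostOnceSketch();
        consumer.run(4, 2);                      // fails on Offset 2 after committing it
        consumer.run(4, -1);                     // restart resumes at Offset 3
        System.out.println(consumer.handled());  // [0, 1, 3]
    }
}
```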
2. At Least Once
Messages are processed one or more times. Processing occurs before offset commit. Example: Process Message → Commit Offset.
If a crash occurs between processing and commit, the message is reprocessed after the restart. No data loss occurs, but duplicates are possible. This is the most common Kafka delivery model.
Achieving At Least Once
Disable auto-commit: Set enable.auto.commit=false.
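Commit manually: Call consumer.commitSync() or consumer.commitAsync() only after processing.
Configure retries: On the producer side, set retries to a high value such as Integer.MAX_VALUE so transient send failures are retried.
Require acknowledgments: On the producer side, prefer acks=all. With acks=1 only the partition leader acknowledges the write, so a record can still be lost if the leader fails before replication.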
3. Exactly Once
Messages are processed exactly once: no duplicates and no data loss. Kafka achieves this through transactions and idempotent producers, which cover flows that read from Kafka and write their results back to Kafka.
Exactly-once semantics are typically required in financial systems and critical business workflows. However, they add complexity and performance overhead.
Achieving Exactly Once (EOS)
Enable idempotent producers: Set enable.idempotence=true on the producer.
Use Kafka transactions: Provide a unique transactional.id to the producer.
Isolate consumer reads: Set consumer isolation.level=read_committed.
Use Streams API: Utilize Kafka Streams, which handles transactional boundaries automatically.
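The first three settings can be written down as plain property maps. This is a minimal sketch under assumptions: ExactlyOnceConfigSketch and its methods are hypothetical helpers, the transactional id is an invented example value, and string keys are used so the snippet compiles without the Kafka client on the classpath. The acks and enable.auto.commit entries are additions beyond the list above: idempotent producers need acks=all, and in a transactional read-process-write loop offsets are committed through the producer's transaction rather than by the consumer.

```java
import java.util.Properties;

// Configuration fragments for an exactly-once read-process-write pipeline.
// Hypothetical helper class; the keys are standard Kafka configuration names.
final class ExactlyOnceConfigSketch {

    // Producer side: idempotence plus a transactional id.
    static Properties producerProps(String transactionalId) {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");
        props.put("acks", "all");                        // required by idempotence
        props.put("transactional.id", transactionalId);  // must be unique per producer instance
        return props;
    }

    // Consumer side: only read records from committed transactions.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("isolation.level", "read_committed");
        props.put("enable.auto.commit", "false");        // offsets travel with the transaction instead
        return props;
    }

    public static void main(String[] args) {
        // "order-processor-tx-1" is an invented example id.
        System.out.println(producerProps("order-processor-tx-1"));
        System.out.println(consumerProps());
    }
}
```

These maps would be merged with the connection and serializer settings before constructing the producer and consumer.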
Offset Reset Policies
What happens when a consumer starts with no committed offsets?
Kafka uses the auto.offset.reset setting.
Options include: earliest | latest.
With earliest, consumption starts from the beginning of each assigned partition.
With latest (the default), consumption starts from newly arriving messages.
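The decision can be summarized as a small function. This is a sketch of the rule as described above, not the client's internal code: OffsetResetSketch and startingOffset are hypothetical names, and the log start and end offsets are passed in as plain numbers.

```java
import java.util.OptionalLong;

// Toy version of "where does a consumer start reading a partition?".
// Hypothetical names; not a Kafka API.
final class OffsetResetSketch {

    // committed:       the group's committed offset for the partition, if any
    // autoOffsetReset: value of the auto.offset.reset setting
    // logStartOffset:  offset of the oldest record still stored
    // logEndOffset:    offset the next written record will receive
    static long startingOffset(OptionalLong committed, String autoOffsetReset,
                               long logStartOffset, long logEndOffset) {
        if (committed.isPresent()) {
            return committed.getAsLong();   // a committed offset always wins
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStartOffset;      // replay everything still stored
            case "latest":
                return logEndOffset;        // only records written from now on
            default:
                // Any other value (such as none) is treated as an error in this sketch.
                throw new IllegalStateException("no committed offset and no usable reset policy");
        }
    }

    public static void main(String[] args) {
        System.out.println(startingOffset(OptionalLong.empty(), "earliest", 0, 4)); // 0
        System.out.println(startingOffset(OptionalLong.empty(), "latest", 0, 4));   // 4
        System.out.println(startingOffset(OptionalLong.of(2), "latest", 0, 4));     // 2
    }
}
```

The policy only matters when no committed offset exists; a group that has already committed resumes from its own position regardless of the setting.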
Recommended Consumer Configuration
A typical production configuration looks like:
props.put(
    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
    false
);
props.put(
    ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,
    "earliest"
);
props.put(
    ConsumerConfig.MAX_POLL_RECORDS_CONFIG,
    500
);
Combined with manual commits, this provides better control and reliability.
Common Kafka Consumer Interview Questions
What is a consumer group?
- A consumer group is a collection of consumers that cooperate to process partitions from a topic.
Why does partition count affect scalability?
- Only one consumer within a group can own a partition at any given time, so the partition count caps the number of active consumers.
What are offsets?
- Offsets represent positions within partitions and allow consumers to track progress independently.
When does rebalancing occur?
- Rebalancing occurs whenever consumer group membership changes and partitions must be reassigned.
Which delivery guarantees does Kafka support?
- Kafka supports at-most-once, at-least-once, and exactly-once processing, each involving different trade-offs between reliability, duplicates, and complexity.
Conclusion
Kafka consumers are responsible for transforming stored events into business actions. While consuming messages appears simple, production-grade consumer applications must handle partition assignments, offset management, rebalancing, failures, and delivery guarantees correctly.
The most important concepts are Consumer Groups, which provide scalability; Offsets, which track processing progress; Rebalancing, which redistributes workload when consumers join or leave; and Delivery Guarantees, which determine whether duplicates or message loss can occur.