Designing Kafka Topics and Partitions (Scalability, Ordering and Throughput Trade-offs)

In Apache Kafka, topic and partition design decisions have a direct impact on scalability, throughput, ordering guarantees, consumer parallelism, storage utilization, and operational complexity.

Teams often start with a single partition, only to discover later that their consumer throughput is limited. Others create hundreds of partitions unnecessarily and struggle with increased broker memory usage, slower rebalances, and operational overhead.

In this article, we will explore how topics and partitions work together, how they impact scalability and throughput, what ordering guarantees Kafka provides, and the trade-offs involved when choosing the right partition count.

Topic and Partition Design

At first glance, a Kafka topic appears to be a simple container for messages. Consider an e-commerce system:

orders
payments
shipments
notifications

These topics seem straightforward. However, the real unit of scalability in Kafka is not the topic but the partition.

When engineers discuss Kafka scaling, consumer parallelism, throughput, and performance, they are almost always discussing partitions.

A topic is simply a logical grouping of one or more partitions. Take the orders topic: internally, Kafka stores its data as partitions.

orders
 ├── Partition-0
 ├── Partition-1
 └── Partition-2

Messages are actually stored within these partitions.

- Consumers process partitions.
- Replication happens at partition level.
- Leader election occurs at partition level.
- Throughput scaling occurs at partition level.

This is why partition design is far more important than topic naming.

Partitions & Horizontal Scalability

Imagine an orders topic with a single partition.

orders
 └── Partition-0

Suppose one consumer processes 1,000 messages/second. Because a partition is read by only one consumer in a group, maximum throughput is capped at 1,000 messages/second.

Now consider:

orders
 ├── Partition-0
 ├── Partition-1
 ├── Partition-2
 └── Partition-3

Kafka can assign partitions to four consumers:

Consumer-1 → Partition-0
Consumer-2 → Partition-1
Consumer-3 → Partition-2
Consumer-4 → Partition-3

Assuming messages are spread evenly across partitions, total throughput now becomes 4,000 messages/second.

Partitions allow Kafka to scale horizontally by enabling parallel processing. This is one of Kafka's most important architectural principles.

Partitions & Consumer Parallelism

A common interview question is: "Can all five consumers in a consumer group process a topic with three partitions?" The answer is no, because each partition is assigned to at most one consumer within a group. Consider:

orders
 ├── Partition-0
 ├── Partition-1
 └── Partition-2

And:

Consumer-1
Consumer-2
Consumer-3
Consumer-4
Consumer-5

Kafka can only assign:

Partition-0 → Consumer-1
Partition-1 → Consumer-2
Partition-2 → Consumer-3

Consumer-4 → Idle
Consumer-5 → Idle

The rule is simple: within a consumer group, Maximum Consumer Parallelism = Number of Partitions. This rule should always influence partition design decisions.
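The assignment above can be reproduced with a small sketch. It uses a simplified round-robin strategy written for this article, not one of Kafka's real assignors, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified round-robin assignment, not Kafka's actual assignor.
final class AssignmentSketch {

    // Assigns partitions 0..partitionCount-1 to the consumers in round-robin order.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    // Counts the consumers that received no partition at all.
    static long idleConsumers(Map<String, List<Integer>> assignment) {
        return assignment.values().stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        List<String> group = List.of(
            "Consumer-1", "Consumer-2", "Consumer-3", "Consumer-4", "Consumer-5");
        Map<String, List<Integer>> assignment = assign(3, group);
        assignment.forEach((consumer, partitions) ->
            System.out.println(consumer + " -> "
                + (partitions.isEmpty() ? "Idle" : partitions.toString())));
        System.out.println("Idle consumers: " + idleConsumers(assignment));
    }
}
```

With three partitions and five consumers, two consumers end up with an empty list, which matches the idle consumers shown above.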

Why Not Create Thousands of Partitions?

A common reaction is: "If more partitions improve scalability, why not create thousands?" While partitions increase parallelism, they are not free.

Every partition introduces metadata overhead. Kafka brokers maintain information about:

1. Leader
2. Followers
3. Offsets
4. Replication State
5. ISR State
6. Segment Files

Large partition counts increase:

1. Broker Memory Usage
2. Network Traffic
3. Controller Workload
4. Rebalance Time
5. Recovery Time

A cluster with 50 partitions behaves very differently from a cluster with 50,000 partitions. More partitions improve scalability but also increase operational complexity.

Ordering Guarantees

Ordering is one of the most misunderstood Kafka concepts. Kafka guarantees ordering only within a partition. Consider:

Partition-0

- Message-1: Order Created
- Message-2: Payment Received
- Message-3: Order Shipped

Kafka guarantees: Order Created → Payment Received → Order Shipped. Consumers will observe this exact sequence.

However, Kafka does not guarantee ordering across partitions. Consider:

Partition-0
- Message-1: Order Created

Partition-1
- Message-2: Payment Received
- Message-3: Order Shipped

Because each partition is consumed independently, a consumer may see Payment Received before Order Created. This is extremely important when designing event-driven systems.

Using Message Keys to Preserve Ordering

Kafka uses message keys to determine partition assignment. Example:

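import org.apache.kafka.clients.producer.ProducerRecord;

// Topic "orders", key "customer-123", value "Order Created".
// The key decides which partition the record is written to.
ProducerRecord<String, String> record =
    new ProducerRecord<>(
        "orders",
        "customer-123",
        "Order Created"
    );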

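Kafka's default partitioner hashes the serialized key (using murmur2) and takes the result modulo the partition count, conceptually: hash(key) % numberOfPartitions. As long as the partition count does not change, all messages with the same key go to the same partition. Example: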

customer-123

- Message-1: Order Created
- Message-2: Payment Received
- Message-3: Order Shipped
     ↓
Partition-1

Ordering is preserved. This is one of the primary reasons message keys exist.
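The hash(key) % numberOfPartitions calculation can be sketched in plain Java. This is a minimal illustration, not Kafka's implementation: Arrays.hashCode is only a stand-in for the hash Kafka really uses, so the partition numbers it prints will not match a real cluster. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: Arrays.hashCode stands in for the hash used by Kafka's partitioner.
final class KeyPartitionSketch {

    // Maps a key to a partition index in the range [0, numberOfPartitions).
    static int partitionFor(String key, int numberOfPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Mask the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numberOfPartitions;
    }

    public static void main(String[] args) {
        int numberOfPartitions = 3;
        String[] events = {"Order Created", "Payment Received", "Order Shipped"};
        for (String event : events) {
            // The partition depends only on the key, so all three events land together.
            int partition = partitionFor("customer-123", numberOfPartitions);
            System.out.println("customer-123 / " + event + " -> Partition-" + partition);
        }
    }
}
```

Because the partition is a pure function of the key and the partition count, every event for customer-123 maps to the same partition, which is what preserves their relative order.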

Choosing the Right Partition Key

Partition key selection is one of the most important Kafka design decisions. In an e-commerce system, common choices include:

Customer ID
Order ID
Account ID
User ID

A good key should:

- Preserve Ordering
- Distribute Traffic Evenly
- Avoid Hotspots

Poor key selection can create serious scalability problems.

Suppose an online marketplace has one large customer responsible for 50% of all traffic. If messages are keyed by customer:

Customer-A → 50% Traffic
Customer-B → 5%
Customer-C → 2%

Kafka may produce:

Partition-0 → 65% Traffic
Partition-1 → 15% Traffic
Partition-2 → 10% Traffic
Partition-3 → 10% Traffic

One partition becomes overloaded. This is called a Hot Partition. Hot partitions often cause:

- Consumer Lag
- Uneven Resource Usage
- Reduced Throughput

Senior engineers must evaluate key distribution carefully.
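One way to evaluate key distribution before going to production is to estimate the load each partition would receive from known traffic shares. The sketch below is a rough model under stated assumptions: String.hashCode stands in for Kafka's real hash, the traffic shares are the illustrative figures from the customer example, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: estimates per-partition load from per-key traffic shares.
final class SkewSketch {

    // Stand-in for key hashing; String.hashCode is NOT the hash Kafka uses.
    static int partitionFor(String key, int numberOfPartitions) {
        return (key.hashCode() & 0x7fffffff) % numberOfPartitions;
    }

    // Sums each key's traffic share into the partition that key maps to.
    static double[] loadPerPartition(Map<String, Double> trafficShareByKey,
                                     int numberOfPartitions) {
        double[] load = new double[numberOfPartitions];
        for (Map.Entry<String, Double> entry : trafficShareByKey.entrySet()) {
            load[partitionFor(entry.getKey(), numberOfPartitions)] += entry.getValue();
        }
        return load;
    }

    // Ratio of the busiest partition to the average; 1.0 means perfectly even.
    static double skewRatio(double[] load) {
        double max = 0.0;
        double total = 0.0;
        for (double value : load) {
            max = Math.max(max, value);
            total += value;
        }
        return total == 0.0 ? 1.0 : max / (total / load.length);
    }

    public static void main(String[] args) {
        Map<String, Double> shares = new LinkedHashMap<>();
        shares.put("Customer-A", 0.50);
        shares.put("Customer-B", 0.05);
        shares.put("Customer-C", 0.02);
        double[] load = loadPerPartition(shares, 4);
        for (int p = 0; p < load.length; p++) {
            System.out.println("Partition-" + p + " -> " + load[p]);
        }
        System.out.println("Skew ratio: " + skewRatio(load));
    }
}
```

Whichever partition Customer-A hashes to carries at least half of the modeled traffic, so the skew ratio stays well above 1.0 no matter how the smaller customers are spread.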

Estimating Partition Count

One common question is: "How many partitions should a topic have?" There is no universal answer. Partition count depends on:

- Expected Throughput
- Consumer Parallelism
- Future Growth
- Broker Capacity

A practical approach is:

- Estimate target throughput.
- Estimate consumer processing capacity.
- Calculate required parallelism.
- Add growth capacity.

Example:

Target Throughput = 100,000 messages/second
Consumer Capacity = 10,000 messages/second
Required Partitions = 100,000 / 10,000 = 10

A team may choose 12 partitions to leave room for future growth.
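The arithmetic can be captured in a small helper. This is a sketch, not a sizing tool: the 20% headroom that turns 10 partitions into 12 is an assumption chosen to match the example, and the class and method names are hypothetical.

```java
// Illustrative only: a back-of-the-envelope partition count estimate.
final class PartitionEstimate {

    // Minimum partitions so that no consumer has to exceed its capacity (ceiling division).
    static int requiredPartitions(long targetPerSecond, long perConsumerPerSecond) {
        if (targetPerSecond < 0 || perConsumerPerSecond <= 0) {
            throw new IllegalArgumentException("throughput values must be positive");
        }
        return (int) ((targetPerSecond + perConsumerPerSecond - 1) / perConsumerPerSecond);
    }

    // Adds growth headroom given as a whole percentage (20 means 20%), rounding up.
    static int withHeadroom(int partitions, int headroomPercent) {
        return (partitions * (100 + headroomPercent) + 99) / 100;
    }

    public static void main(String[] args) {
        int required = requiredPartitions(100_000, 10_000);
        int planned = withHeadroom(required, 20); // 20% headroom is an assumed figure
        System.out.println("Required partitions: " + required);
        System.out.println("Planned partitions:  " + planned);
    }
}
```

Rounding up matters: a target of 100,001 messages/second would need 11 partitions, not 10, because ten consumers at full capacity would fall just short.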

Can Partitions Be Increased Later?

Yes. Kafka allows increasing partition count. Example:

Before

orders
 ├── Partition-0
 ├── Partition-1
 └── Partition-2

Later:

orders
 ├── Partition-0
 ├── Partition-1
 ├── Partition-2
 ├── Partition-3
 ├── Partition-4
 └── Partition-5

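However, there is an important consequence. Existing records stay where they are, but the key-to-partition mapping changes, because hash(key) % numberOfPartitions now uses a different divisor. New messages for a key may land in a different partition than that key's earlier messages, so per-key ordering is no longer guaranteed across the change.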

For this reason, partition increases should be planned carefully.
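The effect of going from three to six partitions can be seen with plain integers standing in for key hashes. This is a minimal sketch of the modulo arithmetic only; the hash values are made up for illustration and the names are hypothetical.

```java
// Illustrative only: shows how hash % numberOfPartitions shifts when the count changes.
final class RepartitionSketch {

    // Same shape as the formula in the text, with the sign bit masked off.
    static int partitionFor(int keyHash, int numberOfPartitions) {
        return (keyHash & 0x7fffffff) % numberOfPartitions;
    }

    // Counts the key hashes that map to a different partition after the change.
    static int movedKeys(int[] keyHashes, int partitionsBefore, int partitionsAfter) {
        int moved = 0;
        for (int hash : keyHashes) {
            if (partitionFor(hash, partitionsBefore) != partitionFor(hash, partitionsAfter)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] keyHashes = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}; // made-up hash values
        for (int hash : keyHashes) {
            System.out.println("hash " + hash
                + ": Partition-" + partitionFor(hash, 3)
                + " -> Partition-" + partitionFor(hash, 6));
        }
        System.out.println("Keys that moved: " + movedKeys(keyHashes, 3, 6));
    }
}
```

In this sample, half of the hashes change partition. A key with hash 4, for example, moves from Partition-1 to Partition-4, so its new records are no longer ordered relative to the older ones still sitting in Partition-1.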

Partition Count and Rebalancing

More partitions improve scalability but increase rebalance complexity.

Example:

1. 10 Partitions: rebalancing is relatively quick.
2. 5,000 Partitions: when a consumer joins or leaves, the group must handle:

- Partition Ownership Changes
- Offset Transfers
- Leader Coordination

Rebalancing becomes significantly more expensive. This is another reason not to over-partition unnecessarily.

Recommended Partition Design Guidelines

For most enterprise applications, start by estimating expected throughput and consumer parallelism requirements.

1. Choose enough partitions to support expected load plus future growth.
2. Avoid creating extremely high partition counts without a clear justification.
3. Use meaningful keys to preserve ordering and distribute traffic evenly.
4. Monitor partition skew and consumer lag regularly.
5. Most importantly, remember that increasing partitions later is easier than reducing them.

Because partitions cannot be reduced directly, engineers should avoid excessive partition counts during initial design.

Topic Design Strategies

Another common architectural question is whether to create one large topic or multiple specialized topics.

A poor design is a single catch-all topic named events, which receives everything:

- Orders
- Payments
- Shipments
- Notifications

This often creates operational complexity. A better design is:

Topic-1: orders
Topic-2: payments
Topic-3: shipments
Topic-4: notifications

Each topic represents a business domain. This improves isolation, security, monitoring, and scalability. Separate topics make sense when event types have:

- Different Retention Policies
- Different Consumer Groups
- Different Security Requirements
- Different Throughput Characteristics

For example, audit events may need a retention of 1 year while metrics need only 7 days. Because retention is configured per topic, separate topics are clearly preferable here.
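As a sketch of what this looks like in configuration, retention is set per topic through the retention.ms setting, expressed in milliseconds. The helper below only builds the configuration maps. The topic names audit-events and metrics are hypothetical, and treating 1 year as 365 days is an assumption.

```java
import java.time.Duration;
import java.util.Map;

// Illustrative only: builds per-topic retention settings as plain key/value maps.
final class RetentionSketch {

    // "retention.ms" is Kafka's topic-level retention setting, in milliseconds.
    static Map<String, String> retentionConfig(Duration retention) {
        return Map.of("retention.ms", Long.toString(retention.toMillis()));
    }

    public static void main(String[] args) {
        // Hypothetical topic names; 1 year is approximated as 365 days.
        Map<String, String> auditConfig = retentionConfig(Duration.ofDays(365));
        Map<String, String> metricsConfig = retentionConfig(Duration.ofDays(7));
        System.out.println("audit-events -> " + auditConfig);
        System.out.println("metrics      -> " + metricsConfig);
    }
}
```

Maps like these would typically be supplied as topic configuration when each topic is created. A single shared topic could carry only one of the two values.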

Common Kafka Interview Questions

Why does Kafka use partitions?
- Partitions provide scalability, parallel processing, and (through partition-level replication) fault tolerance.

Does Kafka guarantee message ordering?
- Kafka guarantees ordering only within a partition.

What does the partition count determine?
- The number of partitions determines maximum consumer parallelism within a consumer group.

Can partitions be increased?
- Yes, but increasing partition count can affect key distribution and future ordering behavior.

What are hot partitions?
- Hot partitions occur when traffic is distributed unevenly due to poor key selection, causing one partition to receive a disproportionate share of messages.

Conclusion

Topic and partition design sits at the heart of Kafka architecture. While topics provide logical organization, partitions determine scalability, throughput, fault tolerance, and consumer parallelism. Choosing the correct partition count and partition key is one of the most important design decisions a Kafka engineer can make.

The central trade-off is simple. More partitions improve throughput and scalability but increase operational overhead. Fewer partitions simplify management but limit parallelism. Similarly, preserving ordering often requires key-based partitioning, but poor key selection can create hot partitions and uneven load distribution.