How to benchmark and scale your Google Cloud Managed Service for Apache Kafka deployment
Businesses that rely on real-time data for decision-making and application development need a robust and scalable streaming platform, and Apache Kafka has emerged as the leading solution.
At its core, Kafka is a distributed streaming platform that allows applications to publish and subscribe to streams of records, much like a message queue or enterprise messaging system, and goes beyond traditional messaging with features like high throughput, persistent storage, and real-time processing capabilities. However, deploying, managing, and scaling Kafka clusters can be challenging. This is what Google Cloud’s Managed Service for Apache Kafka solves. This managed Kafka service is open-source compatible and portable, easy to operate, and secure, allowing you to focus on building and deploying streaming applications without worrying about infrastructure management, software upgrades, or scaling. It’s also integrated for optimal performance with other Google Cloud data services such as BigQuery, Cloud Storage and Dataflow.
While Apache Kafka offers immense power, achieving optimal performance isn’t automatic. It requires careful tuning and benchmarking. This post provides a hands-on guide to optimize your deployments for throughput and latency.
Note: We assume a high-level understanding of Apache Kafka and BASH scripting. For an introduction and overview of Apache Kafka, visit the Apache Software Foundation website. For an introduction to BASH, please visit this Geeks for Geeks tutorial.
Benchmarking Kafka producers, consumers and latencies
Benchmarking your Kafka deployment is crucial for understanding its performance characteristics and ensuring it can serve your application’s requirements. This involves a deep dive into metrics like throughput and latency, along with systematic experimentation with your producer and consumer configurations. Note that benchmarking is done at a topic and application level, and should be repeated for each topic.
Optimizing for throughput and latency
The Apache Kafka bundle includes two utilities, kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh, to assess producer and consumer performance as well as latencies.
Note: while we use specific config values below to demonstrate tool usage, we recommend using configurations (e.g., message size, message rate) that mirror your own workloads.
kafka-producer-perf-test
This tool simulates producer behavior by sending a specified number of messages to a topic while measuring throughput and latencies, and takes the following flags:
- topic (required): Specifies the target Kafka topic.
- num-records (required): Sets the total number of messages to send.
- record-size (required): Defines the size of each message in bytes.
- throughput (required): Sets a target throughput in messages per second (use -1 to disable throttling).
- producer-props:
  - bootstrap.servers (required): Comma-separated list of Kafka bootstrap server or broker addresses.
  - acks (optional): Controls the level of acknowledgment required from brokers (0 for no acknowledgment, 1 for the leader broker only, and 'all' for all in-sync replicas). The default value is 'all'.
  - batch.size (optional): The maximum size of a batch of messages in bytes. The default value is 16KB.
  - linger.ms (optional): The maximum time to wait for a batch to fill before sending. The default value is 0 ms.
  - compression.type (optional): Any one of: none, gzip, snappy, lz4, zstd. The default value is none.
Sample code block #1: Kafka producer performance test
./kafka-producer-perf-test.sh \
  --topic <topic_name> \
  --num-records 5000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=<bootstrap_servers> \
    acks=1 \
    batch.size=10000 \
    linger.ms=10 \
    compression.type=<compression_type>
Important considerations
The most crucial properties are acks, batch.size, linger.ms, and compression because they directly influence producer throughput and latency. While exact settings depend on your application, we suggest these baseline configurations:
- acks: acks=1 requires acknowledgement from the leader broker only. This gives the best performance unless you need acknowledgements from all in-sync replicas.
- batch.size: 10000 bytes (about 10 KB) is a good baseline value to start with. Increasing the batch size allows producers to send more messages in a single request, reducing overhead.
- linger.ms: 10 ms is a good baseline value. You can experiment within a range of 0-50 ms; increasing linger time further can result in increased latencies.
- compression: We recommend using compression to further increase your throughput and reduce latencies. A sketch for comparing codecs follows this list.
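If you want to see how each codec behaves for your own workload before settling on one, a quick approach is to run the producer perf test once per compression type and compare the reported throughput and latencies. The following is a minimal sketch only; the topic name and bootstrap servers are placeholder values you would replace with your own.

# Minimal sketch: compare producer performance across compression codecs.
TOPIC="my-benchmark-topic"            # placeholder: your topic
BOOTSTRAP="broker-1:9092"             # placeholder: your bootstrap servers

for codec in none gzip snappy lz4 zstd; do
  echo "=== compression.type=${codec} ==="
  ./kafka-producer-perf-test.sh \
    --topic "${TOPIC}" \
    --num-records 5000000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props bootstrap.servers="${BOOTSTRAP}" \
      acks=1 \
      batch.size=10000 \
      linger.ms=10 \
      compression.type="${codec}"
done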
kafka-consumer-perf-test
This tool simulates consumer behavior by fetching messages from a Kafka topic and measuring the achieved throughput and latencies. Key properties include:
- topic (required): Specifies the Kafka topic to consume from.
- bootstrap-server (required): Comma-separated list of Kafka bootstrap server or broker addresses.
- messages (required): The total number of messages to consume.
- group (optional): The consumer group ID.
- fetch-size (optional): The maximum amount of data to fetch in a single request. The default value is 1048576 bytes (1 MiB).
Sample code block #2: Kafka consumer test
./kafka-consumer-perf-test.sh \
  --topic <topic_name> \
  --bootstrap-server <bootstrap_servers> \
  --messages 1000000 \
  --group <consumer_group> \
  --fetch-size 10000000
Important considerations
To achieve optimal consumer throughput, fetch-size is the crucial property to tune. The appropriate fetch size is largely determined by your consumption patterns and throughput needs, and can range from around 1MB for smaller messages to 1-50MB for larger ones. It’s advisable to analyze the effects of different fetch sizes on both application responsiveness and throughput. By carefully documenting these tests and examining the results, you can pinpoint performance limitations and refine your settings accordingly.
How to benchmark throughput and latencies
Benchmarking the producer
When conducting tests to measure the throughput and latencies of Kafka producers, the key parameters are batch.size, the maximum size of a batch of messages, and linger.ms, the maximum time to wait for a batch to fill before sending. For the purposes of this benchmark, we suggest keeping acks at 1 (acknowledgment from the leader broker) to balance durability and performance. This helps us estimate the expected throughput and latencies for a producer. Note that the message size is kept constant at 1KB.
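One way to generate a result set like the table below is to script a sweep over batch.size and linger.ms while holding the other flags constant. This is a minimal sketch under the assumptions above (1KB records, acks=1); the topic and bootstrap servers are placeholders, and you should adjust the ranges to your workload.

# Minimal sketch: sweep batch.size and linger.ms with 1KB records and acks=1.
TOPIC="my-benchmark-topic"            # placeholder: your topic
BOOTSTRAP="broker-1:9092"             # placeholder: your bootstrap servers

for batch in 1000 10000 100000; do
  for linger in 10 100; do
    echo "=== batch.size=${batch} linger.ms=${linger} ==="
    ./kafka-producer-perf-test.sh \
      --topic "${TOPIC}" \
      --num-records 5000000 \
      --record-size 1024 \
      --throughput -1 \
      --producer-props bootstrap.servers="${BOOTSTRAP}" \
        acks=1 \
        batch.size="${batch}" \
        linger.ms="${linger}"
  done
done

The throughput and latencies we measured are shown below.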
| Throughput (messages/s) | Throughput (MB/s) | Latency (ms) | acks | batch.size | linger.ms |
|---|---|---|---|---|---|
| 48049 | 45 | 608 | 1 (leader) | 1KB | 10 |
| 160694 | 153 | 171 | 1 (leader) | 10KB | 10 |
| 117187 | 111 | 268 | 1 (leader) | 100KB | 10 |
| 111524 | 106 | 283 | 1 (leader) | 100KB | 100 |
Analysis and findings
- The impact of batch size: As expected, increasing batch size generally leads to higher throughput (messages/s and MB/s). We see a significant jump in throughput as we move from 1KB to 10KB batch sizes. However, further increasing the batch size to 100KB does not show a significant improvement; this suggests that an optimal batch size exists beyond which further increases may not yield substantial throughput gains.
- Impact of linger time: Increasing the linger time from 10 ms to 100 ms with a 100KB batch size slightly reduced throughput (from 117,187 to 111,524 messages/s). This indicates that, in this scenario, a longer linger time is not beneficial for maximizing throughput.
- Latency considerations: Latency tends to increase with larger batch sizes, because messages wait longer for a larger batch to fill before being sent. This is clearly visible when batch.size is increased from 10KB to 100KB.
Together, these findings highlight the importance of careful tuning when configuring Kafka producers. Finding the optimal balance between batch.size and linger.ms is crucial for achieving desired throughput and latency goals.
Benchmarking the consumer
To assess consumer performance, we conducted a series of experiments using kafka-consumer-perf-test, systematically varying the fetch size.
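A sweep over fetch sizes, along the lines of the producer sweep above, is one way to run this kind of experiment. This is a rough sketch; the topic and bootstrap servers are placeholders, and each run uses a fresh consumer group ID so committed offsets from a previous iteration don’t skew the next one. The results we measured follow in the table below.

# Rough sketch: measure consumer throughput at several fetch sizes.
TOPIC="my-benchmark-topic"            # placeholder: your topic
BOOTSTRAP="broker-1:9092"             # placeholder: your bootstrap servers

for fetch in 10000 100000 1000000 10000000 100000000; do
  echo "=== fetch-size=${fetch} ==="
  ./kafka-consumer-perf-test.sh \
    --topic "${TOPIC}" \
    --bootstrap-server "${BOOTSTRAP}" \
    --messages 1000000 \
    --group "perf-${fetch}" \
    --fetch-size "${fetch}"
done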
| Throughput (messages/s) | Throughput (MB/s) | fetch-size |
|---|---|---|
| 2825 | 2.6951 | 10KB |
| 3645 | 3.477 | 100KB |
| 18086 | 17.8 | 1MB |
| 49048 | 46 | 10MB |
| 61334 | 58 | 100MB |
| 62562 | 60 | 500MB |
Analysis and findings
- Impact of fetch size on throughput: The results clearly demonstrate a strong correlation between fetch.size and consumer throughput. As we increase the fetch size, both message throughput (messages/s) and data throughput (MB/s) improve significantly. This is because larger fetch sizes allow the consumer to retrieve more messages in a single request, reducing the overhead of frequent requests and improving data transfer efficiency.
- Diminishing returns: While increasing fetch.size generally improves throughput, we observe diminishing returns as we move beyond 100MB. The difference in throughput between 100MB and 500MB is not significant, suggesting that there’s a point where further increasing the fetch size provides minimal additional benefit.
Scaling Google Cloud Managed Service for Apache Kafka
Based on further experiments, we explored optimal configurations for the managed Kafka cluster. Please note that for this exercise, we kept the message size at 1KB and the batch size at 10KB; the topic has 1,000 partitions and a replication factor of 3. The results were as follows.
| Producer threads | cluster_bytes_in_count (MBs) | CPU utilization | Memory utilization | vCPU | Memory |
|---|---|---|---|---|---|
| 1 | 56 | 98% | 58% | 3 | 12 GB |
| 1 | 61 | 24% | 41% | 12 | 48 GB |
| 2 | 104 | 56% | 57% | 12 | 48 GB |
| 4 | 199 | 64% | 60% | 12 | 48 GB |
Scaling your managed Kafka cluster effectively is crucial to ensure optimal performance as your requirements grow. To determine the right cluster configuration, we conducted experiments with varying numbers of producer threads, vCPUs, and memory. Our findings indicate that vertical scaling helps: increasing the cluster from 3 vCPUs/12GB to 12 vCPUs/48GB brought CPU utilization down from 98% to 24% for the same single-threaded producer workload, leaving headroom for additional throughput. Your throughput requirements play a vital role, too. With 12 vCPUs/48GB, going from one to two producer threads increased the cluster’s bytes_in_count from 61 to 104 MBs while CPU utilization rose from 24% to 56%, and moving from two to four producer threads nearly doubled bytes_in_count again (104 to 199 MBs). You also need to monitor resource utilization to avoid bottlenecks, since higher throughput drives higher CPU and memory utilization. Ultimately, optimizing managed Kafka performance requires a careful balance between vertical scaling of the cluster and your throughput requirements, tailored to your specific workload and resource constraints.
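If monitoring shows the cluster is CPU- or memory-bound, you can resize a managed Kafka cluster in place rather than rebuilding it. The snippet below is only a sketch of what that can look like with the gcloud CLI; the cluster name and location are placeholders, and the flag names and memory format should be verified against the current gcloud managed-kafka reference before use.

# Sketch only (verify flags against the current gcloud managed-kafka reference):
# vertically scale a cluster from 3 vCPUs / 12 GB to 12 vCPUs / 48 GB.
gcloud managed-kafka clusters update my-kafka-cluster \
  --location=us-central1 \
  --cpu=12 \
  --memory=48GiB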
Build the Kafka cluster you need
In conclusion, optimizing your Google Cloud Managed Service for Apache Kafka deployment involves a thorough understanding of producer and consumer behavior, careful benchmarking, and strategic scaling. By actively monitoring resource utilization and adjusting your configurations based on your specific workload demands, you can ensure your managed Kafka clusters deliver the high throughput and low latency required for your real-time data streaming applications.
Interested in diving deeper? Explore the Google Cloud Managed Service for Apache Kafka resources and documentation.