We’ve been running a lot of Kafka on Azul Platform Prime lately, comparing its maximum Kafka throughput against OpenJDK and measuring the practical ROI of moving workloads to it. Along the way we’ve learned a few things about getting the most out of Kafka on Azul Platform Prime. Here they are, in no particular order.
I/O Really Matters
We found that the biggest factor in achieving higher maximum throughput on any JVM was the speed of I/O to local storage. On AWS, we got the best results with the storage-optimized i3en instance family, specifically i3en.2xlarge.
Memory Matters Less
All loads are different, and some loads can exercise both Kafka and the underlying JVM in different ways. But in our Simple Kafka Benchmark we found that Kafka did not stress the Java heap very much. We used 40GB of the i3en.2xlarge’s available 64GB of memory for the Java heap. With a Java heap under 32GB, OpenJDK’s performance relative to Azul Platform Prime improves slightly, because OpenJDK can use Compressed OOPs (compressed object pointers) at those heap sizes.
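As a rough illustration of the heap sizing above, here is a minimal Python sketch that builds a value for Kafka's KAFKA_HEAP_OPTS environment variable and flags whether a given heap size falls under the 32GB Compressed OOPs threshold. The helper names are hypothetical, and setting the initial heap equal to the maximum heap is our assumption, not something the benchmark prescribes.

```python
# Heap size (in GB) below which OpenJDK can use Compressed OOPs,
# as described in the text above.
COMPRESSED_OOPS_LIMIT_GB = 32


def kafka_heap_opts(heap_gb: int) -> str:
    """Build a KAFKA_HEAP_OPTS value for the given heap size.

    Assumption: initial heap (-Xms) is set equal to max heap (-Xmx).
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be a positive number of gigabytes")
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


def openjdk_can_use_compressed_oops(heap_gb: int) -> bool:
    """Return True when the heap is small enough for Compressed OOPs."""
    return heap_gb < COMPRESSED_OOPS_LIMIT_GB


if __name__ == "__main__":
    heap_gb = 40  # the heap size used on the 64GB i3en.2xlarge
    print(f'export KAFKA_HEAP_OPTS="{kafka_heap_opts(heap_gb)}"')
    print("Compressed OOPs available on OpenJDK:",
          openjdk_can_use_compressed_oops(heap_gb))
```

With the 40GB heap from our configuration, the second check reports that Compressed OOPs would not apply on OpenJDK, so that slight OpenJDK advantage does not come into play.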
We often recommend configuring Transparent Huge Pages on your operating system when running latency-sensitive workloads. Since memory does not play a critical role in Kafka, however, this is not necessary when running Kafka.
Rev Up That CPU
Many people try to keep maximum CPU utilization low when running latency-sensitive workloads, scaling out new instances whenever CPU utilization tops 50%. But with Azul Platform Prime’s C4 pauseless garbage collector, you can push your CPU utilization higher without worrying about incurring response time outliers.
More Producers, More Throughput
In our config, we found that the higher the ratio of producers to consumers, the higher the throughput on both JDKs. The effect was more pronounced on Azul Platform Prime, however: the more producers we added, the larger Azul Platform Prime’s throughput advantage over OpenJDK became.
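If you want to check this trend in your own environment, one simple approach is to record maximum throughput per producer/consumer setting on each JDK and compare the ratios. The Python sketch below shows that arithmetic only. The function name and the dictionary layout are hypothetical, and no benchmark numbers are included.

```python
from typing import Dict, Tuple

# A setting is a (producers, consumers) pair, e.g. (4, 1).
Setting = Tuple[int, int]


def throughput_ratios(
    prime: Dict[Setting, float],
    openjdk: Dict[Setting, float],
) -> Dict[Setting, float]:
    """Return Azul Platform Prime throughput divided by OpenJDK throughput
    for every setting measured on both JDKs.

    Settings missing from the OpenJDK results, or with a zero OpenJDK
    throughput, are skipped rather than guessed.
    """
    ratios: Dict[Setting, float] = {}
    for setting, prime_throughput in prime.items():
        baseline = openjdk.get(setting)
        if not baseline:
            continue
        ratios[setting] = prime_throughput / baseline
    return ratios
```

Pass in two dictionaries keyed by (producers, consumers), using whatever throughput unit your benchmark reports. If your results match ours, the ratio should grow as the producer count rises.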
Global Customers, World-class Results
We’ve helped Mastercard modernize their 8-year-old architecture to deliver real-time risk analytics for fraud prevention with a combination of Kafka, Spark and Hadoop. For Workday, Azul Platform Prime reduced operational issues by over 95%, eliminating at least 42,000 developer person-hours over 1.5 years that would have been spent on performance tuning. You can find more details here.