Co-authored by Marc Selwan, Senior Product Manager at Confluent.
Provisioning Apache Kafka clusters and retaining the right amount of data in them are complex tasks. Setting up a cluster properly to handle GBps throughput and petabytes of data can turn into months of work that pushes out delivery timelines and delays business go-to-market. Similarly, businesses that don't need GBps scale or petabytes of storage at the beginning but then face unexpected spikes in traffic or storage needs risk downtime or degraded cluster performance, which can lead to customer frustration.
We are excited to share how you can further simplify your operations and increase the performance of Confluent Platform storage by using NetApp® ONTAP® as both primary and tiered storage. For decades, NetApp has maintained a laser focus on innovations that help our customers build stronger, smarter, and more efficient data infrastructures. With NetApp, Confluent Platform customers can now disaggregate compute and storage resources. This disaggregated configuration can be used for both in-cloud and on-premises deployments.
Previously, Kafka deployments could not use network file services for primary storage because of a problem commonly referred to as the silly rename issue. After partitions are rebalanced, some of the brokers end up with silly-looking filenames ('.nfsXXX'), and the Kafka cluster gets into a nondeterministic state.
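To make the symptom concrete: an NFS client performs a silly rename when a file is deleted while a process still holds it open, leaving behind a hidden file whose name starts with '.nfs'. The following is a minimal sketch, not a Confluent or NetApp tool, of how you might scan a broker's log directory for such leftovers. The function name `find_silly_renames` and the demo directory layout are hypothetical and chosen only for illustration.

```python
import os
import tempfile
from pathlib import Path


def find_silly_renames(log_dir):
    """Return the sorted paths of files under log_dir whose name starts with '.nfs'."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            # NFS clients name silly-renamed files '.nfs' followed by an identifier.
            if name.startswith(".nfs"):
                hits.append(os.path.join(root, name))
    return sorted(hits)


if __name__ == "__main__":
    # Demo on a throwaway local directory that imitates a partition folder.
    # The names below are made up; on a real broker you would pass one of the
    # directories configured in log.dirs instead.
    with tempfile.TemporaryDirectory() as tmp:
        partition = Path(tmp) / "orders-0"
        partition.mkdir()
        (partition / "00000000000000000000.log").touch()
        (partition / ".nfs000000000001").touch()
        for path in find_silly_renames(tmp):
            print(path)
```

A scan like this only reports the leftover files. It does not tell you which process still holds them open, and it does not repair the cluster state.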
Thanks to several enhancements to the upstream Linux NFS client, combined with changes in the latest ONTAP 9.12.1 release, ONTAP customers can now safely run Kafka on NFS. These NFS client changes will be generally available starting in RHEL 8.7 and RHEL 9.1 (Confluent Platform is certified on RHEL 8.x).
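As a rough illustration of the version requirement, the sketch below reads the `VERSION_ID` field from the text of `/etc/os-release` and compares it against the RHEL 8.7 and 9.1 thresholds named above. The helper names are hypothetical. The code assumes that the fixes remain in later minor releases and in major releases after 9, which this article does not state, and it checks only the distribution release, not the running kernel or the ONTAP version.

```python
def read_version_id(os_release_text):
    """Extract VERSION_ID (for example '8.7') from the text of /etc/os-release."""
    for line in os_release_text.splitlines():
        if line.startswith("VERSION_ID="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def has_nfs_client_fixes(version_id):
    """Return True if a RHEL version is at or past the 8.7 / 9.1 thresholds."""
    major_text, _, minor_text = version_id.partition(".")
    major = int(major_text)
    minor = int(minor_text or 0)
    if major == 8:
        return minor >= 7
    if major == 9:
        return minor >= 1
    # Assumption: major releases after 9 carry the fixes; earlier ones do not.
    return major > 9


if __name__ == "__main__":
    sample = 'NAME="Red Hat Enterprise Linux"\nVERSION_ID="8.6"\n'
    version = read_version_id(sample)
    print(version, has_nfs_client_fixes(version))
```

On a real host you would pass the contents of `/etc/os-release` rather than the sample string.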
The following figure shows an example setup on Amazon Web Services running a 4-node Confluent Platform v7.2.1 cluster that uses NetApp Cloud Volumes ONTAP for primary storage.
In addition, running Confluent Kafka with ONTAP software also delivers the benefits described below.
Because data is available on a shared storage server, Kafka node recovery does not need to rebuild the data.
Because most of the data processing is performed on the NFS server, running a lightweight NFS client on the Kafka broker reduces the CPU resources that the broker spends on storage I/O and frees them for your data and analytics processing instead.
Because you can co-locate Confluent for Kubernetes on the same Kubernetes clusters that host your other stateful applications, you do not need to perform separate storage sizing and capacity planning exercises.
If you are a NetApp ONTAP user who wants to extend your Confluent Platform storage, reach out to us.
Gunna Marripudi is a product management leader in enterprise storage and data management software for VMs, containers, and cloud. Gunna is a Senior Director of Product Management for Google Cloud NetApp Volumes, a fully managed file service from Google Cloud, built on ONTAP. Prior to NetApp, he ran product management for scale-out object storage and all-flash arrays at Western Digital. Earlier in his career, he was a principal storage software architect at Samsung and HPE.