NetApp AFF A90
At NetApp INSIGHT this past September, we shared our vision for the future of intelligent data infrastructure. From my conversations with customers and analysts, it's clear that many of you are eager to see what's next. However, it's equally important to appreciate where we are right now and how rapidly we are advancing the capabilities of our products to support your AI initiatives.
Our benchmarking results for NVIDIA Magnum IO GPUDirect Storage (GDS) are a perfect example of this accelerating innovation. GDS is crucial for AI workloads because it allows GPUs to bypass the CPU and interact directly with storage. In 2023, we shared the GDS metrics of our previous generation of AFF.
Now, in 2024, we have a completely refreshed hardware lineup with more powerful processors, faster memory, and improved interconnects. However, the real leap forward comes from the synergy of world-class hardware and software engineering. I'm pleased to share that we have indeed made that leap with our new AFF A90 and the latest NetApp ONTAP software, delivering a performance boost of over 2X!
For these benchmarks we tested our AFF A90 storage system with NVIDIA GPUDirect Storage (GDS). I'll explain why these results are exciting for customers looking to leverage NVIDIA GPUs for enterprise AI workloads.
In the chart below you can see that we achieved 351 GiB/s for a 4-system A90 cluster:
This is over twice the performance of the benchmarks we published just last year with the previous-generation AFF A800! We achieved near-linear performance scaling, starting with a single-system cluster and repeating the tests on clusters of up to 4 systems. And one more thing: the upgrade from the A800 used in our 2023 testing to the A90 is a completely non-disruptive in-chassis refresh.
There is no shortage of AI-focused platforms claiming impressive performance numbers for GPU workloads, so what makes these A90 results special? NetApp is unique in offering customers a simple and non-disruptive path to massive-scale enterprise AI. When you realize your data is already on the platform that can scale to meet your AI ambitions, the barriers to making those ambitions a reality disappear.
In addition to performance, this solution also delivers the following key benefits:
Simplicity. Enhancements to our NFS offerings (pNFS support with NFSv4.1 over RDMA plus storage session trunking) make it possible to get the full performance of the entire cluster from a single FlexGroup or mount point. You get increased data transfer performance and resiliency with multipathing, on a configuration managed by enterprise IT infrastructure teams without the need for specialized training.
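As a rough sketch of what this looks like from the client side, an NFSv4.1-over-RDMA mount of a single FlexGroup export might resemble the following. The hostname, export path, and mount point are hypothetical, and the exact option set for your environment should come from the ONTAP documentation:

```shell
# Mount a FlexGroup export over NFSv4.1 using the RDMA transport.
# ONTAP serves NFS over RDMA on port 20049; max_connect allows the
# client to add trunked connections to additional server addresses.
# Hostname, junction path, and mount point below are placeholders.
sudo mount -t nfs \
    -o vers=4.1,proto=rdma,port=20049,max_connect=16 \
    a90-lif1.example.com:/ai_data /mnt/ai_data

# Confirm the negotiated transport and version
mount | grep /mnt/ai_data
```

The key point is that a single mount point is all the application sees; multipathing and trunking across cluster ports happen underneath it.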
Scalability. Innovation in our frontend NFS stack, coupled with enhancements in our backend cluster networking, enables linear scaling for performance-hungry AI workloads. Customers can scale out their clusters to meet performance needs as demand grows, without over-provisioning.
No silos. Only NetApp delivers the same full-featured, no-compromise ONTAP data management OS on high-performance storage designed for mission-critical workloads and AI, capacity-oriented storage for general-purpose workloads, and hybrid flash storage for cold data tiering and backup, as well as first-party offerings with the major hyperscalers. Customers can start their AI journey on their current infrastructure, scale to meet any demand, and seamlessly operate on-prem and in the cloud.
For this test, the NetApp Performance team built a unified storage cluster using four of the new NetApp AFF A90 storage systems. Each A90 consists of two HA storage controllers in a 4RU chassis containing up to 48 NVMe SSD drives. The four A90 systems plus two 1RU NetApp cluster interconnect switches resulted in a total storage system footprint of only 18RU.
The AFF A90 controllers support up to six high-performance I/O expansion cards; for this test, each controller was equipped with two dual-ported 200 Gb Ethernet cards (NetApp P/N X50131A, NVIDIA ConnectX-7). Four ports from each controller were connected to a pair of NVIDIA Spectrum-3 SN4600 switches running Cumulus Linux 5.9.1. For clients in this environment, we used three NVIDIA DGX A100 systems, each connected to NVIDIA Spectrum-3 SN4600V switches with two 200 GbE ports, plus five Lenovo SR675v3 servers with NVIDIA L40S GPUs, also connected with two 200 GbE ports each.
Here are some more details:
- Workload: sequential reads
- Data set: 256 GB
- Storage efficiency: enabled (our default)
- ONTAP version: 9.15.1
- DGX GDS version: 1.7
- OVX GDS version: 1.11
From a logical perspective, the network was provisioned with two separate VLANs with two ports from each storage controller, one port from each NVIDIA DGX A100 system and one port from each Lenovo SR675v3 server. Host and storage ports in each VLAN were configured with appropriate addresses to allow each server to utilize all available bandwidth and avoid routing issues, while L3 routing was enabled for traffic between VLANs to accommodate failure scenarios.
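From the client's perspective, that layout boils down to one port in each VLAN, addressed on separate subnets so each server can drive both links without asymmetric-routing problems. A minimal, hypothetical sketch of that addressing (interface names, VLAN IDs, and subnets are illustrative, not the ones used in the test):

```shell
# Hypothetical client-side addressing for the two-VLAN layout:
# one 200 GbE port in each VLAN, each on its own subnet, so both
# links carry traffic and routing stays within each VLAN.
sudo ip addr add 192.168.10.21/24 dev enp1s0f0   # port in VLAN A
sudo ip addr add 192.168.20.21/24 dev enp1s0f1   # port in VLAN B

# Inter-VLAN reachability (for failure scenarios) is handled by
# L3 routing on the switches, not by the client configuration.
```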
To store the data for the evaluation, we created a single FlexGroup with sixteen constituents using default settings, and created folders to isolate the data used by separate concurrent test instances. This is a far simpler configuration than our previous tests, where eight FlexGroups were required. For details on how to configure ONTAP for RDMA and the necessary client mount options, please see the ONTAP documentation.
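As an illustrative sketch, a sixteen-constituent FlexGroup can be created from the ONTAP CLI along these lines. The SVM, aggregate, and volume names are placeholders, and the size is arbitrary; the exact command for a given cluster belongs to the ONTAP documentation:

```shell
# Hypothetical ONTAP CLI sketch: create one FlexGroup whose
# constituents are spread across eight aggregates, two per
# aggregate (8 x 2 = 16 constituents). Names are placeholders.
volume create -vserver ai_svm -volume fg_gds \
    -aggr-list aggr1,aggr2,aggr3,aggr4,aggr5,aggr6,aggr7,aggr8 \
    -aggr-list-multiplier 2 \
    -size 80TB -junction-path /fg_gds
```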
Finally, performance was measured with GDSIO, a tool provided by NVIDIA for validating GDS performance and functionality. The parameters below illustrate the settings used to generate the results shown here.
GDSIO command parameters:
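For orientation, a gdsio sequential-read invocation generally takes the following shape. This is an illustrative example, not the exact command line used in the benchmark; the directory, GPU index, thread count, sizes, and duration are all hypothetical values:

```shell
# Hypothetical gdsio sequential-read run (illustrative parameters):
#   -D  target directory on the GDS-enabled NFS mount
#   -d  GPU index to target
#   -w  number of worker threads
#   -s  per-file size
#   -i  I/O size per request
#   -x  0 = GPUDirect (GPU-to-storage) transfer mode
#   -I  0 = sequential read
#   -T  test duration in seconds
/usr/local/cuda/gds/tools/gdsio -D /mnt/ai_data/test \
    -d 0 -w 16 -s 16G -i 1M -x 0 -I 0 -T 120
```

In practice, multiple concurrent gdsio instances across the client systems are aggregated to drive the full cluster bandwidth.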
As the chart at the top of this blog shows, our 4-system cluster was more than capable of meeting the demands of the most I/O-intensive GDS applications. Keep in mind that we can further scale out an AFF A90 cluster to 12 systems to deliver even more bandwidth.
These benchmarks are a significant validation of our current AI capabilities, offering just a glimpse of what's to come. While benchmark results are gratifying, it's the real-world outcomes our customers achieve that truly matter. We are deeply grateful for your trust and partnership, and I can’t wait to share with you what’s coming next!
Pranoop Erasani is vice president of engineering in the Shared Platform team at NetApp, where he is responsible for AI/ML, NAS, and replication technologies for NetApp’s hybrid cloud storage platform, ONTAP. In this role, he is tasked with building a cost-effective next-generation ONTAP data platform optimized for AI/ML applications. Leveraging the best of ONTAP data management capabilities, the next-generation data platform for AI/ML will not only be optimized for all AI workflows of training, inference, and checkpointing, it will also provide the ability to make data ready for AI/ML applications natively on storage.