All industries and modern applications are undergoing rapid transformation powered by advances in accelerated computing, deep learning, and artificial intelligence. The next phase of this transformation requires an intelligent data infrastructure that can bring AI and compute closer to enterprise data.
When I speak with our customers, the challenges they describe involve integrating their data with their enterprise AI workflows. The core of the problem is applying AI technology to the data they already have, whether in the cloud, on premises, or, more likely, both.
Imagine that you’re a data engineer. You pull an open-source large language model (LLM) to train on your corporate data so that the marketing team can build better assets, and the customer service team can provide customer-facing chatbots. The data is spread out across your different storage systems, and you don’t know what is where. You export and move and centralize your data for training purposes with all the associated time and capacity inefficiencies that entails. You build your model, but the history and context of the data you used is lost, so there is no way to trace your model back to the source. And all of that data is stored on premises, but your training is taking place on the cloud where your GPUs live.
These challenges are quite common for the data engineers and data scientists we speak to. NetApp is already addressing many of these challenges. But as model training becomes more advanced and the need increases for ever more data to train, these problems will be magnified.
As the next generation of AI training and fine-tuning workloads takes shape, limits to existing infrastructure will risk slowing innovation. Some challenges include data infrastructure that allows scaling and optimizing for AI; data management to inform AI workflows where data lives and how it can be used; and associated data services that help data scientists protect AI workflows and keep their models clean.
Scalable data infrastructure
As AI models become more complex, their computational requirements increase. Enterprises need infrastructure that can scale and deliver the high performance required for intensive AI tasks, such as training and fine-tuning large language models. At the same time, maximizing the use of nonstorage resources, especially GPUs, is critical for cost-effective AI operations, because underused resources drive up expenses; keeping GPUs saturated requires improved storage throughput for both read and write operations. And finally, training data is typically stored on premises, while AI models are often trained in the cloud, so AI workloads often span on-premises and various cloud environments. The infrastructure therefore needs to provide seamless data mobility and management across these systems.
Universal data management
AI workloads often require access to vast amounts of data, which can be scattered across an enterprise in different systems and formats. This challenge becomes even greater as businesses use their proprietary data spread across their data infrastructure for fine-tuning and retrieval-augmented generation (RAG) use cases. Data silos make it difficult to aggregate and analyze data effectively for AI. And managing the lifecycle of AI data, from ingestion to processing to storage, requires sophisticated data management solutions that can manage the complexity and volume of unstructured data. For AI to be effective, the relevant data must be easily discoverable and accessible, which requires powerful metadata management and data exploration tools.
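To make the idea of metadata-driven discovery concrete, here is a minimal, purely illustrative sketch of a data catalog. The `DatasetEntry` fields and the `discoverable_for_training` filter are assumptions for illustration, not a real NetApp API; the point is that each entry records where the data lives and how sensitive it is, so a workflow can find usable data without knowing the underlying storage system.

```python
from dataclasses import dataclass

# Hypothetical catalog entry: location and sensitivity travel with the
# dataset so that discovery and policy checks need no storage knowledge.
@dataclass
class DatasetEntry:
    name: str
    location: str     # e.g. "on-prem:/vol/marketing" or "s3://bucket/key"
    format: str       # e.g. "parquet", "pdf", "jpeg"
    sensitivity: str  # e.g. "public", "internal", "restricted"

def discoverable_for_training(catalog: list[DatasetEntry]) -> list[DatasetEntry]:
    """Return only the entries a training job is allowed to use."""
    return [e for e in catalog if e.sensitivity != "restricted"]

catalog = [
    DatasetEntry("marketing-assets", "on-prem:/vol/marketing", "pdf", "internal"),
    DatasetEntry("support-tickets", "s3://acme-data/tickets", "parquet", "restricted"),
]
usable = discoverable_for_training(catalog)
print([e.name for e in usable])  # restricted datasets are filtered out
```

In practice such a catalog would be populated automatically by scanning storage systems; the sketch only shows why unified metadata makes scattered data discoverable and governable from one place.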
Intelligent data services
With the rise of AI, there is an increasing need for robust security and governance to protect sensitive data and to comply with regulatory requirements, especially in the face of threats like ransomware. Models built from poisoned data or intentional tampering have the potential to cause great harm to business operations that increasingly rely on AI. And as with any enterprise workload, data needs to be available and protected from natural disasters and system outages in order to continue operations and prevent costly downtime.
Today, NetApp is a recognized leader in AI infrastructure. For well over a decade, innovative customers have been extracting AI-powered insights from data managed on NetApp solutions. As a long-time partner with NVIDIA, NetApp has delivered certified NVIDIA DGX SuperPOD and NetApp® AIPod™ architectures and has seen rapid adoption of AI workflows on first-party cloud offerings at the hyperscalers. As the leader in unstructured data storage, customers trust NetApp with their most valuable data assets.
How did we achieve this level of trust? Through relentless innovation. As customers entrust us with their data, we see even more opportunities ahead to help them operationalize AI and high-performance workloads. That's why we’re introducing a new disaggregated architecture that will enable our customers to continue pushing the boundaries of performance and scale. An enhanced metadata management engine helps customers understand all the data assets in their organization so that they can simplify model training and fine-tuning. And an integrated set of data services helps manage that data and infrastructure, protecting it from natural and human-made threats. It’s all built on NetApp ONTAP®, the leading unified storage architecture, which integrates all of your data infrastructure. The core DNA of NetApp has always enabled us to evolve and adopt new technologies while maintaining the robust security, enterprise features, and ease of use that our customers depend on. I’m excited to give you a preview of what's around the corner for ONTAP.
Our vision of a unified AI data management engine will revolutionize how organizations approach and harness the power of AI. Our data management engine will be designed to eliminate data silos by providing a unified view of data assets, automating the capture of changes in data for rapid inferencing, and tightly integrating with AI tools for end-to-end AI workflows. NetApp is also innovating at the infrastructure layer with scalable, high-performance systems and at the intelligence layer with policy-based governance and security.
Planned innovations
At NetApp, we foresee a future in which data scientists can sit down at their AI tool of choice and fine-tune a model by using a catalog of data that covers their entire data estate. They won’t need to know where it's stored—the catalog will have that detail. And the catalog will even block data that is too sensitive for model training. Training data will be captured in state with a space-efficient point-in-time NetApp Snapshot™ copy so that the data scientists can always go back and analyze the data in its original state if they need to understand a model’s decisions. And they will be able to do all of this from the cloud of their choice, no matter whether the training data is in the same cloud, another cloud, or stored on premises. Meanwhile, the infrastructure that serves the data will provide the scale and performance needed to fully saturate the rest of the AI infrastructure, making the best use of those critical resources and delivering fine-tuned models quickly. This future is not far-fetched or far off. NetApp has already built much of this infrastructure and is building for the next stage of AI today.
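The traceability idea above—tying a trained model back to the exact point-in-time copy of its training data—can be sketched in a few lines. This is a hypothetical illustration only: `record_lineage`, its fields, and the snapshot identifier format are assumptions made up for this example, not a NetApp interface.

```python
import hashlib
import json
import time

# Hypothetical lineage record: before training starts, capture which
# snapshot and datasets the model will be trained on, so the model can
# later be traced back to its source data for audit or debugging.
def record_lineage(model_name: str, snapshot_id: str, dataset_paths: list[str]) -> dict:
    manifest = {
        "model": model_name,
        "snapshot": snapshot_id,           # point-in-time copy of the training data
        "datasets": sorted(dataset_paths),
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # A digest over the manifest gives a stable identifier for audits.
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["digest"] = hashlib.sha256(payload).hexdigest()
    return manifest

lineage = record_lineage("support-chatbot-v2", "snap.2024-06-01T00:00", ["/vol/tickets"])
print(lineage["model"], lineage["digest"][:12])
```

Because the snapshot is immutable and space-efficient, keeping a record like this costs little but preserves the full context needed to explain a model's behavior later.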
We are unwavering in our pursuit to advance the capabilities of ONTAP, aiming to meet and exceed the demands of AI-driven enterprises. By creating a unified data environment, enhancing AI tool integration, automating intelligent data management, and prioritizing performance and scalability, we are reinforcing our leadership position in data storage and management for AI. These strategic advances are designed to simplify AI project complexities, expand data accessibility, enhance data availability and security, and reduce associated costs, thereby making AI technologies more accessible to diverse organizations. To learn more about the coming developments for NetApp ONTAP and our AI data management engine, read the whitepaper: ONTAP – pioneering data management in the era of Deep Learning.
Disclaimer: This blog post discusses NetApp’s vision for future innovation, some of which may concern unreleased offerings. NetApp is sharing this information solely for informational purposes, and this information should not be relied upon in making purchasing decisions. NetApp makes no commitment and has no obligation to develop or deliver any products or services, or any related features, material, code, or functionality. The development, release, and timing of any features or functionality for NetApp products and services remains at the sole discretion of NetApp. NetApp’s strategy and possible future developments, product and platform directions, and functionality are all subject to change without notice. We disclaim any obligation to update information contained in this blog post, whether as a result of new information, future events, or otherwise. No ransomware detection or recovery system can completely guarantee safety from a ransomware attack. Although it’s possible that an attack might go undetected, NetApp technology acts as an important additional layer of defense. All information is provided without any warranty and without any liability to NetApp.
Krish is the senior vice president for Core Platforms at NetApp. The Core Platforms team is responsible for the unified storage platform, manageability platform, Customer Experience Office (CXO), and Chief Design Office (CDO), and it enables the delivery of various NetApp offerings across on-premises, hybrid cloud, and data services. Krish holds an MBA degree from Santa Clara University and a master’s degree in information systems engineering from Arizona State University. Krish is also a proven innovator and hacker with more than 30 patents, primarily in distributed systems, spam detection models, and the use of graphs and networks for anomaly detection.