
Unlocking the power of visual insights

BlueXP workload factory for GenAI now supports image processing


Puneet Dhawan

We are thrilled to announce an enhancement to NetApp® BlueXP™ workload factory for generative AI (GenAI): the ability to derive insights from image files and images embedded in PDF and Word documents. This new capability complements the existing text-based insights, helping you gain more relevant, deeper, and more accurate insights from your data.

Moving beyond text to handle multimodal data

As more companies adopt GenAI-powered assistants that are connected to enterprise data, it’s important to get insights that are derived from the entirety of the relevant documents. In many cases, the answers to users’ questions aren’t in the text at all but are buried in images, graphs, and charts within the documents. Early language models could understand only text, but recent advances in foundation models now enable these models to extract insights from multimodal data such as images. By incorporating multimodal data, AI-powered applications can provide a more comprehensive understanding of complex scenarios, enhancing the accuracy and depth of insights.

For instance, in healthcare, combining medical images with patient records can lead to more precise diagnoses and personalized treatment plans. And in retail, analyzing product images alongside customer reviews can reveal nuanced market trends and consumer preferences.

To illustrate the benefit of handling multimodal data, let’s look at an example that uses the July 2024 Short-Term Energy Outlook (STEO) report from the U.S. Energy Information Administration to understand energy trends. This report provides comprehensive and timely information about the current and the projected states of the energy sector. Consider an analyst at a utility company who’s using a GenAI-powered research assistant application and needs to understand energy pricing and consumption trends. The analyst asks the following questions, using the STEO report as the source dataset.

Question: How did the Henry Hub natural gas spot price change between 2021 and 2024? And what’s the forecast for 2025?

If the response is based on only the text that’s contained in the report, and it ignores any graphs and images, the analyst gets the following answer.

Answer without visual insights: The Henry Hub natural gas spot price averaged around $5/MMBtu in 2021. It fell to around $2.10/MMBtu in the first half of 2024 (1H24). The forecast is for it to average almost $2.90/MMBtu in the second half of 2024 (2H24) and then rise to an average of $3.30/MMBtu in 2025.

However, if in addition to the text, the images that are included in the STEO PDF are also processed, the answer to the question contains a lot more context and relevant information.

Answer with visual insights: The Henry Hub natural gas spot price started at around $3/MMBtu in early 2021, experienced significant volatility during 2022-2023 with peaks reaching $8-$9/MMBtu, and then declined significantly through 2023, stabilizing around $2-$3/MMBtu in 2024. The forecast for 2025 projects a gradual increase and stabilization of prices at around $3-$4/MMBtu.

As you can see, the second answer is more relevant to the question: it describes the price trends by drawing on the visual information presented in the document. To produce this more detailed answer, the knowledge base extracted insights from the following graph, which is on page 8 of the STEO report.

Graph of U.S. natural gas spot prices (2021-2025), peaking above $8 in 2022 and dropping below $2 in 2023. Source: EIA, July 2024.

Let’s take another example, using the same report.

Question: What was the world’s liquid fuel production in 2023?

Answer without visual insights: Unfortunately, there is no clear information provided in the passage to definitively state what the world’s liquid fuel production was in 2023. The passage discusses projections for 2024 and 2025, as well as discussing some historical data, but does not provide a specific number for global liquid fuel production in 2023. Without a clear data point or statement, I cannot provide a specific answer to this question based on the information given.

As you can see, when the model constructs the answer from the text alone, it can’t answer the preceding question, because the text doesn’t contain specific information about world liquid fuel production in 2023. As a result, the knowledge base is not able to answer the user’s query.

For this question, the only relevant information in the STEO report is in the second graph on page 15.

Graph of world liquid fuels production and consumption (2019-2025), showing production slightly exceeding consumption. Source: EIA, July 2024.

When visual insights from the preceding graph are included, the same query yields the following answer.

Answer with visual insights: Based on the charts in the information, world liquid fuels production in 2023 was approximately 101-102 million barrels per day on average.

As these examples show, handling and analyzing multimodal data can help your organization make better-informed decisions and improve efficiency, innovation, and competitive advantage.

How it works under the hood

The latest update to NetApp BlueXP workload factory introduces advanced image-processing capabilities. When the knowledge base ingests the source documents, its data-processing engine processes any image files in the source SMB shares and NFS exports, and it also extracts images and graphs that are embedded in Word and PDF documents. The engine then uses a multimodal model such as Anthropic Claude 3.5 Sonnet to extract any text from each image and to generate a description of the image or graph. To give the model extra context for that description, the engine also captures the text that appears before and after the image. The detailed description is then passed to a text-embedding model and stored alongside the text chunks that were extracted from the documents.
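To make that ingestion flow concrete, here is a minimal Python sketch under stated assumptions; it is not the BlueXP workload factory implementation. A document is modeled as a list of text and image blocks in reading order. `describe_image` is a hypothetical placeholder for a call to a multimodal model such as Claude 3.5 Sonnet, and `toy_embed` is a hashed bag-of-words stand-in for a real text-embedding model. All names here are illustrative.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Callable

# Illustrative sketch only: none of these names are BlueXP workload factory APIs.

@dataclass
class Chunk:
    source: str   # document the chunk came from
    kind: str     # "text" or "image_description"
    text: str
    vector: list[float] = field(default_factory=list)

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a text-embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_prompt(text_before: str, text_after: str) -> str:
    # Surrounding text gives the multimodal model extra context about the image.
    return ("Extract any text in this image and describe the image or chart.\n"
            f"Text before the image: {text_before}\n"
            f"Text after the image: {text_after}")

def ingest(source: str,
           blocks: list[tuple[str, object]],
           describe_image: Callable[[bytes, str], str],
           embed: Callable[[str], list[float]] = toy_embed) -> list[Chunk]:
    """blocks is the document in reading order: ("text", str) or ("image", bytes)."""
    chunks = []
    for i, (kind, payload) in enumerate(blocks):
        if kind == "text":
            chunks.append(Chunk(source, "text", payload, embed(payload)))
        elif kind == "image":
            # Take the neighboring text blocks (if any) as context for the description.
            before = blocks[i - 1][1] if i > 0 and blocks[i - 1][0] == "text" else ""
            after = (blocks[i + 1][1]
                     if i + 1 < len(blocks) and blocks[i + 1][0] == "text" else "")
            description = describe_image(payload, build_prompt(before, after))
            # The description is embedded like any other text and stored next to the text chunks.
            chunks.append(Chunk(source, "image_description", description, embed(description)))
    return chunks
```

Because image descriptions end up as ordinary embedded text chunks, the retrieval side does not need a separate index for images; in this sketch, a graph becomes searchable through its description.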

When the knowledge base receives a query, it performs a hybrid search that returns results from both full-text search and vector embeddings, including matches against the image descriptions. A reranker model, such as Cohere Rerank 3.5 or Amazon Rerank 1.0, then selects the most relevant documents and passes them to the chat model, which generates the response to the user’s query.
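The query path can be sketched in Python as follows, again as a simplified illustration rather than the product’s code. Two assumptions are worth flagging: the full-text side here is a simple term-overlap ranking standing in for a real engine such as BM25, and the two result lists are merged with reciprocal rank fusion (RRF), which is one common fusion method; the source does not say which method BlueXP uses. `rerank` is a hypothetical callable standing in for a reranker model such as Cohere Rerank 3.5 or Amazon Rerank 1.0.

```python
import hashlib
import math
from typing import Callable

# Illustrative sketch only: names and the RRF fusion step are assumptions.

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a text-embedding model (hashed bag of words, L2-normalized).
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[int(hashlib.sha256(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def keyword_rank(query: str, docs: list[str]) -> list[int]:
    # Full-text side: rank by how many query terms each chunk contains.
    terms = set(query.lower().split())
    scores = [len(terms & set(d.lower().split())) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def vector_rank(query: str, docs: list[str]) -> list[int]:
    # Vector side: rank by cosine similarity (vectors are already unit length).
    q = embed(query)
    sims = [sum(a * b for a, b in zip(q, embed(d))) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -sims[i])

def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    # Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a document's score.
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])

def retrieve(query: str, docs: list[str],
             rerank: Callable[[str, list[str]], list[float]],
             top_n: int = 3) -> list[str]:
    """docs holds text chunks and image descriptions alike."""
    candidates = rrf([keyword_rank(query, docs), vector_rank(query, docs)])
    pool = candidates[: top_n * 2]               # over-fetch, then let the reranker decide
    scores = rerank(query, [docs[i] for i in pool])
    order = sorted(range(len(pool)), key=lambda j: -scores[j])
    return [docs[pool[j]] for j in order[:top_n]]  # context handed to the chat model
```

Over-fetching before reranking is a common design choice: the cheap hybrid search casts a wide net, and the more expensive reranker only scores a short candidate list.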

How to configure the knowledge base for image processing

The image-processing feature is enabled as part of the knowledge base creation workflow in BlueXP workload factory. To enable the feature, you configure a supported multimodal language model for chat during the knowledge base creation process, and you can optionally configure file filters to include specific image types. Supported image formats are .jpg, .jpeg, .png, .gif, and .webp. Supported multimodal models include Anthropic Claude 3, Claude 3.5, and Amazon Nova. The maximum supported image size is 3.75MB, with a resolution of up to 8000 x 8000 pixels per image. In addition to any images that are present in the source shares, the knowledge base automatically processes images that are embedded in .pdf, .doc, and .docx files.
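If you want to check ahead of time which files in a share fall within these limits, a small pre-flight script can help. The following Python sketch encodes the formats and limits stated above; it is not a BlueXP tool. Treating “3.75MB” as 3.75 MiB is an assumption, and the function names are hypothetical. The script takes pixel dimensions as inputs, because reading them from the files would require an image library.

```python
from pathlib import PurePath

# Hypothetical pre-flight check based on the limits stated in this post.
# Interpreting "3.75MB" as 3.75 MiB is an assumption; confirm in the product docs.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx"}  # embedded images processed automatically
MAX_IMAGE_BYTES = int(3.75 * 1024 * 1024)
MAX_DIMENSION = 8000

def classify(filename: str) -> str:
    """Say how the knowledge base would treat a file, based on its extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in DOCUMENT_EXTENSIONS:
        return "document-with-embedded-images"
    return "other"

def image_problems(filename: str, size_bytes: int, width: int, height: int) -> list[str]:
    """Return the reasons a standalone image falls outside the stated limits (empty if none)."""
    problems = []
    if classify(filename) != "image":
        problems.append(f"unsupported image format: {PurePath(filename).suffix or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_IMAGE_BYTES}")
    if width > MAX_DIMENSION or height > MAX_DIMENSION:
        problems.append(f"resolution too high: {width}x{height} > {MAX_DIMENSION}x{MAX_DIMENSION}")
    return problems
```

For example, `image_problems("chart.png", 1_000_000, 1920, 1080)` returns an empty list, while a 9000-pixel-wide .tiff scan would be flagged on format and resolution.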

Get started today to get deeper insights from your data

With NetApp BlueXP workload factory, you can now give your GenAI applications deeper insights into your enterprise data by incorporating visual data for more comprehensive query responses. You can gain insights from visual information like graphs and charts, and you can enhance your knowledge workers’ productivity with richer, more informative answers.

To start bringing insights from your enterprise datasets into your GenAI applications, sign up for BlueXP workload factory and review the BlueXP workload factory documentation.

Puneet Dhawan

Puneet is a Senior Director of Product Management at NetApp, where he leads product management for the FSx for NetApp ONTAP service offering with AWS, with a specific focus on AI and generative AI solutions. Before joining NetApp, Puneet held multiple product leadership roles at Amazon Web Services (AWS) and Dell Technologies in areas such as hybrid cloud infrastructure, cloud storage, scale-out and distributed systems, high-performance computing, and enterprise solutions. In those roles, he led product vision and strategy, roadmap planning and execution, partnerships, and go-to-market strategy.
