
Explainable AI: What is it? How does it work? And what role does data play?


Mike McNamara

As enterprises expand their artificial intelligence (AI) efforts, they must address important and often difficult questions. Is AI being used responsibly? Can the results that AI produces be explained? Because data underpins all AI processes, this series of blog posts looks at important AI questions from the standpoint of data, data management, and data governance. This second post focuses on explainable AI. The final post in the series will examine federated learning.

The first post in this series discussed four principles of responsible and ethical AI: fairness, privacy, security, and interpretability (aka explainability). AI models are now embedded in all aspects of our lives, affecting important decisions from who gets hired to who gets approved for a loan. Explainable artificial intelligence (XAI) has become crucial for understanding how an AI model reaches decisions and for identifying sources of error.

This post examines why explainable AI is important, the associated challenges, and the crucial role that data plays.

Explainable AI made simple

First, it’s important to understand what XAI is and why it’s needed. AI algorithms often operate as “black boxes” that take input and provide output with no way to understand their inner workings. The goal of XAI is to make the rationale behind the output of an algorithm understandable by humans.

For example, many AI algorithms use deep learning, in which algorithms learn to identify patterns based on mountains of training data. Deep learning is a neural network approach that mimics the way our own brains are wired. Just as with human thought processes, it can be difficult or impossible to determine how a deep learning algorithm arrived at a prediction or decision.

Decisions about hiring and financial services use cases such as credit scoring and loan approvals are important and worth explaining. However, no one is likely to be physically harmed (at least not right away) if one of those algorithms makes a bad recommendation. But there are many examples where the consequences are much more dire.

Deep learning algorithms are increasingly important in healthcare use cases such as cancer screening, where it’s important for doctors to understand the basis for an algorithm’s diagnosis. A false negative could mean that a patient doesn’t receive life-saving treatment. A false positive, on the other hand, might result in a patient receiving expensive and invasive treatment when it’s not necessary. A level of explainability is essential for radiologists and oncologists seeking to take full advantage of the growing benefits of AI.


Explainable AI principles

To expand on the idea of what constitutes XAI, the National Institute of Standards and Technology (NIST), part of the U.S. Department of Commerce, defines four principles of explainable artificial intelligence:

  • Explanation. An AI system should supply “evidence, support, or reasoning for each output.”
  • Meaningful. An AI system should provide explanations that its users can understand.
  • Explanation accuracy. An explanation should accurately reflect the process the AI system used to arrive at the output.
  • Knowledge limits. An AI system should operate only under the conditions it was designed for and not provide output when it lacks sufficient confidence in the result.

Examples of XAI principles

Here are examples of how these principles apply.

Explanation
NIST defines five types of explanation:

  • Inform the subject of an algorithm. An obvious example would be an explanation of why a loan was or wasn’t approved.
  • Build societal trust in an AI system. Rather than explain particular outputs, some types of explanations justify the model and the approach used in order to increase trust. This might include explaining the purpose of the algorithm, how it was created, what data was used and where it came from, and what its strengths and limitations are.
  • Satisfy compliance or regulatory requirements. As AI algorithms become increasingly important in regulated industries, they need to be able to demonstrate adherence to regulations. For example, AI algorithms for self-driving cars should explain how they comply with applicable traffic regulations.
  • Assist with further system development. During AI development, technical staff need to understand where and why a system generates the wrong output in order to improve the system.
  • Benefit the algorithm’s owner. Enterprises are deploying AI across all industries, expecting to gain significant benefit. For example, a streaming service benefits from explainable recommendations that keep users subscribing to the service.

Meaningful
The principle of meaningfulness is satisfied when a user understands the explanation provided. For a given AI algorithm, there may be different types of users who require explanations. In the self-driving car example, an explanation that satisfies the driver of the car, like “the AI categorized the plastic bag in the road as a rock, and therefore took action to avoid hitting it” would not satisfy the needs of an AI developer attempting to correct the problem. The developer needs to understand why the plastic bag was misclassified.

Explanation accuracy
Explanation accuracy is separate from output accuracy. An AI algorithm needs to accurately explain how it reached its output. If a loan approval algorithm explains a decision based on an applicant’s income and debt when the decision was actually based on the applicant’s zip code, the explanation is not accurate.

Knowledge limits
An AI system can reach its knowledge limits in two ways. The input could be outside the expertise of the system. NIST uses the example of a system built to classify bird species. If you give it a picture of an apple, the system should explain that the input is not a bird. Alternatively, if you give the system a blurry picture, it should report that it cannot identify the bird in the image, or that its identification has very low confidence.
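
As a simple illustration of the knowledge-limits principle, the following minimal sketch (in Python) reports a label only when a classifier’s confidence clears a threshold and otherwise declines to answer. It assumes a scikit-learn-style classifier that exposes predict_proba; the threshold value and function name are illustrative placeholders rather than part of NIST’s guidance.

import numpy as np

CONFIDENCE_THRESHOLD = 0.80  # assumed minimum confidence; tune for the application

def classify_with_knowledge_limit(model, features, class_names):
    """Return a label only when the classifier is confident enough; otherwise decline."""
    probabilities = model.predict_proba(features.reshape(1, -1))[0]
    best = int(np.argmax(probabilities))
    confidence = float(probabilities[best])

    if confidence < CONFIDENCE_THRESHOLD:
        # Stay within the system's knowledge limits instead of guessing.
        return None, f"No answer: confidence {confidence:.2f} is below the threshold."
    return class_names[best], f"Predicted '{class_names[best]}' with confidence {confidence:.2f}."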

How does explainable AI work?

These principles help define the output expected from XAI, but they don’t offer any guidance on how to reach that output. It can be useful to subdivide XAI into three categories:

  • Explainable data. What data went into training a model? Why was that data chosen? How was fairness assessed? Was any effort made to remove bias?
  • Explainable predictions. What features of a model were activated or used to reach a particular output? (A brief sketch of one way to probe this follows the list.)
  • Explainable algorithms. What are the individual layers that make up the model, and how do they lead to the output or prediction?
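
To make explainable predictions a little more concrete, here is a minimal sketch of permutation importance, one common way to ask which input features actually drive a model’s output: shuffle one feature at a time and measure how much accuracy drops. The model, validation arrays, and feature names are assumed placeholders, and the technique shown is a generic illustration rather than a feature of any particular product.

import numpy as np

def permutation_importance(model, X_valid, y_valid, feature_names, random_state=0):
    """Rank features by how much shuffling each one hurts accuracy (NumPy arrays assumed)."""
    rng = np.random.default_rng(random_state)
    baseline = (model.predict(X_valid) == y_valid).mean()
    importances = {}
    for column, name in enumerate(feature_names):
        X_shuffled = X_valid.copy()
        rng.shuffle(X_shuffled[:, column])  # break the link between this feature and the output
        shuffled_accuracy = (model.predict(X_shuffled) == y_valid).mean()
        importances[name] = baseline - shuffled_accuracy  # a large drop flags an influential feature
    return dict(sorted(importances.items(), key=lambda item: item[1], reverse=True))

A feature whose shuffling barely changes accuracy had little influence on the predictions; a large drop points to a feature the model relies on heavily.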

For neural networks in particular, explainable data is the only category that is straightforward to achieve—at least in principle. Much ongoing research is focused on how to achieve explainable predictions and algorithms. There are two current approaches to explainability:

  • Proxy modeling. A different type of model, such as a decision tree, is used to approximate the actual model. Because it’s an approximation, it can differ from the true model’s results. (A brief sketch of this approach follows the list.)
  • Design for interpretability. Models are designed to be easy to explain. This approach runs the risk of reducing the predictive power or overall accuracy of a model.
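
As an illustration of proxy modeling, the sketch below trains a shallow decision tree to imitate a black-box classifier and then extracts the tree’s rules. It assumes a scikit-learn environment and an already trained black-box model with a predict method; the function and variable names are hypothetical.

from sklearn.tree import DecisionTreeClassifier, export_text

def build_surrogate(black_box_model, X_train, feature_names, max_depth=3):
    """Approximate a black-box classifier with a small, readable decision tree."""
    # Fit the surrogate on the black box's predictions, not on the original labels.
    black_box_predictions = black_box_model.predict(X_train)
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    surrogate.fit(X_train, black_box_predictions)

    # Fidelity: how often the surrogate agrees with the black box on this data.
    fidelity = surrogate.score(X_train, black_box_predictions)
    rules = export_text(surrogate, feature_names=list(feature_names))
    return surrogate, fidelity, rules

Because the surrogate only approximates the black box, its fidelity (how often it agrees with the original model) should be reported alongside its rules.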

Explainable models are sometimes referred to as “white box” models. As noted in a recent blog, “with explainable white box AI, users can understand the rationale behind its decisions, making it increasingly popular in business settings. These models are not as technically impressive as black box algorithms.” Explainable techniques include decision trees, Bayesian networks, sparse linear models, and others.

Researchers are also looking for ways to make black box models more explainable, for instance by incorporating knowledge graphs and other graph-related techniques.

Data and explainable AI

Explainable data is the most attainable category of XAI. However, given the mountains of data that may be used to train an AI algorithm, “attainable” is not as easy as it sounds. The GPT-3 natural language model is an extreme example. Although the model is capable of mimicking human language, it also internalized a lot of toxic content from the internet during training.

As Google notes, an “AI system is best understood by the underlying training data and training process, as well as the resulting AI model.” This understanding requires the ability to map a trained AI model to the exact dataset that was used to train it, with the ability to examine that data closely, even if it’s been years since a version of a model was trained.
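
One lightweight way to approach that kind of traceability is to fingerprint the exact training data and record the fingerprint with each model version. The sketch below is a generic illustration only; the directory layout, metadata filename, and hashing scheme are assumptions, not a reference to any specific tool.

import hashlib
import json
import time
from pathlib import Path

def fingerprint_dataset(data_dir):
    """Hash every file in the training dataset into a single, reproducible digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(data_dir)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def record_training_run(model_version, data_dir, metadata_file="training_runs.json"):
    """Append a model-version-to-dataset mapping so lineage can be audited later."""
    record = {
        "model_version": model_version,
        "dataset_sha256": fingerprint_dataset(data_dir),
        "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    metadata_path = Path(metadata_file)
    runs = json.loads(metadata_path.read_text()) if metadata_path.exists() else []
    runs.append(record)
    metadata_path.write_text(json.dumps(runs, indent=2))
    return record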


One of the easiest ways to enhance the explainability of a model is to pay close attention to the data used to train it. During the design phase, teams have to determine where the data to train an algorithm will come from, whether or not that data—assuming that it exists—was obtained legally and ethically, whether the data contains bias, and what can be done to mitigate that bias. This is a big job that shouldn’t be underestimated; 67% of companies draw from more than 20 data sources for their AI.

It’s also important to carefully exclude data that is irrelevant or should be irrelevant to the outcome. Earlier, I mentioned the possibility that a loan approval algorithm could base decisions in large part on an applicant’s zip code. The best way to ensure that an algorithm’s output isn’t based on a factor that should be irrelevant—like a zip code that often serves as a proxy for race—is not to include that data in the training set or the input data.
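
In practice, that exclusion can be as simple as dropping the column during data preparation. The sketch below assumes a hypothetical loan-application CSV with a zip_code column; the file and column names are illustrative only.

import pandas as pd

EXCLUDED_PROXY_FEATURES = ["zip_code"]  # hypothetical column that can act as a proxy for race

def load_training_frame(csv_path="loan_applications.csv"):
    """Load applicant data and drop features that should not influence the model."""
    frame = pd.read_csv(csv_path)
    return frame.drop(columns=EXCLUDED_PROXY_FEATURES, errors="ignore")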

NetApp, explainable AI, and your organization

Because explainable data is essential to XAI, your organization needs to cultivate best practices for data management and data governance. These best practices include complete traceability for the datasets used to train each version of each AI model you operate.

At NetApp, we specialize in helping companies get more from their data. We help you manage data everywhere—on premises and in the cloud. It’s how we make data accessible, protected, and cost optimized.

NetApp® AI experts can work with you to build a data fabric—a unified data management environment spanning edge devices, data centers, and public clouds—so your AI data can be efficiently ingested, collected, stored, and protected.

NetApp AI solutions give you the tools you need to expand your AI efforts.

  • ONTAP® AI accelerates all facets of AI training and inference.
  • NVIDIA DGX Foundry with NetApp offers world-class AI development without the struggle of building it yourself.
  • NetApp AI Control Plane pairs MLOps and NetApp technology to simplify data management and facilitate experimentation.
  • NetApp Data Ops Toolkit makes it easier to manage the large volumes of data necessary for AI.
  • NetApp Cloud Data Sense helps you discover, map, and classify data. Analyze a wide and growing range of data sources—structured or unstructured, in the cloud or on premises.

Adopting the NetApp AI Control Plane and Data Ops Toolkit can enable your team to manage data efficiently and securely, while ensuring the traceability and reproducibility that are an essential foundation of explainable data.

To find out how NetApp can help you deliver the data management and data governance that are crucial to explainable AI, visit netapp.com/artificial-intelligence/.

Mike McNamara

Mike McNamara is a senior product and solution marketing leader at NetApp with over 25 years of data management and cloud storage marketing experience. Before joining NetApp over ten years ago, Mike worked at Adaptec, Dell EMC, and HPE. Mike was a key team leader driving the launch of a first-party cloud storage offering and the industry’s first cloud-connected AI/ML solution (NetApp), unified scale-out and hybrid cloud storage system and software (NetApp), iSCSI and SAS storage system and software (Adaptec), and Fibre Channel storage system (EMC CLARiiON).

In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent event speaker. Mike also published a book through FriesenPress titled "Scale-Out Storage - The Next Frontier in Enterprise Data Management" and was listed as a top 50 B2B product marketer to watch by Kapost.

