How do these transparent tools decode AI mysteries, revealing neural network decisions?

Enhancing transparency with deep neural networks and sparse oblique trees

Deep neural networks (DNNs) have become indispensable tools in artificial intelligence, revolutionising areas like computer vision, speech recognition, and autonomous driving. These networks boast remarkable accuracy in tasks such as image classification, yet their decision-making process remains mysterious. This lack of transparency raises concerns, particularly in critical domains like healthcare and finance.

The rise of DNNs has prompted discussions about their interpretability and vulnerability to adversarial attacks. Unlike traditional machine learning models like linear models, which offer insights into their decision-making processes, DNNs function as opaque black boxes, complicating efforts to understand their internal mechanisms. This opacity presents significant challenges, particularly concerning the potential consequences of incorrect predictions in sensitive contexts.

A new study addresses these challenges by proposing a novel approach that employs sparse oblique decision trees to understand how DNNs arrive at decisions. These trees are applied to the inner layers of DNNs, mimicking the model’s classification function. This approach enables us to explore how the internally learned features of deep nets relate to the output classes. Additionally, analysing the tree allows us to adjust DNN outputs by inhibiting certain sets of neurons. The study aims to enhance the understanding and regulation of DNNs, promoting increased transparency and accountability in artificial intelligence systems.

The research aims at global interpretation of DNNs: explanations that hold for any input instance, unlike local methods that explain only one specific input at a time. Unlike other global explanation techniques, this approach mimics only the network’s classifier component, using a highly accurate and interpretable model known as a sparse oblique tree. Each node of such a tree splits on a hyperplane with only a few non-zero weights, whereas a traditional axis-aligned tree tests a single feature at each node. This yields trees that are both accurate and compact, which makes them easier to understand. The sparsity at each node also helps, because it shows which few features matter for each decision.
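The difference between the two kinds of split can be sketched in a few lines of Python. This is only an illustration: the function names, the toy feature vector, the weights and the convention that a non-negative score sends the input to the right child are all assumptions, not details taken from the study.

```python
# Illustrative sketch only: every name and number below is hypothetical.

def oblique_split(z, weights, bias):
    """Sparse oblique test: go right if w.z + b >= 0, else left.

    `weights` maps feature index -> weight and stores only the
    non-zero entries, which is what makes the split sparse.
    """
    score = bias + sum(w * z[i] for i, w in weights.items())
    return "right" if score >= 0 else "left"


def axis_aligned_split(z, feature_index, threshold):
    """Traditional test on one feature: go right if z[i] >= threshold."""
    return "right" if z[feature_index] >= threshold else "left"


z = [0.0, 2.0, 0.5, 0.0, 1.0, 3.0]   # toy feature vector with 6 entries

# The oblique node combines 2 of the 6 features: 1.5*z[1] - 1.0*z[5] + 0.5
sparse_weights = {1: 1.5, 5: -1.0}
oblique_result = oblique_split(z, sparse_weights, bias=0.5)

# The axis-aligned node looks at feature 1 alone.
axis_result = axis_aligned_split(z, feature_index=1, threshold=1.0)
```

Reading off the keys of `sparse_weights` tells you which features the node uses, which is the property the article relies on later when it links tree nodes to neurons.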


Figure 1. Mimicking part of a neural net with a decision tree. The figure shows the neural net equation y = f(x) = g(F(x)), viewed as comprising a feature extraction component z = F(x) and a classifier component y = g(z). The “neural net feature” vector z represents the outputs (“neuron activations”) of F, and can be interpreted as features extracted by the neural net from the original features x (pixel values). A sparse oblique tree is employed to mimic the classifier component y = g(z) by training the tree using the neural net features z as input and the corresponding ground-truth labels as output.
Credit. Papers – DMKD (Figure 2) and arXiv (Figure 2)

The overall approach can be broken down as follows. Consider a trained DNN classifier “y = f(x)”, where “x” represents the input and “y” denotes the output. This classifier can be decomposed into two parts, “f(x) = g(F(x))”, where “F” denotes the feature-extraction component (“z = F(x)”) and “g” represents the classifier component (“y = g(z)”).
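As a rough sketch of this decomposition, the toy network below is split into a feature extractor F and a classifier g. The layer sizes, the weights and the choice of ReLU and softmax are assumptions made for illustration; the real networks in the study are far larger.

```python
import math

# Toy two-part network; all weights are made up for illustration.

def F(x):
    """Feature extractor: one linear layer followed by ReLU."""
    W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def g(z):
    """Classifier: one linear layer followed by softmax over 2 classes."""
    V = [[1.0, 0.0, -1.0], [-1.0, 0.5, 1.0]]
    logits = [sum(v * zi for v, zi in zip(row, z)) for row in V]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def f(x):
    """Full network: f(x) = g(F(x))."""
    return g(F(x))


x = [2.0, 1.0]   # toy input
z = F(x)         # neural net features (neuron activations)
y = f(x)         # class probabilities
```

The tree in the study is trained on pairs of z and labels, so it only ever needs the output of F, not the raw input x.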

To begin, a recently developed algorithm called Tree Alternating Optimization (TAO) is used to train a sparse oblique tree “T” that mimics the classifier component “g”, as illustrated in Figure 1. The sparsity of “T” is tuned to maximise validation accuracy while keeping the tree as simple as possible. Subsequently, the tree’s structure is analysed to reveal meaningful patterns regarding the behaviour of the deep net.

Inspecting the sparse oblique trees

Figure 2. Tree mimicking the VGG16 network trained on 16 classes.
Credit. Papers – DMKD (Figure 6) and arXiv (Figure 6)

In the example provided, the figure shows a sparse oblique tree trained to interpret the widely used image classifier DNN, VGG16, here trained on 16 classes. The sparsity is configured such that each class corresponds to only one leaf in the tree. Upon examination, the tree reveals a distinct hierarchy of classes, predominantly associated with the background or context surrounding the main object in the image.

For instance, the leftmost subtree includes man-made objects commonly found on roads, such as “warplane,” “airliner,” “school bus,” “fire engine,” and “sports car.” Conversely, the rightmost subtree features man-made objects typically found on the sea, like “container ship” and “speedboat,” alongside natural elements such as “killer whale,” “bald eagle,” and “coral reef,” commonly associated with sea environments.

However, “goldfish” appears in a separate subtree away from the sea background subtree. Upon examining the training data, it becomes evident that goldfish are primarily housed in fishbowls rather than inhabiting the sea. Additionally, a subtree positioned in the middle contains animals typically found in terrestrial natural environments, such as “tiger cat,” “white wolf,” “goose,” “Siberian husky,” and “lion.”

This example demonstrates how, in some cases, the deep net categorises objects based on their background or other confounding variables. It also highlights a potential vulnerability of the network, as it may misclassify an object based on an unusual background, such as a “bald eagle” standing on a road.

Manipulating DNN features to alter the output

Next, the study explores whether the network can be influenced by controlling specific neuron activations. This involves referring back to the trained tree, which mimics the DNN classifier. Each node of the tree represents a hyperplane, where each weight corresponds to the position of a neuron in the final layer of the feature extraction component (F). Because sparsity is enforced during training, only a few weights in each node are non-zero.

To identify neurons specific to a particular class, one needs to trace the path from the root of the tree to the leaf linked with that class. Along this route, all non-zero weights indicate the location of neurons associated with the class. The concept is that if an input traverses the tree with zero values everywhere except at the non-zero weight positions of a class, it will inevitably reach the corresponding class leaf. Since the tree accurately mimics the DNN’s classifier, the same principle should apply to the DNN as well.
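The path-tracing step can be sketched as follows. The nested-dictionary tree format, the helper names and the toy tree are hypothetical, and the sketch assumes each class has exactly one leaf, as in the VGG16 example above.

```python
# Hypothetical tree format: an internal node has "weights" (index -> non-zero
# weight), "left" and "right"; a leaf has only "label".

def path_indices(node, target):
    """Collect neuron indices with non-zero weight on the root-to-leaf path.

    Returns a set of indices, or None if `target` is not under `node`.
    """
    if "label" in node:                       # leaf
        return set() if node["label"] == target else None
    for child in (node["left"], node["right"]):
        found = path_indices(child, target)
        if found is not None:
            return found | set(node["weights"])
    return None


def class_mask(tree, target, n_features):
    """Binary mask: 1 for neurons used on the path to `target`, else 0."""
    indices = path_indices(tree, target)
    if indices is None:
        raise ValueError(f"no leaf labelled {target!r}")
    return [1 if i in indices else 0 for i in range(n_features)]


toy_tree = {
    "weights": {0: 0.8, 3: -1.2},
    "left": {"label": "goldfish"},
    "right": {
        "weights": {2: 1.0, 5: 0.7},
        "left": {"label": "bald eagle"},
        "right": {"label": "Siberian husky"},
    },
}

husky_mask = class_mask(toy_tree, "Siberian husky", n_features=6)
goldfish_mask = class_mask(toy_tree, "goldfish", n_features=6)
```

In this toy tree the husky leaf sits two levels down, so its mask picks up the non-zero weights of both nodes on the path, while the goldfish mask uses only the root.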

In the DNN, this is achieved by allowing only class-specific neuron activations to pass through the classifier while inhibiting the rest, as illustrated in Figure 3. Likewise, to prevent the network from predicting a specific class, the activation of neurons belonging to that class is blocked.

Figure 3. Top: original network. Bottom: masking operation in the network. The symbols’ meanings are as follows: input x, feature extraction part of the network F, original features z, binary mask μ created using the tree, modified features z̄, classifier part of the network g, original output y, and modified output ȳ.
Credit. Papers – DMKD (Figure 3) and arXiv (Figure 3)
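The masking operation in Figure 3 amounts to an element-wise product between the features and a binary mask before the classifier is applied. The sketch below uses a toy linear classifier with made-up weights; the function names and the `keep` flag are assumptions introduced for illustration, and the modified features are written `z_bar` in the code.

```python
# Toy masking example; weights, masks and names are hypothetical.

def apply_mask(z, mu, keep=True):
    """keep=True lets only the masked neurons through;
    keep=False blocks the masked neurons and passes the rest."""
    if keep:
        return [zi * mi for zi, mi in zip(z, mu)]
    return [zi * (1 - mi) for zi, mi in zip(z, mu)]


def g(z):
    """Toy linear classifier: returns the index of the highest score."""
    V = [[1.0, 1.0, 0.0, 0.0],    # class 0 relies on neurons 0 and 1
         [0.0, 0.0, 1.0, 1.0]]    # class 1 relies on neurons 2 and 3
    scores = [sum(v * zi for v, zi in zip(row, z)) for row in V]
    return scores.index(max(scores))


z = [2.0, 1.0, 0.5, 1.5]          # original features
mu_class0 = [1, 1, 0, 0]          # mask for neurons tied to class 0
mu_class1 = [0, 0, 1, 1]          # mask for neurons tied to class 1

y = g(z)                                          # original prediction

z_bar = apply_mask(z, mu_class1, keep=True)       # pass only class-1 neurons
y_keep = g(z_bar)                                 # steer towards class 1

y_block = g(apply_mask(z, mu_class0, keep=False)) # block class-0 neurons
```

With these toy numbers the unmasked features favour class 0, while either keeping only the class-1 neurons or blocking the class-0 neurons switches the prediction to class 1.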

Consider the example in Figure 4, which shows how the same image can be classified differently when only the neurons corresponding to a specific class are allowed through.

Figure 4. Illustration of masks for a particular image in VGG16. Column 1 shows the image masks (when available). Column 2 shows the histogram of corresponding softmax values. Row 1 shows the original image. Row 2 shows a mask in feature space intended to classify it as “Siberian husky”. Row 3 shows a mask manually cropped in the image, whose features resemble those of row 2. Row 4 shows a mask in feature space obtained by finding the top 3 superpixels whose features most closely resemble those of the masked features in row 2. Row 5 shows a mask in feature space intended to classify the image as “bald eagle”.
Credit. Papers – DMKD (Figure 9) and arXiv (Figure 9)


The study presents sparse oblique decision trees to understand and influence deep neural networks (DNNs). Serving as a potent “microscope,” these trees enable us to examine how DNNs arrive at decisions. By replicating the decision-making process of DNNs, they reveal insights into which groups of neurons influence specific outcomes, thereby enriching our understanding of neural network functions.

Furthermore, these trees can manipulate DNN behaviour by regulating neuron activations. This allows us to alter predicted outcomes for given inputs, offering insights into the network’s reactions under varying conditions. For instance, we can simulate adversarial attacks by adjusting network features to provoke incorrect predictions.

This approach is not confined to image data or specific types of DNNs; it extends to various data formats, such as audio or language. Additionally, its implications go beyond artificial intelligence, potentially benefiting fields such as biology. By examining how neurons relate to genes or diseases, we can utilise this technique to explore the effects of genetic mutations or treatments, thereby facilitating new discoveries and advancements in diverse research areas.


Journal reference

Hada, S. S., Carreira-Perpiñán, M. Á., & Zharmagambetov, A. (2023). Sparse oblique decision trees: A tool to understand and manipulate neural net features. Data Mining and Knowledge Discovery, 1-40.

Dr. Pranab K. Mohapatra, currently a Research Associate in the Department of Physics of Complex Systems at the Weizmann Institute of Science, holds a PhD from the Indian Institute of Technology Bombay. He has completed postdoctoral positions at Tel Aviv University, Israel, and Northwestern University, USA, specializing in optical physics, materials science, and engineering. With over a decade of experience, his expertise extends across diverse domains, particularly quantum materials for next-generation electronics/optoelectronics device applications. His present research concentrates on the epitaxial growth of birefringent dielectric biogenic crystals for diverse optical applications.

Dr. Pranab is a reporter at The Academic.