Explain Models’ Functioning and Decisions
The Saimple static analyzer helps you demonstrate your model’s trustworthiness through formal proof, ensuring transparency and accountability. By explaining how your models function and make decisions, you can build trust, detect biases, and comply with regulations, improving both user confidence and system reliability.
What is AI Explainability?
As artificial intelligence gains popularity in many different areas, concerns are being raised about the opaque nature of machine learning models. They are often described as black boxes, not because their inner workings cannot be inspected, but because the number of variables involved is so high that they cannot be fully understood by humans.

Figure 1: Efforts are being made to make machine learning models more understandable.
Efforts are currently being made to make machine learning models more understandable. These efforts already make it possible to highlight the important variables used by a model, and thus to verify that it is right for the right reasons.
This verification is crucial, as a model with good accuracy might harbor hidden biases. A classic example of such a bias was demonstrated in the article “Why Should I Trust You?”: Explaining the Predictions of Any Classifier [1], in which a team of researchers from the University of Washington, Seattle, designed a neural network to distinguish pictures of wolves from pictures of husky dogs. The network was trained on a data set in which every wolf picture had a snowy background, while the husky pictures did not. As a result, the network was quite accurate, but it learned to distinguish wolves from huskies by using the presence of snow, so the decision was not based on the right criterion, as demonstrated in Figure 2.

Figure 2: The neural network distinguishes pictures of wolves from pictures of huskies by focusing on the presence of snow in the background.
The notion of explainability, defined in the ISO/IEC 22989 and ISO/IEC TS 6254 standards, refers to the capacity of an artificial intelligence model to express the reasons behind its behavior in a way that is understandable to a human user. For instance, the above model was made explainable using a simple visualization. The notion of explainability is closely related to that of interpretability, defined in the ISO/IEC TS 6254 standard. A model is interpretable if its decision process can be understood by experts through a technical analysis of its inner mechanics. Note that an explainable model is interpretable, but the converse does not necessarily hold.
Why is AI Explainability Important?
As mentioned above, even seemingly accurate models may harbor hidden biases that make them untrustworthy. In safety-critical applications, this could have dramatic consequences. For instance, if a self-driving car’s detection of stop signs relies on the presence or absence of pedestrians (which can happen if, for example, the proportion of stop sign pictures containing pedestrians is too high in the training data set), exposing and correcting this bias is paramount to ensure the safety of citizens. For these reasons, legislative measures such as the AI Act have officially recognized the necessity of explainable AI (XAI), and standards like the ISO/IEC 24029 series, ISO/IEC 42001 and ISO/IEC 22989 explicitly define the key notions behind explainability. Validation through explainability analysis is thus essential not only for better understanding and enhancement of machine learning models, but also for their safe deployment.
How to Verify and Explain AI?
Different approaches can be used to perform an explainability analysis of a machine learning model. First, an explanation can be either global or local. A global explanation highlights the important features in general, i.e. for any input, while a local explanation focuses on the influential variables for a specific input.
Most explanation methods provide local explanations. This is the case for the well-known LIME (Local Interpretable Model-agnostic Explanations) method. In LIME, data points are generated around the input through small perturbations. Then, a linear model is constructed by minimizing a loss function that is the sum of two terms. The first term measures how well the linear model reproduces the predictions of the original model on the perturbed data points, each point being weighted by its proximity to the original input (the further a data point is, the less influence it has). The second term penalizes the complexity of the linear model, reflecting the fact that we want as many zero coefficients as possible. Once the linear model is constructed, an interpretation of its coefficients provides a local explanation.
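The sketch below illustrates this idea on tabular data, using a weighted Lasso regression as the sparse local surrogate. It is a simplified illustration of the principle rather than the official LIME implementation, and black_box_predict is a hypothetical stand-in for the model being explained.

```python
# Minimal sketch of the LIME idea for tabular data (not the official `lime` package).
# `black_box_predict` is a hypothetical stand-in for the model being explained.
import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(black_box_predict, x, n_samples=1000, sigma=0.5, kernel_width=1.0, alpha=0.01):
    """Fit a sparse local linear surrogate around input x and return its coefficients."""
    rng = np.random.default_rng(0)
    # 1. Generate perturbed data points around x.
    Z = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    # 2. Query the black-box model on the perturbed points.
    y = black_box_predict(Z)                      # shape (n_samples,), e.g. a class score
    # 3. Weight each point by its proximity to x (closer points count more).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 4. Fit a weighted, L1-regularized linear model (few non-zero coefficients).
    surrogate = Lasso(alpha=alpha)
    surrogate.fit(Z, y, sample_weight=weights)
    # The coefficients give the local influence of each input variable.
    return surrogate.coef_
```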
Another widely used method is the Shapley additive explanations (SHAP) method, which uses game-theoretic principles to calculate the impact of each input variable on the output. It relies on the observation that the variables can be seen as players in a coalition game, whose individual contributions we want to determine. This method can provide both global and local explanations.
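To make the underlying Shapley formula concrete, the sketch below computes exact Shapley values by enumerating all coalitions and replacing absent features with a baseline value. This brute-force approach only scales to a handful of features; the SHAP library relies on far more efficient approximations. The names model_predict and baseline are hypothetical placeholders.

```python
# Brute-force Shapley values for a handful of features (illustrative only).
# `model_predict` and `baseline` are hypothetical placeholders.
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(model_predict, x, baseline):
    """Exact Shapley value of each feature; absent features are set to the baseline."""
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        z[list(coalition)] = x[list(coalition)]   # features in the coalition keep their value
        return model_predict(z.reshape(1, -1))[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))  # marginal contribution of feature i
    return phi
```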
Many other methods aim to enhance the explainability of ML models, including gradient-based techniques. These methods use the partial derivatives of the output with respect to each input variable to quantify its influence, thus providing local explanations.
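A minimal sketch of such a gradient-based (saliency) explanation, assuming a differentiable PyTorch classifier named model (a hypothetical placeholder), could look as follows.

```python
# Minimal vanilla-gradient saliency sketch, assuming a differentiable PyTorch
# classifier `model` (hypothetical) that maps an image tensor to class scores.
import torch

def gradient_saliency(model, image, target_class):
    """Return |d score[target_class] / d input| for each input component."""
    image = image.clone().detach().requires_grad_(True)
    scores = model(image.unsqueeze(0))    # shape (1, n_classes)
    scores[0, target_class].backward()    # back-propagate the chosen class score
    # The magnitude of the gradient indicates how sensitive the score is
    # to a small variation of each pixel.
    return image.grad.abs()
```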
All the local explanation methods above only use the values of the input variables (or, in the case of LIME, a small number of perturbations around the input). In Saimple, local explanations are provided by taking into account the whole set of chosen perturbations around the input. Using abstract interpretation, we can assess the impact of the variation of each variable on the output over this entire set at once.
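To give an intuition of how abstract interpretation reasons about a whole perturbation set at once, the toy sketch below propagates interval bounds through a single affine layer followed by a ReLU. It is a deliberately simplified illustration and does not reflect Saimple’s actual abstract domain.

```python
# Toy interval propagation through one affine + ReLU layer: the output bounds hold
# for every input in the perturbation set at once (simplified illustration only).
import numpy as np

def interval_affine(lower, upper, W, b):
    """Bounds of W @ x + b when each x[i] ranges over [lower[i], upper[i]]."""
    center, radius = (lower + upper) / 2, (upper - lower) / 2
    c = W @ center + b
    r = np.abs(W) @ radius          # worst-case deviation, component-wise
    return c - r, c + r

def interval_relu(lower, upper):
    return np.maximum(lower, 0), np.maximum(upper, 0)

# Every input variable may vary by +/- 0.1 around the nominal input x0.
x0 = np.array([0.2, 0.8, 0.5])
lo, hi = x0 - 0.1, x0 + 0.1
W, b = np.array([[1.0, -2.0, 0.5], [0.3, 0.7, -1.0]]), np.array([0.1, -0.2])
lo, hi = interval_relu(*interval_affine(lo, hi, W, b))
print(lo, hi)   # guaranteed output bounds over the whole perturbation set
```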
Saimple and the Notion of Relevance
Saimple provides explainability for neural networks and support vector machines. To do so, it uses a measure called relevance, defined in ISO/IEC 24029-2 as a value that expresses the influence of each variable, thereby implying an ordering of their importance. In Saimple, the relevance is computed at the same time as the dominance (see the model robustness validation page). Specifically, since the final abstract shape is expressed as a function of the input variables, the term associated with each input variable can be used to deduce its influence on the output, i.e. its relevance. Note that with this method, the influence of the variables is not evaluated only at the specific data point given as input, but over the whole set of studied perturbations.
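The simplified sketch below conveys the intuition: if the output is kept as an affine function of one noise symbol per input variable, the magnitude of each coefficient can be read as a relevance-like score. It only handles linear layers and illustrates the principle; it is not Saimple’s actual computation.

```python
# Simplified affine-form (zonotope-style) sketch: the output is kept as an affine
# function of one noise symbol per input variable, and the magnitude of each
# coefficient is read as a relevance-like score (illustration only).
import numpy as np

def affine_relevance(W_layers, x0, eps):
    """Propagate x = x0 + eps * noise through a stack of linear layers.

    Returns, for each output neuron, the coefficient attached to each input's
    noise symbol; its absolute value orders the inputs by influence.
    """
    center = x0.copy()
    coeffs = np.diag(np.full(len(x0), eps))   # one noise symbol per input variable
    for W in W_layers:
        center = W @ center
        coeffs = W @ coeffs                   # coefficients stay linear in the noise symbols
    return center, coeffs                     # coeffs[k, i]: influence of input i on output k

center, coeffs = affine_relevance(
    [np.array([[1.0, -2.0, 0.5], [0.3, 0.7, -1.0]])],
    x0=np.array([0.2, 0.8, 0.5]),
    eps=0.1,
)
relevance = np.abs(coeffs)   # higher value = more influential input
```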

Figure 3: A relevance plot obtained with Saimple for a machine learning model that classifies 30 and 50 speed limit signs. The pixels with high relevance (in absolute value) are located on the digit 3, which is what distinguishes the two classes.
In the case of image classifiers, Saimple provides a relevance value for each pixel and each class. In this context, the pixels with high relevance (in absolute value) are those for which a variation of a given amplitude has the most impact on the class score. Saimple offers different visualizations that highlight these important pixels. An example is provided in Figure 3, where an image classifier was trained to distinguish between 30 and 50 speed limit traffic signs.
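As an example of what such a visualization can look like in practice, the snippet below renders a per-pixel relevance map as a diverging heatmap. The relevance_map array here is a random placeholder standing in for values that would, in practice, come from a tool such as Saimple.

```python
# Generic heatmap visualization of a per-pixel relevance map (placeholder data).
import numpy as np
import matplotlib.pyplot as plt

relevance_map = np.random.default_rng(0).normal(size=(32, 32))   # placeholder values
m = np.abs(relevance_map).max()

plt.imshow(relevance_map, cmap="seismic", vmin=-m, vmax=m)  # diverging scale around 0
plt.colorbar(label="relevance")
plt.title("Per-pixel relevance for the predicted class")
plt.show()
```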
The relevance values and visualizations provided by Saimple can therefore make your opaque machine learning model explainable and help you improve it. On the one hand, if your model is not accurate, Saimple can help you understand why. On the other hand, if your model is accurate for the wrong reasons, Saimple can help you detect biases that would otherwise have remained unseen. Either way, it allows you to correct your model more easily, using methods such as data augmentation or adversarial training. Once your model is improved, Saimple helps you validate it and demonstrate its explainability, allowing you to bring it to market in conformity with the applicable standards and regulations.
Bibliography
[1] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144).