So, this session is live and there will be time for your questions at the end, so feel free to use the Q&A section as well. From this moment, I'm handing over to our speaker. See you soon.

Thank you, Andrei. My name is Rui Vieira. I'm a software engineer at Red Hat and I've been working on different kinds of projects, from distributed data processing with streaming data to, more recently, the boundary between business process automation and machine learning, and specifically explainability. I'll just start by sharing my screen and then let's get on with it.

I'd like to talk to you today a little bit about explainable AI and how it can be used in the context of business process modeling and process automation in general. I'll give you a brief introduction to what business process modeling is and what it consists of, what explainable AI is and why it is so important, and also give a few examples of explainable AI algorithms.

First, I'll talk a little bit about business process modeling and process automation. Business process modeling and process automation are powerful techniques to represent, abstract, and automate complex tasks. They are ever present in the modern enterprise, with common applications such as streamlining business operations and automating repetitive tasks. The business automation tools we commonly use are typically built on open standards such as BPMN and DMN, and we express these workflows in a standardized format. That means we can formally verify them, apply best practices in testing, automation, and simulation, and reuse them across different business domains if needed. Business automation processes in this sense build on open standards, and by doing so they increase transparency and accountability, since the business intents are declared using well-documented formats.

Into this mix we're now throwing AI, that is, artificial intelligence and machine learning. AI, as we all know, is prevalent in modern life, and that includes business decisions as well. Machine learning is increasingly used to enrich these business decisions by embedding predictive models in the process automation. And although the benefits of learning from historical data, or of being able to classify or predict scenarios, are without question very useful and a game changer in some situations, many of the machine learning techniques we use might not be subject to the same level of transparency, observability, and interpretability as their process automation counterparts.

So when we talk about AI and machine learning, what are we talking about specifically? In the context of this talk, we'll be talking mostly about black box predictive models. A black box predictive model is, in the simplest definition, any type of model that produces an output for a set of inputs while we remain completely unaware of its internal workings. The model inside the box can vary greatly in complexity, from extremely simple models to hugely complex ones, but from our point of view we don't know what's happening inside. We might have an intuition, but we have no real idea. We can only observe the inputs we provide and the results the model gives us. And on top of this, different kinds of models will have different complexities and, consequently, different levels of interpretability.
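To make the idea concrete, here is a minimal sketch, in Python purely for illustration (this is not the TrustyAI API; the class and names are hypothetical), of what we mean by a black box: from the outside, all we can do is send inputs in and observe the predictions coming out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


class BlackBoxModel:
    """Hypothetical wrapper: exposes only predict(), internals are opaque."""

    def __init__(self, fitted_model):
        self._model = fitted_model  # could be anything: a tree, a neural net, an ensemble...

    def predict(self, inputs):
        # The only thing we observe: inputs go in, predictions come out.
        return self._model.predict(inputs)


# The model inside the box could be simple or very complex; we treat it the same way.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
box = BlackBoxModel(RandomForestClassifier(random_state=0).fit(X, y))
print(box.predict(X[:3]))
```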
We can say, in a pretty uncontroversial way, that a simple decision tree is easier to interpret than a complex neural network. In fact, some models entail such high complexity that even if you have all the training data, the model's algorithm, and the predictions, an expert may still not be able to explain the specific calculations and why the model returns a certain prediction. So even though a complex model can have huge benefits in terms of, say, high accuracy, we lose the ability to explain individual decisions.

Let's take an example, just to illustrate how important it is to understand the model and to be able to explain its outcome. Let's start with a basic model, purely for illustration purposes. It provides a recommendation of whether an applicant should get a loan approved or not, and the input data would be something like the applicant's gender, salary, and credit score. Now, if we look at this model, we can start suspecting that something is wrong, because common sense dictates that the higher the income, the more likely the loan should be approved, and on top of everything, gender shouldn't be a determining factor in whether a loan is approved or denied. But if we don't have access to the internals of the model, or even if we do but the model is so complex that humans cannot understand it, how can we understand and interpret the result? The bottom line is that we are often not able to understand these models in simple terms. And quite often we realize that the less we understand the models, the less we will trust them and their predictions.

Why is this important? Because the ability to assess, understand, debug, and benchmark machine learning models becomes a fundamental issue when using them in processes which could potentially have a direct impact on people's lives. This is an ethical concern: when a company uses predictive models in its business workflows, it wants to ensure that business practices are followed, that everyone is treated fairly, and that it complies with the law. We see that more and more aspects of machine learning are becoming regulated. An example of this is the General Data Protection Regulation, the GDPR, which legally defines something called the right to explanation. That is, individuals have the right to a meaningful, human-understandable explanation for a machine learning decision that has an impact on their lives.

Explainability, or explainable AI, is precisely the set of methodologies and algorithms that allow us to create human-understandable explanations for black box or otherwise opaque, complex models. Explainable AI is a large field and an active area of research. It includes intrinsic methods, where the complexity of a model is restricted in order to produce more interpretable models, and post hoc methods, where we deal with an already trained model of any degree of complexity. This talk will focus on post hoc explanations. Additionally, post hoc methods can be further divided into global and local explanations. While global methods aim at providing explanations for the general behavior of the model, local explanations restrict the explanation to a small neighborhood of the prediction space.
Although accuracy is expected to be higher for local explanations, we might lose the ability to extrapolate them to the model as a whole. Further, local explanation methods can be divided into two general categories. Model-specific explainability employs methods which target the specific structure or algorithm of certain machine learning models, in case we know what was used. Model-agnostic methods provide general tools to obtain explanations for any type of model, regardless of its structure. These model-agnostic explanations typically involve input manipulation, and we use that manipulation to understand how the outputs of the model change at the local level.

At this point, I would like to introduce TrustyAI. TrustyAI is an open source community project which I'm involved with, and it's a component of Kogito. Kogito is an open source, end-to-end process automation community project for building intelligent cloud-native applications. It's a cloud-first runtime environment and it leverages process and decision technologies such as jBPM, Drools, and OptaPlanner. Crucially for our scope, it also provides an explainability service, which features some of the methods mentioned before. The explainability features of TrustyAI are also available as a library, and other community projects are in the works, such as Python bindings to use these explainability methods from Python. So we can see that Kogito can provide a bridge between process automation and machine learning explainability.

TrustyAI provides several out-of-the-box explainability methods, which mostly fall in the post hoc, local, model-agnostic family. These include LIME, counterfactual explanations, and SHAP, and we'll look at each one of these methods individually in the next slides.

We can start with LIME. The first explainability method I'd like to talk about is Local Interpretable Model-agnostic Explanations, or LIME for short. LIME tries to answer the question of which features are more important and how they affect the result. LIME was first introduced in the 2016 paper "Why Should I Trust You?" by Ribeiro, Singh, and Guestrin. As I mentioned, LIME is part of the local explainer family. That is, LIME tries to explain potentially complex decision functions, which are difficult to explain as a whole, by focusing on a neighborhood of the prediction and providing an explanation that is locally consistent with the model. Local explanations cannot be extrapolated, as we mentioned, to the entirety of the space, and they do not aim at explaining the global behavior of the model.

Having said that, how does LIME work? LIME starts by generating perturbations of the original data. For instance, in this case, and this is just for illustration purposes, we could implement the perturbation by adding noise to the original input. Obviously, the perturbation method we use will depend on the type of features we're dealing with: it will be different for image data, for text data, or for tabular data. But for this example, we're going to use the noise-adding scenario. After creating these perturbations, we weight the new data points according to their distance to the original input. Once we have those new data points, we use our original model to predict the outcome for each of the simulated data points. Using this new data set, we train a surrogate model which has higher interpretability and which, locally, can be an approximation of our black box model. For instance, we could train a weighted linear regression model. We can also tune our surrogate model in terms of selected features: there is a trade-off where the more features we use, the better the local approximation will be, and the fewer features we use, the more interpretable the surrogate will be. Assuming we use, in this example, a weighted linear regression, we can consider the regression weights as the feature importances. These importances are contrastive and they give us, in the example of tabular data, a quantification of how important each feature was for the final result.
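As a minimal sketch of the procedure just described (this is not the TrustyAI implementation; the function name and the toy black box are hypothetical, and the example assumes numeric tabular data and scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lime_style_importances(black_box_predict, x, n_samples=1000, noise=0.1, kernel_width=0.75):
    """Simplified LIME-style local importances for a single tabular input x."""
    rng = np.random.default_rng(0)
    # 1. Perturb the original input by adding noise.
    samples = x + rng.normal(scale=noise, size=(n_samples, x.shape[0]))
    # 2. Weight each perturbed point by its distance to the original input.
    distances = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # 3. Ask the black box for predictions on the perturbed points.
    predictions = black_box_predict(samples)
    # 4. Fit an interpretable surrogate (weighted linear regression) locally.
    surrogate = LinearRegression().fit(samples, predictions, sample_weight=weights)
    # 5. The regression coefficients act as local feature importances.
    return surrogate.coef_

# Toy "black box" just for demonstration: the recovered importances
# should be close to its hidden coefficients (2.0, -0.5, 0.1).
black_box = lambda X: 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * X[:, 2]
print(lime_style_importances(black_box, np.array([1.0, 2.0, 3.0])))
```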
Here's an example of how you can use LIME in TrustyAI, with a visualization of the output. On the left, we have the code used to generate the LIME explanation, and on the right, we have the visualization of the feature importances for a specific outcome of a loan-approval black box model. We can see, for instance, that age has a positive impact on the outcome, while the number of installments has a negative impact. And we can immediately see how useful this is for interpreting a black box model's prediction: at a glance, we have a quantification of which features are more important in relation to our outcome, and with this we can diagnose our model accordingly and better understand why, or how, the decision was made.

The second explainability method I'd like to talk about is counterfactuals. Counterfactuals answer a question in the form of: to get this specific outcome, what should my input be? Let's take again the example of the loan-approval model. For a specific input, we have a prediction that the loan is not approved. Counterfactuals will provide us with an alternative set of inputs which will lead to the desired outcome, that is, the approved loan.

Counterfactuals have a set of desired properties, and this is actually an active area of research, so we'll just focus on the most common ones: validity, actionability, and sparsity. The first property, validity, states that not just any solution that satisfies our desired outcome is a valid counterfactual. Say we have two possible solutions and both reach the desired outcome, but one of them is much closer to the original inputs than the other. The validity property states that the counterfactual that minimizes this distance from the original input is the valid one. Put another way, we should minimize the distance between the original inputs and the counterfactual's inputs.

The next property is sparsity. Sparsity states that counterfactuals which change the minimum number of features are preferred. We can see this through an explainability lens: if we have a model with a huge number of inputs, it is easier to interpret the counterfactual if it only changes a few inputs rather than a lot of them, which favors the interpretability of the counterfactual. Formally, we can define this by adding a penalty term for each feature changed and including it in our minimization function, in addition to the distance term we had previously.

Finally, the last property is actionability. Actionability refers to the ability of the counterfactual to distinguish between mutable and immutable features. Certain features, be it for business reasons, legal reasons, or ethical reasons, should not be changed by the counterfactual; that is, they should be immutable. We can enforce this formally by adding a term which restricts the counterfactual search to the subset of the mutable feature space. That is, we are not searching for counterfactuals in the immutable feature space.

So how is this implemented? Many counterfactual algorithms in other open source solutions are built using gradient-based minimization. However, in the TrustyAI implementation, we use a constraint-solver approach and we leverage the OptaPlanner solver. We start by defining our outcome goal and the input search boundaries. Having defined those, the constraint solver explores the feature space and samples points in a direction which minimizes our defined loss function. That is, we increasingly approach solutions which simultaneously get closer to the original inputs, change the minimum number of features, and produce the desired model output. And finally, we'll have our counterfactuals.
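TrustyAI delegates this search to OptaPlanner; the following sketch is not that implementation, only a naive random search in Python that illustrates the same loss terms: stay close to the original input, change few features (sparsity), only touch mutable features (actionability), and reach the desired outcome (validity). The function, the toy model, and the feature names are all hypothetical.

```python
import numpy as np

def counterfactual_search(predict, x, desired, mutable, bounds,
                          n_trials=20000, sparsity_penalty=0.1, seed=0):
    """Naive random search for a counterfactual minimizing distance + sparsity."""
    rng = np.random.default_rng(seed)
    best, best_loss = None, np.inf
    for _ in range(n_trials):
        candidate = x.copy()
        # Perturb a random subset of the mutable features within their bounds
        # (immutable features are never touched -> actionability).
        for i in rng.choice(mutable, size=rng.integers(1, len(mutable) + 1), replace=False):
            candidate[i] = rng.uniform(*bounds[i])
        if predict(candidate) != desired:          # validity: must reach the goal
            continue
        distance = np.linalg.norm(candidate - x)   # stay close to the original input
        changed = np.sum(~np.isclose(candidate, x))
        loss = distance + sparsity_penalty * changed
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best

# Toy loan model: approve (1) when income - 5 * installments > 50; age (index 0) is immutable.
predict = lambda z: int(z[1] - 5 * z[2] > 50)
x = np.array([35.0, 60.0, 4.0])                    # age, income (k), installments -> currently denied
cf = counterfactual_search(predict, x, desired=1, mutable=[1, 2],
                           bounds={1: (0, 200), 2: (0, 12)})
print(cf)
```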
Again, just a quick example. Explainers in TrustyAI are quite flexible, in the sense that they accommodate different types of predictive models. To request a counterfactual explanation, we define our goal in terms of the output features: feature types, goal value, and minimum output probability. We pass the original features, and we pass the feature domains, which include lower and upper bounds for numerical features or a collection of values for categorical features, explicitly setting the ones that should be immutable. And finally, we request a counterfactual.

The last method I'd like to talk about is SHapley Additive exPlanations, or SHAP. Where LIME tried to answer the question "what is the importance of each feature to my final result?", SHAP tries to answer the question of how much each feature actually contributes to the result. So what's the difference? As an example, if we were using a model that estimates the value of a used car from a set of vehicle characteristics, LIME would tell us which features contribute the most to the final valuation of the car, but SHAP would actually give us a breakdown of how much each characteristic adds to or removes from the car's value.

SHAP was initially published by Lundberg and Lee in 2017, and it relies heavily on a concept from game theory called Shapley values. Shapley values, simply put, try to establish how much each player in a coalition contributed to the final result. To apply the concept of Shapley values, we start by considering each feature as a player and the prediction as the result of a game. We assume that we have a model where we can easily remove features and get an outcome based on a partial input. We then calculate the difference between the outcome from all features and, say, the outcome of a coalition without a certain feature, and that gives us the marginal contribution of that feature to the coalition. Now, if we calculate the outcomes of all coalitions that differ only by having that feature or not, the mean marginal contribution will be the Shapley value of that feature.
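Here is a small sketch of that idea (again illustrative only, with hypothetical names, not the SHAP or TrustyAI code): exact Shapley values for a toy model with a handful of features, computed by averaging marginal contributions over all orderings, with missing features filled in from a background (baseline) input.

```python
import numpy as np
from itertools import permutations

def shapley_values(predict, x, background):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = np.zeros(n)
    perms = list(permutations(range(n)))
    for order in perms:
        current = background.copy()           # start from the baseline input (empty coalition)
        prev_out = predict(current)
        for i in order:
            current[i] = x[i]                  # add feature i to the coalition
            out = predict(current)
            phi[i] += out - prev_out           # marginal contribution of feature i
            prev_out = out
    return phi / len(perms)                    # average over all orderings

# Toy "used car value" model: base price plus mileage, age and trim effects.
predict = lambda z: 20.0 - 0.05 * z[0] - 1.5 * z[1] + 2.0 * z[2]
x = np.array([120.0, 8.0, 1.0])                # mileage (k km), age (years), has_leather
background = np.array([60.0, 4.0, 0.0])        # an "average" car from the data
# Contributions sum to predict(x) - predict(background).
print(shapley_values(predict, x, background))
```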
So that's Shapley values, but how do they apply to SHAP? As with the other explainability methods, we're dealing with a black box predictive model. In general terms, we take an input x and the model returns an outcome y. But we can define an alternative model g which takes a simplified set of inputs x′ and returns an outcome y′, and we impose a few conditions on this new setup: if x′ is quite similar to x, then we expect y′ to be quite similar to y. We also impose that the model g should have a linear, additive form, g(x′) = φ₀ + Σᵢ φᵢ x′ᵢ, where φ₀ is the base value obtained from the background data and each φᵢ is the effect of feature i, which will be the Shapley value for that feature.

There are two problems here. The first problem is that the number of coalitions we need to calculate suffers from a combinatorial explosion with the number of features, so for anything more than a few features we quickly hit prohibitive computational costs. The second problem is that most models don't allow us to use missing features. The solution that SHAP presents is to use something called the SHAP kernel. The SHAP kernel works by replacing the missing features with values taken from a synthetic background data set, which allows us to compute an average output for each coalition, with the background values taking the place of the missing features. Once we have the averaged contributions, we end up with a linear system, and if we solve it, the coefficients we obtain are equivalent to the Shapley values. This allows us to quantify the individual contribution of each feature to a given result.

Similarly to counterfactuals and LIME, we can see that the SHAP application is quite straightforward. With TrustyAI, we start by defining the original input as a list of features. We then build the explanation context, which is the original input and the output, associate the explainer, and request the explanation, which will be in the form of per-feature saliencies that include the contribution of each feature to the output.

In summary, we've seen that explainability and interpretability are critical concerns, be it from a legal compliance point of view, as a service we provide to users, or even as a tool for data scientists to debug, profile, and better understand machine learning models. As artificial intelligence and machine learning become more prevalent in all aspects of life, it is essential that we can trust these methods, and due to their inherent complexity that can be quite difficult without proper tooling. Although this is an active area of research, several methods are already available, and they apply to a wide variety of scenarios and different types of models with good performance. And as usual, open source and its communities are taking the leading role here, making many of these tools available for immediate use and for everyone who wants to get involved and help improve them.

These are just some resources if you want to learn a bit more about TrustyAI or explainability: our website, the preprint we put on arXiv, you can chat with us on Zulip if you want, you're welcome, and you can check out our code on GitHub. I'd also like to acknowledge some of my colleagues who helped me make some of these examples and with whom I work on a daily basis. And that is all from me. Thank you, and please let me know if you have any questions.