Okay, so robustness ensures that specific functions of a system are maintained despite external or internal perturbations. I just had such a perturbation myself: a large button suddenly appeared on the screen. I will show you later how we can make use of human robustness by bringing conceptual knowledge into the machine learning pipeline via a human in the loop, and how we can benefit from this to make our algorithms more robust.

A typical example are adversarial examples, and I will quickly show you a well-known case. The best performing algorithms to date are very sensitive to even very small perturbations, and the problem is that these perturbations are not visible to a human observer. Even worse, it is also not visible that the output has completely changed, here from benign to malignant. You can imagine that this can have dramatic consequences in the medical domain.
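To make this more tangible, here is a minimal, hedged sketch of how such an adversarial perturbation is typically generated, using the fast gradient sign method; the model, the input image and the label in the usage note are placeholders of my own and not the medical classifier from the slide.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method: adds a small, barely visible perturbation
    that can nevertheless flip a classifier's decision."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)                        # forward pass
    loss = F.cross_entropy(logits, label)        # loss w.r.t. the true label
    loss.backward()                              # gradient of the loss w.r.t. the pixels
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()    # stay in the valid pixel range

# Hypothetical usage: `model` is any differentiable image classifier,
# `x` a (1, C, H, W) tensor in [0, 1], `y` the true class index as a (1,) tensor.
# x_adv = fgsm_perturb(model, x, y)
# model(x_adv).argmax() may differ from model(x).argmax(), e.g. benign -> malignant.
```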
I now want to summarize and step back a little to data science: what are the main challenges in the medical domain before we can do any machine learning? First, data. Machine learning means solving hard inference problems from data, and this is why Pierre-Simon de Laplace, my personal hero, spoke about inverse probability. Most of our data in the medical domain is of high dimensionality and multimodal; in short, many factors contribute to one single result. Second, the real world in the medical domain is complex: the problems we are facing are non-linear, non-stationary and not independent and identically distributed. And you know this is one of the major problems, because all our best algorithms work well on independent, identically distributed data. But when do we ever have such data? You can imagine why, when you go to NeurIPS, you see so many awesome papers, but why are they done on toy examples with well-known data sets, for example the MNIST data set? That brings me to point three: we rarely have top-quality data sets, and to be honest, we spend a lot of our work not on machine learning but on the quality of the data, because clinical and medical data are collected with a remarkable degree of uncertainty, variability and inaccuracy. Finally, our algorithms work either in a statistical, that means correlation-based, or even in a model-free mode, so we have little prior knowledge and often no mechanistic model behind the data. I would say this is the grand challenge.

But let us now go even one step back. Let me show you a machine learning 101 slide, which I show my students at the very beginning; you will quickly realize that these are the grand challenges, particularly in our field, the medical domain. I list them now with increasing complexity. The first challenge is learning from few data. Often in the medical domain we simply do not have this big data, and big data would be nothing bad for us: everybody is complaining about big data, but no, big data would be good, we just do not have it in the medical domain. We learn from observed data by constructing stochastic models that can then be used for making predictions and decisions, and here the challenge really is learning from few data. Then comes extracting knowledge, and we will come back to knowledge later on because this is very difficult. Then we have to generalize and therefore always fight the curse of dimensionality. And then we have to disentangle the underlying explanatory factors of the data, which means that, ultimately, we have to ensure in the medical domain a causal understanding in the context of a specific problem. In the medical domain we call this etiology, the old science of the study of causation, of the causes of disease, and this goes back, let's say, 100 or 150 years.

But now let us go even further back. Yoshua Bengio emphasized during his Posner lecture at NeurIPS 2019 in Vancouver the work of Amos Tversky and Daniel Kahneman, first published in 1974: Tversky and Kahneman, "Judgment under Uncertainty". The central hypothesis is that we have a dichotomy between two modes of cognition, and I have highlighted this here for you in blue. They speak about a System 1 cognition, which is fast, instinctive and unconscious, and which Bengio considers to be what current state-of-the-art deep learning can do. And then there is a System 2 cognition, which is slow, logical and conscious, and which according to Bengio is largely missing in the current state of the art in AI. So the assumption is obvious that we have to combine both systems to solve certain problems. On a side note, this was described, as I have mentioned, in 1974; Tversky died in 1996, Kahneman received the Nobel Prize in 2002 for this work, and in 2011 he published his best seller, which I think everybody of you knows: Thinking, Fast and Slow.

OK, another success story. But first, since I have unfortunately forgotten to bring a clock and my own timer is not correct: can I have some time feedback from the chairman, Professor Ziegler? Am I in time? It is 9:20, so we have about 20 minutes left. OK, super, thank you.

So, this one is maybe the most famous paper; it has meanwhile over 4,000 citations, and maybe it gains further citations while I am speaking. This is really awesome work, and it is often sorted under "AI is better than doctors". What did they actually do? They classified skin lesions, melanomas, using a single CNN, Inception V3, trained end-to-end directly from the dermatological images, using only pixels and disease labels as inputs. For pre-training they used 1.3 million images from the ImageNet challenge, and then they used about 130,000 clinical images covering approximately 2,000 diseases. Look at the ROC curve: the results, with 92% average classification performance, are on par with human dermatologists, the red dots are the humans, or even better, especially if you consider that the algorithm does not get tired. This is really an awesome result, and the doctors who were asked considered this a very good performance.
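As a rough illustration of the transfer-learning recipe just described, pre-train on ImageNet and then fine-tune end-to-end on labelled clinical images, here is a minimal Keras sketch; the class count, directory layout and hyperparameters are my own placeholders and not the setup of the cited paper.

```python
import tensorflow as tf

NUM_CLASSES = 2000  # placeholder, roughly the number of diseases mentioned in the talk

# Inception V3 backbone, pre-trained on ~1.3M ImageNet images, without its classifier head
base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False, pooling="avg")

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),       # Inception expects inputs in [-1, 1]
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new head for the disease labels
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical folder of clinical images, one sub-folder per disease label
train_ds = tf.keras.utils.image_dataset_from_directory(
    "clinical_images/train", image_size=(299, 299), batch_size=32)

model.fit(train_ds, epochs=5)  # fine-tune end-to-end from pixels and disease labels
```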
So the question remains open: why? And this brings us now to the human-in-the-loop concept. All the deep learning examples which you have just seen are beautiful demonstrations of the success of data-intensive AI. However, and I have said this before, sometimes we do not have this large amount of data. But most of all, the problem of stochastic, model-free learning lies in the missing ground truth, in the discrepancy between empirical evidence and inference, and in the missing conceptual understanding. And this is exactly where I want to emphasize the human in the loop, because here a human in the loop may be beneficial: the human can sometimes, and I emphasize sometimes, not always, bring in this contextual understanding, which we may call context understanding, or implicit or conceptual knowledge. And this goes back to the idea of making use of both, as Tversky and Kahneman said, System 1 and System 2.

Okay, so why are people considered robust and able to generalize from a few examples? Sometimes, and not always, humans are able to understand, and I have highlighted this: they can understand the context, they can make inferences from little, noisy and incomplete data sets, and they can learn to discriminate what is relevant, and I emphasize again relevant, because this is a hard question, what is relevant, in order to learn relevant representations and to find the shared underlying explanatory factors across modalities. Remember, we always need this across modalities. And in particular, they can find p(x) and p(y|x) together with the underlying causal link. So it is very interesting that humans can do this, and can do it very well. The only problem is, according to Tenenbaum, that we do not know how they do it, because then it would be easy: we could re-engineer it and use it for machine learning. But we have no idea how this works. So how can we model this, for example?

If we now compare AI, symbolic AI here, with human brains, with biological brains, then we notice a significant difference. As we have seen in all our examples, the AI achieves very good performance in certain tasks, but requires huge numbers of examples and/or explicit definitions. Contrary to this, the human brain is able to generalize from few examples. So how can we model these cognitive capacities and make them usable for machine learning?

This is again a very good example for the usefulness of the theory of Laplace, and I know you are used to saying Bayes, but it actually goes back to the awesome work of Pierre-Simon de Laplace, so I prefer to mention Laplace here. It also shows how close statistical machine learning and cognitive science are. And remember, statistics always tells us what structure we can infer from the data. For example, given a hypothesis and the data, we set a prior probability, multiply the prior by the likelihood and obtain the posterior probability; of course, in machine learning we then have to normalize, that means we have to divide by the evidence. In short: the posterior is the prior times the likelihood divided by the evidence. There is very awesome work by Joshua Tenenbaum which shows how we can model common sense with this, in German we would probably say "gesunder Hausverstand". We have a hypothesis H about our world, for example the meaning of a word or a cause, given the observed data X and subject to the constraints of a background theory T; this background theory is your set of expectations, your prior knowledge. You can then score each hypothesis by computing its posterior probability: the likelihood P(X | H, T) measures how well each hypothesis predicts the data, and the prior probability P(H | T) expresses the plausibility of the hypothesis given our background knowledge. And you will see how important this is.
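As a tiny, self-contained sketch of this hypothesis scoring, posterior = prior times likelihood divided by the evidence, here is an example in Python; the hypotheses and all numbers are invented purely for illustration.

```python
# Sketch of Bayesian hypothesis scoring: P(H | X, T) = P(X | H, T) * P(H | T) / P(X | T)
# T is the background theory (prior knowledge), X the observed data, H a hypothesis.

priors = {                 # P(H | T): plausibility of each hypothesis given background knowledge
    "tumor": 0.05,
    "inflammation": 0.15,
    "healthy tissue": 0.80,
}
likelihoods = {            # P(X | H, T): how well each hypothesis predicts the observed data
    "tumor": 0.70,
    "inflammation": 0.40,
    "healthy tissue": 0.02,
}

evidence = sum(priors[h] * likelihoods[h] for h in priors)          # P(X | T), the normalizer
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

for h, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"P({h} | data) = {p:.3f}")
```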
But how does the mind get so much out of so little? I briefly want to show you a really nice experiment, which I like very much, from the Tenenbaum group at MIT; it quickly becomes clear how our mind works. Look at the slide: these are pure fantasy objects. Now I take one randomly chosen object and give it a name, let's say "Quaxle", a German-sounding name. Now I say "this is a Quaxle", you look at it, and you can immediately identify other Quaxles. If I give you further examples, you quickly understand and find them in a very short time. So you can make strong generalizations from very sparse input data, which is noisy, ambiguous, and so on. This is still the state of the art in context understanding: the human mind makes empirical inferences that go far beyond the available data. There are a lot of questions open here, and I will not concentrate on all of them. For example, what form does our knowledge take across different modalities, as I have mentioned before? This brings us immediately to what I will give you as a take-home message later: the necessity of multimodal explainability, or multimodal causability, which I will explain shortly.

The explainable AI community separates between two kinds of interpretable models: ante-hoc and post-hoc interpretable models. Interpretable models, ante-hoc, are theoretically explainable by design; you can see this as a glass-box approach, although it is also tricky, because a decision tree or a graph can become very large. But it can, in theory, be interpreted. On the other hand, we have the post-hoc models, here I have denoted them with an M, and we have seen such models when I presented the successes of deep learning, because these models are regarded as so-called black-box models, not quite correctly, because we understand the mathematical principles, but they still remain opaque for the human end user.

The XAI community has developed quite a number of useful techniques, and let us briefly look at a few of them. The simplest method are gradients, the multivariable generalization of the derivative; you can also do sensitivity analysis or decomposition, and luckily our world is compositional, so you can break things down into smaller pieces to make them more understandable. Then there are optimization-based techniques that look for the elements which contributed most to a single output, for example Local Interpretable Model-agnostic Explanations (LIME), which is very popular. Another method is deconvolution, where you reverse the effects to understand the model; this goes back to one of the older methods, by Rob Fergus and Matt Zeiler from 2014, for example inverting a CNN. A convolution takes two functions and produces a third function as their product, and by reversing it you can learn which parts contributed. And there are newer methods, for example concept activation vectors and network dissection, and generative adversarial networks can also be used for this and extend these ideas.

Okay, just one method from TU Berlin, a very interesting method which we use on a day-to-day basis: layer-wise relevance propagation. The image is converted into a feature vector representation, then a classifier is applied to the image and outputs a category, cat or no cat here. We then decompose the classification output into sums of feature and pixel relevance scores. So you produce a relevance score and from it a heat map, and the heat map shows you which pixels or layers contribute how strongly to the outcome. In other words: what makes a cat a cat?
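To give a feel for the relevance redistribution behind layer-wise relevance propagation, here is a minimal NumPy sketch of the so-called epsilon rule for a tiny fully connected ReLU network; the toy weights are random placeholders, and this is not the reference implementation from TU Berlin.

```python
import numpy as np

def lrp_epsilon(weights, biases, x, eps=1e-6):
    """Layer-wise relevance propagation (epsilon rule) for a small ReLU network.
    Returns one relevance score per input feature (e.g. per pixel)."""
    # Forward pass, storing the activation of every layer
    activations = [x]
    a = x
    for W, b in zip(weights, biases):
        a = np.maximum(0.0, a @ W + b)              # ReLU layer
        activations.append(a)

    # Relevance at the output layer is the output activation itself
    R = activations[-1]

    # Backward pass: redistribute relevance in proportion to each neuron's contribution
    for layer in range(len(weights) - 1, -1, -1):
        W, b = weights[layer], biases[layer]
        a = activations[layer]
        z = a @ W + b + eps                         # contributions, stabilized by epsilon
        s = R / z                                   # relevance share per upper-layer neuron
        R = a * (s @ W.T)                           # relevance of this layer's inputs
    return R

# Toy network: 4 input "pixels" -> 3 hidden units -> 1 output score
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]
x = rng.random(4)

print(lrp_epsilon(weights, biases, x))              # per-pixel relevance -> heat map
```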
This has been further extended, and very recent work searches for functional similarities between neural networks and our brain. What you see here is an extension of the method mentioned before, and maybe the first practical application, from histopathology again; here, plausibility checking is done via interpretability. I will come back to this later for a closer look.

But let us at this point look at how a human pathologist works, human-centred. Pathologists look at such histopathological images. They follow the famous Shneiderman mantra: overview first, zoom and filter, then details on demand. And they use medical terms: they look for cells, tissue types, anatomical elements; they speak about an architecture, a penetration, a distribution, a differentiation, a grading; here, for example, is a portal field, and so on. So basically they form descriptions and interpret these descriptions with their implicit knowledge. For example: here the cells are smaller and closer together, it might be a tumour, but it could also be an inflammation, it depends on the context. It is important to note that in the medical domain, generally, different modalities contribute to a medical result.

And here one extremely important aspect is ground truth. What is ground truth? Let us quickly rehearse it, just for a mutual understanding, because it is so important. Ground truth is information provided by direct observation, that is, empirical evidence, in contrast to information provided by inference. Empirical inference is then drawing conclusions from empirical data; this leads us to causal inference, which in turn is a good example of causal reasoning. In short, all of this together inspired us for the design and development of our Kandinsky Patterns exploration environment, because our leading principle was: no correlation without causation. And it is a real challenge to infer such causal relationships from pure observational data. Hans Reichenbach stated this in his common cause principle, which is very important because it links causality with probability. These are the fundamentals of our Kandinsky Patterns exploration environment.

So what is our story? Fact: medical AI is currently extremely successful, I have given you some examples, and some AI even reaches human-level performance. However, in the real world we are facing three challenges. Ground truth is not always well defined, especially when making a medical diagnosis. And although human scientific models are often based on understanding causal mechanisms, today's successful machine learning models and algorithms are typically based on correlation or related concepts of similarity and distance. Among other things, this was the motivation for the development of our Kandinsky Patterns exploration environment.

We have named it after Wassily Kandinsky for a good reason, because this Russian painter analysed the geometrical elements which make up every painting, the point, the circle, the line. He did not analyse them objectively, from the viewpoint of physics, but from the point of view of their inner effect on the observer, so let's say from a usability and psychology perspective. He described that there are basic forms and that these can be used to build up more complex compositions. This is very similar to the work of Hubel and Wiesel, which later inspired the deep learning people around Geoffrey Hinton. And luckily the world is compositional, because otherwise all this stuff would not work.
So we define a Kandinsky figure as a square image containing 1 to n geometric objects, and each object is characterized by its shape, color, size, position, and so on. A statement can then be either a mathematical function, this is the strong ground truth, or a natural language statement, and it can be true or false. This is important because we can use it for producing counterfactuals, these "what if" questions. Several Kandinsky figures together form a Kandinsky pattern, and to save time I show you this Kandinsky pattern directly. For example, here the pattern is: the figure has two pairs of objects with the same shape; in one pair the objects have the same color, in the other pair different colors; the two pairs are always disjoint, so they do not share an object. This is a typical Kandinsky pattern. Within our Kandinsky Patterns experimental environment we can build such concepts along with the ground truth, for example a true statement, "the cells are smaller and closer together, it is a tumour", or a false statement, a completely random pattern, or a counterfactual: what if the cells were slightly bigger, for example? And currently we are running a machine learning challenge where we challenge the AI community. Task one: explain Kandinsky patterns algorithmically, for example train a network which classifies such Kandinsky figures according to the ground truth. And the greater challenge: explain Kandinsky patterns in natural language.
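To make this a little more concrete, here is a small, hedged Python sketch of how such a Kandinsky figure and the ground-truth statement of the pattern just described might be represented; the class and function names are my own illustration and not taken from the actual Kandinsky Patterns environment.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class KandinskyObject:
    shape: str      # e.g. "circle", "square", "triangle"
    color: str      # e.g. "red", "blue", "yellow"
    size: float     # relative size within the square image
    x: float        # position, normalized to [0, 1]
    y: float

def two_pairs_same_shape(figure):
    """Ground truth of the example pattern: the figure contains two disjoint pairs
    of objects with the same shape; in one pair the colors are equal,
    in the other pair they differ."""
    indices = range(len(figure))
    for pair1, pair2 in combinations(combinations(indices, 2), 2):
        if set(pair1) & set(pair2):
            continue                                     # the two pairs must be disjoint
        a, b = figure[pair1[0]], figure[pair1[1]]
        c, d = figure[pair2[0]], figure[pair2[1]]
        if a.shape == b.shape and c.shape == d.shape:
            same_color_1 = a.color == b.color
            same_color_2 = c.color == d.color
            if same_color_1 != same_color_2:             # one pair same color, the other not
                return True
    return False

# Hypothetical Kandinsky figure for which the statement is true
figure = [
    KandinskyObject("circle", "red",    0.10, 0.20, 0.30),
    KandinskyObject("circle", "red",    0.10, 0.70, 0.40),
    KandinskyObject("square", "blue",   0.20, 0.50, 0.80),
    KandinskyObject("square", "yellow", 0.20, 0.30, 0.60),
]
print(two_pairs_same_shape(figure))  # True
```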
So what can we do now with these Kandinsky patterns? And here I also answer the question of Jung-Zilian from the beginning: causability. We can provide the underlying explanatory factors of why the 92% are true or the 8% are false, so this is based on causality. Our Kandinsky patterns are extremely helpful because they enable us to study causality, the art and science of cause and effect. Causality is what connects the cause with the effect, the relation by which one process or state brings about another process or state, and this is a fundamental principle of all natural sciences; I would even say it is science per se. We have now extended this concept and introduced causability as the measurable extent to which an explanation of a statement, as we have seen before, which we receive from an explainable AI method, for example from layer-wise relevance propagation, where, as you remember, the most relevant parts which contributed to the result are highlighted via a heat map, can be mapped to the human user so that this user achieves a specified level of causal understanding. We can measure this with the criteria effectiveness, efficiency and satisfaction in a specified context of use. And the mapping, this is very important, and I should mention that the term mapping comes from cartography, this mapping is central for us, because it brings us to the key point: to understand how such patterns can be explained, how these ground truths can be mapped to our observations in the real world.

A recent outcome of our research is the System Causability Scale to measure the quality of explanations. The theory behind it is the following. You have to define very carefully that a statement S is either made by a human, S subscript H, I highlight this, or by a machine, S subscript M, and it is a function of R, K and C. R is the representation of an unknown or unobserved fact, U subscript E, related to an entity E. K is the pre-existing knowledge, which I have mentioned before, which for a machine is embedded within the algorithm, and which for the human is made up of explicit, implicit and tacit knowledge; and this is the human in the loop. And C is the context: for a machine it is the technical runtime environment, and for humans it is our physical environment, the context within which the decision was made, so in our model we call it the pragmatic dimension. The unknown, unobserved fact, U subscript E, this is the unknown here, represents a ground truth, GT, that we try to model with machines, M subscript M, and/or humans, M subscript H. Unobserved, hidden or latent variables are found, for example, in Bayesian models, in hidden Markov models and in methods such as probabilistic latent component analysis, which we use. The overall goal is that the statement is congruent with the ground truth and that the explanation of the statement highlights the relevant parts of the model. With this, we can measure the quality of explanations.

So I come to the end. I should of course mention the "what, to whom and how", which is very important to state beforehand. And what is now my final take-home message for you? We have learned that current AI does not generalize well, that it cannot learn from few examples and that it cannot infer causal relationships. So we need robust and interpretable AI to reduce costs and computational limitations. We want to get causal explanations, the "why", and counterfactuals, the "what if": what if the cells were slightly bigger? This should help us to ensure transparency and understandability, and this is the aim of science per se, to make something understandable.

And now my final slide: what should future human-centered AI ensure? In medicine we need something which may be called multimodal causability. Our future goal is to provide human-AI interfaces which enable a domain expert, a medical domain expert, to ask questions about a result in order to understand why the machine came up with it, and to ask "what if" questions, the counterfactuals, because then the domain expert can gain insight into the underlying explanatory factors: for example, how will the result be affected when leaving specific features away? This is my final message, and now I say thank you very much for your kind attention, and I hope I stayed in time.