So what I want to talk about today is not necessarily language learning, but decoding what deep neural models learn about language. As I'm sure everyone in this audience knows, deep neural network models are everywhere. In any domain of artificial intelligence, machine learning, or data science, wherever there is enough data, enough examples of a particular task, you see traces of neural network based models that learn patterns from these large datasets, and the domain of language and speech processing is no exception.

Here I have put a sketch of the transformer network, which is now the front runner among neural network architectures. I'm not going to explain the architecture, but what I hope you can see from even a shallow glance is that it is a very complex architecture. On the right-hand side you see a language model called BERT, which is basically the foundation of many natural language processing models for various applications. These BERT-based models have several layers: the small BERT-base has 12 layers and the large model has 24 layers of the encoder part of the transformer architecture. That is where the power comes from, from the complexity and the depth of these models, but with this power comes a cost. As I was saying, deep neural network models are very powerful, but they are very hard to understand because of their complex architecture and the very many parameters they have. We cannot easily understand the inner dynamics of these models, and this poses a challenge both for people who want to use them in applications and for researchers who want to understand how these models work.

Some of the main questions we are faced with are, for example: What aspects of the input does a model pay attention to? What knowledge does a model learn in order to perform a particular task? And how can we use this information to improve both the performance and the generalizability of such a model on a new domain, a new task, or a new dataset? What I'm going to do today is not provide answers, but give you a glimpse of some general approaches that people have proposed for getting insight into how these models work.

The first approach I want to talk about is input manipulation. This is an approach that is actually inspired by behavioral psychology: you have a black box and you want to understand how it operates, so you manipulate the input you give to this box and then observe its behavior. For example, let's say we want to measure how much the output of a natural language processing model changes if we remove one word from an input sentence. Can you move to the next slide? Yes. As an example of a task, imagine we train a model that learns to map images to captions. If we give the model enough examples, it can map this picture of a baby laughing and looking at a computer to the caption "A baby sits on a bed laughing with a laptop computer open." Next slide. If you manage to quantify the contribution of each word, you get a plot where the x-axis shows each word and the y-axis shows the contribution of that word. If you look at the blue line, which is the version I just told you about, you see that the word "baby" is the most important word in this caption.
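To make this concrete, here is a minimal sketch of this kind of word-ablation analysis. It assumes the model exposes an `encode` function mapping a sentence to a fixed-size vector; that function name and the usage example are hypothetical stand-ins, not the actual model from the talk.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def word_contributions(encode, words):
    """Score each word by how much removing it changes the model's output.

    `encode` is a stand-in for whatever forward pass the trained model
    exposes; here it is assumed to map a list of words to a vector.
    """
    full = encode(words)
    scores = []
    for i in range(len(words)):
        ablated = words[:i] + words[i + 1:]  # sentence with word i removed
        # A large drop in similarity means the model relied heavily on word i.
        scores.append(1.0 - cosine(full, encode(ablated)))
    return list(zip(words, scores))

# Hypothetical usage, e.g. on the caption from the example:
# caption = "a baby sits on a bed laughing with a laptop computer open".split()
# for word, score in word_contributions(my_model.encode, caption):
#     print(word, score)
```

Plotting these scores per word, with words on the x-axis and scores on the y-axis, gives exactly the kind of contribution curve shown on the slide.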
These ablation scores give us a very general idea of what kinds of words the model pays attention to. If you are curious to see what happens when we remove the word "baby" from this example, the next slide will show you. Next slide, please. Yes: if you remove the word "baby", this is the picture the model returns, so there seems to be a true correspondence between the words and the elements of the image. If you accumulate this kind of information, next slide please, you will see that a model like this pays attention mostly to adjectives, labelled JJ here, and nouns, and ignores certain types of words such as determiners, prepositions, and so on. So these kinds of input manipulations give you some insight into what types, or what parts, of the input the model pays more attention to and uses in the task it performs. But they tell us little about the actual representations the model learns.

This takes us to the next approach; can you show me the next slide? Yes. Here the idea is that we want to analyze the internal representations that these models learn. The questions we might want to ask are: What aspects of language does the model encode? For example, does the model encode linguistic forms such as words, morphemes, and phonemes, or syntactic structure, and on which layers does this information get encoded? Does it encode anything about meaning, and again, on which layers?

One technique that has been widely used, next slide please, is what is usually called auxiliary tasks or probing classifiers. The idea here is that you have a model that you have trained to perform a task, but now you want to know whether this model learns or encodes anything about some hypothesized knowledge Y. What you do is give the original model an input, record and extract the activations at some layer of this model, which is usually a vector of numbers, and then pass this as input to a classifier or regressor that predicts knowledge Y. The argument is that if you manage to train a classifier or regressor that predicts your hypothesized knowledge with acceptable accuracy, then the original model must have learned something about that type of knowledge.

Next slide. Here are a number of sample auxiliary tasks you could potentially use for the example model I showed you before, the one mapping images to sentences. You can train classifiers or regressors to predict, for example, the length of the utterance, the presence of specific words, some representation of semantic or form similarity, et cetera. Next slide. This is a sample snapshot of the results of such an analysis on a model we have been working on, where you see that the best performing probing classifiers or probing regressors for tasks that involve some sort of information about linguistic form are often the ones that use activations from the lower layers of the model, whereas probes that involve some sort of meaning usually perform better when they use activations or representations from the top layers. These probing classifiers or auxiliary tasks are very widespread techniques.
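As a rough illustration of this probing setup, here is a minimal sketch. It assumes you have already extracted a frozen activation vector per input (the extraction step is not shown and would depend on the model) and that the hypothesized knowledge Y is a discrete label; the probe itself is a simple logistic regression from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layer(activations, labels):
    """Train a simple classifier on frozen activations and report accuracy.

    `activations` is a list of hidden-layer vectors, one per input;
    `labels` holds the hypothesized knowledge Y for each input (for
    example, binned utterance length or the presence of a word).
    """
    X = np.stack(activations)  # shape: (n_examples, hidden_dim)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # High held-out accuracy suggests the layer encodes this information.
    return clf.score(X_test, y_test)

# Hypothetical usage: probe every layer to see where the information lives.
# for layer in range(num_layers):
#     acts = [get_activations(x, layer) for x in inputs]  # extractor assumed
#     print(layer, probe_layer(acts, labels))
```

Comparing the probe's accuracy across layers is what produces the pattern on the slide: form-related probes peaking at the lower layers, meaning-related probes at the higher ones.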
The last approach I wanted to tell you about is representational similarity analysis. In the case of probing classifiers or auxiliary tasks, we are trying to map representations learned by a model to a different representational space, for example the space of morphology or syntax. But when these representational spaces become more complex or structured, probing classifiers do not work very well, and a different technique, actually borrowed from computational neuroscience, comes to our rescue: representational similarity analysis. The idea is that you have the same set of stimuli or data points, but they are arranged differently in two different representational spaces. One is the representational space that our model learns, and the other is the target or hypothesized representational space that we are interested in. What we want to do is measure the correlation between the pairwise similarities of points across these two spaces. Next slide. So what we need here is just a similarity metric within each of the two spaces A and B, and what we do not need is a mapping between the two spaces. In that sense, this method is very versatile; a small sketch of the computation follows at the end. Just as an example of its application, next slide: we applied it to the BERT model I mentioned on the first slide, to see whether it encodes anything about syntactic structure, and here you see in the red lines that BERT encodes most syntactic structure in its top layers, mainly layers 22 to 24.

To wrap up what I just said, next slide: this domain of developing interpretability techniques for deep neural models of language is a very young domain, but it is moving and developing very fast, and we see new techniques being introduced pretty much every day. You can generally think of them as two general approaches: hypothesis-driven approaches, where we have a hypothesis in advance and look for the encoding of a hypothesized type of information in our models, with probing classifiers and representational similarity analysis as examples; and data-driven approaches, where we look at how these representations are formed, for example by manipulating the input or by following the propagation of information through the network.

I wanted to finish my presentation by telling you about a recent project that was just funded by the NWA, which is focused exactly on this pursuit of interpretability techniques for deep models of text and sound. Last slide, please. There are quite a few PIs involved in this project, but as you can see we also have three people from Tilburg, Tom Lentz, Grzegorz Chrupała, and myself, all from the School of Humanities and Digital Sciences. So if you have any questions, I would be very happy to answer them. Thank you very much.
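As promised above, here is a minimal sketch of the representational similarity computation. It assumes the two spaces are given as matrices with one row per stimulus (the row counts must match; the dimensionalities need not), and it uses cosine dissimilarity within each space with a Spearman rank correlation between the two dissimilarity structures, which is one common set of choices rather than the only one; the variable names in the usage note are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(space_a, space_b):
    """Correlate the pairwise similarity structure of two spaces.

    `space_a` and `space_b` have shapes (n_stimuli, dim_a) and
    (n_stimuli, dim_b): the same stimuli represented in the model
    space and in the hypothesis space. No mapping between the spaces
    is learned; only within-space similarities are compared.
    """
    # pdist returns the condensed vector of pairwise distances within a space.
    dist_a = pdist(space_a, metric="cosine")
    dist_b = pdist(space_b, metric="cosine")
    # Rank correlation between the two dissimilarity structures.
    rho, _ = spearmanr(dist_a, dist_b)
    return rho

# Hypothetical usage: model activations vs. a syntax-derived representation,
# one row per stimulus sentence in both matrices.
# print(rsa(layer_activations, syntax_features))
```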