A lot of work has gone into trying to look inside complicated models such as BERT and understand what they're doing. One way to break up these methods: there are fairly simple approaches that treat the model as an input-output box and try to explain its decisions. If you take this tweet and classify it as positive, which words are causing you to rate it as a positive tweet or a negative tweet, as an angry tweet or a frightened tweet? Many of these are what we would call feature-importance methods: each word in the sentence going in is a feature, and you can then use techniques we won't have time to cover in this course, such as Shapley values, which roughly ask how important this word is: if I were to remove it, how much does the prediction change, and in what direction? There is a small complication because the subword models, the byte-pair encodings, make it a little harder to map feature importance from a byte-pair token back to a word, but people work around that.

The other sort of explanation looks inside the model: at the attention weights of an attentional model, or by probing BERT to see what is being computed where. So one thing you can do is remember that BERT takes in an input sentence, say "the earth revolves around the sun", runs it through a whole stack of layers, each with a whole set of heads (roughly a dozen layers and a dozen heads), and then makes some sort of prediction: it predicts the missing, masked tokens. One can then look in and ask: if we take any one layer, say layer two, take the outputs of the hidden units in layer two, and feed them into a simple model, a logistic regression, how well does it do at part-of-speech tagging, or dependency parsing, or coreference, or any other problem we might have? What people find is that there is, in some sense, a very loosely defined hierarchy: some things like part-of-speech tagging are handled closer to the embeddings, while some things like coreference (what earlier mention does this "it", this "she", this "he" refer to?) are handled deeper in the network. So there is something roughly analogous to a CNN, which does low-level feature detection near the input and deeper, more semantic feature recognition further in; something sort of like that is going on in BERT, though it's not as clean as in the visual system, which goes from very local to much more global in a clearer way.
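To make that probing recipe concrete, here is a minimal sketch of the idea, assuming the Hugging Face transformers library with bert-base-uncased, scikit-learn's LogisticRegression as the probe, and a tiny toy POS-tagged dataset invented purely for illustration (real probing studies use full treebanks): freeze BERT, pull the hidden states out of one layer, and fit the simple probe on top of them.

```python
# A minimal sketch of layer probing (not the exact setup of any one paper):
# freeze BERT, take the hidden states from one layer, and fit a simple
# logistic-regression probe on top of them to predict part-of-speech tags.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy word-level POS data, invented for illustration only.
data = [
    ("the earth revolves around the sun", ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
    ("she discusses the plans", ["PRON", "VERB", "DET", "NOUN"]),
]

LAYER = 2  # which layer's hidden states to probe (0 = embeddings, 1-12 = layers)

features, labels = [], []
for sentence, tags in data:
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[LAYER][0]  # (num_subwords, hidden_size)
    word_ids = enc.word_ids(batch_index=0)  # maps each subword position back to a word
    for pos, wid in enumerate(word_ids):
        # keep one vector per word: the first subword piece of each word
        if wid is not None and (pos == 0 or word_ids[pos - 1] != wid):
            features.append(hidden[pos].numpy())
            labels.append(tags[wid])

# If a plain logistic regression can recover the tags from these vectors,
# that information is (at least linearly) present at this depth.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy on its training data:", probe.score(features, labels))
```

Repeating this for every layer and for different tasks (tagging, parsing, coreference) and comparing probe accuracies is what gives the rough picture of which kind of information shows up at which depth.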
The other thing you can do with BERT, which is super cool, is look at what each of the heads is attending to. If you go word by word and look at the self-attention in the encoder, asking what each word pays attention to, you can see that different things are attended to in different cases. In this case, following the red lines here, you can see objects being attached to their verbs: plans to be discussed, or discusses plans (what do you discuss? the plans), upgrade the line or its lines (what do you upgrade? the lines), plugs a few diversified funds (what do you plug? the funds). So you can see these verb-object attachments being learned by these heads, and the two different heads learn different things. Similarly, looking at the red lines on the right-hand side (and again we're at layers eight through eleven here, fairly deep in the network): what do we have? "the complicated language in the huge new law", "the fight". In each case, each of these words is being matched up with, is attending to, the words that modify it: "the former executive", "this time". So again, following the red lines, you can see that words are really attending to something relevant to them, and different heads attend to different kinds of relevance. This provides a nice way to look and see what's going on under the hood. It's a little messy, because there are a dozen different heads you have to check, but you can get a flavor for what they're looking for. On the blue lines, which were too hard to remove from the figure, you can note that tokens like [SEP], the end-of-sentence marker, attend to all sorts of things, including [CLS] and the periods: much more general attention for separators, as opposed to specific words that are modified in different ways. These attention patterns are useful for many things, including a topic we'll come to shortly, which is looking at bias.
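As a rough illustration of how to poke at these attention maps yourself, here is a minimal sketch, again assuming the Hugging Face transformers library and bert-base-uncased; the sentence and the layer and head indices are arbitrary illustrative choices, not the specific heads from the figures discussed above. It runs BERT with output_attentions=True and prints, for one layer and one head, which token each position attends to most strongly.

```python
# A minimal sketch of inspecting attention heads, assuming the Hugging Face
# transformers library. The sentence, layer, and head below are arbitrary
# illustrative choices, not the specific heads shown in the figures.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "the company plans to upgrade its lines"
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())

# out.attentions is a tuple of 12 tensors, one per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
LAYER, HEAD = 8, 9
attn = out.attentions[LAYER][0, HEAD]  # (seq_len, seq_len) for this layer and head

# For each token, print the token it attends to most strongly.
for i, token in enumerate(tokens):
    j = int(attn[i].argmax())
    print(f"{token:>10s} -> {tokens[j]:<10s} (weight {attn[i, j].item():.2f})")
```

Looping this over all twelve layers and twelve heads is exactly the messy-but-doable check described above.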