Welcome back to Analyzing Software Using Deep Learning. This is the second part of the module on using hierarchical neural networks for analyzing software. Specifically, in this second part we will look at one application: predicting types in programs using hierarchical neural networks. Like most of the applications we look at in this course, this is based on very recent research, and in fact on a piece of research I did myself while on a sabbatical at Facebook, together with a couple of other people. If you're interested in more details about this work, you're of course invited to read the paper, which you can see here on the slide. Let me start by saying something about the motivation for this work. As you have probably all seen, dynamically typed languages have become extremely popular. Languages like Python and JavaScript are used all over the place, and a lot of software, including a lot of important software, is written in them. These languages are dynamically typed, which means that developers do not have to write down type annotations for variables, functions, and so on. Instead, the types are known only at runtime, and unless someone annotates them, they are not known statically, that is, by just looking at the source code. Now, this lack of type annotations has a couple of negative consequences. One of them is obviously that you may have type errors that you do not detect while just looking at the code, simply because there is no compiler or type checker to tell you about them; you may only detect them later while your code is running, possibly in production. Another problem is that in the absence of type annotations, it can be pretty hard to understand some APIs.
So assume you want to use some function that is provided somewhere, but you do not really know what kind of parameters to pass or what kind of return value to expect. This is exactly the problem you have if your APIs do not have type annotations. Finally, the lack of type annotations also makes it pretty hard for an IDE to provide good support. For example, auto-completion or automatically jumping to the definition of a function is much easier if the IDE knows some of the types in your program; without any type annotations, it simply doesn't. To make all of this more concrete, we will look at one specific example throughout this whole part of the lecture, which you can see here. It is essentially two Python functions, findMatch and getColors. As you can see, and as is the case in a lot of Python code, these functions do not have any type annotations. Specifically, there is no annotation that tells us the type of this parameter, and the same goes for the return type of this function: we see some return statements, but we do not immediately see what the return type is. The same holds down here for the getColors function: again, there is no type annotation for the return type, and since there are no parameters, there is no need for a parameter annotation. One way of adding some types to a code base, for example to the code we've just seen, is gradual typing. The idea of gradual typing is that you do not have to add all the types at once, which would be really hard if you have a lot of code written in a language like Python; instead, you can add some types at some code locations, basically as many as you want.
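The code on the slide is not reproduced in this transcript, but based on the description, the two functions might look roughly like this (the bodies are my reconstruction for illustration, not the original slide):

```python
def findMatch(color):
    # No annotation for `color`, and no return type annotation either.
    for candidate in getColors():
        if candidate == color:
            return candidate
    return None


def getColors():
    # Again, no return type annotation.
    return ["red", "green", "blue"]
```

Nothing in this code tells a reader, a type checker, or an IDE what `color` is supposed to be or what `findMatch` returns.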
For example, you could pick a few functions that you know are very important to your code and are used widely by other code locations, and annotate their parameter and return types. What a gradual type checker then does is essentially take all the type annotations you've added and warn you about inconsistencies. Say you declare the parameter type of a function to be an int, but somewhere you call it and obviously pass a string into that parameter; then the checker will tell you: this is an inconsistency, this is a type error, something is wrong and you should fix it. At the same time, it will not complain about missing information. If some functions are not annotated, the gradual type checker will just ignore them, which makes gradual typing a nice way of adding types to an existing code base. Now, while all of this is nice and good, annotating types is still pretty painful, because it means going through a lot of already written legacy code, carefully reasoning about what the types of all the variables, parameters, functions, and so on might be, and then adding these annotations. Doing this manually for a large code base is a lot of work, and not really the kind of fun work developers like to do. Instead of adding type annotations manually, there are a couple of options for doing it automatically. Option number one is static type inference. This basically means you have a conservative static analysis that reasons about your code and the values created in it, and propagates this information so that, in the end, it may be able to tell for some code locations what the type, for example the return type of a function or the type of a parameter, should be.
These static type inference approaches typically guarantee type correctness: they only tell you that a program element has a certain type if they know it for sure. The downside is that they are pretty limited in how many types they can predict. We have run some experiments and found that many types cannot be predicted using static type inference. So it's a good approach that gives you some types, but it also misses a lot of others. The second option is dynamic type inference. If you run your code, you can see the types: you see what kinds of arguments are given to a function and what kind of return values it produces. And because this happens at runtime, you know for sure that these are possible types that can be passed as arguments or returned by a function. The downside is that actually running the code requires you to have inputs that exercise it. You may have a test suite, or you may be able to run your code in production, but you somehow rely on a way to get these inputs. The other problem is that it may miss some types, because not all code will be covered, and even covered code may only be exercised with one kind of type. For example, you may see a lot of ints given to a function, but maybe this function can also receive strings; if you never see that happen, you will just miss this type. So it's a good approach, but it has some limitations, and it is also not the full solution to our problem. Option number three, which of course is also not perfect, but is the focus of what we are talking about here, is probabilistic type prediction, for example based on a neural network that learns from existing type annotations how to add the missing ones. This third option is what we look at here, and we will see how to use a hierarchical neural network for this purpose.
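To make the dynamic option more concrete, here is a minimal sketch of dynamic type inference: a toy decorator of my own construction, not the tooling used in practice, that records the argument and return types actually observed at runtime. It also illustrates the limitation just mentioned: it only ever sees the types that the available inputs happen to exercise.

```python
import functools

observed = {}  # function name -> set of (argument types, return type)

def record_types(fn):
    """Record the concrete types seen at each call,
    as a dynamic type inference tool would."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        arg_types = tuple(type(a).__name__ for a in args)
        observed.setdefault(fn.__name__, set()).add(
            (arg_types, type(result).__name__))
        return result
    return wrapper

@record_types
def double(x):
    return x + x

double(2)     # observes ('int',) -> 'int'
double("ab")  # observes ('str',) -> 'str'; a type we never exercise
              # (say, float) is simply never recorded
```

Every recorded type is guaranteed to be a real possibility, which is the strength of the dynamic approach; the missing float is its weakness.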
So the approach I want to describe here is called TypeWriter, because it basically writes the types into your code. TypeWriter consists of two big components. The overall input is a program; actually, for training the model we need more than one program, we need a lot of programs, but once the model is trained, you just pass in one program, and the output is the same program with type annotations added. This happens in two steps. The first step is a probabilistic model that predicts the missing types, based on a lightweight static analysis that extracts some information from the code. We'll see what kind of information this is; it is then used in a neural network to do the actual type prediction. What this gives us at the end is a type vector for every piece of code where a type is missing. For example, for a return type of a function that is not yet annotated, it will give a vector that you can interpret as a probability distribution over the possible types this return type may be. Given these type vectors for different parts of the code, the second step combines the different predictions for different elements using a search that looks for a consistent way of adding types to the program. Consistency essentially means that the resulting program is type correct and type checks according to our gradual type checker. How exactly this works we will also see in a few minutes. Let's start with the first part of the approach, and specifically with the lightweight static analysis that extracts information from the code, which we will then use for the neural type prediction. The information we extract here is of two kinds. On the one hand, we have natural language information, something that a lot of program analyses do not use but which turns out to be pretty helpful.
Specifically, this analysis extracts the names of functions and arguments, and also comments associated with functions. The intuition is that both the names and the comments often tell you something about the types that the function parameters or the return value will have, so getting this natural language information is pretty valuable. On the other hand, we also extract programming language information, basically what a more traditional program analysis would do, and here we extract two kinds of information. One is occurrences of the code element we want to type. For example, if we want to type a parameter, we extract occurrences of this parameter in the function body. We also extract information about which types are imported by the current file, because that tells us something about the types the programmer is actually using in this file, which we may want to use as the types to add as annotations. Let's look back at the running example introduced earlier. We have the same piece of code here; the only thing I've added is this comment, which actually does exist in a lot of real functions and which we will use here as part of the input to make a type prediction. So one kind of information we extract, as I said, is the natural language information. For this specific example, we extract a couple of pieces of it: the name of this function here and the name of this parameter; the same for the other function, where we also extract the name; and then this entire comment here. Basically, all of the words in this docstring, this Python function-level comment, are extracted as natural language information. In addition to the natural language information, we extract information about the program itself.
Specifically, here we look at occurrences of the parameters and at the return statements in the code. The first function has this parameter color, so we look at occurrences of color, which occurs here and here. Instead of just extracting this one token, which wouldn't be very useful, we look at a window of tokens around each occurrence of the parameter: we go back a couple of tokens, let's say until here, then go forward a couple of tokens, until here, and extract this window as a short sequence of tokens that tells us something about how this parameter is used. We do this for every occurrence, so not just for this one, but also for the other one. For return statements, the approach extracts the entire return statement. Basically, we would extract this and also this, and then for the other function, this whole long statement with the array expression here. Each of these is represented as a sequence of tokens, which we will then use in our hierarchical neural network. After extracting all this information, it is fed into a neural model, which happens to be a hierarchical neural network, and what we do with this network is predict types. As I said, we have different parts of the input: the code tokens extracted from all occurrences of a parameter and from the return statements; the identifiers extracted for every function and every parameter; the comments associated with a function; and the available types, which we know from the imports in the file. Each of these pieces of information is extracted for every code element we want to predict a type for. For example, for every parameter that does not yet have a type, we extract all four parts of the input and then try to make the best possible prediction based on all of this information.
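As a rough illustration of this lightweight static analysis, here is a small sketch using Python's ast module (Python 3.9+ for ast.unparse). The real TypeWriter extractor is more involved, and the feature names below are my own; the sketch just collects, per function, the identifier names, the docstring, and the return expressions:

```python
import ast

SOURCE = '''
def findMatch(color):
    """Find a candidate matching the given color."""
    for candidate in getColors():
        if candidate == color:
            return candidate
    return None
'''

def extract_features(code):
    """Collect names, the comment, and return statements per function."""
    features = {}
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.FunctionDef):
            features[node.name] = {
                "identifiers": [node.name] + [a.arg for a in node.args.args],
                "comment": ast.get_docstring(node),
                # each return statement, rendered back to source text
                "returns": sorted(
                    ast.unparse(r.value) if r.value is not None else "None"
                    for r in ast.walk(node) if isinstance(r, ast.Return)),
            }
    return features

features = extract_features(SOURCE)
```

A tokenizer pass (e.g. via the tokenize module) would additionally produce the token windows around each occurrence of `color`; the principle is the same: cheap, purely syntactic information, no type reasoning.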
Now, to handle these different kinds of input, we use different sub-models, because this is a hierarchical neural network. Specifically, we have recurrent neural networks that reason about the sequences of code tokens. We also have a recurrent neural network that reasons about the different identifiers associated with a specific program element, because there is not just one identifier but possibly several: for a parameter, we have the name of the parameter, but also the name of the function, and possibly the names of the other parameters of the same function. For comments, we have yet another recurrent neural network that reasons about the sequence of words in the comment. For the available types, we do not really need a separate sub-model, because there are not that many types that may be available, so a short vector representation is sufficient. Now, in order to feed the code tokens, the identifier names, and the words in a comment into recurrent neural networks, we need to somehow represent them as vectors, and for this we use embeddings. What exactly these embeddings are goes beyond the scope of this particular module, but there will be a separate module that talks just about embeddings. For now, all you have to know is that we have some function that takes a code token, an identifier name, or a word in a comment, and maps it to one short vector that represents this token or word in a way that somehow preserves its semantics. We have one kind of embedding that works for code and identifier names, which we use for the code tokens and the identifiers, and a different embedding that works well for natural language words, which we use for the comments.
So essentially, each of these individual tokens and words is represented as one vector, and the resulting sequence of vectors is given to the corresponding RNN. This here is basically a sequence of vectors, and the same here and here. What each of these RNNs does is summarize the given sequence into a vector: one summary vector for the code tokens, another for the identifiers, and yet another for the comment, each compressing the given information into a relatively short vector. What's important is that each of these RNNs is independent of the others. They are trained jointly, but apart from that, each can do its own reasoning about the given sequence of tokens or words, so that the way we reason about the words in a comment can differ from the way we reason about tokens in code. Given these four vectors, they are concatenated into one large vector that essentially represents all the information we have about the code element we want to type. This vector is given to a feedforward neural network, which may have a couple of layers and which in the end predicts a vector that we will talk about in more detail in a second, called the type vector; essentially, this type vector contains a probability distribution over the possible types. So this here is just a feedforward neural network plus a softmax function, which turns the type vector into something we can interpret as a probability distribution. Let's now look a little more closely at this type vector, the output of our hierarchical neural network. What we do here is formulate type prediction as a classification problem, where the possible classes to pick from are the different types that exist in our code base.
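As a sketch, not the authors' actual implementation, the hierarchical architecture just described could look roughly like this in PyTorch; the layer sizes, the choice of GRU cells, and all names here are made up for illustration:

```python
import torch
import torch.nn as nn

class TypePredictor(nn.Module):
    """One RNN per kind of input, summaries concatenated, then a
    feedforward classifier over the type vocabulary (toy sizes)."""
    def __init__(self, code_vocab=500, word_vocab=500, num_types=100,
                 num_avail_types=100, emb=16, hidden=32):
        super().__init__()
        # one embedding shared by code tokens and identifiers,
        # a separate one for natural-language comment words
        self.code_emb = nn.Embedding(code_vocab, emb)
        self.word_emb = nn.Embedding(word_vocab, emb)
        # one independent RNN per kind of input
        self.code_rnn = nn.GRU(emb, hidden, batch_first=True)
        self.id_rnn = nn.GRU(emb, hidden, batch_first=True)
        self.comment_rnn = nn.GRU(emb, hidden, batch_first=True)
        # feedforward layers over the concatenated summary vectors
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden + num_avail_types, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_types),
        )

    def forward(self, code_tokens, identifiers, comment_words, avail_types):
        _, h_code = self.code_rnn(self.code_emb(code_tokens))
        _, h_ids = self.id_rnn(self.code_emb(identifiers))
        _, h_comment = self.comment_rnn(self.word_emb(comment_words))
        joint = torch.cat(
            [h_code[-1], h_ids[-1], h_comment[-1], avail_types], dim=1)
        return torch.softmax(self.classifier(joint), dim=1)  # the type vector
```

The key structural point is visible in `forward`: each input kind gets its own summarization path, and only the short summary vectors meet in the classifier.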
Specifically, we do not look at all possible types, because in a large code base there may be many, many of them; we just look at the top 1,000 types, which turns out to cover more than 90% of all type occurrences in the large code bases we look at here. The output then is a vector of 1,000 numbers, and each of these numbers basically tells us how likely it is that the type we want to predict is this specific type. During training, we set all of these values to zero except for the one type that we want to be predicted. We are basically saying: all types have probability zero, except one type that should have probability one, and this is the one you should predict. The network tries to learn this and to get as close as possible to this perfect prediction. Of course it's not perfect, so during prediction we take the type vector actually produced by the network and interpret it as a probability distribution over the possible types. There will be some type that has, say, a 60% probability, another with 20%, and a few other types with even smaller probabilities, all summing up to one, because we have used the softmax function to make sure this actually looks like a probability distribution. What this gives us is a prediction of how likely the different types are, which we can interpret as a ranked list of possible types, with the highest-probability type at the top, followed by the type with the second-highest probability, and so on. Now that you know how to build a model that can predict types, let's look at how to actually train such a model. As usual, for training we need some kind of training data, and the training data we use here are the type annotations that already exist in the code base we want to analyze.
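To make the type vector concrete, here is a tiny sketch, with a made-up five-type vocabulary instead of 1,000, of how a raw output is turned into a probability distribution and then into a ranked list, and what the one-hot training target looks like:

```python
import math

TYPES = ["int", "str", "bool", "List[str]", "Optional[str]"]  # toy vocabulary

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to one."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranked_types(logits, k=3):
    """Interpret the output vector as a ranked list of (type, probability)."""
    probs = softmax(logits)
    return sorted(zip(TYPES, probs), key=lambda tp: tp[1], reverse=True)[:k]

# During training, the target for a slot whose true type is `str`
# is one-hot: probability 1 for `str`, 0 for everything else.
target = [1.0 if t == "str" else 0.0 for t in TYPES]
```

For example, `ranked_types([2.0, 5.0, 1.0, 0.5, 3.0])` ranks `str` first and `Optional[str]` second, which is exactly the kind of ranked suggestion list the search in the second part of TypeWriter consumes.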
Fortunately, some people have already annotated types, either manually or using some other existing approach, for example the static or dynamic type inference I talked about earlier. In practice, this TypeWriter approach has been applied to a multi-million-line code base, so really a lot of code, in which, depending on the kind of types we talk about, between 20 and 50% of the types are already annotated. So there is already a lot of data to learn from, but at the same time, 50 to 80% of the types are still missing, so there is still a lot to do for the type prediction model. What the model basically learns is to predict the missing types from the existing type annotations, because we use the existing annotations as training data; once the model is trained, it is applied to all the code locations that are still missing types, in order to hopefully fill in many of them. To make this more concrete, let's get back to our example. Here again is the code you've already seen, and once we have trained our neural model to make type predictions, we get, for every missing type, a list of likely types that the model thinks should be annotated at this code location. Specifically, for the parameter called color we get a prediction that says: I think this is an int, and if it's not an int, then maybe it's a string, and my third option would be a bool. Similarly, for the return type of this findMatch function, we get three likely types: the first being a string, the second being an optional of string, which basically means it could be a string or none, and the third being none.
And similarly, for the return type of getColors, we also get some predictions: for example, the model is guessing that this could be a list of strings, a list of anys, where any just means any possible value you can have in Python, or just a string. Now, if you look carefully at this code, you see that some of these predictions are wrong and some are right, and one challenge is how to find out which of these predictions to use. Talking about challenges, let's think about what we can use this neural network for and what challenges we still have to overcome to make it really useful. One challenge is the imprecision I've already mentioned. Some of the predictions are just wrong, because any neural network will be imperfect for any interesting prediction task, and in practice this means that someone, for example the developer, must decide which of the model's type suggestions to follow. This could be done, but we'll see a better way of handling the problem. Another challenge is the combinatorial explosion that results from this imprecision. For every missing type, we have one or more suggestions from the model. Because there is not just one type missing, we have a different list of suggestions for every location where a type is missing, and in principle you can combine all of these predictions: maybe you want to take the first predicted type here, the second predicted type there, and the third predicted type here. There are a lot of different combinations, actually so many that it is practically impossible to reason about all of them, so we need a more clever way of finding the right predictions out of these ranked lists that we get from the neural model. These challenges lead me to the second part of TypeWriter, which we look at now.
So essentially, what we get as the output of the first part, the hierarchical neural network, is a type vector that gives us a ranked list of likely types for every code location, and now we need to find a set of types we actually want to use, so that in the end we can decide which types to add, or at least suggest to the developer to add, to the program. This is done in the form of a search for consistent types: types that, if we add all of them together, still lead to a type-correct program, so that the gradual type checker does not report any type errors. To do this, we combine two things: one is an existing static type checker, the gradual type checker I've already mentioned, and the other is a feedback-directed search, which is the new technique we are proposing in this TypeWriter work. Let's have a more detailed look at this search for consistent types. What we get from the neural type predictor is a list of top-k predictions for each missing type, and we now use an existing gradual type checker essentially as a filter, to find the predictions in this top-k list that are actually correct, in the sense that adding them as type annotations leads to a type-correct program. In practice, there are many gradual type checkers we could use: for Python, for example, there are Pyre and Mypy, and for JavaScript there is Flow. In the concrete implementation we talk about here, we've used Pyre, but in principle any of these gradual type checkers could be used. Now, if you think about this problem of picking the right type from these ranked lists of types, it is actually a combinatorial search problem.
We have some number of places where types are missing, and we call those places type slots. For each type slot, we have some number of predictions, namely k if we look at the top-k list of types. If you think about how many possible type assignments there are, it is (k + 1) to the power of the number of type slots; the plus one is because we can also just not add any type, which is always an option on top of the top-k predictions from the model. If you think about how many possible combinations that is, it is basically too much to explore exhaustively, so we need a more clever way of searching the space of possible types to add. To navigate this space of possible type annotations, TypeWriter uses a feedback function: a function that, for a potential assignment of types to type slots, tells us how good this assignment is. The goal of this feedback function is to minimize the number of missing types: we want to add as many types as possible without introducing any type errors, because a type error is obviously something a developer does not want to have. In order to put this into a single value that can be returned by our function, we combine these two numbers, the number of missing types and the number of type errors, into a weighted sum, and specifically the weights we use are a weight of 1 for the missing types and a weight of 2 for the type errors. Giving a higher weight to type errors basically means: if you add a type and it is wrong and leads to a type error, that is worse than not adding the type at all.
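The feedback function itself is simple enough to write down directly; the weights 1 and 2 are the ones just mentioned, while the function and parameter names are my own:

```python
def feedback(num_missing, num_errors, w_missing=1, w_errors=2):
    """Lower is better: every missing annotation costs 1,
    every type error costs 2."""
    return w_missing * num_missing + w_errors * num_errors

# With 3 slots still missing and no errors, the score is 3.
# Filling one slot with a wrong type trades 1 missing annotation
# for 1 type error, making the score worse (2 + 2 = 4), so the
# search prefers an empty slot over an error-introducing type.
```

The weighting is what encodes the policy "a wrong annotation is worse than no annotation" as a single number the search can minimize.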
Now, given such a feedback function, a search can go through the different type assignments, the different ways of assigning types to the missing annotations, and there are different strategies for how the search could work. We have explored a couple of strategies, summarized here. One dimension is whether the strategy is optimistic or pessimistic. Optimistic essentially means we add the top-most predicted type everywhere in the code, optimistically assuming those predictions are correct, and if we see that some type errors have been introduced that way, we start removing types again. The pessimistic approach, in contrast, assumes that most of our predictions are probably not that good, so we should add one type at a time and be very careful not to introduce type errors. Adding one type at a time makes this easy: if we add a type and it is wrong, we just step back and remove it again. The other dimension is whether the search should be greedy or non-greedy while adding these types, where greedy and non-greedy are with respect to the score from the feedback function. The greedy approach means that whenever the score decreases, so whenever things get better, we keep the type: we know it is good, so we keep it and continue adding and modifying other types. The non-greedy approach sometimes backtracks: even if it has added a type that decreased the score, it will sometimes remove this type again. The reason is that otherwise you can get stuck in a local minimum, which basically means you have added a type that looks correct at first but prevents you from adding other types that would actually be the correct ones. To avoid getting stuck in such a local minimum, the non-greedy search also backtracks sometimes. Let's get back to our example and illustrate this whole process a little.
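As an illustration of one point in this design space, here is a sketch of a pessimistic, greedy search over the running example. The gradual type checker is mocked by a hand-written count_errors function that encodes the example's two inconsistencies; all names and the candidate lists are my reconstruction. Note how the greedy choice of int for color later steers the search away from List[str] for getColors, the kind of local minimum a non-greedy search could escape by backtracking on color:

```python
def pessimistic_greedy_search(predictions, count_errors):
    """Add one type at a time (pessimistic); keep a candidate only if it
    improves the score, and never revisit a kept choice (greedy)."""
    assignment = {slot: None for slot in predictions}

    def score(a):
        missing = sum(1 for t in a.values() if t is None)
        return missing + 2 * count_errors(a)  # the feedback function

    for slot, candidates in predictions.items():
        best = score(assignment)
        for candidate in candidates:
            assignment[slot] = candidate
            if score(assignment) < best:
                break                    # improved: keep this type
            assignment[slot] = None      # worse or equal: take it back
    return assignment

# Ranked predictions as in the running example.
predictions = {
    "color": ["int", "str", "bool"],
    "findMatch": ["str", "Optional[str]", "None"],
    "getColors": ["List[str]", "List[Any]", "str"],
}

def count_errors(a):
    """Mock gradual type checker for the example."""
    errors = 0
    if a["findMatch"] == "str":  # findMatch can also return None
        errors += 1
    if a["getColors"] == "List[str]" and a["color"] in ("int", "bool"):
        errors += 1              # comparing a str candidate to color
    return errors

result = pessimistic_greedy_search(predictions, count_errors)
```

The search ends with color as int, findMatch as Optional[str], and getColors as List[Any]: consistent according to the mock checker, but not the most precise assignment, which is exactly why backtracking can pay off.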
What you see here is the ranked list of predictions from the neural model, and as we've already seen, not all of these predictions are correct, and it's not always the first prediction that is correct. Let's assume we have an optimistic strategy that adds the first predicted type to each of these type locations. We say the color parameter is an int, the return type of findMatch is a string, and the return type of getColors is a list of strings. Given this set of types, we give the code to the gradual type checker, which checks whether there are any inconsistencies, and in this case it would see that yes, there is one: if getColors returns a list of strings, then this candidate here is a string, because this is what we get from getColors and we are iterating through these candidates; but if we say that color is an int, it means that here we are comparing an int and a string, which is not correct. So this gives us a type error, and the gradual type checker basically tells us that this way of picking the types is just not correct. Now let's assume our search instead takes the second option for the parameter called color, saying it's a string, but sticks with the first option everywhere else. We now see that the first type error goes away, because this is now a comparison of two strings, which is fine, but there is one other type error we get from the type checker, about this return statement: if we annotate findMatch to return a string, then we cannot at the same time return none, because none is not a string.
What we actually need to do instead is annotate the return type not as a string but as optional of string, which means it could be a string or it could be none. Once our search has found this assignment of types, we see that we have added all type annotations, wonderful: we have minimized the number of missing types to zero while not introducing any new type errors, which means we are done, our score goes down to zero, and we have found a perfect solution. Finally, let's have a look at some results of this model in practice. If you look at the neural model alone, so just the prediction without the search, we can measure two things. One is precision, which basically means: whenever the model predicts a type, how likely is it that this prediction is actually correct? If we only look at the top-1 prediction, precision ranges between 58 and 73 percent; if we look at the top-5 predictions, it goes up to 92 percent. The other thing we can measure is recall, which basically means: how many of all the types we would like to add can actually be added by this model? Here again we see a difference between top-1 and top-5: with the top-1 prediction it is between 50 and 58 percent, and with the top-5 predictions, 69 to 72 percent of all types can be successfully predicted. This is the neural model on its own; if we combine it with a search guided by the gradual type checker, the results get even better. What we see is that 72 percent of all types we would like to add can actually be added in a way that makes the program type correct. At the file level, we find that for 44 percent of the files, the approach can add all types in the file, basically completely annotating that specific file. As a result, this TypeWriter tool is now used at Facebook, where it is deployed as part of the
development infrastructure and makes suggestions to developers, saying: hey, maybe you want to add this type. By now, a couple of thousand of these suggested types have already been accepted by the developers with only minimal changes and have become part of the code base, which in the end makes products like Instagram, WhatsApp, and Facebook itself more reliable, because these annotations are added to the Python code of these projects. All right, this is everything I have to say in this second part of the module on hierarchical neural networks. What you've seen now is one application of hierarchical neural networks, namely type prediction, and what you've also seen is that just using a neural model alone may not be sufficient: you can combine it with other techniques, in this case gradual type checking, to get better results that in the end are more useful to developers. So sometimes it's worth looking beyond just the neural models and combining them with more traditional program analysis techniques. Thank you very much for listening, and see you next time.