Hi and welcome back to program analysis. We are in the lecture on call graph analysis, and this is part three of this lecture, in which we'll look at another set of two algorithms for constructing call graphs in a static analysis, namely VTA and DTA. You'll see in a second what these names mean. The key idea of these two algorithms is that, in contrast to what we've seen in the previous lecture, they reason about assignments in the program, and by doing this they can rule out some calls that may happen according to the analyses from the previous video, but that actually cannot happen once you reason about assignments.

The first of the two algorithms that we want to look at is called VTA, or Variable Type Analysis. As I said, the main idea of this and also the next algorithm is to reason about assignments, and VTA does that in order to infer what types the objects involved in a call may actually have. For example, if you know the possible types of the base object of a call, then you can rule out call targets that belong to types the base object cannot have. With this type information, Variable Type Analysis can prune calls and also nodes from the call graph that are infeasible at runtime.

Before explaining exactly how the algorithm works, let's have a look at a simple example. In this example, some class X is instantiated, then the variable that stores this instance of X is assigned to a different variable, and this is then passed as an argument into a call to f. Not knowing anything else, we look at the two classes A and B that implement this method f, so we know that the call will end up in one of these two methods.
Now looking at the assignments, which is what this algorithm is basically about, we see that the newly created object is assigned to a, then a is assigned to b, and because of the call, b is assigned to the parameter c of one implementation of f, and it may also be assigned to the parameter c of the other implementation of f. In a nutshell, what the algorithm does for this example is create a graph in which every variable in the program is represented by a node. We will have one node for the variable a, another node for the variable b, another node for the parameter c of f in class A, which we write as A.f.c, and one more for the other implementation of f, which also has a parameter called c, and that's B.f.c. Then the algorithm looks at the types that are propagated through assignments. Initially we have type X at a, because we know that this variable holds the newly created instance of X. These types are then propagated along the assignments, so that for this simple example we end up with a graph where we know that b may also have type X, A.f.c may also have type X, and B.f.c may also have type X. So this is the idea in a nutshell, and now we'll see how this works in general and also look at slightly more complex examples.

In general, this algorithm consists of four steps, which are needed to propagate the types through the program. The first one is that we need some initial conservative call graph, because we want to propagate information from a call site to the parameters that the arguments may flow to. We need to know where calls may end up, so we already need a call graph to start with, and this call graph needs to be conservative in the sense that it is guaranteed to contain all the edges that may happen at runtime, because otherwise the result of the VTA algorithm will not be conservative either.
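To make the small example with a, b, and the two parameters c concrete, here is a minimal sketch in Python. The slide's code is not part of this transcript, so the edge list and the node names are a reconstruction from the description above, and the loop is a plain fixed-point iteration that I use here for brevity, not the single-pass scheme that VTA uses.

```python
from collections import defaultdict

# Assumed encoding of the example: each pair (src, dst) means
# "the value of src may flow into dst".
edges = [
    ("a", "b"),      # b = a
    ("b", "A.f.c"),  # the call passes b; it may reach parameter c of A.f ...
    ("b", "B.f.c"),  # ... or parameter c of B.f
]

# Initial types come from allocation sites: a = new X()
types = defaultdict(set)
types["a"].add("X")

# Naive fixed-point propagation: repeat until no type set grows anymore.
changed = True
while changed:
    changed = False
    for src, dst in edges:
        missing = types[src] - types[dst]
        if missing:
            types[dst] |= missing
            changed = True

for node in ["a", "b", "A.f.c", "B.f.c"]:
    print(node, sorted(types[node]))
```

Every node ends up with the single type X, which matches the graph described for this example.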
For example, we could use one of the two algorithms that we've seen in the last video, class hierarchy analysis or rapid type analysis, to construct this initial call graph. Once we have the initial call graph, we can build the initial type propagation graph, which is the step that I've shown you for the simple example on the previous slide. Then, as an optimization, the algorithm collapses strongly connected components. This is just an optimization and doesn't strictly need to be done, but it helps to make the remaining propagation a little faster. Once this is done, the final step is to propagate the types in a single iteration through the graph, and we'll see how this works in a second.

To build the initial type propagation graph, the algorithm looks at all the assignments that may happen in the program. For example, if you have a statement a = b, so an assignment of b to a, which happens to be in a method C.m, then we get two nodes that represent these two variables, C.m.a and C.m.b, and an edge from C.m.b to C.m.a that corresponds to the data flow that the assignment induces. As another example, take a statement a.f = b, where b is assigned to the field f of some variable a, and let's assume that f is a field of a class A. Then we represent this field as A.f, which stands for the field f of all instances of class A, and we say that the variable b, which let's assume is again in C.m, gets assigned to this field by adding an edge from C.m.b to A.f to the graph.

As an example, let's assume we have a piece of code with three classes A, B, and C, which are in a subtype relationship so that B is a subtype of A and C is a subtype of B, and then we have a couple of variables a1, a2, and so on of these types.
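The two edge rules described above, one for a = b and one for a.f = b, can be sketched as follows. The statement classes Assign and FieldStore and the function edge_for are hypothetical names that I introduce only for illustration; a real analysis would work on the intermediate representation of a compiler or analysis framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """A statement `lhs = rhs` inside the method `method` (e.g. "C.m")."""
    method: str
    lhs: str
    rhs: str


@dataclass(frozen=True)
class FieldStore:
    """A statement `a.field = rhs`, where `base_class` is the class declaring the field."""
    method: str
    base_class: str
    field: str
    rhs: str


def edge_for(stmt):
    """Return the (source, target) edge that a statement induces."""
    if isinstance(stmt, Assign):
        # Local variables are qualified by their enclosing method.
        return (f"{stmt.method}.{stmt.rhs}", f"{stmt.method}.{stmt.lhs}")
    if isinstance(stmt, FieldStore):
        # Field-based representation: one node per (class, field),
        # shared by all instances of that class.
        return (f"{stmt.method}.{stmt.rhs}", f"{stmt.base_class}.{stmt.field}")
    raise TypeError(f"unsupported statement: {stmt!r}")


print(edge_for(Assign("C.m", "a", "b")))           # a = b in C.m
print(edge_for(FieldStore("C.m", "A", "f", "b")))  # a.f = b in C.m
```

Note that the base variable a does not appear in the target node of a field store, only its class does.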
We instantiate objects of these classes and assign them to the variables a1, a2, and so on, and then further down we have a list of assignments where we assign some of the variables to other variables. Note in particular that in one of them we are assigning a3 to a variable declared to be of type B, and therefore we need a cast to make the code type correct.

Let's now look at what the type propagation graph for this example looks like. We have one node for every variable in this piece of code: one for a1, another one for a2, yet another one for a3, the same for all the B variables b1, b2, and b3, and finally one for c. Now the algorithm looks at all the assignments in the program where one of these variables is assigned to another one. For example, because we have the assignment where a2 is assigned to a1, meaning that a2 flows to a1, we have the corresponding edge from a2 to a1 in our graph. Similarly, for the second assignment, the one where a1 is assigned to a3, we have an edge going from a1 to a3. This is done for all the assignments, so we also get an edge for a3 getting the value of b3, then the inverse edge for the assignment with the cast, an edge for b2 being assigned to b1, and finally one for c also being assigned to b1.
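As a rough illustration of this step, the sketch below turns the copy assignments of the example into graph edges. The statement list is my reconstruction of the slide from the description above, and the regular expression is a deliberately simple stand-in for a real parser: it only understands statements of the form x = y; or x = (T) y;.

```python
import re

# Reconstructed assignments of the running example (assumption: the slide
# itself is not part of the transcript).
statements = [
    "a1 = a2;",
    "a3 = a1;",
    "a3 = b3;",
    "b3 = (B) a3;",
    "b1 = b2;",
    "b1 = c;",
]

# lhs = [optional cast] rhs ;
ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(?:\(\s*\w+\s*\)\s*)?(\w+)\s*;\s*$")


def edges_from(stmts):
    """Map each copy assignment to an edge (source variable, target variable)."""
    edges = []
    for stmt in stmts:
        match = ASSIGN.match(stmt)
        if match is None:
            raise ValueError(f"not a simple copy assignment: {stmt!r}")
        lhs, rhs = match.groups()
        edges.append((rhs, lhs))  # the value flows from rhs to lhs
    return edges


for src, dst in edges_from(statements):
    print(f"{src} -> {dst}")
```

The cast does not create a node of its own in this sketch; it only contributes the edge from a3 to b3.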
Having this graph, the algorithm now looks at the call sites of constructors in order to find out what types these variables may have. Initially, it just assigns the instantiated type of every constructor call to the corresponding variable. Based on the constructor calls in this example, we say that a2 has type A, and the same for a1. These are always sets of types, but initially there is exactly one type in each set, namely the one that gets instantiated. The same holds for b2 and b1, where we know that their type can only be B at this point, and for the variable c, which initially has type C.

Next, the algorithm collapses strongly connected components. A strongly connected component is a subgraph in which every node can reach every other node along edges, and in this example we have exactly one, formed by a3 and b3. This means that we can treat these two nodes as a single node and do not have to propagate types within the component, because whatever type one of them has, all of them will have.

The final step of the algorithm is to propagate the types along the edges of our graph. We start with the edge from a2 to a1, which means that we take the types at the source of this edge, in this case type A, and add them to the types at the target of this edge. The target already contains A, so for this particular edge we do not have to do anything.
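The three steps just described (initial types, collapsing strongly connected components, one propagation pass) can be sketched in Python as below. The edges and initial types are reconstructed from the description of the example; a3 and b3 get no initial type because no constructor call is mentioned for them. Using Kosaraju's algorithm to find the components is my choice for this sketch, the lecture does not prescribe one.

```python
from collections import defaultdict

nodes = ["a1", "a2", "a3", "b1", "b2", "b3", "c"]
edges = [("a2", "a1"), ("a1", "a3"), ("b3", "a3"),
         ("a3", "b3"), ("b2", "b1"), ("c", "b1")]
initial = {"a1": {"A"}, "a2": {"A"}, "b1": {"B"}, "b2": {"B"}, "c": {"C"}}


def sccs(nodes, edges):
    """Kosaraju's algorithm. Components are returned in topological order."""
    succ, pred = defaultdict(list), defaultdict(list)
    for s, d in edges:
        succ[s].append(d)
        pred[d].append(s)

    order, seen = [], set()

    def visit(n):  # first pass: record finishing order on the graph
        seen.add(n)
        for m in succ[n]:
            if m not in seen:
                visit(m)
        order.append(n)

    for n in nodes:
        if n not in seen:
            visit(n)

    comps, comp_of = [], {}

    def collect(n, idx):  # second pass: walk the reversed graph
        comp_of[n] = idx
        comps[idx].add(n)
        for m in pred[n]:
            if m not in comp_of:
                collect(m, idx)

    for n in reversed(order):
        if n not in comp_of:
            comps.append(set())
            collect(n, len(comps) - 1)
    return comps, comp_of


def vta_types(nodes, edges, initial):
    comps, comp_of = sccs(nodes, edges)
    types = [set() for _ in comps]
    for var, ts in initial.items():
        types[comp_of[var]] |= ts  # members of a component share one set
    out = defaultdict(set)
    for s, d in edges:
        if comp_of[s] != comp_of[d]:
            out[comp_of[s]].add(comp_of[d])
    for i in range(len(comps)):  # single pass, sources first
        for j in out[i]:
            types[j] |= types[i]
    return {n: set(types[comp_of[n]]) for n in nodes}


result = vta_types(nodes, edges, initial)
for n in nodes:
    print(n, sorted(result[n]))
```

Because the collapsed graph has no cycles and the components are visited in topological order, every edge needs to be processed only once, which is why a single pass is enough.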
It gets more interesting for the next edge, from a1 into the large node that represents the strongly connected component, because here we propagate type A to that node. This means we now know that each of the two variables a3 and b3 can have type A. Let's do the same for the remaining two edges. Looking at the edge from c to b1, we add type C to the types that b1 may have, so it's not only B anymore but can be B or C. Propagating along the edge from b2 to b1 doesn't change anything, because we already know that b1 may have type B.

Knowing the types that these variables may have helps the algorithm to construct a more precise call graph. In this small example we do not actually have any interesting calls where this could help, but just to complete the example, let's assume that we have an additional call at the end, say b1.m(), and that this method m is implemented by A, B, and C. A very naive approach would say that we could call A.m, B.m, or C.m here. With this algorithm, however, we know that b1 can only have type B or C, which means that the call will never reach A.m but only B.m or C.m.

As a little side note, one interesting question in this kind of algorithm is how to represent fields. If you have a field of an object and you want to reason about the types that this field may have, there are different ways to represent it. One option is to represent the field as the field of this specific object: if you have an object stored in variable a and the field f of that object is accessed, you just represent this as a.f. This is called field sensitive, and it is the most precise way of doing this but also the most expensive, because you have to reason about all the different variables and their fields separately. Another option is what is called field
insensitive. In this case, you represent all fields of a variable a as the same field, so you basically collapse all the fields that a class may have into just one artificial field, which is more manageable in the analysis but of course less precise. The third approach is kind of in between these two, and it is called field based. Here the idea is that you're not collapsing all the fields of a class, but you're collapsing all the instances of a class with respect to a particular field. You represent a field a.f as A.f, where A is the class of your variable a, so that all the instances of that class are thought to share the same field f. In practice this is of course not the case, but it makes the analysis more scalable because it doesn't have to distinguish between the fields f of all the different instances of class A.

Now you may wonder which of these approaches the variable type analysis algorithm actually uses. The answer is that VTA is field based, so it collapses all the different instances of a class A with respect to a particular field f. This way it doesn't have to distinguish between all the different variables and the fields that they have, and it can scale more easily to larger programs.

Let's summarize this algorithm, variable type analysis or VTA. It has a couple of advantages. It's more precise than the two previous algorithms that we have seen, and in particular it's more precise than rapid type analysis, RTA, because it considers only those types of a variable or field that may actually reach the particular call site. It can do so because it reasons about assignments instead of just looking at all the types that are instantiated somewhere in the program. At the same time, it's still relatively fast because it propagates information only once through the graph, which allows the algorithm to scale to
relatively large programs. On the downside, the VTA algorithm requires some initial call graph, because it's actually a refinement algorithm: it needs the initial call graph in order to know which concrete arguments are propagated to which parameters, and then it refines this initially given call graph. The other downside is that it still has some imprecision, for example because it's a field-based analysis, which, as we've seen on the previous slide, merges fields that are actually not the same at runtime in order to reduce the overhead of the analysis.

The second algorithm that we want to briefly discuss in this lecture is called declared type analysis, or DTA, and it can be seen as a variant or maybe the little brother of the variable type analysis that we've just seen. It also reasons about assignments that happen in the program, but it doesn't do this based on all the different variables and fields in the program. Instead, it works on the declared types that are assigned to each other. So it's not per variable but per type, which makes the whole algorithm more scalable but also less precise.

Let's have a look at what declared type analysis does for the example that we've just seen. We again have a graph, but now the nodes in our graph are not the variables and fields of the program but the types that are present in the program. In this example we have one node for type A, another node for type B, and yet another node for type C. Now, looking at the assignments in the program, we add edges to this graph. For example, because we have an assignment where we are assigning something of declared type B to something of declared type A, we have an edge that goes from B to A. Implicitly there are of course also edges between A's, but these are not represented in this graph, because all variables of declared type A are represented by a single node anyway. Then just one
line below we also have an assignment from an A to a B, so we also have the reverse edge, and because in the last line of the program we have an assignment from something declared to be C to something declared to be B, we also have an edge from C to B in our graph.

Apart from the fact that this graph is not about variables but about types, the rest of the algorithm is essentially the same. We add the initial types to each of the nodes, then merge strongly connected components in the graph, and then propagate types in order to get a final solution. Adding the initial types is pretty trivial: since each node represents a type, the initial type is exactly that type itself, so for A it's A, and so on. Then the algorithm looks for strongly connected components, which here means that it finds the one formed by A and B, because these two nodes are connected to each other in both directions. Since we merge these nodes, we also merge their types, which means that all variables declared to be A or B may have either type A or type B. The final step is to propagate the types according to the graph that we have now. In this case there's just one edge along which we need to propagate, namely the one from C, and this means that we propagate type C into the set of types of the new big node. This tells us that every variable that is declared to be either A or B may hold objects of type A, B, or C.

Now let's compare what we get from declared type analysis with what we have seen in variable type analysis. If we go back to what we found there, we see that for variable c we essentially find the same: according to VTA, c may have type C, and this is also what we find from DTA, because c has declared type C and that node may only have type C. So for that variable
everything stays the same. But the second algorithm is actually less precise for some of the other variables. For example, for variable b2 the VTA algorithm has told us that it may only have type B. If we now look at what we know about b2 according to the DTA algorithm, we see that because b2 has statically declared type B, it's one of the variables represented by the big node, and according to DTA this node has three possible types, A, B, and C, which is of course less precise than just saying that b2 has only type B. So DTA is simpler and the graph is smaller, and therefore everything is a little faster, but at the same time it's less precise.

Just to summarize the pros and cons of this algorithm: it's definitely faster than VTA because the graph is much smaller, and as a result the whole propagation is faster, which allows this algorithm to scale to larger programs. At the same time, it's more precise than the rapid type analysis algorithm that we've seen in the previous video. On the downside, DTA is less precise than its bigger brother, variable type analysis, simply because it does not distinguish between different variables that have the same statically declared type and therefore merges all their information. Because it always merges this information in a conservative way, so that it doesn't miss any type that a variable may have, it's less precise than VTA.

All right, and that's all that I have in video number three of this lecture on call graph analysis. You have now seen four different algorithms, two of which do not reason about assignments and two of which, seen in this video, do reason about assignments. What we'll do in the next and final video is have a look at an even more sophisticated algorithm for computing call graphs, which also looks at the objects that a variable may point to. Thank you very much for listening and see you next time.
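As a recap of the DTA part of this video, here is a minimal Python sketch of declared type analysis on the running example. The variable declarations and assignments are reconstructed from the description of the slide, and the propagation is a plain fixed-point loop that I use instead of collapsing strongly connected components, which gives the same type sets on this small graph.

```python
# Declared type of every variable in the example (reconstruction).
declared = {"a1": "A", "a2": "A", "a3": "A",
            "b1": "B", "b2": "B", "b3": "B", "c": "C"}

# Copy assignments as (source variable, target variable).
assignments = [("a2", "a1"), ("a1", "a3"), ("b3", "a3"),
               ("a3", "b3"), ("b2", "b1"), ("c", "b1")]

# One node per declared type; edges between equal types are dropped because
# all variables of the same declared type share a single node anyway.
type_edges = {(declared[s], declared[d])
              for s, d in assignments
              if declared[s] != declared[d]}

# Every type node starts with exactly itself.
dta_types = {t: {t} for t in set(declared.values())}

# Fixed-point propagation along the type-level edges.
changed = True
while changed:
    changed = False
    for src, dst in type_edges:
        if not dta_types[src] <= dta_types[dst]:
            dta_types[dst] |= dta_types[src]
            changed = True

# A variable may have whatever types the node of its declared type has.
per_variable = {v: dta_types[t] for v, t in declared.items()}

for v in sorted(per_variable):
    print(v, sorted(per_variable[v]))
```

For b2 this yields the types A, B, and C, compared to only B under VTA, while c keeps the single type C in both analyses.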