Welcome back to program analysis. This is part two of this lecture on call graph analysis, where we will now look into two algorithms to construct call graphs. They are relatively simple and also relatively efficient, and they give us a call graph that is more precise than the most naive approach one might implement, but not as precise as some of the algorithms that we'll see later in this lecture.
So in general, there are many, many different ways to construct a call graph in a static analysis. What we'll do in this lecture is focus on five algorithms that are relatively popular for doing this, and we'll basically go through them by increasing complexity, starting with the relatively simple ones called class hierarchy analysis (CHA) and rapid type analysis (RTA) here in this video. Then in the next video, we will look at variable type analysis (VTA) and declared type analysis (DTA). Then in video number four, we look at the Spark framework, which is a generalization of all the approaches that we've seen before, and which combines call graph analysis with points-to analysis. Don't get confused by all these abbreviations. They are hopefully there to make your life a little easier, but I'll try to use the full name for these algorithms whenever possible.
So let's start with the simplest of these algorithms, which is the class hierarchy analysis algorithm. What it does can be summarized very briefly. It looks at every polymorphic call site where some method m is called, and then checks the statically declared type T of the base object of this call. Then, for every subtype of T (including T itself) that implements the method m, it adds an edge from the call site to that subtype's implementation of m. So basically it considers all the possible subtypes that the base object may have, and all the methods that may be called because the object may be an instance of one of these subtypes. So let's look back at the example that we had seen earlier in this lecture, in the first video.
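The example itself is shown on a slide that this transcript does not reproduce. A plausible reconstruction in Java might look like the following; the names `CollectionExample` and `useList`, the element type `String`, and the way the branch is chosen are all assumptions made for illustration, not details confirmed by the lecture.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical reconstruction of the running example; all names are assumptions.
class CollectionExample {
    // Returns one of two Collection implementations, chosen at run time.
    static Collection<String> makeCollection(boolean useList) {
        if (useList) {
            return new ArrayList<>();
        }
        return new HashSet<>();
    }

    public static void main(String[] args) {
        // The statically declared type of c is Collection.
        Collection<String> c = makeCollection(args.length > 0);
        // Polymorphic call site: the target depends on the runtime type of c.
        c.add("x");
    }
}
```

The only call site of interest is `c.add("x")`, whose receiver has the declared type `Collection` but can only ever be an `ArrayList` or a `HashSet` at run time.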
So in that example, we had this method called makeCollection that creates a collection and then returns it. And then on this return value, the code is calling the add method. In the initial look at this example, we had assumed that a static analysis is clever enough to find out that this call to add can only go to ArrayList.add or HashSet.add. But a more naive algorithm like the class hierarchy analysis algorithm that we're talking about here doesn't really know this, because all it does is look at the statically declared type of the base object, which is c in this example. And this type is Collection. So it will look at all the implementations of the Collection interface, and each of these implementations that provides an add method is a possible target for the call of c.add.
So for example, it will see that there is a class called EntrySet, which is an implementation of Collection. This class provides an add method, and therefore there will be an edge to that method as well. Or for example, it will also see that there is a LinkedList class, which also offers an add method. And because all we know is the statically declared type of the c variable, it'll also add an edge to that add method. And because there are many, many more subtypes and subclasses of Collection, there will actually be many, many more of these edges in this example. Note also that the add methods are not just those in java.util, but potentially also other add methods provided in custom classes, if these custom classes also implement the Collection interface. If the static analysis sees these classes, then based on what it knows about this call site, it will also assume that these custom classes may be called here.
So summarizing the pros and cons of this approach: on the pro side, we see that it's a very simple approach. It can be explained in pretty much a minute or two.
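To illustrate how little is needed, here is a minimal sketch of the class hierarchy analysis rule over a toy hierarchy. The class `ChaSketch`, its string-based representation of types, and the helper `demoHierarchy` are hypothetical and only for illustration. The sketch follows the simplified rule as stated in the lecture (every subtype that itself implements the method) and does not model methods inherited from a superclass.

```java
import java.util.*;

// Illustrative sketch only; not the implementation of any particular framework.
class ChaSketch {
    // Type name -> names of its direct subtypes.
    private final Map<String, List<String>> subtypes = new HashMap<>();
    // Type name -> names of the methods that the type implements itself.
    private final Map<String, Set<String>> implemented = new HashMap<>();

    void addType(String name, String superType, String... methods) {
        implemented.put(name, new HashSet<>(Arrays.asList(methods)));
        subtypes.putIfAbsent(name, new ArrayList<>());
        if (superType != null) {
            subtypes.computeIfAbsent(superType, k -> new ArrayList<>()).add(name);
        }
    }

    // CHA rule: collect "S.m" for the declared type and every transitive
    // subtype S that implements the method m.
    Set<String> targets(String declaredType, String method) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(declaredType);
        while (!work.isEmpty()) {
            String t = work.pop();
            if (!seen.add(t)) {
                continue; // already visited
            }
            if (implemented.getOrDefault(t, Collections.emptySet()).contains(method)) {
                result.add(t + "." + method);
            }
            work.addAll(subtypes.getOrDefault(t, Collections.emptyList()));
        }
        return result;
    }

    // A tiny stand-in for the part of the hierarchy mentioned in the lecture.
    static ChaSketch demoHierarchy() {
        ChaSketch h = new ChaSketch();
        h.addType("Collection", null); // interface: declares add, implements nothing
        h.addType("ArrayList", "Collection", "add");
        h.addType("HashSet", "Collection", "add");
        h.addType("LinkedList", "Collection", "add");
        h.addType("EntrySet", "Collection", "add");
        return h;
    }

    public static void main(String[] args) {
        // Prints [ArrayList.add, EntrySet.add, HashSet.add, LinkedList.add]
        System.out.println(demoHierarchy().targets("Collection", "add"));
    }
}
```

Even in this toy hierarchy, the call on a receiver of declared type `Collection` gets four targets, although only two of them can occur in the example program.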
Another advantage is that it's correct, in the sense that every call that may happen while the program is executing has an edge in the call graph. So the call graph is not missing anything. And it has very few requirements. All you need is a class hierarchy, which is relatively easy to get for most languages, and you do not need any other analysis information. So it's also pretty easy to actually implement this kind of analysis. But then there are some obvious disadvantages, and the biggest one is that it's a very imprecise analysis. As we have already seen with the simple example, there will be many edges in the call graph that will never get executed, because they represent calls that are not actually possible. Based on this simple approach, the analysis just doesn't know that, and therefore, in order to be correct, it will include these edges.
To address this imprecision of class hierarchy analysis, at least to some extent, there is a variant of it that is slightly more clever, and this is called rapid type analysis. The idea of rapid type analysis is that it's basically like class hierarchy analysis, but it only takes into account those types that the program actually instantiates at least once. So if you think about this collection example, if the program never uses a LinkedList, then we know for sure that this object will not be a LinkedList, and therefore we certainly do not need this edge in our call graph. This is essentially the idea that rapid type analysis implements.
So let's look at the example again. Let's assume that the one piece of code that we see here is the entire program. What we see is that in this entire program, only ArrayList and HashSet are ever instantiated.
That means that, without really knowing what exactly the type of c is here, we know for sure that the only two add methods that may be called here are ArrayList.add and HashSet.add, because ArrayList and HashSet are the only two subtypes of Collection that are ever used in this program. This is why these are the two methods that are included here as possible targets of this call in the main method.
But now let's assume that we have a small variant of this program where, in addition to the code that you've already seen, we also have this pretty useless call of new LinkedList here at the end of the method. It's just calling the constructor, and it's not even assigning the result to anything. But because rapid type analysis will now see that LinkedList is also instantiated at least once in this program, it will consider it a possible type of the receiver at this c.add call. Therefore it will add another node and edge to our graph to reflect exactly this possible call: an additional node LinkedList.add, and another edge that goes from the main method to that new node. So as you can see, rapid type analysis helps to reduce the call graph quite a bit, but it can also be easily fooled by types that are instantiated but cannot actually reach a particular call site.
So summarizing the pros and cons of this rapid type analysis algorithm, we see the following. First of all, it's still pretty fast. If you reason about the complexity of this algorithm, you'll see that it's in the order of the size of the program, because it basically has to go through the program once in order to figure out which classes are instantiated, but it doesn't have to do any more sophisticated analysis. It's also guaranteed to be correct. So whenever it removes a potential call edge, this is certainly an edge that cannot really happen.
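The filtering step just described can be sketched as follows. This is a simplified illustration, not a real implementation: the class `RtaSketch` is hypothetical, the program is represented as plain text, and instantiated types are found with a regular expression over `new` expressions, whereas a real analysis would walk the program's code representation. The candidate list stands in for the subtypes that class hierarchy analysis would report.

```java
import java.util.*;
import java.util.regex.*;

// Illustrative sketch only; textual scanning is a crude stand-in for real code traversal.
class RtaSketch {
    private static final Pattern NEW_EXPR = Pattern.compile("\\bnew\\s+(\\w+)");

    // One linear pass over the program text: collect every type named in a `new` expression.
    static Set<String> instantiatedTypes(String programText) {
        Set<String> result = new TreeSet<>();
        Matcher m = NEW_EXPR.matcher(programText);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // RTA rule: keep only those candidate receiver types that are instantiated somewhere.
    static Set<String> targets(List<String> candidateTypes, Set<String> instantiated, String method) {
        Set<String> result = new TreeSet<>();
        for (String t : candidateTypes) {
            if (instantiated.contains(t)) {
                result.add(t + "." + method);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Subtypes of Collection that implement add (assumed to come from the class hierarchy).
        List<String> candidates = Arrays.asList("ArrayList", "HashSet", "LinkedList", "EntrySet");

        String program = "if (b) return new ArrayList<>(); return new HashSet<>(); c.add(x);";
        // Prints [ArrayList.add, HashSet.add]
        System.out.println(targets(candidates, instantiatedTypes(program), "add"));

        // Variant with an unused allocation: RTA is fooled into adding LinkedList.add.
        String variant = program + " new LinkedList<>();";
        // Prints [ArrayList.add, HashSet.add, LinkedList.add]
        System.out.println(targets(candidates, instantiatedTypes(variant), "add"));
    }
}
```

The second call shows the weakness from the example: the allocation is never assigned to anything, yet it still widens the set of targets.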
And at the same time, it's much more precise than the very simple class hierarchy analysis algorithm that we've seen before, because many unnecessary nodes and edges can get pruned. But as we've also seen, it's far from perfect, and the biggest disadvantage is that it doesn't really reason about assignments. In the example with the new LinkedList constructor call, it doesn't know that the object created by this constructor call will never end up in the variable c on which the add method is called.
All right, so now you've seen two simple algorithms to construct call graphs, both of which are pretty fast and both of which have obvious disadvantages. In the next video, we'll see how to construct a call graph while reasoning about assignments, so that some of the disadvantages that we've seen with these two algorithms are removed. Thank you very much for listening and see you next time.