Hi and welcome to Program Analysis. In this lecture, we will look into call graphs, which are one representation of a program that tells you what calls may happen when you execute the program. We will look into static call graph analysis, that is, techniques for statically reasoning about the calls that may happen when you execute a program. Most of the examples and analyses we look at here are designed for Java, but they also generalize to other object-oriented languages.
Let's start by looking at what call graphs and call graph analysis actually are. A call graph is an abstraction of the program that considers all the method calls that may happen when you execute this program. In such a graph, you have nodes that represent methods, and you have edges that represent calls, or possible calls, between these methods. All of this is flow-insensitive, which, as we've seen last time, essentially means that the analysis does not consider the execution order. So it doesn't tell you in which order these calls happen. It just tells you that there may be a call from this one method to this other method.
All the approaches we talk about here are static call graph analyses. That means these approaches abstract what calls may happen at runtime without actually executing the program. You can also have dynamic call graph analysis, which we do not cover in this course. It would essentially execute the program and then look at what calls really happen at runtime, which typically gives you an under-approximation of the calls, whereas a static call graph analysis gives you an over-approximation of the calls that may actually happen in the program.
Let's start by looking at an example. That example is an implementation of the observer pattern, where you have some object that may change at some point, and then some other object that is observing the first object in order to get notified whenever such a change happens.
In our example here, we have one class called Subject, which is an extension of the Observable class. So that's the kind of object that you can observe. The Main class itself implements the Observer interface, so this is the class that gets notified whenever the subject changes. Here in the main method of this example, we first instantiate the Main class itself, then create a Subject, and then add the Main object to the subject as an observer in this addObserver call down here. Whenever the subject gets changed, for example because we call this modify method, the subject notifies its observers, and this leads to this update method being called, which then knows that the subject has been changed.
Let's look at the call graph that we would get for this simple example. As I said, every method in the program is represented by a node, and the edges represent the call relationships between these nodes. For the example, we would have one node for the main method of the Main class. Then we would have further nodes for the constructors of the two classes that we have here. There would be one for Main.<init>, where we denote constructors with this <init> notation, similar to what happens in Java, and another one for Subject.<init>, where we also have a constructor. Now, looking at what the main method is actually doing, we see that it invokes the constructors of the Main class and of the Subject class, and therefore we have edges that go from the caller node to the corresponding callee nodes. Then we also have some more calls in the main method. For example, there's a call to Subject.addObserver, and again we have a corresponding edge for that. Finally, there's a call to Subject.modify, and again there is a corresponding edge.
Note that the order of these calls is not specified. I'm drawing them in some order, which happens to be the one that you see in the source code, but this is just coincidental. It's just a graph, and the edges that are outgoing from a specific node are not ordered.
Now let's look at the modify method itself. As you can see, there are two more calls here. One of them goes to Observable.setChanged, so there is another node for that method, which is called from Subject.modify. Then, when the subject has been modified, it notifies its observers, so there's another call to Observable.notifyObservers, and we have another edge for that. Note that when the observers are notified, because of the implementation of Observable, what eventually happens is that Main.update is called. This piece of code is not shown here because it's in the Observable class, but in the complete call graph there is also a call to Main.update that comes from the Observable.notifyObservers method, possibly through some other methods, so I'm just putting a dashed line here to show this. There's also another call in the update method itself, to System.out.println, which I'm also omitting here for simplicity.
So far so good. This was the easy case. The interesting cases occur when the program has polymorphic calls. What is a polymorphic call? A polymorphic call is essentially a call that may go to different methods, where you cannot tell which method is called based only on the statically declared type of the base object. Here's an example of such a polymorphic call: in the main method, we create some collection using this makeCollection method, which is defined down here. What this makeCollection method does is that it sometimes creates an ArrayList and sometimes creates a HashSet. Then, no matter what this collection really is, in the main method we call the add method of this collection.
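As a brief aside on the observer example built up earlier, its call graph can be written down as a plain adjacency map. The following is only a minimal sketch in Python, not part of the lecture's material: the node names follow the spoken notation, the helper names `callees` and `nodes` are made up for illustration, and the dashed edge to `Main.update` is recorded as a direct edge even though it may go through intermediate methods. Outgoing edges are stored in sets, which mirrors the point that they are not ordered.

```python
# Call graph of the observer example as an adjacency map:
# each key is a caller, each value is the unordered set of possible callees.
# "<init>" denotes a constructor, as in the lecture's notation.
call_graph = {
    "Main.main": {
        "Main.<init>",
        "Subject.<init>",
        "Subject.addObserver",
        "Subject.modify",
    },
    "Subject.modify": {
        "Observable.setChanged",
        "Observable.notifyObservers",
    },
    # Dashed edge in the lecture: the call may pass through other methods
    # inside Observable; this sketch collapses it into one edge.
    "Observable.notifyObservers": {"Main.update"},
}


def callees(graph, method):
    """Return the set of methods that `method` may call (empty if none recorded)."""
    return graph.get(method, set())


def nodes(graph):
    """Collect every method that appears as a caller or as a callee."""
    result = set(graph)
    for targets in graph.values():
        result |= targets
    return result
```

With this representation, `callees(call_graph, "Subject.modify")` yields the two `Observable` methods, and `nodes(call_graph)` lists all eight methods mentioned so far. The call from `Main.update` to the print method is left out, as in the lecture.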
What will the call graph for this program look like? Again, we have a node for the main method. Because it calls the makeCollection method, which happens to be implemented in the same class, there is an edge for that. So far so good. The interesting call is the one that goes to c.add. Here the problem is that by just looking at the code, we do not really know which method exactly gets called. If you just look at the main method itself, you only know that c is a Collection, and that means that, in principle, the add method of every subclass of Collection could be the method that gets called here. So there may be a call to ArrayList.add, there may be a call to LinkedList.add, there may be a call to HashSet.add, and many others. If you have a slightly more clever static analysis, it could maybe also find out that in this program c can only be an ArrayList or a HashSet. So we have at least these two nodes here, one for ArrayList.add and one for HashSet.add. But because we statically do not know which of these two gets called, we have two edges that go to these two methods. And again, if the static analysis is a little bit more naive, it would also have edges to many more methods, not just to those of these two subclasses of Collection.
To better understand the problem of building a good call graph, let's look at a slightly larger example, which does not come with concrete source code but just remains an abstract example. In general, a program has many methods, and each of them is represented as a node in this graph here. Some of these nodes are so-called entry points, which are basically methods that you assume can be called from somewhere. For example, if you have a library, these may be all the public API methods that the library exposes.
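Returning briefly to the c.add call site discussed above, the difference between the naive and the slightly more clever analysis can be sketched as follows. This is an illustrative Python model under loud assumptions: the hierarchy is a tiny hand-written excerpt rather than the real Java collections hierarchy, and the names `SUBTYPES`, `DECLARES`, `all_subtypes`, and `possible_targets` are hypothetical helpers, not an existing API.

```python
# Hypothetical, heavily simplified type hierarchy: type -> direct subtypes.
SUBTYPES = {
    "Collection": ["List", "Set"],
    "List": ["ArrayList", "LinkedList"],
    "Set": ["HashSet"],
}

# Assumption of this sketch: which concrete classes implement which methods.
DECLARES = {
    "ArrayList": {"add"},
    "LinkedList": {"add"},
    "HashSet": {"add"},
}


def all_subtypes(type_name):
    """Return `type_name` together with all of its transitive subtypes."""
    result = {type_name}
    for sub in SUBTYPES.get(type_name, []):
        result |= all_subtypes(sub)
    return result


def possible_targets(declared_type, method, instantiated=None):
    """Possible targets of a call x.method() where x has the given declared type.

    Without `instantiated`, every subtype implementing the method is a target.
    With it, only classes the program is known to create are considered.
    """
    candidates = {
        cls for cls in all_subtypes(declared_type)
        if method in DECLARES.get(cls, set())
    }
    if instantiated is not None:
        candidates &= set(instantiated)
    return {f"{cls}.{method}" for cls in candidates}


# Naive view: only the declared type Collection is known.
naive = possible_targets("Collection", "add")
# Refined view: the program only ever creates an ArrayList or a HashSet.
refined = possible_targets("Collection", "add", instantiated={"ArrayList", "HashSet"})
```

In this model, `naive` contains three targets, including `LinkedList.add`, while `refined` keeps only `ArrayList.add` and `HashSet.add`. The call site stays polymorphic in both cases, because neither view can narrow it down to a single method.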
Then, in the program itself, there are lots of call relationships that we have depicted here, and what you see in this version of the graph is that there are a lot of them. In many cases, methods have more than one outgoing edge. There may be different reasons for that. One may be that the method is just calling different methods: it may have different call sites, and it calls one method at the first call site and another method at the second, and then you have two outgoing edges. But it could also be that some of these call sites are polymorphic, or at least potentially polymorphic based on what the static analysis knows. In our example here we have a lot of those. For example, this one is a potentially polymorphic call site, simply because it has two outgoing edges. And then we see many more of those: this one also has two outgoing edges, which could be because it's a polymorphic call site. Same here, same here, same here, same here, same here. Here you even have three outgoing edges, and then again two here and two here.
Now let's assume that this is the most conservative call graph that our static analysis could produce. Then what you actually want is for the static analysis to rule out some of these potential calls, in order to get a smaller call graph that still captures all the calls that may happen at runtime but does not include too many of those that can actually never happen. To do this, one thing a static analysis can do is to distinguish between methods that are reachable and methods that are not reachable, for example because they are certainly not called anywhere in the program. For our example, let's assume that we have some reachable methods, for which we know that they can be reached, and I'll use this green color to denote those. And let's assume we also have some unreachable methods, where we statically know that they can actually not be reached.
For this example, let's say that this one is a reachable method, and that means that this one is also a reachable method, and this one as well. Then some others that are called here are also reachable, let's say also this one and this one, and also this and this and this. But then there are also some that are not reachable. For example, this may be the case for some of these entry points. Let's say you're analyzing a library, but you know for sure that none of the library clients ever calls this one public API method. Then this method is unreachable, and therefore you can discard it from the call graph.
If one of these methods is unreachable, there are also some calls that can never happen, and therefore you will find out that some of the edges may also be unreachable. Let me make the arrows red to denote that fact. So these red edges will be call edges that may actually be eliminated. For our example, let's assume that this edge is one of these edges that may be eliminated. The reason why the analysis may know about this is, for example, that it knows that the type that is called here can actually never be used in this program. If we know that some edge can be eliminated, it may also mean that we do not need the target method node anymore, because if this method is not called anywhere else, it is also unreachable. Let's say this also happens here and at a couple of other places. Then we can, step by step, remove nodes and edges from this call graph. If, like in this example, we have a node that can be reached from some other method, but maybe not from the one above here, then we still need to keep this node, because there is an edge that can reach this method, and therefore it's still a reachable method. For the example, let's assume that we know about a few more edges that are unreachable, let's say these, and also these, and these, and maybe also this one here.
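The pruning idea described here, dropping call edges known to be infeasible and keeping only what is still reachable from the entry points, can be sketched as a simple worklist traversal. This is a minimal Python illustration under stated assumptions: the graph, the node names, and the function `prune` are all made up for this sketch, and the set of infeasible edges is simply given as input, whereas a real analysis would have to derive it.

```python
from collections import deque


def prune(graph, entry_points, infeasible_edges=frozenset()):
    """Return the sub-graph reachable from `entry_points`, skipping infeasible edges."""
    pruned = {}
    seen = set(entry_points)
    worklist = deque(entry_points)
    while worklist:
        caller = worklist.popleft()
        # Keep only the outgoing edges that are not known to be infeasible.
        kept = {
            callee for callee in graph.get(caller, set())
            if (caller, callee) not in infeasible_edges
        }
        pruned[caller] = kept
        for callee in kept:
            if callee not in seen:
                seen.add(callee)
                worklist.append(callee)
    return pruned


# Hypothetical conservative call graph with two entry points.
conservative = {
    "entry1": {"a", "b"},
    "entry2": {"c"},
    "a": {"d"},
    "b": {"d", "e"},
    "c": {"e"},
    "d": set(),
    "e": set(),
}

pruned = prune(
    conservative,
    entry_points=["entry1"],  # assumption: no client ever calls entry2
    infeasible_edges={("a", "d"), ("b", "e")},
)
```

In this toy graph, method `d` survives even though the edge from `a` to `d` is eliminated, because `b` still calls it. Method `e` disappears, since its only remaining caller `c` is reachable solely from the discarded entry point.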
Based on these eliminated edges, we can remove some more methods because they become unreachable. In this example, that would be this one, also this one, and then, because that is unreachable, we can also get rid of this one and also of that one. Now, once the analysis has found out that some of these methods are unreachable and that some of the call edges that we had in the conservative call graph are actually not needed, it can prune the graph by essentially removing everything that is unreachable. In our example, this means that we get rid of all of this, and we can also remove that edge and that edge and those parts here, and then also this edge. The call graph that we now have is much smaller and much closer to what may actually happen at runtime.
In general, what a static call graph analysis wants to do is to prune the call graph as much as possible in order to focus only on feasible behavior, that is, things that may actually happen when you execute the program. In order to do this, it typically wants to minimize three things. One is the set of reachable methods, so that methods that certainly do not get called are removed from the call graph entirely. It also wants to minimize the call edges: even if a method is included and may be called somewhere, it doesn't have to be called everywhere, so we can remove the edges that we know are not feasible at runtime. By doing this, it also tries to minimize the potentially polymorphic call sites, essentially by figuring out that a call is monomorphic, so that we know that one particular method is called instead of considering all the methods that may be called.
All right, so now you've seen what a call graph is, and you know what the goal of a static analysis that builds a call graph is, namely to get a precise but compact call graph, meaning that it has all the edges that may happen at runtime but ideally not too many in addition to that. What we'll do in the remaining videos of this lecture is to look at different algorithms for achieving these goals. We'll start with a relatively simple and efficient set of algorithms called CHA and RTA. Then we will look at slightly more complex algorithms that also consider assignments in the program. Finally, we look at the most complex set of algorithms that we consider in this lecture, which not only create call graphs but at the same time also perform a points-to analysis, which, as we'll see in the fourth video of this lecture, is actually pretty useful to get precise call graphs.
All right, that's all I have in this first video. Thank you very much for listening, and see you next time.