Hi, and welcome to Program Analysis. In this lecture of the course, we will look into path profiling, which, in short, is an efficient approach to tell which paths through a program are executed. We'll see why this is interesting, what algorithm you can use to solve this question, and then we'll also look into some applications of this idea of path profiling.

This lecture consists of three videos, and this one is the first of the three. So here's an overview. In the first video, I'll give an introduction to the topic and motivate why path profiling is interesting at all. We'll also look into some of the challenges, and we will see why a naive approach, which you might come up with if you think about this problem for a few minutes, actually does not work that well. Then, in the second video, which will probably take the most time, we will look into a very beautiful algorithm that addresses the path profiling problem: the so-called Ball-Larus algorithm. We will go through that algorithm step by step using an example. Finally, in the third video, we look at a generalization of the algorithm from the second part of the lecture, and we'll also look into some applications of path profiling to see where it can actually be used.

As usual, everything I'm saying here is based on some other literature. In particular, there's the paper on efficient path profiling that describes the algorithm we are covering here, which is at the core of this whole lecture. But there are also some other papers that you may want to look at, which generalize this initial idea or show some of the many applications of path profiling.

All right, so let's get started by defining what this problem of path profiling actually is. In short, given a program and some input that makes this program run, path profiling wants to count how often every path
through a function, or more generally through a program, is executed.

Why is this interesting? There are many different applications. I'm listing only three of them here, and we'll look at another one in the third video of the lecture. One application is so-called profile-directed compiler optimization, where the compiler does not optimize the program ahead of time without having seen any runs of the program, but instead looks at an execution profile that tells it something about how the program executes, and then uses this information to optimize the program. Specifically, a path profile can, for example, be used to optimize those parts of a program that are most relevant because they are executed most often.

Beyond automated compiler optimizations, you can of course do the same through manual performance tuning. If a developer wants to know which parts of a program are most worth optimizing, one way to answer that question is to run path profiling, which tells you which paths through a function, or maybe an entire program, are executed the most. This is where you should probably spend your time if you want to tune performance.

Beyond performance, you can also use path profiling in testing, for example, to find out which paths are not yet tested. If you know which paths are executed and how often, you also know which paths through a function, or maybe an entire program, are not tested at all, or are not tested well enough and should get a bit more testing time.

Now, if you hear for the first time about this idea of counting how often every path through a function or a program is executed, it may sound like a simple problem. But doing this is actually not so easy, because of three challenges that I'm listing here. The first is that we would like an approach that imposes low runtime overhead. Of course, if you have all the resources in the world, it's relatively easy to come up
with an approach. But in practice, you do not want to slow down the program too much when you do path profiling, and doing this whole analysis with low runtime overhead is actually a pretty interesting challenge. The second challenge that needs to be addressed here is accuracy. What we want is a precise path profile that really tells us how often every path has been executed, without any heuristics or approximations. Of course, you could take a different design decision and say it's okay to use heuristics and approximations. That's also a valid choice, but what we look at here aims at precise profiles that do not approximate the correct profile. And finally, there may be infinitely many paths through a program, in particular because of loops and also recursive calls, which essentially mean that there are cycles in the control flow. How to deal with these potentially infinitely many paths is part of the challenge of addressing the path profiling problem.

To make all of this more concrete, we'll use a running example throughout this lecture. We look at this example not as source code, but directly as a control flow graph. The reason is simply that by now you've seen enough examples where we went from source code to a control flow graph, so I'm assuming that you know how to do this.
So the graph that we look at looks like this. We have these basic blocks, which could also be statements, that I'll call A, B, C, D, E, and F. Then we have edges that tell us which statement or basic block may be executed after another. So B may be executed after A, but C may also be executed after A, so there's some kind of conditional. After B, C may be executed. Then there is D, which may be executed both after B and after C. And down here we again have this kind of situation: after D, either E or F may be executed, and after E we may also get to F.

Assuming that A is our entry node and F is our exit node, you can now think about which different paths through this control flow graph exist. Let's draw a little table to do this. I'll give each path a number and list the nodes that the path goes through. What we want to know at the end, and that's the problem of path profiling, is the frequency of each of these paths: given inputs that may go through the piece of code shown in this control flow graph multiple times, how often is each of these paths actually executed? This is the big question that we hope to answer through path profiling.

So let's now go through all the possible paths one by one. I'm starting with the path that goes from A to C to D to F, so basically like this. That's one of them. Then there's another one that goes from A to C to D, but then also goes to E before going to F, so this one looks like this. And then, of course, we can also take the turn from A to B at the beginning.
For example, there's one path that goes from A to B, then to C, then to D, and then to F, which looks like this. Then there's yet another one, path number three in the table (the numbering starts at zero), that also goes from A to B, then to C, then to D, but then also goes through E before ending in F. Then there's yet another one that takes a short path, because it goes from A to B directly to D and then to F, like this. And finally, there's one left that we haven't covered yet, which is A, B, D, E, F, like this. Now, given these six different paths, if we execute this function, or this piece of code, multiple times, the question is how often each of them is executed, and this is the question that we want to answer here.

Looking at this problem, one idea that you might have is what is called edge profiling. This approach looks at each of the edges in the control flow graph and counts how often each edge is executed. In practice, if you implemented this, you would instrument every branching point in the code, so that whenever there's more than one edge that could be taken, you know which one is actually taken. At the end, you know how often each of the control flow graph edges was executed. Then, given these individual edge counts, one could compute, or maybe estimate, the most frequent path by always following the most frequent edge. We would start at the entry node, and whenever there's more than one outgoing edge, we would follow the edge with the higher frequency until we reach the exit node, and then assume that this path is also the most frequent one.

So let's illustrate this idea of edge profiling with our running example. This is the same control flow graph that we've seen before, and now let's assume that we have computed how often each of the edges in this graph is taken, and that we have
associated a frequency value with each of these edges. For the example, let's say that the edge from A to B has been executed 120 times and the edge from A to C 150 times. This means the code has reached A 270 times, and out of those it has continued with B 120 times and with C 150 times. Then let's say that the edge from B to C has been executed 100 times, the edge from B to D 20 times, the edge from C to D 250 times, the edge from D to E 160 times, the edge from D to F 110 times, and the edge from E to F 160 times.

Now, using this edge profiling idea, we could try to answer the question of which of all these paths through the program is the most frequent one. As I said, the idea here is to look at all nodes with more than one outgoing edge and always take the edge with the higher frequency. So we start at A, because that's the entry node. We can choose whether to go to B or to C. The edge from A to C has the higher frequency, so we go to C, and at C there's only one option, which is to go to D. At D we have two options again: we could go to E or to F. Based on the edge profiling idea, we assume that the edge with the higher frequency is taken, which is the one that goes to E, and from E there's only one choice. This gives us the supposedly most frequent path A, C, D, E, F.

From the way I'm explaining this approach, you might already see that this may not be the correct way to do it. If you're a bit skeptical, you may ask: is this really the most frequent path, and is this the only possible correct answer here?
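To make the naive approach concrete, here is a minimal Python sketch, not taken from the lecture, that encodes the running example's control flow graph and the edge counts above, enumerates the six entry-to-exit paths, and applies the "always follow the most frequent edge" heuristic. The names `CFG`, `EDGE_PROFILE`, `all_paths`, and `greedy_hot_path` are hypothetical, and representing a path as a string of single-letter node names is an assumption that only works for this small acyclic example.

```python
# Minimal sketch of the running example and the naive edge-profiling heuristic.
# Assumption: node names are single letters, so a path is a string like "ACDEF".

# Successor lists. The order is chosen (an assumption for readability) so that
# the enumeration below lists the paths in the same order as the lecture.
CFG = {
    "A": ["C", "B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["F", "E"],
    "E": ["F"],
    "F": [],
}

# Edge frequencies from the example (the edge profile).
EDGE_PROFILE = {
    ("A", "B"): 120, ("A", "C"): 150,
    ("B", "C"): 100, ("B", "D"): 20,
    ("C", "D"): 250,
    ("D", "E"): 160, ("D", "F"): 110,
    ("E", "F"): 160,
}


def all_paths(cfg, node="A", exit_node="F"):
    """Enumerate every entry-to-exit path of an acyclic CFG, depth first.

    This only terminates because the graph has no cycles."""
    if node == exit_node:
        return [node]
    return [node + rest
            for succ in cfg[node]
            for rest in all_paths(cfg, succ, exit_node)]


def greedy_hot_path(cfg, edge_profile, entry="A", exit_node="F"):
    """Naive edge-profiling heuristic: at every branch, follow the hotter edge."""
    node, path = entry, entry
    while node != exit_node:
        # Pick the successor whose incoming edge from `node` has the highest count.
        _, node = max((edge_profile[(node, succ)], succ) for succ in cfg[node])
        path += node
    return path


if __name__ == "__main__":
    print(all_paths(CFG))  # ['ACDF', 'ACDEF', 'ABCDF', 'ABCDEF', 'ABDF', 'ABDEF']
    print(greedy_hot_path(CFG, EDGE_PROFILE))  # ACDEF
```

Running it lists the six paths in the order discussed above and picks A, C, D, E, F as the supposedly hottest path. The recursive enumeration is fine here only because the graph is acyclic; with loops there would be infinitely many paths, which is one of the challenges mentioned earlier.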
To answer that question, let's have a look at two different path profiles that may actually have happened. In this table, you can again see the six paths from the previous handwritten slide, that is, all six ways to go from A to F. Now let's assume that we have executed the program that this control flow graph is part of two times, and that we have obtained path profiles with some approach that I haven't explained yet. In profile number one, the very first path, which goes from A to C to D to F, has been executed 90 times. The second path has been executed 60 times. The third one, the A B C D F path, has never been executed. The fourth one has been executed 100 times, and the last two 20 and zero times. This is one profile that we may have gotten by exercising the program with some inputs. Now let's say we exercise the program again with some other inputs and get another profile, called profile two. Again, I'm just giving some numbers here that we may have obtained, which tell us how often each of these paths has been executed.

If you look carefully at these numbers and do the math, you will see that both profile one and profile two would have produced exactly the blue edge frequencies. So both of these profiles are compatible with the edge frequencies annotated on this graph. But now look at the most frequent path in each of these profiles, starting with profile one.
It's actually this one, A, B, C, D, E, F, which is different from the path we computed above, whereas for profile two the most frequent one is A, C, D, F, which is also different from the supposedly most frequent path. The bottom line is that edge profiling, even though it sounds simple and maybe very efficient, doesn't really tell us the most frequent path, simply because the information it gathers is too local, in a sense. It doesn't tell us enough about the entire execution of the program and how it goes through the control flow graph. So, coming back to the general idea of edge profiling: what we've seen in this example is that edge profiling fails to uniquely identify the most frequent path and may actually give us a wrong answer. So this is not the solution we have been looking for.

All right, and this is already the end of the first video in this lecture on path profiling. You now know what the problem of path profiling is, and you've at least seen how not to solve it. I hope you are now interested in the second video, where we will look at the Ball-Larus algorithm, which actually solves this problem of path profiling. Thank you very much for listening, and see you next time.
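The claim that both profiles are compatible with the same edge counts can be checked mechanically. The following Python sketch, not part of the lecture, derives the edge profile implied by a path profile and reports each profile's hottest path. Profile one uses the numbers stated in the lecture; since the transcript does not state profile two's numbers, `PROFILE_2` is a hypothetical profile constructed here only so that it also matches the edge counts and has A, C, D, F as its most frequent path. All function and variable names are illustrative.

```python
from collections import Counter

# Edge counts annotated on the example graph.
EDGE_PROFILE = {
    ("A", "B"): 120, ("A", "C"): 150,
    ("B", "C"): 100, ("B", "D"): 20,
    ("C", "D"): 250,
    ("D", "E"): 160, ("D", "F"): 110,
    ("E", "F"): 160,
}

# Profile one as stated in the lecture (path -> execution count).
PROFILE_1 = {"ACDF": 90, "ACDEF": 60, "ABCDF": 0,
             "ABCDEF": 100, "ABDF": 20, "ABDEF": 0}

# Hypothetical second profile, chosen here for illustration so that it matches
# the same edge counts; the lecture's slide values may differ.
PROFILE_2 = {"ACDF": 110, "ACDEF": 40, "ABCDF": 0,
             "ABCDEF": 100, "ABDF": 0, "ABDEF": 20}


def implied_edge_profile(path_profile):
    """Add each path's count to every edge the path traverses."""
    counts = Counter()
    for path, freq in path_profile.items():
        # Consecutive node pairs of the path are its edges (single-letter nodes).
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += freq
    return dict(counts)


def hottest_path(path_profile):
    """Return the path with the highest execution count."""
    return max(path_profile, key=path_profile.get)


if __name__ == "__main__":
    for name, profile in [("profile 1", PROFILE_1), ("profile 2", PROFILE_2)]:
        same_edges = implied_edge_profile(profile) == EDGE_PROFILE
        print(name, same_edges, hottest_path(profile))
    # profile 1 True ABCDEF
    # profile 2 True ACDF
```

Both profiles reproduce the annotated edge counts exactly, yet their hottest paths differ from each other and from the A, C, D, E, F path that the greedy edge heuristic picks. Edge counts alone therefore cannot distinguish these path profiles.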