Welcome back to the program analysis course and to this second part of the lecture on dataflow analysis. In the very first video on dataflow analysis, we've seen an example of a dataflow analysis, namely the available expressions analysis. What I want to do in this video is to explain the basic principles underlying this whole idea of dataflow analysis, for which you've already seen an example in the first video.

So here's, just briefly again, an outline of this whole lecture. We have already done the first video, and now we are here in the second video, where I'm going to talk about the basic ideas that are valid for every dataflow analysis, including the available expressions analysis that you've seen earlier.

In order to define a dataflow analysis, there are six important properties, and in a sense you just have to remember these six properties, because everything else is derived from them. A dataflow analysis is defined first by a domain. The domain basically tells us what kind of information the analysis is reasoning about, that is, what the analysis state is that the analysis tries to propagate through the program.

Then every dataflow analysis has a direction, which tells us whether it's a forward or a backward analysis: whether it reasons about the code in the order in which statements are actually executed, or reasons about the code backwards, which for some analysis problems makes more sense than forward reasoning.

Next, we also have to define a transfer function for every dataflow analysis, which you've already seen in the first video through an example. It basically defines what happens when the analysis reaches a particular statement: how does the analysis state change when the statement is considered by the analysis?
Then we'll have to define a meet operator, which basically tells us what happens if the flow of control merges or, if you're reasoning backward, when there is a branch. So it tells us what happens if there are two or more incoming statements whose information needs to be combined by the analysis before it can be propagated.

Then you always have to define the boundary condition, which is about what happens when you do not know anything else. For example, this could be to assume you have an empty set of whatever you're reasoning about, or the maximal possible set of whatever you're reasoning about. We'll see what exactly this means in a few seconds.

And then you also have to define the initial values, telling us, for example, what happens in case you enter a piece of code: what is the set of information that the analysis starts with?

Let's now look into these six properties in some more detail, and let's get started with the domain. What a dataflow analysis does is compute some kind of information at every point in your program, or more specifically, at the entry and exit of every statement in your program. When we say information, what this basically means is that we want to compute a set of things at each program point. All these sets that we compute at each program point are subsets of a larger set, and this larger set is called the domain of the analysis. So the domain of the analysis contains all the possible elements that a set at a particular program point may have.

For the example analysis that we've already seen, the available expressions analysis, the domain is the set of all non-trivial expressions. So all the non-trivial expressions that occur somewhere in the program are in the set of possibly available expressions that we may see at the different points in the program, and what the analysis then does is compute the right subset for every program point.

The second property of a dataflow
analysis is about the direction. As I've already briefly mentioned, this is about whether the analysis propagates information along the control flow graph following the direction of the edges as we normally see them in the control flow graph. In this case the analysis is called a forward analysis, because it just propagates information forward along the normal flow of control. In contrast, an analysis can also be a backward analysis, which essentially means that you take the control flow graph and invert all edges, so you just turn them around. That means the analysis is reasoning about executions in reverse: you can think of it as starting at the end of the program and then going backwards in order to compute whatever the analysis wants to compute.

We'll see examples of backward analyses in a later video. The example that you have already seen, the available expressions analysis, was a forward analysis, because we started at the entry node of our control flow graph and then followed the control flow edges in the normal direction until we reached the end of the code that was to be analyzed.

Property number three of a dataflow analysis is the transfer function. What this transfer function does is define how a statement affects the information that is propagated by the analysis. The way this is written down is as an equation that tells us what the state of the analysis is at the exit of a statement, given what we know at the entry of this statement and maybe some other information. So in general, for any kind of dataflow analysis, where DF here stands for dataflow analysis, the transfer function tells us about the state at the exit of a statement s, DF_exit(s), by defining it as some function of DF_entry(s), so basically the state of the analysis at the entry of the statement s. Now, this is the most general way to put this transfer function. We have already seen the
example of the available expressions analysis, where we had defined the transfer function as follows. In this case we say AE for available expressions, and we say that at the exit of a statement s, what we know is what we have known at the entry of the statement, so AE_entry(s), from which we remove everything that is defined by the helper function called kill, and to which we afterwards add new elements by adding everything that is in gen(s), where gen is the other helper function that tells us what a statement s is generating. In principle, you do not have to define the transfer function based on a kill set and a gen set, but in practice, for most dataflow analyses, this is the most natural way to define the transfer function. So you'll very often see this kind of definition, where it's whatever you have at the entry, minus whatever the statement kills, plus whatever is generated by the statement.

Property number four of a dataflow analysis is the so-called meet operator. What the meet operator does is answer this question: what if you have two statements s1 and s2 that both flow to a statement s? What kind of information should we propagate? Depending on whether we are reasoning forward or backward, this question comes up in different kinds of situations.

If you have a forward analysis, then this question comes up whenever execution branches merge again. For example, if you have an if somewhere and the execution is branching, then at some point the then branch and the else branch will merge again, and this is where this question comes up: do we take the information from the then branch or the information from the else branch?
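As an aside on the kill/gen form of the transfer function described above, here is a minimal Python sketch. It assumes that expressions are represented as plain strings and that the kill and gen sets of a statement are already known; the function name `transfer` and the example statement are illustrative choices, not something taken from the lecture slides.

```python
def transfer(entry, kill, gen):
    """Exit state = (entry state minus kill set) plus gen set."""
    return (entry - kill) | gen


# Hypothetical statement: x := a + b
# Assumed kill set: expressions that mention x. Assumed gen set: a + b.
entry_state = frozenset({"x * 2", "a - b"})
kill_set = frozenset({"x * 2"})
gen_set = frozenset({"a + b"})

exit_state = transfer(entry_state, kill_set, gen_set)
print(sorted(exit_state))  # ['a + b', 'a - b']
```

Because the kill set is removed before the gen set is added, an element that appears in both sets ends up in the exit state in this sketch.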
If we reason about the code in a backward analysis, then the same question comes up already at the branching point. For example, if you have an if and reason backwards, then at the branching point the two branches come together, and we need to say what information to propagate from the two branches, so again from two incoming statements, namely s1 and s2.

The meet operator defines what to do in this case, and there are two common answers: set union and set intersection. In the union case, we are saying that whatever we have at the entry of statement s is what we have at the exit of one of the incoming statements, s1, in union with whatever we have at the exit of the other incoming statement, s2. So we just take both sets and put them together as a union. The other option is to compute the intersection, which looks very similar: in this case we say that whatever we have at the entry of statement s is only what is in the intersection of what we have at the exit of statement s1 and at the exit of statement s2. So only elements that occur in both branches are considered. For the available expressions analysis we had taken the second option, intersection, simply because we can only say that an expression is definitely available if it is available at the exit of both incoming statements.

Let's move on to property number five of a dataflow analysis, which is about the so-called boundary condition. What the boundary condition defines is what kind of information to start with at the first node of the control flow graph that we are considering. What this first node is depends on whether we are looking at a forward analysis or a backward analysis. In case of a forward analysis, the first node is the entry node of the control flow graph, so the one that we have at the very beginning. In case of a backward analysis, because we start at the end, the first node is actually the exit node of the control flow
graph, so usually the point in the execution where the function has finished its execution. Now, in order to define what information to start with at this control flow graph node, you can in principle define any subset of the domain of the analysis. But what typically happens is that you either define it as the empty set, which is what we did for the available expressions analysis, or you define it as the entire domain of the analysis, where you say: we start with the full set, and then we remove things as we visit the individual statements.

And then finally, property number six of a dataflow analysis is the initial values, which tell us what information to start with at the intermediate nodes. When you start propagating the analysis state and reach a node that you haven't seen before, you need to start with some set as the current analysis state of this statement. Again, there are two typical choices: either you take the empty set or you take the entire domain of the analysis. The empty set is what we had chosen for the available expressions analysis, because before we have really reasoned about a statement, we cannot assume that any of the expressions is available.
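The meet operator, boundary condition, and initial values discussed above can be sketched in Python as follows. This is only an illustration under assumptions: the node names, the `predecessors` map, and the exit sets are made up, and analysis states are modeled as frozensets of expression strings.

```python
from functools import reduce


def meet_union(exit_sets):
    """Keep an element if it occurs at the exit of any incoming statement."""
    return reduce(lambda a, b: a | b, exit_sets, frozenset())


def meet_intersection(exit_sets):
    """Keep an element only if it occurs at the exit of every incoming statement."""
    exit_sets = list(exit_sets)
    if not exit_sets:
        # Assumption of this sketch: with no incoming statements, nothing is known.
        return frozenset()
    return reduce(lambda a, b: a & b, exit_sets)


# Hypothetical control flow graph: s1 and s2 both flow into s.
predecessors = {"entry": [], "s1": ["entry"], "s2": ["entry"], "s": ["s1", "s2"]}

# Boundary condition and initial values, chosen as described for the
# available expressions analysis: the empty set for every node.
exit_state = {node: frozenset() for node in predecessors}

# Made-up exit states for the two incoming statements.
exit_state["s1"] = frozenset({"a + b", "a * b"})
exit_state["s2"] = frozenset({"a + b"})

incoming = [exit_state[p] for p in predecessors["s"]]
print(sorted(meet_union(incoming)))         # ['a * b', 'a + b']
print(sorted(meet_intersection(incoming)))  # ['a + b']
```

With intersection, only `a + b` survives at the entry of `s`, which matches the argument that an expression counts as available only if it is available on both incoming statements.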
So we start with the empty set here.

So just to wrap this up, here are the six properties again, and here you can also see how we had defined them for the available expressions analysis. The domain is the set of all non-trivial expressions. The direction is forward, because the available expressions analysis reasons about the code in a forward manner. The transfer function is defined as you see here: it takes whatever is the entry state of a statement, removes everything that is killed, and then adds everything that is generated by the statement. As the meet operator, we had said intersection makes most sense, because otherwise we cannot be sure that an expression is really available. As the boundary condition, we start by saying that at the very beginning of the control flow graph the set is empty, because we do not know what expressions are available, and the same holds for initializing the sets of the intermediate nodes.

All right, and that's already the end of video number two on dataflow analysis. You have now seen the basic principles of dataflow analysis, namely the six properties that define a dataflow analysis. What we'll do in the next video is look at more examples of other kinds of dataflow analyses, which are all defined through these six properties, and which address different static analysis problems than the example that you've seen so far. Thank you very much for listening and see you next time.