So, let us start today's discussion by summarizing what we looked at in the last class. We were looking at how the compiler fits into the overall program-development environment, what a compiler does, and how it does it. We had really started the discussion on how a compiler works, and we said that if I have to jump from a representation in a high-level language to a representation in a low-level language, with the compiler sitting somewhere in between, then we do not want to do the translation in one step; we want to take small steps, one at a time, and move across these small steps. To understand the compiler, we need to understand what these small steps are, and if we can figure out each of them, then perhaps we will be able to understand how the compiler works. So we want to translate in steps, and we want each step to perform some activity that is coherent and logically isolated to one phase, because that helps us in development and in final debugging. What we want to do now is design a series of representations. As I said, what we are really doing is changing the representation of the program from the high-level language toward the low-level language, and therefore, if I need these small steps, I must have a representation at each step. Slowly I keep converting one representation into the next until I reach the target. So what are these steps? We started looking at a few things, and what we are looking at are intermediate representations: this is our initial representation, this is the final representation, and everything in between is an intermediate representation.
And we need to understand what these intermediate representations are and how each phase takes one representation and converts it into another. As I start from the source side, the representation is very close to the high-level language; at the other end it is close to, or is, the machine language; and as I move along this path, the representation becomes closer and closer to the machine language. That is really the idea. The first step we looked at was understanding the representation itself: what is the alphabet of my language, what are its words, and so on. The first few steps can really be understood by analogy with how we understand a natural language, and that is the example I was giving you. First we define the alphabet of the language. If I am trying to build a compiler for any language, the first thing I must know is its alphabet; this is essential. We had the example of English, but if I talk of C, what is my alphabet? It is the character set: all the visible characters. We don't care about control characters. Once I have defined the alphabet, I look at the set of words. What are the words of a typical programming language? Keywords; and apart from keywords, identifiers, operators, all the so-called punctuation (semicolons and so on), and numbers, which come in various representations. I must also know how to turn a sequence of characters from my alphabet into a sequence of words. That is what we are looking at. Now, it may sound as if it is very easy to find the words, but look at a sentence like this: how many words does it have?
What is the sentence? You can see that I have violated all the rules of English I know, with blanks in all the wrong places, but you are still able to go back and forth and figure out where the word boundaries are and what the sentence is. Compilers can't do that. A compiler must have a very well-defined set of rules that say how to break a sequence of characters into a sequence of words. This phase is known as lexical analysis, and it is really the first phase of compilation. Let me use the same figure: the first phase is lexical analysis, whose input is a sequence of characters and whose output is a sequence of words. Given something much more trivial than that garbled sentence — say, a statement or an expression like this — the first thing I want to do is break it into words: identify all the word boundaries, and ignore anything that does not contribute to the meaning of the program. If there are comments, I ignore them; if there are blanks, I ignore them, because they are just for indentation and contribute nothing to the meaning of the program. Therefore we must know exactly what the word separators are, and they differ from language to language. In English, whenever you have a space or a tab, you have a word separator. The language must define the rules for breaking a sequence of characters into a sequence of words, and normally the separators are white space, punctuation, and so on.
And in programming languages, a character from a different class can also act as a word separator. Sometimes, even when there is no blank or punctuation, I can say "this is a word boundary" because the next character comes from a different class. For example, take a sequence of characters like this, which says: if a equals b, then a is assigned 1, else a is assigned 2. The sequence of words comes from breaking it here, here, here, and so on, and I know the rules by which I break it. Ultimately, my lexical analyzer takes this as input and gives that sequence of words as output. That is really the first step of what a compiler does. Now, once I have broken the program into a sequence of words, what should I do? [A student asks whether this is called tagging.] It is not called tagging. What we do next is identify the structure: with respect to a set of specifications, which we call the grammar of the language, we want to find out whether this is a valid sentence or not. So the next step is: once we have understood each word, we want to understand the structure of the sentence. Let me pick up the example from the previous class. Here is a sentence. Is it grammatically valid? What about this one? Why is this one not valid — what is wrong with it? Some of you mention context, so let us also understand the context clearly. What I am trying to do now is find out whether this is structurally a valid sentence. How do I define the structure?
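The word-breaking just described can be sketched as a tiny lexer. This is a minimal illustration, not a real compiler component: the token-class names and the regular expressions are assumptions for this sketch, and a real lexer would also reclassify keywords like `if` and `else` out of the identifier class.

```python
import re

# Hypothetical token classes for a C-like fragment.  Order matters:
# earlier alternatives win, so NUMBER is tried before ID, and "==" before "=".
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"==|=|\+|-|\*|/"),
    ("PUNCT",  r"[();{}]"),
    ("SKIP",   r"\s+"),              # separators carry no meaning
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def tokenize(source):
    """Break a sequence of characters into a sequence of (kind, text) words."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "SKIP":    # ignore white space entirely
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("if (a == b) a = 1; else a = 2;"))
```

Note how the boundary between `a` and `==` is found with no blank present: the characters come from different classes, exactly the class-change rule mentioned above.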
I define the structure by saying that a sentence will typically have — if you remember your Wren and Martin — a subject, a verb, an object. But to begin with, the structure does not say what kind of subject goes with what kind of verb; it is only structure. As far as structure is concerned, this is valid and this is also valid: there is a verb, and I am not looking at the relationship between the parts. Structurally, this one is also correct; it may turn out that this is not the right verb for this kind of subject, but the structure is fine. So what we are doing here is testing only the structure, not the context; context comes somewhat later. This process is known as syntax analysis, or parsing. Given a sentence like this, parsing marks it up and gives me a structure which says: a sentence must have a subject and a verb phrase, possibly with an auxiliary verb, and so on — and that gives me a sentence. If I put something else here, that also works: it is also a valid sentence with respect to the grammar of the language, with respect to the structure. And therefore you have to be careful when deciding what is valid and what is invalid. For example, is this actually a valid sentence? A typical specification of mine could say: in an assignment, the left-hand side must be a variable and the right-hand side must be an expression. But it does not say whether the expression is meaningful; that check comes later. The second phase is therefore syntax analysis, whose input is a sequence of words. And what is the output? The output is something like this, which is known as a parse tree.
But syntax analysis will also tell me whether a valid parse tree can be built at all. If I have something like this, it will not construct a parse tree; it will say that whatever we are trying to parse is incorrect. So if the input is correct, it gives me a parse tree; if the input is incorrect, it gives me an error and flags where the error is. Parsing a program is exactly the same as what we saw for English. Suppose I now try to parse an expression like this, which says: if x equals y, then z is assigned 1, else z is assigned 2. This is my specification, which says an if-statement must have a predicate, a then part, and an else part, and this is what the predicate, the then part, and the else part look like. If the input does not fit, I cannot construct a parse tree and I have to flag an error there. That is what we know as parsing. Beyond this point, we want to understand the meaning; this is where meaning starts coming in. If I use a sentence like this one, you say it just doesn't make sense, because I am not using the right kinds of sub-phrases for the meaning to work out. This is known as semantic analysis of the program, where I try to understand the meaning of the program. Here is an example. The reason I keep picking from natural languages is that they can be quite complex; programming languages are going to be a lot easier, but they become easier only because we over-specify — we deliberately put in more specification. Consider "Pratik said Nitin left his assignment at home." Look at the scenario: whose assignment is "his" referring to? People can come up with multiple interpretations. We do not want such a situation in programming languages.
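The parsing behaviour described above — build a tree if the input fits the grammar, flag an error if it does not — can be sketched as a toy recursive-descent parser. The grammar here is a deliberately tiny assumption, covering only the if/assignment example from the lecture, with the tree represented as nested tuples:

```python
# Hypothetical grammar for this sketch:
#   stmt -> "if" "(" expr "==" expr ")" stmt "else" stmt  |  ID "=" expr ";"
#   expr -> ID | NUMBER
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok
    def expr():
        tok = eat()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"bad expression atom: {tok!r}")
        return ("expr", tok)
    def stmt():
        if peek() == "if":
            eat("if"); eat("(")
            left = expr(); eat("=="); right = expr(); eat(")")
            then_part = stmt()
            eat("else")
            else_part = stmt()
            return ("if", ("cond", left, right), then_part, else_part)
        target = eat()                 # assignment: ID "=" expr ";"
        eat("=")
        value = expr()
        eat(";")
        return ("assign", target, value)
    tree = stmt()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

print(parse(["if", "(", "x", "==", "y", ")",
             "z", "=", "1", ";", "else", "z", "=", "2", ";"]))
```

Correct input yields a parse tree; malformed input (say, `a = ;`) raises a `SyntaxError` that points at the offending token, which is exactly the flag-the-error behaviour described above.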
We want to make sure that when I write a program, it has only one meaning; there cannot be multiple interpretations of a program. Therefore we have to be very precise. And here is a worse case than the previous one: "Amit said Amit left his assignment at home." Maybe they are two persons with the same name. So how do I handle semantic analysis? Give me some ideas — you have already done this in some way. Let me go back to this phrase here: is this a valid sentence in C? Semantically? Yes? Everyone says yes — wonderful. Now let's say a is of type char, b is of type int, and c is of type float. Is it still valid? So what suddenly went wrong? How could you say "a = b + c" is valid without looking at the type information? The answer should have been: "I don't know, because I don't have the context information." Just as here — is this the right word? Just by looking at the word, you cannot say, unless you look at the context. In programming languages, the context is provided by type information. I have declarations at the beginning, where I declare the type of each and every variable, and then there are rules which say what types of variables can be put together in an expression. If I combine the wrong types, then, for example, this assignment may not even be defined. If I take this addition — adding an integer and a floating-point number — internally the language says: convert everything to floating point. But if I then assign a floating-point number to a char, it says: this is an error. Without the context, which is this type information, I cannot figure out whether something is a valid sentence. The structure may be correct, but the meaning may be absolutely wrong.
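The char/int/float example above can be sketched as two tiny type rules. The rules themselves are simplifications assumed for this sketch (real C, for instance, does allow float-to-char conversion with truncation; here we reject it to mirror the lecture's point that the assignment is flagged):

```python
def add_type(t1, t2):
    """Type of t1 + t2: int+int is int; any float operand makes it float."""
    if {t1, t2} <= {"int", "float"}:
        return "float" if "float" in (t1, t2) else "int"
    raise TypeError(f"cannot add {t1} and {t2}")

def check_assign(target_type, value_type):
    """Allow identical types, or widening an int into a float target."""
    if target_type == value_type or (target_type, value_type) == ("float", "int"):
        return target_type
    raise TypeError(f"cannot assign {value_type} to {target_type}")

# The declarations provide the context:  char a;  int b;  float c;
env = {"a": "char", "b": "int", "c": "float"}
t = add_type(env["b"], env["c"])      # b + c is float (int is coerced)
try:
    check_assign(env["a"], t)         # a = b + c
except TypeError as e:
    print("semantic error:", e)
```

The same expression `a = b + c` is structurally fine in both cases; only the declarations — the context — decide whether it has a meaning.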
So compilers perform this analysis to understand meaning: we catch inconsistencies, and we have very strict rules to avoid such ambiguity. The natural-language sentences I showed have ambiguities that we will not permit in programming languages. If I write a program like this and say "output amit", then using scoping information I immediately know that in this particular scope the variable amit is assigned the value 4; there is no ambiguity there. But if I put a cout after this block, then I know from the block structure that the value I output is going to be 3. So at each point in the program I know precisely, when I refer to a variable, which type I am using and therefore what the binding of that variable is. That is very essential for us. Compilers perform many checks beyond just binding variables to their types. And what are these — oops, sorry, too far ahead; let me go back for a minute. Here is type checking, where we are looking at context information. If I write a sentence like this, what does it mean? Do I have all the context information? Consider "Amit left her work at home." Is this a correct sentence? Why do you say it is not correct? Some people say it is correct, some say it is not. One possible reading is that "her" refers to someone else, whose context we don't have; but suppose I say this is a standalone sentence, with no other context — then is it valid? Why? Yes, you are saying something there. And what about this one? These are Scandinavian names. From your social context, you knew that Amit is normally a male's name, not a female name. But do you know the social context here? I just picked some names out of a Danish dictionary, and now I don't know either.
So this context is very important for us; unless we know the context, we will not be able to find the meaning of the sentence. Natural language gives us some context information, but we want stronger context information, and that is why we have all these declarations. So at this point, how does my compiler structure look? We have a lexical analysis phase, which takes this representation and converts it into a sequence of words, also known as a stream of tokens. Then we have a syntax analysis phase, which takes the sequence of tokens as input and gives me a syntax tree. Then I have semantic analysis, which works out the exact meaning of the sentence and gives me an unambiguous program representation, with no ambiguity left as far as meaning is concerned. These are traditionally known as the front-end phases of the compiler. Why are they called the front end? Historically — and we will go into a little more detail slightly later — what this means is that here I am dealing only with the source programming language. You can see that when I did all this analysis, I had no information about the machine: I did not know on which machine this program was going to be compiled or executed. The only information I had was the representation of the source program in the source programming language; that is why this is known as the front end. This is what we do through analysis of the program: find out the precise meaning of the program. And if I cannot find a precise meaning, what do I do? I flag an error, and I say that this program is not correctly written with respect to the specifications of the language.
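The front-end pipeline summarized above — each phase consuming one representation and producing the next — can be shown as a chain of functions. Everything here is a deliberately tiny stub handling only statements like `x = 42 ;`; the phase names and data shapes are illustrative assumptions, not a real compiler API:

```python
def lexical_analysis(source):
    # characters -> stream of tokens (real lexers don't get to assume spaces)
    return source.split()

def syntax_analysis(tokens):
    # tokens -> parse tree, for the single production  stmt -> ID "=" NUM ";"
    if len(tokens) == 4 and tokens[1] == "=" and tokens[3] == ";":
        return ("assign", tokens[0], ("num", tokens[2]))
    raise SyntaxError(tokens)

def semantic_analysis(tree, symbol_table):
    # parse tree -> disambiguated tree, annotated with type information
    _, name, (_, literal) = tree
    if symbol_table[name] != "int":
        raise TypeError(f"{name} is not an int")
    return ("assign:int", name, ("num:int", int(literal)))

ir = semantic_analysis(syntax_analysis(lexical_analysis("x = 42 ;")),
                       {"x": "int"})
print(ir)
```

The composition makes the point of the figure: each representation is closer to an unambiguous, machine-usable form than the one before it, and a failure at any phase stops the chain with an error.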
The program may still have bugs, it may have logical errors, but as far as the language is concerned, it is fine. I am not trying to find out whether it has logical errors; what I am trying to find out is whether, with respect to the specifications of the language, I have a correct program or not. Clear? Is everyone with me up to this point? So let's move forward, because we are still somewhere in the middle and this part still needs to be filled in. What are the other things we can do? These were the front-end phases; let us now look at them in a little more detail with respect to programming languages. We have lexical analysis, which recognizes tokens, and while recognizing tokens I also want to ignore all the information that is redundant to understanding the program. If you put tabs somewhere, multiple spaces, comments — I just ignore them. If this is my input, a sequence of characters, I want to generate the sequence of tokens here. This thing here is known as a token — a word; from now on we will use the compiler terminology "token" rather than "word". So this is the input to the lexical analyzer, and this is its output, and this pair really becomes its specification; when we study lexical analysis in detail, that is the specification we are going to use. We also want to report errors: if there are incorrectly formed words — say something that is not a valid identifier because it does not stick to the rules for identifiers — I report that as an error. All of this is modeled using regular expressions. Remember, I was telling you how the theory of computation is going to be used here: we specify tokens with regular expressions, and how do I implement regular expressions? I build a finite-state machine — a finite-state automaton.
So I'll create a finite-state automaton which takes the characters as input and gives me the tokens as output. If you look at this slide, it really gives you the full specification of what a lexical analyzer is supposed to be; what we need to understand is how it does it. Clear? Let's look at the specification of syntax analysis. I now want to check the syntax, the structure, of the language. What will be my input? A sequence of tokens. And what will be the output? A parse tree like this. The output says that all of this can be put into a structure like this; but if it cannot, then the parser flags an error. It reports the error and tries to recover from it. What does recovery mean here? It means that not only will I flag this error, I will skip past it and try to analyze the rest of the program to find more errors. We'll look at error recovery in more detail later, when we actually do syntax analysis, but recovery means that I want to read as much of the program as possible to find more and more errors. And how do I model this? I use context-free grammars to model my syntax analyzer — you already know the theory of context-free grammars — and how do I implement it? Pushdown automata: I use pushdown automata, or table-driven parsers. This is where you will find that whatever you learnt in your theory of computation course comes into play. And what happens in the semantic analysis phase? This is where the contextual information comes in. I start checking the meaning of the program, and I obviously report errors: if there are situations like this, where expressions mix types in ways the programming language does not permit, I am going to report that. I am also going to disambiguate overloaded operators.
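The regular-expression-to-automaton step mentioned above can be made concrete for one token class. Assuming the usual identifier pattern `[A-Za-z_][A-Za-z0-9_]*`, here it is implemented directly as a two-state deterministic finite automaton rather than via a regex engine:

```python
def is_identifier(word):
    """DFA for [A-Za-z_][A-Za-z0-9_]* : states are start, ident, reject."""
    state = "start"
    for ch in word:
        if state == "start":
            state = "ident" if (ch.isalpha() or ch == "_") else "reject"
        elif state == "ident":
            if not (ch.isalnum() or ch == "_"):
                state = "reject"
        if state == "reject":
            return False
    return state == "ident"      # "ident" is the only accepting state

print(is_identifier("count1"), is_identifier("1count"))
```

A generated lexer does essentially this for every token class at once, in a single combined automaton; a word that drives the automaton into the reject state is exactly the "incorrectly formed word" that gets reported as a lexical error.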
For example, when I write something like this, what is the meaning of this plus? What is it adding? If I write b + c, and these are two floating-point numbers, then it is adding two floating-point numbers. But suppose they are integers — then what is it doing? And what about a situation where I have two strings, s1 and s2? You can see that in three different contexts, the same plus sign has three different meanings, and the machine operations for them are going to be different. Therefore I have to disambiguate whether this is concatenation, a floating-point addition, or an integer addition. These are overloaded operators, and I want to disambiguate them — I want to understand the exact meaning — because at some point I will have to go and map them onto the opcodes that are available to me. I will also have to do what we know as type conversion. Take a situation where I am trying to add an integer and a floating-point number; you know that the representations of integers and floating-point numbers on the machine are different. So I have to change the integer's representation into floating point and then do the addition: I have to coerce the type from int to float before I can really do the addition. That is also a job of the semantic analysis phase. I will also have to do type checking; control-flow checking, to make sure I don't jump into the middle of a control structure; and, depending on the language specification, uniqueness checks — no variable may be declared more than once in the same scope, so that each variable is unique within its scope. I may have to do name checks: if blocks are labelled — Ada, for instance, says that each named block must carry its label at both begin and end —
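The three meanings of `+` above can be sketched as a small resolution table. The target-operation names (`ADD_INT`, `ADD_FLOAT`, `CONCAT`) are invented for this sketch; the point is only that the semantic phase turns one overloaded source operator into exactly one target operation, inserting a coercion where needed:

```python
def resolve_plus(t1, t2):
    """Pick the single concrete operation behind the overloaded '+'."""
    if t1 == t2 == "int":
        return "ADD_INT"
    if {t1, t2} <= {"int", "float"}:
        return "ADD_FLOAT"        # the int operand is coerced to float first
    if t1 == t2 == "string":
        return "CONCAT"
    raise TypeError(f"+ is not defined for {t1} and {t2}")

print(resolve_plus("int", "int"),
      resolve_plus("int", "float"),
      resolve_plus("string", "string"))
```

After this step every operator in the tree carries one unambiguous meaning, which is precisely what the disambiguated syntax tree below records.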
I want to make sure that those labels are the same, and so on. That is another thing we will have to do. And what will I get as output? A disambiguated parse tree, or abstract syntax tree. Here, each operator I use has a unique meaning: the tree will not say "this could be concatenation, or integer addition, or floating-point addition"; it will say "this is an integer assignment, a and b are integers, this is a floating-point addition", and so on. What you get at the end of semantic analysis is a completely disambiguated abstract syntax tree, from which everyone can come up with only one meaning; there is no ambiguity about it. And this is the front end of the compiler and the outcome of the front end. [A student asks: can there be ambiguity in the parse tree?] Give me a concrete example of what you have in mind. So the question is: can I have different parse trees for the same input? Yes, we can, and therefore we have to be careful when we write the grammar. Remember that for the same language I can write multiple grammars, so I have to make sure my grammar is written in a manner that I will not get more than one parse tree; that takes care of all the ambiguities. Is the question and answer clear to everyone? The question was: can I have a specification that gives me more than one parse tree? The answer is yes, you can have such a specification, but it is not a good specification. What we do is either come up with rules to disambiguate, or write the specification in a manner that never yields more than one parse tree. That is what we need to do in syntax analysis. So let's move on. Let me just update the picture and say that what I have so far is all these front-end phases.
But what do I do beyond the front end? Because I have still not reached where I wanted to reach — the machine, or something close to the machine; I am still dealing with source-level specifications, which are programming-language specifications. We also want to do code optimization. Since I have been drawing similarities with English, the closest similarity I can find for optimization is technical editing, or précis writing. Remember précis writing from your school days? You start with a paragraph, and one rule was: reduce it to about one third, keeping the meaning the same. Using fewer words, can I say the same thing? That is the closest analogy to what we do in optimization. The purpose of optimization is to get a program that runs faster, and a representation that uses fewer resources of the machine. What are the resources available to us? Memory, registers — all of these are resources for us. So we want to use fewer resources and have a program that runs faster. But remember, when I say it runs faster, it is again not with respect to the algorithm; it is only with respect to the representation, and I will give you examples of that. There are some very common optimizations, which at this point I am just going to mention quickly, without details. Common subexpression elimination says that if a certain expression has been evaluated once, you do not want to evaluate it over and over again. For example, suppose I say a = b + c, and then I say x = y + b + c.
Then I should be able to figure out that I have already computed b + c, and there is no need to compute it again here — provided certain preconditions are met, namely that the values of b and c have not changed since that computation. If they have not, I can use the previously computed value instead of recomputing it. Copy propagation says that if I am just keeping a copy of a variable, I may as well use the original variable. Dead code elimination removes any code that is not reachable during the execution of the program; it should not be kept in the output. I gave you an example: if a condition is known at compile time to evaluate to true or to false, then you know for sure that one branch of your statement is never going to execute, and you can just throw it out. Code motion says that if I have a loop containing a statement that is loop-invariant — it does not change its value across loop iterations — I may as well move it out of the loop: as part of the loop it would execute many times; once moved out, it executes only once. Strength reduction says: instead of a computationally costlier operator, use a cheaper operator. And constant folding says that if there is a constant expression whose value you know at compile time, you may as well compute it then, rather than generating code to evaluate it at run time. So if somebody writes code like this, I don't want to generate an add instruction, because that again takes time; I may as well replace it by the computed value, which saves me one computation. The interesting part is that these small things together actually give you a lot of speed-up. But we will postpone this; only later in the course will we talk about optimization in detail.
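Constant folding, the last optimization listed above, is simple enough to sketch completely. This walks an expression tree (nested tuples, as assumed in earlier sketches) and evaluates any subtree whose operands are compile-time constants:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(tree):
    """Replace constant subtrees by their value; leave variables alone."""
    if not isinstance(tree, tuple):
        return tree                      # leaf: a constant or a variable name
    op, left, right = tree
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # computed now, not at run time
    return (op, left, right)

# x * (2 + 3) folds to x * 5: one add instruction never gets generated.
print(fold(("*", "x", ("+", 2, 3))))
```

The other optimizations follow the same pattern — a tree or flow-graph rewrite that provably preserves the meaning — but need more analysis (e.g. common subexpression elimination must first prove that b and c are unchanged between the two uses).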
Here are some interesting examples of optimization. These are some familiar computations: you assign some value to pi, then you compute the area and the volume of a sphere. Count how many operations I am doing: the additions, the multiplications, a division, and two exponentiations — the square and the cube. And you can find some very trivial improvements here. Look at 4·pi·r²: that product is already computed for the area, so I can replace the volume computation by something that reuses it. I can actually give you many, many versions of this, and each version has its own advantages and disadvantages. For instance, if I just propagate the constant value of pi, then instead of computing 4·pi every time, I can use a folded constant. Now, would I write all my programs in these contorted forms? No — remember, when we talked about writing programs, we said programs are written for people, and we want them to be readable. I don't want to write these kinds of programs by hand, because I care about the readability of the program. What I want is for the compiler to transform the computation so that fewer operations are done. Which version is the best? That depends on the order in which the compiler applies the optimizations. But you can see that this one representation can be changed into many representations — and also that, as far as the method of computation is concerned, I have not changed anything: I am still using the same formula, only doing fewer computations.
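The sphere example above can be written out both ways. The readable source is what the programmer should write; the second version shows what an optimizer can derive from it by constant folding (4·pi becomes one literal) and common subexpression reuse (the volume reuses 4·pi·r² computed for the area). The constant values here are assumptions for the sketch:

```python
def readable(r):
    # What the programmer writes: clear, matches the textbook formulas.
    pi = 3.141593
    area = 4 * pi * r ** 2            # surface area: 4*pi*r^2
    volume = (4 / 3) * pi * r ** 3    # volume: (4/3)*pi*r^3
    return area, volume

def optimized(r):
    # What the compiler can derive: same formulas, fewer operations.
    area = 12.566372 * r * r          # 4*pi folded into one constant
    volume = area * r / 3             # reuses area = 4*pi*r^2
    return area, volume

print(readable(2.0))
print(optimized(2.0))
```

Both versions compute the same values (up to floating-point rounding, since the operations are reassociated), but the second has no exponentiations, no division inside a constant, and one multiplication fewer.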
So this is how the compiler optimizes: it can do constant propagation, constant folding, common subexpression elimination, and so on. Let's move on to code generation. In the picture, optimization fits here — but notice again that optimization has not changed the kind of representation: whatever my representation was before optimization, it remains the same afterwards; the only thing that happens is that I do fewer computations to achieve the same effect. But what about this last part? This is where I am finally going to change my representation to the machine representation, and that is known as the code generation phase. As I was saying in the previous class, there are different levels of abstraction in the program. When I talked about the source abstraction, I was dealing with variables and operators; with these I was able to build expressions; once I had expressions, I went on to conditionals and iterations; using those I made functions; and depending on the language, classes and so on. That is really the abstraction I use at the source level. But what is my abstraction at the target level? Here I am dealing with memory locations, or registers, or stacks — these are typically the resources you will have on the machine side. Then you have opcodes, and after opcodes, what do you have? Addressing modes, to access all these locations. What I really want to do is this translation: change the source representation into the target representation. And you can almost see that, if I am careful, I have almost a one-to-one mapping here. For example, variables are going to be kept either in memory locations, or in registers, or somewhere on a stack.
So what we do is, we now say that again, we'll keep a two-step process. Because it is possible that when I'm dealing with a machine, I may be dealing with a class of machines, which may look very similar, okay? So again, as I said, it may be too tough to jump from one representation to another in a single step, and I want to use multiple steps here. And traditionally what we have used is an intermediate representation. And this representation is something which is going to be closer to the machine, but not exactly the machine, okay? It will not be the concrete syntax of the machine. So we want to generate machine code from the intermediate representation. And we want to generate intermediate code from the representation I got after semantic analysis, okay? Because remember that optimization can be an optional phase. There may be compilers where I do not do any optimization. My program will still run correctly; it may just take a little more time, okay? So normally, in compilers, this is the essential part, this is the essential part, and this is really the optional part, depending upon how much effort I am going to put into building the compiler, okay? So the advantage is, obviously, that each phase is going to be a simpler phase. And it requires, therefore, that I design an intermediate representation, okay? Now, what representations did I have in the front end? I had a sequence of tokens, I had a syntax tree, and I had a disambiguated syntax tree, okay? And here now is another representation which will come between the disambiguated syntax tree and the machine representation, okay? So there is one more representation which is coming in. And most compilers will perform translation through a sequence of intermediate representations, right? So generally, when I look at the representation which is the source, okay?
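One very common shape for such an intermediate representation is three-address code: one operation per instruction, with explicit temporaries. The lecture does not commit to a specific IR, so the sketch below is only an illustration in C; the type name, field names, and the lowered expression are all assumptions.

```c
#include <assert.h>
#include <string.h>

/* One three-address instruction: result = arg1 op arg2.
 * This sits between the disambiguated syntax tree and machine code:
 * closer to the machine (one operation per instruction, explicit
 * temporaries) but not yet tied to a concrete instruction set. */
typedef struct {
    char op;          /* '+', '*', ... */
    char result[8];
    char arg1[8];
    char arg2[8];
} TacInstr;

/* Lower the (hypothetical) source expression "d = a + b * c"
 * into a sequence of three-address instructions. */
int lower_example(TacInstr out[]) {
    int n = 0;
    out[n++] = (TacInstr){ '*', "t1", "b", "c" };   /* t1 = b * c  */
    out[n++] = (TacInstr){ '+', "d",  "a", "t1" };  /* d  = a + t1 */
    return n;
}
```

Each of these instructions is now close enough to the machine that code generation can map it onto one opcode, or a short sequence of opcodes.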
And the final representation, which is really the target: between the two there is going to be a sequence of representations. And this order is one of decreasing level of abstraction. This is the highest level of abstraction, and this is the lowest level of abstraction. Everything is coming closer and closer to the machine; that is what we really want here, okay? So typically we go from one representation to the next, each closer to the machine than the previous one. Now, what is it that I really want to do? So let me pick up a complete example of intermediate code, okay? My source abstractions, as I just described, are identifiers, operators, and so on. And at the target level, I have all these memory locations and so on. So the first thing I need to do is take identifiers and put them either in memory, or in a register, or on the stack, okay? This is what I call memory allocation, or allocation of resources. I'll say that this variable is going to reside in such-and-such a memory location, this variable is going to a register, and so on. That is the first thing I need to do. Now, once I have assigned a location to each identifier, then I want to see: corresponding to this operator, what is the opcode I have? Now, on many machines you will find that for most operators we have today in programming languages, you will be able to find an equivalent opcode. You will be able to find that this particular operator can be mapped onto this opcode. But there could be situations where you may not be able to find an equivalent opcode. So for example, say I want to do string concatenation. It is possible that on your target machine you may not have anything similar. So what will you do? In that case, if I want to implement concatenation, for example, or a string operation, I just need to write a macro, right? So if you know your assembly language, you just write an equivalent macro which will do the same thing.
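The two cases above, a direct operator-to-opcode mapping versus a macro expansion when no single opcode exists, can be sketched as follows. This is only an illustration in C: the opcode mnemonics and function names are assumptions, not any particular machine's instruction set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Most source operators map directly onto a single opcode
 * of the (hypothetical) target machine. */
const char *opcode_for(char op) {
    switch (op) {
    case '+': return "ADD";
    case '-': return "SUB";
    case '*': return "MUL";
    default:  return NULL;   /* no single equivalent opcode */
    }
}

/* When there is no equivalent opcode, e.g. string concatenation,
 * the compiler instead expands a macro (or emits a call to a
 * routine) that achieves the same effect. */
void concat_macro(char *dst, size_t cap, const char *a, const char *b) {
    snprintf(dst, cap, "%s%s", a, b);
}
```

So the code generator first tries the table lookup; only when that fails does it fall back to the macro expansion.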
So at some point of time, there were microprocessors which did not have multiplication or division. How did we implement multiplication and division on those? By a sequence of additions or subtractions. So you can always find that either I can map this particular operator to an opcode, or I can write a macro to implement this particular operator. Now, what happens to my conditionals here? So I have these statements, and then conditionals and iterations. How do I implement conditionals? Is there something special in conditionals which I need to implement at the machine level? What is the equivalent abstraction here? Branching. But how do I do branching on the machine? Test and jump, right? So if I have test and jump, I can do a comparison of two locations. So if I say "if A equal to B", what needs to be done is: I compare A and B, whatever locations are now bound to A and B. And then, depending upon whether A is less than B, A is equal to B, or A is greater than B, some condition codes are going to be set. And then I am going to do a conditional jump. And how do I implement an unconditional jump, or a goto? Just jump to a label, right? Because every instruction is going to be at a certain location, so I know the addresses and I will just have these jumps. Whenever you have functions, either these are user-defined functions, and therefore they can just be converted into a transfer of parameters and then a jump to a location, or I can use system libraries, right? So I may be using system libraries here, or mathematical libraries as well, okay? Or I may be using OS calls. So depending upon what kind of calls you have here, you will have to find an equivalent call at the machine end, okay? And this is what code generation is. Code generation is really a mapping from the source-level abstraction to the target-level abstraction. And the first step here was that I am going to map all the identifiers.
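Both ideas from this passage, multiplication built out of additions, and a conditional lowered to compare-and-jump, can be sketched in C. The shift-and-add scheme and the explicit-label style below are illustrations of the general technique, with assumed function names; the assembly mnemonics in the comments are likewise hypothetical.

```c
#include <assert.h>

/* Multiplication on a machine with no MUL opcode: a sequence of
 * additions (shift-and-add), the way early microprocessors did it. */
unsigned mul_by_add(unsigned a, unsigned b) {
    unsigned result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* conditionally add the current partial */
        a += a;            /* a = a * 2, again just an addition */
        b >>= 1;
    }
    return result;
}

/* "if (a == b) x = 1; else x = 2;" lowered to compare plus
 * conditional jump, written with explicit labels in the
 * test-and-jump style of the machine. */
int lowered_if(int a, int b) {
    int x;
    if (!(a == b)) goto else_branch;   /* CMP a,b ; JNE else_branch */
    x = 1;
    goto done;                         /* JMP done */
else_branch:
    x = 2;
done:
    return x;
}
```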
So I am going to take these identifiers and allocate memory to them. I am also now going to find out, and this is an important point to understand, because I did not really talk about the addressing modes yet, okay? I want to explicate variable access. What that means is that when I say an identifier, the identifier is never going to be at one fixed location, okay? It depends upon the scope. For example, suppose I am trying to say that A is going to reside in certain memory, and A is a variable which is declared inside a function, okay? Now, at some point of time the function will be called. Can I assign a fixed memory location to A? That would be very inefficient, because until the function is called, I don't know how many other functions are active. Only at that point of time do I want to allocate some space to A, okay? But I also want to know that A is going to be at a fixed location with respect to a certain offset. So I will always say that if A is part of a function, then it doesn't matter where this function is allocated space: whatever the base address of this function is, with respect to this base address, A will be at an offset of 100, or some such number, right? So I can find out this information: what is the offset of this particular variable, and what is going to be the base address. And what explication means is that at the time of code generation, I want to change all these identifier references into either relocatable addresses or absolute addresses. So what is going to be the relocatable address of A? The relocatable address of A is going to be some offset with respect to a base value. And you know that's indexing. So for example, if this is an array and I say I want to access A[5], then what will I say? I'll say that there is a base address of A, with respect to a certain other base value which is coming from the context information, and the index value is 5.
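The address computation just described, a base value from the context, an offset for the variable, and an index into the array, can be written out explicitly. This is a sketch with assumed parameter names and example numbers, not a description of any particular machine's addressing mode.

```c
#include <assert.h>

/* A variable's address is not fixed: it is an offset from a base
 * (e.g. the base address of the function's space, known only when
 * the function is called). An indexed access like A[5] adds one
 * more level:
 *     address = frame_base + offset_of_A + index * element_size  */
unsigned long addr_of_element(unsigned long frame_base,
                              unsigned long offset_of_A,
                              unsigned long index,
                              unsigned long elem_size) {
    return frame_base + offset_of_A + index * elem_size;
}
```

For example, if the function's base is 1000, A sits at offset 100 within it, and each element is 4 bytes, then A[5] lives at 1000 + 100 + 5 * 4 = 1120.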
So I am going to use a complex addressing mode to find the location where the value of A[5] can be found, okay? So I'll use the base address of the whole scope, within that the base address of A, and within that an index of 5, right? This is how I start using all my addressing modes, okay? So I want to map all the operators to an opcode or a sequence of opcodes, and I want to convert conditionals and iterations into sequences of tests and jumps, okay? In intermediate code generation, I also want to, at the same time, lay out all the information about the parameter passing protocol, because functions play a very important role, okay? And we'll see how we do this part, and then how all the return values are going to be handled, and how we lay out the activation frames. Right now, these may all sound like terms which don't make much sense, but when we talk about intermediate code generation, this is going to become very important, okay? At that point of time it will become clearer. And then we also want to have interface calls to all the libraries and runtimes as well, okay? So let's break here today, and we'll continue our discussion from this point onwards in the next class.
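As a preview of the activation frames mentioned above, here is one hypothetical layout written as a C struct. The field names and the layout are purely illustrative assumptions (real conventions vary by machine and calling convention); the point is only that every local variable gets a fixed offset from the frame's base.

```c
#include <assert.h>
#include <stddef.h>

/* A sketch of an activation frame: the block of memory created
 * when a function is called. Every field sits at a fixed offset
 * from the start of the frame, which is why base + offset
 * addressing works for locals and parameters. */
struct frame {
    long return_address;   /* where to jump back after the call */
    long saved_base;       /* caller's frame base, restored on return */
    long params[2];        /* incoming parameters (count is illustrative) */
    long locals[4];        /* local variables, each at a fixed offset */
};
```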