So once again, to recap, this is what we took from the introduction. The input to the lexical analyzer is a sequence of characters, and what it generates is a sequence of tokens. In the process it does error reporting, whenever it finds that some token is not properly formed. The whole thing is modeled using regular expressions, and we then use finite state machines, or finite state automata, to implement the lexical analyzer.

Before we really get into the implementation, we need to understand a few terms; I will introduce three. One is what we know as the token. Another is the lexeme. And the third is the rules for construction of tokens. Let me introduce these terms by an example. Suppose I write an expression like counter = counter + increment and I am trying to tokenize it — I am trying to compile this particular expression, and obviously it occurs in a certain context; I am not looking at the whole program, I am just processing this one expression. Typically, what I will pass on is information like: there is an identifier, there is an assignment operator, there is another identifier, an addition operator, and a third identifier. So my stream of tokens is going to be id = id + id; this is what the lexical analyzer hands to the syntax analyzer.

Now, the id, the assignment operator, and the plus are really nothing but the tokens. But the strings counter, counter, and increment are the lexemes associated with those tokens. Tokens are the classes which we pass on to the syntax analyzer; associated with each of these entities is a string, which we call the lexeme. And then we have rules which say how these tokens get constructed. These are the three terms we are going to use very commonly.

A sentence consists of a string of tokens, and a token is really a syntactic category: numbers, identifiers, keywords, and so on. The sequence of characters making up a token is the lexeme. For example, 100.01 is a number; counter is an identifier; a string constant always comes within quotes — and in each case the character sequence itself is the lexeme associated with that token. And the rules of description say how a token gets constructed; for instance, the rule for identifiers: each identifier must start with a letter and be followed by zero or more letters or digits.
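To make the three terms concrete, here is a small illustrative sketch; the token names are hypothetical, not from the lecture:

```c
/* Token classes are what the parser sees; lexemes are the strings
   behind them.  Hypothetical token codes for illustration: */
enum token { ID, ASSIGN, PLUS, NUM };

/* For the input   counter = counter + increment   the lexical
   analyzer emits the stream  ID ASSIGN ID PLUS ID, where each
   token carries its lexeme as an attribute:

       token    lexeme
       ------   -----------
       ID       "counter"
       ASSIGN   "="
       ID       "counter"
       PLUS     "+"
       ID       "increment"

   The construction rule for ID: a letter followed by zero or
   more letters or digits.                                       */
```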
In the process of tokenizing the input, I also want to discard all the useless information. What is useless here? The spaces: I may put one blank, I may put two blanks, I may put a tab, but that adds nothing to the meaning of the program, so I just discard it. All white space — blanks, tabs, newline characters — is discarded, unless it occurs as part of a string. If blanks are part of a string, then they really belong to the lexeme and I cannot discard them.

Also, whenever we deal with numbers, I do not want to pass a number on as a character sequence; I want to convert it into a value. For example, when I am reading 100.01, I am going to read six characters, but when I pass this information to the syntax analyzer, I will say: I have read a number whose value is 100.01. So for all numbers I do this conversion. For the input 31, I will say: here is a token which is a number, and the value of the number is 31; I will not say the value is the character 3 followed by the character 1.

We also need to recognize keywords. Now, the rules for construction of keywords are similar to the rules for construction of identifiers, but in most languages keywords are reserved. So take an input like counter = counter + increment — this is a common pattern, like x = x + 1. What will I pass to the syntax analyzer? The stream id = id + id. And every time I form an identifier-shaped word, I have to check whether I am dealing with an identifier or with one of the keywords. If it is a keyword in my language, then perhaps I do not want to allow it as a variable name; some languages will permit that, but most will not. So I need to figure out whether I am dealing with an identifier or with a keyword, where the keywords are a list of words which are reserved in the language.

Is the functional specification clear to everyone — what we are trying to do here? Then let's move on and see how the lexical analyzer interfaces with the other phases of the compiler. If you go back and recall the compiler structure we had, the first phase was the lexical analyzer, and then we had the syntax analyzer. So the lexical analyzer really deals with two entities: the input on one side and the syntax analyzer on the other. It does not have to deal with any other phase, except the symbol table, where it puts information. So this is how the lexical analyzer looks: it takes its input from the sequence of characters and feeds tokens to the syntax analyzer.

Now assume that the syntax analyzer is the one driving the whole process. Look at it this way: when I start compilation, the syntax analyzer says, I want to parse something — give me a token. The syntax analyzer keeps asking for tokens, and who supplies them? Obviously the lexical analyzer. And to generate a token, the lexical analyzer has to start fetching characters from the input and forming a token out of them.
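Here is a hedged sketch of that pull-style interface — the parser drives, asking the lexer for one token at a time; lexan() and DONE are assumed names, not the lecture's code:

```c
#define DONE 257          /* assumed end-of-input token code      */

int lexan(void);          /* the lexical analyzer: returns the
                             next token (defined further below)   */

void parse(void)
{
    int lookahead = lexan();   /* parser asks: give me a token    */
    while (lookahead != DONE) {
        /* ... grammar rules would consume the token here ...     */
        lookahead = lexan();   /* ... and ask for the next one    */
    }
}
```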
For example, suppose I want to compile A = B + C. The syntax analyzer says, give me a token, and the lexical analyzer reads A. But just by reading A, will it know whether it has a complete token or not? It does not; it has to read more. So it reads the next character — the assignment symbol — and only then does it realize that it has reached a word boundary: the boundary was between A and the assignment symbol. So in the process it has, logically, read one extra character. It has formed the token A, and it also has to push whatever extra it has read back into the input stream, because that character is going to be the beginning of a new token. Is this point clear to everyone? When the lexical analyzer is reading characters and forming a token, it identifies the token, and in some cases — not always — it also has to push the extra characters back into the input sequence. Logically, that is what is happening; we will see how the implementation handles it.

So pushback is required because of lookahead: I have to look ahead to find the word boundary. Here is another example: if I am trying to read >= versus >, I will not know, unless I have read the next character, whether that character is part of the same token or not.

This pushback can be implemented simply through a buffer. All it means is that I have an input pointer which is moved left or right: reading a character moves the pointer right; pushing back moves it left. So pushback need not be ungetc-style character-at-a-time IO as in C — you can just keep the input in a buffer and keep moving the pointer over the buffer, left or right.
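A minimal sketch of that buffer scheme, assuming the input has already been filled into the buffer; names and token codes are assumptions:

```c
#define GT 300   /* hypothetical token codes for >  and >=        */
#define GE 301

static char buf[4096];    /* input kept in a buffer               */
static int  forward = 0;  /* input pointer, moved left or right   */

static int  next_char(void) { return buf[forward++]; } /* right  */
static void push_back(void) { if (forward > 0) forward--; } /* left */

/* Recognizing >= versus > needs one character of lookahead.
   Assume the '>' itself has already been consumed: */
static int relop(void)
{
    if (next_char() == '=')
        return GE;   /* the '=' belongs to this token             */
    push_back();     /* extra character goes back to the buffer:  */
    return GT;       /* it is the beginning of the next token     */
}
```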
So how do I implement a lexical analyzer? We understand what it does; let's look at how to implement it. I will talk about three approaches. First, since I am doing low-level IO and I want this process to be very efficient, I can program it in assembly language. Second, I can use a high-level language like C. Third, I can use tools like lex or flex, where I just write a specification and the tool generates the lexical analyzer for me.

Remember that I am worried not only about the functional correctness of a phase but also about its speed — how fast it can read tokens. You do not want a situation where the program reads characters very slowly and becomes the bottleneck, because this phase involves a lot of IO: your program is sitting on the disk, and there is a lot of reading to do.

So which approach should I take? The first one? [student: the third] But what about efficiency? [student: development time will be much less] True, development time will be much less. But run time? Run time will be slower than the other options. Do I want that? Just remember what the lexical analyzer does — and yes, assembly can be very difficult; but try writing an assembly program for this IO versus reading characters in a high-level language, and there will be a noticeable, measurable difference in time. So, the pros and cons: assembly is obviously the most efficient, because you control the low-level IO, but it is also the most difficult to implement. C is efficient — not as efficient as assembly — and difficult to implement, though not as difficult as assembly. The third approach, the tool, is easy to implement, but it is not going to be as efficient as the first two, because I have no control over the IO; I am only hoping that my tool does good IO and good buffer management. That is why we got tools like flex — "fast lex" — because lex did not implement this very efficiently. Which goes back to the earlier point: if you find that your tool is not efficient and you get a better tool, you can actually get a faster compiler.

So in practice, when we start implementing, we start with the third approach, because we do not want the lexical analyzer to become a bottleneck for developing the subsequent phases. But as the rest of the compiler is getting developed, you keep migrating: move to a high-level language, and implement at least the IO routines at a low level, so that you are not only functionally correct but also very efficient. Always start with the tool, but slowly migrate at least to the high-level language, and make sure that IO does not become the bottleneck in whatever language you are using.

We therefore need to understand how to implement lexical analyzers with both of these approaches in a systematic manner. Even when I write C programs to implement lexical analyzers, I do not want to be writing arbitrary C programs; I want to have a certain structure over them. We will talk about both approaches to implementation. Is this point clear to everyone? Any comments or questions?

So let us take a small language and try to construct a lexical analyzer for it, to see what goes on inside. This language is going to allow white space, it is going to have numbers, and it will have arithmetic operators in expressions. The lexical analyzer is going to return tokens, along with an attribute, to the syntax analyzer. The token is the class — it will say, this is an identifier — and the attribute may be the lexeme, or an entry into the symbol table, or a value. For example, if I write a = b + 46, for 46 it will say: the token is "number" and the attribute is the value 46.

For the sake of this implementation, instead of returning the attribute, I will assume there is a global variable into which I copy the value; the subsequent phase just reads this global variable to pick the value up. So token_value is my global variable, which is set to the value of the number. What this requires is that we have a set of tokens which are defined, and then we describe the strings belonging to each of the tokens.

Let us just look at the structure of such a C program. I am writing a plain C program without worrying too much about structure; the point of showing it is how horrendous it can be to write a lexical analyzer even for so few tokens, and thereby to motivate you to use high-level specifications and tools which make sure this job is done correctly and efficiently. You will have all the #include statements; token_value is initialized to NONE, where NONE is some defined integer constant; and here is the function — call it lexan — with an integer t declared inside, and this is where the function closes.
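A minimal sketch of those declarations, following what was just described; NUM and DONE are assumed token codes that the function below will use:

```c
#include <stdio.h>
#include <ctype.h>

#define NONE -1            /* "no attribute" marker                */
#define NUM  256           /* assumed token code for numbers       */
#define DONE 257           /* assumed token code for end of input  */

int token_value = NONE;    /* global: attribute of current token   */
int lineno = 1;            /* current line number, initialized to 1 */
```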
Now, what do I need to do in the body? (Sorry — this part may not be visible, maybe because of the brightness.) We say while (1), which means: go into an infinite loop and keep reading characters. The first thing I do is a getchar — I read the first character. Then I find out whether it is a blank or a tab character; if it is, I just ignore it. I do not want any white space in my input, so I simply discard it.

Otherwise, if the character is a digit, what do I have to do? [student: make a decision] Make a decision — like whether to wait or to do something? I cannot wait; there is nothing like waiting here. I have to read the next character and find out whether it, too, is a digit — whether it is forming a number. So I will keep reading as long as I get digits, and as I am reading, I will keep converting these digits into a number. Whenever I get something which is not a digit, I have reached a word boundary, and that character is the beginning of a new token.

So this is what we do here: token_value is set to t minus the ASCII value of '0'; then t = getchar(); and while I keep getting digits, I keep constructing the value. How do I construct it? Simple arithmetic: take the previous value, multiply it by 10, add the digit just read, and go around the loop as long as digits keep coming. Now, you can see that when I come out of this loop, I will have the number in token_value, but I will also have read at least one extra character — that is how I found out it was not a digit and came out of the loop. So the first thing I have to do is put that extra character back into the input.

Once I do that, what other situations could there be in my input? I have blanks and tabs, which are white space; I have digits; and what else do I have in my specification? Arithmetic symbols. In the digit case I know I have recognized a number, so I return the token "number". Otherwise, I set token_value to NONE — the token is not a number, it carries no value — and whatever the character t is, I just return it: if I read a plus, for example, I just return plus; if I read a minus, I return minus.

That is the only specification I had — expressions consisting of just numbers and arithmetic operators — and look at what it took: a C program of some 10 to 15 lines, with these declarations, this iteration, this IO. That is what really makes a lexical analyzer complicated. Now imagine you have the full specification of a programming language — C, Pascal, whatever — and you start writing a C program like this. The chances that you make an error somewhere — in a declaration, in not using a loop properly, in not returning a character properly to the input buffer, and so on — are very high. So this approach is clearly not something we want.

One extra thing I am bringing in at the end — read this. It says that if my input character is a newline character, then I increment a line number by 1, and I have initialized this line number to 1.
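Putting the pieces together, here is a minimal sketch of the function just walked through, completing the declarations above; the exact names are assumptions in the spirit of the classic textbook lexer:

```c
int lexan(void)
{
    int t;                             /* character just read      */
    while (1) {                        /* the infinite loop        */
        t = getchar();
        if (t == ' ' || t == '\t')
            ;                          /* discard white space      */
        else if (t == '\n')
            lineno = lineno + 1;       /* the newline case         */
        else if (isdigit(t)) {
            token_value = t - '0';     /* ASCII digit -> value     */
            t = getchar();
            while (isdigit(t)) {       /* as long as digits come:  */
                token_value = token_value * 10 + (t - '0');
                t = getchar();
            }
            ungetc(t, stdin);          /* push back the extra char */
            return NUM;                /* token: number, value in
                                          token_value              */
        }
        else if (t == EOF)
            return DONE;               /* end of input             */
        else {
            token_value = NONE;        /* '+', '-', ... carry no   */
            return t;                  /* value; the character is  */
        }                              /* its own token code       */
    }
}
```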
Now, why do I need line numbers here in the lexical analyzer? I am just returning token codes and passing this structure on; how does capturing the line number help in parsing and the rest of compilation? It is for debugging. At debugging time I must be able to remember, for each line number, what code was generated for it, because my view as a user is that I want to set breakpoints — using gdb. How many of you are familiar with gdb? Almost everyone, right. When you set a breakpoint at a certain line number, how does the debugger map that line number to the code? Somewhere, I have to start generating this information. So this information is communicated onward; it remains as a tag within my code so that execution can break at a given line number. And if you now go back to what I showed you earlier — that the compiler is part of an overall program development environment and has to feed information to other tools — debugging was one of those tools. It makes sense now.

So far we have seen some aspects of what the lexical analyzer is doing, but let's also see what kind of pitfalls we have to be aware of — what problems you may face in writing a lexical analyzer. When I start looking at the specification of a language, I must raise certain flags in certain situations. What kind of problems do I face? One, obviously: I am reading my input character by character, so the IO has to be very efficient. Also, the lookahead character is going to determine what kind of token to read and where the current token ends. For example, if I am reading xyz and then I read the assignment symbol, that symbol tells me what kind of token I had, and it also marks the beginning of the new token. When I start reading something like xyz = ..., the first character gives me some idea of what kind of token to expect — it even gives me a hint of which finite state machine I am going to use to identify this particular token. But the first character alone cannot determine what kind of token we are going to get. Take the example I already gave you: > versus >=. Just by reading the first character, it is not clear what I am going to get unless I read at least one more. For now I am assuming that a lookahead of one character is sufficient, but we will see situations where a lookahead of one is not going to be enough.

The next issue is: how do I interface with the symbol table? Recall the picture: the lexical analyzer has to put information into the symbol table. What kind of information can it put there — what does the lexical analyzer know at any point of time? It only knows the lexeme and the token; it has no other information. So what do I put in the symbol table? When I encounter a = b + c and the analyzer says a is a token of type identifier, the additional information I have is the lexeme. So in my symbol table I should be able to say: here is a token of type identifier, and its lexeme is a. But now imagine I am tokenizing a = a + c.
Again the analyzer will say: an identifier whose lexeme is a. But that must already have been entered in the symbol table when I processed the left-hand side. Do I make a duplicate entry in the symbol table? No. Then how do I know that this already exists in the symbol table? It is a table with multiple records, so I must be able to look it up and ask whether a token with this lexeme already exists. If it does, I do not want to make a new entry; but when I come to c, the lookup fails and I insert the identifier with lexeme c.

So these are the two functions I need: I should be able to look up in my symbol table, and I should be able to insert something into it. What do I insert? The only thing I can say is: insert this token and this lexeme, and return a pointer to the entry. And lookup says: look up this particular string; if it already exists, give me a pointer into the symbol table, and if it does not, tell me so. These two functions are a sufficient interface as far as the lexical analyzer is concerned. I do not have to worry about the rest of the structure of the symbol table or the rest of its fields; the token and the lexeme are the only two fields the lexical analyzer deals with. So this is what we do: we store information for the subsequent phases through just these two functions.

And how do I implement the symbol table? One very preliminary implementation I gave you was an array of structures: one row corresponding to each of the variables in the program, and in each row a list of information — the token, the lexeme, and then more fields such as the type, the address, and other information which is part of the symbol table. Right now let's focus on the token and lexeme part. We will not worry yet about how efficient this is — that we will address later; note only that lookup on such an array is going to be linear, unless I continuously sort the symbol table and do binary search — and that is really not the issue at this point.
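Here is a sketch of that naive array-of-structures table with the two-function interface just described; the sizes, names, and token codes are assumptions:

```c
#include <string.h>

#define ID 261             /* assumed token code for identifiers  */
#define SYMMAX 100         /* assumed table capacity              */

struct entry {
    int  token;            /* the token class                     */
    char lexeme[32];       /* fixed 32-byte lexeme field          */
    /* type, address, ... are filled in by later phases           */
};

static struct entry symtable[SYMMAX];
static int lastentry = 0;  /* number of rows in use               */

/* Linear lookup: index of the row holding s, or -1 if absent.    */
int lookup(const char *s)
{
    for (int p = 0; p < lastentry; p++)
        if (strcmp(symtable[p].lexeme, s) == 0)
            return p;
    return -1;
}

/* Insert a (token, lexeme) pair; return its row index.           */
int insert(int token, const char *s)
{
    symtable[lastentry].token = token;
    strcpy(symtable[lastentry].lexeme, s); /* assumes length < 32 */
    return lastentry++;
}
```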
Now, since at this point I am concerned only with these two fields, let's look at how much space I am consuming. How much space do I need to store a token? Let's talk in terms of bytes — let's go to a low-level view. [student: depends on the number of tokens] It depends on the number of kinds of tokens I have in my language. Suppose I have 24 different kinds of tokens: how many bits do I need? 32 is the closest power of two — 2 to the power 5 — so I can do everything in 5 bits. If I have 24 kinds of tokens, 5 bits are sufficient. That is very efficient: for each language I can predetermine the kinds of tokens I will encounter, encode them in bits, and keep this information compact. So there is really no space overhead for the token field.

What about the lexeme part — how much space do I reserve for the lexeme? If I am storing my identifiers here, it depends on how many characters an identifier can have, and most programming languages today permit up to 32 — you can have a variable name which is 32 characters. That means I need to reserve 32 bytes. But think of the average size of the variables you use when you write programs: the average length is 5 or 6, rarely more. Most variables are like i, j, k, and some run to 7 or 8 characters; the average comes to 5 or 6. Reserving 32 bytes when I know for sure that on average I will use only 5 or 6 is highly inefficient, space-wise. So I need to come up with some better data structure — give me some ideas. You yourself are saying a lot of space is being wasted, and I want to recover that space; so what do I do? [student: make it dynamic] Make it dynamic — very good. What we can do is keep some separate memory which I keep allocating for the identifiers, and all I need in the symbol table row is a pointer into it. So the lexeme field becomes a pointer: identifier 1, identifier 2, and so on, each with a pointer from its row. Now what is my overhead? A pointer is typically going to be 4 bytes, so the storage is very compact, and the overhead per lexeme is, on average, just 4 bytes.

This is what is implemented in almost all symbol tables: a fixed amount of space to store the lexeme is not advisable, as it is going to waste a lot of space; therefore store all the lexemes in a separate area, with each lexeme separated by some character — so that I do not have to store the length of the lexeme — and keep in the symbol table just the pointer to the lexeme. So one implementation would have this fixed space of 32 bytes for the lexeme alongside all the other attributes; the other stores the lexemes separately and keeps only the pointers to them. That makes my implementation space-wise more efficient.
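A sketch of that space-efficient variant: lexemes packed into one character array, with '\0' acting as the separator, and each row holding only a small pointer (an array offset here); all names and sizes are assumptions:

```c
#include <string.h>

#define STRMAX 2048
#define SYMMAX 100

static char lexemes[STRMAX];   /* all lexemes, back to back       */
static int  lastchar = 0;      /* next free slot in lexemes[]     */

struct symrow {
    int token;
    int lexptr;                /* where this row's lexeme starts  */
};
static struct symrow symtab[SYMMAX];
static int rows = 0;

int insert_packed(int token, const char *s)
{
    symtab[rows].token  = token;
    symtab[rows].lexptr = lastchar;
    strcpy(&lexemes[lastchar], s);   /* copies the '\0' too       */
    lastchar += (int)strlen(s) + 1;  /* separator costs one byte  */
    return rows++;
}

/* Lookup is analogous: strcmp(&lexemes[symtab[p].lexptr], s).    */
```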
The next issue that comes up: how do I handle keywords? When I wrote a = b + c, fine; but suppose I write a = if + 5, and suppose in my language if is a reserved keyword which cannot be used as an identifier. As I start scanning my input from left to right, I encounter if. Now, as far as the rules are concerned, the rule for construction of this keyword is the same as the rule for construction of an identifier, so the analyzer will say: this is an identifier. How do I know that it is not an identifier — how do I handle this situation? [student: there is a priority — if a keyword conflicts with an identifier, the keyword wins] Yes. So I need to maintain a list of keywords: before declaring something an identifier, check it against the list. And how do I maintain this list? I can initialize my symbol table by inserting all the keywords into it at the start; then whenever I do a lookup, the lookup will simply report that this lexeme already exists in the symbol table and that it is a keyword, and I do not have to worry further.

This is really what we do: I just initialize the symbol table with all the keywords as lexemes. And since every insert is preceded by a lookup, the lookup function is going to ensure that I will never get a conflict between a keyword and an identifier: if I do a sequence of inserts into the symbol table before the process of compilation starts, then any subsequent lookup of a keyword returns a hit, and therefore I know this lexeme cannot be used as an identifier.

Now, as far as the lexical analyzer is concerned, is a = if + 5 an error? No, it is not. Who will catch it? The parser, which will say: you cannot use a keyword in an expression. As far as the lexical analyzer is concerned, it just returns a sequence of tokens — identifier, assign, keyword, add, number — and that's it. So you must also remember which kinds of errors the lexical analyzer can catch and which kinds the subsequent phases are going to catch. Using a keyword inside an expression is not an error as far as the lexical analyzer is concerned, but some subsequent phase is going to catch it — provided the lexical analyzer has supplied the information that this token is a keyword and not an identifier.
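A minimal sketch of that pre-loading, building on the insert()/lookup()/symtable sketch above; the keyword token codes are assumptions:

```c
#define IF_TOK   262   /* assumed keyword token codes             */
#define ELSE_TOK 263

void init(void)        /* run once, before compilation starts     */
{
    insert(IF_TOK,   "if");
    insert(ELSE_TOK, "else");
    /* ... one insert per reserved word of the language ...       */
}

/* After the lexer has read an identifier-shaped lexeme lexbuf:   */
int classify(const char *lexbuf)
{
    int p = lookup(lexbuf);
    if (p < 0)                     /* first occurrence: new id     */
        p = insert(ID, lexbuf);
    return symtable[p].token;      /* a keyword code, or ID        */
}
```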
So what kind of difficulties may I face — what flags do I need to raise before I start designing a lexical analyzer? So far everything sounds reasonable, if not outright simple, so let's start raising red flags.

The first red flag: can I use one layout versus another — the same expression written on one line, or one word per line, or one character per line? These formats express the same text, but do they mean the same thing? [student: depends on the implementation] No — the lexical analyzer's implementation has nothing to do with it. The reason is that this is part of the specification of the language: the language tells me whether I can write an expression in this form, or this form, or this form. If all three forms are valid, then I should be able to design a lexical analyzer which says: it does not matter what input you give me — one character per line, one word per line, a whole line at a time — I am fine with it. Versus there are languages which say: no, you cannot use this format; you must use only that one. So there are free-format languages versus fixed-format languages. This is not an implementation issue; it is part of the language specification, and the lexical analyzer must implement whatever is specified by the language — it is not supposed to deviate from the language specification. It is the language which says whether lexemes sit in fixed positions or whether it is a free-format language.

Now, do we know any fixed-format language? [student: Python?] Fortran was a fixed-format language — among the first. Whenever you deal with such languages, you need to worry about fixed format versus free format, and I will show you more examples of fixed-format languages and the kind of difficulties they can induce.

The next issue we have to deal with: I said right at the beginning that my input is a sequence of characters and I need to tokenize it, which means I must determine word boundaries. How do I determine a word boundary? Either I encounter blank space — a sequence of blanks, a tab, or a newline; or some kind of punctuation — a comma, a full stop, a semicolon, and so on; or I encounter a character from a different class of characters, as when I am reading an identifier and come to an assignment symbol: that is a word boundary even though there is no blank there. But blanks, in most languages, are used as word separators.

Now let's look at a slightly different situation. Here is one expression — counter — and let me write one more — count er. Are the two the same? The answer is: I don't know; it depends on the language specification. If the language says that blanks, even when they come inside an identifier, are simply ignored, that makes life more complicated: a blank can be put anywhere, and as the designer of the lexical analyzer I have to figure out what each one means. The blank inside count er means: this is counter, ignore the blank — but a blank inside a string literal must be treated as a real blank. Similarly, one blank can be ignored while another marks the end of count. So even the blank becomes contextual: where it occurs makes a lot of difference. Blanks are significant only in literal strings; at all other places I just ignore them as if they did not exist. So when I write counter versus count er, they are actually the same identifier.

Any idea why such a format was prevalent — is it by design or by accident? As computer scientists, and as students of any subject, we must also know a bit of history; I already gave you one part of it, how compilers were designed initially. So, by accident or by design? Do programmers like to put in spaces? Actually, spaces make your program highly readable — it is not that programmers did not want spaces. But the earliest programs were not typed; they were written by hand and then passed on to the computer. Even today we expect that before you go to the lab you have at least some structure of the code in mind, before you sit at the terminal and start banging on the keyboard.

Here is another example, and this one is really much more complex — take this, and I will come back to the point about blanks. Consider DO 10 I = 1.25, with blanks between DO, 10, and I — and compare it with DO 10 I = 1,25.
In the first case, this is really an identifier — DO10I, since blanks are ignored — being assigned the value 1.25. A real assignment to a real identifier. The second one is actually a loop: it says iterate from this statement up to the statement whose label is 10, with the iteration space given by lower bound 1 and upper bound 25, in steps of 1 — do everything from this statement up to statement 10 for values of I varying from 1 to 25. That is a problem. And the interesting part is that I can also write DO10I in the first form without putting any blanks at all. So only when I see the comma do I discover that I am dealing not with an assignment but with a loop, and then I have to go back and re-tokenize: DO is a keyword, 10 is a label, and I is an identifier. So not only are blanks insignificant — even when I do not put blanks, that does not mean I have a single token. I may need to take DO10I and break it into three tokens, and I will only know this when I encounter either the dot or the comma. An interesting situation: it does not matter how you write it, the compiler is smart enough to figure out the intent of the program in any of these formats.

And it really happened by design: it was designed that way because we wanted to prevent certain accidents. The first line is a variable assignment; the second is the beginning of a DO loop; and reading from left to right, one really cannot distinguish between the two until I encounter either the comma or the dot. And why did they use these fixed formats, ignoring blanks in some places and not in others? Because at that time we did not have terminals. Today you have a screen and an editor: you type your program, you can see visually whether it is laid out correctly, and you can do all kinds of formatting and editing before you pass it to the compiler. Earlier we did not have that, and the way programs were entered was on punch cards: you typed your program onto punch cards, and it is almost impossible to correct a punch card — you cannot fix it, you have to retype the whole card, and people did not have that kind of time.

The situation was something like this. You typed your program on punch cards, one statement per card. A thousand-line program was no big deal, and a thousand lines meant a thousand punch cards — a stack about a foot high. (This is what a typical punch card looked like; I still keep some old ones for historical reasons.) You took your thousand cards, passed them to an operator, and went away; the operator, at some point of time, put them in the card reader, along with the hundreds of thousands of cards submitted every day. Now suppose you made a mistake while typing a punch card: you had no way of correcting it.
And if you came back the next day and found that the only error in your program was that certain characters were not quite in the right places, the whole day was basically lost: you replace that punch card and come again. So whatever errors programmers could plausibly make while typing punch cards, the compiler tried to take care of. Today it is not required, but it persists for historical reasons: the language had this behavior, old code still exists, and so it persists. What happened was that people punched their programs on these cards, the cards were put on the card reader, and you wanted to make sure that mere punching errors were not the errors that delayed your execution; therefore all such errors were tolerated and worked around during compilation, because we did not have speed and interactivity at that point of time. That was an important consideration behind punch cards.

What is the name of this card — does anyone know? Who designed it, and why? It was designed before computers, in the 1880s. Looms had used cards earlier — the Jacquard loom controlled its patterns with punched cards, though not cards of this kind. The specific card I am talking about was designed for the US census data, by a statistician called Herman Hollerith, who built a machine called the tabulator. The first major use of this tabulator and this punch card was processing the US census data of 1890: a card punched with holes was put into the machine, contacts were made wherever the pins could penetrate the holes, and certain computations happened. For the first time, the tabulation of all the census data was mechanized — and the company which made these tabulators, after several mergers, eventually became part of IBM. That was perhaps, in some small way, the beginning of IBM.

So let's move on and talk about a different kind of problem. One problem, as we said, is all these blanks. Now what about this: IF THEN THEN THEN = ELSE; ELSE ELSE = THEN; — is this a valid statement in a program, given that these are keywords in my programming language? If you simply say no, that is not the correct answer: you immediately have to ask the question, are keywords reserved words in my language? If keywords are not reserved, then there is nothing wrong with this. There are languages like PL/1 which say keywords are not reserved, and if they are not reserved, I can write a statement like this, which is perfectly valid. And what does it say? If the variable THEN is true, then the variable THEN is assigned ELSE; else the variable ELSE is assigned THEN. It is now up to the compiler to figure out which occurrence is a keyword, which is a Boolean expression, and which is an identifier. Interesting, right? So the answer to "is this valid?" is not simply no; the answer is another question: is the keyword reserved in my language or not? If it is, this is not valid; if it is not, I can write this kind of expression, and you have to be aware of it. Here is another one — IF THEN THEN THEN = ... — and declarations using keyword-like names are also valid in such languages.

Oh, we are running short of time, so I think we will have to stop here — the next class is waiting outside. Tomorrow we will continue this discussion, and those of you who wish can keep these slides.