Welcome to the lecture on lexical analysis, part 3. In part 1 we covered the basics of lexical analysis and the motivation for it, and in part 2 we covered the theoretical fundamentals: finite automata, regular expressions and transition diagrams. Today, we will continue a little more with transition diagrams and the generation of lexical analyzers, and then study lex, a tool which generates lexical analyzers. To do a bit of recap: transition diagrams are generalized deterministic finite automata, but there are a few differences. The edges may be labeled by a symbol, a set of symbols or a regular definition. Some accepting states may be indicated as retracting states; when we reach a retracting state, we do not consume the symbol that brought us to that state. With each accepting state there is an action which is executed when that state is reached. Typically, we use this action to return a token and its attribute value. Very importantly, transition diagrams are meant for manual translation into code, not for machine translation. So let us understand how exactly transition diagrams can be translated into lexical analyzers. The first transition diagram we will consider is the one for reserved words and identifiers. This is a fairly simple and straightforward transition diagram. We start in state 0; on a letter, we reach state 1. We keep consuming letters and digits in state 1, and when we finally get some symbol other than a letter or digit, we reach state 2, where we return the token code, say, for the reserved word or for the identifier itself. So how does this translate to a program? The lexical analyzer is called gettoken, and it returns a type called token. Inside the lexical analyzer we have two local variables, mytoken and c, and a while loop which runs until the end of file, containing a switch on the state.
To begin with, we are always in state 0. In state 0 we read a character; if it is a letter we change to state 1, otherwise we report a failure, because here the only legal character is a letter. In state 1, after reading a character, we check whether it is a letter or digit; if so, we remain in the same state, otherwise we switch to state 2. In state 2 we are ready to announce a token. The symbol which brought us to state 2 is not consumed; it is pushed back into the input, and the token is obtained by searching the table of tokens. If it is a reserved word, the reserved word token is returned by the search; if it is an identifier, we simply get the string corresponding to the identifier, put it in mytoken.value and return mytoken. Similarly, when we want to recognize the various types of integer constants — hexadecimal, octal and so on — we start in state 0; on a 0 followed by x or X we go to the hexadecimal part, and on an octal digit we go to the octal part, where we keep consuming digits until we return a token. This is the scheme for both hexadecimal and octal constants, and it is fairly straightforward. If the first character is a 0, we go to state 4; if it is not a digit, it is a failure, because the character is illegal. In state 4 we check whether the next character is x or X: if so, it is a hexadecimal constant and we go to state 5; otherwise it is an octal constant and we go to state 9. The two kinds of constants are distinguished because the hexadecimal constant is prefixed by x, whereas the octal constant is not. If neither of these appears in the input, it is a failure.
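As a sketch, the state-machine translation just described might look like this in C. The function name scan_word, the return codes and the reserved-word table are our own illustrative choices, not the lecture's exact code:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 for a reserved word, 2 for an identifier, 0 on failure.
   On success, *length holds the number of characters consumed. */
static const char *reserved[] = { "if", "then", "else", "while", NULL };

int scan_word(const char *input, int *length)
{
    int state = 0, i = 0;
    for (;;) {
        char c = input[i];
        switch (state) {
        case 0:                            /* start state */
            if (isalpha((unsigned char)c)) { state = 1; i++; }
            else return 0;                 /* only a letter is legal here */
            break;
        case 1:                            /* consume letters and digits */
            if (isalnum((unsigned char)c)) i++;
            else state = 2;                /* retracting state: c not consumed */
            break;
        case 2:                            /* announce the token */
            *length = i;                   /* token text is input[0..i-1] */
            for (int k = 0; reserved[k]; k++)
                if ((int)strlen(reserved[k]) == i &&
                    strncmp(reserved[k], input, i) == 0)
                    return 1;              /* reserved word */
            return 2;                      /* identifier */
        }
    }
}
```

Note that the automaton gives longest match automatically: it keeps consuming letters and digits and only classifies the word once a non-alphanumeric symbol appears.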
In state 5 we check for hexadecimal digits, and on receiving one we go to state 6. We remain in state 6 as long as we get hexadecimal digits, and then we go to state 8 or state 7 depending on whether a qualifier is present or not. Here again we return the token INT_CONSTANT, because hexadecimal constants are also integers; we evaluate the hexadecimal number and return its value as the attribute of the token. In state 9 we do something similar, checking octal digits, and finally we return INT_CONSTANT with the evaluated octal number as the attribute value. For decimal integer constants it is even simpler: any number of digits is consumed, the optional qualifier is checked, and we return INT_CONSTANT. So we check whether the first character is a digit (otherwise it is a failure), consume digits, and finally in state 15 we return the integer constant token along with its value. This is an overview of how transition diagrams are translated into lexical analyzer programs, but there are some reality checks to be done here. We saw program segments corresponding to different transition diagrams. These diagrams must be combined appropriately into one big transition diagram, and that diagram must then be translated manually into a lexical analyzer program. Unfortunately, combining the transition diagrams is definitely not trivial. One possibility is to order the transition diagrams in a particular order — say, the diagrams for reserved words, then constants, then identifiers and operators — and try each of them in that order. In other words, the program segments are also listed in exactly that order; when there is a failure in one diagram, we go to the next diagram and start matching against it.
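The constant-recognizing states can be sketched the same way. Here is a hypothetical helper — our own simplification, with no overflow checks and at most one qualifier character — that follows the hexadecimal, octal and decimal branches described above and returns the evaluated value as the token's attribute:

```c
#include <ctype.h>

/* Classifies a numeric prefix of `s` and returns its value, or -1 on failure. */
long scan_int_constant(const char *s)
{
    long v = 0;
    if (!isdigit((unsigned char)*s)) return -1;        /* illegal first char */
    if (*s == '0' && (s[1] == 'x' || s[1] == 'X')) {   /* hexadecimal branch */
        s += 2;
        if (!isxdigit((unsigned char)*s)) return -1;
        for (; isxdigit((unsigned char)*s); s++)
            v = v * 16 + (isdigit((unsigned char)*s) ? *s - '0'
                          : tolower((unsigned char)*s) - 'a' + 10);
    } else if (*s == '0') {                            /* octal branch */
        for (s++; *s >= '0' && *s <= '7'; s++)
            v = v * 8 + (*s - '0');
    } else {                                           /* decimal branch */
        for (; isdigit((unsigned char)*s); s++)
            v = v * 10 + (*s - '0');
    }
    if (*s == 'l' || *s == 'L' || *s == 'u' || *s == 'U') s++;  /* qualifier */
    return v;
}
```

For example, the octal constant 0345l evaluates to 229 and the hex constant 0x786l to 1926, the values that appear later in this lecture.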
If this scheme is followed, it is fairly easy to program. However, it does not use the longest-match characteristic. In other words, the word "thenext" should be a single identifier, and not the reserved word "then" followed by the identifier "ext". What happens is this: if you order the transition diagram for identifiers first, "thenext" would be an identifier, but if you order the reserved-word transition diagrams first, you would get the reserved word "then" followed by the identifier "ext". In reality it is better to use the longest match, so "thenext" should indeed be an identifier rather than "then" followed by "ext". How do we get this longest match? All the transition diagrams must be tried, all the matches must be recorded, and the longest match must be used. If this is done — or if the programmer is able to order the transition diagrams appropriately — in either case the longest match can be used. Using lex to generate lexical analyzers really makes life easy for the compiler writer; we will see how this works in the next few slides. So far we saw how to generate lexical analyzers manually using transition diagrams. That approach is fine as far as small lexical analyzers and small languages are concerned. However, for professional languages such as C, C++ and Java, the lexical analyzers are very difficult to write by hand. So there is a language, and a corresponding tool, for describing lexical analyzers: lex is such a tool, available on Unix. Lex has a language for describing regular expressions, which are at the heart of lexical analysis. You simply write down the regular expression specifications for each of the patterns that we are going to detect during lexical analysis.
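To make the try-all-and-keep-the-longest idea concrete, here is a small sketch using POSIX `<regex.h>` as a stand-in for the individual transition diagrams. Lex itself compiles everything into one combined DFA instead of trying patterns one by one; the function name and return convention here are our own:

```c
#include <regex.h>

/* Try every pattern against the start of `s` and keep the longest match;
   ties go to the earlier pattern, exactly as lex resolves them.
   Returns the winning pattern's index (or -1); match length in *len. */
int longest_match(const char *patterns[], int n, const char *s, int *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < n; i++) {
        regex_t re;
        regmatch_t m;
        if (regcomp(&re, patterns[i], REG_EXTENDED) != 0)
            continue;                          /* skip malformed patterns */
        if (regexec(&re, s, 1, &m, 0) == 0 &&
            m.rm_so == 0 && (int)m.rm_eo > *len) {
            best = i;                          /* strictly longer match wins */
            *len = (int)m.rm_eo;
        }
        regfree(&re);
    }
    return best;
}
```

With the patterns {"then", "[a-z]+"}, the input "thenext " is won by the identifier pattern (length 7), while "then;" is won by the reserved word: both match four characters, and on a tie the earlier rule wins.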
Lex then generates a pattern matcher for the regular expression specifications given to it, and the generated program is the appropriate lexical analyzer. We are going to see how this is done. The general structure of a lex program is: first definitions (we will see what these are), then what are known as rules, and finally user subroutines. Of these, the definitions are optional, but rules and user subroutines are essential parts of a lex program, or lex specification. On a Unix system, how do we use lex to create a lexical analyzer? You type "lex ex.l", which creates a C program by the name lex.yy.c; then you compile that program using gcc, which produces ex.o, and ex.o is your lexical analyzer, which carves out tokens from its input. Now let us see what a simple lex program looks like, and then go into some details. Here is a comment, "lex specification for the example" — a C-style comment — and we do not have any definitions in this particular program. Between the two %% markers we have the patterns, and then we have a little bit of user-written code. This is very simple: there is a main program which calls yylex, and there is a yywrap routine which wraps up the lexical analysis; in fact yywrap hardly contains anything, unless we want some files to be closed explicitly — we will see examples of this a little later. Coming back to the rules section, or the patterns section: this is the pattern [A-Z]+, which stands for any of the capital letters A to Z, one or more times. [A-Z] is set notation: all the characters A to Z are in that set, and [A-Z]+ means any of these characters repeated one or more times.
If this pattern is detected, the ECHO statement echoes the text which matches the pattern, and then a newline character is printed. The second pattern is .|\n — the dot indicates any character but newline, and \n indicates newline — so in the absence of a match with the first rule, all other characters match here, and we ignore them: the semicolon is just empty code. So it ignores all other characters. Here is a sample. The input contains both lowercase and uppercase characters, but the output filters the input and produces only the uppercase runs. That is very easy to see: [A-Z]+ matches each uppercase sequence, and for each match it prints out the characters which matched, followed by a newline, ignoring all lowercase characters, digits and so on. The definitions section of a lex program contains regular definitions, which we have already seen — the definitions written here are similar — and also some code which can be included for initialization and other purposes. Definitions are like macros, and they are actually shorthands: each has a name followed by its translation. There are two simple examples here. The name digit stands for the set [0-9], and the name number stands for {digit}{digit}* — the digit pattern defined above, followed by digit repeated zero or more times. So here we are writing the regular expression digit digit*, with the digit part defined above. Together these two define a number as one of 0 to 9, followed by any of 0 to 9 zero or more times. That is the definition of number. When we use such definitions, it becomes easier to write bigger regular expressions, as we are going to see very soon.
Any code for initialization, any variables that we want to use, and so on, are included between the %{ and %} brackets; that is the initialization part. The rules part is the heart of any lex specification or lex program. It contains patterns and C code. A line starting with white space, or material enclosed in %{ and %}, is C code; any such lines are copied verbatim to the generated C file, without change of any kind. A line starting with anything else, inside the rules section, is a pattern line. Pattern lines contain a pattern, followed by some white space, followed by some C code, which is optional. So here there is a pattern, then white space, then C code. This C code, and the initialization C code, are all copied exactly in the same order to the output C file; there is no change at all. What happens to the patterns? The patterns are nothing but regular expressions. They are first translated to NFAs, and the NFAs are then converted to DFAs. We have not studied the particular algorithm for optimization of DFAs, but let me tell you that DFAs can be compacted and optimized: the number of states of the DFA, the number of transitions they make and so on can all be made optimal, and it so happens that for the same language, any NFA or DFA will always be reduced to a particular unique minimal DFA. These DFAs are stored in the form of a table, along with a driver routine. This is easy to understand: in a particular state, on a particular symbol, the driver routine looks up the table, finds out what the transition or action is, and performs it. The action associated with a pattern is executed when the DFA recognizes a string, that is, when the input brings us to the corresponding final state.
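As a toy illustration of the table-plus-driver idea — the layout below, with symbol classes and a -1 dead state, is our own invention and much simpler than what lex actually emits — here is a DFA for [0-9]+ stored as a transition table and walked by a generic driver that also records the longest accepted prefix:

```c
/* A DFA for [0-9]+ as a table: symbol 0 = digit, symbol 1 = anything else. */
enum { NSTATES = 2, NSYMS = 2 };
static const int trans[NSTATES][NSYMS] = {
    /* state 0 (start)     */ { 1, -1 },   /* digit -> 1, other -> dead */
    /* state 1 (accepting) */ { 1, -1 },
};
static const int accepting[NSTATES] = { 0, 1 };

/* Driver: walk the table, return the length of the longest accepted
   prefix of `s` (0 if the DFA accepts nothing at the start). */
int dfa_match(const char *s)
{
    int state = 0, last_accept = 0;
    for (int i = 0; s[i] && state >= 0; i++) {
        int sym = (s[i] >= '0' && s[i] <= '9') ? 0 : 1;
        state = trans[state][sym];
        if (state >= 0 && accepting[state])
            last_accept = i + 1;           /* remember last accepting point */
    }
    return last_accept;
}
```

The same driver works for any table, which is exactly why lex can ship one fixed driver and generate only the tables.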
Once we reach a final state, we may want to announce a token, and that can be done with the help of this action. Now let us go into the details of the syntax of lex, and take up an adequate number of examples to understand it. Strings in lex are nothing but concatenations of characters: integer, a5, 70 and hello are all examples of strings. Any symbols can be put together and made into a string, but there are some operator symbols which need to be handled in a special way: for example the double quote, the backslash, the square brackets, the caret, the star, the plus, the vertical bar, and so on are all special operators; we will understand their usage as we go along. To begin with, the backslash is an escape character, and its usage is similar to that of C. If you want to use any of these special characters in your string representation, you can write \? or \. to get that particular character into the string. A very important notation is that of character classes: anything enclosed between a left square bracket and a right square bracket is a character class. Inside a character class, only the backslash, the dash and the caret have special meaning; all others are just characters inside the set notation. For example, consider the notation [-+][0-9]+. Here [-+] is a set containing minus or plus, and [0-9]+ stands for any of the digits 0, 1, 2, up to 9, one or more times. So this is a regular expression which, as you can see, starts with either minus or plus, followed by any of the digits any number of times. Similarly, consider [a-d][0-4][A-C].
This says: any character a to d, followed by any of the characters 0 to 4, and finally any of the characters capital A to capital C. That is the set of strings corresponding to this regular expression notation. The next notation is [^abc]: the caret here is a complement operator, so this means all characters except a, b or c. [abc] as usual stands for a or b or c, and a caret in front complements the set, giving all characters except a, b or c — and this includes all the special and control characters as well. Two more examples. First, [+\-][0-5]+: as I said, \- stands for the minus character itself, so the first class is either plus or minus. Here is the difference from before: earlier the minus was the first character of the class, so it was understood as the minus character itself; but here we want it as the second character, which cannot be done directly, because when a dash is used between two characters it denotes a range. To get rid of this problem we use a backslash, making the minus an ordinary character. [0-5]+ is then 0, 1, 2, 3, 4 or 5, one or more times. Next, [^a-zA-Z] says all characters which are not letters: a-z is all lowercase letters, A-Z is all uppercase letters, and the caret complements them, so we get all characters which are not letters — all the digits, special characters and so on are included in this set. There is also the dot operator, which matches any character except newline, and the question mark operator, which is used to implement the epsilon (null string) option: for example, ab?c stands for a, followed by b or epsilon, followed by c. Then we also have repetition, alternation and grouping, as in (ab|cd+)?(ef)*. Let us see what this stands for.
Observe that we have used ordinary parentheses here, not square brackets. ab is the string ab, cd+ is c followed by d one or more times, and the question mark supplies the epsilon alternative, so the first group is (ab | cd+ | epsilon); (ef)* is ef repeated zero or more times. That is the regular expression in the ordinary notation, and this is the lex notation. There are context-sensitivity operators as well: the slash, the caret and the dollar. If we use caret as the first character of an expression, the expression is matched only at the beginning of a line. For example, suppose ^ab is the pattern on its own — remember, the caret inside square brackets has a different meaning; this is outside square brackets. This means a line beginning with ab: it is matched only if the line starts with ab. Similarly for ab$: if the last character of an expression is dollar, the expression is matched only at the end of a line, so ab$ matches only if the line ends with ab. The lookahead operator is a little more complicated. Suppose we want to say that DO is a pattern which should be matched only if a certain pattern follows it — say (letter|digit)* = (letter|digit)* , as in the classic Fortran example. If this entire big pattern matches after the DO, then DO is the string to be matched, and not otherwise. In that sense the slash operator is a lookahead operator: it looks at the pattern which follows, but it does not consume the symbols which match the following pattern. It consumes only the two characters D and O; the rest is going to be matched again after the match of DO is finalized. What are the actions of lex? The default action is to copy input to output.
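These class, option and grouping notations can be checked against POSIX extended regular expressions, a close cousin of lex's pattern language. The full-string matcher below is our own helper; note that lex's /, ^ and $ context operators behave differently under regexec, so we exercise only the ordinary operators here:

```c
#include <regex.h>

/* Returns 1 if the WHOLE string `s` matches `pattern` (POSIX ERE), else 0. */
int matches(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    int ok = 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 0;
    if (regexec(&re, s, 1, &m, 0) == 0 &&
        m.rm_so == 0 && s[m.rm_eo] == '\0')   /* match covers the full string */
        ok = 1;
    regfree(&re);
    return ok;
}
```

For instance, "ac" and "abc" both match ab?c (the ? supplies the epsilon option), and "cddefef" matches (ab|cd+)?(ef)*.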
That is, characters which are unmatched are simply copied to the output. So we need to provide patterns which catch characters — that is our purpose — and what is caught is retained in the buffer yytext. ECHO empties yytext into the output, and yyleng contains the number of characters which were matched and whose text is present in yytext. Lex always tries the rules in the order written down, and the longest match is preferred — remember, we discussed this longest-match requirement in the transition diagram part. For example, here the first pattern is the word integer and the second is the regular expression [a-z]+, matching any sequence of lowercase characters; corresponding to the first is action 1 and corresponding to the second is action 2. Now take a minor variation and suppose the input is "integers". Really speaking, there are several possible matches: "integer" matches the first pattern, and it is also possible to match the entire string "integers" with the second pattern. Since the longest match is always used, it is the second pattern which matches our input "integers". Now let us start looking at several examples to understand exactly how lex performs its matching. The files which contain these programs are all mentioned here — ex1, ex2 and so on; they will all be available in the NPTEL repository, and you can download them, compile them and execute them to see whether the output is generated properly, try different variations, and so on. For example, this is the program which we already saw, so let us not discuss it once more: it captures all the uppercase runs in the input, so each match of [A-Z]+ is printed out along with a newline. This is the input.
All the uppercase runs are printed out, each followed by a newline — a fairly simple and straightforward program. Let us take another simple example. The first pattern says: beginning of line, zero or more blanks, followed by a newline. In other words, the simple explanation is: all blank lines. They may contain only a newline character, or several blank spaces followed by a newline, but in essence they must be completely blank lines without any other character. If this pattern is matched, the line is ignored — there is no action corresponding to it. Remember the ordering of the rules as well. The second rule: if it is a newline character, the newline is echoed and the variable yylinenumber is incremented. Any other characters apart from newline are caught by the third pattern, .* — any number of non-newline characters — and when this matches, it prints out yylinenumber, the number of that particular line, and yytext, the text which matched .*. The yywrap routine does nothing; it is required just to complete a formality. In the main program we initialize yylinenumber to 1 and call yylex. The way this operates: all the blank lines in the input are ignored. Lines which contain non-blank characters, possibly along with blanks, match the third pattern for all their characters except the newline — as I told you, dot matches any character except newline — and then match the second rule for the newline character itself. Between these two rules, the characters and the line number are printed out. So let us see how the input is transformed into output.
Here is a non-blank line, and this is another non-blank line; then we have a blank line. Then there are a few blank spaces followed by non-blank characters, another blank line, a non-blank line, and another non-blank line. Overall there are five non-blank lines, and the output is: 1 followed by the first non-blank line, 2 followed by the second, 3 the third, 4 the fourth, and finally 5 the fifth. Remember that blank characters which occur inside the input lines are also copied; what is important is that the line contains some non-blank characters. Suppose a line had a couple of blanks followed by "ab": that line would still be written out in full, with however many blanks are present in the input, followed by "ab". So the blanks in the non-blank lines are all copied as they are, but completely blank lines — for example the third and the fifth lines here — are not copied to the output; they are ignored. That is what this program was supposed to do: all the non-blank characters are caught by the third rule, the newline by the second, and blank lines by the first. Now the example becomes a little more complicated. Here we have a code part in which there is a declaration, FILE *declfile: we have declared a file variable for use in our actions later on. Remember, this declaration is user-written code, and it is enclosed in the two special markers %{ and %}. Following that are a number of pattern definitions. Let us go to the next page first: in the rules section, what we have is a regular expression called declaration, which is built up in the previous slide — we will go back to it in a minute. Whenever a declaration is caught, the corresponding matched text is printed out to the file declfile, which we have already declared.
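The interplay of the three rules can be mimicked in plain C, which may make the behaviour easier to see. The helper number_nonblank and its caller-supplied output buffer are our own simplification of the generated analyzer:

```c
#include <stdio.h>
#include <string.h>

/* Drop blank lines; echo every other line prefixed with a running
   line number. `out` must be large enough for the result. */
void number_nonblank(const char *in, char *out)
{
    int lineno = 1;
    out[0] = '\0';
    while (*in) {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        size_t blanks = strspn(in, " \t");     /* leading blanks/tabs */
        if (blanks < len)                      /* line has non-blank content */
            sprintf(out + strlen(out), "%d %.*s\n", lineno++, (int)len, in);
        in += len + (eol ? 1 : 0);             /* skip past the newline */
    }
}
```

Note how blanks inside a non-blank line survive in the output, exactly as described above, while completely blank lines disappear and do not advance the counter.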
In the yywrap routine we just call fclose to close the declfile variable, and in the main program we open declfile for writing and then call the lexical analyzer. Let us look at the output, and go back to the patterns after that. This is the input, this is the matched output, and this is the rejected input. If you look at the matched input which is printed out as output, we have "float cd, ef;", then "int ght, asjhew[37], fuir, gj[45];", and then "float ire, deha[80];". Now let us see what is present in the input: there are a number of identifiers followed by a semicolon, then "int ab, float cd, ef;", then another run of characters, another "int ght ...", and so on. So in this input, which seems meaningless, there are actually some C-style declarations embedded. The purpose of the lex program that we are now going to discuss is to extract these meaningful C declarations from this seemingly meaningless input; the rest of it is rejected. That is our exercise, so let us see how we can do it. There are a number of shorthand notations written down. First, blanks: a blank or tab — observe the backslash escape character for tab, \t — repeated one or more times. letter is any lowercase letter, [a-z], in the usual set notation. The digit shorthand is [0-9]. An identifier is then a letter or an underscore, followed by letter, digit or underscore repeated any number of times. This is the standard pattern for identifiers that we know — a letter followed by (letter|digit)* — except that we have also included the underscore here.
Then we have number, which is one or more digits: digit+ (in regular expression notation, digit digit*). Next we permit an array declaration part, which says: an identifier, followed by a left square bracket, followed by a number, followed by a right square bracket. Observe that this is a pattern which uses patterns defined before it: the array declaration part must have a name followed by a bracketed number. That is exactly what we have in our output — a name followed by a number, as in asjhew[37] or gj[45]. Then the declaration-part shorthand says: either an array declaration or a simple name. A simple name is any identifier that we have already seen — for example ght, ire, deha and fuir are all simple names. The declaration list is basically a number of declaration parts: a declaration part, preceded — any number of times, including zero — by a declaration part, any number of blanks, a comma, and any number of blanks again. In other words, we want to generate a comma-separated list of declaration parts, and this is what does it. Finally, a complete declaration begins with the reserved word int or float, followed by a declaration list and a semicolon. This gives us something like "float cd, ef;" or "int ght, asjhew[37], ...;": int or float, then the declaration list containing simple identifiers or array declaration parts, separated by commas and ending with a semicolon. The declaration can be repeated any number of times: there is one declaration here, another here, a third here — you can have as many of them as you wish.
That is our declaration definition. In the rules section, the pattern is simply {declaration} — this is the declaration we know of, and it may occur any number of times. Whenever there is a match for declaration, the action prints out that particular declaration. Observe that each declaration has int or float, followed by a declaration list and, of course, a semicolon — that is one declaration. Whenever such a declaration is matched in the input, it is printed into the declaration file; all other characters are ignored. Now let us see how it works here. This first sequence of characters is an identifier, but it simply ends with a semicolon; we are going to start processing a declaration only when the input starts with int or float, so this part is ignored. The next part starts very promisingly with an int; then there is an identifier, and then a comma as well — but then, instead of a semicolon, we suddenly get a float. Therefore even this sequence of characters is ignored and copied into the rejected part. Then again the input starts with a float, so a new declaration pattern starts here, and this happens to be a correct declaration, because it reads "float cd, ef;". That part is separated out as a declaration, and it ends at the semicolon, after which we start processing a new declaration if possible. The next stretch does not correspond to a declaration because it does not begin with int or float; the part after it does begin with int — "int ght, asjhew[37], fuir, gj[45];" — and that entire thing is a valid declaration, so it is copied to the output as valid. The next part is invalid, so it is rejected; the one after is valid, so it is copied here; and the last part is invalid, so it is also rejected.
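The declaration pattern can be sketched by assembling its pieces the same way the lex definitions do, but expressed as a single POSIX extended regex. The helper name find_declaration, the fixed buffer size and the lowercase-only name class are our own assumptions, not the lecture's exact specification:

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Find the first (int|float) declaration in `s`; copy it to `out`.
   Returns 1 on success, 0 if no declaration is present. */
int find_declaration(const char *s, char *out)
{
    const char *name = "[a-z_][a-z0-9_]*";     /* identifier shorthand */
    char pat[512];
    /* declaration_part = name, optionally name[number];
       declaration = (int|float) blanks part (, part)* ;          */
    snprintf(pat, sizeof pat,
        "(int|float)[ \t]+%s(\\[[0-9]+\\])?([ \t]*,[ \t]*%s(\\[[0-9]+\\])?)*;",
        name, name);
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pat, REG_EXTENDED) != 0) return 0;
    int ok = (regexec(&re, s, 1, &m, 0) == 0);
    if (ok) {
        memcpy(out, s + m.rm_so, m.rm_eo - m.rm_so);
        out[m.rm_eo - m.rm_so] = '\0';         /* just the matched text */
    }
    regfree(&re);
    return ok;
}
```

Given the kind of input described above, the surrounding junk is skipped and only the well-formed declaration is extracted.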
This example really shows that for processing, or catching, declarations in a programming language, you can actually write lex programs — even though we are going to do this more meaningfully using yacc a little later. After we learn parsing, we will see how to do all this using yacc, which is actually a more meaningful way. But it is not as if lex is less powerful: lex can do quite a bit of work; it can actually catch all the declarations in a program, and so on. That is simply because the declaration part of a program is really regular in nature — it corresponds to a regular language — whereas once we get into nesting of statements, records and so on, the language is no longer regular, and we will require more complicated machinery. Now we move on to the next example: how do we combine identifiers, reserved words, hex constants, octal constants, normal integers and so on? I told you that this is a fairly intricate problem when we dealt with transition diagrams, so let us see how writing a lex specification simplifies the problem and makes the specification easy to write. This lex program also has initialization code: we have a variable hex initialized to 0, a variable oct initialized to 0, and a variable regular initialized to 0. Then we have a host of patterns, corresponding to reserved words and the various constants, which we will discuss shortly, and finally a main program which calls yylex. Let us also look at the input and output and understand them before we discuss the patterns. Obviously this is the input, and this part is the output. The first word happens to be a simple identifier, and it is caught as an identifier; "while" is a reserved word, so it is printed out as a reserved word. Then we have 0345la.
So, out of this, the 0345l part happens to be an octal constant with value 229, while the trailing a is a simple identifier and is printed out as one. Similarly, in 456ub, the 456u part is a normal integer 456 and b is an identifier. Here is a hex constant, 0x786l: that much is a hex constant with value 1926, and the habc part that follows is an identifier. Now watch this: we have b followed by 0x34. It is not recognized as the identifier b followed by a hex constant; because of the longest-match rule, the entire sequence b0x34 is matched against the identifier pattern and reported as an identifier. This is how the longest match is useful in identifying the various tokens. Now let us go back to the patterns and study them in full. Here is letter, which is already well known: a-z or A-Z or underscore; so any lower-case or upper-case character, or an underscore, is a letter for us. digit is 0-9, and digits is digit repeated one or more times. An octal digit is just 0 to 7 (0, 1, 2, 3, 4, 5, 6 or 7), and a hex digit is 0-9 or a-f. The integer qualifier is u, U, l or L, and blanks, as usual, is one or more blanks or tabs. Now we look at identifiers: letter followed by (letter | digit)*, the well-known regular expression for identifiers. Then we have integers: digits followed by an optional integer qualifier. The question mark, remember, implements epsilon, so this is really digits followed by (integer qualifier | epsilon). The hex constant is 0 followed by [xX] (remember, this is square bracket x X square bracket, so either little x or capital X), followed by one or more hex digits, followed by an optional integer qualifier.
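The longest-match (maximal munch) rule can be sketched in plain C; the helper names below are hypothetical, and lex of course performs this comparison internally across all of its patterns at once:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest identifier prefix: letter (letter|digit)* */
size_t ident_len(const char *s) {
    if (!isalpha((unsigned char)s[0]) && s[0] != '_') return 0;
    size_t n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

/* Length of the longest hex-constant prefix: 0[xX] hexdigit+ */
size_t hex_len(const char *s) {
    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X')) return 0;
    size_t n = 2;
    while (isxdigit((unsigned char)s[n])) n++;
    return (n > 2) ? n : 0;
}

/* Longest match wins. (A hex constant starts with a digit and an
   identifier cannot, so these two patterns never tie.) */
const char *token_kind(const char *s) {
    return (ident_len(s) >= hex_len(s)) ? "ID" : "HEX";
}
```

For "b0x34" the identifier match covers all five characters while the hex pattern matches nothing, so the whole thing is one identifier; for "0x34" alone, only the hex pattern matches.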
So, a hex constant is 0 followed by x followed by any number of hex digits; the integer qualifier, of course, is optional. An octal constant, similarly, is 0 followed by one or more octal digits, followed by an optional integer qualifier. Then we have the reserved words: if, else, while and switch. Now we come to the rules section, after the percent percent. For the four reserved words, the action is simply to print out the reserved word, and then comes the identifier rule. Since we have written down the reserved words before the identifier, the characters corresponding to them will never be matched against the identifier pattern. Whereas if we have a word such as switches, even though part of it matches the reserved word switch, the longest match is that of an identifier. If we place the identifier rule before if and the other reserved words (be sure to try it out once), these four rules will never match; everything matches against the identifier. Now, when a hexadecimal constant is caught, sscanf is used to read the integer inside the hexadecimal notation and print it out along with the text. Similarly, octal constants are converted to decimal using sscanf and printed out, and integers too: the characters are converted to a normal integer and printed out. Finally, any other character that is caught is ignored; the dot and newline rules ignore everything else. So this is what we just went through: the patterns are all caught and the output is generated. Now let us look at floating-point numbers, which are a little more complicated than normal integers. As usual, we have the lexical-analyzer program for floats in C: digits is 0 to 9 repeated one or more times, and then we have an exponent, which is either capital E or small e, followed by an optional plus or minus sign, followed by digits.
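The sscanf conversions that the actions perform can be checked directly; the values below are the ones from the lecture's example, where octal 0345 is 229 in decimal and hex 0x786 is 1926 (the function names are my own, not the lecture's):

```c
#include <assert.h>
#include <stdio.h>

/* Convert the matched text of a constant to its value, as the lex
   actions in the lecture do with sscanf on yytext. */

/* %x skips an optional 0x/0X prefix and reads hex digits */
unsigned hex_value(const char *s) { unsigned v = 0; sscanf(s, "%x", &v); return v; }

/* %o reads octal digits; the leading 0 is just another octal digit */
unsigned oct_value(const char *s) { unsigned v = 0; sscanf(s, "%o", &v); return v; }

/* %d reads a plain decimal integer */
int int_value(const char *s) { int v = 0; sscanf(s, "%d", &v); return v; }
```

Here 0x786 is 7*256 + 8*16 + 6 = 1926, and 0345 is 3*64 + 4*8 + 5 = 229, matching the output shown in the lecture.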
So, if we write E+9 or E9, it amounts to the same number, the same exponent. Then blanks is a blank, tab or newline repeated one or more times, and the floating qualifier is f, F, l or L. Then there are the patterns; let us look at the output first and then go back to the patterns themselves. The input begins with 123, then 345.., then 456 and 5.3. The first output is really generated only here, because we do not catch pure integers, and we do not catch anything which does not have a blank after the dot; so the earlier part is ignored. 5.3 is caught as a float with an optional integer part; it could have been just .3 as well. The next number, 675 with its e-minus exponent, is a float with no fraction; then 0.4e+2 is again caught with an optional integer part, and the next one also with an optional integer part. In 234.3.4, the 3.4 part is what is caught; 234. should have had a blank after the dot, but it does not, so it is ignored. 3.5 does have a blank after it, so it is caught as a float with an optional fraction. Then .34e+09l is caught with an optional integer part, possibly no integer part at all; similarly, 987 with its exponent is a float without a fraction, and the last one is a float with an optional fraction. So let us study the patterns. The first one is digits followed by exponent followed by an optional floating qualifier, and then followed by blanks. Remember the lookahead operator: this pattern is valid only if it is followed by blanks, otherwise it is not valid. So this matches floats with no fraction, digits followed by an exponent, with the qualifier optional; the float with no fraction in the output is 987e-6f. Next, we have zero or more digits, [0-9]*, followed compulsorily by a dot; so before the dot there can be an integer part or no integer part.
So, this is the optional integer part; but once the optional integer part is there, we must definitely have digits, followed by an optional exponent, followed by an optional floating qualifier, and of course it matches only if there are blanks after that. So the integer part is optional, but the rest is compulsory: at least the digits part is compulsory, and the exponent is optional. The third pattern is digits followed by a dot followed by an optional fraction [0-9]*, an optional exponent and an optional floating qualifier, but followed compulsorily by blanks. So these are the various patterns which match: one with no fraction, a second with an optional integer part, and a third with an optional fraction; these produce the outputs we already looked at. Now let us look at another example, which is used to generate the lexical analyzer for the desk calculator we are going to study later. number is [0-9]+ followed by an optional dot, that is, a dot or epsilon; or (the bar here gives you two options) [0-9]* followed by a dot followed by one or more digits. The difference between the two is that in the first the integer part is compulsory and the fraction is optional, whereas in the second the integer part is optional but the fractional part is compulsory. You cannot have both of them optional; otherwise a lone dot would be accepted as a number. name, as usual, is [a-zA-Z] followed by [a-zA-Z0-9]*; this is the identifier part. When a number is caught, it is converted using sscanf and the token NUMBER is returned; the value is read into a variable called yylval.dval, which is a parser variable. I am introducing this example just to show how interfacing with yacc happens.
So, yylval is a parser variable used by yacc, and the value of the number is put into that variable. Whenever a name is caught, it is looked up in a symbol table; if the name is present, a pointer to its entry is returned, and otherwise the name is entered into the symbol table, and the token NAME is returned. For ++ we return a post-plus token, for -- a post-minus token, for a dollar we return 0, and a newline or any other character is returned as that simple character itself. This is just to show you that it is possible to do symbol-table operations, such as name lookup, inside a lex program as well; lex programs can become as intricate and as complicated as we want them to be. With this background, let us stop the lecture; in the next lecture we will start studying parsers. Thank you.
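The lookup-or-insert behavior described for names can be sketched as follows; the table layout and the function name symlook are my own (many yacc examples use a similar helper), not necessarily what the lecture's program uses:

```c
#include <assert.h>
#include <string.h>

#define MAXSYMS 100

struct symtab { char name[32]; double value; };
static struct symtab table[MAXSYMS];   /* zero-initialized */
static int nsyms = 0;

/* Return the entry for name, entering it into the table if absent.
   The lexer would call this from the name rule and return NAME,
   with the returned pointer stored in yylval. */
struct symtab *symlook(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];          /* present: return its pointer */
    /* absent: enter it (truncating over-long names; table stays
       NUL-terminated because it started zeroed) */
    strncpy(table[nsyms].name, name, sizeof table[nsyms].name - 1);
    return &table[nsyms++];
}
```

Looking up the same name twice yields the same pointer, so the parser can read and write the variable's value through it.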