Welcome to the second part of lexical analysis. In this lecture we continue our discussion of lexical analysis. In the last lecture we studied the purpose of lexical analysis, why it should be separated from syntax analysis, what tokens, patterns and lexemes are, and the difficulties in lexical analysis. I also gave an introduction to finite automata. Today we will continue the discussion of finite automata and then study transition diagrams, regular expressions and related topics. To recap, let me go through the material on nondeterministic finite state automata. A deterministic automaton has exactly one transition on every symbol of the alphabet from each state, whereas a nondeterministic automaton (NFA) may have more than one transition, or even zero transitions, on a particular symbol. For example, here from state q0 there are two transitions on 0 and two transitions on 1, but from state q3 there is a transition on 0 and none on 1. Similarly, from state q1 there is a transition on 1 but none on 0. All of these combinations are permitted in a nondeterministic finite state automaton. Because there may be more than one transition on a symbol from a given state, the transition function can no longer be written simply as δ: Q × Σ → Q. Instead it becomes a mapping δ: Q × Σ → 2^Q, from Q × Σ to the power set of Q; that is, for each state and symbol we give a set of states. For example, from q0 on 0 we go to either q0 or q3, and on 1 we go to either q0 or q1. If there is no transition on a particular symbol, for example from q1 on 0, we write the empty set φ.
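To make the power-set view of δ concrete, here is a minimal Python sketch that stores δ as a dictionary from (state, symbol) pairs to sets of states, with a missing entry meaning φ. The lecture describes only the moves out of q0, q1 and q3; the remaining moves (q1 on 1 to q2, q3 on 0 to q4, q2 and q4 looping on both symbols and being final) are an assumption made here so the automaton is complete. The simulation tracks the whole set of states the NFA could be in after each symbol.

```python
# Assumed completion of the NFA in the figure: only the moves out of q0, q1
# and q3 come from the lecture; the rest is hypothetical, for illustration.
DELTA = {
    ("q0", "0"): {"q0", "q3"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
    ("q3", "0"): {"q4"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START = "q0"
FINAL = {"q2", "q4"}


def delta(state, symbol):
    """delta: Q x Sigma -> 2^Q; a missing entry stands for the empty set phi."""
    return frozenset(DELTA.get((state, symbol), set()))


def nfa_accepts(word):
    """Follow every possible move at once by keeping a set of current states."""
    current = {START}
    for symbol in word:
        current = set().union(*(delta(q, symbol) for q in current))
    return bool(current & FINAL)
```

Under these assumptions the NFA accepts strings containing 00 or 11, so `nfa_accepts("0100")` is true and `nfa_accepts("0101")` is false.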
This is the difference between deterministic and nondeterministic finite state automata. A basic theorem of automata theory is that every NFA can be converted to an equivalent DFA that accepts the same language. Let us now study how this conversion works, using an example. Here is an NFA, and we must construct a DFA that accepts exactly the same language. The alphabet is {a, b}. From state q0 the NFA has two transitions on a, to q0 and to q1; from q1 it has two transitions on b, to q1 and to q2; and from q2 there are no transitions at all. Also, q0 has no transition on b and q1 has no transition on a. The DFA is constructed by a fairly straightforward demand-driven approach. Each DFA state is written in square brackets. Since an NFA state can move to a set of states, each DFA state naturally corresponds to a subset of the NFA's states. For example, p0 = [q0] contains just q0, p1 = [q0, q1] contains q0 and q1, p2 = [q1, q2] contains q1 and q2, and p3 contains nothing: it is the empty set φ, which is really an error state. We always begin with the start state q0 of the NFA, and [q0] is the start state of the DFA as well. So p0 = [q0], and for each state in this set we find the transitions the NFA would make. From q0 on a, the NFA moves to either q0 or q1, so the combined set [q0, q1] becomes a new DFA state. Thus from p0 on a we go to p1 = [q0, q1]. Now let us see what happens from p0 on b.
From q0 on b there is no transition, which means an error. So from p0 on b we go to the error state φ, which we call p3, and every transition to φ is drawn as a transition to p3. So the arc from p0 to p3 is labeled b. We now have p0, p1 and p3, and we have covered all the transitions out of p0. Note that p2 has not been constructed yet, but p1 has, so let us see what the DFA does from p1 on the symbols a and b. To find the DFA's move from p1 on a, we look at the NFA transitions from both q0 and q1. From q0 on a, the NFA either remains in q0 or goes to q1, so the DFA can stay in p1 itself. From q1 there is no transition on a, so q1 contributes nothing to δ(p1, a). Hence δ(p1, a) = p1. Next, what about b? From q0 on b there is nothing, and from q1 on b the NFA either remains in q1 or goes to q2. The state [q1, q2] does not exist yet, so we create it and call it p2; thus δ(p1, b) = p2. Quite understandably, p2 on b goes to itself: q1 on b goes to {q1, q2}, and q2 adds nothing, so p2 on b stays in p2. On a from p2, both q1 and q2 go to the error state, so from p2 on a we go to p3. Finally, from the error state we remain in the error state on either a or b, so p3 has self-loops on both a and b.
These are the only states we need to construct; it is not necessary to construct every subset of the NFA's states. For example, q0, q1 and q2 give rise to 2^3 = 8 subsets, each of which is a possible DFA state, but this DFA has only 4 states. There are, however, examples where all 2^n subsets must be constructed. To summarize: the start state of the DFA corresponds to the set {q0} and is written [q0]. Starting from δ([q0], a), that is, from the transitions of the new automaton on each symbol, the new DFA states are constructed on demand. Each subset of the NFA states is a possible DFA state, but not all of them need appear in the DFA. What about the final states? In the picture, q2 is the final state of the NFA, so every DFA state whose subset includes q2 is final; here that is only [q1, q2], that is, p2. In general, every DFA state containing some final state of the NFA as a member is a final state of the DFA. The slide describes this construction formally in set notation; please go through it in detail and understand it. In the worst case the converted DFA may have 2^n states, where n is the number of states of the NFA. There is another variety of nondeterministic finite state automaton, called the ε-NFA; ε, as we know, is the empty string. In one example there is a transition from q2 to q0 on ε; in another there are transitions from q0 to q1 on ε, from q1 to q2 on ε, and so on. Such an ε-transition is a silent transition: it does not consume any input.
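The demand-driven subset construction summarized above can be sketched in Python as follows. This is an illustration, not the lecture's own code: DFA states are represented as `frozenset`s of NFA state names, the empty frozenset plays the role of the error state p3, and a work queue ensures that only reachable subsets are ever created.

```python
from collections import deque

# NFA from the lecture's example: alphabet {a, b}; q0 -a-> {q0, q1},
# q1 -b-> {q1, q2}; q2 is final and has no outgoing transitions.
NFA = {
    ("q0", "a"): {"q0", "q1"},
    ("q1", "b"): {"q1", "q2"},
}
ALPHABET = ("a", "b")
NFA_START, NFA_FINAL = "q0", {"q2"}


def subset_construction():
    """Build DFA states on demand; each DFA state is a frozenset of NFA states."""
    start = frozenset({NFA_START})              # [q0], i.e. p0
    dfa_delta, seen, work = {}, {start}, deque([start])
    while work:
        S = work.popleft()
        for sym in ALPHABET:
            # Union of the NFA moves from every member of S.
            # The empty frozenset is the error state (p3 in the lecture).
            T = frozenset(t for q in S for t in NFA.get((q, sym), ()))
            dfa_delta[(S, sym)] = T
            if T not in seen:                   # a new subset: create it now
                seen.add(T)
                work.append(T)
    # A DFA state is final if it contains some final NFA state.
    finals = {S for S in seen if S & NFA_FINAL}
    return start, seen, dfa_delta, finals


def dfa_accepts(word):
    start, _, dfa_delta, finals = subset_construction()
    state = start
    for sym in word:
        state = dfa_delta[(state, sym)]
    return state in finals
```

For this NFA the loop creates exactly four subsets, {q0}, {q0, q1}, {} and {q1, q2}, matching p0 to p3, and only {q1, q2} is final. The language accepted is one or more a's followed by one or more b's.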
In the first example, q0 has a transition on a to q1, q1 has a transition on b to q2, and q2 has a transition on a to q0; there is also a silent ε-transition from q2 to q0. The implication is that once the automaton reaches q2, it can move to q0 without consuming any input, or it can consume an a and then go to q0. There is an algorithm that converts an ε-NFA to an equivalent NFA or DFA, but it is not essential for our discussion, so I will just give examples and continue. From q0 on a we go to q1, as in the original automaton, and from q1 on b we reach q2; but q2 can also go to q0 by a silent transition. So effectively, from q1 on b we may either stop at q2 or continue to q0. We therefore keep the transition from q1 to q2 on b and also add a transition from q1 to q0 on b to capture the effect of the ε-transition. Finally, the ε-transition is removed, leaving just the transition from q2 to q0 on a. Now let us see how it works in the second example. From q0 there is an ε-transition to q1. The implication is that after any number of 0s the NFA can remain in q0, or it can make a silent transition to q1 and then a further silent transition to q2. In other words, after consuming a sequence of 0s the automaton can be in q0, q1 or q2. It can also consume just ε and move through q1 to q2, so the empty string ε is accepted; to indicate this, we make all three states q0, q1 and q2 final in the converted NFA. A single 0 is of course accepted as well, by consuming it and then making two ε-transitions, and from q1 the automaton can continue to q2 after reading any number of 1s.
So in the converted NFA we add a transition from q0 to q1 on 0 and remove the ε-transition, we add a transition from q1 to q2 on 1, and we also add a transition from q0 to q2 on 0, reflecting the possibility of ε-transitions all the way to q2. This converted NFA accepts the same language as the ε-NFA. Still, ε-NFAs are sometimes easier to construct, especially algorithmically, as we will see a little later. So far we have studied finite state automata and some of their properties: conversion of NFAs to DFAs, the language accepted or recognized by a DFA, and so on. DFAs and NFAs are machines, whereas a regular language can also be given a finite representation in the form of regular expressions. Regular expressions are specifications, and our interest in them is that we will use them to specify lexical analyzers. Let us first define regular expressions; the definition is inductive. Let Σ be an alphabet. The regular expressions over Σ, and the languages they denote, are defined as follows, starting with three base clauses. The empty set φ is a regular expression by definition, and its language L(φ) is also the empty set. The empty string ε is a regular expression, and its language L(ε) is the set containing the empty string, {ε}. Please note the difference between φ and {ε}: φ contains nothing, whereas {ε} contains the empty string. Finally, for each symbol a of the alphabet Σ, the symbol a itself is a regular expression, and its language L(a) is the set {a} containing just that symbol. These are the three base clauses; the inductive clauses follow.
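One standard way to remove ε-transitions is to compute ε-closures: δ'(q, a) = ECLOSE(δ(ECLOSE(q), a)), and q becomes final if ECLOSE(q) contains a final state. The Python sketch below applies this to a hypothetical ε-NFA shaped like the lecture's second example; the exact moves (q0 looping on 0, q1 on 1, q2 on 2, ε-edges q0 to q1 to q2, q2 final) are an assumption, since the figure is not reproduced here.

```python
# Hypothetical epsilon-NFA (assumed, shaped like the lecture's second example):
# q0 loops on 0, q1 loops on 1, q2 loops on 2, epsilon-edges q0 -> q1 -> q2,
# and q2 is the only final state.
EPS = "ε"
MOVES = {
    ("q0", "0"): {"q0"},
    ("q1", "1"): {"q1"},
    ("q2", "2"): {"q2"},
    ("q0", EPS): {"q1"},
    ("q1", EPS): {"q2"},
}
STATES = ("q0", "q1", "q2")
SYMBOLS = ("0", "1", "2")
FINAL = {"q2"}


def eclose(states):
    """All states reachable from `states` using only epsilon-edges."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in MOVES.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def remove_epsilon():
    """delta'(q, a) = ECLOSE(delta(ECLOSE(q), a)); q is final if ECLOSE(q) meets FINAL."""
    delta = {}
    for q in STATES:
        for a in SYMBOLS:
            step = {r for p in eclose({q}) for r in MOVES.get((p, a), ())}
            target = eclose(step)
            if target:
                delta[(q, a)] = target
    finals = {q for q in STATES if eclose({q}) & FINAL}
    return delta, finals


def accepts(word):
    """Run the epsilon-free NFA produced by remove_epsilon()."""
    delta, finals = remove_epsilon()
    current = {"q0"}
    for a in word:
        current = {r for q in current for r in delta.get((q, a), ())}
    return bool(current & finals)
```

As in the lecture, all three states become final because each can reach q2 silently, so the empty string is accepted; this assumed automaton accepts 0*1*2*.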
Now suppose we are given two regular expressions r and s denoting the regular languages R and S. The first combination mechanism is concatenation: rs is a regular expression, and its language is the concatenation R·S. To concatenate R and S, take one string from R and one from S and join them: R·S = {xy | x ∈ R and y ∈ S}. This is the language of the concatenated regular expression rs. The second operator is +, which corresponds to union of languages: r + s is a regular expression, and its language is R ∪ S, so a string generated by either r or s is in the language of r + s. The third way of building regular expressions is r*, the Kleene closure. The star operator means concatenating r with itself any number of times, including zero: ε, r, rr, rrr, and so on, with all of these combined by +. In other words, r* stands for ε + r + rr + rrr + ⋯. The corresponding language is R*, the infinite union of R^i for i = 0 to ∞: R^0 = {ε}, R^1 = R, R^2, R^3, R^4 and so on. So r* generates a great many strings; in general it denotes an infinite language.
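The inductive definition translates almost clause by clause into a recursive function. In this Python sketch a regular expression is a nested tuple (a representation chosen only for illustration), and because L(r*) is usually infinite, the function returns just the strings of length at most `n`. The star case builds R^0, R^1, R^2, ... layer by layer and stops when no new short string appears.

```python
# Regular expressions as nested tuples (a hypothetical representation):
# ("empty",) for phi, ("eps",) for epsilon, ("sym", "a") for a symbol,
# ("cat", r, s) for rs, ("union", r, s) for r + s, ("star", r) for r*.

def lang(r, n):
    """The strings of L(r) of length <= n, following the inductive definition."""
    kind = r[0]
    if kind == "empty":
        return set()                                   # L(phi) = {}
    if kind == "eps":
        return {""}                                    # L(eps) = {eps}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()     # L(a) = {a}
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)           # L(r + s) = R ∪ S
    if kind == "cat":
        R, S = lang(r[1], n), lang(r[2], n)
        return {x + y for x in R for y in S if len(x + y) <= n}  # R·S
    if kind == "star":
        R = lang(r[1], n)
        result, layer = {""}, {""}                     # R^0 = {eps}
        while layer:
            # Extend the newest strings by one more element of R, keep only
            # strings not seen before, and stop when nothing new appears.
            layer = {x + y for x in layer for y in R if len(x + y) <= n} - result
            result |= layer
        return result
    raise ValueError(f"unknown node {kind!r}")
```

For example, with `zero_or_one = ("union", ("sym", "0"), ("sym", "1"))`, the call `lang(("star", zero_or_one), 3)` yields all 15 binary strings of length 0 to 3, while `lang(("star", ("empty",)), 5)` is `{""}`: the closure of φ contains only ε.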
If r and s denote finite languages, then R·S and R ∪ S are also finite, whereas R* is infinite even when R is finite, provided R contains at least one nonempty string. L* is known as the Kleene closure, or simply the closure, of L. Now let us take examples of regular expressions; so far we have only looked at how regular expressions are combined. The regular expression (0 + 1)* denotes the set of all strings of 0s and 1s. Let us see why. How does one generate the string 101? Take (0 + 1)* and expand the star into three copies of r = 0 + 1, that is, (0 + 1)(0 + 1)(0 + 1). This is permitted because the expression is r*, so r may be concatenated any number of times, and rrr is one of the terms of ε + r + rr + rrr + ⋯. Each copy of r generates either 0 or 1; let the first generate 1, the second 0 and the third 1. Concatenating these strings gives 101. Since (0 + 1) can be replicated any number of times, 3 times here or 40 or 400, and each copy can generate a 0 or a 1, we can generate every string of 0s and 1s. The next example is (0 + 1)* followed by two 0s and then (0 + 1)* again, that is, (0 + 1)*00(0 + 1)*.
As before, the first (0 + 1)* can generate any string of 0s and 1s, and so can the last one. The only property common to all the strings is that two 0s appear together somewhere in the middle: even if the first part generates only 1s and the last part generates only 1s, the two 0s remain in the middle. So this language is the set of all strings of 0s and 1s with at least two consecutive 0s. Mind you, the (0 + 1)* parts can also generate 0s, which is why we can only say "at least two consecutive 0s": there may be more 0s on either side. The third example is 0*10*010*(10* + ε). The property we can state for each string generated by this expression is that it has two or three occurrences of 1, the first and second of which are not consecutive. Why at least two? In the worst case every 0* generates ε. Even then the expression produces a 1, then the explicit 0, and then another 1, giving 101, followed by (10* + ε). If we choose ε there and do not use 10*, we get the string 101, which has two 1s. If we choose 10* instead of ε, we get a third 1. The first and second 1s are never consecutive because the explicit 0 between them is always generated. This is how strings are generated from regular expressions.
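One way to gain confidence in descriptions like these is to compare each expression against its English description on every short string. The Python sketch below does this with the standard `re` module; note that the lecture's + (union) is written `|` in Python's regex syntax, and the ε alternative in the third example is written as an empty branch `(10*|)`.

```python
import re
from itertools import product


def strings(max_len, alphabet="01"):
    """Every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


# The lecture's "+" (union) is written "|" in Python's re syntax.
EX1 = re.compile(r"(0|1)*")             # all strings of 0s and 1s
EX2 = re.compile(r"(0|1)*00(0|1)*")     # at least two consecutive 0s
EX3 = re.compile(r"0*10*010*(10*|)")    # two or three 1s, first two not adjacent


def ex3_property(w):
    """Two or three 1s, and the first and second 1s are not consecutive."""
    ones = [i for i, c in enumerate(w) if c == "1"]
    return len(ones) in (2, 3) and ones[1] != ones[0] + 1


def check(max_len=10):
    """Compare each expression with its English description on all short strings."""
    for w in strings(max_len):
        assert EX1.fullmatch(w) is not None
        assert (EX2.fullmatch(w) is not None) == ("00" in w)
        assert (EX3.fullmatch(w) is not None) == ex3_property(w)
    return True
```

`re.fullmatch` is used rather than `re.search` because the question is whether the whole string belongs to the language, not whether some substring matches.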
The fourth example is (1 + 10)*, which denotes the strings of 0s and 1s that begin with a 1 and have no two consecutive 0s (together with the empty string, which (1 + 10)* also contains). Beginning with 1 is easy to see, because both alternatives generate strings starting with 1. Whenever we generate a 0, the next copy of (1 + 10) again begins with a 1, so any two 0s are separated by at least one 1; therefore consecutive 0s are not possible. The fifth example, (0 + 1)*011, is very easy: all strings of 0s and 1s ending with 011. The previous examples were all over 0s and 1s; here are some over letters. The expression c*(a + bc*)* denotes the strings that do not contain the substring ac: an a can be followed only by another a, a b, or the end of the string, so it is never directly followed by a c. The expression r = (a + b)*a denotes the strings over {a, b} that end with a. Next, we have the reserved words of a simple language, L = {if, then, else, while, do, begin, end}. To write a regular expression generating L, we simply join all the strings with +, giving if + then + else + while + do + begin + end. This is how we will write regular expressions for the reserved words of our languages. One level above the regular expression is the regular definition. A regular definition is a sequence of equations, used mainly as a shorthand for writing specifications of lexical analyzers. Let us understand what these are. We have definitions of the form d1 = r1, d2 = r2, ..., dn = rn, where each di is a distinct name and each ri is a regular expression. In general each ri may use the previously defined names: ri is a regular expression over the symbols Σ ∪ {d1, d2, ..., d(i−1)}. An example will make this clear. Let us define identifiers, which are the names used in programming languages, and integers.
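The same brute-force check works for the remaining examples. This Python sketch writes each expression in `re` syntax (again with `|` for the lecture's +) and compares it with its description on all short strings; the reserved-word expression is simply the alternation of the words.

```python
import re
from itertools import product


def strings(alphabet, max_len):
    """Every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


EX4 = re.compile(r"(1|10)*")        # empty, or starts with 1 and has no "00"
EX5 = re.compile(r"(0|1)*011")      # ends with 011
NO_AC = re.compile(r"c*(a|bc*)*")   # no substring "ac"
ENDS_A = re.compile(r"(a|b)*a")     # strings over {a, b} ending with a
RESERVED = re.compile(r"if|then|else|while|do|begin|end")


def matches(pattern, w):
    return pattern.fullmatch(w) is not None


def check(max_len=8):
    """Compare each expression with its English description on all short strings."""
    for w in strings("01", max_len):
        assert matches(EX4, w) == (w == "" or (w[0] == "1" and "00" not in w))
        assert matches(EX5, w) == w.endswith("011")
    for w in strings("abc", max_len):
        assert matches(NO_AC, w) == ("ac" not in w)
    for w in strings("ab", max_len):
        assert matches(ENDS_A, w) == w.endswith("a")
    return True
```

With `fullmatch`, `RESERVED` accepts `while` but rejects `whiles`, which is exactly the difference between a reserved word and a longer identifier that merely starts with one.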
Suppose we consider only five letters, a, b, c, d and e. We can define letter = a + b + c + d + e, which generates the language {a, b, c, d, e}. Similarly, with just the digits 0 to 4, digit = 0 + 1 + 2 + 3 + 4. These two simple regular expressions become regular definitions named letter and digit. Now, what is an identifier? It is a name, and every name must begin with a letter followed by any combination of letters and digits: identifier = letter (letter + digit)*. A number is a digit followed by any number of digits, so it is never ε: number = digit digit*. These definitions make identifier and number much easier to understand than writing number as (0 + 1 + 2 + 3 + 4)(0 + 1 + 2 + 3 + 4)*; large specifications would be very hard to read without the shorthand of regular definitions. This becomes even clearer in the following example: digit = 0 + 1 + ⋯ + 9; digits = digit digit*; optional_fraction = . digits + ε; optional_exponent = E ('+' + '-' + ε) digits + ε, where ε indicates that there may be no exponent at all; and unsigned_number = digits optional_fraction optional_exponent. If both the fraction and the exponent are absent, we still have an integer; if the optional fraction is present, we have a fixed-point number; and if the exponent is present as well, the number also has an exponent part. These are the various possibilities, and it is clear that readability is enhanced by regular definitions. Now, what about the equivalence of regular expressions and finite state automata?
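Regular definitions map naturally onto building regex strings from previously defined names. The Python sketch below is an illustration, not a lexer generator's syntax: union becomes `|` or a character class, `?` stands for "+ ε", the literal dot and minus are escaped, and the five-letter, five-digit example uses the hypothetical name `digit04` to keep it apart from the full `digit`.

```python
import re

# First example: five letters and the digits 0-4 only.
letter = "[a-e]"                               # a + b + c + d + e
digit04 = "[0-4]"                              # 0 + 1 + 2 + 3 + 4
identifier = f"{letter}({letter}|{digit04})*"  # letter (letter + digit)*
number = f"{digit04}{digit04}*"                # digit digit*, never empty

# Second example: unsigned numbers. Each name uses only earlier names.
digit = "[0-9]"
digits = f"{digit}{digit}*"
optional_fraction = rf"(\.{digits})?"          # . digits + eps
optional_exponent = rf"(E[+\-]?{digits})?"     # E ('+' + '-' + eps) digits + eps
unsigned_number = f"{digits}{optional_fraction}{optional_exponent}"


def matches(definition, text):
    return re.fullmatch(definition, text) is not None
```

With these definitions `42`, `3.14E-2` and `6E10` are unsigned numbers, while `3.` and `.5` are not, because optional_fraction requires a dot followed by at least one digit and the number must start with digits.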
Here we have a fundamental theorem: let r be a regular expression; then there exists a nondeterministic finite state automaton with ε-transitions that accepts L(r). This is a profound theorem, and it is the basis for constructing NFAs from specifications given as regular definitions. The proof is by construction, and we will consider the construction in detail. The converse is also true: if L is accepted by a DFA, then L is generated by some regular expression. That proof is extremely tedious and does not yield any insight into the compilation process, so we will skip it; interested readers can refer to the textbook. Together these results form a cycle: from a DFA you can obtain a regular expression, regular expressions can be converted to ε-NFAs, ε-NFAs to NFAs, and NFAs to DFAs. In other words, if you start with a DFA, build a regular expression from it, convert that to an NFA and then back to a DFA, you get a DFA that accepts the same language as the one you started with. This is a very powerful result. Let us see how the construction works; the proof follows the inductive definition of regular expressions. For r = φ, the machine has a start state and a final state but no arc between them. The final state can never be reached, so this automaton accepts the empty language φ. For r = ε, the start state q0 is also made final, so without doing anything the automaton accepts the string ε. For r = a, a single symbol, there is a start state q0, a final state qf, and a transition from q0 to qf on a.
So the string a is accepted. This is the construction for the three base cases φ, ε and a. The next construction is for the operator +, which produces r1 + r2 from the regular expressions r1 and r2. Remember that this is an inductive proof: r1 and r2 are each smaller than r1 + r2, since each has fewer operators than r1 + r2 as a whole. Therefore, by the inductive hypothesis, there is a machine M1 accepting L(r1) and another machine M2 accepting L(r2). q1 is the start state and f1 the final state of M1; q2 is the start state and f2 the final state of M2. We take these two machines and add a new start state q0 and a new final state f0. From q0 we add an ε-transition to q1 and another to q2. We make f1 and f2 nonfinal (note that they are no longer drawn with double circles) and add ε-transitions from each of them to the new final state f0. So q0 is the new initial state and f0 the new final state; q1 and q2 are no longer initial, and f1 and f2 are no longer final. How does this machine accept r1 + r2? If the string is generated by r1, the NFA simply takes the path through M1 and accepts it; similarly, for a string generated by r2, it takes the path through M2. Nothing else is possible, so this automaton accepts exactly L(r1 + r2). I will give an example of all this after the construction process is complete.
Now consider the concatenation r1r2; the dot is traditionally not written. Again we have two smaller regular expressions r1 and r2 combined by concatenation, so inductively we may assume that M1 recognizes the strings generated by r1 and M2 recognizes the strings generated by r2. The start state of M1 becomes the new start state, and the final state of M2 becomes the new final state. The final state f1 of M1 is made nonfinal, the start state q2 of M2 loses its start-state status, and we add an ε-transition from f1 to q2. The combined machine accepts the strings generated by r1r2. It is very simple: starting at the start state, M1 takes care of the r1 part, then a silent transition takes us to M2, which takes care of the r2 part, and then the machine halts. Which part of the input belongs to r1 and which to r2? Since this is a nondeterministic automaton, it can make whatever transitions are needed to accept a string of r1 followed by a string of r2. The last construction is for r1*, the closure operator. Again r1 is smaller than r1*, so by the inductive hypothesis we have a machine M1 for r1, with start state q1 and final state f1. Now f1 is no longer final; please observe that. As before, we add a new start state q0 and a new final state f0, and connect q0 to q1 and f1 to f0 by ε-transitions. So far the machine accepts only L(r1); how do we make it accept r1*? First, r1* includes ε, that is, zero repetitions of r1, so we add an ε-arc from q0 to f0, which lets the new machine accept ε. Next, what about multiple instances of r1? After one pass through M1 the machine reaches f1, having accepted one instance of r1.
If there is another instance, an ε-transition takes the machine back to q1, where it consumes the next instance of r1, and so on. So we add a back arc from f1 to q1, labeled ε, to handle the iterations of r1, and an arc from q0 to f0 to handle ε. This new machine accepts r1*. Now it is time for a good example. The regular expression is (a + b)*c. It uses every construction we have seen: the base cases a, b and c, then +, then *, and then concatenation. First, r = a and r = b; I have not drawn r = c because it is trivial. These are the two machines for a and b. What about a + b? The construction says: take the machine for a and the machine for b, add q3 and f3 as the new start and final states, and add ε-arcs. This machine corresponds to a + b. What about (a + b)*? The part from q3 to f3, without the back arc, is the machine for a + b. We add an ε back arc from f3 to q3, the new start and final states q4 and f4 with their ε-arcs, and an ε-edge from q4 to f4; this is our machine for (a + b)*. What about (a + b)*c? We have the machine for c and the entire machine for (a + b)*; we connect the two with an ε-arc, make q4 the new start state and f5 the new final state, and this is our machine for (a + b)*c. You can see that there are many ε-arcs here. The conversion from NFA to DFA takes care of removing the ε-arcs and the nondeterminism, finally producing a deterministic finite automaton. Now we switch back to finite state automata and their generalization, called transition diagrams.
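The inductive construction for φ, ε, a, r1 + r2, r1r2 and r1* can be sketched in Python as a recursive function that returns the (start, final) pair of a fragment, following the lecture's cases: a single start-and-final state for ε, new start and final states with ε-arcs for + and *, and one ε-arc linking the machines for concatenation. The tuple representation of expressions and the `NFA` class are assumptions of this sketch; acceptance is tested by simulating the resulting ε-NFA with ε-closures.

```python
from itertools import count

EPS = None  # label used for epsilon-edges


class NFA:
    def __init__(self):
        self.edges = {}          # state -> list of (label, target)
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))


def build(nfa, r):
    """Return (start, final) of a fragment accepting L(r), case by case."""
    kind = r[0]
    if kind == "empty":                       # start and final, no arc between
        return nfa.new_state(), nfa.new_state()
    if kind == "eps":                         # one state that is also final
        s = nfa.new_state()
        return s, s
    if kind == "sym":                         # q0 --a--> qf
        q, f = nfa.new_state(), nfa.new_state()
        nfa.add(q, r[1], f)
        return q, f
    if kind == "union":
        q1, f1 = build(nfa, r[1])
        q2, f2 = build(nfa, r[2])
        q0, f0 = nfa.new_state(), nfa.new_state()
        nfa.add(q0, EPS, q1)
        nfa.add(q0, EPS, q2)
        nfa.add(f1, EPS, f0)                  # f1 and f2 are no longer final
        nfa.add(f2, EPS, f0)
        return q0, f0
    if kind == "cat":
        q1, f1 = build(nfa, r[1])
        q2, f2 = build(nfa, r[2])
        nfa.add(f1, EPS, q2)                  # M1's final state feeds M2's start
        return q1, f2
    if kind == "star":
        q1, f1 = build(nfa, r[1])
        q0, f0 = nfa.new_state(), nfa.new_state()
        nfa.add(q0, EPS, q1)
        nfa.add(f1, EPS, f0)
        nfa.add(q0, EPS, f0)                  # zero iterations: accept epsilon
        nfa.add(f1, EPS, q1)                  # back arc: one more iteration
        return q0, f0
    raise ValueError(f"unknown node {kind!r}")


def eclose(nfa, states):
    """States reachable from `states` through epsilon-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        for label, t in nfa.edges[stack.pop()]:
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def thompson_accepts(r, word):
    nfa = NFA()
    start, final = build(nfa, r)
    current = eclose(nfa, {start})
    for ch in word:
        step = {t for s in current for label, t in nfa.edges[s] if label == ch}
        current = eclose(nfa, step)
    return final in current
```

For the lecture's example, `R = ("cat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "c"))` accepts `abac` and `c` but rejects `ab` and `ca`.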
Let me give you an example before we go further. Here is a transition diagram that recognizes identifiers and reserved words. Transition diagrams have states, as finite state machines do, but their edge labels can be either symbols or complete regular expressions. For example, letter is a regular expression for any letter of the alphabet, A to Z or a to z, and this arc is labeled "letter or digit", where digit is similarly 0 to 9. So both single symbols and regular expressions may be used as labels on the arcs of a transition diagram. In other words, transition diagrams are generalized DFAs: their edges may be labeled by a symbol, a set of symbols, or a regular definition. Some accepting states may be marked as retracting states, indicating that the lexeme does not include the symbol that brought us to the accepting state; this will become clear when we go through the example. Each accepting state has an action attached to it, which is executed when that state is reached. Typically the action evaluates a string of digits to produce its value, or searches a table of reserved words to produce the token's code, and so on. Transition diagrams are not meant to be translated into code automatically; they are meant for translation by hand. Let us look at some examples. From the start state we read a single letter, since all identifiers begin with letters. All reserved words also begin with letters, so unless we look up the table of reserved words we cannot distinguish identifiers from reserved words. So on a letter we go to state 1.
On a letter or digit we stay in state 1, and on any other symbol, one that is neither a letter nor a digit (an operator, for example), we go to state 2. State 2 is marked with a star, indicating that the symbol that took us from state 1 to state 2 is not consumed and is not part of the token produced. The action there is return (get_token_code(), name): it searches a table to find out whether the lexeme is a reserved word or an ordinary identifier, gets the corresponding code, and returns. This is the action part. Now let us take another example: hexadecimal and octal constants. The syntax is very similar to that of C. A hex constant is 0 followed by either a small x or a capital X, followed by one or more hexadecimal digits (0 to 9 and a to f), followed by either a qualifier or nothing. A qualifier simply says that the constant is unsigned or long: u or U, l or L. Similarly, an octal constant starts with 0, has octal digits 0 to 7 any number of times, and is followed by either a qualifier or nothing. These are the regular definitions for hex constants and octal constants. Let us see how they work in a transition diagram. From state 3, on a 0 we go to state 4, where we branch. If it is a hex constant, there is an x and we go to state 5; if it is an octal digit 0 to 7, we go to state 9. For an octal constant, in state 9 we consume all the octal digits; on a qualifier we go to state 11, and on some other symbol, such as an operator, we go to state 10. State 10 is a retracting state, meaning that the symbol that brought us from 9 to 10 is not part of the token. In both states 10 and 11 we return the token INT_CONST together with its value.
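Translating the identifier diagram by hand gives a small state machine. In this Python sketch the reserved-word table, the token codes (`"ID"`, `"WHILE"`, ...), and the name `get_token_code` are hypothetical stand-ins for the lecture's action; retraction is modeled by returning the position of the unconsumed "other" symbol, so the next scan starts there.

```python
import string

LETTERS = set(string.ascii_letters)     # A-Z and a-z
DIGITS = set(string.digits)             # 0-9

# Hypothetical reserved-word table and token codes.
RESERVED_CODES = {"if": "IF", "then": "THEN", "else": "ELSE",
                  "while": "WHILE", "do": "DO", "begin": "BEGIN", "end": "END"}


def get_token_code(name):
    """Hypothetical table lookup: a reserved word's code, or ID otherwise."""
    return RESERVED_CODES.get(name, "ID")


def scan_identifier(text, pos):
    """start --letter--> 1; 1 --letter|digit--> 1; 1 --other--> 2* (retract).
    Returns (token_code, lexeme, next_pos), or None if no letter at pos."""
    state, i = 0, pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == 0:
            if ch in LETTERS:
                state, i = 1, i + 1
            else:
                return None
        else:  # state 1
            if ch in LETTERS or ch in DIGITS:
                i += 1                           # stay in state 1
            else:
                # State 2*: the "other" symbol is not consumed, so the lexeme
                # is text[pos:i] and scanning resumes at position i.
                lexeme = text[pos:i]
                return get_token_code(lexeme), lexeme, i
```

Because the table lookup happens only after the whole lexeme has been read, `while(` yields the reserved word `WHILE` while `whilex` would be an ordinary identifier.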
So, the string we have accumulated, the lexeme, on entering either state 10 or state 11 is the string of digits in the constant. That part is taken and evaluated, and the value is returned as the value of the integer constant. Whether it is a hex constant or an octal constant, it is still an integer, and that is why we return the token int constant. What about normal integers? A normal integer constant is one or more digits followed by an optional qualifier, where digit is 0, 1, 2, and so on up to 9, and qualifier, as usual, is unsigned or long. The diagram is very simple: on a digit we move to the next state, and there we consume any further digits. Note a correction here: the definition should be digit plus, not digit star, because at least one digit is required. After the digits, on either a qualifier or some other character, we return int constant and its value. So, this is a very simple transition diagram for integers; now something more complicated. This one is for real constants, which can be of any of three types. The first type is one or more digits (digit plus) followed by an exponent and then by a qualifier or nothing. The second type is one or more digits followed by a dot and then digits, followed by an optional exponent and an optional qualifier. In the third type there may be no digits before the dot, but once you have a dot you must have at least one digit after it, again followed by an optional exponent or qualifier. Some juggling of this kind is necessary so that either the integer part or the fractional part can be left out. The exponent is either a capital E or a small e, followed by the sign of the exponent and then the digits. The qualifier is small f, capital F, small l, or capital L.
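The three forms of real constants, together with plain integers, can be checked with a small scanner. This is a sketch only, assuming C-like rules (digit+ exponent, digit+ . digit*, or . digit+, each with an optional exponent and qualifier); the names scan_number, INT_CONST and REAL_CONST are hypothetical. It reports the length of the longest match, so the symbol at s[len] is the one a retracting state would give back.

```c
#include <ctype.h>
#include <stddef.h>

enum num_kind { NUM_ERROR, INT_CONST, REAL_CONST };

/* Match an exponent (e|E)(+|-)? digit+ starting at s[i]. Returns the
   index just past it, or i if there is no complete exponent, so that a
   dangling "e" is left for the next token. */
static size_t match_exponent(const char *s, size_t i)
{
    size_t j = i;
    if (s[j] != 'e' && s[j] != 'E')
        return i;
    j++;
    if (s[j] == '+' || s[j] == '-')
        j++;
    if (!isdigit((unsigned char)s[j]))
        return i;                   /* no digits: not an exponent */
    while (isdigit((unsigned char)s[j]))
        j++;
    return j;
}

/* Scan a decimal integer or real constant at the start of s and store
   the lexeme length in *len. Assumed forms:
     INT_CONST : digit+ (u|U|l|L)?
     REAL_CONST: digit+ exponent (f|F|l|L)?
                 digit+ . digit* exponent? (f|F|l|L)?
                 . digit+ exponent? (f|F|l|L)?            */
enum num_kind scan_number(const char *s, size_t *len)
{
    size_t i = 0, digits_before = 0, digits_after = 0, e;
    int is_real = 0;

    while (isdigit((unsigned char)s[i])) { i++; digits_before++; }

    if (s[i] == '.') {
        size_t j = i + 1;
        while (isdigit((unsigned char)s[j])) { j++; digits_after++; }
        if (digits_before == 0 && digits_after == 0) {
            *len = 0;
            return NUM_ERROR;       /* a lone '.' is not a number */
        }
        i = j;
        is_real = 1;
    } else if (digits_before == 0) {
        *len = 0;
        return NUM_ERROR;           /* must start with a digit or '.' */
    }

    e = match_exponent(s, i);
    if (e != i) { i = e; is_real = 1; }

    /* at most one qualifier, as in the regular definitions */
    if (is_real) {
        if (s[i] == 'f' || s[i] == 'F' || s[i] == 'l' || s[i] == 'L') i++;
    } else {
        if (s[i] == 'u' || s[i] == 'U' || s[i] == 'l' || s[i] == 'L') i++;
    }
    *len = i;                       /* s[i] is the retracted symbol */
    return is_real ? REAL_CONST : INT_CONST;
}
```

One design choice worth noting: when an e is not followed by digits, as in 12e+x, the sketch backs up to 12 and returns an integer, which needs more than the single-symbol retraction used in the diagrams.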
This transition diagram is similar. On a digit it goes to this state and remains there, consuming all the digits; on a dot it goes to the next state and again consumes all the digits; and on any other symbol it goes to a final state, where we return a real constant with its value. If there is an exponent, it continues along that path, consumes the exponent digits, and finally returns the real constant and its value. If there is no integer part, we start with the dot and follow this other path. You can go through this with a couple of examples to convince yourself that it catches all the real constants; it is a slightly more complicated example than the earlier ones. So far we have seen constants of various kinds (integer, hexadecimal, octal, and floating-point constants) as well as identifiers. Now let us see how operators are handled in transition diagrams. We have many operators. Here is greater than: if it is just an ordinary greater than, followed by some other symbol, we return relop with the code greater than. If it is greater than followed by another greater than and then some other symbol, we have an arithmetic operator, right shift. If we have greater than followed by equal to, we return greater than or equal to, and if we have two greater thans followed by equal to, this is a right-shift-assign operator. That is one of the transition diagrams; here is one more. A plus followed by equal to is the add-assign operator, and just a plus returns arithop with the code plus. Notice that the token class is either arithop or assignop (or relop), and within each class there is a code such as add assign, right shift, or right shift assign. All the arithmetic operators are handled this way.
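The greater-than and plus diagrams can be coded directly as nested tests on the next one or two characters. Here is a minimal sketch, with hypothetical names for the token classes (RELOP, ARITHOP, ASSIGNOP) and operator codes; each branch that ends on "some other symbol" plays the role of a retracting state, so that symbol is not counted in len.

```c
#include <stddef.h>

/* Token classes and operator codes; the names are hypothetical. */
enum tok_class { TK_NONE, RELOP, ARITHOP, ASSIGNOP };
enum op_code   { OP_NONE, OP_GT, OP_GE, OP_RSHIFT, OP_RSHIFT_ASSIGN,
                 OP_PLUS, OP_ADD_ASSIGN };

struct op_token { enum tok_class cls; enum op_code code; size_t len; };

/* Recognize the '>' and '+' operator families at the start of s.
   len is the number of characters that belong to the operator. */
struct op_token scan_op(const char *s)
{
    struct op_token t = { TK_NONE, OP_NONE, 0 };

    switch (s[0]) {
    case '>':
        if (s[1] == '=') {                      /* >=  */
            t.cls = RELOP;    t.code = OP_GE;  t.len = 2;
        } else if (s[1] == '>') {
            if (s[2] == '=') {                  /* >>= */
                t.cls = ASSIGNOP; t.code = OP_RSHIFT_ASSIGN; t.len = 3;
            } else {                            /* >>, s[2] is retracted */
                t.cls = ARITHOP;  t.code = OP_RSHIFT;        t.len = 2;
            }
        } else {                                /* >, s[1] is retracted */
            t.cls = RELOP;    t.code = OP_GT;  t.len = 1;
        }
        break;
    case '+':
        if (s[1] == '=') {                      /* += */
            t.cls = ASSIGNOP; t.code = OP_ADD_ASSIGN; t.len = 2;
        } else {                                /* +, s[1] is retracted */
            t.cls = ARITHOP;  t.code = OP_PLUS;       t.len = 1;
        }
        break;
    default:
        break;                                  /* not one of these operators */
    }
    return t;
}
```

Other operators, such as ++ or the logical operators, would be further branches of the same shape; they are omitted here.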
Similarly, we would handle the logical operators and so on, but that would be repetitive without giving any extra benefit to the student. Now, what about generating a lexical analyzer from transition diagrams? As I already mentioned, transition diagrams are useful only for writing lexical analyzers by hand. So, let us see how transition diagrams are implemented in code. The first example is the identifier diagram, with states 0, 1, and 2: from 0 we go to 1 on a letter, in 1 we remain on a letter or digit, and on any other symbol we go to 2, which returns a token. Let us follow the code. There is a function get_token, which returns a token, and the token has two parts, token and value. There is also a char c holding the current character. A while loop keeps going until we return a token or exhaust the input, and inside it we switch on state. In state 0 we read the next character; if it is a letter we go to state 1, otherwise state is set to failure, which takes us back to a start state, and then there is a break. In case 1, if the character is a letter or digit we remain in state 1; otherwise we go to state 2 and break. Case 2 is a retraction: in state 2 we return a token, and we do not consume the symbol that brought us to this state. To set mytoken.token we search the table of words and find out what type of token it is, either a name or a reserved word. If mytoken.token is identifier, the value is the string corresponding to that identifier; otherwise the token code alone is enough. Then we return the token. The code for hex and octal constants is very similar.
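Here is one way the identifier diagram might look as compilable C, following the switch-on-state structure described above. The input pointer, next_char, retract, the reserved-word table, and the token codes are all hypothetical. Also, instead of a failure routine that would try the next diagram, state 0 in this sketch simply gives back the character and returns TOK_FAIL.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes: each reserved word gets its own code. */
enum { TOK_FAIL, TOK_IDENTIFIER, TOK_IF, TOK_WHILE, TOK_RETURN };

struct token {
    int  token;          /* token code */
    char value[64];      /* lexeme, filled in only for identifiers */
};

static const char *input;          /* current position in the source */
static const char *lexeme_start;   /* where the current lexeme began */

static char next_char(void) { return *input++; }
static void retract(void)   { input--; }

/* Action of state 2: search the reserved-word table; any name not
   found in it is an ordinary identifier. */
static int lookup_reserved(const char *name)
{
    static const struct { const char *word; int code; } table[] = {
        { "if", TOK_IF }, { "while", TOK_WHILE }, { "return", TOK_RETURN },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].word, name) == 0)
            return table[i].code;
    return TOK_IDENTIFIER;
}

struct token get_token(void)
{
    struct token my_token = { TOK_FAIL, "" };
    int state = 0;
    char c;

    lexeme_start = input;
    while (1) {
        switch (state) {
        case 0:                                  /* start state */
            c = next_char();
            if (isalpha((unsigned char)c)) state = 1;
            else { retract(); return my_token; } /* failure */
            break;
        case 1:                                  /* inside the name */
            c = next_char();
            if (isalnum((unsigned char)c)) state = 1;
            else state = 2;
            break;
        case 2: {                                /* accepting, retracting */
            retract();                           /* c is not in the lexeme */
            size_t n = (size_t)(input - lexeme_start);
            char name[64];
            if (n >= sizeof name) n = sizeof name - 1;  /* truncate */
            memcpy(name, lexeme_start, n);
            name[n] = '\0';
            my_token.token = lookup_reserved(name);
            if (my_token.token == TOK_IDENTIFIER)
                strcpy(my_token.value, name);
            return my_token;
        }
        }
    }
}
```

After a call, input points at the retracted symbol, so the next call to a recognizer starts exactly there.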
For example, let us trace the octal part: on 0 we come to 4, then on an octal digit we come to 9, and then on a qualifier we go to 11, or on any other symbol we go to 10; as long as we keep getting octal digits, we remain in state 9. In the code, in state 3 we read a character, and it must be a 0; if it is, we go to state 4, otherwise it is not a hex or octal constant and this is a failure. In state 4 we check for either x or X, which means it is a hexadecimal constant. If it is not x or X, is it an octal digit? If yes, we go to state 9; otherwise it is a failure, because it is not a correct constant of this type. Now let us look at state 9. In state 9, if the symbol we get is an octal digit, we remain in the same state; if we get a qualifier, u, U, l, or L, we go to state 11; otherwise we go to state 10 and stop. So, state 10 corresponds to the other-symbols exit, while state 9 corresponds to the octal-digits part. In case 10 we retract, and this falls through to case 11, which sets the token to int constant and the value to whatever we get after evaluating the octal number, and returns that token. We will stop here and continue the discussion of lexical analyzers in the next lecture. Thank you.
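A sketch of the switch-based code for the hex/octal diagram in the same style follows. States 3, 4, 5, 9, 10, and 11 follow the lecture; state 6 (the loop over hex digits) is a hypothetical number, as are the names get_hex_or_octal and TOK_INT_CONST. The value is computed with the C standard library's strtoul with base 0, which treats a 0x or 0X prefix as hexadecimal and a leading 0 as octal, and stops at the first character that is not part of the number.

```c
#include <ctype.h>
#include <stdlib.h>

enum { FAILURE = -1 };
enum { TOK_NONE, TOK_INT_CONST };

struct int_token { int token; unsigned long value; };

static const char *src;   /* current position in the input */

static char next_char(void) { return *src++; }
static void retract(void)   { src--; }

static int is_octal(char c)     { return c >= '0' && c <= '7'; }
static int is_qualifier(char c) { return c == 'u' || c == 'U' || c == 'l' || c == 'L'; }

struct int_token get_hex_or_octal(void)
{
    struct int_token t = { TOK_NONE, 0 };
    const char *start = src;
    int state = 3;
    char c;

    while (state != FAILURE) {
        switch (state) {
        case 3:                        /* must start with 0 */
            c = next_char();
            state = (c == '0') ? 4 : FAILURE;
            break;
        case 4:                        /* branch: hex or octal */
            c = next_char();
            if (c == 'x' || c == 'X') state = 5;
            else if (is_octal(c))     state = 9;
            else                      state = FAILURE;
            break;
        case 5:                        /* need at least one hex digit */
            c = next_char();
            state = isxdigit((unsigned char)c) ? 6 : FAILURE;
            break;
        case 6:                        /* hypothetical state: more hex digits */
            c = next_char();
            if (isxdigit((unsigned char)c)) { /* stay in state 6 */ }
            else if (is_qualifier(c))      state = 11;
            else                           state = 10;
            break;
        case 9:                        /* more octal digits */
            c = next_char();
            if (is_octal(c)) { /* stay in state 9 */ }
            else if (is_qualifier(c)) state = 11;
            else                      state = 10;
            break;
        case 10:
            retract();                 /* other symbol is not in the lexeme */
            /* fall through */
        case 11:
            t.token = TOK_INT_CONST;
            /* base 0: "0x..." is read as hex, "0..." as octal; parsing
               stops at the qualifier or the retracted symbol */
            t.value = strtoul(start, NULL, 0);
            return t;
        }
    }
    src = start;                       /* failure: give back the input */
    return t;
}
```

On failure the sketch resets src to where it started, so another diagram (for example, the decimal integer one) could be tried on the same input.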