Welcome to the session on tokens, patterns and lexemes in compilers. These concepts are used in the first phase of the compiler, lexical analysis, which is the phase that reads the program input. The learning outcome of this session is to illustrate the concepts of lexeme, token and pattern, so that students will be able to identify the tokens in a program by reading its lexemes and matching them against patterns. Let us see one by one what these concepts are. A compiler reads a source program and generates a target program, and if there are any errors in the source program, it reports them. The source program is written in a high-level language, say C or Pascal; any high-level language can be given as input to the compiler. The compiler generates the target program in some other language, typically machine language, which is specific to a machine, so the target program it generates is usually machine dependent. Now, the first phase of the compiler, the lexical analyzer, reads the complete source program. It scans the program character by character and generates a sequence of tokens, and those tokens are given to the parser. So the source program is the input to the lexical analyzer, the lexical analyzer produces a token, and that token is passed to the parser for further processing. When the parser needs another token, it sends a request back to the lexical analyzer (a "get next token" command), and the lexical analyzer responds with the next token. Both the lexical analyzer and the parser read and update the symbol table during this process, and the remaining phases of the compiler then carry on the processing. Now consider a simple program statement: position = initial + rate * 60. In this statement, position, initial and rate are identifiers.
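To make the interaction between the lexical analyzer and the parser concrete, here is a minimal Python sketch. It assumes a tiny hypothetical token set (id, num, assign, plus, mult) that covers only the example statement, and the names Token, Lexer, get_next_token and parse are illustrative, not taken from any particular compiler. The parser stand-in pulls one token at a time, and the lexer records each identifier in a symbol table shared with the parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    name: str    # token category, e.g. "id", "num", "assign"
    lexeme: str  # the characters that matched the pattern


# Hypothetical token specification: (token name, pattern) pairs.
TOKEN_SPEC = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),  # letter (letter | digit)*
    ("num", r"[0-9]+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("mult", r"\*"),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Lexer:
    def __init__(self, source: str, symbol_table: dict):
        self.source = source
        self.pos = 0
        self.symbol_table = symbol_table  # shared with the parser

    def get_next_token(self) -> Optional[Token]:
        """Scan from the current position and return one token, or None at end of input."""
        while self.pos < len(self.source):
            m = MASTER.match(self.source, self.pos)
            if m is None:
                raise SyntaxError(f"unexpected character {self.source[self.pos]!r}")
            self.pos = m.end()
            if m.lastgroup == "ws":
                continue  # whitespace separates lexemes but is not itself a token
            if m.lastgroup == "id":
                # Enter each identifier in the symbol table once.
                self.symbol_table.setdefault(m.group(), len(self.symbol_table))
            return Token(m.lastgroup, m.group())
        return None


def parse(lexer: Lexer) -> list:
    """Stand-in for the parser: it requests tokens one at a time until input runs out."""
    tokens = []
    while (tok := lexer.get_next_token()) is not None:
        tokens.append(tok)
    return tokens


symbols = {}
tokens = parse(Lexer("position = initial + rate * 60", symbols))
print([(t.name, t.lexeme) for t in tokens])
print(symbols)
```

For this statement the sketch produces seven tokens, and only the three identifiers end up in the symbol table.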
The equal sign, plus and multiplication symbols are operators, and there is one number, 60. That is how we can categorize the parts of this statement at first glance. Pause the video and think about what more information we can get from this statement. In more detail: position is an identifier; = is the assignment symbol; initial is an identifier; + is specifically the plus operator, which performs addition; rate is an identifier; * is the multiplication operator; and 60 is a numeric literal, or numeric constant. Now let us see exactly what these terms mean, starting with tokens. A token is a category name that stands for a set of strings over the source alphabet. The input is a sequence of characters; the lexical analyzer groups those characters into the smallest meaningful units of the program and assigns each unit a category. Categories such as identifier, number, operator and keyword are the tokens generated from the program. Next, what exactly is a pattern? A pattern is a rule that describes the set of strings belonging to a particular token. For example, how can I say that a given sequence of characters is an identifier? An identifier starts with a letter, followed by any combination of letters and digits. As a regular expression this is letter (letter | digit)*, and that rule is the pattern for identifiers. Every string that matches this pattern produces the token identifier. Now, what is a lexeme?
A lexeme is a sequence of characters in the source program that matches the pattern for a token. In our example, position, initial, rate, + and 60 are all lexemes. Tokens are generated from these lexemes by matching each lexeme against the patterns. Now consider this C statement: const pi = 3.1416; For this simple statement, what are the lexemes, patterns and tokens? The substring pi is a lexeme. The token it generates is identifier, and the pattern it matches is the identifier pattern, letter (letter | digit)*. Let us see this in more detail with a table. The token column shows what the lexical analyzer passes to the parser, that is, the category. The lexeme column shows sample sequences of characters from the input. The description column gives the pattern that each lexeme must match. The first entry is if, a simple keyword. Its lexeme consists of exactly two characters, i followed by f, and its pattern is exactly the string if: to decide that the input is the if keyword, the lexical analyzer has to match exactly the characters i and f. After this match, the token passed to the parser is the keyword if. Similarly, for the while keyword, the lexeme is while, the characters w, h, i, l, e, and it must match exactly the pattern w, h, i, l, e.
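Here is a small Python sketch of how a lexeme is mapped to a token by trying patterns, using the lexemes of const pi = 3.1416; given pre-split for simplicity. Keywords are checked against their exact spelling before the identifier pattern letter (letter | digit)*, because a keyword such as const or while also matches the identifier pattern. The keyword set, the number pattern and the token names assign, semicolon and num are assumptions made for illustration.

```python
import re

# Identifier pattern from the lecture: letter (letter | digit)*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# Keywords match only their exact spelling, e.g. "if" is exactly i followed by f.
KEYWORDS = {"if", "while", "const"}
# Assumed pattern for an unsigned numeric constant with an optional fraction.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")
# Assumed token names for the remaining single-character lexemes.
PUNCT = {"=": "assign", ";": "semicolon"}


def classify(lexeme: str) -> str:
    """Return the token name for one lexeme, trying keywords before identifiers."""
    if lexeme in KEYWORDS:
        return lexeme  # each keyword is its own token
    if IDENTIFIER.fullmatch(lexeme):
        return "id"
    if NUMBER.fullmatch(lexeme):
        return "num"
    if lexeme in PUNCT:
        return PUNCT[lexeme]
    raise ValueError(f"no pattern matches {lexeme!r}")


# Lexemes of the C statement: const pi = 3.1416;
lexemes = ["const", "pi", "=", "3.1416", ";"]
for lx in lexemes:
    print(f"{lx!r:10} -> {classify(lx)}")
```

If the identifier check came first, const would wrongly be reported as id, which is why the order of the checks matters.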
The while keyword likewise gets its own token. In general, in any programming language, each keyword matches only its own spelling, and the token generated for it is that keyword itself. Next, the relational operators. The lexemes are less than, less than or equal to, equal to, not equal to, greater than, and greater than or equal to, and for every one of them the token generated is relation (often written relop). The pattern can be written as < or <= or = or <> or > or >=. Next is id, the identifier. As we have seen earlier, its pattern is a letter followed by letters and digits, written letter (letter | digit)*. Lexemes such as count, son, i, j and pi match this pattern, and so does D2, which is a valid identifier; for all of them the token returned is id. Next, the number. This is any numeric constant: it might be 0, 12, a negative number, a fractional number or a value with an exponent. For any numeric constant, the token is num (number). Now let us look at the token types in more detail. For an integer constant, the token value is the numeric value, for numbers such as 3, -5 or 12 without a decimal point. For a floating-point constant, the token value is again the numeric value, for fractional numbers, including negative fractions. For reserved words, which we also call keywords, the token value is the word string itself, for words like if, then, class, for and while.
For identifiers, the token value is an index into the symbol table. Identifiers are words that usually start with a letter (some patterns also allow an underscore) and contain any combination of letters, underscores and digits. For relational operators, the token value is the operator string, for example <= or == (C writes equality and inequality as == and !=, where the earlier table used = and <>). For other operators, the token value is likewise the operator string, such as +, - and ++ (increment), following the C language. For a character constant, such as 'A' written in single quotes, the token value is that constant character. For strings, the token value is the string itself. So the complete set of tokens forms the terminal symbols: tokens are the terminal symbols used in the grammar, and they are what the lexical analyzer provides to the parser. In most languages, tokens fall into these categories: keywords; operators; identifiers, as we have seen earlier; constants; literal strings, that is, strings usually written in double quotes, which are passed on as written; and punctuation symbols. Every lexeme in the program falls into one of these categories: each lexeme matches a pattern, and based on that pattern the lexical analyzer provides its token as the category. That is the concept of tokens, patterns and lexemes. Thank you.
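As a recap, the token categories from the table can be combined into one small lexer. This Python sketch is not a complete C lexer: the regular expressions are simplified (no escape sequences in strings, no exponents in numbers), the keyword set is just the one listed in the session, and the token names are assumptions. One further assumption: the minus sign is scanned as a separate operator, so -5 becomes an operator followed by the integer 5, which is also how C itself treats it, since C integer constants carry no sign. Alternatives are ordered so that longer lexemes win over their prefixes, because Python's re module takes the first alternative that matches.

```python
import re

# Simplified patterns for the token categories in the table.
# re alternation picks the FIRST alternative that matches, so longer
# lexemes are listed before their prefixes ("<=" before "<", float before int).
TOKEN_SPEC = [
    ("float",  r"[0-9]+\.[0-9]+"),
    ("int",    r"[0-9]+"),
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),  # keywords are split out below
    ("relop",  r"<=|>=|==|!=|<|>"),
    ("op",     r"\+\+|--|[-+*/=]"),
    ("char",   r"'[^'\\]'"),                # one character in single quotes
    ("string", r'"[^"\\]*"'),               # no escape sequences, for brevity
    ("punct",  r"[;,(){}]"),
    ("ws",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
KEYWORDS = {"if", "then", "class", "for", "while"}


def tokenize(source):
    """Return (tokens, symbol_table); each token is a (token name, token value) pair."""
    tokens, symbol_table, pos = [], {}, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at {source[pos:pos + 10]!r}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue                                   # whitespace is not a token
        if kind == "id" and text in KEYWORDS:
            tokens.append(("keyword", text))           # value: the word itself
        elif kind == "id":
            index = symbol_table.setdefault(text, len(symbol_table))
            tokens.append(("id", index))               # value: symbol-table index
        elif kind == "int":
            tokens.append(("int", int(text)))          # value: numeric value
        elif kind == "float":
            tokens.append(("float", float(text)))
        elif kind == "char":
            tokens.append(("char", text[1]))           # value: the character inside the quotes
        elif kind == "string":
            tokens.append(("string", text[1:-1]))      # value: the text inside the quotes
        else:
            tokens.append((kind, text))                # value: the operator or punctuation string
    return tokens, symbol_table


tokens, symbols = tokenize("if (count <= 10) count++; c = 'A'; s = \"hi\"; r = 2.5;")
for tok in tokens:
    print(tok)
print(symbols)
```

Both occurrences of count produce the same symbol-table index, which is the point of passing an index rather than the name itself to the parser.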