Hello everyone, welcome to the session on input buffering for lexical analysis. We already know that lexical analysis is the first phase of the compiler, and it is the only phase that reads the complete program character by character. It generates the tokens, and those tokens are passed to the next phase of the compiler, the parser, which processes them. Input buffering is used to read a statement, or a part of the program, character by character, and a large part of the compiler's time goes into this.
At the end of this session you will be able to illustrate the use of the buffering technique with two schemes, buffer pairs and sentinels, and you will be able to write a small piece of code that generates tokens from a program statement using buffer pairs and sentinels.
Let us look at the scanner, which is also called the lexical analyzer. As I said earlier, it is the only part of the compiler that reads the complete program, and it takes a large share of the compiler's time; we can say around 25 to 30 percent of the compiler's time goes to lexical analysis. Usually it uses lookahead with the double buffering technique. Why double buffering? It minimizes the overhead: while one buffer is being read, the second buffer can be loaded.
Now, input buffering. With lookahead the lexical analyzer reads several characters, and that set of characters is called a lexeme. The lexeme is matched against a pattern, and if it matches a particular pattern, a token is generated accordingly and given to the parser. So basically there are two input buffering schemes for lookahead: one is buffer pairs and the second is sentinels.
Now, buffer pairs. The memory, that is the input buffer, is usually divided into two parts, as shown here: the first half of the buffer and the second half of the buffer, one continuing after the other. So there are two halves of N characters each, and the number of characters in each half is usually based on the memory block size; it may be 1024, 4096, and so on. Every buffer half depends on that.
Consider the statement value = rate ** 3, that is, rate cubed. This is the program statement, and when we put it in the buffer it looks like this. Every character is stored in the buffer like this; even a space is taken as one character, and the next space as another character. The statement is stored like this, and this is the end of the first half, or you may say the left half, of the buffer.
The buffer has two pointers: one is lexeme beginning and the other is called the forward pointer. Lexeme beginning always points to the beginning of the lexeme that is going to be matched against a pattern to generate a token, and the forward pointer moves ahead one character at a time until the end of the lexeme is found. So from lexeme beginning up to the forward pointer we get one complete lexeme, which is matched against the pattern, and based on that the token is generated.
Let us see here. As I said earlier, these spaces are taken as separators. It is the same statement, written with spaces, and the spaces act as separators; even the semicolon is taken as the end of the line, or a separator, and it is stored in the same way.
Now this statement is stored like this. See the difference between the earlier statement and this one: this statement does not have any separators, so it is stored without them. How are the tokens generated here? This one, value, is one token, then equal to, then rate, then the raise-to operator, that is the two stars, and then 3 as a number. So token one, token two, token three, token four and token five are generated.
Let us see how the forward pointer moves. Initially it points to v; now it points to a; then it moves ahead one character at a time, to the next one and the next. So from lexeme beginning to the end of the lexeme, the token is generated. Similarly for the next one, rate: lexeme beginning points here, the forward pointer moves ahead one by one until it points to the star, and this forms one token. Notice that there is no separator here, so lexeme beginning is set to where the forward pointer stopped, and the next token is generated from there.
Now the concept of buffer pairs. Each time, N bytes are read into one half of the buffer, the first half or the second half. If the input has fewer than N bytes left, an end-of-file marker is stored in the buffer after the end of the input, so that the compiler knows this is the end of the input. When one half has been processed, N bytes are read into the other half; that is, while one half is being processed, the other half receives the next contents. That is why we use two buffers here.
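The movement of lexeme beginning and the forward pointer over this statement can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the whole statement sits in one in-memory string (no reloading of halves), the function name scan and the token kinds "id", "num" and "op" are made up for this example, and only identifiers, integers, = and ** are treated specially.

```python
def scan(buffer):
    """Split `buffer` into (kind, lexeme) pairs using two indices.

    lexeme_begin marks the start of the current lexeme; forward moves
    ahead one character at a time until the lexeme ends.
    """
    tokens = []
    n = len(buffer)
    lexeme_begin = 0
    while lexeme_begin < n:
        ch = buffer[lexeme_begin]
        if ch.isspace():                 # separators are skipped, not tokens
            lexeme_begin += 1
            continue
        forward = lexeme_begin + 1
        if ch.isalpha():                 # identifier: letters then letters/digits
            kind = "id"
            while forward < n and buffer[forward].isalnum():
                forward += 1
        elif ch.isdigit():               # number: a run of digits
            kind = "num"
            while forward < n and buffer[forward].isdigit():
                forward += 1
        elif buffer.startswith("**", lexeme_begin):
            kind = "op"                  # two stars form one operator
            forward = lexeme_begin + 2
        else:
            kind = "op"                  # any other single character
        tokens.append((kind, buffer[lexeme_begin:forward]))
        lexeme_begin = forward           # next lexeme starts where forward stopped
    return tokens


for token in scan("value=rate**3"):
    print(token)
```

Calling scan on the version with spaces, "value = rate ** 3", gives the same five tokens, because the spaces only act as separators.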
Let us see an example; the same statement is taken here. How does the forward pointer move? If the forward pointer has reached the end of the first half, it has to reload the second half, and the forward pointer has to point to the beginning of the second half. Else, if it is not at the end of the first half, it might be at the end of the second half; in that case the first half is reloaded and the forward pointer is set to the beginning of the first half. Otherwise, if it is at the end of neither half, the forward pointer simply moves ahead.
So as it goes on reading the contents, when it reaches the end of the left half it reloads the second half and continues there, and when it reaches the end of the second half it reloads the first half and goes back to its beginning; otherwise the pointer just moves on to read the next character. So there are two conditions that we check here: the end of the first half and the end of the second half.
What does this mean? For every forward pointer increment, two tests are made, and mostly unnecessarily. Why unnecessarily? Because the forward pointer is incremented every time, but a reload is needed only at the end of a half. So the overhead here is the unnecessary tests made on every advance of the forward pointer, and that is the disadvantage of the buffer pair scheme. How can we optimize this code so that the number of condition checks, the number of tests, is minimized? Let us now add an end-of-file marker at the end of every half: an end-of-file marker
here, and an end-of-file marker here. These end-of-file markers at the end of every half are called sentinels. So two extra characters, two sentinels, are added, one at the end of each half. As usual, an end-of-file marker is also added at the end of the statement. So we can see three EOFs here: one at the end of the first half, the second at the end of the second half, and one at the end of the statement.
What does the sentinel do? It extends each buffer half by one character at the end, and this optimizes the code. How? It reduces the number of tests: instead of two tests on every advance of the forward pointer, usually only one is needed. Let us see how the code is optimized by adding the end-of-file markers; we are using three markers here.
See the program code. The forward pointer is incremented, and then it is checked against the end-of-file marker. If it is EOF, there are three possibilities: it might be the end of the first half, it might be the end of the second half, or it might be the end of the input, in which case lexical analysis terminates. Otherwise the forward pointer just keeps incrementing. Pause the video, compare the earlier code with this code, and check how the tests are reduced.
Let us see how it is minimized. As I said earlier, we check only one condition here: whether the character is EOF. Only if that is true do we check whether it is the end of the first half, the end of the second half, or the end of the input. So in the common case we make only a single test per character, and that optimizes the code for the forward pointer.
These are the references. Thank you.
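To compare the two ways of advancing the forward pointer discussed in this session, here is a minimal sketch in Python. It is an illustration, not the code from the slides: the function names, the use of "\0" as the EOF marker, the list used as the buffer, and the way tests are counted are all assumptions made for this example. The source is an in-memory string standing in for the input file, and lexeme beginning is left out so that only the forward pointer logic is visible.

```python
EOF = "\0"  # assumed end-of-file marker; must not occur in the source text


def make_loader(source, buf, n):
    """Return load(start), which copies the next n source characters into
    buf[start:start + n], padding with EOF when the input runs out."""
    pos = 0

    def load(start):
        nonlocal pos
        chunk = source[pos:pos + n]
        pos += len(chunk)
        buf[start:start + n] = list(chunk) + [EOF] * (n - len(chunk))

    return load


def read_with_buffer_pair(source, n):
    """Buffer pair without sentinels: boundary tests on every advance."""
    buf = [EOF] * (2 * n)
    load = make_loader(source, buf, n)
    load(0)
    forward, out, tests = 0, [], 0
    while True:
        tests += 1                       # test: end of input?
        if buf[forward] == EOF:
            return "".join(out), tests
        out.append(buf[forward])
        tests += 1                       # test: end of first half?
        if forward == n - 1:
            load(n)                      # reload second half
            forward = n
            continue
        tests += 1                       # test: end of second half?
        if forward == 2 * n - 1:
            load(0)                      # reload first half
            forward = 0
        else:
            forward += 1


def read_with_sentinels(source, n):
    """Buffer pair with sentinels: one test per ordinary character."""
    buf = [EOF] * (2 * n + 2)            # buf[n] and buf[2n+1] stay EOF
    load = make_loader(source, buf, n)
    load(0)
    forward, out, tests = 0, [], 0
    while True:
        tests += 1                       # the only test in the common case
        if buf[forward] != EOF:
            out.append(buf[forward])
            forward += 1
            continue
        tests += 1                       # EOF seen: sentinel of first half?
        if forward == n:
            load(n + 1)                  # reload second half
            forward = n + 1
            continue
        tests += 1                       # sentinel of second half?
        if forward == 2 * n + 1:
            load(0)                      # reload first half
            forward = 0
            continue
        return "".join(out), tests       # EOF inside a half: end of input


statement = "value = rate ** 3;"
print(read_with_buffer_pair(statement, 4))
print(read_with_sentinels(statement, 4))
```

Both functions return the text they read together with the number of tests made. Both read the same text, and the sentinel version needs fewer tests on this statement, because the extra comparisons happen only when an EOF character is actually seen.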