Hello everyone, I am Mrs. Sunita Dole, working as an assistant professor in the computer science and engineering department of Valshan Institute of Technology, Singapore. The topic covered here is syntax analysis.

A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language. Syntax analysis is the second phase of a compiler; it is also called parsing or hierarchical analysis. Every programming language has rules that prescribe the syntactic structure of well-formed programs. In Pascal, for example, a program is made up of blocks, a block of statements, a statement of expressions, an expression of tokens, and so on. The syntax of programming language constructs can be described by a context-free grammar. The parser checks whether a given source program satisfies the rules implied by the context-free grammar. If it does, the parser creates the parse tree of that program; otherwise, the parser gives an error message. In this video, I am going to cover the role of the parser and context-free grammars.

Learning outcomes. At the end of this session, students will be able to identify different kinds of errors (lexical, syntactic, and so on), know about error recovery strategies, write the derivation for a given expression, and draw the parse tree for a given expression.

In the compiler model, the parser obtains a string of tokens from the lexical analyzer, as shown in the figure, and verifies that the string can be generated by the grammar for the source language. So the parser works on a stream of tokens. It should recover from commonly occurring errors so that it can continue processing the remainder of its input. The parsing methods commonly used in compilers are classified as either top-down or bottom-up. A top-down parser builds the parse tree from the top (the root) to the bottom (the leaves), while a bottom-up parser starts from the leaves and works up to the root. In both cases, the input to the parser is scanned from left to right, one symbol at a time.
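To make the parser's role concrete, here is a minimal sketch of a top-down (recursive-descent) parser in Python. It reads a list of tokens from left to right, one symbol at a time, and either returns a parse tree or reports an error. The grammar, the function names, and the "$" end marker are assumptions chosen for illustration; they are not taken from the lecture slides.

```python
# Illustrative top-down (recursive-descent) parser.
# Assumed grammar (hypothetical, not the lecture's grammar):
#   E -> T { "+" T }
#   T -> "id" | "(" E ")"

def parse(tokens):
    """Return a parse tree as nested tuples, or raise SyntaxError."""
    pos = 0  # index of the next unread token

    def peek():
        # "$" is an assumed end-of-input marker.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(
                f"expected {tok!r} at token {pos}, found {peek()!r}")
        pos += 1

    def parse_T():
        if peek() == "id":
            expect("id")
            return ("T", "id")
        expect("(")
        inner = parse_E()
        expect(")")
        return ("T", "(", inner, ")")

    def parse_E():
        children = [parse_T()]
        while peek() == "+":
            expect("+")
            children += ["+", parse_T()]
        return ("E", *children)

    tree = parse_E()   # start from the root (the start symbol E)
    expect("$")        # the whole input must have been consumed
    return tree
```

For the token list ["id", "+", "id"] this sketch returns the tree ("E", ("T", "id"), "+", ("T", "id")); for an input with an unbalanced parenthesis such as ["(", "id"] it raises SyntaxError instead of building a tree.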
The most efficient top-down and bottom-up methods work only on subclasses of grammars, but several of these subclasses, such as the LL and LR grammars, are expressive enough to describe most syntactic constructs in programming languages.

If a compiler had to process only correct programs, its design and implementation would be greatly simplified. But programmers frequently write incorrect programs, and a good compiler should assist the programmer in identifying and locating errors. Programs can contain errors at many different levels, for example: lexical, such as misspelling an identifier, keyword, or operator; syntactic, such as an arithmetic expression with unbalanced parentheses; semantic, such as an operator applied to an incompatible operand; and logical, such as an infinitely recursive call.

The error handler in a parser has goals that are simple to state. First, it should report the presence of errors clearly and accurately. Second, it should recover from each error quickly enough to be able to detect subsequent errors. Third, it should not significantly slow down the processing of correct programs.

Error recovery strategies. There are many different general strategies that a parser can use to recover from a syntactic error.

First, panic mode. On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as semicolon or end, whose role in the source program is clear. The compiler designer must select the synchronizing tokens. This strategy is simple and is guaranteed not to go into an infinite loop.

Second, phrase-level recovery. On discovering an error, the parser performs a local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction would be to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.
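The two strategies described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tokens are plain strings, and the names SYNC_TOKENS, panic_mode_skip, and expect_semicolon are hypothetical helpers, not part of any real compiler.

```python
# Sketch of two recovery strategies on a list of token strings.
# SYNC_TOKENS and both function names are hypothetical.

SYNC_TOKENS = {";", "end"}  # chosen by the compiler designer

def panic_mode_skip(tokens, pos):
    """Panic mode: discard tokens one at a time until a synchronizing
    token is found; return the position just past that token."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1  # every step consumes input, so the loop must terminate
    return min(pos + 1, len(tokens))

def expect_semicolon(tokens, pos):
    """Local correction at a point where the parser expects ';'.
    Returns (possibly repaired tokens, next position, message or None)."""
    if pos < len(tokens) and tokens[pos] == ";":
        return tokens, pos + 1, None
    if pos < len(tokens) and tokens[pos] == ",":
        # Replace a comma by a semicolon.
        repaired = tokens[:pos] + [";"] + tokens[pos + 1:]
        return repaired, pos + 1, "replaced ',' by ';'"
    # Otherwise insert the missing semicolon before the current token.
    repaired = tokens[:pos] + [";"] + tokens[pos:]
    return repaired, pos + 1, "inserted missing ';'"
```

With tokens ["x", "=", "1", "@", "#", ";", "y"] and an error detected at position 3, panic_mode_skip returns 6, so parsing resumes at "y". With tokens ["x", "=", "1", ",", "y"], expect_semicolon at position 3 repairs the input to ["x", "=", "1", ";", "y"] and lets the parser continue, which is the kind of local correction just described.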
This local-correction strategy can correct any input string and has been used in several error-repairing compilers. Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.

Third, error productions. We augment the grammar for the language at hand with error productions that generate common erroneous constructs, and construct a parser from the augmented grammar. If an error production is used by the parser, an appropriate error diagnostic can be generated to indicate the erroneous construct recognized in the input.

Fourth, global correction. Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and a grammar G, these algorithms will find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.

So far, we have considered the errors that can occur at different levels in a program and the error recovery strategies. Now pause this video, reflect on this question for a minute or two, and write down your response. Once you have written your answer, you can resume playing the video. The question is: identify the types of error in the following examples.

I hope all of you have completed this activity. The question was to identify the types of error in the following examples. Misspelling an identifier, keyword, or operator is a lexical error; hence 12ab is a lexical error. An arithmetic expression with unbalanced parentheses is a syntactic error; hence option b is an example of a syntactic error. An operator applied to an incompatible operand is a semantic error; hence 2 plus a of i is a semantic error. An infinitely recursive call is a logical error; hence option d is a logical error.

Context-free grammars.
Many programming language constructs have an inherently recursive structure that can be defined by a context-free grammar. A context-free grammar consists of a finite set of terminals, a finite set of non-terminals, a finite set of productions of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string), and a start symbol. The grammar for simple arithmetic expressions is given below.

Notational conventions. The following are terminals: lowercase letters early in the alphabet, such as a, b, c; operator symbols such as plus and minus; punctuation symbols such as parentheses and comma; the digits 0 to 9; and boldface strings such as id or if. The following are non-terminals: uppercase letters early in the alphabet, such as A, B, C; the letter S, which, when it appears, is usually the start symbol; and lowercase italic names such as expression or statement. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either terminals or non-terminals. Lowercase letters late in the alphabet, for example u, v, x, y, z, represent strings of terminals. Lowercase Greek letters such as α, β, γ represent strings of grammar symbols. If A → α1, A → α2, and so on up to A → αk are all the productions with A on the left, then we may write A → α1 | α2 | ... | αk. Unless otherwise stated, the left side of the first production is the start symbol. The grammar for simple arithmetic expressions using these notational conventions is given below.

Derivation. A derivation is a sequence of replacements of non-terminal symbols. In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols. The first symbol, ⇒, means derives in one step. The second symbol, ⇒*, means derives in zero or more steps.
The third symbol, ⇒+, means derives in one or more steps. The derivation of the string id + id using the grammar for simple arithmetic expressions is given below. At each step of a derivation, we can choose any of the non-terminals in the sentential form of G for replacement. There are two types of derivation: leftmost derivation and rightmost derivation. If we always choose the leftmost non-terminal in each derivation step, the derivation is called a leftmost derivation. A top-down parser tries to find the leftmost derivation of the given source program. If we always choose the rightmost non-terminal in each derivation step, the derivation is called a rightmost derivation. A bottom-up parser tries to find the rightmost derivation of the given source program in reverse order. The leftmost derivation and the rightmost derivation of the string id + id are given here.

Parse tree. A parse tree can be seen as a graphical representation of a derivation. The inner nodes of a parse tree are non-terminal symbols, while the leaves of a parse tree are terminal symbols. The leftmost derivation and the parse tree of the string id + id are given here. The rightmost derivation and the parse tree of the same string id + id, using the grammar for simple arithmetic expressions, are given here.

Ambiguous grammar. A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. The string id + id * id has two distinct leftmost derivations, and two corresponding parse trees, using the productions of the simple arithmetic expression grammar; hence this grammar is ambiguous.

These are some references. Thank you.
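As a supplement to the ambiguity discussion, the following Python sketch enumerates every parse tree of a token list. The lecture's grammar appears only on the slides, so the sketch assumes the fragment E → E + E | E * E | id, a commonly used ambiguous expression grammar; the function name parse_trees and the tuple representation of trees are hypothetical choices made for this illustration.

```python
# Enumerate parse trees under the assumed ambiguous grammar
#   E -> E + E | E * E | id
# (an illustrative fragment; not reproduced from the lecture slides).

def parse_trees(tokens):
    """Return all parse trees for `tokens` as nested tuples.
    The production E -> id is represented by the string "id"."""
    if tokens == ["id"]:
        return ["id"]
    trees = []
    # Try every operator position as the top-level production E -> E op E.
    for i in range(1, len(tokens) - 1):
        op = tokens[i]
        if op in ("+", "*"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, op, right))
    return trees

trees = parse_trees(["id", "+", "id", "*", "id"])
for tree in trees:
    print(tree)
```

Under this assumed grammar the sketch finds exactly two trees for id + id * id: one with + at the root and one with * at the root. They correspond to these two leftmost derivations:

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id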