We have seen three distinct ways of describing regular languages finitely: we can use a DFA, we can use an NFA, or we can use an NFA with epsilon transitions to describe the same language. All three are finite ways of describing what are, in general, infinite sets. Today we will find yet another way of describing the same class of languages, the regular languages, and this will be done through what are known as regular expressions.
So, first of all I will define what regular expressions are. I will define a regular expression R inductively, but first we fix an alphabet, say sigma. A regular expression over the alphabet sigma is defined as follows. First we have the base case: phi, epsilon and a are regular expressions, where a is a symbol in the alphabet sigma.
Now, a notational remark. As we will see, every regular expression denotes a set of strings. Although we write the same symbol a in both places, one a is a symbol in the alphabet, whereas the other is a regular expression over the alphabet sigma. To distinguish these two uses of the same symbol, we underline the regular expression; in textbooks it is typically written in bold. So underlined (or bold) phi, epsilon and a are regular expressions. What they denote I will explain a little later; let me first tell you what these expressions are syntactically.
Our second rule is: if R1 and R2 are regular expressions, then so are the following. The first is (R1), that is, R1 put in brackets.
The others are R1 + R2, then R1 R2, and finally R1*. We say a regular expression over sigma is an expression obtained by using these rules finitely many times. By saying this, what we rule out is an infinite string as a regular expression: all regular expressions are finite strings, and they can be constructed using these rules.
Let me give you an example. Suppose our alphabet is {0, 1}. Then 0 is clearly a regular expression, because 0 is a symbol in sigma. Another regular expression is 1. Now I can put a plus in between: 0 + 1 is again a regular expression by the second rule, because a plus sign between two regular expressions gives a regular expression. I can put it in brackets, and since (0 + 1) is a regular expression, (0 + 1)* is also a regular expression. This is one simple example of using these rules to obtain a regular expression. Syntactically, these are the kinds of expressions we call regular expressions.
Now, the point to keep in mind is that every regular expression over sigma denotes a language. Let us use this notation: if R is a regular expression, then L(R) is the language it denotes. Let me describe the languages denoted by regular expressions, and we will again do it inductively. First the base case: phi denotes the empty language. What does that mean in our notation, if sigma is the alphabet?
What we are saying is that L(phi), the language denoted by the regular expression phi, is the empty set. Clearly the empty set is a subset of sigma*, therefore it is a language over sigma, and this regular expression denotes that language, the empty language. Similarly, epsilon denotes the language {epsilon}; in other words, this regular expression denotes the language which has one string, and that is the empty string. And L(a) = {a}: again this is a language with just one element, and that element is the symbol which was used to obtain the regular expression. This completes the base case, rule one.
Now for the other cases. Brackets are used only for convenience, so (R1) denotes the same language as R1: if you put brackets around a regular expression, the language denoted does not change. Next, R1 + R2 denotes L(R1) union L(R2). This is a non-trivial case: if I take two regular expressions R1 and R2 and put a plus in between, the resulting expression denotes the union of the two languages denoted by the two arguments. Then R1 R2 denotes the concatenation L(R1) L(R2), which is the set of all strings xy such that x and y are in sigma*, x is in L(R1) and y is in L(R2). This is our old concatenation of two languages. That means any string in the language denoted by R1 R2 can be broken up so that the first part is in the language of R1 and the second part is in the language of R2. And finally, R1* denotes the language (L(R1))*.
Let me remind ourselves what this star was. Star is the Kleene closure, or simply closure. If I have a language L, then L* is defined in this manner. First, L^0 is {epsilon}, the language which has only the string epsilon. L^1 is of course the language L itself. L^(n+1) is L concatenated with L^n; I am giving it recursively. Then L* is simply the union of all these L^i for i >= 0. Intuitively, when is a string in L*? If we can see the string as a concatenation of several strings (possibly none), each of which is in the language L, then such a string is in L*.
With this understanding of the sets denoted by these expressions, can we figure out the language denoted by our simple example (0 + 1)*? Let us use the definition and go from inside to out, the same way as we built it. The expression 0 denotes the language {0}, and 1 denotes the language {1}; we are just applying the base cases. Now 0 + 1 denotes the union of the two, which is clearly the set {0, 1} with two elements. And therefore (0 + 1)* denotes our old familiar set of all finite binary strings.
Before we proceed, we should realize that several different regular expressions can denote the same language. For example, if you think a little, you will see that the regular expression (0* 1*)* also denotes the set of all finite binary strings. As an expression it looks very different from (0 + 1)*, but it denotes the same set of strings.
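The inductive definition of the denoted language can be tried out on small cases. What follows is only a sketch: the tuple encoding of expressions and the function name lang are made up for illustration and are not part of the lecture, and since the languages are usually infinite, the function only lists the strings up to a length bound n.

```python
from itertools import product

# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)  ("eps",)  ("sym", "0")  ("plus", r1, r2)  ("cat", r1, r2)  ("star", r)

def lang(r, n):
    """Return the strings of length <= n in the language denoted by r."""
    kind = r[0]
    if kind == "empty":
        return set()                      # L(phi) is the empty language
    if kind == "eps":
        return {""}                       # L(epsilon) = {epsilon}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "plus":
        return lang(r[1], n) | lang(r[2], n)          # union
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}                   # concatenation
    if kind == "star":
        base = lang(r[1], n)
        result = {""}                     # L^0 = {epsilon}
        frontier = {""}
        while frontier:                   # add L^1, L^2, ... until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

zero, one = ("sym", "0"), ("sym", "1")
r_a = ("star", ("plus", zero, one))                       # (0 + 1)*
r_b = ("star", ("cat", ("star", zero), ("star", one)))    # (0* 1*)*
all_binary = {"".join(p) for k in range(4) for p in product("01", repeat=k)}

print(lang(r_a, 3) == all_binary, lang(r_b, 3) == all_binary)
```

Agreement up to a length bound is of course not a proof that the two expressions denote the same language; it is only a quick sanity check of the definitions.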
Let me now give you a number of examples, so that our idea about regular expressions and what they denote is clear; we have already seen two very simple ones. Before that, there is a way of writing regular expressions so that we do not have to put too many brackets, just as in normal arithmetic. For example, if you write 2 + 3 × 5, everybody will say this means 17, and not (2 + 3) × 5 = 25. Why? We could have put brackets, as in 2 + (3 × 5), but that is not necessary, because we use the convention that multiplication has higher precedence than plus. In the same way, for regular expressions star has the highest precedence, and the precedence of concatenation is higher than that of plus, which is of course union.
Using this notion of precedence, the second expression I could have written with fewer brackets, as (0* 1*)*. It is clear that a star binds to whatever is immediately before it: the first star binds to 0, and similarly the second star binds to 1 and not to the whole of 0* 1. Therefore the inner part is 0* concatenated with 1*, and then we take the star of that. I should mention that this operator is called either closure or Kleene closure. Kleene, one of the founders of this area, was a logician who did a lot of work on the foundations of the theory of computing.
Now the examples. I have written a number of them, and you can see that each one is well formed using the rules of construction of regular expressions. I am also using the precedence I just explained: closure has the highest precedence, followed by concatenation, followed by plus. So, let us understand each of these expressions and the sets they denote.
The alphabet here is binary, 0 and 1. The first expression is 0* 1 0*. It says you can have any number of 0s, followed by a 1, followed by any number of 0s; 0* means zero or more 0s. Clearly this denotes the set of all binary strings which have exactly one occurrence of 1: the symbol 1 occurs in the string exactly once, no more, no less. Such a string may consist of only a 1 and nothing else, and that is allowed, because 0* is the set of strings of zero or more 0s, so in particular epsilon is in that language, on both sides. Concatenate epsilon, 1, epsilon and you get 1, so just a single 1 will do. On the other hand, take something like 000100: here 000 is in the language of 0* and 00 is also in the language of 0*, and therefore 000100 is in the language. It is not too difficult to see that this is precisely the set of all binary strings x such that x has exactly one 1.
What about the next one, (0 + 1)* 1 (0 + 1)*? Remember that (0 + 1)* is our old friend, which denotes the set of all finite binary strings. So you are saying: take any string over the alphabet {0, 1}, concatenate it with 1, and follow it by another such string. What you are guaranteeing is that the strings in this set have at least one 1. A string cannot have only 0s. Why? Of course you can produce only 0s from the first (0 + 1)* or from the second, but we are concatenating the first part, the 1 and the second part, so one occurrence of 1 has to be there. And why at least one 1? Because other 1s can also come out of either (0 + 1)*. Therefore this regular expression denotes the set of all finite binary strings in which there is at least one 1.
What about the next one? Regular expressions are actually a very easy way of describing that something happens in a string.
So, let us look at (0 + 1)* 001 (0 + 1)*. It is a slight generalization of the previous one. Any string in the language denoted by this regular expression must have the substring 001. Why? Because you get a string in this language by taking a string from the first (0 + 1)*, taking a string from the second (0 + 1)*, and putting 001 in between, so 001 must occur. The other way round, any binary string which has at least one occurrence of 001 is in the language denoted by this regular expression, because such a string is of the form x 001 y: we can think of x as coming out of the first part and y as coming out of the last part, and then we concatenate the three. So it is true both ways: every string in the language denoted by this regular expression has 001 as a substring, and any string in which 001 is a substring is in the language. So I can write the language as the set of all binary strings x such that x has 001 occurring as a substring.
For the next one, ((0 + 1)(0 + 1))*, it is also not too difficult to see the language denoted. The part (0 + 1) denotes the language {0, 1}, the language with two strings which are the two bits, and you are concatenating it with itself.
When you concatenate, you get 00, 01, 10 and 11; these four strings. Now you take the closure. That gives any string which can be broken up into blocks, each of which is one of these four. For example, 0010110000 surely comes from the closure of this language, broken up as 00, 10, 11, 00, 00. In effect, the expression denotes all binary strings which have an even number of symbols.
The final example here is 0 (0 + 1)* 0 + 1 (0 + 1)* 1 + 0 + 1. In it, each (0 + 1)* denotes any binary string. A string denoted by the first part, 0 (0 + 1)* 0, must begin with 0 and end with 0, and similarly a string denoted by 1 (0 + 1)* 1 must begin with 1 and end with 1. Then we are saying: or it can be a single 0, or it can be a single 1. All of that I can compress into a simple sentence: it is the set of all binary strings where the first and the last bit are the same. If the first bit is 0, the last bit also has to be 0; if the first bit is 1, the last bit also has to be 1; and a single-bit string, 0 or 1, satisfies the same property because that bit is both the first and the last. Can epsilon come out of this expression? Not really, because the smallest string the first part can denote is 00, the smallest the second part can denote is 11, and the remaining two are 0 and 1.
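The claims made for the binary examples can be checked by brute force on short strings. The sketch below assumes the five expressions are the ones described in words above, and writes them in the syntax of Python's re module, where union is | rather than +. The helper name mismatches and the length bound are choices made here for illustration.

```python
import re
from itertools import product

# (expression in Python's re syntax, property claimed for its language)
examples = [
    (r"0*10*", lambda s: s.count("1") == 1),                  # exactly one 1
    (r"(0|1)*1(0|1)*", lambda s: "1" in s),                   # at least one 1
    (r"(0|1)*001(0|1)*", lambda s: "001" in s),               # 001 as a substring
    (r"((0|1)(0|1))*", lambda s: len(s) % 2 == 0),            # even length
    (r"0(0|1)*0|1(0|1)*1|0|1",
     lambda s: len(s) > 0 and s[0] == s[-1]),                 # first bit = last bit
]

def mismatches(pattern, prop, max_len=8):
    """Binary strings up to max_len on which the pattern and the property disagree."""
    compiled = re.compile(pattern)
    bad = []
    for k in range(max_len + 1):
        for bits in product("01", repeat=k):
            s = "".join(bits)
            # fullmatch asks whether the whole string is in the language
            if (compiled.fullmatch(s) is not None) != prop(s):
                bad.append(s)
    return bad

for pattern, prop in examples:
    print(pattern, "disagreements:", mismatches(pattern, prop))
```

An empty list of disagreements for each expression supports the description but does not replace the two-way argument given in the lecture.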
And that makes sense, because when you talk of the first and the last bit, they should exist. Therefore this regular expression is a correct way of denoting this language.
Our next examples are over the alphabet {a, b, c}. Take the first one. By now you know that (a + b)* just means any string over a and b, and the expression has more than one such component. So we have any string over a and b, concatenated with a c, followed by any string over a and b, concatenated with a second c, and then you can repeat this whole thing any number of times, zero or more. Do you see that the language denoted is the set of all strings over {a, b, c} where the number of c's is even? By the way, 0 is an even number and that is allowed, so a string which does not have any c in it can also be in this language, because we are taking the star of the whole thing: in particular the starred part contains epsilon, and epsilon concatenated with the rest of the expression gives you a string with no c's at all.
Our final example here is possibly a little more advanced. Through regular expressions it is very easy to talk of strings which have this substring or that substring. For example, if I wanted the set of all binary strings which have either 001 or 11 as a substring, I would simply write (0 + 1)* 001 (0 + 1)*, which generates the strings with 001 as a substring; I should also take care of the 11 substring, which is (0 + 1)* 11 (0 + 1)*; and I put a plus in between, with brackets so that we are not confused. This expression, you can clearly see, denotes the set of all binary strings each of which has either 001 or 11 occurring as a substring.
These kinds of things are very easy to do. But how do I specify, through regular expressions, the set of all strings over {a, b, c} in which the substring ac does not occur? What I said a little while back was that when I want to talk of strings in which some substring, or one of a number of substrings, occurs, that is easy. But what about this? Contrast it with the DFA situation. If I had a DFA which accepts all strings in which ac occurs as a substring, all I needed to do was to switch the accepting states with the non-accepting states, and that would give me this language, the set of all strings in which ac does not occur as a substring.
I can actually write a regular expression for this, and in fact it is c* (a + b c*)*. Can we understand it? A string in which ac does not occur can begin with any number of c's, no problem. After that we have zero or more strings from the language of (a + b c*). In this language you can get any number of c's, including zero, but they have to be preceded by a b. So you have b followed by zero or more c's, again b followed by zero or more c's, and in between, anywhere, you can put a's. Each copy contributes only a single a, but of course you can put another a, because in the closure you can take any number of copies. Notice that the symbol c, if it occurs in this part, has to have a b or another c in front of it, never an a. Therefore the language denoted by this regular expression does not have ac occurring as a substring.
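The contrast with the DFA can be made concrete. The sketch below assumes the expression on the board is c* (a + b c*)*, which is what the description above suggests; the three-state DFA for "contains ac" and its state numbering are written here for illustration and are not taken from the lecture. The complement is obtained exactly as described, by swapping accepting and non-accepting states.

```python
import re
from itertools import product

# DFA over {a, b, c} for "ac occurs as a substring" (illustrative state names):
#   0 = nothing useful seen, 1 = last symbol was a, 2 = ac has occurred
delta = {
    (0, "a"): 1, (0, "b"): 0, (0, "c"): 0,
    (1, "a"): 1, (1, "b"): 0, (1, "c"): 2,
    (2, "a"): 2, (2, "b"): 2, (2, "c"): 2,
}
accepting = {2}
complement_accepting = {0, 1, 2} - accepting   # swap final and non-final states

def run(s):
    """Return the state the DFA reaches on input s, starting from state 0."""
    state = 0
    for ch in s:
        state = delta[(state, ch)]
    return state

# Assumed expression c*(a + bc*)*, in Python's re syntax (| instead of +).
no_ac = re.compile(r"c*(a|bc*)*")

def check(max_len=7):
    """Compare complemented DFA, regular expression and the direct test on short strings."""
    for k in range(max_len + 1):
        for p in product("abc", repeat=k):
            s = "".join(p)
            by_dfa = run(s) in complement_accepting
            by_regex = no_ac.fullmatch(s) is not None
            if by_dfa != by_regex or by_dfa != ("ac" not in s):
                return False
    return True

print(check())
```

The point of the comparison is the asymmetry in effort: complementing the DFA is a one-line change, while the regular expression for the complement has to be thought out afresh.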
Now we are going to prove something very important. Take the infinitely many regular expressions that are possible over an alphabet sigma; we know each one of them denotes a language. If you take the sum total of all languages which can be denoted by regular expressions over the alphabet, the class of languages you get is precisely the class of regular languages.
Here is a picture to keep our ideas clear. This is the class of regular languages; each element L1, L2, ... here is a regular language. What I mean is, first, for each one there is a regular expression which denotes it. The second part is: if you take the class of all regular expressions over the alphabet sigma, each one denoting a language, then each of those languages is regular. In other words, the class of languages denoted by regular expressions and the class of all regular languages are one and the same, and I will prove this.
As you can see, this requires two proofs. First, I need to show that, given any regular expression r, L(r) is a regular language. The second part is the other way round: given any regular language L, there is a regular expression r such that the language denoted by r is the same as L. Today let me show the first part, and we can take the second part in the next lecture. How do I show that, given any regular expression, the language denoted is a regular language? We follow the same inductive rules that we had for forming regular expressions.
As you saw, we had the base cases: where sigma is the alphabet, phi, epsilon and a, for a in sigma, are regular expressions. It is trivial to give, for each of these, some DFA or NFA accepting its language. But for our purpose the proof becomes simpler when I use NFAs with epsilon transitions. That is fine, because we proved last time that every NFA with epsilon transitions accepts only a regular language. So in the construction, as we use the inductive rules, we will keep forming NFAs with epsilon transitions. What is more, the NFAs used in the proof will have exactly one final state. This is just convenient for our purpose, because this way we will be able to put together different NFAs of this kind, and it will be very clear what they do.
So, sigma is the alphabet; what is such an NFA for the set denoted by the regular expression phi? That is of course trivial: for instance, an initial state and a separate final state with no transition between them. It is clear that this NFA accepts the language denoted by phi. But the point I want to make is: please keep in mind that every NFA we build will have exactly one final state, and there will be no transition out of that final state. What about epsilon? That is also easy: for instance, an initial state with an epsilon transition to the final state. It is easy to see that this NFA accepts only the language {epsilon}, and there is no transition out of the final state. Similarly, the NFA for a is simple: there is only one transition, from the initial state to the final state, and that is on a.
Now let us use our recursive rules. For example, the first one says that if r1 and r2 are regular expressions, then so is r1 + r2.
Now, r1 and r2 are two regular expressions, and inductively I assume that for each of r1 and r2 I have an NFA which accepts the same language as the one denoted by the expression. Let me draw a diagram. This is an NFA M1, and remember that, as I will assume inductively, there is exactly one final state with no transition going out of it. The point is that this NFA accepts the language denoted by the regular expression r1, that is, L(M1) = L(r1). Similarly, inductively I can assume I have an M2 such that L(M2) = L(r2).
I want an NFA which accepts the union of these two languages, and it is fairly easy to see how to do it. I put a new initial state with epsilon transitions to the two old initial states. The two old final states will no longer be marked as final, because I need only one final state with no transition going out of it; instead each of them gets an epsilon transition to a new final state. This is the new machine I obtain from the two.
Similarly, you can see what it is going to be for r1 r2. For this case I again have the two machines: this is the NFA which accepts the language denoted by r1, and this is the NFA for r2. Clearly, I just do not mark the final state of the first as final, but put an epsilon transition from it to the initial state of the second, and that is it.
Why? The argument is similar. Consider this combined new NFA. For any string to take it from the initial state of the first machine to the final state of the second, it first has to take the first machine from its initial state to its old final state, and that means the first part of the string must be in the language denoted by r1. Then the second part of the string has to take the combined NFA from the initial state of the second machine to its final state. The first part by itself was accepting exactly the strings in L(r1) and the second part exactly those in L(r2), and the combined machine accepts a string only if the string takes the machine from the initial state to the only accepting state, which means it has to use an L(r1) string and then an L(r2) string. Therefore this is the NFA for r1 r2, provided the first is for r1 and the second is for r2, once we put the epsilon transition between them and change the first final state to non-final. Notice again our invariant: this machine also has exactly one final state, and there is no transition out of it, because the r2 machine did not have any transition going out of its final state.
More interesting is the case of closure. Suppose I have a regular expression r, and this is the corresponding NFA for the language denoted by r. From it I would like to define an NFA for r*, and I claim the following will be that NFA. I put a new state which is my initial state, and I put another new state which is the new final state.
The final state of the original NFA is now marked as non-final. I put an epsilon transition from the new initial state to the old initial state, an epsilon transition from the old final state to the new final state, an epsilon transition from the old final state back to the old initial state, and one more epsilon transition from the new initial state directly to the new final state. I claim this is the machine which accepts L(r*), and it is not too difficult to see why. What is a string x in L(r*)? Such a string x can be thought of as x1 x2 ... xk, where each xi is in the language denoted by r. On such a string, the machine first takes the epsilon transition to the old initial state; then on x1 it can reach the old final state; now it uses an epsilon to come back to the old initial state; then on x2 it can again reach the old final state, on epsilon it comes back, and so on; finally, after xk, it takes the epsilon transition to the new final state. Therefore the string x1 ... xk can take this NFA from its initial state to the only final state that it has. Notice also that this language contains epsilon, and the machine of course accepts epsilon, because of the transition from the new initial state to the new final state. You can also prove the other way round: if a string can take the machine from the initial state to the final state, then by a similar argument it is not difficult to show that the string has to be in L(r*).
One thing I should point out: you might have been tempted not to add the new initial state, and instead to have an epsilon going from the old initial state to the final state. The epsilon from the old final state back to the old initial state has to be there, because we need concatenations of strings in the language. But if you put an epsilon transition from the old initial state to the final state, that will not be correct, and you can check that out.
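One way to check this warning is to try the shortcut on a small machine. The sketch below is one possible reading of the shortcut, not necessarily the exact picture drawn in the lecture: it takes a hand-built NFA for 0*1 whose initial state has a loop (it still has exactly one final state with no transition out of it), and compares the shortcut with the construction described above. All state numbers and function names are illustrative.

```python
def eps_closure(states, eps):
    """All states reachable from the given states using epsilon transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in eps.get(q, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(nfa, s):
    """Simulate an epsilon-NFA given as (start, final, delta, eps) on the string s."""
    start, final, delta, eps = nfa
    current = eps_closure({start}, eps)
    for ch in s:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved, eps)
    return final in current

# NFA for 0*1: state 0 loops on 0, and reading 1 leads to the single final state 1.
delta = {(0, "0"): {0}, (0, "1"): {1}}
base = (0, 1, delta, {})

# Shortcut: keep the old initial state 0, add only a new final state 3,
# loop back from the old final state, and add epsilon from state 0 to state 3.
shortcut = (0, 3, delta, {1: {0, 3}, 0: {3}})

# Construction from the lecture: new initial state 2 and new final state 3.
correct = (2, 3, delta, {2: {0, 3}, 1: {0, 3}})

# The string 0 is not in (0*1)*, since every nonempty string there ends in 1.
print(accepts(shortcut, "0"), accepts(correct, "0"))
```

In the shortcut machine, the loop on the old initial state lets the string 0 return to a state from which the final state is reachable on epsilon, so a string outside the language is accepted. The new initial state in the lecture's construction prevents this, because nothing ever leads back to it.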
What I am saying is that it would have been wrong not to add the new initial state and instead to work with the old initial state, keeping the epsilon transition back to it, and then, to take care of the epsilon that has to be in L(R*), to add an epsilon from the old initial state to the final state. That construction would have been wrong; you can try to figure out why.
To summarize this part: I have shown inductively that for each of the base cases there is an NFA which accepts the same language, and that if I have two expressions R1 and R2 and the corresponding NFAs for them, then I can combine these NFAs to get the NFA for R1 R2 or for R1 + R2, and similarly from the NFA for R the NFA for R*. So this part of the proof is taken care of. In the next lecture we will take care of the second part.
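The whole inductive construction can be written down in a few lines. This is a minimal sketch under some assumptions: the class NFA, the function names and the use of integers for states are invented here, and the base machines for phi and epsilon are one possible choice, since the lecture's diagrams are not recorded. Each function preserves the invariant of exactly one final state with no transition out of it.

```python
from itertools import count, product

_ids = count()   # source of fresh state names

class NFA:
    """Epsilon-NFA with one start state and exactly one final state.
    edges is a list of (source, label, target); label None means epsilon."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def base_empty():                 # phi: final state unreachable
    return NFA(next(_ids), next(_ids), [])

def base_eps():                   # epsilon
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, None, f)])

def base_sym(a):                  # a single symbol a
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])

def union(m1, m2):                # r1 + r2: new initial and new final state
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m1.start), (s, None, m2.start),
             (m1.final, None, f), (m2.final, None, f)]
    return NFA(s, f, m1.edges + m2.edges + extra)

def concat(m1, m2):               # r1 r2: epsilon from old final of m1 to start of m2
    return NFA(m1.start, m2.final,
               m1.edges + m2.edges + [(m1.final, None, m2.start)])

def star(m):                      # r*: new initial and new final state
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m.start), (m.final, None, f),
             (m.final, None, m.start), (s, None, f)]
    return NFA(s, f, m.edges + extra)

def closure(states, edges):
    """States reachable from the given states by epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, s):
    current = closure({m.start}, m.edges)
    for ch in s:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(moved, m.edges)
    return m.final in current

def bit():                        # a fresh machine for (0 + 1) on every call
    return union(base_sym("0"), base_sym("1"))

even = star(concat(bit(), bit()))             # ((0 + 1)(0 + 1))*
strings = ["".join(p) for k in range(7) for p in product("01", repeat=k)]
print(all(accepts(even, s) == (len(s) % 2 == 0) for s in strings))
```

Note that every sub-machine is built fresh, so state names never clash when machines are combined; reusing the same NFA object twice inside one construction would break that.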