In this lecture, we see some introductory concepts and define a language formally. We also see some operations on languages. A language can be seen as a system suitable for the expression of certain ideas, facts and concepts. This course is on formal languages and automata theory, and we will first see the concepts of formal languages.
First, we will see what the common features across different languages are, because while discussing languages we need to consider different kinds of languages, like natural languages, programming languages and so on. We see that a language is a collection of sentences, a sentence is a sequence of words, and a word is a combination of syllables. For the formal learning of a language, we consider that there is a script. Therefore, it is necessary to understand the alphabet of the underlying language; this is the first step in the formal learning of a language. Then, one learns the various words of the language and, finally, how to form sentences from these words. The last step is the most difficult part of learning a language. Therefore, we will postpone discussing that part and concentrate on the other two steps.
Let us first see what the common features of a word and a sentence are. We find that both are just sequences of symbols from the underlying alphabet. Consider this English sentence: A decision problem is a function with a one-bit output: "yes" or "no". We observe that this is nothing but a sequence of symbols from the Roman alphabet, or English alphabet, together with some special symbols like the colon, the quotation mark and the full stop, and one more special symbol, the blank space, which is used to separate the various words. Thus, abstractly, the terms sentence and word may be used interchangeably, and we do not distinguish between a sentence and a word. Now, we go to the definitions of alphabets and strings.
We define an alphabet as a non-empty finite set. For example, the set {a, b} can be considered as an alphabet. We denote an alphabet by Σ. If we have more than one alphabet, we use subscripts, say Σ_1, Σ_2, to denote them. Sometimes, we might also use some other symbol, say Γ, to denote an alphabet. For example, an alphabet may contain only a single symbol, or it may contain some arbitrary symbols, say four different ones. Normally, we use the small letters towards the beginning of the English alphabet, say a, b, c and so on, to denote the symbols of an alphabet; sometimes we use the digits 0, 1, 2 and so on.
Now, we define a string. A string over an alphabet Σ is a finite sequence of symbols of Σ. For example, consider Σ = {a, b}. Then ab is a string over this alphabet. Similarly, abb is a string containing three symbols from this alphabet, and abab is a string containing four symbols. Normally, we denote a sequence by (a_1, a_2, ..., a_n) with each a_i belonging to Σ. But for our convenience, we represent the string by a_1 a_2 ... a_n, that is, just by writing the symbols one after another, juxtaposing the symbols. In this notation, the empty string, the sequence with no symbols, would leave nothing to write. So, we will be using a special symbol ε to denote the empty string. A string is also known as a word or a sentence. Therefore, we will be using the terms string, word and sentence interchangeably.
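The two definitions above can be tried out in a few lines of Python. This is only a minimal sketch: the names SIGMA, EPSILON and is_string_over are made up for illustration and do not come from the lecture, and an alphabet is assumed to be a set of one-character symbols.

```python
# Illustrative names only: SIGMA, EPSILON and is_string_over are assumptions.
SIGMA = {"a", "b"}   # the alphabet {a, b}
EPSILON = ""         # the empty string, written epsilon in the lecture


def is_string_over(w: str, sigma: set[str]) -> bool:
    """Return True if every symbol of w is taken from the alphabet sigma."""
    return all(symbol in sigma for symbol in w)


print(is_string_over("abab", SIGMA))   # a string with four symbols
print(is_string_over(EPSILON, SIGMA))  # the empty string uses no symbols at all
print(is_string_over("abc", SIGMA))    # c is not a symbol of the alphabet
```

Representing ε by the empty Python string is a convenient choice here, because juxtaposing it with any string leaves that string unchanged.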
The set of all strings over an alphabet Σ is denoted by Σ*. For example, if Σ = {0, 1}, then Σ* contains ε, because ε is a string. 0 itself is a string and 1 itself is a string. So, 0 and 1 contain only one symbol each and ε contains no symbol. Then 00, 01, 10 and 11 contain two symbols each, then comes 000, and so on. So, if we collect all the strings that can be formed by using the symbols from Σ, we get Σ*. Although Σ* is infinite, it is a countable set.
Now, let us see some operations using which we can manipulate and generate strings. First, the operation concatenation. Normally, we use the letters towards the end of the English alphabet, for example w, x, y, z, to denote strings. Consider a string x and another string y. The concatenation of x and y is denoted by xy. If x = a_1 a_2 ... a_n and y = b_1 b_2 ... b_m, then xy is nothing but the string that we get by juxtaposing x and y, that is, a_1 a_2 ... a_n b_1 b_2 ... b_m. So, if x has n symbols and y has m symbols, then xy has n + m symbols.
For a string x and an integer n ≥ 0, we write x^(n+1) = x^n x, with the base condition x^0 = ε. That is, x^n is the string obtained by concatenating n copies of x. More generally, whenever n = 0, the string x_1 x_2 ... x_n represents the empty string ε. For example, if x = ab, then x^0 = ε, x^1 = ab, x^2 = abab, x^3 = ababab and so on.
Now, let x be a string over an alphabet Σ.
For every symbol a belonging to Σ, the number of occurrences of a in x is denoted by |x|_a. We denote the length of a string x by |x|, which is defined as the sum of |x|_a over all symbols a in Σ. That is, the length of a string is obtained by counting the number of symbols in the string. For example, if x = bba, then the length of x is 3, because there are 3 symbols in it. If x = ε, the empty string, then the length of x is 0. If x = ababa, then the length of x is 5.
If we denote by A_n the set of all strings of length n over Σ, then one can easily observe that Σ* is the union of the A_n for n ≥ 0. Since each A_n is a finite set and Σ* is the union of countably many such sets, Σ* must be countably infinite.
Let us see some more string manipulation operations. We say that x is a substring of y if x occurs in y, that is, if we can write y = uxv for some strings u and v belonging to Σ*. For example, if x = abba, then bb is a substring of x, because we can write x = uyv, where u = a, v = a and y = bb. Similarly, ba is a substring of x; in this case, u = ab and v = ε. Similarly, you can find many other substrings of this string x.
We say that x is a prefix of y if, in y = uxv, this u = ε. For example, if x = ababb, then x can be written as uyv, where u = ε, y = aba and v = bb. So, this y, aba, is a prefix of x.
Similarly, the string a itself is a prefix of x, and so are ab and abab. The whole string ababb is also a prefix of x, and ε is a prefix of every string. So, ε, a, ab, aba, abab and ababb are the only strings which are prefixes of x.
Similarly, we define a suffix. We say that x is a suffix of y if, in y = uxv, v = ε. For the same string x = ababb, if we write x = uyv where v = ε, y = abb and u = ab, then we say that y, that is abb, is a suffix of x. Similarly, you can find all the suffixes: b is a suffix of x, bb is a suffix of x, abb is a suffix of x, babb is a suffix of x and, finally, the whole string ababb is a suffix of x. Also, ε is a suffix of every string x. We adopt the notation |y|_x to denote the number of occurrences of a string x in y.
We see that strings are the basic elements of a language. Let us now define formally what a language is. We define a language over an alphabet Σ as a subset of Σ*. Since Σ* contains all the strings over the alphabet, any subset of that set constitutes a language. Let us see some examples. Since a language may be any subset, the empty set ∅ is a language over any alphabet. The singleton set {ε} containing the empty string is also a language over any alphabet. Please note that these two sets, ∅ and {ε}, are not the same, because the language ∅ does not contain any string, whereas the singleton contains a string, namely ε. Also, the cardinality of the empty set is 0, whereas the cardinality of the singleton set is 1. Another example of a language is the set of all strings over the alphabet {0, 1} that start with 0. Similarly, the set of all strings over {a, b, c} having ac as a substring is also an example of a language.
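The string operations defined so far, that is, powers, length, number of occurrences, prefixes, suffixes and substrings, can be sketched in Python as follows. The function names are made up for illustration. The lecture does not say whether overlapping occurrences are counted in |y|_x; the sketch assumes that they are, and it expects a non-empty x there.

```python
def power(x: str, n: int) -> str:
    """Return x^n, using x^0 = epsilon and x^(n+1) = x^n x."""
    if n == 0:
        return ""
    return power(x, n - 1) + x


def occurrences(y: str, x: str) -> int:
    """Return |y|_x for a non-empty x, counting overlapping occurrences (an assumption)."""
    return sum(1 for i in range(len(y) - len(x) + 1) if y[i:i + len(x)] == x)


def prefixes(x: str) -> list[str]:
    """All y with x = y v, from the empty string up to x itself."""
    return [x[:i] for i in range(len(x) + 1)]


def suffixes(x: str) -> list[str]:
    """All y with x = u y, from x itself down to the empty string."""
    return [x[i:] for i in range(len(x) + 1)]


def is_substring(x: str, y: str) -> bool:
    """x is a substring of y if y = u x v for some strings u and v."""
    return any(y[i:i + len(x)] == x for i in range(len(y) - len(x) + 1))


print(power("ab", 3))             # ababab
print(len("ababa"))               # 5, the length |x|
print(occurrences("ababa", "a"))  # 3
print(prefixes("ababb"))          # ['', 'a', 'ab', 'aba', 'abab', 'ababb']
print(suffixes("ababb"))          # ['ababb', 'babb', 'abb', 'bb', 'b', '']
print(is_substring("bb", "abba")) # True, with u = a and v = a
```

The built-in len plays the role of |x|, and the + operator on Python strings is exactly juxtaposition.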
Let us see some more examples. Consider the set of all strings over some alphabet Σ with an even number of a's, and see what strings may be there in this language. Since the language contains the strings having an even number of a's, aa is in the language, aba is in the language and ababb is in the language, because the number of a's in each of these strings is even. But abb does not belong to this language, because the number of a's in this string is not even. Similarly, the set of all strings over some alphabet Σ, say containing a and b, in which the number of a's is equal to the number of b's is also an example of a language, and you can find many strings which are in the language and many which are not.
Consider the set of all palindromes over an alphabet Σ; this is also an example of a language. Palindromes are all those words or strings which read the same from left and from right. Next, consider the set of all strings over some alphabet Σ that have an a in the fifth position from the right. The string abaab belongs to this language, because the symbol a appears in the fifth position from the right, whereas abbbab does not belong to the language, because a does not appear in the fifth position from the right. Similarly, the set of all strings over some alphabet Σ with no consecutive a's is also an example of a language, and you can give many other examples, for instance the set of all strings over {a, b} in which every occurrence of b is not before an occurrence of a.
Now, since languages are sets, we can apply various set operations such as union, intersection, complement, difference and so on. Similarly, the notion of concatenation, which was used in the case of strings, can also be extended to languages. We define the concatenation of languages like this.
The concatenation of a pair of languages L_1 and L_2 is the set of all strings xy such that x belongs to L_1 and y belongs to L_2. Let us give some examples. Say L_1 = {0, 01} and L_2 = {110}. Then L_1 L_2, the concatenation of L_1 and L_2, is the set of all strings that we get by concatenating two strings, the first one from L_1 and the second one from L_2. That means 0110 and 01110 are the only strings that you get by concatenating L_1 and L_2. Now, suppose L_1 is the same language and L_3 = {10, 00}. Then L_1 L_3 = {010, 000, 0110, 0100}; it has four strings. In general, if L_1 has m strings and L_2 has n strings, then L_1 L_2 has at most m × n strings.
Concatenation of languages is associative, that is, for all languages L_1, L_2 and L_3, (L_1 L_2) L_3 = L_1 (L_2 L_3). So, in general, we write it as L_1 L_2 L_3 without any parentheses. As already mentioned, the number of strings in L_1 L_2 is always less than or equal to the product of the individual numbers. We also see that, for a non-empty L_1, L_1 is a subset of L_1 L_2 if and only if ε belongs to L_2 and, for a non-empty L_2, ε belongs to L_1 if and only if L_2 is a subset of L_1 L_2.
We write L^n to denote the language obtained by concatenating n copies of L. That means L^0 = {ε} and L^n = L^(n-1) L for all n ≥ 1.
For example, if L = {ab, b}, then L^0 = {ε}, L^1 is L itself, and for L^2 we just concatenate the various strings: abab, abb, bab and bb. These are the only strings that we get by concatenating L with L.
In the context of formal languages, another important operation is the Kleene star. We define the Kleene star of a language L as the union of L^n for n ≥ 0, and we denote it by L*. For example, if L = {0}, then L* is the union of L^0, that means {ε}, L^1, that means {0} itself, L^2 = {00}, L^3 = {000} and so on. That means we get all the strings that can be formed by using zero or more 0s.
Since an arbitrary string in L^n is of the form x_1 x_2 ... x_n for some x_i belonging to L, and L* is the union of all those L^n for n ≥ 0, we see that L* can be written as the set of all strings x_1 x_2 ... x_n with n ≥ 0 and each x_i belonging to L. That means a typical string in L* is a concatenation of finitely many strings of L.
Note that the Kleene star of the language L = {0, 1} over the alphabet {0, 1} is L* = L^0 ∪ L ∪ L^2 ∪ ... So, it is nothing but {ε} ∪ {0, 1} ∪ {00, 01, 10, 11} and so on. Eventually, we get the set of all strings over Σ. That means, if we take Σ as a language over Σ, then our notation Σ* is consistent with the Kleene star.
We define the positive closure of a language L, denoted by L+, as the union of all L^n for n ≥ 1. That means L* is simply L+ ∪ {ε}.
Now, we have described different kinds of languages in words. But if we represent a language using a form called set-builder form, it helps us in understanding various properties of the language better, and it also gives a very elegant representation.
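For finite languages, the operations of concatenation and power can be computed directly, and the Kleene star can be approximated by taking the union of only the first few powers. The following Python sketch represents a finite language as a set of strings; the names concat, lang_power and star_up_to are made up for illustration, and star_up_to is only a finite cut of L*, since L* itself is usually infinite.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """L1 L2: all strings xy with x in L1 and y in L2."""
    return {x + y for x in l1 for y in l2}


def lang_power(l: set[str], n: int) -> set[str]:
    """L^n, using L^0 = {epsilon} and L^n = L^(n-1) L."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result


def star_up_to(l: set[str], k: int) -> set[str]:
    """Union of L^0, L^1, ..., L^k, a finite approximation of L*."""
    result: set[str] = set()
    current = {""}              # current holds L^i, starting with L^0
    for _ in range(k + 1):
        result |= current
        current = concat(current, l)
    return result


print(sorted(concat({"0", "01"}, {"110"})))       # ['01110', '0110']
print(sorted(concat({"0", "01"}, {"10", "00"})))  # ['000', '010', '0100', '0110']
print(len(concat({"0", "01"}, {"1", "11"})))      # 3, fewer than 2 x 2
print(sorted(lang_power({"ab", "b"}, 2)))         # ['abab', 'abb', 'bab', 'bb']
print(sorted(star_up_to({"0"}, 3)))               # ['', '0', '00', '000']
```

The third call shows why the bound is "at most m × n": the string 011 arises both as 0 followed by 11 and as 01 followed by 1, and a set keeps it only once.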
Consider the set of all strings over {0, 1} that start with 0; this is an example of a language. Let us represent it in set-builder form. Every string in this language can be seen as 0x, that means 0 must be there in the first place and x may be any string over {0, 1}. Therefore, you can write it as {0x | x ∈ {0, 1}*}. This is a set-builder representation of the given language.
Now consider the set of all strings over {a, b, c} that have ac as a substring. How do you represent this in set-builder form? Any string in this language can be written as xacy, where x may be any string over {a, b, c}, y may be any string over {a, b, c}, and ac must appear in the string. Therefore, we can represent this language as L = {xacy | x, y ∈ Σ*}, where Σ = {a, b, c}. This is a set-builder form for the given language.
Similarly, the set of all strings over some alphabet Σ with an even number of a's can be represented in set-builder form by using the notation that we adopted. In this case, with Σ being, say, {a, b}, you can write it as {x ∈ Σ* | |x|_a = 2n for some n}. This is how we represent the language of strings containing an even number of a's.
Again, for the set of all strings over some alphabet Σ in which the number of a's is equal to the number of b's, the same notation comes in handy: {x ∈ Σ* | |x|_a = |x|_b}.
Consider the set of all palindromes over an alphabet Σ. To represent this, you can use the notation of reversal: the palindromes are the set {x ∈ Σ* | x = x^R}, where x^R is nothing but the reversal of the string x.
That means, if x = a_1 a_2 ... a_n, then x^R is simply a_n a_(n-1) ... a_2 a_1. Since, in the case of palindromes, every string should read identically from left and from right, we have this condition.
Similarly, one can find the set-builder form for the other languages. Take the set of all strings over some alphabet Σ that have an a in the fifth position from the right. In this case, we can write it as the set of all strings of the form xay such that x and y may be any strings from Σ*, with the condition that the length of y must be equal to 4, since a must appear in the fifth position from the right: {xay | x, y ∈ Σ*, |y| = 4}.
Similarly, the set of all strings over some alphabet with no consecutive a's can be represented as the set of all strings x belonging to Σ* such that there is no occurrence of the string aa in x, that is, {x ∈ Σ* | |x|_aa = 0}.
Finally, consider the set of all strings over {a, b} in which every occurrence of b is not before an occurrence of a. Every string here must be of the form a^m b^n: if an a occurs, then no b can occur before it. So, the language is {a^m b^n | m, n ≥ 0}.
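A set-builder description reads naturally as a membership test: the condition after the bar becomes a function that answers yes or no for a given string. The following Python sketch does this for several of the languages above. All function names are made up for illustration, and strings_up_to is a helper assumed here only to list the strings of Σ* up to a given length.

```python
from itertools import product


def strings_up_to(sigma: str, n: int) -> list[str]:
    """All strings over sigma of length at most n, shortest first."""
    return ["".join(t) for k in range(n + 1) for t in product(sorted(sigma), repeat=k)]


def even_number_of_as(x: str) -> bool:
    return x.count("a") % 2 == 0          # |x|_a = 2n for some n


def is_palindrome(x: str) -> bool:
    return x == x[::-1]                   # x = x^R


def a_fifth_from_right(x: str) -> bool:
    return len(x) >= 5 and x[-5] == "a"   # x = u a y with |y| = 4


def no_consecutive_as(x: str) -> bool:
    return "aa" not in x                  # |x|_aa = 0


def as_then_bs(x: str) -> bool:
    # For strings over {a, b}: of the form a^m b^n exactly when ba never occurs.
    return "ba" not in x


print(a_fifth_from_right("abaab"))   # True
print(a_fifth_from_right("abbbab"))  # False
print([x for x in strings_up_to("ab", 3) if as_then_bs(x)])
# ['', 'a', 'b', 'aa', 'ab', 'bb', 'aaa', 'aab', 'abb', 'bbb']
```

Filtering strings_up_to with one of these tests lists the short strings of the language, which is a quick way to check a set-builder form against the description in words.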
Now, let us see some exercises. Consider the language L containing all strings over {a, b} in which the number of a's is odd, that is, |x|_a = 2n + 1 for some n. Give five strings which are in L and five which are not in L. Consider the language L containing all those strings over {0, 1} whose length is prime. Give five strings which are not in L, and also list any two strings in L.
Represent the following languages in set-builder form: the set of all strings over {0, 1} that have a 1 in the third position from the left; the set of all strings over {a, b, c} that end with bb; the set of all strings over {a, b} that have at least two occurrences of ab; and the set of all strings over {a, b} that have at least two occurrences of aba.
Let us see some more exercises. Let L_1 = {a, ab} and L_2 = {ab, aba}; compute L_1 L_2, the concatenation of the two languages. Similarly, compute L_1 L_2 given that L_1 is the singleton set {0} and L_2 is Σ*. Consider the language L_1 of strings of the form xa, where x belongs to Σ*, and L_2 of strings of the form by, where y belongs to Σ*; describe the language L_1 L_2 in English. Let L_1 be {a^i b^j | i, j ≥ 0} and L_2 be {a^m | m ≥ 1}; write L_1 ∩ L_2 in set-builder form.
The properties that we discussed earlier may also be proved. Show that the concatenation of languages is associative, that is, for all languages L_1, L_2 and L_3, L_1 (L_2 L_3) = (L_1 L_2) L_3. Then show that the number of strings in L_1 L_2 is always less than or equal to the product of the individual numbers, and show the other two properties as well: for a non-empty L_1, L_1 is a subset of L_1 L_2 if and only if ε belongs to L_2; similarly, for a non-empty L_2, ε belongs to L_1 if and only if L_2 is a subset of L_1 L_2.
Let us show that the concatenation of languages is associative, that means, if L_1, L_2 and L_3 are languages, then L_1 (L_2 L_3) = (L_1 L_2) L_3.
To prove this, first observe that the concatenation of strings is also associative. That means, if x, y, z are strings, and we first concatenate y and z and then concatenate x with that string, the result is the same as (xy)z; therefore, you can write it simply as xyz.
Now, to show that the concatenation of languages is associative, consider any string x which belongs to L_1 (L_2 L_3). By definition, x can be written as x_1 x_2, where x_1 belongs to L_1 and x_2 belongs to L_2 L_3. Since x_2 belongs to L_2 L_3, x_2 can be written as y_1 y_2, where y_1 belongs to L_2 and y_2 belongs to L_3. That means x = x_1 x_2 = x_1 (y_1 y_2). Since concatenation of strings is associative, you can write this as (x_1 y_1) y_2. Here x_1 belongs to L_1 and y_1 belongs to L_2; therefore, x_1 y_1 belongs to L_1 L_2. Hence, x = z y_2, where z = x_1 y_1 belongs to L_1 L_2 and y_2 belongs to L_3, and you can conclude that x belongs to (L_1 L_2) L_3. Therefore, every string that belongs to L_1 (L_2 L_3) also belongs to (L_1 L_2) L_3. Similarly, the converse can also be proved.
Now, let us prove the property that, for a non-empty L_1, L_1 is a subset of L_1 L_2 if and only if ε belongs to L_2. The "if" part is straightforward: if ε belongs to L_2, then for any x belonging to L_1 we have x = xε, which belongs to L_1 L_2. On the other hand, suppose ε does not belong to L_2. Now, note that a string x of shortest length in L_1 cannot be in L_1 L_2. This is because, if x = yz for some y belonging to L_1 and a non-empty string z belonging to L_2, then the length of y is less than the length of x, which is a contradiction to our assumption that x is of shortest length in L_1. Therefore, L_1 is not a subset of L_1 L_2. Hence