Hello everyone. Today we are going to learn one of the classic information retrieval models, the Boolean model. The learning outcome for this session is that students will be able to create a Boolean model and retrieve the documents for a given query using it.
So, what is the Boolean model? It is a simple retrieval model based on set theory and Boolean algebra. The framework is easy for a common user to grasp because we only have to think in terms of Boolean expressions, and queries specified as Boolean expressions have precise semantics.
So, how is the framework defined in the Boolean model? Index terms are considered either present or absent in a document, that is, index term weights are binary, 0 or 1. A query is composed of index terms connected by three connectives: AND, OR and NOT. The query, essentially a conventional Boolean expression, is then represented as a disjunction of conjunctive vectors.
Let us see the definition of the Boolean model. Index term weights are binary, so Wij, which is nothing but the weight of the i-th term in the j-th document, is 0 or 1. A query q is a conventional Boolean expression. Once we are given a query, we have to convert it into its disjunctive normal form (DNF). So the vector QDNF is the disjunctive normal form of the query q, and each conjunctive component of that DNF is denoted QCC. After converting the query into its DNF, we find the similarity of each document to the query, that is, we identify which documents are relevant to the given query. Here the result of the similarity is again binary, 1 or 0. So, how is it identified?
We take the conjunctive components QCC one by one. If, for all the index terms in a component, the weight in the document is the same as the weight in the component, then the similarity is 1. If no component matches in this way, the similarity is 0. We will see this in detail in the example, but this is the degree of similarity, or ranking, of the document.
Now, let us see it with an example. This is the collection. In any search engine or information retrieval system we have a collection of text, and here three documents are considered. The first task for the given collection is to find the set of index terms. There are multiple ways to do this, such as removing the stop words, then stemming, and so on, and collecting the keywords. These are the keywords identified for these three documents. Some keywords repeat and some are unique. In total we have a collection of 12 keywords from these three documents, k1 to k12.
Once we have identified the index terms, the next part is to define the weight vectors in the Boolean model. We have three documents, so we have three weight vectors, and as we can see here, each entry is 1 or 0. Keywords k1 to k6 are present in the first document, so their weight is 1, whereas k7 to k12 are not present, so their weight is 0. In the same manner, k7 to k9 and "mountain" are present in the second document, which is why those weights are 1. And for d3, "mount" and the remaining three keywords are present, so those are 1 and the remaining ones are 0. This is how we define the weight vectors for the documents in the Boolean model. Now, let us see how to execute a query.
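The weight-vector step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three document texts below are hypothetical stand-ins, not the documents shown in the lecture, and the sketch skips stop-word removal and stemming, so every distinct word becomes an index term.

```python
# Hypothetical toy collection; these texts are NOT the lecture's documents.
docs = {
    "d1": "snow mountain trek",
    "d2": "indian mountain range",
    "d3": "japan mount fuji",
}

# Index terms: every distinct word, sorted so the vector positions are stable.
# (Assumption: no stop-word removal or stemming in this sketch.)
index_terms = sorted({word for text in docs.values() for word in text.split()})


def weight_vector(text, terms):
    """Binary weights: 1 if the term occurs in the text, otherwise 0."""
    words = set(text.split())
    return [1 if term in words else 0 for term in terms]


# One binary weight vector per document, all over the same index terms.
doc_vectors = {name: weight_vector(text, index_terms) for name, text in docs.items()}

for name, vector in doc_vectors.items():
    print(name, vector)
```

Because all vectors share the same term order, position i of every vector refers to the same index term, which is what makes the later comparison with the query vectors possible.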
Before going to the retrieval of documents for a query, we have to see how to find the DNF of a given expression or query. For the details you can watch any video on finding the PDNF, but let us look at it in brief. Whatever the query may be, to find the conjunctive components we have to make sure that every term is present in each QCC component, and the whole expression should be a disjunction of conjunctions. Right now this expression is not in DNF, so we have to convert it. After simplifying, we get a disjunction of conjunctions, but it is still not in DNF because not every term is present in every conjunction, and that is why we introduce the missing terms. Introducing a missing term should not change the meaning, and that is why "retrieval OR NOT retrieval", which always evaluates to true, has been introduced. After simplification we get a disjunction of conjunctions in which every component contains every term, which is nothing but the DNF. Once we have identified this DNF, we convert it into weight vectors: wherever a term is present without negation the weight is 1, and wherever it is present with negation the weight is 0. That is how we get 111, 110 and 100.
Now, at this moment, pause the video and try to find the DNF for this query, "Indian mountain OR Japan mountain", which we want to search in our collection. Of course, it is already a disjunction of conjunctions, but not all the terms are present in each conjunction. So introduce the missing terms and find the DNF.
I hope this is the answer you got: three components. They are "Indian AND mountain AND Japan", "Indian AND mountain AND NOT Japan", and "NOT Indian AND mountain AND Japan". Let us look at the vectors. The first component is 111. "Indian AND mountain AND NOT Japan" is 110. And the last one, with NOT Indian, which is 0, followed by mountain and Japan, is 011. This is how we obtain the weight vectors for a query.
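As a quick way to check the pause-the-video answer, the same three vectors can be produced by enumerating the truth table of the query instead of introducing the missing terms by hand. Note that this swaps the algebraic method of the lecture for a brute-force enumeration; the function name `dnf_vectors` and the term order (indian, mountain, japan) are assumptions made for this sketch.

```python
from itertools import product


def dnf_vectors(query, n_terms):
    """Return every 0/1 assignment of the terms for which the query is true.

    Each returned tuple is one conjunctive component of the full DNF:
    1 means the term appears un-negated, 0 means it appears negated.
    """
    return [bits for bits in product((1, 0), repeat=n_terms) if query(*bits)]


# Query from the lecture: (Indian AND mountain) OR (Japan AND mountain).
# Assumed term order for the vectors: (indian, mountain, japan).
def q(indian, mountain, japan):
    return (indian and mountain) or (japan and mountain)


query_vectors = dnf_vectors(q, 3)
print(query_vectors)
```

Enumeration costs 2^n evaluations for n query terms, so it is only reasonable for the short queries used in exercises like this one.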
So, once you are given a query or Boolean expression, convert it into its weight vectors, and after that we start the retrieval part, that is, finding the similarity of each document. These are the three documents. As you can see, K8, K4 and K12 are the keywords that are present in the query. Now let us see which document matches this particular query.
For document D1, the weights of these keywords are 010, whereas QCC component 1 is 111. It does not match at the very first position, so we go to the second component, 110. There too a 0 against a 1 does not match, so we go to the third component, 011. Now this 0 and this 0 match, and this 1 and 1 match, but the last 0 against 1 does not match. So no component matches this weight vector, and that is why the similarity is 0.
In the same manner we can calculate it for the second and third documents. For the second document the weights are 110. First, QCC component 1 does not match because it is 111. Go to the second component: now it matches, 110. As we saw in the formula, it is enough that for at least one component all the keyword weights match. Since one component has matched, we do not need to go to the third component, and that is why the similarity is 1.
Let us look at the third document. Its weights are 001. Again, QCC components 1, 2 and 3 do not match at all, so the similarity is 0. So what we have obtained is that the only relevant document is D2. The other two documents are not relevant for the given query.
So what we can do is first obtain the document vectors. As long as no document is changed or added, the document weight vectors do not change.
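The matching walk-through above can be written as a small sketch. The document weights (010, 110, 001) and the query components (111, 110, 011) are the ones read out in the lecture, restricted to the three query keywords in the order Indian, mountain, Japan; the function name `similarity` is an assumption for this example.

```python
def similarity(doc_weights, query_components):
    """Return 1 if any conjunctive component equals the document's weights, else 0.

    any() stops at the first matching component, just as the walk-through
    skips the third component once the second one matches D2.
    """
    return int(any(tuple(doc_weights) == tuple(cc) for cc in query_components))


# Query components from the lecture, term order (Indian, mountain, Japan).
query_components = [(1, 1, 1), (1, 1, 0), (0, 1, 1)]

# Document weights for the same three keywords, as read out in the lecture.
doc_weights = {"D1": (0, 1, 0), "D2": (1, 1, 0), "D3": (0, 0, 1)}

results = {name: similarity(w, query_components) for name, w in doc_weights.items()}
relevant = [name for name, sim in results.items() if sim == 1]

print(results)
print(relevant)
```

Only the weights of the keywords that occur in the query are compared here, which mirrors how the lecture reads three positions out of each 12-entry document vector.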
The only part that changes is the query vector: for each query that is entered, get the DNF, get the weight vectors, and try to match every component against the documents.
So, what are the advantages of the Boolean model? There is a clean formalism behind the model, and it is very simple. It is also simple to implement, since the weights are binary: just find whether the keyword is present or not and assign the weight accordingly.
But what are the disadvantages? In information retrieval we look for approximate or partial matching, but the Boolean model is as good as data retrieval: a document is either relevant or not relevant. So, first, there is no partial matching of the query conditions. Second, expressing a user need as a Boolean expression is a tedious task for a normal user or layman. Third, because of exact matching, a query may retrieve too few documents or too many documents.
So we want a next model that goes beyond exact matching and gives a notion of partial matching, and that is why we move to the next model, the vector model, in the next lecture. Thank you.