Hello everyone, today we will see retrieval performance evaluation, part 2. The learning outcomes for this session are: evaluate precision at the 11 standard recall levels using the interpolation formula, and evaluate the retrieval performance of algorithms for an individual query.

First, recall the definitions of recall and precision. Recall is the fraction of the total relevant documents that has been retrieved, while precision is the fraction of the retrieved documents in the answer set that is relevant.

Now consider this example. For a given information request there are three relevant documents, and the system retrieves 15 documents in its answer set. Let us calculate precision and recall at each relevant document. The first relevant document, d56, appears at rank 3: recall is 1/3, that is 33.3%, and precision is also 1/3, that is 33.3%. The second relevant document, d129, appears at rank 8: recall is 2/3, that is 66.6%, and precision is 2/8, that is 25%. So we have 25% precision at the 66.6% recall level. With the third relevant document, at rank 15, all three relevant documents have been retrieved: recall is 3/3, that is 100%, while precision is 3/15, that is 20%. So we have 20% precision at the 100% recall level.

So when do we reach the 100% recall level? Pause the video and think for a moment. Yes: when all the relevant documents have been retrieved in the answer set, exactly as in the example we have just seen.
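The computation above can be sketched in a few lines of Python. The document IDs other than d56 and d129, and the exact ranking, are hypothetical, chosen only to place the three relevant documents at ranks 3, 8 and 15 as in the lecture's example.

```python
def precision_recall_points(ranking, relevant):
    """Return a (recall, precision) pair each time a relevant document is seen."""
    points = []
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            # recall = relevant seen so far / total relevant
            # precision = relevant seen so far / documents seen so far
            points.append((found / len(relevant), found / rank))
    return points

# Hypothetical 15-document answer set: relevant docs at ranks 3, 8 and 15.
relevant = {"d56", "d129", "d3"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

for r, p in precision_recall_points(ranking, relevant):
    print(f"recall {r:.1%}, precision {p:.1%}")
```

Running this reproduces the three (recall, precision) points worked out above: (33.3%, 33.3%), (66.6%, 25%) and (100%, 20%).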
Now we have to plot a graph of precision versus recall at the 11 standard recall levels (0%, 10%, ..., 100%). But in the example we obtained only three recall levels, and they are not the standard levels. So how do we obtain precision at the standard recall levels? We use the interpolation formula: the precision at the j-th standard recall level rj is the maximum known precision at any recall level between rj and rj+1, that is, P(rj) = max{ P(r) : rj <= r <= rj+1 }. For example, the precision at the 30% recall level is the maximum precision observed at recall levels between 30% and 40%.

Now let us calculate the precision at the standard recall levels for the given example. The first observed recall level is 33.3%, so the standard levels 0, 10, 20 and 30, which lie at or below 33.3%, all receive that level's precision, 33.3%. The standard levels 40, 50 and 60 lie between 33.3% and the next observed recall level, 66.6%, whose precision is 25%, so 25% is interpolated to these three levels. The last observed recall level is 100%, with precision 20%, so the standard levels 70, 80 and 90 receive 20%, the same as the 100% level. Suppose that besides 33.3% we had also observed a recall level of 38%: both 33.3% and 38% lie between 30% and 40%, so we would take whichever of the two precisions is maximum. Thus we have obtained the precision at the 11 standard recall levels, and we can plot the graph of precision versus recall.

Now consider this example: there are two algorithms, and we have their average precision versus recall graph.
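A minimal sketch of this interpolation, reading the rule as "precision at level rj is the highest precision observed at any recall at or above rj", which reproduces the numbers worked out above:

```python
def interpolate_11_levels(points):
    """points: (recall, precision) pairs observed at each relevant document.
    Returns interpolated precision at the 11 standard recall levels 0.0..1.0."""
    interpolated = []
    for level in (i / 10 for i in range(11)):
        # maximum precision at any observed recall at or above this level
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return interpolated

# The three observed (recall, precision) points from the running example:
points = [(1/3, 1/3), (2/3, 0.25), (1.0, 0.20)]
print([round(p, 3) for p in interpolate_11_levels(points)])
```

This prints 0.333 for the levels 0 through 30%, 0.25 for 40 through 60%, and 0.2 for 70 through 100%, matching the table derived above.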
One algorithm performs better at the lower recall levels, whereas the other performs better at the higher recall levels. Thus the precision versus recall graph lets us compare retrieval performance over a set of example queries. In the previous lecture we saw that an algorithm is run over a set of n queries, the precision figures are averaged, and the graph is plotted; the graphs here are plotted from that average precision over the set of example queries.

But there are situations where we want to compare the retrieval performance of algorithms for an individual query, to see which algorithm performs better for that particular query. For this we use single value summaries: a single precision value for each query. Instead of averaging over queries, we summarize the precision versus recall curve for that query, or take the precision at a specific level. There are several methods for obtaining a single value summary.

The first is the average precision at seen relevant documents: take the average of the precision values obtained after each relevant document is observed. In the example we discussed we obtained three precision values, 33.3%, 25% and 20%, so their average, about 0.26, is the average precision at seen relevant documents.

The second method is R-precision: compute the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query. For example, suppose there are 10 relevant documents; then we count how many of the first 10 retrieved documents are relevant. Counting them in the example, we find 4 relevant documents among the first 10, so the R-precision is 0.4, or 40%. Consider a second example, where the number of relevant documents is 3.
How many of the first R retrieved documents are relevant? In the first three positions only one document is relevant, so the R-precision is 1/3, about 0.33.

Next is the precision histogram, which is used to compare the retrieval performance of two algorithms. Let RPA(i) be the R-precision of algorithm A for query i, and RPB(i) that of algorithm B; the histogram plots the difference RPA/B(i) = RPA(i) - RPB(i) for each query. If the difference is 0, the two algorithms have equal performance on that query; if it is positive, algorithm A performs better than algorithm B; and if it is negative, algorithm B performs better than algorithm A. Now look at this histogram, for a run of 10 queries. Queries 4 and 5 have a negative difference, which means algorithm B performs better for queries 4 and 5, whereas for the rest of the queries algorithm A performs better.

We can also store statistics about the execution of a particular algorithm. A summary table may record: the number of queries used in the task, the total number of documents retrieved by all queries, the total number of relevant documents effectively retrieved when all queries are considered, and the total number of relevant documents which could have been retrieved by all queries. From this stored data we can examine and compare the results of a particular algorithm across queries.

Now, the problems with precision and recall. First, calculating precision and recall requires detailed knowledge of the documents in the collection, but with a large collection such knowledge is unavailable.
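The three single value summaries can be sketched as follows; the rankings and R-precision lists in the demo are hypothetical, chosen only to reproduce the figures mentioned above.

```python
def average_precision_seen(points):
    """Average of the precision values observed at each seen relevant document."""
    precisions = [p for _, p in points]
    return sum(precisions) / len(precisions)

def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    top_r = ranking[:len(relevant)]
    return sum(1 for doc in top_r if doc in relevant) / len(relevant)

def rp_histogram(rp_a, rp_b):
    """Per-query difference RP_A(i) - RP_B(i): > 0 means A wins, < 0 means B wins."""
    return [a - b for a, b in zip(rp_a, rp_b)]

# Average precision at seen relevant documents for the running example:
print(average_precision_seen([(1/3, 1/3), (2/3, 0.25), (1.0, 0.20)]))  # ~0.261

# R-precision with 3 relevant documents, only one of them in the top 3:
print(r_precision(["d84", "d56", "d6", "d129"], {"d56", "d129", "d3"}))  # ~0.333
```

Note that the exact average here is about 0.26, the mean of 33.3%, 25% and 20%.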
Second, precision and recall are related measures that capture different aspects of the set of retrieved documents, and there are situations where combining the two into a single measure is more appropriate. Third, precision and recall measure effectiveness over a set of queries processed in batch mode, but in day-to-day use interactivity is a key aspect of the retrieval process: the user will not stop until he or she is satisfied. Finally, the two measures are easy to define when there is a linear ordering of the retrieved documents, but for systems with only a weak ordering, precision and recall might be inadequate.

So there are alternative measures that can be used to evaluate performance. One is the harmonic mean, a single measure which combines recall and precision. The harmonic mean at the j-th position in the ranking is F(j) = 2 / (1/r(j) + 1/P(j)), where r(j) is the recall at position j and P(j) is the precision at position j. F takes values between 0 and 1. Pause the video and think: in which conditions will the F measure, the harmonic mean, be 0 or 1? It is 0 when no relevant document has been retrieved, and 1 when all the ranked documents are relevant.

Instead of weighting the two equally, there are situations where the user is more interested in either recall or precision. The E measure, E(j) = 1 - (1 + b^2) / (b^2/r(j) + 1/P(j)), also combines both, but lets you decide which one to emphasize. If b is 1, it is equivalent to the harmonic mean (E = 1 - F); if b is greater than 1, you are more interested in precision; and if b is less than 1, you are more interested in recall.

Note that when we calculate recall and precision this way, the assumption is that the set of relevant documents for a query is the same independent of the user.
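A small sketch of both measures as defined above; the zero-division guards are my addition, reflecting the boundary cases discussed (no relevant document retrieved).

```python
def f_measure(recall, precision):
    """Harmonic mean F(j) = 2 / (1/r(j) + 1/P(j)); taken as 0 if either input is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

def e_measure(recall, precision, b=1.0):
    """E(j) = 1 - (1 + b^2) / (b^2/r(j) + 1/P(j)).
    b = 1 gives 1 - F(j); b > 1 emphasizes precision, b < 1 emphasizes recall."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)

print(f_measure(0.5, 0.5))         # 0.5
print(e_measure(0.5, 0.5))         # 0.5, i.e. 1 - F
print(e_measure(0.8, 0.4, b=2.0))  # b = 2 weights precision more heavily
```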
But in real life this is not the case: every user has a different interpretation of the relevance of a document. One user may consider document A highly relevant, while for another user A is not that relevant. So we need user-oriented measures.

Consider that R is the set of relevant documents and A is the answer set. Let U be the set of relevant documents known to the user; let Rk be the set of retrieved relevant documents that were known to the user; and let Ru be the set of retrieved relevant documents that were previously unknown to the user and are now known after retrieval. Then the coverage ratio is the fraction of the documents known to the user which are retrieved, |Rk| / |U|, and the novelty ratio is the fraction of the retrieved relevant documents which were unknown to the user, |Ru| / (|Ru| + |Rk|). High coverage means the system is finding most of the relevant documents the user expects to see; a high novelty ratio indicates the system is revealing many relevant documents that were previously unknown to the user.

We also have the measures relative recall, the ratio of the number of relevant documents found to the number of documents the user expected to find, and recall effort, the ratio of the number of relevant documents the user expected to find to the number of documents that had to be examined to find them. These are the measures that can be used to evaluate the retrieval performance of a system. Thank you.
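To make the coverage and novelty definitions concrete, here is a small sketch; the document sets are hypothetical.

```python
def coverage(known_retrieved, known):
    """|Rk| / |U|: fraction of the user's known relevant docs that are retrieved."""
    return len(known_retrieved) / len(known)

def novelty(unknown_retrieved, known_retrieved):
    """|Ru| / (|Ru| + |Rk|): fraction of retrieved relevant docs new to the user."""
    return len(unknown_retrieved) / (len(unknown_retrieved) + len(known_retrieved))

# Hypothetical sets: the user knows 4 relevant docs, 3 of which are retrieved,
# plus 2 retrieved relevant docs the user did not know about.
U  = {"d1", "d2", "d3", "d4"}
Rk = {"d1", "d2", "d3"}
Ru = {"d7", "d9"}

print(coverage(Rk, U))   # 0.75
print(novelty(Ru, Rk))   # 0.4
```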