Welcome back to the learning analytics tools course. Let us continue with performance metrics for machine learning classifiers. We saw that Kappa is used to judge performance and pick the better classifier, especially when you have an imbalanced dataset.

Here is a binary classification problem. Out of 1000 students, we want to predict how many will get more than 90 marks in the final exam. Consider a confusion table that looks like this: 980 students got less than 90 marks and were predicted correctly by the classifier, and the classifier also predicted the other 20 students, who actually got 90 marks or more, as non-performers. Given this table, what will be the Kappa score and what will be the accuracy? Note that the classifier did nothing, in the sense that it did not create any real rule. You might have given it 10 features; it does not matter. The classifier behaved as a zero-rule classifier: either there is no rule at all, or there is one trivial rule, "classify all the data into the majority class", and here the majority class is 0. So, for this problem, compute the accuracy and the Kappa, and think about why you get those values. After writing down your answer, please resume to continue.

So, for this binary classification problem, let us look at accuracy and Kappa. The accuracy is 98 percent, because 980 students out of 1000 are classified correctly; there is no doubt about that. And the Kappa is zero (a short code check of these two numbers follows at the end of this segment). Now, can we have a negative Kappa? Think of a problem, create your own confusion table, and try to get a negative Kappa from it. What does a negative Kappa mean? It means the classifier is doing poorly even compared to chance. And what does Kappa equal to zero mean? It means the classifier is performing exactly at chance level.

So, how do we interpret Kappa values? In the last slide we asked what Kappa equal to 0.4 means: is 0.4 good or bad? There is no definite answer; as I said, it depends on the domain. For example, in the education domain, 0.2 to 0.4 is fair, and a Kappa score of about 0.4 is considered good. And if you have a multiclass problem, say you have to classify students into multiple classes such as fail, 50 to 60 marks, 60 to 70 marks, and so on, then a Kappa score of 0.4 is also considered good there. But for an inter-rater reliability problem like the one we saw in the last class, the agreement between the two raters should be at least 0.8. If the agreement is only around 0.6, it is not good to use those two researchers' observations in your research. So make sure that your inter-rater reliability is greater than 0.8. If the raters are not achieving it, train the researchers again, discuss with your co-researchers, find out where and why the mistakes happened, and redo the measurement of the students' frustration or other affective states. Then compute the Kappa again and make sure you get better than 0.8. If you are not at 0.8, you may not be able to publish your research in good, reputed journals.
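Coming back to the zero-rule example above, here is a minimal sketch in Python, assuming scikit-learn is available, that reconstructs the 980/20 table as label vectors and checks both scores:

```python
# Minimal sketch of the zero-rule example above, assuming scikit-learn.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# 980 students below 90 marks (class 0), 20 students at 90 or above (class 1).
y_true = [0] * 980 + [1] * 20
# The zero-rule classifier predicts the majority class (0) for everyone.
y_pred = [0] * 1000

print("Accuracy:", accuracy_score(y_true, y_pred))  # 0.98
print("Kappa:", cohen_kappa_score(y_true, y_pred))  # 0.0
```

The observed agreement (0.98) exactly equals the agreement expected by chance under these marginals, which is why Kappa comes out to zero despite the high accuracy.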
So, let us move on to the other metrics. In this class, we will see two more metrics to pick the better ML classifier. The first one is the receiver operating characteristic (ROC) curve. The ROC curve comes from signal transmission and reception in electronic communication. So, what is the receiver operating characteristic curve? It is a graphical plot to assess the performance of a binary classifier: a plot of the true positive rate against the false positive rate.

Remember the confusion table from two classes ago, with true positives, false positives, and so on. The true positive rate is true positives divided by (true positives plus false negatives); it is otherwise called sensitivity. The true negative rate is true negatives divided by (true negatives plus false positives); it is called specificity. The false positive rate is false positives divided by (false positives plus true negatives), so the true negative rate equals 1 minus the false positive rate. The receiver operating characteristic curve therefore plots sensitivity versus 1 minus specificity.

Consider such a plot, with values from 0 to 1 on both axes. Suppose you have a classifier with very good specificity, so 1 minus specificity is 0, but the sensitivity, that is, the recall, is very low: the point sits at the bottom-left corner (0, 0), and that is not a good score. Similarly, if you have very good recall but very poor specificity (if specificity is 0, then 1 minus 0 is 1), the point sits at the top-right corner (1, 1). So a classifier where only one of sensitivity or specificity is very good and the other is 0 is not good. Both of those points lie on the diagonal from (0, 0) to (1, 1), which is just a random guess: do not even consider picking a classifier lying on this line. Points below the diagonal indicate a classifier performing worse than chance, the worst case; do not pick those classifiers either, even if the accuracy looks acceptable. Points above the diagonal indicate a classifier performing well. For example, a perfect classifier with high recall and high specificity gives 1 minus 1 equal to 0 on the x-axis and recall equal to 1 on the y-axis, that is, the point (0, 1) at the top-left: this is the best classifier. Pick classifiers based on where they lie; the closer to that corner, the better.

Let us do a small activity; then we will come back to how ROC works in detail. You are given the accuracy, the TPR, and the TNR of three classifiers. You know that the TNR is the specificity, so 1 minus TNR gives the false positive rate you need. Which classifier should you choose? Use the ROC plot to pick the right classifier (a small plotting sketch follows below). After you pick the right classifier, you can resume the video to continue.
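Here is one way to do this activity in code, a minimal sketch assuming matplotlib; the (TPR, TNR) pairs are the ones given in the activity:

```python
# Minimal sketch, assuming matplotlib: plot the three classifiers from
# the activity as points in ROC space (x = 1 - TNR, y = TPR).
import matplotlib.pyplot as plt

# name: (TPR, TNR), as given in the activity table
classifiers = {"classifier 1": (0.7, 0.4),
               "classifier 2": (0.8, 0.9),
               "classifier 3": (0.3, 0.5)}

for name, (tpr, tnr) in classifiers.items():
    fpr = 1 - tnr                  # false positive rate = 1 - specificity
    plt.scatter(fpr, tpr)
    plt.annotate(name, (fpr, tpr))

plt.plot([0, 1], [0, 1], "k--", label="random guess")  # chance diagonal
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```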
Let us see which classifier is doing well. You might have done it already; I will just repeat it. First, we do not need the accuracy for ROC at all; there is no point in keeping it here. For classifier 1, the TPR, that is, the sensitivity, is 0.7 and the TNR is 0.4, so 1 minus TNR is 0.6: the x value is 0.6 and the y value is 0.7, placing classifier 1 at (0.6, 0.7). For classifier 2, the TPR is 0.8 and the TNR is 0.9, so it lands at (0.1, 0.8); that is classifier 2. For classifier 3, the TPR is 0.3 and the TNR is 0.5, so 1 minus 0.5 is 0.5 and the point is (0.5, 0.3), which falls below the diagonal. So we can see that the best classifier is clearly classifier 2. If you computed it like that, good. So now you understand what the ROC curve is; if not, please look at the slides and also check Wikipedia for what the ROC curve means. It is simply the curve between sensitivity and 1 minus specificity, used to measure which classifier is doing well; it considers both the true positive rate and the true negative rate in order to pick the classifier. It is not a precision-recall plot.

So, we saw what ROC is. The other important metric in machine learning, used to pick the right classifier with the right threshold, is called the area under the curve (AUC). Assume a binary classifier developed to classify whether a student will pass the exam or not, but the classifier's response is not simply 0 or 1; instead, it gives the probability of being 1, say 0.8 or 0.7 or 0.1 or 0.2. Consider a table of 10 students with their true values: some students will pass the exam and some will not, so the true column reads 1, 0, 1, 1, and so on. Next to it is the predicted value; instead of being 1 or 0, as we saw in the previous slides, here it is a predicted probability, say 0.8, 0.4, 0.7, and so on. You can apply a threshold on this column and create your own classes, classifying each student into class 1 or 0. Consider this to be the testing dataset.

So, if I apply a threshold equal to 1, what happens? The rule is: if the predicted probability is greater than or equal to the threshold, assign class 1; else assign class 0. None of the probabilities reaches 1, so we classify everything as 0. If we apply a threshold of 0.8, the students with probability at or above 0.8 are classified as 1 and everything else as 0; here that gives 2 positive and 8 negative predictions, 2 predicted to pass and 8 predicted not to pass. If we apply a threshold of 0.6, a few more students cross the line and are classified as pass, and everything else is classified as 0, not pass. Similarly for thresholds of 0.4 and 0.2; and if the threshold equals 0, everyone is classified as pass.

So you have the option of choosing which threshold to use for your classifier, based on its performance. In this slide we are considering classifiers that give a probability value instead of a hard 1 or 0 output, which is the case we will deal with most often. So, let us take the threshold equal to 1 first and compute the true positive rate and the false positive rate. A minimal sketch of the thresholding step in code follows below.
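The probability values in this sketch are hypothetical stand-ins, since the lecture's full 10-student table lives on the slide and is not reproduced here; only the thresholding rule itself is the point:

```python
# Minimal sketch of applying a threshold to predicted probabilities.
# The probabilities below are hypothetical, not the lecture's exact table.
import numpy as np

y_prob = np.array([0.8, 0.4, 0.7, 0.2, 0.9, 0.6, 0.1, 0.5, 0.3, 0.8])

for t in [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]:
    y_pred = (y_prob >= t).astype(int)  # class 1 if probability >= threshold
    print(f"threshold {t}: predicted classes {y_pred.tolist()}")
```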
So, as I discussed on the previous slide, we have 10 students and their values. If I set the threshold equal to 1, we classify all the students as not pass. But 6 students actually passed (label 1) and 4 did not (label 0), so only the 4 negatives are classified correctly. The true positive rate is 0 divided by 6, which is 0. The true negative rate, that is, the specificity, is 4 divided by (4 plus 0), which is 1; and the false positive rate, 1 minus specificity, is 0. Those are the quantities we want to plot in the graph.

So, let us do a small activity. I want you to stop the video and really go and compute the false positive rate and the true positive rate for all the threshold values. Use the table given on the previous slide and compute them for each and every threshold. Please compute them yourself, so that any mistakes you have can be corrected, and if I made any mistakes, you can inform me in the forums. After writing down all the false positive rates and true positive rates, resume to continue.

So, here are the false positive rates and true positive rates for all the threshold values, and we have also plotted them as an ROC curve. Keep the two ideas separate: the receiver operating characteristic curve is the plot itself, and the AUC is the area under that curve; they are not the same thing. In the ROC plot we have FPR on the x-axis (FPR is 1 minus specificity) and TPR on the y-axis. For threshold equal to 1, the point is (0, 0). For threshold equal to 0.8, the false positive rate is 0 but the true positive rate is 0.33, so that point sits higher. For threshold equal to 0.6, the false positive rate is 0.25 and the true positive rate is 0.66; and so on for thresholds 0.4, 0.2, and 0. By now you know that points lying on the diagonal are no better than a random guess, and anything above it is good; so this classifier is doing well.

Let us draw the curve. It is not exactly a smooth curve; instead, it is a step curve. The area under the curve means you have to compute the area under all of these steps. I am not going to compute the areas here, but you have to add up all of them if you want the AUC by hand. So this is one classifier, and based on the different thresholds we can pick which threshold is good. Another classifier might have different values, which might give a different area under the curve; based on the area under the curve, we can pick the better classifier. More importantly, it is not only about picking the better classifier; it is also important for picking which threshold value you need for your classifier.

So, in this video we saw what the receiver operating characteristic curve is, how to compute the area under the curve, and how to use them to pick the right classifier and the right threshold. Are there any other metrics in machine learning to pick the right classifier? Yes, there are many others, like A' (A-prime), but we will stop here; the three or four metrics we now know are enough to pick the right classifier for binary classification problems. Do you need to compute the ROC curve and AUC manually every time? No; simply use the tools or libraries available in your scripting language to compute them, as in the sketch below.
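For reference, here is a minimal sketch assuming scikit-learn; the labels and probabilities are hypothetical ten-student data (6 pass, 4 not), not the exact table from the slide:

```python
# Minimal sketch, assuming scikit-learn; the data below are hypothetical.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]                       # 6 pass, 4 not pass
y_prob = [0.8, 0.4, 0.7, 0.2, 0.9, 0.6, 0.1, 0.5, 0.3, 0.85]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # one (FPR, TPR) per threshold
print("thresholds:", thresholds)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_prob))      # area under the step curve
```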
So, the idea here is that you have to understand what the ROC curve is, what the area under the curve is, and how it is computed, so that when someone shows you an ROC curve or an AUC value, you know what it means and can pick the right classifier: the receiver operating characteristic curve that covers more area is the one to pick. The best classifier's curve would cover the complete area, giving an AUC of 1, but it is highly unlikely that we will see that kind of classifier. Thank you.