Hey everyone, welcome to another episode. Today we're going to talk about ROC curves and precision-recall curves: understand why they're important, what they do, and even come up with an implementation from scratch in Python for computing both precision-recall curves and ROC curves. So let's get to it.

All right, so in this notebook we're importing our basic libraries: pandas, our data frame manipulation library; NumPy, our math library; make_classification, which is used to create our dummy classification dataset; and train_test_split, to split out test and train sets. I also have a bunch of built-in metrics here for the meat of this entire video, which is precision-recall curves as well as ROC curves. We'll also code these out from scratch individually, so there's more to come below. And then we have LogisticRegression, just a built-in classifier, which will be the example model we use in this video.

Okay, so I've split this up into two cases, balanced and unbalanced classes, because you can really see the difference between ROC curves and precision-recall curves across these two cases. So let's get to that.

In this first little chunk of cells, I'm just creating our dummy data: 10,000 samples, four features, all of them good numerical features, where the positive and negative classes are evenly split, 5,000 each. Then I'm creating column names for each, feat_1 through feat_4, and converting everything to a data frame. Here's an example sample from that data frame of 10,000 rows: four numeric features and a label.

Next, we split our dataset into train and test sets with a 90/10 split. I'm setting shuffle=False here because this is very application dependent. In certain situations your data has timestamps or date stamps, and you don't want to shuffle it, because you could end up training on data that belongs in your test set and vice versa, which would lead to data leakage. We don't want that. Depending on your situation, you can change this. With that data, we fit a typical logistic regression classifier. Nothing too fancy here.

Now, for evaluating our model, we have a little function here called evaluation, where we pass in the classifier we fitted, along with some X, which is a set of features, and y, which is the set of true ground-truth labels. In the first line we just get predictions from our model: pass all the X's in, get the predicted y's, which come back in the form of probabilities, hence the name y_predict_proba. Then we call scikit-learn's precision_recall_curve, passing in the ground-truth labels as well as the set of predictions we've made. It gives us three items back, all of them arrays: a list of precisions, a list of recalls, and the list of thresholds at which we got each precision and each recall. So if, say, the 10th entry in the thresholds array is 50%, then the 10th entry in the precision array is the precision at the 50% threshold, and the same goes for recall.
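To make that concrete, here's a rough sketch of the setup so far; the variable names and specific parameters like random_state and max_iter are placeholder assumptions, not necessarily what's in the notebook:

```python
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Balanced dummy dataset: 10,000 samples, 4 numeric features, 50/50 classes
X, y = make_classification(n_samples=10_000, n_features=4,
                           n_informative=4, n_redundant=0,
                           weights=[0.5, 0.5], random_state=42)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(1, 5)])
df["label"] = y

# 90/10 train/test split, no shuffling (application dependent)
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], test_size=0.1, shuffle=False)

# A plain logistic regression classifier
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class, then precision/recall at each threshold
y_predict_proba = clf.predict_proba(X_test)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(y_test, y_predict_proba)
```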
So essentially, here's what this is doing: you have this model, and it's sometimes really hard to determine what threshold is suitable for it. Typically we go with 50%, but in some cases 30% may be better, and in other cases 70% may be better. By using this function, we can get the value of precision and recall at each of the thresholds we could potentially choose. And this is an edge over simply picking an arbitrary threshold and reading off a specific precision and recall, because by computing the area under that curve, we get a single metric that takes every possible threshold into account and doesn't depend on any source of arbitrariness we'd otherwise introduce into our modeling. That's why AUCs are pretty useful in general.

I think I skipped this line: it does the same thing, but instead of the precision-recall curve, where the y-axis is precision and the x-axis is recall, it gives us the typical ROC curve, where the y-axis is the true positive rate and the x-axis is the false positive rate. Again, we pass in the ground-truth labels and the predictions, and we get similar arrays back: false positive rates, true positive rates, and the thresholds at which they occur. The function then just returns all of this in a dictionary.

Now for the AUC. Again, it's a built-in function that takes a list of x-axis values and a list of y-axis values, and it computes the area under the corresponding curve. Pretty neat: very simple functions that return a single number.

So if we call our evaluation function on our training and test sets, we get the training and test AUCs for both curves. Just looking at the test set, they're all around 95%: 95% AUC under the ROC curve, 95% AUC under the precision-recall curve. So all in all, a pretty good, well-performing model.

Now, there isn't really much of a difference between these two metrics until we look at case two, which uses all of this on an unbalanced dataset. Like before, we create a classification dataset, but this time the weights are 90/10: 9,000 of the samples will be of the negative class and only 1,000 will be of the positive class. We split our dataset the same way, but this time, when training our model, we pass in the parameter class_weight='balanced'. With this, we're telling logistic regression: hey, because there are nine times more negative samples than positive samples, weight the samples of the positive class nine times more than those of the negative class during training. This is pretty much required (and under-sampling the majority class could also be needed), because you need your model to be able to detect the small number of positive labels that occur in your data. Without it, the model will just perform much worse on the minority class.
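Continuing the sketch, the evaluation function and the unbalanced case might look something like this; the function body and dictionary keys are hypothetical, but class_weight='balanced' is the real scikit-learn parameter:

```python
from sklearn.metrics import precision_recall_curve, roc_curve, auc

def evaluation(clf, X, y):
    """Return ROC AUC and PR AUC for a fitted classifier on (X, y)."""
    y_predict_proba = clf.predict_proba(X)[:, 1]
    precisions, recalls, pr_thresholds = precision_recall_curve(y, y_predict_proba)
    fprs, tprs, roc_thresholds = roc_curve(y, y_predict_proba)
    return {
        "roc_auc": auc(fprs, tprs),          # x = FPR, y = TPR
        "pr_auc": auc(recalls, precisions),  # x = recall, y = precision
    }

# Case two: a 90/10 unbalanced dataset, split the same way as before
X, y = make_classification(n_samples=10_000, n_features=4,
                           n_informative=4, n_redundant=0,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, shuffle=False)

# class_weight='balanced' up-weights the rare positive class during training
clf_unbal = LogisticRegression(max_iter=1000, class_weight="balanced")
clf_unbal.fit(X_train, y_train)
print(evaluation(clf_unbal, X_test, y_test))
```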
Now, calling the same evaluation function here, it's only at this point that you can see the true difference between a simple ROC curve and a precision-recall curve. While the ROC values might be super high and get you really excited, this 70% PR AUC is more likely how your model is actually performing. My hunch, and you can check this against the data, is that because there are so few positive samples, your model finds it a little hard to detect them. So our precision has tanked while our recall is still pretty high, and that's why this may be more indicative of actual performance, and of the kind of performance we'd be looking for anyway.

That's also what I called out up here: why use precision-recall curves over ROC curves in some cases? Because of class imbalance. That's the main takeaway from the section we just went through.

But all of this used black-box functions that are built into scikit-learn (and which, by the way, you should be using in practice). Let's pick them apart a little and implement those functions ourselves, so you get a better understanding.

The first thing to note, and I've put this under a big note right here, is that the area under a curve in scikit-learn is typically found using the trapezoid rule. As proof, you can go into scikit-learn's documentation and source for the auc function we've been using, and if you scroll down, you'll see the little bit I've highlighted, where it calls numpy's trapz, passing in y and x, these being lists of values like precisions and recalls, or true positive rates and false positive rates, as I mentioned before. With this, it performs the trapezoid rule. If you remember that from high school (let me actually just google the trapezoid rule real quick and get back to you): okay, so this is a super simple example. Your real curve might be a bit smoother than this, but with the trapezoid rule you approximate the curve with straight line segments. It's a method of finding an area by splitting the whole region under the curve into vertical sections, say from 0.25 to 0.5 on the x-axis. Each section looks like a trapezoid, so you can use the trapezium area formula to find each section's area: this big chunk, this little triangle, and so on. Then you add them all up to get the overall area under the curve. That's how you find an area under a curve with the trapezoid rule in general, and that's exactly what's going on here. Now, what we're going to do is use that same logic ourselves.
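As a quick toy illustration of the rule (with made-up numbers, not anything from the notebook):

```python
import numpy as np

# A tiny "curve" sampled at four x positions
x = np.array([0.0, 0.25, 0.5, 1.0])
y = np.array([0.0, 0.5, 0.8, 1.0])

# Trapezoid rule by hand: each section contributes width * average height
area_manual = sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
                  for i in range(len(x) - 1))

# NumPy's built-in version, the one scikit-learn's auc calls internally
# (renamed np.trapezoid in NumPy 2.0)
area_trapz = np.trapz(y, x)

print(area_manual, area_trapz)  # both print 0.675
```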
Just for context, then. First, roc_auc_manual is the manual function we'll use. It takes a set of ground-truth labels and a set of predictions, kind of like what we did before, right over here: yeah, same inputs. But the output is just going to be the area under the curve, the AUC. So first of all, we create a data frame with two columns, the set of predictions and the set of labels, and we initialize the false positive and true positive rates to empty lists. Then, let's say we want to evaluate this for 100 thresholds between zero and one, at intervals of 0.01: 0.01, 0.02, and so on. If we had used those thresholds, 1%, 2%, all the way up to 100%, what would the corresponding false positive rates and true positive rates have been? That's what we're doing in this function. For every one of those thresholds, I compute true positives, false positives, true negatives, and false negatives, and then append the resulting rates to our lists. And what I'm doing right here, where I say the direction is either negative or positive: there are situations where, because of the nature of the curve, it slopes in one direction or the other, so the value returned by numpy's trapz could be negative. We just want the absolute value of that, and we return it as the main area.

All right, so now that that's taken care of, I'm going to do the same thing for precision-recall. Basically, we take in the ground-truth labels and the predictions, construct a data frame of two columns, the predictions (which are probability values, by the way) and the corresponding labels, and then create two empty lists for precision and recall. We loop over the same 100 thresholds from zero to one and compute precision and recall at each, so precision and recall each end up as a list of 100 elements: the precision and recall at the different thresholds. Then we compute the area, again using numpy's built-in trapezoid rule. A sketch of both functions follows just below.

Now, in the new evaluation function I'm creating, we take the same input arguments: a trained classifier (our logistic regression), a set of X's, and the actual ground-truth labels y. With that, we make a prediction and use the built-in functions to determine the areas under the precision-recall and ROC curves, as we did before, but I'm also adding calls to our manual ROC and precision-recall functions so we can compare all these values together.

With that, let's take a look at how our values come out. For the balanced dataset case, the built-in ROC AUC and our manual one agree with each other, and the same goes for the built-in and manual precision-recall metrics, which means we implemented them pretty well. And in the unbalanced dataset case as well, you can see that the values that are supposed to match, the built-in scikit-learn version and our version, do match up pretty well. You might notice, though, that there are subtle differences between these values for precision.
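Here's the sketch of the two manual functions I mentioned just above. It's a reconstruction under my own assumptions, using plain NumPy arrays rather than the data frames in the notebook, with simple fallback values where a rate is undefined, so treat it as illustrative rather than the exact code:

```python
import numpy as np

THRESHOLDS = np.linspace(0.01, 1.0, 100)  # 1% to 100% in 1% steps

def _confusion_counts(y_true, y_pred):
    """True/false positives and negatives for a set of hard predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp, fp, tn, fn

def roc_auc_manual(y_true, y_prob):
    """ROC AUC from 100 fixed thresholds plus the trapezoid rule."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    fprs, tprs = [], []
    for t in THRESHOLDS:
        tp, fp, tn, fn = _confusion_counts(y_true, (y_prob >= t).astype(int))
        tprs.append(tp / (tp + fn) if tp + fn else 0.0)
        fprs.append(fp / (fp + tn) if fp + tn else 0.0)
    # Sweeping thresholds upward traces the curve right to left, so trapz
    # can come back negative; take the absolute value, as discussed above.
    return abs(np.trapz(tprs, fprs))

def pr_auc_manual(y_true, y_prob):
    """PR AUC from the same 100 thresholds plus the trapezoid rule."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    precisions, recalls = [], []
    for t in THRESHOLDS:
        tp, fp, tn, fn = _confusion_counts(y_true, (y_prob >= t).astype(int))
        precisions.append(tp / (tp + fp) if tp + fp else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return abs(np.trapz(precisions, recalls))
```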
As for those subtle differences in precision: they're mostly because the way the precision-recall curve is actually determined isn't exactly the way I typed it out. In fact, let me see if I can pull that up. Right here: we have this built-in precision_recall_curve function, but what it does in reality is make a call to this _binary_clf_curve helper, which outputs a list of false positives and true positives. It's kind of the base function for the built-in ROC curve too; you can check it out in scikit-learn's source. From that output, scikit-learn does a bunch of further processing, and notably it uses the distinct predicted scores themselves as thresholds rather than a fixed grid of 100. Because of these slightly different methods of computing the same thing, you'll see slightly different output values. But more or less, the overall logic is the same one that's intuitive to us.

The main takeaways I want you to get out of this, though, are these. First, it's better to use an AUC than specific precision and recall scores, because when you calculate precision and recall at, say, the 50% threshold, you don't know whether that threshold is the best one, and there's a sense of arbitrariness in just choosing it. Computing an area under a curve removes that arbitrariness, because we can see how the model performs at all thresholds; it's a better single metric. Second, as I mentioned up here (wherever I mentioned it... yeah, here): precision-recall curves can be very useful for evaluating models where there's high data imbalance. So you might want to check that out too. I'll put all of this code in the description down below; it's a link to my GitHub. And I encourage you to read up on the trapezoid rule and the actual documentation for each of these libraries.

Now, one last tidbit before we leave: I have this function here called average_precision_score, which I'd never really used, but it's an alternative way of computing the area under the PR curve. Why is it an alternative? Well, if you look at the documentation: when the AUC is computed with the trapezoid rule, there's interpolation actually happening. Say you look at this particular ROC curve: the true positive rate is 0 at a false positive rate of 0, and 0.5 at a false positive rate of 0.25. From the plot it looks like the true positive rate at 0.2 would be somewhere partway up that line, which isn't necessarily the case. It only looks that way because this example uses just four thresholds, whereas my example used a hundred; the more thresholds you use, the more accurate the representation becomes with the ROC implementation we made. But sometimes that linear interpolation can be a little too optimistic, in which case we might want to use a step function instead, one that traces the curve the way my cursor is moving here.
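To see the contrast concretely, a small snippet like this compares the two, reusing y_test and y_predict_proba from the sketches above; both functions are real scikit-learn metrics:

```python
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

precisions, recalls, _ = precision_recall_curve(y_test, y_predict_proba)

# Trapezoid-rule AUC: linearly interpolates between curve points
pr_auc_trapezoid = auc(recalls, precisions)

# Average precision: the step-wise sum over thresholds,
#   AP = sum_n (R_n - R_{n-1}) * P_n
# with no interpolation between points
ap = average_precision_score(y_test, y_predict_proba)

print(pr_auc_trapezoid, ap)  # AP is often a little lower
```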
Using those step-function characteristics would actually decrease the value of the area under the curve somewhat, and I think if you read the documentation, that's exactly what it says: the linear interpolation of just connecting two points can be too optimistic, which is why this might be another alternative for evaluating how a model performs. Now, when you're comparing two models against each other, it doesn't make too much of a difference, because you're only concerned with whether one is greater than the other. But this is still a good method, a good function to know about in scikit-learn, and you can check out its source and documentation, along with everything else; I'll put it all in the description down below.

So that's all I have for you today. If you liked what you saw, please hit like, please subscribe, hit that bell, and have a good day. Happy coding, happy data sciencing. Take care.