This video is partially sponsored by Kite. They provide a code-completion service for machine learning code that integrates really well with your editor, and even with Jupyter notebooks. Click the link in the description to try Kite for free.

E[X] = x · P(x). I'm sure you've seen this formula before; you might have learned it at some point in high school. But where exactly do we use it in a machine learning context? Math symbols are abstract and annoying to look at on their own, so let's paint a picture with a scenario and add some context. There is code later in this video too, so stay tuned.

Your grandma owns a company called Grandma Fixes that repairs laptops. A customer with a broken laptop places a work order online and ships it to grandma. She and her workers fix the laptop and send it back. That's the typical scenario, but there are cases where the customer claims they never got their laptop back, even though grandma says it was shipped.

Now, you're a data scientist at Grandma Fixes. How do you think this situation, a customer coming to us with a claim, is possible? I'll give you a second to think. It could be for several reasons: perhaps a delivery mishap, maybe the customer lives in a shady area and the package was stolen, or the customer could simply be lying. They actually received the package, but they're being hush-hush about it to see if they can get a quick buck out of us. These are definitely rare cases, but they do happen, and each time one does, it's a costly mistake.

To combat this, let's say we have the option of insuring the laptops before they go out. The insurance company quotes, say, $30 per laptop. So when we send a laptop back to a customer, we can either pay $30 to insure it, or just ship it and take the risk. This is where machine learning comes in handy.
We can use machine learning to decide whether or not to insure each laptop, framed as a binary classification problem. A classification model outputs a probability, so we can model the problem as: determine the probability that the customer will claim they haven't received the laptop. If the label is 1, we lose money; if the label is 0, we don't.

For example, say a customer sends us a laptop worth $1,000. Grandma fixes it, and before sending it out, we ask the model whether this order will lead to a claim. The model returns, say, 0.05, meaning there is a 5% chance a claim will be filed, which could result in a loss for grandma. Let's plug this into our magic formula. Here x is the price of the laptop, $1,000, and P(x) is the probability of a claim, 0.05. So the expected loss for grandma is $1,000 × 0.05 = $50. But we know that the loss if we insure the laptop is just $30. So we insure the laptop before sending it.

But what if, for this same laptop, the model said there was a 1% chance of a claim? In that case P(x) is 0.01, and plugging that into our formula, the expected loss is $1,000 × 0.01 = $10. That's less than the $30 cost of insurance, so we choose not to insure. Note that the expected value isn't a loss that will ever literally occur; it's a theoretical quantity we can compare against thresholds to make decisions like this.

Now, to illustrate how this can save money, let's take a look at some code together. Before we move on, just a quick favor: can you hit that like button for the YouTube algorithm gods to pick up? The more likes videos like this get, the more the algorithm will send them out to people like yourself.
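The decision rule described above can be sketched in a few lines. This is a minimal illustration of the idea, not the notebook's code; the function names and the $30 premium constant are taken from the example in the video.

```python
# Hypothetical sketch of the expected-value decision rule from the video.
INSURANCE_COST = 30  # quoted premium per laptop, from the example

def expected_loss(price, p_claim):
    """Expected loss if we ship uninsured: E[X] = price * P(claim)."""
    return price * p_claim

def should_insure(price, p_claim, premium=INSURANCE_COST):
    """Insure only when the expected uninsured loss exceeds the premium."""
    return expected_loss(price, p_claim) > premium

print(should_insure(1000, 0.05))  # expected loss $50 > $30 premium -> True
print(should_insure(1000, 0.01))  # expected loss $10 < $30 premium -> False
```

The same threshold comparison drives the strategy evaluated later in the notebook.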
By just hitting that like button, you're helping us all out as a community, so thank you. Forgive my unkempt hair, but let's take a look at some code here. I've opened this notebook, and we start by installing scikit-learn. The pieces we need are: make_classification, which creates a dummy classification dataset; CalibratedClassifierCV, which is used to calibrate our model (I'll explain calibration when we reach that stage of the notebook); train_test_split, which splits our data into train and test sets; LogisticRegression, which creates our binary classifier to predict whether or not a particular transaction will lead to a claim by the customer; the ROC AUC score, which gives us a sense of how well the model is doing; MinMaxScaler, which is used to shape our dataset; and brier_score_loss, which is used for model calibration again. It's essentially a metric for how well a model is calibrated; we'll get to it later. And then we have the usual pandas and numpy imports.

All right, so now, creating our dataset: I make a call to make_classification so the dataset has 10,000 examples with two features, both of them informative. I set the class weights so that 90% of the examples are negative and the remaining 10% are positive. That's about 9,000 negative examples and 1,000 positive. I then apply a few transformations to make the data look realistic. The two features, by the way, are meant to represent the price of the laptop and the number of past orders the customer has made before this particular transaction.
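For reference, the imports described above would look something like this in the notebook (assuming scikit-learn, pandas, and numpy are installed; the exact import layout is my reconstruction, not the author's cell):

```python
# The libraries listed above, as standard scikit-learn imports.
from sklearn.datasets import make_classification        # dummy dataset
from sklearn.calibration import CalibratedClassifierCV  # model calibration
from sklearn.model_selection import train_test_split    # data splitting
from sklearn.linear_model import LogisticRegression     # binary classifier
from sklearn.metrics import roc_auc_score, brier_score_loss  # evaluation
from sklearn.preprocessing import MinMaxScaler          # feature scaling
import pandas as pd
import numpy as np
```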
We'll use both of these features as predictors for the label: whether or not the customer claimed they didn't receive their laptop. After these transformations, the feature distributions look more realistic. The median laptop price is around $1,400, with the median number of past orders at 23, which does seem like realistic data, so we'll roll with it.

Next I split the dataset into train, dev, and test sets, an 80/10/10 split. I'm keeping a dev (validation) set because we need it to calibrate our model, which is honestly the next section, so I'll get to that. After all this, you can see there is about a 1:9 ratio of positive to negative examples.

Now, model calibration. We need calibration because the output of a classification model, especially after undersampling or reweighting an imbalanced dataset, may not truly represent probabilities. In this case, when creating the LogisticRegression classifier, I pass in class_weight='balanced'. Since our dataset has roughly nine negative labels for every positive one, this means each positive sample is weighted about nine times more than each negative sample. Because of this, the model's outputs are biased upward: it outputs much higher probabilities than the actual probability values should be. To bring them down to their true values, we need to calibrate the model.
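The dataset setup described above can be sketched roughly as follows. The column names, scaling ranges, and random seed are my assumptions for illustration, not the notebook's exact code; only the 10,000 examples, two informative features, 90/10 class weighting, and 80/10/10 split come from the video.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# 10,000 examples, two informative features, ~90% negative class.
X, y = make_classification(
    n_samples=10_000, n_features=2, n_informative=2, n_redundant=0,
    weights=[0.9], random_state=42,
)

# Rescale the raw features into realistic ranges: laptop price and the
# customer's number of past orders (the ranges here are made up).
X = MinMaxScaler().fit_transform(X)
df = pd.DataFrame({
    "price": 300 + X[:, 0] * 2200,         # roughly $300-$2,500
    "past_orders": (X[:, 1] * 50).round(), # roughly 0-50 orders
})
df["claim"] = y

# 80/10/10 train/dev/test split; the dev set is held out for calibration.
train, rest = train_test_split(df, test_size=0.2, stratify=df["claim"], random_state=42)
dev, test = train_test_split(rest, test_size=0.5, stratify=rest["claim"], random_state=42)
print(len(train), len(dev), len(test))  # 8000 1000 1000
```

Stratifying on the label keeps the 1:9 positive-to-negative ratio consistent across all three splits.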
I'll explain a lot more about the methods of calibration in another video; for now, just know that we're calibrating the model to make sure its output represents true probabilities. After calibration, we have an AUC of about 94.6%, which is great; the model works. And for the 1,000 test examples, we can see that about half of them output a probability under 1%, which is pretty realistic, since a customer filing a claim is (and should be) a rare event.

All right. Now, the cost calculations: for these 1,000 examples, how much money could we have lost under different strategies for choosing which outgoing laptops to insure?

Case one: we do nothing. There's no policy in place; we just send laptops back as-is. As you saw at the beginning, there were 110 cases where we sent the laptop and the customer said they didn't receive it, and we would have lost the entire value of all of those laptops. In this dataset, that comes to $141,801. That's a lot of money.

Case two: we insure everything. For all 1,000 laptops going out, regardless of whether the customer receives them, we lose a flat $30,000, which is definitely better than insuring nothing. A step in the right direction.

Case three: we use a model to determine what to insure and what not to insure. Here I have a small DataFrame where the first two columns are the features, then the label of what actually happened, then the model's predicted probability that this particular order led to a claim. From this, we use our magical formula, the clickbait of this video.
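The two baseline strategies above can be sketched on toy data. This is a hypothetical four-order example to show the arithmetic; the $141,801 and $30,000 figures in the video come from its full 1,000-example test set, not from this sketch.

```python
import pandas as pd

# Toy stand-in for the test set: price of each order and whether a
# claim was actually filed (1 = customer said the laptop never arrived).
orders = pd.DataFrame({
    "price": [1400.0, 900.0, 2100.0, 650.0],
    "claim": [1, 0, 1, 0],
})

# Strategy 1: insure nothing -> lose the full price of every claimed laptop.
loss_nothing = orders.loc[orders["claim"] == 1, "price"].sum()

# Strategy 2: insure everything -> pay the $30 premium on every order.
loss_everything = len(orders) * 30

print(loss_nothing, loss_everything)  # 3500.0 120
```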
We compute the expected cost to Grandma Fixes: the price times the probability of this laptop leading to a claim. If that expected cost is greater than $30, we choose to insure the laptop; otherwise we don't, because the expected loss would be lower than the premium anyway. This gives us an actionable label in the insure column, and we see that the model chooses not to insure about 70% of the cases, which seems like a good balance.

Then there's the function that is applied to every single row, every one of those 1,000 transactions, to calculate the loss. If we insured the laptop, the loss for Grandma Fixes is $30. If we hadn't insured the laptop, and the buyer eventually came to us and said they didn't receive it, we lose the full price of that laptop. In all other cases, we lose nothing. This leads to a loss of $18,963, about 37% less than case two, where we insured everything.

So this model is a lot more intelligent, and it can potentially save a great deal of money. Imagine a situation where your company, or Grandma Fixes, has tens of thousands of transactions a day. That amounts to millions of transactions a year, and you could be saving tens of millions of dollars just by using a simple formula with a little probability knowledge.

I'm going to put all this code on GitHub, and like I said before, I'll explain more about model calibration and why it's required in a later video. You can also play around with the uncalibrated classifier's outputs, and you'll probably notice that its predicted probabilities are higher than what you're seeing here. But yeah, I'm going to put all this on GitHub.
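The model-driven strategy above can be sketched end to end: compute the expected cost, insure when it exceeds the $30 premium, then score the realized per-row loss. The column names and the three toy rows are my assumptions; only the decision rule and the loss logic come from the video.

```python
import pandas as pd

PREMIUM = 30  # insurance cost per laptop, from the example

# Toy rows: price, what actually happened, and the model's predicted
# probability that the order leads to a claim.
df = pd.DataFrame({
    "price":   [1000.0, 1000.0, 2000.0],
    "claim":   [0, 1, 0],
    "p_claim": [0.05, 0.01, 0.005],
})

# E[X] = price * P(claim); insure only when it exceeds the premium.
df["expected_cost"] = df["price"] * df["p_claim"]
df["insure"] = df["expected_cost"] > PREMIUM

def realized_loss(row):
    if row["insure"]:
        return PREMIUM        # insured: we pay the premium regardless
    if row["claim"] == 1:
        return row["price"]   # uninsured claim: lose the laptop's full price
    return 0                  # uninsured, no claim: lose nothing

df["loss"] = df.apply(realized_loss, axis=1)
print(df["loss"].sum())  # 1030.0
```

Row one is insured ($50 expected cost, $30 loss), row two goes out uninsured and is claimed ($1,000 loss), and row three goes out uninsured without incident ($0 loss). Summing this loss column over the real test set is what yields the $18,963 figure in the video.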
Hope you all enjoyed this video. Please do share, like, and subscribe, and spread the word about this channel; I'm trying to grow a community here. I'll see you in the next one. Bye-bye.