Let me revisit logistic regression. I'll use the employee attrition dataset that we've already seen in our previous videos. To remind you, it includes over 1,000 employees characterized by 32 features and reports whether they've left the company. To begin, I'll use only two features, monthly income and total working years. These two features have very different ranges of values, so for the purposes of our demonstration, I'll standardize them first using the Preprocess widget. Now, most of the methods in Orange standardize the data before modeling, so explicit normalization of features is often optional, but you should be careful anyway. Let's see the decision boundary that logistic regression infers for this data. I'll again use the Polynomial Classification widget from the Educational add-on and set my target class to "yes". Oops, there are many more blue points, representing those who stayed, so the decision boundary, where the attrition probability equals 0.5, falls outside the picture. Instead of using the default logistic regression, I can use one that balances the classes in the dataset. This means I check Balance class distribution in the Logistic Regression widget and send the resulting learner to Polynomial Classification. Now the bold line representing the classification boundary is where I would have put it had I drawn it by hand.

Just a bit of math. Let M denote the monthly income and Y the total working years. In two dimensions, the equation of a line is just a weighted sum of the features M and Y, plus some intercept, call it W0. So to parametrize the decision boundary, we just need to find the proper weights. I can find these weights in a Data Table. I'll write my equation as minus 0.109 minus 0.306 times M minus 0.305 times Y equals zero. The weights W1 and W2 define a vector that is perpendicular to the decision boundary and, as expected, points toward the red points, those of the target class.
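The boundary equation above can be sketched in a few lines of Python. This is just an illustration with the weights read off the Data Table in the video; the function name is my own, not part of Orange, and M and Y are assumed to be standardized values.

```python
def boundary_side(M, Y, w0=-0.109, w1=-0.306, w2=-0.305):
    """Weighted sum w0 + w1*M + w2*Y: zero exactly on the decision
    boundary, positive on the attrition ("yes") side, negative on
    the "stayed" side. Weights are those shown in the video."""
    return w0 + w1 * M + w2 * Y

# A hypothetical standardized employee, below average on both
# features, lands on the positive (attrition-prone) side:
print(round(boundary_side(-1.0, -1.0), 3))  # 0.502
```

Because both weights are negative, lower income and fewer working years push the sum above zero, i.e. toward the "yes" class.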
In our case, the two features have approximately equal weights, so we could say that they affect the class about equally. We can also see that logistic regression transforms distances from the decision boundary into class probabilities. For example, the attrition probability of 0.6 is about here, and on the other side the probability drops below 0.5, to around 0.4 here and 0.3 around here. Now, how does one compute the distance from the decision boundary? We use the equation that defines the decision boundary: to get the distance, just plug in the values for M and Y. If the result is greater than zero, the employee will most likely leave the company; otherwise, the employee will probably stay. The next reasonable question is how to turn this distance, call it D, into a probability. Logistic regression uses the following equation, called the logistic function, which looks like this. You can see that when the distance from the decision boundary is zero, the class probability is 0.5. Then, as the distance increases in the direction of the normal vector, the probability approaches one, and as it decreases, the probability approaches zero. So logistic regression is just a weighted sum transformed by the logistic function. This makes it very easy to represent the entire model as a graph: simply sum up the effect of each feature and then transform the sum into a probability. We call this a nomogram of logistic regression. To construct one, let's remove everything except the data, add Logistic Regression, and connect its output to the Nomogram widget. There, I'll set the target class to attrition. According to this visualization, years at the company seems to have the biggest effect on attrition. Let's try increasing it to 40. Yeah, anyone working there for that long will surely have left. Or maybe they just retired. So let's lower it again and take a look at some other features.
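The logistic function described above is easy to write down directly. A minimal sketch, using only the standard library:

```python
import math

def logistic(d):
    """Squash a signed distance d from the decision boundary
    into a class probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-d))

# On the boundary itself, both classes are equally likely:
print(logistic(0.0))  # 0.5
```

Moving in the direction of the normal vector (d > 0) pushes the probability toward 1; moving the other way (d < 0) pushes it toward 0, exactly as described above.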
It seems working overtime also has some impact, but not as profound as the total years worked. Do notice that the nomogram turns the product of each feature and its corresponding weight into a point scale. For instance, working overtime contributes two points. All these points are then summed up, representing the distance from the decision boundary. In our case we get about 17 points, which the widget translates through the logistic function into a probability of about 70%. Nomograms are a great way to graphically represent a logistic regression model. We can use them to see the contribution of each feature, rank the features by importance, and, for a given employee, make a prediction from the values of the original features. We'll see you in the next video. Bye.
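The whole pipeline, contributions summed, then squashed into a probability, fits in one small function. Note that Orange's Nomogram widget rescales the weight-times-value contributions onto its own point scale before summing; the sketch below skips that rescaling and sums the raw contributions directly, so its numbers differ from the widget's points, but the logic is the same. The function name and example values are my own.

```python
import math

def attrition_probability(values, weights, intercept):
    """Sum each feature's contribution (the quantity a nomogram
    displays as points), then pass the total through the logistic
    function to get a class probability."""
    distance = intercept + sum(w * x for w, x in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-distance))

# Hypothetical standardized employee with below-average income and
# tenure, using the two-feature weights from earlier in the video:
p = attrition_probability([-1.0, -1.0], [-0.306, -0.305], -0.109)
print(round(p, 2))  # 0.62
```

This mirrors what the nomogram does interactively: change a feature value, its contribution changes, the total shifts, and the predicted probability moves along the logistic curve.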