Welcome back to the final video on random forests with classification. In this video, we're going to talk about variable importance. A lot of times we understand how our model is doing and how accurate it is, but we want to know why it's so accurate: which variables are contributing the most to that accuracy. That's where variable importance comes in.
To get the variable importance, we just take whatever we called our model and call .summary() on it. When we run this, it comes up with a long, long, long text cell, but it's got a lot of good information in there that we really can make use of. First, it's got some information about the model itself: that we're doing a random forest model, that it was classification, and all of our explanatory variables. And then we get into variable importance.
There are probably at least 10 ways that we can measure variable importance, but the one we are going to be most interested in is this one, the mean decrease in accuracy. Here's how it's calculated. Imagine that you are interested in seeing how important kilowatt hours is, for example. The computer takes kilowatt hours and randomly shuffles it, a reallocation similar to what we did in lesson five for the hypotheses. It shuffles the values across the rows, then reruns the model and sees how drastically the accuracy decreased after that shuffling. In theory, an important variable will create larger decreases in accuracy upon shuffling, because suddenly those relationships between predictor and response have been broken. And that's what we're seeing here: when kilowatt hours is shuffled, it leads to, on average, an 8% loss in accuracy, so we go from 0.7 down to close to 0.6. Same with cubic feet of natural gas, where we end up with a seven and a half percent decrease in accuracy.
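To make the shuffling idea concrete, here is a minimal sketch of a mean decrease in accuracy calculation in plain Python. Everything in it is an assumption made for illustration: the toy data, the feature names `kwh` and `noise`, and the hand-written `toy_model` are hypothetical stand-ins, not the model or data from this video. It also re-scores a fixed model on shuffled data rather than refitting anything, which is one common way to do it and may differ from what the library in the video does internally.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(labels)

def mean_decrease_in_accuracy(model, rows, labels, feature, repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)  # same values, reassigned to different rows
        shuffled_rows = [dict(row, **{feature: v}) for row, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled_rows, labels))
    return sum(drops) / repeats

# Hypothetical toy data: the label depends on kwh only; noise is irrelevant.
data_rng = random.Random(42)
rows = [{"kwh": data_rng.uniform(0, 100), "noise": data_rng.uniform(0, 100)}
        for _ in range(500)]
labels = [1 if row["kwh"] > 50 else 0 for row in rows]

def toy_model(row):
    """Stand-in for a fitted classifier (assumption, not a real random forest)."""
    return 1 if row["kwh"] > 50 else 0

importances = {f: mean_decrease_in_accuracy(toy_model, rows, labels, f)
               for f in ("kwh", "noise")}
print(importances)
```

Shuffling `kwh` breaks the link between predictor and response, so accuracy falls sharply. Shuffling `noise` changes nothing, because the toy model never looks at it.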
These decreases are very high, which is why we see these hashtags, which tell us how significant a certain variable is in terms of its importance measure. A mean decrease in accuracy of 8% might not seem like a lot to you and me, but the computer is telling us that it is very highly significant, a massive reduction in accuracy for our data set. Here, where we've got gallons of fuel oil, we get a 0.9% decrease in accuracy. It still gets a slight significance for being a fairly decent decrease, whereas these last three are not very significant.
So this is the primary variable importance measure that we'll use, but you can also look at all of these other ones. We can see that the rankings actually change in some of them: here, natural gas is on top and kilowatt hours is down below. They all have these differences, and there are pros and cons to using each of them. But like I said, the mean decrease in accuracy is going to be the primary one we are interested in, because it applies directly to our ability to predict and classify new data sets.
As we keep scrolling down, eventually we get to some information about the model itself again. We've got the accuracy here, the number of trees, and the total number of nodes that were built. We've also got some information about the individual trees that can be useful: how often things show up in nodes, and how often they show up at certain depths. And then finally, some information on the training out-of-bag error. So while all of this can be important, like I said, what we are primarily going to want to focus on in terms of variable importance is this one right here, the mean decrease in accuracy.
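The structure-based counts mentioned above (how often a variable shows up in nodes, and how often it shows up at a certain depth) can be sketched in a few lines. This is only an illustration under stated assumptions: the nested-dictionary tree format, the helper names `node`, `leaf`, `count_features`, and `count_forest`, and the three-tree toy forest are all hypothetical, not the output of the model in this video.

```python
from collections import Counter

def leaf():
    """A terminal node with no split (hypothetical representation)."""
    return {"leaf": True}

def node(feature, left=None, right=None):
    """A split on `feature`; missing children default to leaves."""
    return {"feature": feature, "left": left or leaf(), "right": right or leaf()}

def count_features(tree, max_depth=None, depth=0, counts=None):
    """Count how often each feature is used as a split, optionally only
    at depths <= max_depth (the root is depth 0)."""
    counts = Counter() if counts is None else counts
    if "feature" not in tree:  # reached a leaf
        return counts
    if max_depth is None or depth <= max_depth:
        counts[tree["feature"]] += 1
    for child in (tree["left"], tree["right"]):
        count_features(child, max_depth, depth + 1, counts)
    return counts

def count_forest(forest, max_depth=None):
    """Sum the split counts over every tree in the forest."""
    total = Counter()
    for tree in forest:
        count_features(tree, max_depth, 0, total)
    return total

# Hypothetical toy forest of three small trees.
forest = [
    node("natural_gas", node("kwh"), node("kwh")),
    node("natural_gas", None, node("kwh", None, node("fuel_oil"))),
    node("kwh"),
]

all_nodes = count_forest(forest)             # appearances in any node
at_root = count_forest(forest, max_depth=0)  # appearances as the root split
print(all_nodes.most_common())
print(at_root.most_common())
```

In this toy forest, `kwh` ranks first by total node count while `natural_gas` ranks first at the root, which is the same kind of reordering between importance measures described above.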