Welcome back to the video series on random forests, specifically classification. In this video, we are going to focus on interpreting the results from our random forest model. As a review, in the previous video we separated our data into training and test sets, defined the random forest model, fit the model to the training data set, and then computed the accuracy on the test data set. You could, in theory, stop here, but oftentimes we want to know what's going on and possibly why we got the accuracy that we got.

So we can come in here and actually look at these trees. To plot a tree, we use our TFDF library, and we do model_plotter. Then we have a special command for plotting the model in Colab, which is plot_model_in_colab. (And this should be an underscore right here.) Within the parentheses, we give it our model name, which we called model. We give it the tree index; I'll just say three. And we give it the max depth, and I'm going to say three here as well. So the full call is tfdf.model_plotter.plot_model_in_colab(model, tree_idx=3, max_depth=3).

If we run this, we can see that it gives us this example of a decision tree. Each node here shows the rule that has been used to make a split. In this case, if cubic feet of natural gas was greater than 378, the data goes here; otherwise it goes here. One of the nice things is that there is this color bar indicating the percentage of each class. We can see that initially, for our whole data set, we've got 67% of class two. Those are single-family detached homes. But if we come up here after our first split, that jumps to 85%. So single-family detached homes are more likely to use more than 378 cubic feet than some of the other house types, such as apartment homes, which are down to 21% in this. And this keeps going. We have pellet amounts, we've got kilowatt hours. You can always come in here and see where the splits are. But this isn't the end of the tree.
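To make the split rule and the color bar a bit more concrete, here is a minimal sketch in plain Python, not the TFDF implementation. It applies a "greater than 378" rule to a handful of made-up rows and recomputes the class percentages on each side. The rows, the string labels, and the helper names class_percentages and split_rows are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def class_percentages(labels):
    """Return {class: percent of rows}, the kind of numbers the color bar shows."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}

def split_rows(rows, threshold):
    """Apply the rule 'value > threshold'; return (rows that pass, rows that fail)."""
    passed = [row for row in rows if row[0] > threshold]
    failed = [row for row in rows if row[0] <= threshold]
    return passed, failed

# Made-up toy data: (cubic feet of natural gas, house type). Not the video's data.
rows = [
    (500, "detached"), (420, "detached"), (610, "detached"), (390, "detached"),
    (450, "attached"), (300, "attached"), (200, "detached"),
    (100, "apartment"), (50, "apartment"), (0, "apartment"),
]

above, below = split_rows(rows, threshold=378)

before_split = class_percentages([label for _, label in rows])
after_split_above = class_percentages([label for _, label in above])
after_split_below = class_percentages([label for _, label in below])

print("whole data set:", before_split)
print("gas > 378:     ", after_split_above)
print("gas <= 378:    ", after_split_below)
```

In this toy data the share of detached homes goes up on the high-gas side of the split, which is the same kind of jump we saw in the plot, just with different numbers.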
In the plot, you can see there's some continuation here to let us know that this is not the leaf node, the terminal node. If we increase the max depth in the plot to what we specified as our max depth up here, so 12, we can see that this plot becomes much larger. But if we scroll all the way to the end, we can see that there are these leaf nodes. This particular node that we end up in is 100% single-family homes, and it tells us that there are six data points in this terminal node. This one, also 100% class two, has 17. Here we've got a mixed class: it's majority class three, which is single-family attached homes, but there are a few detached homes, so not every leaf node, every terminal node, is going to be 100% one class.

You could go through the whole tree, but because it's so big, we often just cut the plot off at a depth of three, maybe four, since the most important splits are at the very beginning. And then you can play around. Say you wanted to look at the 30th tree. We can see that in this tree kilowatt hours has the first split, whereas before we had cubic feet of natural gas. So each of these trees is going to be different from one another, and the power of random forest comes when you average all of those trees together to really get a strong predictive model.
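As a rough illustration of what "averaging the trees together" can look like, here is a small plain-Python sketch. It assumes each tree hands back the class proportions of the leaf a home lands in, and it averages those across trees. The per-tree numbers and the helper name average_tree_predictions are made up for this example, and a given library may combine trees differently (for instance, by letting each tree cast a single vote).

```python
def average_tree_predictions(per_tree_probs):
    """Average one class-probability dict per tree into a single distribution."""
    classes = sorted({cls for probs in per_tree_probs for cls in probs})
    n_trees = len(per_tree_probs)
    return {
        cls: sum(probs.get(cls, 0.0) for probs in per_tree_probs) / n_trees
        for cls in classes
    }

# Hypothetical leaf distributions for one home, one dict per tree.
# A pure leaf gives 1.0 to one class; a mixed leaf spreads it out.
per_tree = [
    {"detached": 1.0},
    {"detached": 0.25, "attached": 0.75},
    {"detached": 0.5, "attached": 0.5},
    {"detached": 0.75, "apartment": 0.25},
]

forest = average_tree_predictions(per_tree)
predicted = max(forest, key=forest.get)  # class with the highest averaged share

print(forest)
print("predicted class:", predicted)
```

Notice that one of the four trees on its own would have picked attached, but the averaged result still comes out detached. That is the idea behind combining many different trees.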