All right. In this video we're going to introduce regression with random forests. So similar to regression with neural networks, this is where we predict a quantitative variable, an actual numerical value, as opposed to classifying into groups. And as we go through these videos you'll see that a lot of the analysis works the same way: we still split our data, define the model, fit the model, and evaluate the accuracy. But there are going to be a few key differences that I'll point out throughout the videos. So with that, let's go ahead and jump right in. So we're here in Google Colab. I've already installed the libraries and loaded the data set. Our goal is going to be to predict the cubic feet of natural gas being used by households, given some key household characteristics. So once again we split the data into training and test sets. To give you an idea of what this looks like, we've got our response variable and all of our predictor variables. Similar to classification, we need to convert this into TensorFlow objects, so we say tfdf.keras.pd_dataframe_to_tf_dataset. Same process as before: we give it our training data set, and we tell it our label, which is our response variable, in this case cubic feet of natural gas. And then we have one more step: we need to tell the program what task we're doing. With random forests, the baseline assumption is that you're doing classification, so if you're doing regression, we need to tell it that that's what we're doing. So we say task=tfdf.keras.Task.REGRESSION. And then I'm going to copy that all over and redo it for the test dataset. So then all we need to do is swap in the test data, and we maintain the same task. And then we're good to go in terms of splitting our data. So to give you a reminder, our training dataset as a TensorFlow object is really just an object in memory, along with information that tells the computer what to do with it.
So once we've split the data into training and test, our next step is to actually define the model. I'm going to call this model reg_model, for regression model, but the command is the same as classification. So regardless of which type of analysis you're doing, if you're working with random forests you're always using the same command. And we've got some of the same arguments: we can still set compute_oob_variable_importances to True, we can still set our number of trees (in this case I'm going to set it to 200, a little higher than before), and we can still set our max depth, which I'm going to set to 10. But once again we need to define our task. So this is the second place you have to do this: once when you're splitting your data and turning it into TensorFlow objects, and then again when you're defining your model. So you can run that, and it's told us where it's saved the model. Then we have another option to specify our metrics, and this becomes a little more important for regression models because there are a lot more options for metrics. In particular we want MSE and MAE. MSE is the mean squared error, which is indicated here: the difference between your actual data and your predicted data, squared and then averaged. And MAE, the mean absolute error, is the absolute difference between your actual data and your predicted data, averaged, with no squaring. So both are purely numerical metrics that measure the error, that is, how different our actual data is from our predicted data. So once we've compiled the model, the next step is to fit the model. And this works exactly like it does in classification: we say reg_model.fit and give it our training data set. And so it's telling us that it's reading in the training data set and goes through all of our different steps here. We can see that it's taking a little bit longer: our last one ran in less than five seconds total, and this one took about 10.
And that's because we've increased the number of trees to 200. But once the model is fit, we can go through and do our evaluation. This works the same as it did before: we just say reg_model.evaluate, give it the test data set, and I'm going to return it as a dictionary. So then we can see that it's given us our MSE and our MAE. And you can see that the MSE is apparently huge, and that's because it has been squared, so it's really in feet to the sixth power instead of cubic feet. In order to get this back into the units we're working in, cubic feet of natural gas, you need to calculate the root mean squared error. So here we just say np.sqrt of the MSE from our evaluation, and then we can print it. Here we can see it's at 374, and that is more in line with our MAE, at least in the same order of magnitude. Generally, though, we want to provide units alongside our root mean squared error, because it is unit dependent. So we can say cubic feet of NG. That just provides a little bit more information, because depending on what we're actually working with, 374 could be a huge error or a very small error. And so, similar to when we were working with outliers and means and medians, it's important to compare this to our actual dataset. So if we look at the cubic feet of NG column and say .describe(), we can see that the average is 334. So our error is actually greater than our average, and that's not really ideal. You can imagine that if you are the utility trying to predict households' cubic feet of natural gas, perhaps trying to estimate next year's supply, and your error is greater than the average usage within your service area, that's probably not great. It means that for the average household, an estimate within your error bounds could be more than double what they actually use. So it's not great.
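To make the relationship between MSE, MAE, and RMSE concrete, here's a small hand computation. The actual and predicted values are made up for illustration, not numbers from the video.

```python
import numpy as np

# Made-up actual vs. predicted natural gas usage, in cubic feet.
actual = np.array([300.0, 500.0, 100.0])
predicted = np.array([350.0, 400.0, 150.0])

mse = np.mean((actual - predicted) ** 2)   # units: (cubic feet)^2, i.e. feet^6
mae = np.mean(np.abs(actual - predicted))  # units: cubic feet
rmse = np.sqrt(mse)                        # back in cubic feet

print(mse, mae, rmse)  # 5000.0, ~66.67, ~70.71
```

It's the RMSE, not the MSE, that you compare against the data's own scale, for example the 334 cubic feet average from .describe(), because it shares the data's units.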
And that's usually how you can make a decision about how good this error actually is and whether it's acceptable: by looking at what the average values are and how your error compares to them.