Hi, this is Dr. Dunn. I want to take a bit of time to walk you through question three in the remix for Lab 4. This question is on regression, and it's based on what we did in Rehearse 2 of Lab 4 on linear regression. The first thing in the write-up is a code chunk that loads the real data frame we created in Rehearse 2. I'm going to copy it and then go over to Posit Cloud, which used to be RStudio Cloud. It still works the same way. Here is the student name .Rmd file, which I renamed to Lab 4 remix. Ideally, you would put your name there.

Here's question three again: is there a real linear relationship between land.value and price in the houses in the real data frame? What is the equation of the straight line? So we're doing a linear regression, and we're going to get the equation of the line.

Over here in the environment, you can see we don't have any data frames yet, no data objects. So the first thing to do is create one, but I don't have a place to paste that code. I'm going to click the green plus-C button to insert a blank R code chunk. If I'm lucky, I still have the code copied: Ctrl+V on a PC, and I believe Command+V on a Mac, but don't hold me to that.

So I've got this in here. The first two lines load some libraries we may need. The operative line is this one here, line 249, which says real = read.delim(...). read.delim is the function that reads a delimited file, and inside it we put the URL, the web address, of Dr. De Veaux's Saratoga.txt file. We read that file in and assign it to real. The equal sign is another way of assigning a value, just like the less-than sign followed by a dash, <-. Here we're creating real with a single equal sign. You don't have to worry about the options line below that; it just makes the output a little prettier.
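A minimal sketch of that loading step, assuming a small tab-delimited file in place of the Saratoga.txt URL (the full address isn't spelled out here). The temporary file and the names cars_a and cars_b are hypothetical; the point is only that read.delim() returns a data frame and that = and <- assign the same way at the top level.

```r
# Hypothetical stand-in for Saratoga.txt: write a small tab-delimited file
# from R's built-in mtcars data so the example runs without a network.
tmp <- tempfile(fileext = ".txt")
write.table(mtcars[1:5, c("mpg", "wt")], tmp,
            sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() reads a tab-delimited file (a path or a URL) into a data frame.
# At the top level, `=` and `<-` both assign, so these two lines are equivalent.
cars_a = read.delim(tmp)
cars_b <- read.delim(tmp)

identical(cars_a, cars_b)
str(cars_a)
```

In the lab itself, the path would be the Saratoga.txt web address from the provided chunk, and the result would be assigned to real.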
So I'm going to run this code, and now up here in the environment we've got our real data object. You'll need that later on. You can inspect real by clicking the little blue triangle next to it. It has roughly 1,700 rows of observations on 16 variables. Here are the variable names: price, lot.size, waterfront, and so on. If you scroll down to the bottom, you'll find rooms, bathrooms, and college. Those are the variable names you might be using, and you should always pay attention to how they're spelled and how they're capitalized. When you put these variables into code, you need to type them exactly as they appear here, or you'll get an error.

Let's keep moving. I'm going to scroll the next piece of code up a little. We're given this code, and we just need to edit it. It says model, so our new data object will be model, and we're assigning it the result of a linear model. lm is the function we're using. Inside, it says sales ~ youtube, which looks like it's based on YouTube marketing data; the data object is called marketing. We've got two variables here. Sales is a function of youtube; that's what the little tilde sign means. Think of the tilde as saying "is a function of." So this models sales in dollars as a function of the YouTube marketing variable, using the marketing data.

We want to edit this. I'm not going to show you the exact answer; I'm just going to pick two numerical variables. Let's use living area, living.area, and lot size, which I think is measured in acres. I'm going to pause the video and type those two variables in. Okay, I think I've got it edited, so I'll run it. Whoops, we've got an error. It says lot.size not found. Did I misspell that? Let's look over here. It should have a capital S.
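Here is a sketch of the same lm() pattern on hypothetical toy data. The houses data frame and its column names are made up for illustration and are not the exact Saratoga names. It fits y ~ x and then shows that a miscapitalized variable name stops lm() with an "object not found" error, like the one in the video.

```r
# Hypothetical toy data standing in for real; column names are illustrative only.
houses <- data.frame(
  Living.Area = c(1200, 1500, 1800, 2100, 2400),
  Lot.Size    = c(0.20, 0.35, 0.50, 0.70, 0.90)
)

# Check the exact spelling and capitalization before using a name in code.
names(houses)

# lm(y ~ x, data = ...) models y as a function of x.
model <- lm(Living.Area ~ Lot.Size, data = houses)

# R is case-sensitive: lot.size is a different name from Lot.Size,
# so this call fails with an "object not found" error.
msg <- tryCatch(
  lm(Living.Area ~ lot.size, data = houses),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The fix is the same as in the video: copy the name exactly as names() (or the environment pane) shows it.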
Okay, no biggie. I'll edit this again to get rid of the error and run it again. And there we go. Now over here in the environment, we've got our model object, which is a list of 12 things. It shows the formula: living area as a function of lot size, from the real data frame. Remember from high school algebra, y = mx + b. So y is the living area and lot size is x. The slope m is 145, so we have 145 times x plus 162, where 162 is our intercept b. That's your equation of the line. What you need to do, of course, is edit this to put in the variables we're interested in for this question, not lot size and living area.

The next part says to create a scatter plot of your two variables, price and land.value, using this code chunk, and of course you'll need to edit it. I've added some comments here that weren't in the original code chunk. Remember, the pound sign, #, lets you create a comment to help you understand what's going on, and R knows not to run it as code. My comment says: find the correlation coefficient r between the two variables. I'm going to leave the code as they had it, with living area and price. Of course, you'd have to edit those two variables to match the two you're interested in, okay? The second part makes the scatter plot. Again, the two variables, living area and price, are inside back quotes (backticks), not single quotes. You don't have to edit the rest of this.

If you run this, it gives you a plot, so I'm going to go ahead and run it. The first part gives us our correlation coefficient, 0.71. A correlation of about 0.7 is generally considered pretty strong. And it's positive, which means that as living area goes up, price goes up; they move in the same direction. That's what we see in this plot: as living area increases, price increases, which makes sense.
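A sketch of reading the equation of the line off a fitted model, computing r, and drawing the scatter plot, on hypothetical toy numbers. It uses base R's plot() and abline(), which may differ from the plotting code in the lab's chunk; swap in your own two variables from real.

```r
# Hypothetical toy numbers; replace with your two columns from real.
houses <- data.frame(
  Living.Area = c(1200, 1500, 1800, 2100, 2400, 2600),
  Price       = c(150000, 175000, 210000, 230000, 270000, 265000)
)

model <- lm(Price ~ Living.Area, data = houses)

# coef() returns the intercept (b) first, then the slope (m), for y = mx + b.
b <- unname(coef(model)[1])
m <- unname(coef(model)[2])
cat(sprintf("Price = %.2f * Living.Area + %.2f\n", m, b))

# Correlation coefficient r between the two variables.
r <- cor(houses$Living.Area, houses$Price)
cat(sprintf("r = %.2f\n", r))

# Scatter plot with the fitted line on top (base R graphics).
plot(Price ~ Living.Area, data = houses)
abline(model)
```

The slope lm() reports is the least-squares slope, cov(x, y) / var(x), and the intercept is mean(y) minus the slope times mean(x), so the printed line always passes through the point of means.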
So that's what you need to do. Again, you'll edit this to put in your two variables. Finally, down here it asks: is the correlation strong? Remember, if r is about 0.6 or more (or about -0.6 or less), the correlation is strong. If r is between about -0.3 and 0.3, it's very weak. So keep those cutoffs in mind. Then it asks: do you think some extreme data points are substantially influencing the slope? Points like that are called influential. It says down here that in the real world, you would remove those data points and rerun the code. You don't have to do that, but I do want you to think about it and comment. For example, my first inclination is to look at this data point here; it's way out by itself. That one, and this other one too, might be influential. If we took those two points out, the line might move a good bit. That's the kind of thought process I want you to go through. I hope this helps.
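A sketch of both checks, under stated assumptions. describe_strength() is a hypothetical helper that encodes the 0.6 / 0.3 rule of thumb from this walkthrough, and the label "moderate" for the in-between range is an assumption. The data are made up, with one extreme point. Cook's distance (cooks.distance() in base R's stats package) is one standard way to flag influential points; it isn't necessarily what the lab expects you to use.

```r
# describe_strength() is a hypothetical helper encoding this lab's rule of
# thumb: |r| of about 0.6 or more is strong, below about 0.3 is weak.
# Labeling the in-between range "moderate" is an assumption.
describe_strength <- function(r) {
  a <- abs(r)
  if (a >= 0.6) "strong" else if (a < 0.3) "weak" else "moderate"
}

# Hypothetical toy data: six points near a line, plus one extreme point.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 20),
  y = c(2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 10.0)
)

full <- lm(y ~ x, data = d)
describe_strength(cor(d$x, d$y))

# Cook's distance measures how much each point moves the fitted model.
cd <- cooks.distance(full)
suspect <- which.max(cd)

# Refit without the most influential point and compare the two slopes.
trimmed <- lm(y ~ x, data = d[-suspect, ])
c(full = coef(full)[["x"]], trimmed = coef(trimmed)[["x"]])
```

Because the extreme point sits far to the right of the others, it has high leverage, and dropping it changes the slope a lot. That is the "would the line move?" reasoning from the video, made concrete.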