sales and temperature would be another example: you put temperature on one axis and plot how much ice cream you sell against it. You would think that as the temperature goes up, you would have more ice cream sales. That might not always be the case, because you might have had a cold, rainy day with a festival next to your ice cream shop, and you sold a lot of ice cream even though it was cold. But in general, you would think that would be the case, right?

Purpose of scatter plots: to show the relationship between two variables. So we're back to our hens and our eggs. Each point represents a pair of data. If we plot these two things together, we can show the relationship. Now obviously, intuitively, if I was a farmer, I would have a pretty decent sense that the hens are causing the eggs, right? More hens means more eggs; there's a relationship between the two. But if I plot them, then I get a better sense of exactly what that relationship is, and then I can start to make decisions, like how many hens I would need if I want so many eggs, by giving myself a linear equation that I can put into calculations.

We end up identifying patterns. Linear patterns indicate potential correlation. So this one's the height and weight again. Now with height and weight, you could pretty confidently make a hypothesis that there is going to be a cause-and-effect relationship: if someone is taller, they're going to have more mass, they're going to weigh more, typically, everything else equal. So if I plot that, you can see that pattern. Now again, there could be cases where you're just combing through data and you see a pattern that's positively correlated like this, and there is absolutely no rational reason as to why it would be; the two just happen to be correlated by chance. And so that's what we have to be careful with:
correlation does not equal causation. But the purpose of us seeing the correlation is to try to then draw a conclusion as to whether there's a cause-and-effect relationship and, if there is, to try to nail down what the cause is, to the degree that we can, or how causal it is.

Seeing no pattern might suggest no correlation. In other words, if I plotted out these two data sets, whatever those might be, and I got this set of points, and then I tried to draw a trend line and I get no correlation, or possibly a very low correlation, then that's going to indicate to me that there isn't a cause-and-effect relationship.

The point of doing the correlation is to try to see if there's a cause-and-effect relationship. If we get a correlation, then there might be a cause-and-effect relationship, and we have to drill down and ask whether there really is one; it's not necessarily the case. But if there is a cause-and-effect relationship, then you would think that you would be able to find some kind of correlation. So if you find zero correlation, you would think that, at least with those two variables alone, there's not a cause-and-effect relationship.

Now, you could have a more complex situation where you need more variables; maybe if you look at it through multiple variables, there's some kind of relationship that shows up. But again, a low correlation would generally indicate that there's not a cause-and-effect relationship, the way I have it laid out here.

So, why use
regression? To make predictions based on the relationship between variables. So again, with the hens and the eggs, why do I do the regression? Well, if I just have these dots of data points, I'm not going to be able to answer a question like, how many hens do I need to buy in order to produce so many eggs that I'm going to sell in the future? But if I can draw a trend line, then I can make a general prediction, right?

This first point, for example: three hens made around a hundred eggs in a year, I guess it is. And then five hens went up to like a hundred and seventy-five or so. And then when I went up from five to six, that last hen was kind of a slacker. We got a slacker hen. Not that, you know, any kind of egg production is easy; I think it's tough work, and it's not a job I would want to do. But I noticed that the other hens made more eggs than number six. But then going from six to seven, you had a high-producing hen over here. So I can look at the trend line and say, well, how many hens would I need to produce so many eggs?

All right, so then we have simple linear regression: using one independent variable to predict the value of a dependent variable. That's what we're going to focus in on in our practice problems, where we have the independent variable, in this case the hens, giving us predictive power over how many eggs are going to be produced.

Residuals: the difference between actual and predicted values. So the line represents our predicted values, the data points are the actual values, and the residuals are the differences. Our goal in a regression is to minimize these residuals. That's the least squares method: the line that we're putting through these data points is the one that minimizes the sum of the squared differences between the predicted values and the actual values.

So then, multiple
regression. We can get more complicated, of course, when we have more than one independent variable. We might come to the conclusion that, hey, it's a more complex system than that; in some systems you can't come to the proper conclusion just looking at those two things. So then we might have multiple regression. Advantages: it provides a more accurate model by considering multiple factors. Example: predicting house prices using factors like size, location, age of the house, and so on.

Notice, if you think about this scientifically, what do we try to do when we're trying to prove something? We try to remove all the other variables. We go into our lab and say, I'm just going to put this one atom together with this other atom and see what happens.
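
Stepping back to the simple one-variable case for a moment, here is a minimal sketch of what the least squares line and the residuals look like in code. The numbers are made up for illustration: only the rough "three hens, about a hundred eggs" and "five hens, about a hundred and seventy-five" figures come from the example, and the rest are assumed values chosen to mimic the slacker hen and the high producer. The function name `fit_line` is also just an assumption for this sketch.

```python
# Illustrative data only: loosely follows the hens-and-eggs example,
# not real measurements.
hens = [3, 4, 5, 6, 7]
eggs = [100, 140, 175, 185, 240]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hens, eggs)

# Predicted values sit on the line; residuals are actual minus predicted.
predicted = [intercept + slope * x for x in hens]
residuals = [actual - pred for actual, pred in zip(eggs, predicted)]

print(f"eggs = {intercept:.1f} + {slope:.1f} * hens")
for h, r in zip(hens, residuals):
    print(f"{h} hens: residual {r:+.1f}")
```

With these assumed numbers, the sixth hen shows up as the largest negative residual, which is the "slacker hen" sitting below the trend line. The fitted equation can then be rearranged to estimate how many hens would be needed for a target number of eggs, keeping in mind that predicting far outside the observed range is less reliable.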