Statistics and Excel: correlation, a simple example with few data points. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote presentation 1725, in the "correlation simple few data points example" tab. Transcripts are also uploaded to OneNote, so you can go to the View tab and use the Immersive Reader tool to change the language and either read or listen to the transcript in multiple languages, using the timestamps to tie it back to the video presentations. This is the OneNote desktop version.

We're thinking about correlation: comparing different data sets to see whether there's a mathematical relation, or correlation, between them. In other words, are the dots in the different data sets roughly moving together in some way? If there is a correlation between the two data sets, the next logical question is whether a cause-and-effect relationship is producing it. And if there is a causal relationship, the next question is what the causal factor is.

In prior presentations, we looked at a perfect positive correlation and a perfect negative correlation. Those are useful to think about in theory, but they aren't usually what we see in practice, because we normally have an imperfect correlation, a trend that we're observing. So this time, we'll look at a data set that has only a few data points and is not perfectly correlated. In our example, X will be the number of hens and Y will be the number of eggs.
Note that when you look at two data sets, you might come in with some assumptions, some hypotheses about the data. For example, with hens and eggs, you might assume the hens are the causal factor producing the eggs, although you do have a chicken-and-egg problem. If you were the farmer, you could buy eggs that would hatch into hens that would then lay eggs, but you'd usually expect the farmer to buy the hens first, and the hens then produce the eggs. That's a question about cause and effect. Remember that the mathematical correlation alone doesn't tell us whether there's a causal factor or what it is; we're just looking at the mathematical relationship.

So here's the data, in eggs per year given the number of hens: three hens produce 105 eggs, five hens produce 185 eggs, six hens produce 201 eggs, and seven hens produce 345 eggs. The general idea is that more hens should produce more eggs, so you'd expect a causal relationship between them. With only four points, this is easy to plot, and just from the type of data we'd expect a relationship between the number of hens and the number of eggs.

To plot it in Excel, I can select the X and Y columns and insert a scatter plot. By default, the X values plot on the horizontal X axis, which is what we want. Then we can label the chart, and you can see our four points.
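A scatter plot just places each (hens, eggs) pair on a grid. As a minimal sketch, assuming plain Python lists stand in for the two Excel columns (the names `hens`, `eggs`, `points`, and `rises` are illustrative, not from the presentation), here is the same data as pairs, with a quick check of what the chart shows visually: each step to the right also goes up.

```python
# Hens (x) and eggs per year (y) from the example; the list names are illustrative.
hens = [3, 5, 6, 7]
eggs = [105, 185, 201, 345]

# A scatter plot places each (x, y) pair as a dot; sort by x to read it left to right.
points = sorted(zip(hens, eggs))

# Does every step to the right (more hens) also go up (more eggs)?
rises = all(y_next > y for (_, y), (_, y_next) in zip(points, points[1:]))

print(points)                          # [(3, 105), (5, 185), (6, 201), (7, 345)]
print("eggs rise with hens:", rises)   # eggs rise with hens: True
```

A consistent rise like this is only the visual impression; it doesn't measure how closely the dots follow a straight line, which is what the correlation calculation is for.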
With three hens we have 105 eggs, with five hens we have 185 eggs, with six hens we have 201 eggs, and with seven hens we have 345 eggs. As you'd expect, this is a positive correlation. We can also add a trend line, which is useful for planning: if we want more eggs in the future and need to figure out how many more hens to buy, it's hard to read that off the individual dots. I could eyeball a new dot somewhere up here, but with a line, we can use the formula of the line to approximate how many hens would be needed to produce a given number of eggs.

Also remember that we usually put the independent variable, in this case the hens, on the X axis and the dependent variable, the eggs, on the Y axis. As a farmer planning egg production, you'd likely ask how many hens you need to buy to produce enough eggs. You could think of it the other way around, buying eggs that hatch into hens (and some roosters), but hens producing eggs is the natural framing here.

So what happens if we flip them and put the eggs on the X axis and the hens on the Y axis? Would we get a negative correlation? No, mathematically it's still a positive correlation. Now the eggs are on the X axis, so with 105 eggs, you'd read off three hens.
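Excel's linear trend line is an ordinary least-squares fit, the same line its SLOPE and INTERCEPT functions return. The sketch below computes that line for the four points, inverts it to estimate how many hens a target egg count would need, and confirms that swapping the axes still gives a positive slope. The helper `least_squares_line` and the 400-egg target are hypothetical, chosen only for illustration.

```python
from statistics import mean

hens = [3, 5, 6, 7]
eggs = [105, 185, 201, 345]

def least_squares_line(xs, ys):
    """Hypothetical helper: (slope, intercept) of the least-squares line y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-movement of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # spread of x
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Eggs on hens: the trend line Excel draws on the hens-on-X chart.
slope, intercept = least_squares_line(hens, eggs)
print(f"eggs = {slope:.2f} * hens {intercept:+.2f}")

# Invert the line to estimate hens needed for a (hypothetical) target of 400 eggs.
target_eggs = 400
hens_needed = (target_eggs - intercept) / slope
print(f"about {hens_needed:.1f} hens for {target_eggs} eggs")  # round up in practice

# Flipped axes: hens on eggs. The slope is still positive.
flipped_slope, _ = least_squares_line(eggs, hens)
print(f"flipped slope: {flipped_slope:.4f}")
```

Note that the flipped slope is not simply 1 divided by the original slope: regressing hens on eggs gives a different least-squares line, but both slopes share the sign of the correlation, which is why flipping the axes doesn't turn a positive relationship negative.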
Similarly, with around 185 eggs, you'd predict about five hens. So the relationship is still positive, and you can still draw the trend line whichever variable you put on X and whichever on Y.

Now let's work out the mathematical relationship. First, the means, which as usual are just the averages. Because there are only a few values, we can do this on a calculator: for X, 3 + 5 + 6 + 7 divided by 4 is 5.25, and for Y, 105 + 185 + 201 + 345 divided by 4 is 209. Next, the sample standard deviation, which in Excel is the STDEV.S function applied to each data set. That gives 1.71 for X and 99.92 for Y, our measures of spread.

With those, we can apply the correlation formula: for each X, subtract the mean and divide by the standard deviation, and do the same for each Y (these are the z-scores). Then multiply each X z-score by its paired Y z-score, add up the products, and divide by n minus one. Let's do that step by step, starting with the Xs. For each data point, 3, 5, 6, and 7, we subtract the mean of X. Looking over here, the first is 3 minus 5.25, which is the
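Here is a minimal sketch of that calculation in Python, assuming the standard library's `statistics` module stands in for Excel: `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which matches Excel's STDEV.S. It follows the steps described, written as $r = \frac{1}{n-1}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}$: z-scores for each variable, multiply each pair, sum, and divide by n - 1. The variable names are illustrative.

```python
from statistics import mean, stdev

hens = [3, 5, 6, 7]
eggs = [105, 185, 201, 345]

x_bar, y_bar = mean(hens), mean(eggs)   # the averages: 5.25 and 209
s_x, s_y = stdev(hens), stdev(eggs)      # sample standard deviations, like Excel's STDEV.S

# First step: each X minus the mean of X
deviations_x = [x - x_bar for x in hens]
print(deviations_x)                      # [-2.25, -0.25, 0.75, 1.75]

# z-score of each point: (value - mean) / sample standard deviation
z_x = [(x - x_bar) / s_x for x in hens]
z_y = [(y - y_bar) / s_y for y in eggs]

# Multiply paired z-scores, add them up, then divide by n - 1
n = len(hens)
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(f"sample SDs: {s_x:.2f}, {s_y:.2f}")   # sample SDs: 1.71, 99.92
print(f"r = {r:.3f}")                         # r = 0.922
```

For these four points, r comes out close to, but not exactly, 1: a positive but imperfect correlation. Excel's CORREL function applied to the two ranges should return the same value, which makes a handy cross-check on the step-by-step table.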