 Statistics and Excel correlation simple with few data points example got data let's get stuck into it with statistics and Excel you're not first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our accounting rocks product line if you're not crunching cords using Excel you're doing it wrong a must-have product because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement the obligation a duty to share the tools necessary to properly channel the creative muse and the muse she rarely speaks more clearly than through the beautiful symmetry of spreadsheets so get the shirt because the creative muse she could use a new pair of shoes if you would like a commercial free experience consider subscribing to our website at accounting instruction.com or accounting instruction.thinkific.com require to but if you have access to one note we're in the icon left hand side one note presentation 1725 correlation simple few data points example tab also uploading transcripts to one note so that you can go into the view tab immersive reader tool change the language if you so choose be able to either read or listen to the transcript in multiple different languages using the time stamp to tie in to the video presentations one note desktop version here thinking about correlation having different data sets to see whether there's a mathematical relation or correlation between the different data sets in other words are the dots and the different data sets roughly moving together in some way shape or form if there is a mathematical relation or correlation between the two different data sets the next logical question would be is there a cause and effect relationship that is causing the correlation or mathematical relation between the two different data sets and if there is a causal relationship the next logical question would be what is the causal factor which is causing the causal relationship which is causing the correlation or mathematical relation between the different data sets and prior presentations we thought about a perfect positive correlation and a perfect negative correlation things that are useful to think about in theory but aren't usually exactly what we have in practice because normally we don't have a perfect correlation we have somewhat of an imperfect correlation or trend that we are observing so this time we'll look at a data set that has less information in it but is not perfectly correlated so our example we're going to imagine that X is now going to be the number of hens so we're talking about hens and why is going to be the number of eggs now note that if you're looking at two different data sets you might have some pre assumption some hypotheses that you are going to make from the data so for example if you're talking about hens and eggs you might be thinking that the hens are going to be the causal factor that's going to be producing the eggs but you do have a chicken and eggs problem I mean if you were the farmer you could buy eggs that would produce hens that would then make the eggs but you know you might usually generally think that the farmer is going to buy the hens first which are going to be you know producing the eggs or something like that so that's a question of the cause and effect kind of relationship remember that when we're thinking about the mathematical correlation we don't necessarily know if there's a causal factor or not and what that causal factor is we're just looking at the relationship with the mathematics so we're going to imagine that if we had three hens we've got the number of eggs 105 five hens we got the eggs at 185 and six hens the eggs at 201 is going to be eggs per year given the number of hens and then seven hens 345 now the the idea here would generally be well if I had more hens then I would you know produce more eggs you would think so you would think that there would be you know a causal relationship between them if we plotted these out if I just plot these four points noting that now that it's an easier thing to plot because we're looking at few data points and we can see kind of just from the type of data that we have that you would think that there would be a causal relationship between the number of hens and the number of eggs so now we're going to say if we were to plot this then and if I plot this in Excel I can just select the X and the Y the X will automatically plot as a default on the X axis here which is which is good for us we're using a scatter plot and then we can basically label this thing so you can see our four points so with three hens we have 105 eggs we had the five hens here with five hens we had 185 eggs and then with six hens we had the 200 eggs and with the seven hens we had the 350 now as you would expect we have a positive kind of correlation type of relationship we can draw a line a trend line in there and that is a useful thing to do because if we were trying to think about in the future whether or not we need to buy more hens if we want to have more eggs and we're trying to think how many more hens do we need in order to achieve so many more eggs I can't really look at these these different dots and try to figure that out I can kind of like say okay I'm going to put a dot up here somewhere but if I have a line then of course we can use the formula of a line to give an idea of what the approximate number of hens would be to produce the the next number of eggs now also again remember that usually we put the hens or we put the independent variable in this case the hens on the x generally and we put the dependent variable on the y so again I would imagine as a farmer you're thinking about how many eggs you're gonna make that you would go buy hens and then say how many hens do I need in order to possibly produce enough eggs however again you could think of it as well what if they were to buy eggs and then the eggs would make the hens but some roosters maybe roosters that you have to eat or something before they start roostering and then you but you so you could think about it that way too but but so that so but there it is so you now if I was to flip them what would happen what if I put the eggs eggs on the x and the hens on the y would I get a negative correlation no you're still gonna get a positive correlation mathematically you still have the the positive correlation showing here so now you've got the number of eggs so if I had this number of of eggs then you've got three hens right if I had so you can think of it in this fashion if I had around 100 and whatever that is eggs 180 I think it was then you can predict that you had you know five hens or in that fashion as well so you still have the positive our relationship you can still draw the trend line whether you put you switch is you switch out the x's or the y's okay so now let's do the mathematical kind of relationship we can say what's the mean of this so the mean calculation like normal is the average so if I take the average number of x's we can actually calculate this in the calculator because we don't have many x's 3 plus 5 plus 6 plus 7 divided by 4 is going to be the 5.25 and on the y's 105 plus 185 plus 201 plus 345 divided by 4 is going to be the 209 and then we're going to take the sample and the sample is going to be the formula in excel equals the stand I'm sorry the standard deviation not the sample of standard deviation of the sample standard Eve dot s of these two data sets we get the 1.71 that's the major of the spread and the 99.92 so once we have that we can do our calculation which is going to be here's our formula for the calculation which we're going to take each x minus the mean divided by the standard deviation so we'll do this in a step-by-step format so we're going to take each of the x's here are the x's and then do the same with the y's subtract minus the mean over the standard deviation which is basically the z score then we'll sum all of them up and divide by n minus one let's do that one by one we're going to say first we have the x's so let's do it each of the data points minus the x so we're going to say 3 3 5 and 7 minus x so we have also look at that over here the 3 here minus minus the 5.25 which is the mean we get to the negative 2.25 and then I take the 5 minus the 5.25 and I get to the point 2 5 I take the 6 minus the 5.25 and I get to the point 7 5 and I take the 7 minus the 5.25 and I get to the 1.75 so that's what we have here there's those three numbers and then to get to the z score we take those numbers and divide them by the standard deviation so all we're doing now is the next step we would say okay what do we do again we took the 3 minus the 5.25 divided by the standard D divided by 1.71 and then the next one would be 5 minus the 5.25 divided by the standard D 1.71 and so on and so forth so we we have those here's the second one approximately 1.15 and then we do the same with the y's here's all the y's minus the mean of the y so so we'd say okay the y's over here would be for example 1.05 minus the 2.09 boom and then we would take the next one 1.85 minus the 2.09 and so on and so forth so if I go over here we're going to say there we have it negative 24 negative 8 and then we take each of those and divided by the standard D so we're just would do then same thing if I took this first one 105 minus the 2.09 divided by the standard D 99.92 about 1.04 on the negative so there we have that and then we just multiply the z's together so these two together and that will give us then if I take this first one 1.32 times the 1.04 we get the 1.37 and so on so if I sum up this last column I get the numerator so I can then use my little table here and sum that up I'm going to put it in a table format here's the sum of this column I can actually do it in the calculator why not because there's only four numbers 1.34 plus 0.04 minus 0.04 plus 1.39 gives us about 2.77 rounding is involved then the denominator is just n minus one n is the number of items there's rows one row two row three row four row n minus one is going to give us three and then we have the numerator and denominator in the outer columns 2.77 divided by three is going to give us the 0.92 notice again the format that I have here of this formula kind of useful to put it in a table when you're working like an excel spreadsheets or something like that it's useful to see it this way you can build your worksheets this Wednesday this is the numerator which is this bit and then the denominator I'm going to do a sub calculation and I'm going to break that out as many sub calculations as I need and pull them into the inner column indicating it's a sub calculation with the colon with the indentation n minus one the result then bouncing back out into the outer column which I can call n minus one or simply in this case the denominator and then I'm dividing out just the outer columns 2.77 divided by 3.92 now I can see this and excel and use excel to do this with the analysis tool which isn't in excel by it's in excel but it's not turned on by default you can find that in the options we do that in the excel problem if you want to look at that in more detail but then in there I can do the correlation and just pick up this data set you have to have the data set next to each other so I just highlight that data set in excel and excel will then give me this prompt and I'll have to populate here's where the data set goes I'd have to check off the range or that I had the labels involved if I clicked on the labels and then tell me where I want to put it if I was to put it in excel and it'll give me something like this and I'm focused in on the x and the y which are intersecting here there's the point nine two one nine and so on that we got to here although we rounded it so this isn't dynamic however so if I change the data set this isn't going to change with it so it's not a great tool for your worksheet if you're making a dynamic worksheet but it's a great tool to analyze the data up front or to check your data kind of as we are doing here you can also use the the same data analysis tool and look at this descriptive data and I just want to point that out even though it's not our main point of focus here to give you this kind of descriptive information for the x and the y this is our general kind of statistics info you've got the mean you got the standard error the median the mode the standard deviation the sample variance the minimum the maximum the sum the count and so on I and this again is not dynamic it doesn't change as your data changes so it's a good tool to use as a preliminary analysis it might be the first thing you do before you build something out of your data set to get a feeling or an idea of what's going on with them and you can highlight multiple data sets and have it spit out or you can use it as a check figure for your data sets so just a quick recap here we're now looked at a at a perfect positive correlation a perfect negative correlation now we're looking at more of a realistic example where it's not perfectly correlated but you have a general trend this one being one where in advance you would expect to see some kind of general trend and by plotting out that trend you can get more understanding about the data sets and possibly giving you predictive power into the future such as how many hens would I would I need to buy by you know using the mathematical formula obviously that you know these hens we're doing we're doing great and then these purchases of hens were kind of slacker hens and they weren't up to you know the production line that that we were expecting from them but again laying eggs is I'm not laying eggs is difficult I would I would assume so I'm not complaint I'm not like you know it's tough work but you would think like the other hens were doing you know did a little bit better some you know then these hens but then this one is outside but then you have the trend line and the trend line can help you to predict to predict of course and and then of course we can see what the exact correlation is with our calculation here mathematically which will give you an understanding of how good that relationship is how reliable you can kind of be on using that you know basically the trend line possibly to make predictions you can do that with a formula calculation which is useful sometimes because as we'll see in future examples breaking this information out like this looking at the z-scores will often give you more information or could quite likely give you more information then simply using excel to spit out the z the correlation so but either method would be good and then of course graphing it when you graph it out you give that pictorial representation so we can look at the correlation conceptually we can might have an idea about what the correlation might be and then of course we can plot out it on a graph and see it pictorially and pick up the formula of the trend line which could be useful and then we can do a mathematical calculation of the correlation in this case having of course a positive correlation but not perfectly positive