 statistics and excel correlation calculation with strange result got data let's get stuck into it with statistics and excel first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our crunchy numbers is my cardio product line now I'm not saying that subscribing to this channel crunching numbers with us will make you thin fit and healthy or anything however it does seem like it worked for her just saying so yeah subscribe hit the bell thing and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine if you would like a commercial free experience consider subscribing to our website at accounting instruction dot com or accounting instruction dot think of it dot com you're not required to you but if you have access to one note we're in the icon left-hand side one note presentation 1740 correlation calculation with strange result tab we're also uploading transcripts to one note so that you can go into the view tab immersive reader tool change the language if you so choose be able to read or listen to the transcript in multiple different languages using the time stamps to tie into the video presentations one note desktop version here thinking about correlation where we have different data sets to see if there's a mathematical relation or correlation between them in other words are the different dots and the data sets moving together in some way shape or form if there is a correlation or mathematical relation between the data sets the next logical question would of course be is there a cause and effect relationship that's causing that correlation or mathematical relation and if there is a causal relationship the next logical question would be what's the causal factor in the causal relation which is causing the mathematical correlation in prior presentations we thought about a perfect positive and perfect negative correlation which are things that you don't actually often see when working practice problems or in practice because usually we're looking at two different data sets that might have trends together for some reason but not a perfect correlation so it's useful to think about the perfect situation in theory but in practice it's not usually going to be a perfect correlation we then looked at an example with a few data points so that we can see an imperfect correlation with a few data points so we can analyze it fairly easily we then looked at a correlation where we had random data sets that we generated so we can see how we generated the data sets and what the correlation between them were now we're looking at a situation where we're going to get an unusual result with the correlation this being a reminder that like any statistics that we can't simply rely on one number all the time we still have to use our intuition we still have to think about what is actually happening here what is it telling us and oftentimes we have to look at things from multiple angles if we want to get a proper perspective about what the data is actually telling us so we're going to construct our data this way we're going to have an x and a y this being our two different data sets and we're just going to randomly we're just going to pick an x and a y and see whether or not they're correlated the points we're going to be plotting are x and y are one x and y are two x and y are three x and y are four x and y are five wow they seem very correlated but then the last one you've got x is zero and y is seven so if we were to consider this data let's do our mathematical calculations obviously looking at it we would say huh you know if i looked at that data set i'd be like yeah they look like there's some kind of relationship going on there there looks to be uh some kind of uh you know one looks to be tied to the other possibly in some way shape or form let's do the math on it so if we do the calculation of the mean the mean is going to be the average we can actually do it now since we don't have that many data points we could say well this is going to be one plus two plus three plus four plus five plus zero divided by how many one two three four five divided by five is going to give us let's do that one more time i think i messed up one plus two plus three plus four plus five plus zero divided by one two three four five six divided by six and then they get the two point five if i do the same thing for the other one i get a three point seven or six seven on it and then if i take the standard deviation we're taking the standard deviation of the sample with this is a major of spread of the data you will recall 1.87 and 2.16 now let's do our mathematical calculation taking in essence the z scores of the first data set each point minus the mean divided by the standard deviation times each point same z score of the second divided by n minus one so if i take my first data set of x point one and i look at the z the z will be calculating the z is going to be one minus 2.5 divided by the standard d 1.87 gives us about point eight and obviously the second one would be two minus 2.5 divided by the standard d 1.87 that's going to be the point two seven about we do that all the way down for all of the data points and here's the related z's and then i can do that for that's the y's same kind of thing so the first one is going to be one minus the mean this time for the y is 3.67 divided by the standard d for the sample of the y 2.16 is going to give us 1.23 about we do that all the way down and then we can multiply the z's together to see what the z's together will be 0.8 times the 1.23 we get the 0.99 about next one of course would be the 0.27 times the 0.77 we get about point two one and so on then we just sum up the z's to see what that'll be and that'll be the numerator is what it'll be so if we do that we're going to say let's format this to get the the numerator the sum of the b's it comes out to zero if i add up all these column it comes out to zero and then if i say that the denominator is n minus one denominator n minus one and one two three four five six minus one six minus one is five so zero numerator divided by five denominator gives us of course a correlation of zero zero correlation so so that might be surprising we might say well hey hold on a second no correlation here if i look at those data sets is that what we would really want to say if we see these data points in real life would we really want to say in our mind that there's no correlation right and and that's the problem of here because we probably want to say it looks like there's something going on here but then it got hit it got messed up by that one you know data point so we're going to say all right if i was to calculate this in excel we can use the data analysis tool here if you don't have the analysis tool pack you can turn it on as we do in the excel practice problem and it'll do the calculation for you we would just select the data sets looking for the correlation and then we would select the range i would put labels place it somewhere and then excel will give us this one and here's the x and the y we're looking at that zero it gives us the zero correlation this is a static thing here it's not dynamic it will not move as we change an excel worksheet but it's great to give a preliminary look or a double check now note that if i did that again and of course if i eliminated the last data point if i did the correlation for just this down to this data point eliminating the last one we would have a perfect correlation of course it would be one perfectly correlated so so clearly if we were to look at those two data sets we might come to the conclusion that getting a zero correlation might not be exactly right we might want to do some more digging in it if we were to plot this out we can see the data points here one one two two three three four four five five looks like there's a positive correlation between them but then you've got this a zero seven up top which of course skewed our correlation calculation to zero but looking at it pictorially does does this line that represents the uh the the trend line being exactly straight zero correlation is that what we really want to say about this data we're one two three four five data points are in perfect alignment together probably not so this is just a point that no matter no matter what tool we're using uh if we just if we just plug this thing into the into the computer using say the data analysis and i get one or i get zero and i say it's totally not correlated at all that might not be that's not the only angle we would want to look at just like all of statistics we don't want to look at it from just one angle typically we would want to then plot it out and say huh well maybe that seems kind of funny maybe there's something else going on here i'd want to drill down into it possibly further than that and obviously you could see that pictorially you can also see it if you do the math over here you would and just look at the data sets and start looking at the z scores it's likely that you might pick up some more information that would that give you some more insights as you as you go through the correlation calculation