Statistics and Excel: Correlation, Large Data Sets, Focus on Z-Score Relationship. Got data? Let's get stuck into it with Statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote Presentation 1750, Correlation Large Data Set Focus on Z-Score Relationship tab. We're also uploading transcripts to OneNote, so you can go to the View tab, open the Immersive Reader tool, and change the language if you so choose. You'll be able to either read or listen to the transcript in multiple languages, using the timestamps to tie in to the video presentations. In the OneNote desktop version here, we're thinking about correlation: having different data sets and seeing if there's a mathematical relation or correlation between them. In other words, are the data dots in the different data sets moving together in some way, shape, or form? Now, if there is a mathematical relation or correlation between the different data sets, the next logical question would be: is there a cause-and-effect relationship causing the correlation? And if there is a cause-and-effect relationship, the next logical question would be: what's the causal factor driving that cause-and-effect relationship, which in turn is causing the correlation between the different data sets? In prior presentations, we thought about perfect correlations in a positive direction, as well as perfect negative correlations, which are not things we often see in practice, because we're usually looking at trends with the correlation. We also noted that correlation, like any other statistic, is something we want to have as one type of tool when we look at our data, so that we can look at the data from different angles as well.
This time we're going to have some data sets that are a little bit longer, and we'll put some added focus on the z-score relationship, which is a primary component of the correlation calculation. Here are our data sets: height measured in inches and weight measured in pounds. Now, of course, we didn't include the entire data set here. If you want to look at it in Excel, we'll have the entire data set there, but it's got more data in it than we have seen in some of our prior presentations. When we first look at these, like with any data set, we might first come up with our assumptions as to what might be the case. So if I'm looking at height and weight, I might say, well, these have to do with nature, so I think maybe there's a bell curve related to these items. And I also might think these things could be correlated, because I would think that the independent factor, the driving factor, would be the height, which might cause the weight to be higher: if you're taller, you would tend to have more weight. Those might be some assumptions, some theories about the data, that you make going into it. In any case, we're going to have our data on the right-hand side. Now first, let's do a histogram. If I did a histogram of the heights, just selecting the heights and inserting a histogram chart in Excel, we can see that the buckets in the middle, around 68.1 to 68.33, seem to be the middle point. It seems to taper off from there, looking very bell shaped, as we would kind of expect with something that's nature related, like measuring lengths of animals or humans, or heights or weights, and that kind of thing.
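For readers who want to see what the histogram step is doing behind the chart, here is a minimal sketch in Python. The height values and the one-inch bucket width are assumptions made up for illustration; they are not the lecture's data set, and Excel picks its own bucket widths automatically.

```python
from collections import Counter

# Hypothetical heights in inches (illustrative values, not the lecture's data set).
heights = [65.8, 66.9, 67.4, 67.9, 68.0, 68.2, 68.3, 68.9, 69.4, 70.6]

def histogram(values, bin_width):
    """Count how many values fall into each equal-width bucket."""
    low = min(values)
    counts = Counter()
    for v in values:
        # Bucket index: how many whole bin widths the value sits above the minimum.
        index = int((v - low) // bin_width)
        counts[index] += 1
    n_bins = max(counts) + 1
    # One (bucket_start, bucket_end, count) tuple per bucket, including empty ones.
    return [(low + i * bin_width, low + (i + 1) * bin_width, counts[i])
            for i in range(n_bins)]

buckets = histogram(heights, bin_width=1.0)  # bin_width is an assumed choice
for start, end, count in buckets:
    print(f"{start:5.1f} to {end:5.1f} | {'#' * count}")
```

With these made-up values the counts rise toward the middle buckets and fall off on either side, which is the bell-like shape described for the real heights.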
If we did the same thing for the weights, we have a similar result. We have different buckets down here, of course, corresponding to the weights, but it kind of looks like a bell-type curve, which is what we might expect when dealing with weights. These are, of course, being measured independently, and as we've seen in the past, the fact that they both look like a bell curve doesn't necessarily mean that there's a correlation between them. We've seen examples in the past where our data sets looked more like a uniform distribution than a bell curve distribution, and so on. No matter what the distributions are, it doesn't necessarily mean that they're going to be correlated, but the distributions might give us some other insights about the data, which might further strengthen our hypotheses as to whether they're correlated or not. So then let's go on over and do our calculations for the correlation, which of course look like this: for each pair of data points, we take each value minus its mean, divided by its standard deviation, multiply the two results together, sum them up, and divide by n minus one. Doing this manually is very tedious with this data set, because the data sets here are quite long. But with Excel it's not too bad, and we can also use Excel's data analysis tool, which we'll do in a second, to analyze this more quickly. So first we take the mean of each of the two, which is simply the average of the heights data set and then of the weights data set. We get these two numbers: 67.99 inches for the height and 127.08 pounds for the weight. For the standard deviation, if I simply take these two data sets and take the sample standard deviation of each, I get 1.9 and 11.66, which measure the spread. Now, just looking at those numbers most likely isn't going to help me see if there's a correlation between the data sets, but they'll obviously help us with our correlation calculation.
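Written out, the calculation described in words above looks like the following. The symbols are chosen here for illustration: x for height, y for weight, bars for the sample means, s for the sample standard deviations, and n for the number of paired observations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each bracketed term is the z-score of one data point, so the correlation is the sum of the paired z-score products divided by n minus one.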
So here's our average calculation, and here's our standard deviation calculation in Excel. Then I'm just going to map out our data sets. Here's the height, and we can take each of these data points and calculate the z-score now. So this is the top part of our formula we're going to take.
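As a rough sketch of the same steps outside Excel, the Python below computes the mean, the sample standard deviation, the z-score of every data point, and then the correlation from the z-score products. The eight height and weight pairs are hypothetical values invented for illustration, not the lecture's data set, and the helper name `z_scores` is likewise made up here.

```python
import math
import statistics

# Hypothetical sample (illustrative values only, not the lecture's data set).
heights = [65.8, 66.9, 67.4, 68.0, 68.3, 68.9, 69.4, 70.6]          # inches
weights = [112.0, 120.5, 118.0, 127.0, 131.5, 126.0, 136.0, 142.5]  # pounds

def z_scores(values):
    """Return (value - mean) / sample standard deviation for every value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

z_h = z_scores(heights)
z_w = z_scores(weights)

# Correlation: multiply the paired z-scores, sum them, divide by n - 1.
n = len(heights)
r = sum(a * b for a, b in zip(z_h, z_w)) / (n - 1)
print(f"r = {r:.4f}")
```

Because the z-scores are built with the sample standard deviation, dividing the summed products by n minus one gives the usual Pearson correlation coefficient.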