 Statistics and Excel correlation large data sets focused on z-score relationship part number one. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth soothing Excel. Here we are in Excel. If you don't have access to this workbook, first a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us. But that's okay whatever because our merchandise is better than their stupid stuff anyways. Like our accounting rocks product line, if you're not crunching chords using Excel, you're doing it wrong. A must-have product because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement, the obligation, a duty to share the tools necessary to properly channel the creative muse. And the muse, she rarely speaks more clearly than through the beautiful symmetry of spreadsheets. So get the shirt because the creative muse, she could use a new pair of shoes. If you would like a commercial free experience, consider subscribing to our website at accounting instruction.com or accounting instruction.thinkific.com. It's okay because we'll basically build this from a blank worksheet. But if you do have access, there's three tabs down below example, practice blank example, in essence, answer key practice tab having pre-formatted sales. So you can get to the heart of the practice problem, the blank tab blank worksheet with just our starting data in it so you can practice formatting these sales within Excel as we work through the practice problem. If you don't have access to the data set, you can find sample data sets by going to resources such as Kaggle.com. Let's go to the example tab to get an idea of what we will be doing, where we will be going. We're looking at correlations once again, different data sets to see if they have a mathematical relationship or correlation between them. Are the data points moving in the same direction? For example, if there is a mathematical relationship or correlation, the next logical question would be, is there a cause and effect relationship? And if there is a cause and effect relationship, the next logical question would be, what's the causal factor in that relationship? We're going to be looking at two data sets of heights and weights. This will be a little bit different than what we've seen in the past in that we're going to have a lot more data. So these are more extensive data sets that we will have, and we're going to do our standard calculations to the mean, the standard deviation. We'll plot this out, noting that both the height and the weight have a similar look to them and that they have that bell-shaped curve. That doesn't mean necessarily that they're correlated, just like we saw in prior presentations when we had the uniform distributions, when we had random number sets, didn't necessarily mean that there's a correlation even though there's a similar kind of shape to the curves. But these, of course, both do conform to more of a bell-shaped type of curves because we're talking about things in nature. And in this case, you would think that there might be a relationship, of course, between heights and weights possibly. So then we'll do our calculations for them with the z-scores to calculate the correlation. According to our formula, we'll also do the formula calculation with Excel's data tools. We'll do the other data analysis with our tools. And then we'll also take a look at the relationship between the z-scores to get a better sense of what's happening with regards to these z-scores because, of course, the z-scores is kind of the heart of what we are doing with this correlation analysis. So having a better understanding of it would be a good idea. Let's go to the blank tab to then get started. I'm going to delete this Kaggle thing. And we're going to first, I'm going to format the entire data set by going to the triangle up top, right-clicking on the selected area, formatting the cells. I like to go to currency, negative numbers, bracketed and red, no dollar sum. I'm keeping the decimals this time because there's decimals in our data sets. We'll keep that as the default. Let's go ahead and say, okay, I'm going to make it bold as well. Home tab, font group, make it bold. And let's center these top ones and make them into our header. Home tab, font group, dropping down bucket black and the white for the lettering, centering it and the alignment, wrapping the text and centering. So we have our height and inches, our weight in pounds. Now obviously both of these are measuring, and also note that the data set is quite long if I say control shift down goes down to 25,000 rows down here. So if I go back up top, now just intuitively we would think, oh, these both sound like they would kind of correlate to a bell curve possibly because we're looking at something in nature. We can kind of test that out. And we'd say, well, there's probably, there might be a relationship between the two of them because you would think of someone's taller that they might weigh more. And then you might have the question, well, what's the causal factor? If I was going to make a hypothesis about it, I would think, well, if someone's taller, I would think that would have, that would be maybe a causal factor of that would be causing someone's weight to be heavier. So that's kind of like my thought process going into something like this. Let's do some let's let's check it out. So I'm going to make a skinny C, a skinny C, and let's do the same height and weight headers here. Let's just name it high. I don't need the inches. Let's just say height, h e i g h t, weight, weight. That's weird spelling for both of those height and weight, home tab, font group. I would not win the spelling contest when they gave me those words. I give up. I give up. I'll sit down. That's what I usually did when I was in grade school. Oh my goodness, height. I just give up. I'll sit down. I know I'm out. I'm out. So we have the mean and the standard deviation. So on the height, let's go ahead. This is going to be equal to the average tab. And we're going to go to the heights control shift down and enter. And there it is. Let's do the weight. I can just copy this across to the right now. Let's just copy it across. And let's make this a little wider so we can see the weight 127 on the mean. And then we're going to say the standard deviation standard deviation. And for the sample standard div s is going to be this in this stuff control shift down 127 is the is the mean. I'm heavier than that. But I think it's because I'm like totally yoked up with muscles and everything and muscles like as you know, muscles heavier than fat. So that's why I'm heavy, heavier than anyway, I'm just guessing here. Let's go ahead and add our bell curve. So I'm going to select the this data control shift down and control backspace to get back up top. And let's see what this data set looks like on its own by going to the insert tab charts. Let's put a histogram in their histogram. And let's say that there's our histogram. So obviously it looks somewhat bell shaped. Let's say that this is going to be H H E I G H. Alright, there we go. And so there is that. And then I could do the same for this data set and say, well, do they look similar? Maybe control shift down, control backspace, insert charts histogram. And so there's this one also looks kind of similar, looks kind of similar. I don't know looks kind of similar to me. Let's say this is going to be weights. And we'll put this down here. So heights and weights now they both look of course bell shaped doesn't necessarily mean that there's a correlation between them, the fact that they both kind of conform to a bell shaped kind of curve. But it might we might, you know, might give us some indication that we could say, maybe that's more likely that there's a correlation or something like that. But again, it's not necessarily the case that there would be. Alright, let's make column G a little bit larger here. And we'll do our calculations for it. I'm going to copy the formula to do so. And we'll do our calculations of like so let's make this a little smaller. Bring this up top, up top is where you need to be right there. And we'll make this one a little bit larger. This needs to fit in that space, fit in the space here, everybody needs their own space. Okay, so then let's copy this data, I'm going to put my cursor on column A, and select to column B, control C, going from A to B, and then control C on the keyboard, and then we'll paste it down. And let's calculate our Z scores, I'm going to put my cursor on column I, right click and insert. And this is going to be the Z of H. So we'll do the Z of H. And this is kind of a pain because we're going to do this entire data set, which we saw was very long data set, but we're using Excel. So not a problem really. So we're going to say this is going to be equal to brackets, the Z score is each individual x minus the mean divided by the standard D of the sample. So we'll each individual x divided by the mean, which is this 6799 F4 on the keyboard dollar sign before the E and the two closing it up and then dividing it by the standard D standard deviation right here at four on the keyboard dollar sign before the E and the three and enter. And so there we have it. Let's double click on the fill handle to drop it down. So there it is. And let's do the same thing over here for the Z scores of the weights, Z of weights, something like that. I'm going to say let's make this same formatting home tab format paintbrush it right there. Z score is going to be equal to brackets each individual y minus we're on y now this one minus the y's mean F4 in the keyboard dollar sign before the F and the two closing it up and then dividing by the standard deviation F4 on the keyboard dollar sign before the F and the three and then enter double click it on the fill handle to drop it down. Now we'll multiply them together. So we're going to multiply these two together and then we'll sum it up. We'll have then the numerator. So this is going to be then the Z of H times the Z of W. I'm going to format paint it again, home tab, clipboard format paint right here, multiplying this out. This is going to be equal to the Z of H times the Z of W and then I'll double click that down. Boom. Let's make these a little bit thinner. This one could be thinner. These three can be thinner for sure. These three can be thinner and so then we can say okay. So now let's do our the rest of the calculation. Now we're just going to sum up this outer column that'll give us the numerator divided by N minus one. Let's make a little worksheet on the right to do that. Making a skinny M to do so. We're going to say this is R or the correlation and I'll make these three black and white home tab font group black white and then I'm going to say this is the sum of Z H times Z W putting this in the outer column because it's going to be our numerator equals the sum brackets of this outer column Z of H times Z of W control shift down and enter. There we have it. Now I'm going to calculate the denominator with a sub-calculation. I'll pull into the inside and indicate it is a sub-calculation by saying N minus one colon and that means I'm going to do a little calculation below it which is going to be N space space because I just want an N and then less one which is going to give me the answer or the denominator N minus one without the colon and if I indent this it's going to look like this I'm going to select these three home tab alignment indent and I'm going to indent this one here as well. So now we've got the count which is going to be the count of the columns. There's a bunch of them equals the count and I can select any of these columns because I just want to count the number of rows control shift down enter there's 25,000 rows minus one and then I'm going to put an underlying home tab font group underlying subtracting this out as our formula says to do 25,000 minus one is 24,999 and then we can do our division problem here which is going to give us our R or correlation because now we've got the numerator we've got the denominator we're going to divide out the numerator divided by the denominator and we get the 0.01 let's make this a little bit larger home tab home tab number and make it a little bit bigger now something doesn't look right to me right here because I would think that the correlation would be higher than that and if I analyze what happened what happened here if I go on over here and by the way I can double check this by saying let me see the computer do it and see what the computer comes out to and then we'll fix it so I'm going to say over here if I go to the the data and use the analysis data analysis if you don't have that you can find it in the file tab on the left you can go down to the options and then down to the add-ins and we want the excel add-ins go the data analysis tool pack and so once we have that we're going to open up the tool pack and then I want to do a correlation okay and then the range is going to be this range I'll select the headers including the headers control shift down and control backspace so I have that whole thing I'm going to select this item I want to pick up the labels because I included the labels and then I want to put it somewhere on the range I want it on my range over here otherwise excel puts it on someone else's range and I want it on the home on the range this is my home I want it next to my home on the range so I'm going to say okay let's say okay and so it should come out to according to excel 0.5 so if I go through this and say okay what did I do wrong here I think my mean calculation is right my average the standard deviation is correct it looks like the z score over here is what is wrong and what happened I divided instead of subtracting so I'm going to fix that and say this is going to be this minus this and then copy it down and see if that fixes everything else should populate properly and you can see it fixed my z score over here so notice it's useful excel's spitting out the answer like this is useful sometimes it's used it's also useful to do it ourselves because then we get a better sense of what is happening and it can give us more understanding about the data and then we can also simply use as we did here which we totally did on purpose of course for demonstrational purposes that you can use this just kidding but we could use this as a a double check for what we what we put in place all right so let's do some underline here we're going to say font group and underline I'll make this black and white home tab font group we can make this black and white let's center this home tab font let's go to the alignment let's wrap it and center it okay it's already centered okay let's make a skinny queue now I'll clean this up a little bit now and then next time what we'll do is we'll go in and look a little bit more at the calculation of the z scores and because you would and see the relationship between the z scores for these two data sets because that's a key component of one of the calculations we're making here let's let's do one more thing with our toolkit though before we do that let's go to the insert or data tab up top let's go to the analysis group the data analysis and let's go to our descriptive statistics and say okay and then input a range so the range is going to be this again so I'm going to say control shift down and control back so there is our range and I'm going to add the labels the output is going to be where do we want the output I want it over here we want this range on the v and I'm going to say give me the summary statistics and the con and the confidence at 95 okay and then okay so that gives us our nice summary data takes a little while to analyze because we had a whole lot of data this time so it's spitting it out give it a little give it a little time here and we'll say so that gives us then our calculation of the means uh standard error median mode standard deviation sample variance but once again there's static numbers so this can give us kind of a double check of this of the mean standard error and so on I still think when you're building a model that's useful to do your calculations yourself and then use this kind of as a double check so that you have a dynamic worksheet one that changes as you make adjustments to it but let's make this one home tab font group black white let's make all of this blue and bordered make it look nice so I'm going to make this bordered and blue if you don't have that blue right there it's in the bucket drop down standard blue and let's just do that to the rest of it and then we'll continue on with the z score stuff next time control shift down I'm going to make this blue and bordered or border blue the other way around and border blue selecting this stuff and control shift down border blue selecting this stuff border blue selecting this stuff border blue let's make a uniform skinny column control I'm holding down control and selecting all the skinnies so we have a uniform skinny width and we'll then make it like that skinny that's the skinniness that we want it to be so that looks much better mucho mejor and next time we will continue on with some analysis of the z scores