 Statistics and Excel. Correlation, simple, few data points example. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel. Here'll be first a word from our sponsor. Yeah, actually we're sponsoring ourselves on this one, because apparently the merchandisers, they don't want to be seen with us, but that's okay whatever, because our merchandise is better than their stupid stuff anyways. Like our crunching numbers is my cardio product line. Now, I'm not saying that subscribing to this channel, crunching numbers with us, will make you thin, fit and healthy or anything. However, it does seem like it worked for her, just saying. So, subscribe, hit the bell thing and buy some merchandise, so you can make the world a better place by sharing your accounting instruction exercise routine. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. Or in Excel, if you don't have access to this workbook, that's okay, because we'll basically build this from a blank worksheet. But if you do have access, three tabs down below, example, practice blank, example, in essence, answer key. Practice tab, having pre-formatted cells, so you can get to the heart of the practice problem. The blank tab, blank worksheet, so we can practice formatting those cells within Excel as we work through the practice problem. Let's go to the example tab to get an idea of where we will be going, what we will be doing. We're looking at correlation, once again, two different data sets to see if there's a mathematical relationship between them, a correlation. In other words, if there is a mathematical relationship, the next logical question would be, is there a cause and effect relationship? And then the next question would be, what's the causal factor in that type of relationship? We're going to use an example this time, which is not a perfect positive or negative correlation, which is going to be similar or more like what you would see in practice, meaning the data sets might be moving together, but they're usually not going to be perfectly correlated in either a positive or negative kind of scenario. We'll start this one off with just a few data points. We're going to be imagining we have hens and the number of eggs that the hens will be producing as our data points. We're just going to have four of them. We'll take a look at our correlation formula and calculate that, and then we'll also do some of our statistics using our data analysis toolkit. So let's go to the blank tab and start, let's get into it, let's get into it. So I'm going to start down here with just our definition. I'm going to say that x is going to be, for this problem, is going to be the number of hens. That's not a number, the number of hens. And then we're going to say y is going to be equal to the eggs that the hens produce. Now before I get much further into it, let's do some formatting of the entire worksheet, which is usually what we started. You're getting ahead of yourself, selecting the entire worksheet, right click, let's format those cells like we normally do every time. We want the currency, negative numbers bracketed, no dollar sign, no decimals. And okay, let's embolden the worksheet by going to the home tab font group and bolding the entire worksheet. Alright, so then given that, let's list our data points. We're just going to list our four points, which will be somewhat correlated here. Now this might not be an actual data set. We're just kind of making the four points, but you get the idea of it in the example. Let's go to the home tab. Let's go to the font group and make this a header, dropping it down. I'll make it black and then white on the font. Let's center it alignment group and we will center it on the center. And that looks good. So then I'm going to say the data points for X are going to be 3567. Those are the number of hens, the eggs that are produced by the hens. We're going to say 105185. I think this is a year 201 and 345. There's our data points. Let's go ahead and format these. I'm going to go to the home tab fonts group. Let's put some borders around it and I'll make it blue as well. Drop down. If you don't have that blue, I'll hit this one down here. It's in the standard color going to make it blue. You could do whatever you want to do, but I'm going to make it blue. And then I'm going to go home tab font group. I want all borders, not just those borders. Let's do the same thing here. Blue and border, border blue. We can make this a little bit thinner without messing anything up, I think. Let's skinnerize, skinnerize the A. There we have it. I could use some skinnerize in myself. I'm on a diet, but someone stole my protein shakes. Now I was on my protein diet. Now I can't do it now. Anyways, that's another story for another day. X and Y here. We're going to say let's do the same thing. And I'll highlight these two. Let's go to the home tab clipboard format painter. There's our headers. Paste that on down here. Do some simple calculations. Let's take a look at the mean. Let's take a look at the standard deviation. And we could do the variance. Well, actually, let's not do the variance yet. That might confuse what we're looking at. So let's pick up the mean. That's the average equals the average tab of the first set of data, which are the number of hens. Control shift down. And we have an average of five of them. Let's add some decimals so we can really recognize the data. Home tab number group. Decimalize it. And then let's go to the average of the Ys. The number of eggs equals the average of the number of Ys. Control shift down. Enter decimalizing it. Home tab number group. And boom. So we've got around five and a quarter hens. And they give us like 209 on average eggs. That'll keep me going. That'll keep me going. That'll get me yoked up so I can build some muskles. I don't need my protein powder now because I've got my eggs. So this is, we're going to say standard deviation. I'll just drink them like Rocky does without Rocky Balboa, the boxer without, I don't even cook them or anything. Anyway, standard deviation with the S sample. We'll select this data and enter. And then home tab number group, adding some decimals. And then on the Y, this is going to be equal to the standard deviation, standard deviation S tab. And we'll pick up this data and enter adding some decimals. Home tab number group. Let's add some decimals there as well. All right. So there we have it. Now we could do our correlation calculation. And we'll use this formula to do so. Before we get there though, let's take a look at just making a graph of this. So I might say the easiest thing I can do is just say, well, there's the X and the Y. Let's just graph this out and see if, and plot it. So we're going to go to the insert tab, selecting that data, insert tab charts. We're going to our insert here for the scatter. Let's just make a normal scatter plot of it. Now when I make the normal scatter plot, as we have seen, oh, what is going on there? Undo that. I didn't mean, I didn't mean it, man. I didn't mean it. Let's hide that. Let's put this down. Let's make it a little smaller. Okay. Okay. Okay. Everything has now, let's scroll it down a bit. This chart needs to be bigger than this. And then I'll delete the top one. Now you'll recall that the X, the one on the left, will be the X that it's applying on this side and the one on the right is going to be the Y. So if I add my labels, I usually start with the X. I'm going to say this equals, it's going to be the one on the left, which is the X in this case, or the hens, right? The number of hens. And then over here, we're going to say this is going to be the Y, which is going to be equal to, I'll just say the Y, which are the eggs. So there's the X and the Y. And maybe that would be better for me to do. This equals the X hens over here, and this equals the Y, the eggs. Because, and then I can say, okay, it looks like there's kind of a relationship, which you would expect if you're like a farmer or something, you're trying to buy some hens to get some eggs, you can get some protein, because someone stole your protein powder, then you could make some predictions on how much protein you can get from those hens by counting the eggs. So then I could add a trend line, and hit the plus button down here and add a trend line. And let's see some options for the trend line. So clearly there's not a perfect correlation, but there is a correlation. I can make a formula for that trend line by just selecting this option, which is nice because that gives us that kind of predictive power. So if I'm trying to buy my hens, and see how many hens I need to get so many eggs, for example, then the linear line, hopefully might give me some predictive power on the number of hens or something I might need. If I go to the first tab over here, let's make the drop-down, I'm going to make it a straight line, and then I'm going to make it orange. That's what I kind of like to do with my trendy lines, which are called the trend line. Hold on, I wasn't on the trend line anymore. Let's hit the trend line. That's where I want to be. More options on the trend line, because that was on my formula, and I put an orange thing around the formula. But that looks nice, too. So I'll leave it there. I'm going to go to the trend line and the trendy line. If you really want to be trendy, you have to be orange. That's what the end trend is these days. That's the color trend that people are into. So there we have that. And now, note that in this case, it's pretty clear that you would think if there's a correlation that the hens would be the causal factor of the eggs. You have a chicken or the eggs problem here, and at least from the farmer's standpoint, you get the hens and then the eggs happen. You could do it the other way around. You could buy an egg and then get a hen. But you would kind of think the hens are the driving factor generally, because you're probably going to buy the hens to get the eggs, I would think. But I'm not a farmer. Maybe you do it the other way. But you could say, like, if I flip these, I could say, let's flip these. I could do that and insert another chart and say, boom, another scatter plot down here. And now I'm just going to flip the x and the y. So now I'm going to say that here I want the labels. Let's add the labels. I want the x now to be equal to the eggs and then the y to be the hens. So there we have that. Now, how can we flip that here? I've got to go into my data set. So I go into my chart, my select data, and I'm going to edit the data. And this time on the x here, I'm going to delete the x, be careful because it can be a little tricky here. And this is going to be equal to the y's this side now. The x is equal to the y's or the eggs. So we'll say, OK. And then the y's are equal to the x's. Wow, that's confusing. The y's are equal to the x's, OK, which are the hens. OK, so there we have that. And so then I'm going to say, OK, and OK. We still have that positive kind of relationship, but now we've flipped it. So note, you can plot either way, the x or the y, and you'll get that positive trend line relationship, but it's the general custom to try to put the independent factor on the x, which you would think in this case would be the hens. So I'm going to let's put a trend line in this, hit the plus button. Trendy line, there's our trendy line. And let's make it more trendy. Let's put an equation on it. So there's our equation on the trendy line. Let's make sure I'm back on the trendy line again by going to the options. And then the bucket, I'll make the trendy line a solid line and orange. So there we have it. So it looks like there's a relationship. If I go back to the top one, you can see the hens are over here. As the hens go up, we get more eggs. We would get more eggs. So here's this first point. We had what, three hens, and then we got 105 eggs, which makes sense. And then the second point was at five hens, and we got 185 eggs. And then at six hens, we only got up to 200. And one egg, so it's some slacker hens, because you would think they would be bringing me up to the trend line here, but got some lazy hens, I guess. And then on the seven line, I mean, laying any eggs is hard work. I'm not trying to put down the hint. But in any case, this is the 345, and that brings us up to the 345 with the seven. So that's okay. That's our trend. So then let's do our formula to see what the correlation is going to be. So I'm going to make this one a little bit larger, and then I'll make these smaller so they fit in the space over here, just to do some housekeeping to organize this, because the hens are making a mess everywhere, and you've got to get everything organized when you have all these hens running around. So then we're going to say this is going to be... Let's copy the data first. I'll just copy this data, and I'll paste it over here. And let's do our analysis. So I'm going to do the z-score, z-score times the z-score over n-1. I'll actually do an added step here just so we can see that first bit a little bit more clear. I'm going to select two columns here, right-click and insert. So now I've inserted two columns after the x. I'm going to do the first one, which is going to be the x minus the mean of each of our data points. So I'm going to say this is going to be x minus... I'll call it just the standard deviation, or you could call it the sigma of the x-bar. I'm sorry, x minus the mean, which I should say is the x-bar. Let's see if we can get fancy with that. I'll put an x, and then I'll go into the insert, and we'll go into the symbols. And I have my bar over here. I think it's in the Greeks, but I have it in my recently applied area. Let's see if I insert that if it does what we want. There it is. So you can insert that x-bar. If you don't have it down here, you can look for it up top. We did that in a prior section, so I won't get into that in detail. But there it is. So now we're going to do that for each of our data points. So I'm just going to be taking then this number three, the first data point, minus the mean, which is the 5.25. I want to keep this the same as I copy it down. Therefore, F4 in the keyboard, making an absolute dollar sign before the E and the 2, and Enter. Let's do some decimals, and Home tab, number, decimalize it, and then copy it down. So there we have it. And then I'm going to give the Z-score, which means I'm going to take the result of this one and divide it by the standard deviation. And that's going to give us basically our Z, Z of x. It's the Z-score of the x, which is going to be equal to this, divided by the standard deviation of a sample instead of the population. F4 on the keyboard, making an absolute dollar sign before the E and the 3, Enter, decimalizing it, Home tab, number group, decimalize, copy it down. There we have that. Let's do the same for the Y. So we're going to say this is going to be Y minus Y bar, insert, symbol. There's my bar, insert, and OK. And then I'm going to copy the formatting of these. Let's just copy the formatting of all of these cells. And then Home tab, Format Painter and paste that here. And it messed up my bar. OK, that's OK. I'm kind of upset with that, but I'm not going to get too upset. So we're going to say this is going to be equal to the 105 minus the 209, which is of course the mean. F4 on the keyboard, dollar sign before the F and the 2. So we can copy that down and the 209 doesn't move down. Enter and then let's copy it down. Let's do then this one is going to be the Z of X, which means we're just going to take this amount divided by the standard deviation. So we're going to take this over divided by standard deviation, 99.92 F4 on the keyboard, dollar sign before the F and the 3. So the 99.92 doesn't move down when we copy it down. Enter, decimalizing it. Why didn't I decimal? Home tab number decimalize and copy it down. So we'll copy that down. OK, so there we have that. So now we've done this first bit for each data point, all of the X's minus the mean divided by the standard deviation, which is in essence the Z score. The Z score now will multiply them together, which is the Z, which is the Z of X times times, which I'll do with an asterisk, the Z of Y. And let's go ahead and home tab font group will make that black, white, center it, wrap it. And there we have it. And then we're going to say this is going to be equal to Z of X times the Z of Y and decimalize it. Home tab number group decimalize, copy it down. So there we have it. So now we've done this whole top part, except we need to sum it up and then divide and then divide by in the number four of them minus one. All right, so let's make this blue and bordered before we do that. And so I'm going to go border it and make it blue. And can I make any of these thinner? These are taking up a lot of room. Let's see, these can be thinnerized like that, skinnerized. This one can be even more skinnerized. And you could be skinner. I know I could be skinnerized a little bit, but it's not so easy. It's not so easy as that to skinnerize me. I'm going to say, okay, so then we're going to say, this is going to be, let's make a skinny O and this is going to be the R or the correlation. And I'll make this the header black and white home tab font group. Let's make it black. Let's make it white. And I'm going to start with the sum of the D of X time. It's hard for me to hit a Z on the keyboard because I don't think I finger it right. Z of Y, I don't think I'm using the right fingers of that. My typing instructor in high school would scowl at me if you saw me. This would be equal to the sum. It's one of the hardest classes I had in high school that took a typing class. It's supposed to be easy typing. Easy It makes it difficult but probably a good class. N minus 1, very useful skill here. And then we're going to say this will be N this will be a less 1 and so N minus 1 and so then we're going to say then this needs to be just N not that okay and then so the N is going to be the number so just count them. I'm going to use a count formula to do that because that's the fancy way to do it and you know we're being fancy using Excel here. Count them which is going to count all four of them four of them minus 1 and then underline this is going to be equal to the 4 minus 1. Alright let's do some indentation here. Home tab alignment indent indentation here home tab alignment indent and we'll just kind of recap what we did here so we did the whole top part for each of the data points to get to here then we summed them up to get the whole numerator which I put on the outer side because that's our numerator then I did a sub calculation inside and like a tax return style here so we can then see the table of what is happening. We did n minus 1 with a colon and then pulled inside here indicated with a tab as well as it in the inner column of the number of cells n which are four of them minus 1 according to the formula gives us 3 and then we're just going to divide to give us the r or the correlation so we're going to say this is going to be equal to 0.77 divided by 3 and decimalize it home tab number group decimalize we get 0.92 on the correlation so there's definitely a correlation there it's not a perfect correlation as we saw in the prior presentation if it were it would be 1. So let's format that I'm going to select this stuff we're going to go to the home tab font group border and blue and then now that we have that we can just kind of double check it by doing the simple correlation calculation with our data analytics I'm just going to select this is the data that we're going to use I'm going to go into the data tab and let's go into our data analysis if you don't have that you go into the file tab on the left you go into the options down below you're going to the add ends the excel add ends and then you go into the excel add ends and you check off the analysis tool pack if you want to be on top of the pack you need the analysis tool pack so there's the data analysis so now we're going to go in and say data analysis we're going to click on it I'm going to say it too you don't have to say it you could just click on it but I'm going to say it and click on it I think it's better to say it because then you know it then as you kind of get what you're doing so variance so anyways this is a correlation we're going to say okay the data we want is this data I'm going to select the header and include the header if I include the x and y header I'm going to remember to check off that labels to note that I did include the labels I want to put it somewhere I want to tell excel where I want to put it I will tell excel where it needs to go otherwise excel just puts it wherever it wants to put it like on another tab or something you're like excel as you do that's not where I wanted it excel is like you didn't tell me where I wanted it it was like okay I guess you have a point excel you win this round this is where I want to put it so we're going to go to the home tab font group this is going to be black white this is going to be blue and bordered bordered blue and so now this is the number we're looking at because that's looking at the relationship between x and y and you can see that you have the point 9 to 1 9 8 8 relationship tying out to what we have here if we did that other data analysis with this little bit of data just to practice it as well if I go up to the data and I go into my data analysis the super cool tool pack we're going to say let's do the descriptive statistics and so I'm going to pick the range where I have my home I have my home on the range home on the range and then I'm going to pick that's a song sorry about that I got distracted I'm going to check these two then statistics and I probably don't really need this one but I'll check it off anyway the output range I want to put it where do I want to put it I'll tell you not in a new worksheet excel I want everything in one spot otherwise I need to know where it is you start putting stuff on new worksheets and everything's all scattered all over the place it's like going into some messy room or something it needs to all be where it's supposed to go so then I'm going to say okay so then it gives us our standard stats so we've got the mean the standard error which may not be applicable the median the mode there is no mode because there's only four data points standard deviation variance kurtosis skewness range and so on and so forth there's our count of four but remember this is static data so this is a great tool to kind of first look at your data set but you might not want to use it to build your model because the model would be nice if you could change it if there were changes to your to your data over here alright let's just do some formatting to clean this up I'm going to do some blue borders blue borders here and let's do the blue borders over on this stuff too why not let's do it this way first I'll do the blue borders blue border blue and black and white movie be to the end be and at a most looks good alright so that that is it let's spell check it actually spelling correlation I can't spell it that's what I'm working on I can't spell it alright I'm going to spell it right next time next practice problem you watch I will be able to spell correlation