 Statistics and Excel. Correlation, random number generation example. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth soothing. Excel. First, a word from our sponsor. Yeah, actually we're sponsoring ourselves on this one, because apparently the merchandisers, they don't want to be seen with us. But that's okay, whatever, because our merchandise is better than their stupid stuff anyways. Like our, trust me, I'm an accountant product line. Yeah, it's paramount that you let people know that you're an accountant, because apparently we're among the only ones equipped with the number crunching skills to answer society's current deep complex and nuanced questions. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. Here we are in Excel. If you don't have access to this workbook, that's okay, because we'll basically build this from a blank worksheet. But if you do have access, there's three tabs down below. Example, practice, blank. Example, in essence, answer key, practice tab, having pre-formatted cells, so you can get to the heart of the practice problem. The blank tab is a blank worksheet, so we can practice formatting those cells within Excel as we work through the practice problem. Let's go to the example tab to get an idea of where we will be going, what we will be doing. We are looking at correlations once again, two different data sets to see if there's a mathematical relationship or correlation. Are the two data points moving together in some way, shape or form in other words? And if they are moving together, if there is a mathematical relationship, the logical next question would be, is there a cause and effect relationship? And if there is a cause and effect relationship, the next question would be, what's the causal factor in that relationship? We're going to start off this time by making two sets of data from just randomly generated numbers. And then we'll copy and paste that so we have static randomly generated numbers. We're going to notice that each of these data sets, if we were to make a histogram, are going to tend towards the uniform distribution. As we saw in the prior sections when we looked at different types of distributions that are common, uniform distribution, bell curve distribution, Poisson distribution, exponential distribution, for examples. So because we use the random generate and each of those sets will tend towards a uniform distribution, we'll plot our statistics in terms of the mean standard deviation. We'll plot these two against each other to see these points in our scatter plot. And we want to then also just kind of recognize the relationship of random data to also what people often think of as random data. To do that, we'll make another data set which is closer to what people might imagine or what they often do when they're trying to come up with random numbers, which is to space the numbers out and you get a data set which looks kind of like this. So this is just to point out the difference between what randomness generally kind of looks like, which is usually a lot more clumpiness of numbers than what we kind of think in our mind of what randomness will look like, which is this spaced out kind of thing where all the numbers have their own kind of little area. So if we were to try to make something random ourselves, we're probably more likely to do something like this as opposed to some actual randomness that would look more like that. But what we're looking for here is the relationship between the two data sets. And so whether they be random or spaced out somewhat like this, we had two different methods to create those data sets. So you would think that there's going to be a very low relationship for the regression line and then we'll plot the correlation mathematically as well here and then we'll do it with Excel. All right, let's go to the blank tab to get started. I'm going to format the entire worksheet this time doing that first selecting the triangle up top right clicking on the worksheet and let's format the whole thing. And we're going to make it currency as we normally do. Negative numbers bracketed and red, no dollar sign, no decimals and say OK and click on OK. You can say it. You don't have to really say it, but you have to click on. I'm going to go to the home tab font group. I'm going to make the whole sheet and boldened and so let's just add our data set. So first data set we're going to say this. I'm just going to call it Rand one because they're generated from random numbers and let's call this one Rand two. Going to select those two and go to the home tab font group. Make it black. Make it white on the letters as is our typical header structure alignment. We can just center it. We don't really need to wrap it. Then I'm going to use a random number generator. I'm going to make a random numbers between one and 100. So this is going to be equal Rand and tab or actually Rand between and then tab and the bottom number. I'm going to say is one comma top number is 100. So give me random numbers between one and 100 closing it up and enter and there we have it. So let's see. Let's make like I don't know 200 and some of these. Let's go down like 200 and something of these randomly generated numbers randomly. Let's go 216 which should be 215. If we take the header column out there we have it. Boom randomly generated. Let's just copy this whole thing over. Let's copy this and paste it over here for more random numbers randomly generated for number two. So these have all been randomly generated but they're not connected to each other in any way. So you would think there would be a very low correlation between randomly generated one and randomly generated two even though there are numbers between one and 100. Now because this random generation changes every time I do something I want to copy these two and put them over here and paste them just formulas only so that it doesn't keep shuffling around. And then I'm going to make a skinny C skinny C C I made a skinny C C see what I did with the C I made it skinny. I'm going to go to the home tab font group. Let's make this black and white and alignment and let's center this. Okay. So there's our data sets. So now let's do it over here and do our statistics for rand one header rand two header. Let's format it the same formatting as we have here so I'll select these two cells go to the home tab up top clipboard and paint brushy paint brush it right there. Actually I should move it to the right. I'm going to cut this right click and cut it and move it to the right by pasting it right here. That's that's doing the same thing by the way as moving it like this cutting and pasting is the same as basically doing that. All right let's make a skinny F like that and then say that we want the mean we want the standard deviation and that let's keep it at that mean and standard deviation. So this is going to be equal to the average which is the mean formula. It's a mean formula. We're going to say control shift down for ran data one enter 49 let's add some decimals to get a little bit more precise home tab number group decimalizing it and then let's do the same thing this time. Let's just copy it over if I just copy it over like so double click on it to double check it and it is picking up the correct data there so that looks good. Let's do the standard deviation from the sample standard deviation for the s s t d e v dot s sample tab selecting the data control shift down for ran one data and enter decimalizing home tab number group decimalize copying it to the right so we do the same thing for ran data set number two. So there it is double click double check and it's done what what it needs to do. All right note we could look at this and say what if I made a histogram of this data sets to see if they look similar so I could say control shift down and then control backspace. Insert and let's go to the charts and let's make a histogram of this histogram for ran data set number one looks like this randomly generated buckets this is going to be for ran one. So it made our buckets and it listed the numbers that are how many times we had a number between one and eighteen eighteen thirty five thirty five fifty two. And if we were to approximate this with some kind of curve or formula it's going to approximate a straight line because this is going to go towards a uniform distribution because we chose randomly generated numbers. So let's let's do it as opposed to a bell shaped distribution right it's more of like it's going to go towards as we have more data it's going to tend towards a uniform distribution or straight line. Let's go to the ran two and do the same thing selecting all this data can and I had control backspace and then insert chart histogram boom next one. And this is going to be ran two so it looks very similar see so see you'd say OK this one looks like it's going to be a uniform kind of distribution to it's got kind of a little bit unusual. Look here that you know it's tapered off a lot on the ninety seven to one thirteen which is interesting but it should still be because we use we use randomly generated numbers. The more numbers that we produce it would tend towards a straight line so they look kind of similar like that. But still you would think they might not have a correlation because there's there are at least not a high correlation because there's nothing really tying these two together other than the randomly generated from one to a hundred. So let's let's say OK let's make this one a little bit larger let's make this cell a little larger and I'll make this one smaller so it fits in there fits in the spot there put everything in its place needs to go where it needs to go. And then I can choose these two control shift down and control backspace and do a scatter plot between them insert charts let's do a scatter plot boom looks like that. So now we're scattering these. I'm going to delete the top one and this is the random numbers and so now I'm going to say let's hit the plus button up top and label them. So I'm going to put the axis labels and by default if I go to the first let's go to the X axis first I usually think about that first the X axis is the one on the left so this is the random one data. The Y axis is the one on the right so this is equal to the random to data. So there's our there's our correlation and you could see if I was to try to draw a regression line between them or in here we could say plus button trend line and you could see it's it's got a low negative correlation to it. So I can say if I add my formula there's the formula for the line and then I can say let's make sure I'm back on the trend line. I'm going to say plus trend line options I'm bucket I like to make my line solid and orange solid orange line that's solid man that's solid. So there we have it so you can see it has a low you know negative correlation for the two that have been randomly generated between one and a hundred. Now you could also say well what if I wanted to put because we might not know of course what's we don't we don't know what the causal factor would be which should be on the X and which should be on the Y because really they're not related other than picking random numbers between one and a hundred. So if I select these two and control back and I want to put ran two on the X axis I can go to the insert I can go into the charts scatter plot scatter there's a scattered plot out there and to do something. We're going to say plus button let's do the axis titles this time I want the X axis to be ran two and I want the Y axis to be the ran one which is backwards because by default Excel will always plot the one on the left on the X. So how can I switch the two if I don't want to rearrange the actual columns we can go into the chart design the data we can go into our ran data and edit it and then this is the X values I'm going to delete these and say that delete that. And this was ran one we want to be ran two info yo and then OK. And then this one I'm going to delete that and say this needs to be the data for the ran one so we switched the X's and the Y's hopefully if I did that properly. And so I'm going to say OK. And OK. And so now we've got our scatter plot which looks very similar still negative negative well that's the wrong one. Negatively correlates put that right back where it goes. And then it's still going to be negatively correlated so if I add the trend line trend line negatively correlated so it's not like it flipped to a positive correlation. And then hitting the arrow we're going to say let's make the trend line. Solid and let's make it orange. So clearly in this example unlike the hands and the eggs example we saw last time we can see that there's a mathematical correlation no matter which one we I mean there's a low you know correlation it doesn't look like it's very correlated. But if there was any mathematical correlation then we can kind of determine that with the trend line. But we have we don't really have any idea what the cause and effect relationship would be at this point if there was any because with the hands it seems pretty clear that you could say well the hands are causing you know the eggs. I would think the hens would be on the X but you can plot it either way and you're still going to get if there is a correlation that's directional correlation of a negative correlation sloping down positives correlation sloping up. Okay so there is that so now what we want to do is is compare that let's compare that to to a system just to see what people normally think of when they think of random numbers this is way too like clumped up for most people when they actually try to generate. Random numbers so so let's add another data set what which would be closer to what people would actually kind of do when they when they try to set up random numbers I'm going to set one set. So let's say set one and then set two and what I'm going to do is I'm just going to say I'm just going to space everything out by five so I'm going to say this is going to be let's do it. This is going to be equal let's just do it this way we'll say five ten I'm going to bring that up to 100 and we'll stop it at 100 to do like that and then I'm just going to copy that and repeat it multiple times. So I'm just going to copy this and paste it and paste it now these are in the way now get out of my way and then paste it and paste it and paste it and paste it and paste it. I'm going up to like 200 and 15 is what we had before 200 and paste it to 215 or 16 right there. Let's delete these and then what I'm going to do is I'm going to do the same thing but I'm going to just have it off a little bit so I'll start this one like at the 30. So I'll paste it. I'll paste it here. I'll copy this same thing five to five, 10, 15, 20, 25 to 100 so we'll copy that but I'll start it let's start at 35. Paste, paste, paste, paste, paste, paste, and then I'll copy this last bit, copy, and that should fit basically up top here. Not exactly, but I'll keep, I'll do it, that's fine, I'll do that here. So there we have it. So there's our numbers. Now, obviously most people when they pick random numbers, they wouldn't just count by five, but they tend to space things out. So for example, if I add another column, I'm going to put my cursor on column K, right click and insert, and then I'll make this a skinnier column, and I'm just going to say this is going to be my random numbers for one, and then I'm going to insert over here, insert, this is going to be random for two, and I'm just going to generate random numbers this time. Those random, and this is going to be just random, not between, and that will give us a long decimal format, home tab number, it's going to be a long decimal formatted number. I'm going to double click that down, taking it all the way down, I'm going to copy that and put that over here, control paste, and double click copy and it down. So now we've got this random number generator. Now I'm going to, now I want to put a space between these two, I want to put a column between them, and then I'm going to make two different tables for random one and random set two. So I'm going to select column M, I'm going to right click and insert. So now we have a spacer between these two, then I'm going to insert a table for both of them. Let me just scroll down and make sure I have everything done correctly. Notice I have too many cells on this side, I should have brought it down to two sixteen. So I'm going to delete this last little bit before I insert the table, deleting that, and then I'm going to scroll up top, put my cursor here and go to the insert and let's insert a table. So we'll insert a table, data set one, and this side I'll do the same home tab, I'm not insert tab, tables, insert a table for the second data set. Alright so now I can shuffle these, so I'm going to shuffle them randomly, so now I took my number that had a pattern and I shuffle them over here and I shuffle these ones as well. And this is more like what people would probably kind of come up with, they wouldn't always end with a five, but if you were thinking about coming up with random numbers people would be like, ah five and then like forty five is like away from that, and then like thirty is pretty far away from the forty five and eighty five, you wouldn't really think to do another forty five right here or something right next to it, right, people will usually kind of space everything out is what the point is. So let's copy these two so it doesn't keep on shuffling, I'm going to put my cursor on column L, put my cursor on column O as I hold control so I have non adjacent cells that I'm going to copy, right click and copy, let's put that in column Q, right click and paste just one, two, three, just the value so it doesn't keep shuffling. And then I'm going to make a skinny P column, let's make this the header by going to the home tab font group black, white, let's center it and then let's just select this data I'm going to say control shift down, control backspace and then insert and charts and let's make a scatter chart of this information. And so now you have this information that hold on a second that a lot of people would say well that looks random because it's nice and spaced out. But the point is that this is not exactly random it's this is this is going to be what it looks like when it's random where you have all these clumpings that happen more often. Randomness isn't a sign of unrandomness oftentimes, if you see something that's too uniformly distributed like this, it's likely that there's not randomness in it. Now whether it be random or not, however, we're looking at the correlation between the two, between the two data sets. So we'll continue on next time and we and neither of these data sets are really exactly connected to the other. The first one, we selected two completely random data sets. And the second one we had kind of a system. And there's a pattern to the two data sets that we used and then we shuffled them. But again, they're not really connected together. So you would think the correlation would be similarly low. So in other words, if I hit the plus button here and add the trend line, you've got a very low correlation, I'm going to say more options, trend line. Let's make it solid and orange. We could still, even though there's a low correlation, we could still make it a trendy line by making it dressing it up properly. So it looks appropriately dressed for the occasion. So anyways, next time we'll go in and we'll do the formal calculations continuing on with the practice problem.