Statistics and Excel: Uniform Distribution Dice Example. Got data? Let's get stuck into it with statistics and Excel. Actually, we'll be looking at OneNote here, but we'll be talking about Excel. You're not required to follow along, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote presentation, 1510 Uniform Distribution Dice tab. We've also been uploading transcripts to OneNote so that you can go into the View tab, Immersive Reader tool, and change the language if you so choose, being able to either read or listen to the transcripts in multiple different languages, using the timestamps to tie in to the video presentations. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com.
OneNote desktop version here. In prior presentations, we've thought about how we can represent and describe different data sets, both mathematically, using calculations such as the average or mean, the median, the quartiles, and so on, and pictorially, using box-and-whisker plots as well as histograms. The histogram is often what we visualize when we're thinking about the distribution of the data, and we then use terms to describe the distribution in the histogram, such as skewed to the right, skewed to the left, and so on.

What we would like to think about now is the families of curves and formulas that can often characterize certain data sets. If we can represent a data set with some kind of curve, some kind of formula, it gives us more predictive power over that data set. So that's the goal: if we can say, hey, this data set looks like it can be characterized, at least approximately, with some kind of line or curve that we have a formula for, that would be a useful tool to have.

Now the first family of curves that we're going to look at will be the uniform distributions. It's going to be the easiest one because it's basically a straight line. When we said uniform distributions, you might have imagined that we're going to distribute out uniforms for the Accounting Instruction statistics course, and you're going to get a uniform or something. No, we're talking about the uniform distribution as a family of curves representing data. All right, so we're going to be thinking about dice rolls here to get an idea of what this will look like. So let's say we have a die, and the die has six sides to it.
If we were to roll the die a thousand times, what would be the likelihood of any one number, whether that number be a one, two, three, up to six? What's the likelihood of rolling a one on each roll, for example? Well, it would be one over six, which is about 16.67% (0.1666 on and on). So if I rolled it a thousand times, what would be the expected count for any one number? It would be this times 1000. So you would expect 166.66 and so on of each individual number, one through six. That would be our visualized outcome in our mind.

Now note that this visualized outcome is just a model. We're coming up with a model that hopefully provides us some predictive power but, of course, is not perfect in real life, which is made clear by the fact that we have a non-whole number here. It would be impossible for our predictive model to come exactly true, because there's no way we're going to get 0.67 of a one or a two; we can't roll it 0.67 times. But you can see that the model gives us predictive power over what the chances are in the future.

So if I take that data and graph it or plot it out: there are six numbers on the die, one through six, and if I rolled the die a thousand times, we would expect each one of those numbers to come up around 166.67 times. That's the perfect model that we have in our head, which is too perfect because it doesn't take into account the randomness: our thousand rolls are basically a sample rather than the entire population of dice rolls, which we imagine to be an infinite number of dice rolls. If I were to graph that in a histogram, we've got the dice one through six and the expected rolls, and it would be just a straight line, right? We would expect all of them to be about 167 across the board.
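The expected-count arithmetic for a fair six-sided die can be sketched in a few lines of Python. This is only an illustration and not part of the worksheet; the function name expected_count is made up for this example.

```python
from fractions import Fraction


def expected_count(n_rolls: int, sides: int = 6) -> float:
    """Expected number of times any single face appears in n_rolls of a fair die."""
    # Each face has probability 1/sides on every roll.
    probability = Fraction(1, sides)
    return float(probability * n_rolls)


if __name__ == "__main__":
    # 1000 rolls of a six-sided die: 1000 * (1/6)
    print(round(expected_count(1000), 2))
```

The result is not a whole number, which is the point made in the walkthrough: the model describes a long-run average, not a count that any single set of 1000 rolls can hit exactly.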
And obviously, we now have a straight line, and that straight line is what the uniform distribution will be. You might say, well, look, there's only one of those, so it's not really a family of curves. But if we roll the die some number of times other than 1000, say 200 times, we would expect the outcome to be 200 times 0.1666 on and on, so it'd be about 33. So it's actually a family of curves: with 1000 rolls the straight line is up here at about 167, and with 200 rolls we would have a straight line at about 33. So this is a family of curves that are just straight lines, which is the uniform distribution. The formula for our expected outcome is f(x) = c: we're going to have the same outcome for every value because it's uniform. Nice, easy equation for us. Our predictions are nice and easy, although they're not going to be perfect, because in real life there's going to be randomness involved.

Now, if we were to approximate what actually would happen if I roll the die 1000 times, you could do this in Excel using the random number generator, which would look like this: =RANDBETWEEN, and then the bottom number would be one and the top number would be six. We would then copy that down 1000 times. I don't think I added all 1000 here; I only went down to here. But if you do this in the Excel worksheet that we will have as well, you would have 1000 numbers that are randomly generated, as a dice roll would be random in theory, one through six. So the likelihood of this one coming out a two was one out of six, right? So we rolled a two, then we rolled a five, then a three, then a six, then a one, a one, a four, and so on and so forth.
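For readers following along outside Excel, the random-number step can be approximated in Python. This is a rough sketch, assuming the standard library's random.randint(1, 6) as a stand-in for Excel's RANDBETWEEN with a bottom of one and a top of six; the function name roll_dice and the optional seed are illustrative choices, not something from the worksheet.

```python
import random
from typing import List, Optional


def roll_dice(n_rolls: int, seed: Optional[int] = None) -> List[int]:
    """Simulate n_rolls of a fair six-sided die."""
    # A dedicated generator keeps runs reproducible when a seed is supplied.
    rng = random.Random(seed)
    # randint includes both endpoints, so every value is 1, 2, 3, 4, 5, or 6.
    return [rng.randint(1, 6) for _ in range(n_rolls)]


if __name__ == "__main__":
    rolls = roll_dice(1000)
    print(rolls[:7])  # first few rolls; these change on every run without a seed
```

Unlike the Excel sheet, which recalculates its random numbers whenever the workbook recalculates, passing a seed here gives the same sequence on every run, which can be handy when comparing results.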
So if we take that data, we can say, okay, now we've got the dice one through six, and we've got the expected rolls, which were even. This is what we expected to happen. But this is what actually happened. This actual data we're pulling in from our data set over here by basically counting the numbers that are coming up. The formula in Excel would look something like this: =COUNTIF, brackets, then we pick up our entire range, which you can see goes down to 1000 in Excel, and then we want the criteria. So we want it to count every number in this range if it matches Q2, which represents this number one. So if you find the number one in the range, count it. And it says that happened 182 times. Then we have 170 twos, 160 threes, 164 fours, 149 fives, and 175 sixes. We can see that those add up to 1000, which makes sense because we rolled it 1000 times. That's our check figure.

Then the difference: this is what we expected to happen, and this is what actually happened. So there's a difference of 15, which is pretty close but not exact, then a difference of 3, a difference of 7 on the positive side, a difference of 3, a difference of 18, and a difference of 8.

So then if I was to plot this, I can say this is the actual outcome, right? The expected was a straight line, but the actual outcome is not exactly a straight line. But if I was to try to predict what's going to happen in the future, it's useful for me to be able to use a function that is basically just the straight line. This looks like it can be approximated pretty closely with the straight line, and that's why the straight line is going to give us some predictive power about what will happen in the future.
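The counting and comparison steps can be sketched in Python as well. This is a hedged illustration: count_faces is a made-up helper playing the role of one COUNTIF per face, the tallies are the ones quoted in the walkthrough with the threes read as 160 (an assumption, chosen because it is the value that makes the six counts add up to 1000), and the sign convention of actual minus expected is also an assumption.

```python
from collections import Counter


def count_faces(rolls):
    """Tally each face 1 through 6, like one COUNTIF per face."""
    tally = Counter(rolls)
    return {face: tally.get(face, 0) for face in range(1, 7)}


# Counts quoted in the walkthrough; the 160 for threes is an assumption
# (it is the value that makes the total come to 1000).
actual = {1: 182, 2: 170, 3: 160, 4: 164, 5: 149, 6: 175}

# Check figure: the counts should add up to the number of rolls.
total_rolls = sum(actual.values())

# Uniform model: every face is expected total_rolls / 6 times.
expected = total_rolls / 6

# Difference between what happened and what the model predicted,
# rounded to whole rolls (actual minus expected).
differences = {face: round(count - expected) for face, count in actual.items()}

if __name__ == "__main__":
    print(total_rolls)
    print(differences)
    print(count_faces([2, 5, 3, 6, 1, 1, 4]))
```

The magnitudes of the rounded differences match the 15, 3, 7, 3, 18, and 8 mentioned in the walkthrough.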
If I wasn't able to do that, if I had to say, hey, look, this doesn't look like it conforms to anything, the numbers are just coming up randomly, which could very well happen in different circumstances, then it's a different story. Some data sets might not be able to be represented with some kind of line or approximating formula. And if that's the case, then it's going to be a lot more complex for us to use past data to project future data. But if we're saying, hey, look, this looks like it approximates some kind of actual curve, in this case a straight line, then we can use the formula to help give us some predictive power over what's going to happen in the future.

So notice that this histogram up top I made with a bar chart, and we could make it as well with a histogram chart in Excel. If we do it with a histogram, Excel is going to try to give us a top and bottom number for each bar, but you can see they're one number apart. So you can use either of those charts to, in essence, get the same results. If you want to check that out in Excel, we'll have that in Excel.

Now in Excel, you might also want to run this experiment multiple times and say, okay, that's pretty close to the straight line right here, but what if I did it four times? I can run multiple experiments and ask, are all of them going to come out similar? So once again we used the random number generator between one and six, as if we rolled the die 1000 times, four times over. This is apparently what they used to do in the universities, you know: if you worked there, they had people just rolling dice all day, and it was part of the union job and stuff. It took a long time, but now the dice rollers are out of there, and we just generate it with a computer.
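Repeating the 1000-roll experiment several times is easy to sketch in Python. This is an illustration under assumptions, not the worksheet itself: the function names run_experiment and run_experiments and the default seed are invented for the example, and random.randint(1, 6) again stands in for the Excel random number generator.

```python
import random
from collections import Counter


def run_experiment(n_rolls, rng):
    """Roll a fair die n_rolls times and return the counts for faces 1 through 6."""
    tally = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [tally.get(face, 0) for face in range(1, 7)]


def run_experiments(n_experiments=4, n_rolls=1000, seed=0):
    """Repeat the experiment several times using one shared random generator."""
    rng = random.Random(seed)
    return [run_experiment(n_rolls, rng) for _ in range(n_experiments)]


if __name__ == "__main__":
    for i, counts in enumerate(run_experiments(), start=1):
        # Each row holds six counts that should hover around 1000 / 6.
        print(f"Experiment {i}: {counts}")
```

Each row will differ from the others, but every row should sit reasonably close to the flat line at about 167 per face.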
So then if we make our histograms this way, you can see again that each one approximates a straight line. This is the first set of results. This is the second set. It's not exactly the same, of course, because there's randomness in it, but you can see it still approximates the straight line. Here's the third one of 1000 rolls. So they're all different, but they all approximate that straight line.

And the idea, if you think about the sampling concept, would be that if I were to roll this an infinite number of times, which would be the entire population, then it would in essence be a straight-line representation, meaning you would expect each number to come up on one sixth of the rolls. But because we have a sample of the data, it's not going to come out perfect, but we can approximate it with our formula. And that's the easiest kind of formula to approximate with, right? It's a straight line.

Now, obviously, if we can do a similar thing with curves, which we'll talk about in future presentations, representing the data with a more complex formula, but still a formula that we can make predictions with, then that would be great as well. And we'll get into some of those in future presentations.
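The sampling idea, that more rolls bring the observed share of each face closer to one sixth, can be explored with a short Python sketch. This is an illustration only: max_deviation is a hypothetical helper, the sample sizes are arbitrary, and the results depend on the random seed, so the gap tends to shrink as the number of rolls grows rather than shrinking at every single step.

```python
import random
from collections import Counter


def max_deviation(n_rolls, seed=0):
    """Largest gap between an observed face frequency and the model value 1/6."""
    rng = random.Random(seed)
    tally = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return max(abs(tally.get(face, 0) / n_rolls - 1 / 6) for face in range(1, 7))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        # Larger samples usually sit closer to the flat line at 1/6.
        print(n, round(max_deviation(n), 4))
```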