Statistics and Excel: calories data, statistics, sample example. Got data? Let's get stuck into it with statistics and Excel.

Well, actually, we're using OneNote here, but we will be talking about Excel, too. You're not required to follow along, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote presentation 1360, calories data statistics sample example.

First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, whatever, because our merchandise is better than their stupid stuff anyway. Like this CPA thinking cap, for example. CPA thinking CAP: you see what we did with the letters. And this CPA thinking cap is not just for CPAs, either. Anyone can and should have at least one, possibly multiple, CPA thinking caps. Why? Because based on our scientific survey of five people, all of whom directly profit from the sale of these CPA thinking caps, wearing this CPA thinking cap without a doubt, according to the survey, increases accounting productivity tenfold. Yeah, at least. Apparently the hat actually channels accounting energy from the quantum field ether directly into your head, allowing you to navigate spreadsheets faster. It's kind of like in The Matrix when Neo learns kung fu, or at least that's what the scientific survey is saying. So get one, because the scientific survey participants could really use some extra cash.

If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. We're also uploading transcripts to OneNote so you can use the immersive reader tool, change the language if you so choose, and either read or listen to the transcripts in multiple different languages, using the time stamps to tie in to the video presentation.

OneNote desktop version here, with data on the left-hand side related to counting calories.
We have the date on the left, the calorie count on the right. So, for example, this first one: 3/12/2016, calorie count 2,990; 3/12/16, calorie count 1,777; and then 3/13/16, calorie count 2,480.

Now, we're going to treat this data on the left-hand side as though it's the entire population of data, so that we can run a few statistical analyses on it. Then we'll imagine that we're taking samples from this data set, so that we can run statistical tests on the samples to see whether that information can tell us something about the entire population. It's a similar strategy to what we did when we looked at the heights of individuals, but we're going to use slightly different methods when we get to the sampling. Our goal is to think about the statistics involved, as well as how we might use tools such as Excel to help us practice with these concepts. Also, just realize that if you want to look up some of these data sets, Kaggle.com, K-A-G-G-L-E.com, might be a place to look.

Let's start off by taking the information from the entire data set. So this is the population data set. We can calculate the average, or mean, which means adding up all of the numbers and then dividing by the count: one, two, three, four, and so on. Or we can use the AVERAGE function, which is =AVERAGE, and then we just select this set of data. That gives us 2,189.

We might also take the median, which is picking the one in the middle. If we listed the data from top to bottom, lowest to highest or highest to lowest, and then picked the one in the middle, that would be the median. Just like Rocky the boxer's coach told him to do when he sees three of them out there: hit the one in the middle, hit the one in the middle.

The max is the highest one. So if we were to sort the column over here and pick the highest amount, that's the maximum.
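To make the mean, median, and max calculations concrete, here is a minimal Python sketch of the same steps. The list `calories` is a hypothetical stand-in for the calorie column: only the first three values appear in the transcript, and the rest are made up for illustration, so the results will not match the 2,189 quoted for the real data set.

```python
import statistics

# Hypothetical calorie counts. Only the first three values are quoted in the
# transcript; the remaining four are invented for illustration.
calories = [2990, 1777, 2480, 2105, 1930, 2250, 2010]

# Mean "by hand": add up all of the numbers, then divide by the count.
mean_by_hand = sum(calories) / len(calories)

# Mean with a built-in, comparable to Excel's AVERAGE function.
mean_builtin = statistics.mean(calories)

# Median: sort the values and pick the one in the middle.
median_value = statistics.median(calories)

# Max: no sorting needed, comparable to Excel's MAX function.
max_value = max(calories)

print(mean_by_hand, mean_builtin, median_value, max_value)
```

With an even number of values there is no single middle value, and `statistics.median` averages the two middle ones, which is also how Excel's MEDIAN behaves.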
We don't even have to sort them, though, because we have the formula =MAX to pick the max. And then the min is the lowest one. We could sort by the lowest one to see the min, or I can simply use my =MIN formula. So we had zero calories one day. We were locked in a closet that day or something, I don't know. I'm not sure that's exactly healthy. Or we were fasting. It's just one day, not a big deal.

All right. So then we take this data and put it into a histogram: we just select this entire data set and make a histogram. Here are the categories from 0 to 370, from 370 to 740, and so on. And it looks like the middle, or biggest, area where most of the results are falling is between 1,850 and 2,220.

Now, calories is another one of those areas where, because we tend to stay at a similar weight, within a range of a few pounds, you would expect that on most days your calorie counts are also within a pretty reasonable range. You would expect the histogram to look kind of bell-shaped, with counts that are higher or lower on certain days. We don't have as extensive a data set here as we had with the heights, and therefore we don't have as much detail as you might expect if we had a whole, whole lot of data. But we're going to assume this is basically our entire population.

So then we're going to think about how we can create a sample of that population. If I'm going to create a sample, what I want to do is take these numbers and, in essence, shuffle them up. Once again, we're going to use the technique of the random number generator. The random number generator is just this one, =RAND(). If you just enter a random number, it gives you a decimal, and I've then added decimal places, so it's a pretty long decimal.
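The histogram buckets described here (0 to 370, 370 to 740, and so on) can be sketched in Python as fixed-width bins. This is only an illustration under assumptions: the `calories` list is hypothetical, the bucket width of 370 is read off the transcript, and a value sitting exactly on a boundary is placed in the higher bucket, which may differ from how Excel's histogram chart treats bin edges.

```python
from collections import Counter

# Hypothetical calorie counts, including one zero-calorie day like the one
# mentioned in the transcript. These are not the real data set.
calories = [0, 1777, 1930, 2010, 2105, 2250, 2480, 2990]

BIN_WIDTH = 370  # bucket width taken from the transcript's 0-370, 370-740, ...

def bin_range(value, width=BIN_WIDTH):
    """Return the (low, high) bucket containing value, with low inclusive."""
    low = (value // width) * width
    return (low, low + width)

# Count how many values land in each bucket.
counts = Counter(bin_range(v) for v in calories)

# Print a simple text histogram, lowest bucket first.
for (low, high), n in sorted(counts.items()):
    print(f"{low:>5} to {high:<5} {'#' * n}")

lowest = min(calories)  # comparable to Excel's MIN function
```

Empty buckets are simply not printed in this sketch, whereas a chart would show them as gaps.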
So all of these randomly generated numbers should be unique. Therefore, if we sort by them, the rows end up in an essentially arbitrary order, which shuffles the data. So now I've put all of the calories next to a column of these random numbers, so that when I sort by that column, it gives me a random shuffle. And then if I just pick the first 10, for example, I will have picked a random sample of the entire population. That's going to be the idea.

Now, I would like to do this multiple times, because I want to mirror the concept of running multiple tests, meaning taking a random sample multiple times. I'm going to try to do it 75 times. So I'm going to make one, two, three, four, five of these random generator tools, meaning I have all of the data in the entire population and a random number column next to it. I can sort each of these, and that sorting will shuffle them each time. So I'm going to shuffle each of these, which gives me five copies of the entire population, each randomly shuffled. And if I then want to make 75 samples, I can reshuffle these and take five samples at a time. That's one way I could do it, right?

So I could go over here and copy and paste that. Well, let's do it this way first; I can make it even a little bit easier. I can say that these are my samples now. This is just going to say equals, and then I'm pointing to this cell. So that nine. This whole column is simply pointing to the table, and it just goes down to 20 rows. So I'm just pointing to this cell, this cell, this cell, this cell, this cell. This is a formula referencing the table, and I just chose 20 out of however long the table goes. So now I've got a count of 20 that were randomly selected out of the entire population. And I have five different samples that are 20 long.
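The shuffle-by-random-column technique can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet itself: the `population` values are randomly generated stand-ins for the calorie column, the helper name `sample_by_random_key` is made up for this example, and the fixed seed is only there so reruns give the same shuffle (Excel's RAND() has no seed).

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population standing in for the full calorie column.
population = [rng.randint(1200, 3200) for _ in range(200)]

def sample_by_random_key(values, size, rng):
    """Pair each value with a random decimal, sort by it, keep the first rows."""
    keyed = [(rng.random(), v) for v in values]  # the RAND() column
    keyed.sort(key=lambda pair: pair[0])         # sorting by it shuffles the rows
    return [v for _, v in keyed[:size]]          # the first `size` rows are the sample

# Five shuffled copies of the population, taking the first 20 rows of each.
samples = [sample_by_random_key(population, 20, rng) for _ in range(5)]

for i, s in enumerate(samples, start=1):
    print(f"sample {i}: first three values {s[:3]}")
```

In Python, `random.sample(population, 20)` gives the same kind of sample (drawn without replacement) in one call; the longer version above is written out only to mirror the spreadsheet steps.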
Now, what I want to have is 75 samples. To make this whole table of 75 of them, I can do five at a time, right? I can copy this table and paste it here. When I paste it, I'm going to paste just the values, not the formulas. Then I'll have five randomly generated samples with 20 in each sample. And then I can do it again, right? I can reshuffle these, shuffle, shuffle, shuffle, shuffle, shuffle, which reshuffles this whole set. Then I can simply copy these again, and that would be another five, going out to 10. And then I can do it again.

It's kind of tedious, but this is a way that you can start to play with larger sets of tests in Excel to see the impact on things like what the histogram is going to look like. As you increase the sample size, you can play with these different random number generating techniques, and they can help you get a concept of what's going on.

So I shuffle them all again, copy these again, and paste them over here for the next five. I do that five at a time until I get to what I wanted, which was 75 samples, each of which is 20 randomly selected items.

So now we can imagine our tests and say, well, these are the results. If I take the average of each sample, I get my results down here. In this case, the data in each of these columns represents the calorie counts for one sample. If I take the average per column, here are the averages for these 75 samples of 20 that we took. Then I might want to put this in a vertical format. So I take all 75 of these averages and put them vertically, which we can do in Excel fairly quickly. So now I've put them in a vertical format.
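The "reshuffle, paste five, repeat" routine and the per-column averages could be sketched in Python like this. It is an illustration under assumptions: `population` is randomly generated stand-in data, and the constants `SAMPLE_SIZE`, `BATCH`, and `TARGET` are names introduced here for the 20, 5, and 75 mentioned in the transcript.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population standing in for the full calorie column.
population = [rng.randint(1200, 3200) for _ in range(200)]

SAMPLE_SIZE = 20  # rows kept from each shuffled copy
BATCH = 5         # shuffled copies per round
TARGET = 75       # total number of samples wanted

samples = []
while len(samples) < TARGET:
    # One round: reshuffle each of the five copies and "paste values".
    for _ in range(BATCH):
        shuffled = population[:]  # copy, so the original order is untouched
        rng.shuffle(shuffled)
        samples.append(shuffled[:SAMPLE_SIZE])

# One average per sample, like the row of column averages under the table,
# already stored "vertically" as a single list.
sample_means = [statistics.mean(s) for s in samples]

print(len(samples), "samples; first five means:",
      [round(m) for m in sample_means[:5]])
```

The loop runs 15 rounds of five, which is the same count of copy-and-paste rounds needed in the spreadsheet.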
And then I've compared them to what the actual figure was, which was 2,189. Remember, when we calculated the mean on the full data set, which we're imagining is the entire population, it was 2,189. Then we took 75 samples of 20 each to see how closely that data mirrors the actual total, right? So now I've got the averages for each of those 75, and I can compare them to the actual population average. You can see some of them are higher and some of them are lower, which is what we would expect. If I take an average of the averages, I come out to 2,210, which is pretty close to the 2,189.

And then, of course, we can build our histograms, so you can start to play with the pictorial representations. Now, this one is a histogram of a single sample of data. In this case, we took a histogram of sample 75, which has 20 items. So this is plotting out that sample's calorie counts in buckets from 538 to 1,155 and so on. The middle bucket here is 1,773 to 2,390, and we had eight that landed in it. Again, we know the actual mean for the entire population is 2,189. So that's for just one sample.

This one is also for just one sample. This is sample 74, with all 20 of its numbers. Obviously, the two histograms are not going to look exactly the same, because these are two different sets of 20, each randomly selected from the entire population. And then we took another one; this is number 73. So each one of these randomly selected samples gives a different histogram. And if you take larger samples, you'll get different shapes of histogram. Rather than taking 20, you can take as many as you want, 100 or 5,000. You can test it out in Excel, play with larger samples, and see what the adjustments to these histograms would be.
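The comparison of the 75 sample averages against the population average can be sketched as below. Everything here is illustrative: `population` is randomly generated stand-in data, so the numbers will not be the 2,189 and 2,210 from the transcript, and `random.sample` is used as a shortcut for the shuffle-and-take-the-first-20 technique.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population standing in for the full calorie column.
population = [rng.randint(1200, 3200) for _ in range(200)]
population_mean = statistics.mean(population)

# 75 samples of 20, each drawn without replacement, reduced to their averages.
sample_means = [statistics.mean(rng.sample(population, 20)) for _ in range(75)]

# How many sample averages landed above or below the population average?
above = sum(1 for m in sample_means if m > population_mean)
below = sum(1 for m in sample_means if m < population_mean)

# The "average of the averages".
mean_of_means = statistics.mean(sample_means)

print(f"population mean:  {population_mean:.1f}")
print(f"mean of means:    {mean_of_means:.1f}")
print(f"above / below:    {above} / {below}")
```

Individual sample averages scatter on both sides of the population average, while the average of all 75 tends to land much closer to it, which is the pattern described in the transcript.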
And then this one is a histogram of the averages. Remember that this column holds the average of each of the 75 samples of 20 that we took. If we take a histogram of those averages, here's what it looks like. So here's the bucket from about 1,897 to 2,039. Again, we would expect the middle point to be at 2,189, because that's the actual amount, right? And you can see it's starting to look something more like that. Then it's tapering off, and we've got it kind of skewed to the right for this histogram. And when I take the average of all the averages, that middle point is at 2,210.

So this is just another tool. We do this in Excel as well, if you want to see how to do this in Excel. You can start to play with these data sets, use your pictorial representations, and work up to fairly large data sets, so you can get an idea of what happens when you're using small numbers versus larger numbers. Then you can actually build your histograms and charts based on the results, and you're going to get a better intuitive feel for what is actually happening.

In future presentations, of course, we'll get into more specifics on how we can describe some of these things mathematically. But it's quite useful to play with this stuff in Excel, which allows you to work with fairly large amounts of data and see what the impacts would be when you make, say, histograms based on those results. It's also great practice for using Excel.
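One way to experiment with small versus larger samples outside of Excel is sketched below. It is an assumption-laden illustration: `population` is randomly generated stand-in data, the sample sizes 5, 20, and 80 are arbitrary choices, and the helper `sample_means` is a name introduced for this example. It measures how spread out the 75 sample averages are at each sample size, which is what a histogram of the averages shows visually.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population standing in for the full calorie column.
population = [rng.randint(1200, 3200) for _ in range(200)]

def sample_means(values, size, n_samples, rng):
    """Average of each of n_samples random samples drawn without replacement."""
    return [statistics.mean(rng.sample(values, size)) for _ in range(n_samples)]

spread_by_size = {}
for size in (5, 20, 80):
    means = sample_means(population, size, 75, rng)
    # Standard deviation of the sample averages: smaller means a narrower histogram.
    spread_by_size[size] = statistics.stdev(means)
    print(f"sample size {size:>2}: mean of means {statistics.mean(means):7.1f}, "
          f"spread {spread_by_size[size]:6.1f}")
```

Larger samples should give averages that cluster more tightly around the population average, so the histogram of averages narrows as the sample size grows.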