 Statistics and Excel, Standard Deviation and Variance for a Population, Calories Data. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel. First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one, because apparently the merchandisers, they don't want to be seen with us. But, but that's okay, whatever, because our merchandise is better than their stupid stuff anyways. Like our, trust me, I'm an accountant product line. Yeah, it's paramount that you let people know that you're an accountant, because apparently we're among the only ones equipped with the number crunching skills to answer society's current deep, complex and nuanced questions. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. Here we are in Excel. If you don't have access to this workbook, that's okay, because we'll basically build this from a blank worksheet. But if you do have access, three tabs down below, Example, Practice, Blank, Example, In Essence, Answer, Key, Practice tab, having pre-formatted cells, so you could get right to the heart of the practice problem. Blank tab, blank worksheet, but with just our data in it, so we can practice formatting the cells within Excel as we work through the practice problem. If you don't have access to this data, it's a pretty long data set, so it would be difficult to simply type it in there. You can look at online resources for sample data sets, such as Kaggle.com, K-A-G-G-L-E.com. Let's go to our example tab to get an idea of what we are doing. We're going to be looking at calories data, doing a similar kind of process that we have done in prior presentations, just working with different data sets now, and that we'll be doing our statistical calculations using mainly Excel functions. Then we'll do our histogram of the data, and then we'll break out in more detail, focusing in on the standard deviation and the variance of this data set. So let's go to the Blank tab to the right. I'm going to remove the Kaggle. I'm going to format the entire sheet like we do pretty much every time, putting my cursor on the triangle, right-clicking on those selected cells and formatting the cells. So let's go to the currency, then negative numbers. We're going to make them red and bracketed. I don't want any dollar sign. We don't need any decimals. Let's remove those as well and okay. So let's embold the whole thing, home tab, or I will. You don't have to. Fonts group, bold it. I think that might make it easier for the screencast. Holding Control, scrolling in a bit so we can see a little bit more detail. I'm currently at the 265 on the zoom-in. Let's put a table. Now note, by the way, that when I formatted the entire worksheet, it messed up the date field. So the date field is still there, but now I'm just going to reformat the date field to be a date. So I'm going to select column A and then go to the home tab, numbers group, dropping it down on the numbers. We want the short date and that should convert it back. Hold on. That's normal date. I want the short date and I'm going to make this a little bit larger. There it is. So let's hold Control and scroll down just a bit. So now I'm currently at 220 on the zoom-in. Now I'm going to put my cursor in the data and we're going to go to the insert tab up top, the tables, and then put a table around this data, the dancing ants doing their magic dance around the table, creating a table from it, allowing us then to sort the data by date, which is what it's currently sorted in or if we wanted to by the calorie count, which we can go from lowest to highest. So we have the lowest calories here on these dates where we didn't have any calories. We were like starved to death or just we didn't want to get out of bed those days or something. I don't know. And then we have the highest calorie counts from Z to A. Taking that data, let's make a skinny C column. That's what happens when you don't eat the calories. You get skinny and C. So you see what happens? You get skinny. So in any case, then let's do our normal calculations here. So we're going to have the mean or average calculation and that's going to give us the truth about the calorie count for us, even if it's mean, even if it's mean to do it, it's the facts, just the facts here. So then we're going to take the, let's take the minimum. Let's take the Q1. Let's take the median, median, which you could be called Q2. Let's take Q3. Let's take the max and then we'll do the standard deviation, which is going to be for the population, we're going to say, and that's represented by a sigma. I won't put the sigma in here, but then we've got the variance for the pop population, not your dad, but for a population versus a sample because we're working with population stats for the most part at this time. All right, let's do it. Let's do our calculations. I'm going to do this quickly because we've seen it in the past. We're just doing it with this data set, which is a fairly long data set here. We'll do a little bit longer data set next time, but you can see we have a pretty large sample and we can do our same calculations and the functions obviously quite helpful to get to the calculations quickly, but it still can be useful to break out in like a table format what is actually happening because that's another way to sort the data, another viewpoint at the data that could give us more insight. So let's say this is going to be equal to the average and I'm just going to say tab now to pick up the formula. So there it is and then I'm just going to put my dropdown arrow on the data, the dancing ants around the data and enter, so there's our average. I'm going to do these fairly quick equals the min, the smallest number, tab to get the function. My cursor is already in the section for the arrow dropdown and there it is. By the way, if I want to look at the min, I can sort from lowest to highest and there's those zero days, those zero days when we had no calories at all. I was starving, but whatever. And then it's all for the cause, man. Core tile, let's do the core tiles and we'll say this one is going to be the core tile. This one needs another argument, a comma, and we'll put a one next to it. Core tile one and then this is the median, which could be core tile two formula or the median for function tab, selecting the data. There's the median and then equals core tile number three, selecting the data. This meeting another argument there for a comma and core tile number three, the three next to it. And then the max when we maxed out on the calories. So the max selecting the, oh, hold on a sec, something happened. This is the max and I got to hit shift nine. I could do that to get my function ready to roll and then selecting my data. So that's when we ate a whole bunch of calories apparently. I don't know. I was just drinking, drinking bacon fat or something. Anyway, then we have the standard deviation on this one equals the standard and we're looking at the population data at this point. Let's take the standard deviation for the population P and select that. And then this will be equal to the stand the variance for the population. And we'll pick up that. Let's just do the same thing for the sample. So just to pick those up. If it were a sample standard deviation, Sam and variance, variance for the Sam. And this is going to be equal to the standard deviation for a sample. And this will be equal to the standard deviation for a I'm sorry, this will be equal to the variance. Variants for a sample VAR dot s. Okay, so there's all the stats that I get them right this time. These are the population population. This is for a sample versus a sample. If I look at this one, by the way, home tab number group and I add some decimals just to see some a little bit of a differentiation between the two calculations. All right, let's put some blue borders around this. I'm going to select our data and we'll go to the insert tab font group hit the bucket drop down. If you don't have that blue, it's in the more color in the wheel standard side and hitting that wheel and okay. And then we'll go to the font drop down, put some borders around this whole thing. Let's let's put a histogram in it now. So let's take our data, the calorie data, insert tab up top charts and hit the drop down, but not too hard. Don't break it and then go to the histogram. So there we have it. So there's our data. It's fairly fairly centered data a little bit, you know, skewed to the right here, but there's our pictorial format of the data. And so now let's do our standard deviation calculation using and our variance using like a table format. So we get an idea of what's actually happening with those because they're a bit more complex calculations. So we've got the variance there. Oh, hold on a sec. Variant. Now I hit the cap lock variance and I'm going to copy my formula over here. So there's the formula. I'll bring it down to like 12 on the font to 12 font. Now you can type in that formula. Remember with the insert tab, hit an equation and then I would use the ink thing so that you can actually kind of type it in here or hand write it in here, not type it in here. That's the point. And it'll start to do your formula up top. So I'm going to make this one orange as well. And we'll make this black and white up top for the header black and whites. Okay, and then we're going to say this is going to be the standard deviation form you lie and we'll put that in place making that 12 on the font as well. Orange in it, making it orange. That is that's what we do. That's what orange in it means when we orange it. And so there it is. Okay, so we can see that these two are related, of course, because this this whole bit here for the variance sigma squared is the same thing as what's under here on the standard deviation. And then we're going to take the square root of it to get the standard deviation. So let's do that calculation on more of a manual method, which although there's a whole lot of data points here is still pretty easy to do. So let's just copy this whole thing. I'm going to copy from column A to column B, select in those two columns, right clicking and copy. And let's put them on over here in column R, R and S are so we're going to say control V or just paste it. Let's make a skinny queue column, skinny queue. And then we'll just do our standard kind of thing here. So here's all of our individual points represented by X, X, I. So we're going to go from one all the way up to however many of them there are end of them that is. So we're going to say this is going to be the mean. So we're going to compare all of them to the mean. And if you're if you're over the mean, we're going to be mean about it and say that you're getting fat because you're over because you have you have more calories than the mean point which. Okay, that's not nice. So this is going to be equal to the average. Let's take the average of this. And then so there's the middle point, the mean of our data set. And then we're going to take the difference differ difference. And this is going to be equal to the calories that day versus the middle point the average and we're over on these days. But of course we're under on the days way down here because that is by definition the middle point and you would assume that we would be hovering somewhere around the middle point. Otherwise we would get amazingly large or amazingly skinny. You would think if we were on one side of the other for an extended period of time. So anyways, then we're going to take that squared. We're going to square that. So now we've done this bit and we're going to square them and then we will sum them up getting the numerator. So this equals the data points squared is a shift six the carrot to the power of two to the power of two. And there we have it. It's not quite as powerful as grayscale to the power of grayscale, whatever that means but still it's pretty effective. And then down below, let's put a total column and let's total this stuff up. I'm in my table. So I've got my table design up top in the table style options. Let's give ourselves a total column and then over here on the calories. Let's go ahead and just take the average. I'm going to recalculate the average just because we can. And then here let's do a count so I can count them meaning one, two, three, four, five on the line items 457 line items, which is a pretty fair amount of data but so easy to do and work with in Excel because of the functionality that Excel provides us. If we sum it up, it should still add up to zero because we're taking the difference of every data point from the mean, the mean being in essence that middle point. And then let's make this column a little bit wider so I can see what the number is. It's a huge number because we squared everything which got rid of the negative numbers but still is now it's all squared. So let's take that and complete our variance calculation and the standard deviation. So the squared difference from the mean is basically what we have here or the numerator in essence of our formula for the variance formula. Again if we if we divide that this thing so I can put a divide by the count which is n in our formula, which we calculated here the number of line items 457 equals the 457. Let's put an underline under that font group underline and then let's take the variance variance. And just so we know the symbol is a sigma squared oftentimes represented as let's go to the insert tab, go into our Greek lettering so we can be cool with the Greek stuff and Greek and Coptic. And so then we have I'm just I have it in my favorites down here but it's right there too. And then insert and then okay and then I like to hit enter and then go back into it then put a two then hold down shift and select the two or you could just select just the two right click and then format the cell and make it a subscript so I can get that squared notation looking nice like that. Then we'll do the division problem. This equals this number divided by this number the square difference of the mean divided by the number the count gives us the variance. Then we want the standard deviation standard deviation of the population population population of data. And we're going to say that this is going to be then the letter then would be sigma. Let's go to the insert tab symbols and add a sigma because that's the cool thing to do if you add those little symbols. People really think you know what you're talking about tell you what it's all you need to do and people are people will will say that your stuff is good man. So this is going to be the square root square root of that. And so there's the 815 if we add a couple of decimals home tab number decimal lies in it with a couple of decimals. There we have it. That's not really a word. Some people get mad that I use it. But I like to I think it should be a word and it will be at some point due to our use of it. So font group if I hit the drop down here let's make this blue and bordered. So there we have it. So this just another kind of example of us getting our calculation with a fairly long and different data set that than the salaries. And remember that which a lot of times what you would be doing with different data sets is basically be able to compare you know the variance and the standard deviation of this data set possibly to related data sets. You know if this was one population versus another population and those will give you some ideas about the spread some concepts that we'll get into in more detail in future presentations. But that's the 814.59 which we calculated over here as well the 815.59 the 814.59 and this was the for the sample for the sample data.