 Statistics and Excel. Standard deviation and variance for a population comparing two data sets related to weight. Get ready, taking a deep breath, holding it in for 10 seconds and looking forward to a smooth soothing Excel. First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us. But that's okay, whatever, because our merchandise is better than their stupid stuff anyways. Like our crunching numbers is my cardio product line. Now, I'm not saying that subscribing to this channel, crunching numbers with us, will make you thin, fit, and healthy or anything. However, it does seem like it worked for her, just saying. So, yeah, subscribe, hit the bell thing and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. Here we are in Excel. If you don't have access to this workbook, that's okay because we'll basically build this from a blank worksheet. But if you do have access, there's three tabs down below. Example, practice blank. Example, in essence, answer key. Practice tab, having pre-formatted cells so you can get right to the heart of the practice problem. Blank tab, blank worksheet with just our data in it so we can practice formatting the cells within Excel as we work through the practice problem. This is a long data set so it would be difficult to simply type in the data. But if you want to get practice data, one place you can go is Kaggle.com, K-A-G-G-L-D-L-E dot com, K-A-G-G-L-E dot com. Let's go to the example tab to get an idea of what we will be doing. We're going to start off in a similar fashion, putting a table around our data. We're dealing with data related to weight this time, which is kind of similar to the data we saw with heights. And that you would expect if you have a large data set that it would be distributed and kind of like a bell curve because we're dealing with something in nature in essence. So you would expect there would be some in the middle with most people's weights around here and for it to basically taper off and this type of pattern, which is what we see in part because we do have a fairly extensive data set. You can see this data set goes all the way down to 25,000. So we're going to work with a longer data set, do our similar process. That's one of the things we want to be practicing with. We'll do our calculations this way. We will also see our calculations for the variance and standard deviations using our table format. And then we will adjust our data set for the weights so that we can then have some comparisons between say the standard deviation of our adjusted data versus the original and we'll make our histogram will basically remove some of the data kind of in the middle and then compare basically the standard deviation and the variance with two data sets that are similar but different to data sets related to weight that are different, which is often what might be done in different scenarios in practice when you have a variance and standard deviation possibly comparing it to those calculations of related or similar data sets. Let's go to the blank tab and I'm going to remove the Kaggle thing here. Okay, so let's format the entire worksheet like we normally do. So I'm going to select the triangle, right click it on the data. We're going to format the cells and I'm going to go to the numbers, make them currency, negative numbers bracketed and red get rid of the dollar sign. This time we do have some decimals you'll see but we probably don't need four of them. So I'm going to break it down to just two decimals. I think that's going to be good enough for us. So that will include the two decimals. Notice that the index here now has two decimals. If I want to remove the two decimals on the index, I could select the index and then go to the home tab numbers and remove the decimals like so. Let's put a table in around our data. We'll put our cursor in the table here. We're going to go into the insert up top and I'm going to go to the tables and make a table around it. So we have the dancing ants working their magic and we're going to say, okay, so now we have our table. I'm going to make these headers rounded and make them white. I'm going to go to the home tab font group. Let's make them white alignment group. I'm going to wrap them and then center them. And then I'm going to go into the weights and make a space between these two so I can shorten up column B a little bit just to clean it up. And right now really we only really want the weight data. I don't really even need the index over here. So I could sort by the index if I want to or I could sort by the weight. We'll keep the index there because that helps us to sort by either one but we could sort by the index. We could sort by A to Z or from Z to A. Okay, let's make a skinny C column. We're going to make a skinny C and then we'll do our normal calculations on this very long data set. I'm making a larger D. We're going to say the first one is the mean or the average. And then we're going to say we've got the men. We got Q1 the median Q2 the max. And then we've got standard deviation for the population. I'll just say pop and then the variance for the pop. And then let's just do the standard deviation for the Samp and then the variance for the Samp. We're focused mainly on population calculations at this point but we'll do a comparison between the samples. Now we've done this before because we have it in a table although there's 25,000 entries. It's still quite easy to do these calculations because we're in Excel and we have it in a table. Let's make the whole, I'm going to make the whole thing bold, home tab, font group and bold. So we'll do this fairly quickly because we've seen them in the past. Equals the average tab and I could just select the entire data set because it's in a table with just the drop down even though again it's a very long data set. So there's the average. The men equals the M-I-N. I can hit shift nine. My cursor is already in the data set. So hitting the data set, there it is. Now if I want to see that men, I can sort from the smallest to highest and there's that 7801, quarter one or quartile one equals quartile tab. Then select our data. This one needs the second argument which we go to the second argument with a comma and we need a one for quartile one. Next is quartile two. This should be Q2 and this should be Q3 or the median. The median is more common, easier to implement because it only has one argument, median, and we'll pick that one up. And then this is going to be quartile three. So quartile selecting comma three, the second argument. The max equals the max and shift nine to get our argument and selecting the data. Enter if I want to see that max. Of course, I can simply sort this from z to a and see the max standard deviation for the population. So equals the standard for the pop S T D E V for the pop and selecting that data. Enter equals the variance for the population V a R dot P for the population. And then let's just do the same for the sample just to see the difference. Standard deviation for the sample S. And equals the population for I'm sorry, the variance V a R for the sample S. So there's our stats that we can if we get obviously as you can see we get good at that we can run these stats fairly quickly. Now in this case, the population and the standard and the sample are pretty close, right? So let's put our table around this. Let's put our blue borders around it. Home tab font group. I'm going to make it the bordered. And then if you don't have that blue right there, you can find that in the more colors so you can use a different color. But that's the color I use the color I use is the baby blues. Okay, and then let's add let's add a histogram. So I'm going to select my entire data set. Insert charts histogram. And it gives us a nice bell shaped histogram. We have a whole lot of data related to a natural type of thing, not height this time, but weight. But we have a similar distribution. This is comes from the same kind of source. I believe as the height data that we had where we had a lot of data, right? So it gives us this nice, this nice shape when we when we plot it out for a histogram. All right, now let's take that same data and apply our our variance and and standard deviation in a table type format, which is still fairly easy to do even though we have a whole lot of data because Excel. So let's do the variance calculation. Here's the formula for it. I'm just going to copy and paste it here. And let's put bring this down to like 12. And so there we have it now. I won't type that in, but you can type it if you want insert equation. I would use the ink thing so you can type in your equation and it kind of like does it for you, which is super duper. It's super duper. Let's hit the let's make this orange and I'll make this a header. Home tab font group making this black and white on the header. And if I had the standard deviation for the population standard deviation deviation. This is a standard deviation. I deviate but only in a standard way, which is okay. So let's go ahead and make that orange popular make this home tab font black and white. Okay, so as we can see, the variance is basically what's under here for the standard deviation. And then we're going to take the square root of it. So as we calculate the standard deviation, we will hit the variance basically as a pit stop along the way. And sometimes the variance can be a useful tool as well. So it's no it's it's useful to note it out, not just as a pit stop, right? That's it's it's could be a useful tool. So let's make a skinny T column here. Skinny T. Mr. T's skinny. It's like Mr. T got skinny. And then we're going to say that we have let's go with column B right clicking and copy column B. And then we're going to put that in column you right click and paste. And so let's put a table around this now because I didn't put a table around it by the way, because I just selected part of the table. So now I'm going to insert a table because I don't need the index insert table. Make a table around it dancing ants table implemented. Let's make this the you needs to be skinny again. The you took on a lot of pounds right there got wide. But then we just but then it was easy to take it off again for so because it's really good at the flash diet. So the mean calculation equals the average. So we're going to compare we're doing the top part here comparing each line item to the mu. The mean or average so I can just take the average of the whole thing. And although there's 25,000 line items, it's it's able to do it's able to do this. So then we're going to take the difference and we'll say the difference of each line item is going to be equal to. Notice my computer has to think a little bit because there's a lot of data here. So compared to what we've been doing before at least. And so there it is. And so now we have the difference of everything from the mean. So if I go all the way down here, we have a positive difference. But if I go all the way down to the bottom, we're going to have negative differences and the sum of the differences would add up to zero. But we want to make them positive not with an absolute value, but by squaring them squared. And so I'll say boom. And so we square by saying equals that cell, the difference carrot, which is to the power of to the power of two. And then there it is. So now let's go to the bottom of the table and we'll do our totals and stuff. So down here, I'm going to be in the table, table design. Let's add a total and then within the total, it's kind of thinking. Give it a second, give it a second. We're going to we're going to say on these ones, let's take the count. I'm going to count these. Let's sum these up. Now notice there's 25,000, right? That's a that's a lot of data compared to what we've been doing before. Let's go ahead and some of these that should still come out to zero because the middle point is the mean and we took the difference from every middle point. And then this is the sum of all the squared items, which takes away the negative numbers, but squares them and gets to this quite large number. All right, so now let's calculate our variance. So I'm going to take this is the squared difference from the mean, which is what we got to here. And then we're going to divide that by the count, which is represented by N in our formula, which we calculated up top at the 225,000. And then the variance, which I could represent insert symbol, Greek and Coptic, Sigma, insert, close, enter going back into it. Type in a two, selecting the two, right clicking on it, formatting it and subscript for the cool format of the two square or the Sigma squared. This divided by the count gives us our our variance underline here, home tab, font group underline. And then we're going to get to our standard deviation represented by insert symbol, Sigma without the squared close, which is equal to the square root of that. And so we get the 1166 that 1166 and the 135 97 should be if I go all the way over here and all the way back up 1166 135 97. Right. So now, so there we have it. So so now we've seen that a couple of different ways. Now let's let's kind of change our data a bit. Let's go down here and make this blue and bordered. So I'm going to take I'm going to take my same data set and then and then adjust it a little bit. And and then so we can make a comparison between the standard deviation of two data sets related to weight, right? So let's imagine we have our standard data set. I'm going to copy column B again, right click and copy it. And then I'm going to go back on over to column Z, right click and paste. Then I'm going to insert my table, putting my cursor somewhere in the table, insert table. And I'm just going to make another table around it and OK. It made it fat and we're going to skinify it again. We got to take the weight down and then let's sort it this time from A to Z. And then I'm going to remove some stuff kind of in the middle of the table. So I'm going to start at the 10,000. I'm just going to look at the numbers on the left. I'm going to scroll down and try to get to around 10,000 in our data, which is somewhere in the middle because there's 25,000 data points. Once I get somewhat close, I can hit the page down might be the easiest thing to do. And then we have let's go from this 10,000. I'm just randomly picking a number in the middle. And I'm going to delete from 10,000 down to let's say 15,000. So I'm looking at the left hand side and I'm looking to like it all the way down to 15,000, which could take a little bit of time here because we have a fairly long data set. But we're close there. So let's go down to 15. We're almost there right there. So it doesn't have to be exact, which is because the idea is we're just going to remove some of this data, a good chunk of it in the middle, right click and delete the data. And we're going to say remove the rows. So now the data, if I put my cursor into the table and say control shift down arrow, we only have 20,000 data as opposed to around according to this as opposed to the 25,000. So now if I go up top and we do our same calculations, I'm going to make a skinny a double a skinny and then go into the left. It's a skinny double a and then we'll select our our stuff here. And then actually, let's just select the names. I'm just going to pick up the names and do our calculations quickly again with our modified data set. So the idea being that we have a similar data set with similar related data, but different. So I'm not going to say control V or paste it, make a larger a B and then we'll do our same calculations. I'll do them fairly quickly because we've seen them before equals the average average. Maybe not too quick because I can't do it average tab selecting the data and enter equals the men. And selecting the data and enter equals q quartile one picking up the data comma second argument number one for quartile one quartile two or median median is easier to type in tap picking up the data and enter equals quartile three picking up the data need another argument comma three for quartile three enter equals the max shift nine picking up the data and enter equals the standard STD EV for the P and equals the data. And this one is going to equal the variance the VAR dot P for the data. And then let's just do the same thing for the sample standard deviation STD V for dot s for the sample. And then equals STD dot EV dot s for the variance. Alright, so there's our numbers. Let's put some blue borders around this font group blue and bordered. Now the point just want to get to here which will dive into a little bit more in future a lot more in future presentations but notice that the mean all these numbers to mean the men quartile one, two, three max give us some indication about the data basically in and of themselves. But sometimes it's a little bit more difficult to think about the data related to the standard deviation and the variance. Unless we're comparing like similar data sets right so be so so so we might have two data sets related to similar items to different populations of data related to weight and then the variance is going to become and the standard deviation might be more relevant or helpful in those cases right so if I look at these numbers. I have a larger variance here and you can see that if I was to go to the insert tab up top and charts and make my histogram and hold on a sec I didn't select the data let's select the data and then insert charts histogram. And so now we have this histogram and you can see what we did I just basically deleted the data out of the middle of it. So because we have the data on the sides you would you would expect the average the mean might some be somewhat still similar right because we got the 12706 and we got the 12708 and then we can look at each of these points but because of the dispersion of the data if I compare say the standard deviation and the variance standard deviation is 13 variance 169. So standard deviation over here is 11 and the variance is the 135 so you would expect then because there's greater spread around the mean that the average is still going to be somewhere in the middle but you have more data that's basically off to the sides in our second data set and that's what these terms are typically trying to give us they're trying to give us. A representation of the array of the data sets around the middle point around the mean kind of how spread out the data is in one number is the kind of the idea. All right, so then we can we can of course do the same thing and calculate these numbers using our our second approach. I'm going to just do this fairly quickly I'll select the entire Z column. Last time we'll do this for this section I believe and then I'll paste this over here and column a n and I'll compare this to the mean and then we're going to say the mean and this equals the average selecting the entire thing. Okay, hold on a second. I messed going too fast. Hold your horse. Hold your horse. Average. Okay, and then the difference. I don't ride a horse man I've got. Like a motorcycle now. Doesn't need to be held. My horse has no feelings it's a motorcycle. Okay, then this is going to be the squared squared, which will be equal to which will be equal to this carrot shift six, the power of two. So now we've done our calculations. Let's pick up the bottom part of this one. I'm just going to take the same data to calculate the variance. Let's go control shift down arrow to get down to the bottom. And I'm just going to pick up this whole calculation control C or copy it and bring it on over to our new space. I'm going to I'm going to go all the way back up to the top so I can see what's going on. And let's go over here. I'm in my data control shift down arrow to get to the bottom of it. And then I'm just going to paste down here right click and paste this information and then I'll just enter the data. The squared information here was this the count is now equal to we don't have a count. Hold on a second. Let's pull this down. I'm going to select this data, pull it down. I'm going to add my totals, which is table, table design totals. And then let's add the count here counting it. Sum it up. It'll still add up to zero here just for our double check. And this is the sum of the squared items widening the squared area. I'm going to pull it back up a bit. Let's take this and pull it back up a bit. And now this equals that. And the count is equal to this. And it divided it out properly and took the square root properly. So there's the 16925 and the 1301, which should tie out to what we did over here, 1301, 16925. So there, so, so there it is. So we've practiced, you know, these calculations with multiple, multiple data sets.