Statistics and Excel: standard deviation and variance for a population, comparing two data sets related to weight. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but at first a word from our sponsor. Actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, because our merchandise is better than their stuff anyway, like our "crunching numbers is my cardio" product line. Now, I'm not saying that subscribing to this channel and crunching numbers with us will make you thin, fit, and healthy or anything. However, it does seem like it worked for her, just saying. So subscribe, hit the bell thing, and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. If you have access to OneNote, we're in the OneNote presentation on the left-hand side, in the "1467 standard deviation and variance for a population comparing two data sets of weight" tab. We're also uploading transcripts, so you can go to the View tab and the Immersive Reader tool, change the language if you so choose, and read or listen to the transcripts in multiple different languages, using the timestamps to tie in to the video presentation. Here in the OneNote desktop version, we have data on the left-hand side related to weight. In prior presentations, we've been thinking about how to take our data sets, summarize them, and represent them in ways that let us draw meaning from the data, using both numerical calculations and pictorial representations. Numerical calculations include our standard statistics such as the mean or average, quartile one, the median, quartile three, and so on. Pictorial representations include the box and whiskers or box plot as well as the histogram.
Now we're concerned more with presentations of the spread of the data, the histogram giving a very nice pictorial representation of that spread. The numerical representations we're working on are the standard deviation and the variance. So here's our data on the left-hand side. We're mainly focused on the weight information here. Also note that we have not included the entire data set, because it was quite a long data set; we're just showing a snippet to give an example of the process. Then we do our standard calculations, which in a very long data set would get quite tedious by hand but are quite easy using Excel. We calculate the mean or average with the AVERAGE function, adding up all the data and dividing by the number of data points. The MIN gives the smallest number in our data set, 78.01. Quartile one, the middle point of the first quartile, we get with the QUARTILE.INC function in Excel, with the second argument set to 1 to pick the first quartile. The median is the second quartile; we could use the same formula and put a 2 there, but it's easier to use MEDIAN, which picks up the middle point as if we sorted the data from top to bottom and chose the one in the middle, which again could be quite tedious in a data set as long as this one is in practice before we snipped it. Quartile number three, the middle of the third quartile, uses the same QUARTILE.INC formula, except we now pick quartile 3 with the second argument. The MAX gives the largest number in our data set, 170.92. Then we have the standard deviation, our point of focus, at 11.66. It's as easy to get in Excel as entering the function =STDEV.P, the .P representing the population as opposed to .S for the sample. We're imagining we have all the data for the entire population at this point.
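For anyone who wants to sanity-check these Excel functions outside a spreadsheet, here is a minimal sketch using Python's standard `statistics` module to compute the same summary numbers. The weight values below are made-up stand-ins, not the video's actual 25,000-point data set.

```python
import statistics

# Made-up stand-in for the weight column (the real set has 25,000 points).
weights = [78.01, 105.2, 118.6, 124.1, 127.1, 130.4, 136.8, 149.9, 170.92]

mean  = statistics.mean(weights)       # Excel: =AVERAGE(range)
low   = min(weights)                   # Excel: =MIN(range)
# method='inclusive' matches Excel's QUARTILE.INC interpolation.
q1, q2, q3 = statistics.quantiles(weights, n=4, method='inclusive')
med   = statistics.median(weights)     # Excel: =MEDIAN(range); same as q2
high  = max(weights)                   # Excel: =MAX(range)
sd_p  = statistics.pstdev(weights)     # Excel: =STDEV.P(range)
var_p = statistics.pvariance(weights)  # Excel: =VAR.P(range)
```

As in the spreadsheet, the median and the second quartile agree, and the population standard deviation is just the square root of the population variance.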
We'll talk about samples in a future presentation. The variance is =VAR.P, a similar kind of thing with the .P versus the .S, population versus sample. And then we've got the standard deviation for the sample just as a comparison note, so you can see those two formulas down here; if it was a sample, you'd use .S instead of .P. Okay, if we look at our data, we can see we've got pretty smooth data because this is a large data set related to weight, which is kind of similar, I believe, to the height data we had before, where we also had a very large data set. Because we're dealing with something found in nature, you would expect this kind of distribution if we have a very large sample: most of the items at the center point, tapering off to either side in a fairly uniform way. Then we've got the variance and the standard deviation in formula format. Even though we entered the function in Excel and got the answer, it's sometimes useful to do the calculations and look at the steps along the way, if nothing else to get an idea of what the function is telling us. The variance is represented by sigma squared, and here's the formula for it. You can see that the whole variance is part of the standard deviation: it's everything under the square root. So as we calculate the standard deviation, we're basically doing the variance along the way. If we do this in a more manual method, in a way where we can see what is happening, we take each of these data points, which would be quite tedious by hand because it's a long data set but is pretty easy in Excel, and subtract the middle point, the mean, which was the 127.08 we calculated before. This gives the distance of each data point from the mean. Now, we've got this whole long data set.
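Since the .P versus .S contrast keeps coming up, here is a quick illustration of the only difference between them, the divisor. The numbers are placeholders chosen to make the arithmetic easy, not values from the video.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # placeholder values, mean = 5

# Population statistics divide the summed squared deviations by N ...
pop_var = statistics.pvariance(data)  # Excel: =VAR.P -> 32 / 8 = 4.0 here
# ... while sample statistics divide by N - 1, so they come out larger.
samp_var = statistics.variance(data)  # Excel: =VAR.S -> 32 / 7 here
```

The sample versions always come out a bit bigger than the population versions on the same numbers, which is exactly the comparison-note pattern shown in the spreadsheet.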
We snipped part of it, but if you added all of those differences up, there would be positive and negative numbers, and they would add up to zero. They don't here because we only show part of the data set. But remember the characteristic: if the mean is the middle point, the fulcrum point, and I take every piece of data minus that middle point, then the sum of the differences will be zero. Again, you can't see the negative numbers here because we just have part of the data set. Then we square the differences, all of them, which removes the negative signs. And then in our calculation down below, we sum those squares up. Remember, this is not all of the data set, but if I had all of it, it would sum to this; you could check it out in Excel if you would like to see the full data set and work with it yourself. There's the sum of squared differences, and then we divide by the count. The full data set had 25,000 points in it, so counting all the rows of the full data set gives 25,000, and that gives us a variance of 135.97. If I take the square root of that, we get to the 11.66. So there's our 11.66 that we calculated using the function over here. But you'll note that doing it this way gives you a pretty nice intuitive sense of what is going on with the data set. In the next data set, what I have done is taken the same data set of weight but removed some of the numbers in the middle, resulting in two data sets that are very similar but different. The reason this could be useful is that when we look at our statistics over here, the average, the min, quartile one, the median, quartile three, and the maximum give us a sense in and of themselves about the data set.
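The manual steps just walked through (differences from the mean, squaring, summing, dividing by the count, square root) can be sketched like this. The weights are stand-in values, but the final checks confirm the hand-rolled result matches the built-in population functions:

```python
import math
import statistics

weights = [78.01, 105.2, 118.6, 127.1, 136.8, 149.9, 170.92]  # stand-in data

mean = sum(weights) / len(weights)
diffs = [w - mean for w in weights]     # distance of each point from the mean
assert abs(sum(diffs)) < 1e-9           # the differences always sum to zero
squared = [d * d for d in diffs]        # squaring removes the negative signs
variance = sum(squared) / len(weights)  # divide by the count (population N)
std_dev = math.sqrt(variance)           # sigma is the square root of variance

assert math.isclose(variance, statistics.pvariance(weights))
assert math.isclose(std_dev, statistics.pstdev(weights))
```

This is the same flow as the spreadsheet columns: a difference column, a squared-difference column, a sum, a divide, and a square root.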
But the standard deviation, and particularly the variance, are often more difficult for us to visualize in and of themselves, and they sometimes become more relevant when we're comparing two separate populations. If we had two different populations and took these two numbers, that's sometimes a way for us to see where we might use those numbers in practice, such as a variance-type number. Remember the general idea with the standard deviation: the smaller the standard deviation, the less spread you would expect around that middle point, the fulcrum point, the average; the greater the number, the more spread. In this case, I've removed a lot of the numbers in the middle, the numbers closest to that middle point of the spread. So that should increase, you would think, the standard deviation and the variance. Let's check it out. If I look at these numbers and compare the new data set versus the old data set, we come out to a pretty similar mean or average, 127.06 versus 127.08. The minimum is the same; that's because I deleted data in the middle, so we still have the same bottom point. Quartile 1 is different but not substantially different. The median, the middle of the data set, is 124.19 versus 127, again different but not hugely different. Quartile 3 is 136.8 versus 134.89, pretty close still. The maximum is the same, because I deleted numbers in the middle, so the same maximum remains. But the standard deviation is now higher: 13.01 versus 11.66, which gives us an idea of a difference in the spread that we might not have gotten when looking at just the numbers up top. Those might not have given me as much of a sense as the standard deviation does of possibly more spread in the data. The same goes for the variance: 169.25 versus 135.97, another measure of the spread.
And that should give us an indication that there's something more going on with regard to the spread of the data than we would have picked up with just the first numbers up top. Then if I look at the sample versions, shown here just as a comparison, they also come out higher if I use the sample calculation instead of the population calculation. And here's the actual histogram, so you can see what happened: I deleted a lot of the data in the middle. So notice, if I think about what would have happened with my numbers: you would think the average would still be pretty close to what it was, because you've got these two sides that still average out to something close to the mean. And the minimum and the maximum are the same, because those outliers way out at the ends were not deleted, so that makes sense. Then if you look at the quartiles, they're pretty close. You would think the quartiles might be substantially different if I did this in a small data set, but because the data set was so large, deleting a whole lot of data in the middle didn't really impact the quartiles much, if you just pick the number in the middle of the first, second, and third quartile. So you still end up pretty close on the quartiles, closer than you might think. But the standard deviation does give us an idea. You might say, well, these all look similar, but then the standard deviation tells us that the spread looks significantly different, as does the variance. And obviously that's represented here in the pictorial representation. Now, just to wrap this up, I did the same kind of calculation over here, taking the weights of my new data set where I deleted the numbers in the middle.
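The effect described above, where hollowing out the middle of a distribution leaves the mean, minimum, and maximum roughly alone but pushes the spread measures up, is easy to reproduce with toy numbers. These are not the video's weights, just a symmetric stand-in range:

```python
import statistics

full = list(range(100, 156))                        # symmetric toy "weights"
trimmed = [x for x in full if not 120 <= x <= 135]  # delete the middle values

# The centre and the endpoints barely notice the deletion ...
same_mean = statistics.mean(full) == statistics.mean(trimmed)
same_ends = (min(full), max(full)) == (min(trimmed), max(trimmed))
# ... but both spread measures jump once the middle is hollowed out.
wider_sd  = statistics.pstdev(trimmed) > statistics.pstdev(full)
wider_var = statistics.pvariance(trimmed) > statistics.pvariance(full)
```

Because the deleted block sits symmetrically around the mean, the mean is unchanged and the extremes are untouched, yet the remaining points sit farther from the center on average, so the standard deviation and variance both rise, just as in the weight comparison.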
I subtracted out the mean or middle point, which is the 127.06 now. Here are our differences, and if I added them all up, they would add up to zero. I don't have all of them here because it's a very long data set, but they would still sum to zero with the new data set. Then I square all of them; this is the squared amount. If I added up all the squared amounts, I would get to this, and then I take that and divide it by the count, which is now 19,999 instead of the 25,000 data points we had before. So we removed a significant amount of the data. That gives a variance of 169.25, and the square root of that is the 13.01. The 13.01 and the 169.25 match what we got with the functions over here.