 Statistics and Excel. Average deviation, standard deviation, and variance for population with salary data. Got data? Let's get stuck into it with statistics and Excel. First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us. But that's okay whatever, because our merchandise is better than their stupid stuff anyways. Like our Accounting Rocks product line. If you're not crunching cords using Excel, you're doing it wrong. A must-have product, because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement, the obligation, a duty to share the tools necessary to properly channel the creative muse. And the muse, she rarely speaks more clearly than through the beautiful symmetry of spreadsheets. So get the shirt, because the creative muse, she could use a new pair of shoes. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. You're not required to, but if you have access to OneNote, we're in the icon left-hand side. OneNote presentation, 1442, average deviation, standard deviation, and variance for population with salary data tab. We're also uploading transcripts to OneNote so that you can go to the View tab, Immersive Reader Tool, change the language if you so choose, be able to read or listen to the transcript in multiple languages using the timestamps to tie in to the video presentations. OneNote desktop version here, data on the left-hand side, imagining it to be salary or income information for a corporation or business. In prior presentations, we've been focusing in on the standard deviation as well as the variance as the new concepts we're focusing in on. In addition to some of the prior standard statistical concepts we've been working on, including things like the mean or average, the men, quartile one, quartile three, and so on and so forth. Remembering that the variance and the standard deviation have to do with an attempt to get a more numerical representation of, say, the spread of the data, how the data is being centered and populated around the middle point, average, or mean. So here's our data on the left-hand side. If we do some of our standard statistics calculations starting out with the mean or the average, that's the one that just takes all of the numbers, adds them up, and then divides by the count. We have a formula in Excel, which is simply the average, gives us the 71, 498. If we take the minimum number, that if we line up all our data from smallest to largest, or largest to smallest, is the smallest number. Excel can give us that quite quickly with an equal min function. If we take quartile one, basically the middle point of the first quartile of the data, similar to, like the median, we get the 69, 8. Excel can calculate that with a quartile function. However, we need a separate argument to give us number one to represent quartile number one. And then we can calculate the median, which is also quartile two. It's easier to use the function of the median calculation, which is where we sort the data from small to high, and we pick the one in the middle. And then we've got quartile three, which is the similar concept. But for the last quartile, and we can do that in Excel with the sub-calculation of three, or the second argument, I should say, of three. And then we have the maximum, the largest number in the data set, which we can see here is 84, because it's sorted from top to bottom. So these are some of our standard stats, great stats, but they don't really give us a great representation of the spread of the data around, say, the center point. Now, if we take the average can also be calculated as taking the sum of all of our data, which we could do with a trustee sum function, dividing by the number of items, one, two, three, four, and so on, which were represented by N, which we can use the count function in Excel instead of counting them manually. And that would give us that 71, 498 as well. Now we're thinking of, if that's the mean or middle point, we're kind of thinking about the standard deviation and the variance giving us a concept of the population of the data and how close it is to the center point. Now, if we look at the data, this is going to be a histogram of the data, which does give us a pictorial representation of basically how close and things are to the center point. So remember that if I'm taking the 71, 498 as the center point and I look at my histogram, 71 is somewhere around here. If I think of that as the focal point of a teeter totter, then you can get an idea of the spread of the data around that point, although it's a pictorial representation. And we want to get like a numerical representation of basically the standard deviation kind of like the average differences from the middle point. And so we could say, all right, so that we talked about that we first talked about that in an average deviation calculation, right? So I can say that's the first concept that we thought about. So let's let's do it with an average deviation and then we'll move to the standard deviation. So remember with this more complex data set, more numbers than the last example that we had, we can say, okay, well, that I would like to get a number that gives me an idea of the distance from that center point. How can I do that? Well, I can take each of the data points minus the middle point or mean represented by the mu and then take the absolute value because I'm going to end up with negative numbers and divide by n. Let's see what that would look like in a table format. If I put this into a table, I can say, okay, let's do it this way. Here's my data set. Here's my salaries. I'm going to compare that then to the middle point or mean. I'm going to compare each one of them to that 71 498, which we calculated with the average calculation over here. And then I'm going to get each data points distance from that middle point. So this one is 12502. I'm just subtracting 87 84000 minus the 74 498 to get the 12509. Some of those distances will be will end up in a positive because I took 84000 minus the 70, but some of them will be negative because because it depends on, you know, if it's over or under to the middle point. So that means if I if I sum up this column, those differences, I come out to zero. I'm always going to come out to zero. So although this is a useful tool to tell me each individual data point and how close it is to the middle point, it doesn't really help me to sum them up. So you might think first, I'll just take the absolute value maybe and that will help me to to then get a positive number. So now all those negative numbers, I in essence simply make them positive. This number now representing the distance from the middle point as opposed to a positive or negative distance above or below the middle point. So then if I add these up, then I get to 97,090 and then I can take that and divide by the count, which is represented by the 51 here. I could just sum up or I could just count all of these numbers, all of the rows and that'll give that'll give me the 190373. So this is the more intuitive calculation. Now in practice, we don't typically use the average deviation because we're going to use the standard deviation. But this gives us kind of the first thing that we would kind of think of if we were trying to work this out. So then let's look at the variance and the standard deviation. These two calculations being related where I might say, okay, now I'm going to take each number in our data minus the average again. But instead of taking the absolute value, I'm going to square it, which also has the character, the benefit of removing the negative numbers. But it also makes a large number because I've squared them all and then I'm going to divide by n. And so that gives us the variance represented by sigma squared and then I can take the square root to get the standard deviation. So and notice that these two calculations can be represented in Excel, right? Because we could do a function for them, but it's still useful sometimes to do the calculations basically manually. So you'd have the same starting point. I take all of my data in the data set. I'm going to subtract them or compare them to the middle point, the mean. And then I get my distances from the middle point, positive and negative. If I sum them up by add up to zero, just the same thing. But now I'm going to square those items. So you get, you end up with some larger numbers. Now, if you did this with your calculator, you might want a scientific calculator to do this, switching over to the scientific calculator. If I take this, for example, one, two, five, oh, two, and I square it, that's getting us to the 156. Now, it doesn't come out exactly because these are rounded numbers in the first place. So this number is a little bit different because I have decimals involved, but that's the idea. We'll do it in Excel if you want to check it out there. And if I took, the point is if I took a negative number as well, like a negative 198, and then I took that and I squared it. So that's coming out to 39 about because again, there's rounding involved. So now I can add all those up. And if I sum up all of this data, I get to a much larger number, but at least it's positive. And if I take that number and divide by the count, this is how many rows there are. So if I take this, and again, it's rounded, but if I take this 385949804 divided by the count of 51, we get to this 7567643 and then I take the square root of that. So this is the variance, which could be useful. And so then I'm going to take that and take the square root. And so that gives me the 2750. And notice that number is higher than the 190373. We did with a more intuitive kind of method. Now, of course, there's formulas for this in Excel. Standard deviation formula in Excel would be the stdev.p. And this is for the population, this is the variance formula. This is the standard deviation for the population, not a sample. And then if I want the variance, I can use the var.p. Sometimes, again, it's useful to actually do the table with it because that gives you a better understanding of what's actually happening and sometimes just scrolling through the numbers can give you another pictorial kind of representation of the data as well. So you can add these two calculations to our standard data set and calculate it with an Excel. Now, later, we'll talk about the calculations for a sample, which is a similar formula in Excel. The standard deviation for a sample versus the population would be equal stdev.s for the sample. And so you get a different number. We'll talk more about that later. And then the variance, which is var.s for the sample versus p for the population. So you can calculate those in Excel. So we'll get more deep into the standard deviation and the variance in future presentations. But just note that it does seem like a more abstract number. Remember, if I get to that, the variance of 7567643, you're like, well, what does that mean? Whereas if I get the average, I know what that means, right, and so on. But if you were to compare this number for this set of data for this corporation, this whole population of data for this corporation versus another corporation, for example, then the comparisons of the numbers relative could give you meaning as well. And again, we'll talk more about standard deviation and the use of them in future presentations. But notice it is a standard building block that we have to understand and calculate, even though when I compare it to the use of something like the mean or the quartile one and those kind of things, I think these are more self-explanatory and intuitive. And the concept of the variance and the standard deviation takes a little bit more thought to really comprehend what it is telling you and how you might use that in practice.