Statistics and Excel. Population variance and standard deviation. Got data? Let's get stuck into it with statistics and Excel. First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, because our merchandise is better than their stupid stuff anyway, like our "Trust me, I'm an accountant" product line. It's paramount that you let people know you're an accountant, because apparently we're among the only ones equipped with the number-crunching skills to answer society's current deep, complex, and nuanced questions. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. You're not required to, but if you have access to OneNote, look in the icon on the left-hand side for the OneNote presentation, tab 1432 Population Variance and Standard Deviation. We're also uploading transcripts, so you can go into the View tab and use the Immersive Reader tool, changing the language if you choose, to read or listen to the transcripts in multiple languages, using the timestamps to tie in to the video presentations. Here we're in the OneNote desktop version, with the data on the left-hand side. Remember that in prior presentations we've been thinking about how to take a data set and summarize it, representing it in meaningful ways with both numerical and pictorial representations. The numerical representations include standard statistics like the mean (average), the median, quartile one, quartile three, and so on. For pictorial representations, we talked about the box and whiskers plot, or box plot, as well as the histogram. Each of these tools has its uses. However, we now want to think more about the spread of the data around a center point. The histogram does give us an idea of that pictorially.
However, we would also like a numerical representation of the spread. Last time we looked at an average deviation concept, and it's useful to think about it intuitively, so let's give a quick recap before moving on to the more standard calculations: the variance and the standard deviation. Our simple data set has just four points: negative six, negative four, positive four, and positive six, which add up to zero. Taking the average, we get zero, because the values sum to zero and zero divided by four is still zero. So zero is our middle point. The average deviation is not the standard formula for measuring spread, but the intuitive calculation from last time simply compares each data point to the middle point. That's useful because it gives the difference of each data point from the middle point, so we can see where each point lies in relation to the mean. That's the top part of our average deviation equation, x sub i minus mu. But if we sum these differences, we always get zero, so the sum isn't a meaningful number. What we can do is take the absolute value, which means we don't care whether a point is above or below the middle point; we just want its distance. Summing the absolute values gives us 20, and dividing that 20 by the number of items, four, gives us five. That's a simple way to get an average distance: we're averaging the distances from the middle point, a kind of average deviation. That's the formula for the average deviation, and the variance and standard deviation put a slight twist on it.
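As a minimal sketch of the recap above (Python is our choice here; the presentation itself works in OneNote and Excel), the average deviation for the four-point data set can be computed like this. The function names `mean` and `average_deviation` are illustrative, not taken from the presentation.

```python
# Minimal sketch of the "average deviation" (mean absolute deviation),
# assuming the four-point data set -6, -4, 4, 6 from the walkthrough.

def mean(values):
    """Arithmetic mean: sum of the values divided by the count."""
    return sum(values) / len(values)


def average_deviation(values):
    """Average of the absolute distances of each value from the mean."""
    mu = mean(values)
    return sum(abs(x - mu) for x in values) / len(values)


data = [-6, -4, 4, 6]
mu = mean(data)
raw_diffs = [x - mu for x in data]        # signed differences from the mean

print(mu)                                  # 0.0
print(sum(raw_diffs))                      # 0.0, the signed differences cancel
print(sum(abs(d) for d in raw_diffs))      # 20.0
print(average_deviation(data))             # 5.0
```

The signed differences always cancel out around the mean, which is why the absolute value (or, next, the square) is needed before summing.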
Remember, the point of the absolute value is that we can't keep the negative numbers, because then everything adds up to zero. We have to make them positive, so we take the absolute value and then divide by n. Instead of doing that, the variance takes the square of x sub i minus mu, where mu represents the mean, and divides by n; for the standard deviation, we then take the square root. Looking at the image, by the way, this is a histogram: the middle point is zero, the negative numbers are on the left side, and the positive numbers are on the right, as if we plotted just those four points on a histogram. So now we can look at the variance and the standard deviation. Both are useful, because sometimes the variance gives us the relevant information rather than the standard deviation. The standard deviation is probably the first thing that comes to mind when you want a numerical measure of the spread of the data around a center point, rather than a picture like a histogram. But the variance is useful too, and if you're calculating the standard deviation, you're going to calculate the variance anyway, because it's one of the steps. Notice that the variance is represented with sigma squared, and the standard deviation is often represented by just sigma, as we'll see in a second. For the variance, sigma squared, we take the sum from i equals 1 to n, where x sub i represents each number in our data set (we have four numbers), minus mu, which represents the mean.
Then, instead of taking the absolute value of that difference as we did last time, we square it. Notice what squaring does. It does what the absolute value did, in that it removes the negative signs: squaring a negative number gives a positive number, and squaring a positive number also gives a positive number. But it also introduces a problem: now everything is bigger, because it's all been squared. Then we divide by n, the count, just as we did with the average deviation. The standard deviation takes this one step further: we take the square root of the entire variance expression. That's why the variance is often represented with sigma squared, whereas the standard deviation is represented with just sigma; everything under the square root is the same. You can think of it this way: I squared the differences, so now I've got a large number, and if I reverse the squaring by taking the square root, I get back to a number on a similar scale. Most of the time, when people look at this, they ask why you would square it. Well, to get rid of the negative numbers. But you could have gotten rid of the negative numbers by just taking the absolute value, and then you wouldn't have to square the differences and later take the square root. So the reason can't simply be getting rid of the negative numbers, although squaring does have that feature, because taking the absolute value would be easier if that were the only goal.
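For reference, here are the three population formulas just described, written out in LaTeX, with μ the mean, n the count, and x_i each data point. The last line plugs in the squared differences from our four-point data set.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\text{Average deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \mu \rvert

\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}

\sigma^2 = \frac{36 + 16 + 16 + 36}{4} = \frac{104}{4} = 26,
\qquad
\sigma = \sqrt{26} \approx 5.10
```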
We'll talk more about why we square in a future presentation. For now, note that squaring removes the negative numbers, and taking the square root afterward brings us back to a result similar to, but not exactly the same as, the average deviation. So you can compare and contrast the intuitive average deviation with the variance and the standard deviation. Okay, let's take a look at the calculation. We have the same data set, and in a table format I compare each data point, x sub i, to the mean, which came out to zero. Because the middle point happens to be zero in this data set, each difference is just the data point itself. The difference between this time and last time is that instead of taking the absolute value of these differences (remember, without it they add up to zero, which doesn't help), we square them. Negative six squared is 36, a much larger number than the six we got from the absolute value, where we simply made everything positive. Then negative four squared is 16, positive four squared is 16, and positive six squared is 36. All the negatives are removed in the process. When I sum these up, 36 plus 16 plus 16 plus 36, I get 104, versus the 20 we got with the absolute values. That's the summation of x sub i minus mu, squared. Then I divide by the count, n, just as before; there are four data points. 104 divided by four gives us the variance, represented by sigma squared: 26.
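The table walkthrough above can be sketched in Python as follows (an illustration only, assuming the same four data points; the presentation builds this table by hand in OneNote/Excel). It prints each deviation and its square, then divides the sum of squares by n to get the population variance.

```python
# Sketch of the worked table: deviations from the mean, their squares,
# the sum of squares, and the population variance (divide by n).
data = [-6, -4, 4, 6]
n = len(data)
mu = sum(data) / n                      # 0.0 for this data set

print(f"{'x_i':>5} {'x_i - mu':>9} {'(x_i - mu)^2':>13}")
squares = []
for x in data:
    d = x - mu                          # signed deviation from the mean
    squares.append(d ** 2)              # squaring removes the sign
    print(f"{x:>5} {d:>9.1f} {d ** 2:>13.1f}")

sum_sq = sum(squares)                   # 36 + 16 + 16 + 36
variance = sum_sq / n                   # population variance, sigma squared
print("sum of squares:", sum_sq)        # sum of squares: 104.0
print("variance:", variance)            # variance: 26.0
```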
Then I can take the square root of the variance: the square root of 26 is about 5.10. If you're doing this on a computer, you can pull out your trusty calculator and change the calculator type to something like a scientific calculator, which gives you more of these calculation tools. For example, for the negative six up top, I enter negative and then six, then use the squared key, which gives 36. Down below, 104 divided by four gives us 26, and then I take the square root of 26, which gives about 5.099, and so on. Notice that this number is similar to, but not exactly the same as, the average deviation of five we got earlier. The standard deviation usually comes out a little larger than the average deviation, and it can never be smaller. We also get a pit stop along the way, which is the variance. That's why it's represented as sigma squared, and then we take the square root to get to the standard deviation. Also note that at this point we're treating the data as though it were the entire population. The formula differs slightly when you're working with a sample, and we'll talk about those differences in future presentations. You can also calculate these with Excel formulas. These values are calculated with Excel's population functions for the variance and standard deviation (such as VAR.P and STDEV.P), and these two use the Excel functions for a sample (such as VAR.S and STDEV.S). Again, we'll talk about the sample more in a future presentation.
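Outside Excel, Python's standard-library `statistics` module offers a similar population/sample split; this is a swapped-in tool for illustration, not what the presentation uses. `pvariance` and `pstdev` divide by n like the population formulas above, while `variance` and `stdev` divide by n − 1 for a sample.

```python
import math
import statistics

data = [-6, -4, 4, 6]

# Population versions: sum of squared deviations divided by n (here 4).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample versions: divided by n - 1 (here 3); covered in a later presentation.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(float(pop_var))                           # 26.0
print(round(pop_sd, 2))                         # 5.1
print(round(samp_var, 2), round(samp_sd, 2))    # 34.67 5.89
```

The sample figures are larger because the same 104 is divided by 3 instead of 4.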
Obviously, it's nice to be in Excel and simply put a function in place to ask for the variance or the standard deviation, adding these to our set of numbers. But it's also useful to go through the table sometimes, because it gives you more of a visual representation of the data set and may give you a better understanding of what these numbers are saying. Also remember that these numbers, which we'll talk more about in future presentations, can seem more abstract than simply the mean or average of a data set. Sometimes it's useful to compare multiple data sets, which we'll also cover in future presentations. Now I want to come back to the question of why we would use the variance and standard deviation, which seem more complex than the average deviation. As we saw, most people ask: why square the data to get rid of the negatives, when you could just take the absolute value, which would be easier? One mathematical argument for using the more complex standard deviation rather than the average deviation is that measuring from the mean gives a unique result, different from what you get measuring from any other point. To see this, suppose that instead of using the mean as the middle point, I measure the distances from the number one. I use one instead of the mean and do everything else the same: I take the difference of each data point from one. These differences no longer add up to zero, because I'm not measuring from the middle point; I'm measuring from a point I picked, one. And if I take the absolute value of those differences, I still come out to 20.
Dividing that 20 by four, I still come out to five. So with the average deviation, using the mean as the middle point doesn't give a unique number: I get the same result as I do with some other point. If I do the same thing using two instead of the mean of zero, looking at the difference between every point in the data set and two, which I just chose arbitrarily, the differences again won't add up to zero, but their absolute values still sum to 20, and I still get five. And one more time, just to hammer the point home: if I use three, I still come out to 20 and get five. In fact, for this data set, any middle point from negative four to positive four gives the same 20. Now do the same calculation the variance and standard deviation way. Using one instead of the mean, I get the same differences, but now I square them and come out to 108. That 108 is different from the 104 I got using the mean as the middle point, so dividing 108 by four gives a different number, 27, which plays the role of the variance except that we used a different middle point. Its square root is about 5.20, which is different from the 5.10 we got using the mean. If I use two, the same thing applies: the squared differences add up to 120, dividing by four gives 30, and the end result comes out to about 5.48, again not the number we got with the middle point of zero. This question comes up a lot when explaining the standard deviation: why square the differences and then take the square root? One answer is that measuring from the mean gives a unique value, the smallest one possible, which is another reason this approach is useful.
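The "pick a different middle point" comparison can be reproduced with a short Python sketch (an illustration under our own naming; `abs_total` and `sq_total` are hypothetical helpers, and the centers 1, 2, 3 are the arbitrary points from the walkthrough). It also checks a standard identity that explains the result: the sum of squared distances from any point c equals the sum of squared distances from the mean plus n(c − μ)², so it is smallest only when c is the mean.

```python
# Sketch: compare the average-deviation approach with the squared-deviation
# approach when distances are measured from an arbitrary center c.
import math

data = [-6, -4, 4, 6]
n = len(data)
mu = sum(data) / n                       # 0.0, the mean


def abs_total(values, c):
    """Sum of absolute distances from c (hypothetical helper)."""
    return sum(abs(x - c) for x in values)


def sq_total(values, c):
    """Sum of squared distances from c (hypothetical helper)."""
    return sum((x - c) ** 2 for x in values)


for c in [0, 1, 2, 3]:                   # 0 is the mean; 1, 2, 3 are arbitrary
    a = abs_total(data, c)
    s = sq_total(data, c)
    print(f"c={c}: abs sum={a}, avg dev={a / n}, "
          f"squared sum={s}, var={s / n}, sd={math.sqrt(s / n):.2f}")

# Decomposition check: distances from c = distances from the mean
# plus n * (c - mean)^2, so the squared total is smallest only at c = mean.
for c in [0, 1, 2, 3]:
    assert math.isclose(sq_total(data, c), sq_total(data, mu) + n * (c - mu) ** 2)
```

For c = 0, 1, 2, 3 the absolute sums are all 20 (average deviation 5.0), while the squared sums are 104, 108, 120, and 140, giving standard-deviation-style values of about 5.10, 5.20, 5.48, and 5.92.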