Statistics and Excel: Average Deviation. Got data? Let's get into it with statistics and Excel. First, a word from our sponsor. Actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, because our merchandise is better than their stupid stuff anyway. Like our "Trust me, I'm an accountant" product line. Yeah, it's paramount that you let people know that you're an accountant, because apparently we're among the only ones equipped with the number-crunching skills to answer society's current deep, complex, and nuanced questions. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com.
You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote Presentation, 1428 Average Deviation tab. We're also uploading transcripts to OneNote, so you can go into the View tab, Immersive Reader tool, change the language if you so choose, and either read or listen to the transcripts in multiple languages, using the timestamps to tie in to the video presentation.
OneNote desktop version here, data on the left-hand side. In prior presentations, we've been thinking about how we can take different data sets and summarize and represent the data in meaningful ways, using both numerical and pictorial representations. The numerical representations include our standard statistics, for example the mean or average, quartile one, quartile two or the median, quartile three, and so on. The pictorial representations include the box and whiskers (or box plot) and the histogram. So now we're focusing in on how we can represent the spread of the data around a middle point, like the mean or average. Of the tools we have looked at, the histogram gives us a pictorial representation of that.
But when we look at the numerical representations, like the median and the average, and even the box and whiskers we saw in a prior presentation, sometimes they don't give us as much information about the spread. So we would like a numerical calculation of this spread. The standard numbers used would be the standard deviation and the variance. We're going to get to how to calculate the standard deviation and variance in future presentations, but it's useful first to intuit what you would do if you were trying to come up with a single number that gives an idea of where the data points lie around a middle point, like the mean or average. Then we'll add a little bit of a twist, which will get us to the standard numbers, the standard deviation and the variance, and then we'll dive into them in more detail. So we're going to start off with a very basic data set. We've got negative six, negative four, positive four, and positive six. Now, obviously, these sum up to zero. The reason we're using these numbers is just to note that it's possible to have a data set with both negative and positive numbers in it as we work through our calculations. First, we'll look at our mean or average calculation, which comes out to zero. I could do that with a formula in Excel: equals the AVERAGE of these four numbers. If we did a more manual calculation, we would simply sum these numbers up and divide by the number of numbers there. They sum up to zero because the positive four and the positive six net out against the negative four and the negative six.
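As a rough companion to the Excel calculation, here is a minimal Python sketch of the same manual mean calculation on the four-number data set. The variable names are illustrative only and do not come from the presentation.

```python
# The four-number data set used in this presentation.
data = [-6, -4, 4, 6]

total = sum(data)      # sum of the numbers: -6 + -4 + 4 + 6 = 0
count = len(data)      # number of numbers: 4
mean = total / count   # 0 / 4 = 0.0

print(total, count, mean)  # 0 4 0.0
```

The positives and negatives net out, so the total is zero and the mean is zero as well.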
If I did that in Excel, I could use the trusty SUM formula, and then the count, or N, represents the number of items: one, two, three, four of them. So the count is four. In Excel, I can use the trusty COUNT formula to count those four cells. Zero divided by four comes out to the mean of zero. So that's going to be our formula, which we can represent down here. We're taking x1, x2, x3, up to xN, of which there are four, and dividing by N, the count. We can also represent it this way: summing x sub i from the first x through the Nth x, which is the fourth here, and then dividing by N. Okay, so that's going to be our average. Now we want to think about the average deviation. So now we're thinking: if I'm just trying to intuit this, I would like to get a numerical representation of the spread of the data around the center point. In this case, we've made our set of numbers center at zero, so that's going to be the center point. How would you do that intuitively? You might say, well, why don't I see how far each of these individual numbers is from the center point, or zero? That's the first thing that would probably come to mind. If that's the average, then how far away is each of these numbers from the average? So that's what we'll do. We'll take this first piece, and we'll get back to this equation in a second. We've got x sub i minus mu, where mu represents the average. So let's do it this way: here's the negative six, and here's the middle point, which is zero, the average. Negative six minus zero means it is six away in the negative. Obviously, these two numbers are the same because we happened to pick a data set whose middle number is exactly zero. Then for the next one, this is the data point, negative four, and this is the middle number, the average, zero.
So it's four units away in the negative. This is four, this is zero, which means it's four points away. And this is six, this is zero, so it is six points away. Now, if I were to sum up these differences, one of the problems in my process is that I'm always going to come out to zero. And it's not coming out to zero just because I happened to pick some negative numbers here. Whatever number set I have, if this is the average or middle point, then each individual point in the data set is going to be either higher than that point or lower. So I'm going to end up with positive and negative differences, which net out to zero. So each of these steps gives me an idea of how far that data point is from the middle point, the mean, or zero here. But I can't sum them up and get a number that's very useful, other than as a check that I properly calculated these numbers. So what's the next thing we would do intuitively? We would say, well, what if I take the absolute value of these numbers? Now I just want to know the distance from the center point, and I don't care if it's above or below. I just want to take all the distances, and then I'm going to divide that by the number of data points. So you can see what I'm doing: I'm taking the average distance from the mean. We take all the distances from the mean, take the absolute value to make them all positive, and then divide by the number of data points. So if I take the absolute value of these, they all become positive. The absolute value of negative six is positive six, the absolute value of negative four is positive four, the absolute value of four is still four, and the absolute value of six is still six. In other words, we in essence made everything positive.
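The two steps just described can be sketched in Python as follows. This is an illustration, not the presentation's worksheet: the second data set, `other`, is a hypothetical example added here only to show that the signed differences net to zero even when the mean is not zero.

```python
# Signed differences from the mean for the presentation's data set.
data = [-6, -4, 4, 6]
mean = sum(data) / len(data)                    # 0.0

deviations = [x - mean for x in data]           # [-6.0, -4.0, 4.0, 6.0]
abs_deviations = [abs(d) for d in deviations]   # [6.0, 4.0, 4.0, 6.0]

print(sum(deviations))   # 0.0

# Hypothetical second data set (not from the presentation), mean of 5.
other = [2, 3, 10]
other_mean = sum(other) / len(other)            # 5.0
other_deviations = [x - other_mean for x in other]  # [-3.0, -2.0, 5.0]

print(sum(other_deviations))  # 0.0
```

In both cases the signed differences cancel, which is why the sum only works as a check on the arithmetic, while the absolute values keep the distances without the signs.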
By the way, in Excel, the function to get the absolute value is =ABS, and that'll give you your absolute value calculation. So now you can see that we come out with 20: if I sum these up, they add up to 20. This one right here is not summing the cells, it's counting them. So I've got one, two, three, four of these line items. Okay, so if we have that, I can say, all right, that means the sum of the distances from the mean comes out to 20. And I can, in essence, take the average of that: there are four of them, one, two, three, four, the count, so 20 divided by four, which is five. And I get an average deviation from the mean. And you can see that makes sense intuitively, because we're saying, okay, I'm just going to take the distances from the mean and make them all positive. Whether a data point is higher or lower than the mean is not what I'm looking for. I'm looking for the average distance from the mean, whether it be higher or lower. So I take the average of those distances from the mean, and we can call that the average deviation. That can give us an intuitive sense of how spread out something is. Now notice, as we work with different data sets, these numbers look a little bit more abstract than, say, an average. When I look at an average, in my mind I'm like, okay, that's the middle point in the data set. I can imagine that the mean or the median is somewhere in the middle. When we get to the average deviation, or to the standard deviation and the variance, they start to be more subtle and abstract concepts. And sometimes they show their relevance more when we're comparing different data sets.
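Putting the pieces together, a minimal Python sketch of the average deviation might look like this. The function name `average_deviation` and the second, tighter data set are hypothetical, chosen here for illustration and for a quick comparison.

```python
def average_deviation(values):
    """Average of the absolute distances of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

# The presentation's data set: distances 6 + 4 + 4 + 6 = 20, and 20 / 4 = 5.
print(average_deviation([-6, -4, 4, 6]))   # 5.0

# Hypothetical tighter data set with the same mean of zero, for comparison.
print(average_deviation([-1, -1, 1, 1]))   # 1.0
```

Both data sets have a mean of zero, but the smaller average deviation of the second one reflects that its points sit closer to the center.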
So if I'm comparing two similar data sets, I can look at the average deviation of one and compare it to the average deviation of the other. That's why it can become a useful tool in a comparative setting. But we'll think about more formal ways to use the average deviation, and the average deviation will lead into the standard deviation and the variance, and then we'll talk about those concepts in more detail in the future. Just to give you an idea of what's going to happen next: we're going to take this average deviation, remove the absolute value, and replace it with squaring. You can see that if I square the differences, that also removes the negative numbers, but it also results in the values being squared, so then I might need a square root in the calculation. So it'll be a little bit more complex and a little less intuitive. You might also ask yourself, why make it more complex? Mathematical equations are supposed to be simple, and simple is better if the added complexity doesn't add any value, right? It should be beautifully simple, not more complex for no reason. So if we're going to make something more complex, we'll give at least a couple of justifications as to why the standard deviation and the variance are the go-to calculations and not this average deviation.
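As a rough preview of the squaring idea, the sketch below squares the differences instead of taking absolute values, averages them, and then takes a square root. It assumes dividing by the count N, which is an assumption made here for illustration; the later presentations define the standard deviation and variance properly and may divide differently.

```python
import math

data = [-6, -4, 4, 6]
mean = sum(data) / len(data)                  # 0.0

# Squaring removes the negative signs, just as the absolute value did.
squared = [(x - mean) ** 2 for x in data]     # [36.0, 16.0, 16.0, 36.0]

# Assumption: divide by the count N, mirroring the average deviation.
mean_of_squares = sum(squared) / len(data)    # 104 / 4 = 26.0

# The square root brings the result back to the original units.
root = math.sqrt(mean_of_squares)

print(mean_of_squares, round(root, 3))  # 26.0 5.099
```

For this data set the result is close to, but not the same as, the average deviation of five.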