even point if you think of this as like a teeter-totter. So if I want to get my spread, I might say, hey, look, why don't I take each of the data points, represented by x here, x sub i, i equal 1 to n, all the data points, and subtract mu, which represents the mean or the average. So if I take each data point in my data set minus the fulcrum, the middle point, the average represented by mu, I get the distance of each data point from that middle point. Now if I add up all of those distances, they're going to add up to zero, because some of them are going to be higher and some are going to be lower, and the balancing property of the average means the positives and negatives cancel out to zero.

So you might then think the next thing to do would be to take the absolute value. That means we're taking the distance from each data point to the average, but I don't care whether it's higher or lower than the average; I'm not keeping positives and negatives, I'm just looking at the distance, whether the difference goes to the right or to the left, higher or lower. Then I take those distances and divide by the number of data points. And that would be the most intuitive thing we might come up with if we mull it over. So one intuitive way to measure the spread of data is to look at how far each datum is away from the mean: subtract the mean, represented by mu, from each datum, take the absolute value of each of those distances, then take the average of those values by dividing by n. This average distance from the mean is a potentially useful measure of dispersion, but not the most commonly used one. So although it leads into the most commonly used measures, variance and standard deviation, it's not the one you'll probably be working with most of the time; you could use it, but usually you won't.

Delving deeper into dispersion: now we have the variance and standard deviation, which by definition quantify how spread out the numbers are from the mean. So now we're moving from the average deviation formula to the variance and standard deviation. You can see the similarities, and we'll go into them in more detail: the average deviation takes the absolute value, whereas the variance squares each difference, and the standard deviation is just the variance, which is everything under here, with the square root taken of it. So these two are closely related. The variance is kind of a stepping stone to get to the standard deviation, which is why the variance is often represented by sigma squared and the standard deviation simply by sigma.

So the variance, let's go into it step by step: denoted by s squared or sigma squared, the Greek letter sigma, it's the average of the squared differences from the mean. Similar to what we had with the average deviation, we're going to take each of the points and subtract the mean, the same thing we did before, which gives us the distance of each point from the mean. But instead of taking the absolute value, we're going to square it. Now, the squaring has the same property of removing the negative numbers, which we need to do so that we can take the average distance. However, it also squares the values, which means we're going to end up with a lot larger numbers, right?
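As a minimal sketch of the average deviation idea described so far (the data values here are made up for illustration, not from the lecture), you could compute it like this:

```python
# Hypothetical data set x_1 ... x_n
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n                         # the mean, the "fulcrum" (5.0 here)

# Signed deviations sum to zero by the balancing property of the mean.
signed_deviations = [x - mu for x in data]
print(sum(signed_deviations))              # ~0 (up to floating-point rounding)

# Mean absolute deviation: the average of |x_i - mu|.
mad = sum(abs(x - mu) for x in data) / n
print(mad)                                 # 1.5 for this data set
```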
So now we're going to square each difference, making everything positive but also squared, and then divide by n, and that gives us the variance. Now, the variance is kind of an abstract number, because it's going to be a larger number. But in and of itself, especially when we're comparing different data sets, like salaries in the US versus salaries somewhere else in the world, it can often be a telling factor for comparative purposes, even though when you look at it on its own it might seem like a number that's not giving you a lot of value.

The next step is the standard deviation. You simply take what you had for the variance and take the square root of it, transforming the variance, represented by sigma squared, into just sigma, the standard deviation. So it's the same exact thing, except now we're taking the square root. It's kind of like we squared the differences and then removed the squaring by taking the square root, kind of. You're not going to get the same number as we got with the average deviation, but you can see a similar process here: with the average deviation we took the absolute value to deal with that negative-number problem; here we squared it and then basically took the square root. All right, and we'll talk more in a second about why we might use this, which looks more complex than the average deviation.

So the standard deviation is the square root of the variance; we just took the variance and then took the square root of it. It gives the average distance data points are from the mean, so the average distance from the mean. Values will be larger if the data set is more widely spread and smaller if the data are close to each other. Again, both of these numbers often seem a little abstract, but if you're comparing different data sets it becomes apparent, because you're going to say, well, if the standard deviation is larger, you would expect more spread in the data from the middle point, from mu, the mean; if it's smaller, you would expect the data points to be more compact around that middle point.

They are affected by outliers. If there's a big outlier in the data set, notice we're comparing to the mean, the middle point. So if the mean is impacted by outliers, it follows that both the standard deviation and variance will be impacted by outliers as well. We have to keep that in consideration when we're dealing with outliers. Basically, the standard deviation is the square root of the average squared distance from the data points to the mean. Note that for samples, n minus one is used as the denominator to account for degrees of freedom. We're dealing with a population here; you might see a similar formula for the standard deviation, but with n minus one in the denominator. That's the difference between taking the standard deviation for the entire population, where we have all the data for the entire population, versus a sample, where we have only a sample of data from the population. We'll talk more about that in future presentations. Right now, in this section, we're generally focused on data which we are imagining to be the entire population.

All right, let's get back into this question of why we square the differences. So, back to the question of why we don't just use our average deviation. My problem up top is that when I take each data point minus the middle point, or mean, that results in negative numbers.
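Here is a minimal sketch of the population variance and standard deviation just described, continuing the hypothetical data set from the earlier snippet (the numbers are illustrative assumptions, not from the lecture):

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n                                    # mean (5.0 here)

# Population variance: the average of the squared distances from the mean.
variance = sum((x - mu) ** 2 for x in data) / n       # sigma squared (4.0 here)

# Population standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)                         # sigma (2.0 here)

# For a sample rather than a full population, the denominator is n - 1
# (degrees of freedom), as noted above.
sample_variance = sum((x - mu) ** 2 for x in data) / (n - 1)

print(variance, std_dev, sample_variance)
```

Notice that the standard deviation (2.0) is back on the same scale as the original data, while the variance (4.0) is in squared units, which is part of why the variance tends to feel more abstract on its own.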
And I need to get rid of the negative numbers so I can sum up the differences from the mean. Why not just take the absolute value instead of squaring each difference and then, in essence, taking a square root of it? And one reason