is that the population mean is then the unique value that minimizes the sum of the squared differences. In other words, squaring has the characteristic that it picks out one unique number. We'll show this in one of our example problems. When asked why we square the deviations, most people will tell you that it gets rid of the negative numbers, and you do need to get rid of the negative numbers. But then the question, of course, is: why don't you just take the absolute value? That also gets rid of the negative numbers, and it's easier, because you don't have to square and then take the square root. In mathematics we normally want things to be as easy as possible, removing any excess steps so that we get down to the simplest formula we can apply to a particular situation. So you would think there's got to be a reason why we would do something more complex.

Basically, if we took some focal point other than the average, just some other number as the middle point, and measured the absolute deviations from it, we could end up with the same total for more than one choice of middle point. Whereas with this method, the standard deviation, the sum of the squared deviations is smallest at exactly one number, the average, and any other number in that slot gives a larger result. That might not be completely necessary to understand in order to do the calculations, but the question often comes up, so it's useful to get an intuitive understanding, and we'll work a practice problem related to it.

So, implications and applications: comparing dispersion in different contexts. Notice that once you have these numbers, you have to compare the data sets to actual reality in order to draw meaning from them.
So for example, if we're dealing with salaries at large corporations in different countries, and we had the data sets for those countries and were applying our statistical tools, the central points (median and mean) as well as the dispersion (standard deviation and variance), it might tell us something about the different strategies for incentives and compensation in the different countries, right? We might be able to draw conclusions from that data, but of course we need to know the context of the data sets in order to draw the conclusion. We need to know that the data sets are about salaries in one country versus another country that might have different strategies around compensation, and then, when we get the data, we can possibly draw conclusions of that nature.

So, inferring meaning: while statistics provides valuable tools, it's the application and understanding of the context that brings deeper meaning. Data must be interpreted within its context. Now notice also that when we deal with data in context, we're inevitably going to be dealing with some kind of politics around it as well, whether that be corporate politics or government politics. Everybody's got their biases, and that often leads people to the old quote about lies and statistics, right? As if the statistics are at fault when the data is misleading. So we have to be able to represent the context properly, because again, it's not the statistics' fault. The statistics are just the numbers. If the context around the statistics is being misrepresented, then we have to address the misrepresentation of the context, just like we would if people misrepresent something in words, right?
It's not the words that are the problem if people are using words improperly: misdefining words, making up new words, saying words that mean one thing and acting like they mean another. The words aren't at fault here. It's the people who are lying with the words. The same thing is true with the statistics. So we have to keep in mind that they're just a tool.

So, summary: the mean and the median, although useful, don't tell us anything about how widely spread the data are. Remember that most of the time the central tendency numbers, the median and the mean, are the first ones we look at, but it's also going to be quite useful to know the spread of the data. A histogram gives a good visual sense of the distribution, but not a summarized numerical one, so the histogram is great, but we'd also like a numerical representation. The five-number summary and its associated box plot give some sense of how the data are spread out, but can sometimes be misleading. You might say, hey, the five-number summary gives me a nice picture of the spread of the data, and to some degree it does, but we'll work an example showing where it falls short: two very different data sets that result in the same five-number summary and the same box-and-whiskers plot.

The standard deviation is a numerical measure of roughly how far the data are, on average, from the mean. So when we look at the standard deviation, remember that's the idea of it. You've got the middle point, the mean, which is the focal point if you're looking at the histogram, and with the standard deviation calculation you're trying to think about the average distance from that focal point. Now, remember that the standard deviation and the variance can be somewhat more abstract terms.
In other words, when we think about the mean or the median, and even the five-number summary, the number in and of itself is usually enough for us to grasp, to some degree, what it's telling us about the data. Whereas when we get into the standard deviation and the variance, they can be a little more abstract. So working through practice problems with different data sets, and again getting an idea of the context, is often useful. The variance especially can look like a very abstract number, but it can be a useful one when we're comparing different data sets. So we'll work some practice problems in this section, and we'll continue on with these concepts in future sections.
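As a quick numerical check on two of the ideas above, here is a minimal Python sketch. The data sets and helper names (`sum_abs_dev`, `sum_sq_dev`, `five_number_summary`) are made up for illustration and are not from the lecture, and the quartiles use the "median of each half" convention, which is only one of several in common use. The first part compares absolute and squared deviations around different middle points; the second builds two data sets that share a five-number summary but have different standard deviations.

```python
from statistics import mean, median, pstdev


def sum_abs_dev(data, center):
    """Total absolute deviation of the data from a chosen middle point."""
    return sum(abs(x - center) for x in data)


def sum_sq_dev(data, center):
    """Total squared deviation of the data from a chosen middle point."""
    return sum((x - center) ** 2 for x in data)


def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles taken as the median of each half
    (an assumed convention; the middle value is excluded when the count is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


# Part 1: why square instead of taking absolute values?
sample = [1, 2, 4, 9]      # illustrative data; its mean is 4
centers = [2, 3, 4]        # candidate middle points, the last one being the mean
abs_totals = [sum_abs_dev(sample, c) for c in centers]
sq_totals = [sum_sq_dev(sample, c) for c in centers]

print("mean:", mean(sample))
print("absolute deviation totals:", abs_totals)  # the same total for every center
print("squared deviation totals:", sq_totals)    # smallest only at the mean

# Part 2: same five-number summary, different spread.
set_a = [0, 2, 2, 5, 5, 5, 8, 8, 10]
set_b = [0, 2, 2, 2, 5, 8, 8, 8, 10]

print("summary A:", five_number_summary(set_a))
print("summary B:", five_number_summary(set_b))
print("population std dev A:", round(pstdev(set_a), 3))
print("population std dev B:", round(pstdev(set_b), 3))
```

With these particular numbers, the absolute deviations total 10 whether the middle point is 2, 3, or 4, while the squared deviations are smallest only at the mean. The two data sets in the second part would draw the same box plot, yet the standard deviation tells them apart.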