 Statistics and Excel, Standard Deviation and Variance, Large Outlier Impact. Get ready, taking a deep breath, holding it in for 10 seconds and looking forward to a smooth, soothing Excel. First, a word from our sponsor. Yeah, actually we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us. But that's okay whatever because our merchandise is better than their stupid stuff anyways. Like our CPA six pack shirts, a must have for any pool or beach time, mixing money with muscle, always sure to attract attention. Even if you're not a CPA you need this shirt so you can like pull in that iconic CPA six pack stomach muscle vibe man. You know, that CPA six pack everyone envisions in their mind when they think CPA. Yeah, as a CPA I actually and unusually don't have tremendous abs. However, I was blessed with a whole lot of belly hair. Yeah, allowing me to sculpt the hair into a nice CPA six pack like shape which is highly attractive. Yeah, maybe the shirt will help you generate some belly hair too. And if it does, make sure to let me know. Maybe I'll try wearing it on my head. And yes, I know six pack isn't spelled right, but three letters is more efficient than four. So I trimmed it down a bit, okay? It's an improvement. If you would like a commercial free experience consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com Here we are in Excel. If you don't have access to this workbook that's okay because we'll basically build this from a blank worksheet. But if you do have access three tabs down below example practice blank example in essence answer key practice tab having pre-formatted cells. So you could get right to the heart of the practice problem blank tab blank worksheet with just our data in it. So we can practice formatting the cells within Excel as we work through the practice problem. If you don't have this data set you could just type it in there because it's not a very long data set. You can also create a similar data set if you so choose or you can look for data sets online possibly checking out Kaggle.com K-A-G-G-L-E dot com for practice data sets. Let's go to the first tab to get an idea of what we will be doing. We're looking at our salary data again. We'll do a similar calculations we did in prior presentations calculating the average but our major focus here is going to be on the variance and the standard deviation. And then we'll add outliers to our set of data to take a look at the impact of those outliers as we do our calculations. The understanding of the impact of outliers being very important when we're trying to analyze data. So let's go to the blank tab on the right. I'm going to remove the Kaggle thing here. I'm going to scroll in a bit holding down control scrolling in a bit. We're up at 250%. Let's format the entire worksheet first like we do every time. I'm going to go a bit faster this time because the beginning part of this will look similar to other practice problems and then we'll add the outliers. So I'm going to select the entire worksheet right clicking formatting the cells. I'm going to then go to the number group. I like to make it currency negative numbers bracketed and read removing the dollar sign and the decimals only adding decimals as needed. Okay, let's make the entire sheet bold home tab font group. Everything has been in bold and it's too bold. You're too bold, sir. And then we're going to go into the to the data here. Let's put a table in there going into the insert tab. Let's go into the tables and put a table dancing ants doing their mamba dance to make the the rain happen. No, that's not wasn't a rain dance. That's the table dance. So they put a table around it with that dance and then we'll put the skinny bee. Let's make a skinny bee and do our standard statistical type calculations. So we might have the mean or the average will do the mean calculation first because you want to get you want to keep an eye on the bully on the mean one. So that's the average. And then we're also going to do let's pull make a large C and we'll do like we'll do the men. We'll do the Q1. We'll do the median, which is like Q2. We'll do Q3 quartile one, two and three median also could be called Q2 right Q2 as we've seen in the past. And then we'll say that we're going to have the max and now we're adding the standard deviation for the population. I'm going to say for pop and then not for like your dad or anything, but for the population and then standard or no, the variance variance variance for pop instead of the sample. So let's do those real quick just with the functions in Excel. So we've all we've got all these cool functions under our belt. Our belt is getting quite tight. All these tools are digging into my belly. I need to get a bigger belt with all the stuff under there. So we're going to say this is going to be equal to the mean actually the average average average. That's how to be my fingers are in the average. And then we'll select our data. I'll do this quickly. We've seen these in the past. Selecting the data with just the drop down. See how quick we can do this. We can say this is going to be the in shift nine. My mouse is already on that data point. She's going to click the data and boom quartile one. So I'm just going to put quartile or have to have an equals and then quartile tab selecting the formula. Pick in the array just clicking on the mouse. I'm going to put a comma because I need another argument here. And that's going to be that it needs to be a one for q one quartile one. I could do the same thing for quartile two, but it's more common to use the median calculation tab selecting the data. I don't need to put a one next to it because the median means quartile two and then equals quartile three tab picking up our data. This one does need a second argument. Therefore comma and a number three for quartile three boom. And here's our our new formulas here that we can pick up the standard deviation. This for the population as opposed to the sample. So equals ST. So we want this one standard deviation. This one for the population and for dot P on it and then boom. And then we want the variance equals the variance with the P. Look at all that stuff under our belt. We got more belts going to be bigger than Batman's with all the stuff in there with all the utility tools we have. I missed the maximum. Let's this is going to be equal to the max and we'll pick up the max. All right. There's our there's our data. Let's put some blue borders around this selecting all the data. Home tab font group dropping down the color. If you don't have that blue and you want it, you go into the more colors standard wheel. That's the blue right there. Okay. And then let's put some font group borders around it the border and the blue. All right. We can put a histogram if we want selecting the data if we so choose inserting our histogram. We've got the histograms under our belt to got them. These are under our belt. It's like there's like a whole whole bunch of cool stuff in there. It's like Christmas. All that stuff under the belt got more stuff under my belt than I had under the Christmas tree when I was kid crying out loud. Okay. Let's select our data here and let's put the bins up to 11 11 bins and close this up. Okay. So now let's do our our we're just going to go to the to the variance and the standard deviation now. So we'll we'll go to the right and we'll do our variance and our standard deviation. And and I want to calculate it with a with a table method as we've seen in the past rather than just the function. Because I think that gives us a more intuitive sense of what the variance and standard deviation are. And then we'll add the outlier. So I'm going to say let's put our our variance formula. I'll just type in the variance formula so we can get an idea of it. So I'll copy it. I'm just going to copy it over. I won't retype it. But remember if you want to you can go to the insert tab. You can go to the equation and you can type in your formula. I would use the ink thing here to do so. But I'm just kind of I'll make it orange. And let's see if I can make it. I don't need it that big these days because we've seen it before I'll bring it on down to like 12. And so there we have it. That looks good. So then I'll make this a header home tab font group making this black and white. And then this will be the standard deviation. Standard deviation for pop for the population. Not your dad for the population. Your dad might be in the population. Who knows, but it's not specifically for your pop. All right. So we're going to say this is going to be. Here's the form you lie. If you put that down here, I had it at 12 on the font. Let's make this 12 font to. 12. Oh, so 12. And there we have it. OK, so then let's add our table. So I'm going to go back on over. I'm going to pick up our data set. I'm just going to select the entire column a that'll pick up the entire table and repaste it over here. Repaste and I'll just put it in column Q control V. I'm using the keyboard this time. I'm going to make a skinny P. And then we're going to say compare it. So now I'm just going to do my calculation this way calculating the variance. So I'm going to take the distance from each data point to mu or the average or mean and then we'll square it. Right. That's going to be the numerator. So I'm going to say this is going to be the mean is going to be equal to. Let's do that mean calculation where we hit the we hit the salary column over the head with our mean calculation because we're mean. It's the mean. So this is going to be average. We'll do it with an average that which isn't quite as mean of a function name. And then we'll say this is going to be the dancing ants over here. So there's our average all the way down the mean all the way down and then we're going to take the difference. The means going to cut down the salary. When we compare the two going to take out the means taken out the difference of the salary boom. There's our difference column and then we're going to square the differences. So we get all positive numbers and then squared. So we're going to say a squared. So we're basically completing the top part of our of our formula here. So we'll square it this equals and we're going to pick that difference up and take it. To the carrot of so to the power of two squaring it. So I'm going to say OK. So there we have it. So now we've squared our numbers. So now we've got basically that one's a large number right there. So we've taken we've done this first part of the top. Now we just need to sum it up right. So now we're going to sum it up and we'll have our numerator. I could do that by going down. I'm going to go into my table options here going to my table tools adding the total column. And let's make this one the average. So I'm going to average instead of summing. I'll make this an average. So that count that's our average again. This one is going to be then let's count here because I don't need to sum this so the count will work there. This one let's sum it just to give us a double check that it should add up to zero. And this one is our our sum of the squared numbers which we will now in order to get to the variance will take that number and divide it by the count. So I'm going to take these squared difference from mean which is equal to this number and we'll divide it by the count. So I'm going to say that thing divided by the count count which is represented in by n in our equation. We calculated the count at 51 of these items. Let's put an underline underneath it home tab font group under line. And then we're going to say we have the variance. This is this is Q or Sigma two. Let's insert equation a symbol and the Sigma is under the Greek and Gothic. I'm in normal text but I already have it down here in the recent areas. I'll pick that one up Sigma boom and then I'm going to put enter and then I'll go back into it and see if I put a two. It's a normal to highlighting the two right clicking on it formatting the cells to a subscript and enter. So there it is boom. So now I'm going to say I'm going to say that this equals this number divided by the 51. So there we have it and then the standard deviation stand third deviation for the pop population. I'm going to say is Sigma. So I'm going to say insert symbol making another Sigma insert looks good. Oh, I got two of them now. Whatever. I only want one. It's not double Sigma. Anyways, so then we're going to take the square root. So this is going to be equal to square root, which is sqrt function of that number. And then we get that two seven five one. Okay, so let's make that that we've seen that in the past. Let's go to the home tab font group. We're going to make this blue and bordered. So now what we want to do is say, okay, well, what if we add an outlier a large outlier like the CEO salary to this entire calculation? So let's take our same data set over here and it might be easiest to copy it from this table on the right. So I'll copy this whole data set in column a again, right click and copy. I'm going to go all the way back. And then I'm going to say, okay. Let's insert right click and right and paste. And then I'm going to go to the bottom of it. Well, we could make a skinny you. Let's make a skinny you. You already skinny. Okay, and then I'm going to put and the outlier. I'm going to put 1,000,000. So now we've got this big outlier down here. We've added to our table now. So now if I did my calculations, let's do our same set of numbers. I'm going to go to the left and I'm going to copy just, let's just copy the names and not the actual numbers. So I'm just going to copy those calculations and we'll redo them now that we have this outlier. And we can kind of see which ones of these have an impact on them and which one do not. So I'm going to make a smaller W, W, skinified skinny W. And then let's paste this. Let's make it a larger X next to the Y. And then we're going to say the average is going to be equal to the average. We'll do this again, double clicking, picking up the average now, including the outlier of that 1,000,000 has an impact as we've seen in prior presentations on the average. And let's make that outlier really big just so we can really see it. If I make this like the CEOs making, you know, 60,000 and they're complaining about it because they're not getting paid enough or something. It's like whatever the CEOs going on strike and like of the, okay, whatever. And then, but the men, if I calculate the men, we're going to say that is not impacted, right? The 67, 9 is still 67, 9. So the little guys still down there making, making what they're making. And then we've got equals the quartile one, quartile one. And so I'm going to select the data and put a comma one. And again, you would think this would be less impacted. We added another number, which will kind of shift things that when we're trying to count like if I was trying to order things and count the quartiles and the median. But again, you would think the one in the middle would be more static if there's this big outlier type of thing, right? So same with the median. So equals the median, pick the one in the middle. If we were to order them and pick the one in the middle, then if you were to think about your actual salary, you would think that this one would be more accurate than the average, right? On these ones, these are less impacted by the, by that outlier. Whereas the mean is impacted. So if I say then quartile three, quartile data, comma three, same thing. And then the max equals the max, the biggest number. Now the max is obviously impacted by the outlier in this case, because the outlier is on the high side. Whereas the men wasn't impacted by the outlier because the men is on the low side. But if the men was impacted, like if someone was making a dollar, perhaps we had a CEO kind of like that was trying to grow the company, like a Steve Jobs or something and made a deal that said, I'm only going to earn $1 unless we start, you know, making more money or something like that. And then they didn't make more money unlike the Steve Jobs thing, right? And then they don't earn, then now the CEO's only earning a dollar, right? And that would be an outlier on the low side. But in any case, if we had the standard deviation equals the standard deviation, we're picking up the population and selecting the data. We come up to eight, two, three, zero. So you can see that that is substantially impacted by the outlier. So when we try to pick this number, obviously what kind of what we're doing, as we saw when we looked at the average deviation, we are in essence kind of taking the average or the mean of the distances from the mean. So you would think that there's going to be an impact on that number from the outlier. So that's something that obviously we have to kind of keep aware of when we're using our standard deviation, same with the variance. So if I take the variance for the population, we're going to see that that outlier is going to give us a big change on the variance. The variance is huge because that's that squared number, whereas the variance here was that seven, right? So let's just do that again. So I'm going to put my cursor here. Let's make this bracketed and blue and let's change that outlier instead of making it 60 million. Let's just make it one million. So now it's the that you still have an impact, but it's not quite as extreme. Let's take our data and and you'll remember that when we enter a histogram, this is going to have an impact on the histogram. If I go to the insert and say make a histogram, we're going to put this over here. So we're going to say delete this and so so then like I could go into my histogram and deal with that outlier by going into my buckets and saying that I want that outlier to be grouped in my histogram somewhere around the top part, so I might say at 90,000. So so now I've got now I've got and then I could increase my number of buckets to whatever 11 buckets. And so now you've got something that that is manageable, even though you got that huge outlier out there, right? But if I if I remove that and then I put my buckets up here to like 450 or something just so you can kind of see that the data, the extremities of the data, everything is kind of over here and then you've got those outliers way on the right hand side that has that's having that has an impact. So if you think about if you think about where the teeter totter is or where the fulcrum is and this is like a teeter totter, it's at 80, 89,000 right is that midpoint, but this one is pulling it way over because it's like it's like the kids sitting way on the far end of the teeter totter. And so it has an impact due to leverage way out there on the end. So then let's let's take our same data set here and do that calculation, the standard deviation and the variance with our table format. So I'll copy this whole column V. Clicking column V right click and copy copy column V. And then we're going to go on over here to column AI, the AI column, but we're doing it not completely with AI, but AI is kind of helping, I guess, with Excel, but we're still thinking about it here. It's not completely automated yet at this point in time. So we've got our 1000 included down there. We're going to compare the salary to the mean again. So the mean equals the average, select in the entire data set to calculate the average. We're comparing the mean. So we'll take the difference different difference because that mean calculation different is going to is going to take down the salary number because it's mean. So we're going to take the salary minus the meanie. And there we and so now we've got the differences from the mean and you can see they're all negative, right? Except for that one million. So you can say that's so that's kind of an indication. Notice how that gives you a sense of the data more than just calculated the standard deviation in variance right there because you're like, okay, wait a second. You might have seen that one million, but if you were to calculate it this way, you're like, oh, okay, something funny is going on because everything's negative till you get down to that one number that really kind of highlights it, especially when we make the negative numbers read and bracketed as we have here. So in any case, we're going to say then we have to take the squared squared of these equals that number. Carrot shift six to take it to the power of to the power of I have the power of two. Okay, I got a little carried away. The He-Man flashback with the masters of the universe. But then they ruined that one like they ruined everything. I think they did a remake. I'm not even, I don't even watch the remakes anymore of anything. They're purposely ruining everything. All right, it's okay. I won't get into that right now. That's a different story. Table tools, the table design, we've got the table style options and we're going to total the table. Let's over here, let's make the average on this one. As we've seen before, let's make this one the count, the count and let's make this one the sum. It should sum up to zero. So it kind of gives us that double check. And this one is the sum. All right, so now what we want to do is we're going to do our variance calculation. So I'm going to take the squared difference from the mean. I'm just pulling this number down equals that number. And then we're going to divide that by this thing. So I can put the divide by the count, which is represented by an N and that is equal to the 52 that we got in our table up top and then the variance, which is going to be the Sigma squared insert symbols Sigma, which is in the Greek and got Coptic, but I have it in my recently insert closed the thingy enter and then go back into it. Add the two so it doesn't do some crazy font formatting and then go into it, right click format and make it subscript. Okay, so now we've got all the cool symbol all symbology and then we'll take this divided by this and we get our variance and then we're going to take the square root of that, which will give us the standard deviation, which is just a Sigma, which I can go to the insert symbols and just add the good old Sigma and then that's going to be equal to the square root the sqrt function of that and enter. So there we have it again this it being impacted by by so let's put our home tab font group borders and blue now now so we can see that we have to be careful of the variance and the standard deviation when we have these outliers because oftentimes these tools are going to be let's put an underline under the whole of the home tab font group under oftentimes these tools we're going to use them when we're comparing different types of data sets so that's when it becomes you know more clear when I say well the variance is this well that might not be all that useful and to unless you were comparing different data sets and then the variance will be difference relative to the to the other data sets but as we're doing that kind of thought process we got to keep in mind that of course both the variance and the standard deviation will be impacted by the outliers so we're always keeping in our mind when we're trying to do analysis what are we going to be doing with those outliers should I be talking about our sets of data with the mean would it be more useful for for me to be thinking about some kind of middle point such as the quartiles or the median because of those outliers if I'm thinking about something like salaries and what the average salary or what a representative salary would be if I was going to go into a company or something like that and then if I want to be doing comparisons of the spread like standard deviation of the variance and whatnot then again I have to keep in mind what are going to be those outliers impact and how how do I deal with that so I can make comparisons that will represent data useful for decision making.