 statistics and excel mean and outliers got data let's get stuck into it with statistics and excel I mean we'll be using one note here but we'll still talk about Excel you're not first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our accounting rocks product line if you're not crunching cords using excel you're doing it wrong a must-have product because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement the obligation a duty to share the tools necessary to properly channel the creative muse and the muse she rarely speaks more clearly than through the beautiful symmetry of spreadsheets so get the shirt because the creative muse she could use a new pair of shoes if you would like a commercial free experience consider subscribing to our website at accounting instruction calm or accounting instruction dot think of it dot com require to but if you have access to one note one the icon left hand side one note presentations 14 16 mean and outlier tab we're also uploading transcripts to one note so that you can go to the view tab use the immersive reader tool changing the language if you so choose being able to read or listen to the transcript and multiple languages tying the transcripts into the video presentation using the timestamps one note desktop version here data on the left hand side our salary data that we are imagining for a corporation for example we're going to do some of our standard calculations on the data set such as the mean or the average the median the max the men quartile one quartile three and then we're going to add an outlier in this case imagining that to be the CEO salary which is far higher than any other salary and then look at what changes with those standard calculations and what implications those changes may have when we're making decisions in the future in terms of what types of numbers do we want to be using to represent our data set for relevant decision making alright so if we look at our data set on the left hand side this is our salary data set we're going to be doing the average calculation these are just our formulas for the average calculations that we've seen in prior presentations here's going to be our numbers that we would be calculating we've got the mean or the average which we could calculate using our formulas up here which means we would basically add up our data set and divide by the number of items within the data set if using excel we can use this trustee average function which would simply be equals average in order to calculate it scroll out of here a little bit so we could see this at one time then we're going to have the median remember that's the one that we're going to pick the one in the middle so if we sort it from the lowest to highest simply picking the one in the middle is the median and if we have excel excel can do that for us obviously we not we need to know what the difference is between a mean calculation and a median calculation and we'll see here that of course there are times when one will be more relevant than another and we have to pick the more relevant one in this case you can see that there are two are fairly close to each other and that would give us an indication in and of itself that there isn't a huge outlier possibly that's really skewing the mean if these two are very different from each other that might give us an indication that that there could be an outlier impact and if there is then we probably want to dig down deeper and see what's going on with it so then we have the maximum this is the largest number in the data set we can get that by just using the function equals max to pick that up and then we have the min that's going to be the lowest number in the data set which we can see right here is the 67 9 the highest number if I go down 84,000 was the 84,000 here we can get to the men with simply equals men calculation and then we've got quartile one which now we're taking like the middle point of that first quartile we're break we're basically breaking the data set taking it from top to bottom right and then breaking it up not just in the middle point but by quartiles and you can use in excel the quartile.exe you have to use one more argument a comma one to get the quartile one to calculate and then we have quartile three because we already have quartile two in the median and here's the calculation for that quartile three you just need a three at the end of it to be picking it up now if we were to do the average calculation using our formula we can sum up the entire all of our cells which would be quite tedious to do in a calculator but in excel we can recalculate this number the average 71 498 by summing everything up which is our most trusty function in excel the sum function and then we are going to divide that by the number of values meaning we can count the number of values now to not do that manually we can use a count function in excel equals count all of all the cells in this table it comes out to 51 and if we divide that out then we get to our average let's see if I pull out the trusty calculator just to check it 3646400 divided by 51 and we get to the 71 498 it's rounded as you can see so there we have that and this is just going to show us the division in excel's taking the cell divided by this cell alright so then if I go to the right here we can look at a histogram here is a histogram of our data which is simply taking the buckets on the bottom and seeing how many of these items fallen to the buckets from 67 9 to 69 364 you know you had between about 9 from 69 364 to 70,000 827 you up to like 15 and so on and so forth and you've got a couple that are still kind of outline kind of outside of this particular histogram but the data set is not a that long either now let's add a significant outlier however so we've got the same data set but now at the bottom of it it jumps up we've got this one outlier which is significantly larger than the rest of the data set and it could be much larger than that even a one million I mean it could be like you know 10 million or something you know they could really skew the numbers so what does that do when we look at our standard calculations so if we do if we do our standard calculations with the same data set but simply adding the outlier now we're at 87 or 89 354 as opposed to the 71 498 so it's a pretty significant change and if that outlier was a lot larger you would have even a more significant kind of change to the average so the outlier you know it will depend on how many numbers are in the data set and then how big that number is relative to the to the rest of the data set and then if I look at the median you could see that the median did not change so that's a huge indication notice that these two have a fairly significant difference so if I just looked at these two numbers versus these two numbers I'm more likely to say over here well maybe there's an outlier because the median and the mean are fairly are a little bit more significantly different from each other than over here so that's going to be an indication that also shows us that if there's something that has an outlier in it we have to we have to ask the question of what is our objective and which number would be best how are we going to deal with that if there's an outlier in it then we might say that it'd be better to take the middle number if I'm trying to for example see how much money I'm gonna earn at a particular organization I probably can't pick the the outliers probably gonna skew that because I would I would assume if I'm an average person I would be somewhere in the middle but I can't take the average of all of them because that outlier kind of skews the whole average therefore you're likely to take the median that often happens with things like home prices for example if there's a million dollar home in the neighborhood or multiple million dollar home but most of the homes are around you know 200,000 or whatever then then that outliers gonna really skew the numbers of what the home may actually cost the other thing that you could do is say well let me take all of the numbers and just trim off the outliers I'm gonna remove the outliers and then take the mean or the average without those outliers in it so those are some strategies that you might take but obviously also realize that when there's an outlier that's another opportunity to kind of be a little bit deceptive with numbers depending on what they're trying to do you know if they're trying if someone is trying to say I live in a very wealthy area and get prestige or something or whatever they could say well the average of my the average home price in my neighborhood happens is higher because they're taking into account outliers maybe right because you know some rich person happens to have a mansion that lives in the neighborhood or something like that and the when in this happens of course in business all the time as well when people are trying to make an argument they're usually going to look for the statistics not to look for evidence to support a hypothesis in good faith unfortunately oftentimes they're looking for numbers to support their argument just like when they do with words that again it doesn't mean the it doesn't mean the numbers are wrong you can't blame the numbers you got you got to blame the person who's being deceptive with the number with the numbers right and then and then look at it from up from a fuller picture and see if you can pick up what is going on if we look at the maximum obviously the maximum has changed now because this is the highest number in the data set so this will give you a clear indication generally that there's an outlier if you take the max and the men there could be an outlier on the small side to so now you can say okay well yeah the now this outlier looks way higher than the average or the median and that's another indication that there's that there's this outlier issue that you're gonna have to deal with in some way in order to come up with a rational conclusion about whatever decision we're making and such as should I work for this place or how much am I likely to make in this place if I go in as an average person the men number did not change because there's no outlier on the men side of things the outliers on the max side of things now if one person like if there was if the CEO tried a bold strategy and said I'm not gonna earn any money unless the profits go up and they take $1 salary or something like that then it's possible you could have an outlier on the small side of $1 right which again would probably skew the numbers the other way not as far not as far as the million dollars but it would skew the numbers that way so then on quartile one were a little bit different because I added a number over here so now the quartiles are a little bit different because when I picked the middle number in the quartile but it's still somewhat similar because that number similar strategy as the strategy of the median and then quartile to it's a same same concept and so there we have those and then the average down here if I take my sum of the average numbers now if I recalculate the average I take the sum which now comes out to that of all the numbers instead of this because the one million increased it significantly if I do account I come out to 52 instead of 51 because we added that one million dollar number in the number of in the how many numbers are here and if I divide that out that gets that's how we get to our 89 354 versus the 71 498 now if I took this data set as well and I made a histogram from it now this histogram I purposely put all this is like all the data from the prior histogram which is kind of over here scrunched together and I put all the buckets in here so that we can get that outlier so that outliers over here so obviously this is not a histogram format that you would probably want to to give to someone else or something but I the only reason I'm formatting it this way is to show this outlier really skewing things so right if you had if you took your focal point the meat the mean it would be over here somewhere and that and that outlier like a teeter totter is weighing way over on the on the right side which is which is why you get this this kind of impact now how can you deal with that with your histogram to me if you wanted to plot the data well there's a couple ways you could do that you could you could take all of your outliers anything that's over a certain dollar amount like say $90,000 you you put in as an outlier and then in excel your histogram would then cut out all you can cut out all these boxes and just include it over here at at the at the 90,000 so that would mean that people looking at your histogram aren't going to get a sense of how extreme the outliers are so again you can see that as deceptive depends on what you're doing because if you if you say I just put everything that's an outlier over 90,000 in the outlier bucket then you're going to get a histogram that shows you kind of the middle point better but that it doesn't tell you how extreme the outliers are which is fine because that might be useful for for some cases you could you could take the outlier out of your data set and then take a histogram without the outliers so but those are some strategies to to put together the histogram in excel which we do in excel if you want to check that out