 statistics and excel misleading histogram get ready take it a deep breath holding it in for 10 seconds looking forward to a smooth soothing excel first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our CPA six pack shirts a must have for any pool or beach time mixing money with muscle always sure to attract attention yeah even if you're not a CPA you need this shirt so you can like pull in that iconic CPA six pack stomach muscle vibe man you know that CPA six pack everyone envisions in their mind when they think CPA yeah as a CPA I actually and unusually don't have tremendous abs however I was blessed with a whole lot of belly hair yeah allowing me to sculpt the hair into a nice CPA six pack like shape which is highly attractive yeah maybe the shirt will help you generate some belly hair too and if it does make sure to let me know maybe I'll try wearing it on my head and yes I know six pack isn't spelled right but three letters is more efficient than four so I trimmed it down a bit okay it's an improvement if you would like a commercial free experience consider subscribing to our website at accounting instruction dot com or accounting instruction dot think of it dot com here we are in excel if you don't have access to this workbook that's okay because we'll basically build this from a blank worksheet so you can just open up blank worksheet if you do have access to this workbook there's three tabs down below example practice blank example in essence answer key practice tab having pre-formatted cells so you can focus just on the heart of the practice problem the blank tab just having the data so we can practice formatting the cells as we work the practice problem going to the example tab to look at the end result that we will be building we've got our data set on the left hand side we're going to be sorting our data set we're going to be taking the average and then we'll make a histogram we'll talk about how that histogram isn't really representative or it's a little bit possibly misleading of the data set or possibly the average over here is a little bit misleading of a number then we'll look at a similar data set and we'll calculate the average it'll have some similarities but result in a much different histogram we have the same average between these two data sets much different histogram all right let's go to the blank tab in order to build this so now we have two sets of data here we're going to be focusing first on this first set of data we're imagining that we took some tests of a random sample of people to test for how many ovaries they have and that's going to be this set of data if you don't have access to it you can basically build your own set of data if you so choose using some methods we talked about in the past this second data set we will hide for now and then we're going to be using after we build the first histogram so the first thing I would like to do is format the entire worksheet like we do basically every time I'm going to select the triangle I'm going to right click on the entire worksheet and then format the cells we're going to go to currency I'm in the numbers group currency negative numbers bracketed and red I don't need the dollar sign and I don't want any decimals at this point this column has decimals so maybe I'll add them back in when I get there but for now I will remove them as the default so next what I would like to do is hide let's actually make it bold so I'm going to select the entire worksheet home tab font group I'll make it bold because I think that might help for the screen recording you might not need to do that I'm holding control and scrolling in so I'm currently at 225 percent on the scrolling I'm going to hide column C because I would like to just work with the this set of data first so to hide something you could select the entire column and then I'm going to right click on the column that has been selected and then we're just going to hide it now you could tell if something is hidden because it goes from A B C is missing and then we're over to D all right so now we're going to be working with our ovaries data let's go ahead and center this top one just so excel can kind of see that it is indeed a header home tab alignment center I'm going to make it into a table there's no missing cells so I could just select any place within this group go to the insert tab tables group and make a table the dancing ants do their mamba around a one to a 51 I think that looks correct so I'm going to say okay so there we have it I'm going to make the table I've got the table design let's make it like orange this time just to switch things up a bit so now I'm going to sort the table by a to z or z to a selecting the drop down and go in let's go from a to z so this is why I think this data set would be fairly easy to make or make a similar one if you want to go through this practice problem to get an idea of what is happening here because of course if you were to take a sample of people and test for how many ovaries they have another example would be to test for how many testicles they have or something right you would expect that they would have either generally zero or two now again you might come up with some that have one or something like that due to some some circumstance or something but as a general rule you would think that it would be zero or two now you can imagine how the data could be a little bit misleading if you start to if you build a data set based on this so what I'm going to do is I'm going to I'm going to take my column D and make that a skinny so I'm going to put my cursor before between D and E make a skinny D and let's do an average calculation so average so we can use the average function you call it the mean as well equals the average this is our you know baseline number most people think about the first number that we use to represent something is usually the average so I'm going to select the data set by selecting the the drop down right there so it's just the data so the average of that data set it comes out list to add a decimal to this one now I'm going to add decimals as appropriate right so I'm going to go to the home tab alignment and let's go to this decimals right here increase decimals so it comes out to 1.06 so you can you can imagine you know the argument so that could be made that would be misleading on this data set and unfortunately you know when you can't trust the the data sometimes then this is this is the way things can get kind of misleading unfortunately these days in the united states in our scientific community I can actually imagine them using a data like this like that what are the average ovaries of a of a human being is is around one it's around one and it's like well okay maybe but you know you're kind of missing some you're missing some stuff there aren't that I don't think that could you know if you just rely on that one number from a medical perspective you might be doing horrific things you know based on this kind of analysis right there's something something is missing it seems like now if we then make a histogram of the data that becomes a little bit more clear possibly right so if I select the data and I and I select the data here and we go insert and then charts and make a histogram so now we have our histogram and I pull this on over and I'm going to go okay let's get rid of the title of it my buckets let's just check the buckets if I go to the side bar on the buckets it's at 0.95 all right let's keep it at that so now you would expect if it was a normal distribution type of thing the histogram would be tall in the middle and taper off that's what we kind of often see in histograms but here of course it's you've got these two bars on either side and very little in the middle right so so and that gives us an idea of like okay well well the average is one so that you would think that that most people would fall in the middle right in the one but no even though the average is one you know people are falling on either side because obviously what we're really counting here is is you know the statistics related to basically men and women in general right and so it's so that's how things can get kind of misleading now if I compare that to a similar data set let's unhide the cell now I'm going to go from B to D I'm going to put my cursor on B and then drag on over to E as far as you want to go so that the so that the hidden cells between B and D have been selected right click on that selected area and unhide so so now we've got this second data set now this one I've designed to basically come up with a similar average but have more of a closer to a normal distribution so we can see what the two data sets would look like that would come up to these different kind of histograms even though they have a similar average right so if I if I select this let's go ahead and add the decimals on this I'm going to select the entire thing I'm going to go to the home tab numbers and we had some decimals on this so if you wanted to build your own data set you know you're you're basically trying to build a data set that is is around from one from between one and two that has an average around uh one 1.06 so if I was to then enter home tab insert and then enter a table so here's our table and so there's our data and let's make this one like uh I don't know can we make it green uh let's make it like that green I like to have a header column on it so this is just and then I'll maybe I'll wrap the data home tab uh alignment and then we'll wrap it and then center it alignment and center how about that all right so the second data set uh let's go down and put the average of it down here so this is the average of the second data set equals the average brackets and then I'll select my data set and enter so then there it is let's add some decimals home tab number add some decimals now note that if you wanted to build a data set like this by the way and you're saying hey look I'm getting I'm get I got my data set I'm just putting random numbers in there between that are between one and uh two so you might use an average number generator if you wanted to do that that's one way you could do it and then you can adjust the numbers within it if you want the average to get to 1.06 then you can adjust you can pick one of those cells possibly and you might be able to use a goal seek function so let's say this was at like 0.0 let's say this was at uh 0.0 well let's just change let's change this one let's let's pretend this one was at 1 uh or at 0 and so now I'm at 0.1.04 and you're like I want to make it 1.06 so one tool to do that is I can click off the cell I can go into my data up top and you can go into then the forecast group and use this what if analysis so I can say what if and then I want to use the goal seek function so now what I'm going to do is is is I'm going to say hey look I want you to make this cell be be 1.06 and I want you to do that by changing what's in this cell so notice that this cell is hard coded it's been typed in there so that means that excel can use trial and error it's just going to it's going to be like if you didn't know algebra you just kept guessing what number x should be until x gets to the proper number so the answer is right right that's basically what excel is doing so again we're telling excel hey excel make this cell down here be what I know the answer should be 1.06 do it by changing this variable factor that cell which happens to be a hard coded number and I could say okay and then it picks and then it makes the right number right so 1.1 that's what if I just change that one number so that this average function picks it up so that's just some useful tools if you're trying to back back into your data set but in any case now this is an average so you come up with the same average so if we were representing these data sets with just one number it's like well they're the same but if I make a histogram of the second data set selecting the data set and I'm going to scroll down a bit so my histogram will pop up over here insert tab and then charts and then make a histogram and I'm going to pull that over so now you know it's not exactly a you know bell curve distribution because again I kind of just randomly made the number so you expect it to be but in any case it looks a whole lot different than this right it looks a little bit more evenly distributed so these two data sets although they come up with the same average which is obviously the one number that we would usually think about as the first thing we might use the average doesn't give us an accurate picture between the two data sets and it could lead us it could lead us to very misleading conclusions and and again unfortunately in the united states medical community at least some of them not all of them of course you could you could almost imagine them giving you a data set like like the average and then giving you some diagnosis on it i mean if you'd be like okay i don't know i'm not sure i but in any case let's go ahead and make this one green uh uh or let's see if we could change the color of it to match our green data set and then let's make this one uh let's make this one what did we make it let's change the color of this one to like orange or no i wanted the other way around this one should be orange and this one should be green get your color coding right or people are gonna get confused i'm not trying to confuse people this is supposed to be clear we're making pictures to make things clear not confusing so that's the that's the idea and obviously you know obviously the the the problem is of course this data set is only going to have you know zeros and zeros and twos you would expect whereas this data set has a distribution that's going between you know uh zero and and two you know so you could see how those two data sets are distinct let's make this one a little bit smaller maybe looks a little nicer right these two skinnies should be the same if you want these two the same skinny size you could see the width and pixels when i do this but also you could like take one of them i select the entire one home tab clipboard format painter and then just format paint your other skinny so now you've got the same width of the skinnies also you can what the other way you can do it is you could say like this if this was a wide one and i want to make them the same skinnies i can select this skinny hold down control and then select this one that's skinny and then i can put my cursor between either one of them and because they're both selected and to select both of them that are non-adjacent they're not next to each other you have to hold control down i've now let go of control and now i'm gonna then now that'll make them the same skinny as well so that's about it all right