Statistics and Excel: misleading histogram. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side, in the OneNote and Excel presentations tab, at 1050 Histogram Misleading Data. We're also attempting to upload our transcripts.
As a reader tool, you can change the language to whichever language you choose and either read or listen to the transcript in multiple languages, using the timestamps to tie it to the actual video presentation.

Here we are in the desktop version of OneNote. We have our data on the left-hand side, where we are imagining that we took a completely random sample of the population and either tested for or asked how many ovaries each person has. You can imagine a similar situation where we take a random sample of the population and either test for or ask how many testicles they have, for example.

The data on the left-hand side is in the order of the people we asked in the random sample, so it's not sorted in any way that's particularly useful to us. We could probably derive some information just by looking through the list. But let's go through our normal procedure: once we have our information, the next thing we typically do is sort it. If we sort it from lowest to highest, we end up with a lot of zeros, a single one, and then a lot of twos. Just sorting it like that could, in and of itself, give us some information about what is going on here.

The next thing we might do is, of course, take our statistics, the most common one being the average.
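The sorting step described above can be sketched in Python rather than Excel. The sample below is hypothetical: the counts (23 zeros, a single one, 26 twos) are an assumption chosen only to match the shape described in the transcript, not the actual worksheet data.

```python
from collections import Counter
import random

# Hypothetical sample: the exact counts are an assumption, not the worksheet data.
sample = [0] * 23 + [1] + [2] * 26

# Shuffle with a fixed seed to mimic the unsorted order of a random sample.
random.Random(0).shuffle(sample)

# Sort from lowest to highest, as in the walkthrough.
sorted_sample = sorted(sample)

# Tally how often each value occurs; the clusters at 0 and 2 stand out.
counts = Counter(sorted_sample)

for value in sorted(counts):
    print(f"{value}: {counts[value]} people")
```

Even before any statistic is computed, the tally shows two large groups and almost nothing in between.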
You'll recall that the average is the sum of all the values divided by the number of values. We can use the AVERAGE function in Excel, which would simply be the average of this series of data, but what it's actually doing is adding up all the data and then dividing by the number of data items we had. That gives us 1.06, so the average is somewhere around one.

Now clearly, if someone gave you only that data point and said they knew something about human beings because they took a random sample, tested how many ovaries each person has, and came to the conclusion that human beings have around one ovary, 1.06 ovaries, you can imagine that might be a little misleading if you're only looking at that one piece of data. There are a lot of good doctors out there in the United States, but there are a few you can almost imagine saying, "Hey, look, we've got to implant an ovary into you, because you're short an ovary; you should have an average of one ovary." Unfortunately, I can kind of imagine that happening.

If you were to then plot this in a histogram, it would look something like this, and this gives you a bigger picture of what is happening. If I take the average, I get one ovary, but if I plot it, I'm saying: in this bucket I've got zero to one, in this bucket two to three, and then one to two in the middle. You'll note that most people, when they see histograms, start to imagine that if you take a larger sample size, the histogram will look more and more like a bell-shaped curve. That's not always the case. Only for certain types of data will it start to mirror a bell shape, which we'll talk about later; you might have any number of
shapes that the data might take, and that's why you need to look at the spread of the data. In this case, we don't have anything at the center point; everything is happening out on the sides of the graph. That makes sense, of course, because what we're looking at is a test that's really determining man or woman. So we're basically getting a split between men and women; that would be the general idea. Clearly, if a medical procedure were recommended based on people having around one ovary, that we need to give you an ovary or remove an ovary or something like that, that would be a problematic conclusion from a misuse of looking at just one angle of the data.

This is clearly a very extreme, obvious example, but it points out that misleading data can happen, and you can imagine many scenarios where you have a much more nuanced set of data, where the misleading part might not be so overt, but there could clearly still be some misleading information going on.

Now take a similar set of data. Here we're imagining that we're taking test scores or something similar, and we're saying that we want results between one and two. So now the data set doesn't look like all zeros and twos; it has a spread between one and two. We didn't actually sort this data, but you can see that it's basically a spread of numbers between one and two. If I look at this one and take the average, you'll note that the average still comes to 1.06. If I look at these two sets of numbers with only this one calculation, our most common, most famous calculation, then we're going to get a very misleading picture. We're going to say, well, these two sets of data are quite similar. They're not
quite similar. They are in one sense, because they both have an average of 1.06, but clearly the spread is dramatically different and could lead to hugely different outcomes or thoughts about what this data is. So this is simply the average again. If I plot this on a histogram, even though it has the same average, it looks like this. This graph is more like what people imagine a histogram to look like, because most people, when they see these histograms, are imagining a bell curve on top of it. Notice that the graphs don't have to come out that way; I just want to make that clear, and we'll talk about bell curves later. The major point is that the spread of this data is clearly much different from the spread of that data, even though it has the same average.

Keeping this in mind, what this shouldn't do is tell us, "Oh man, statistics is meaningless, the average is meaningless, the world has no meaning, so I'm just going to stop trying to use the tools to derive meaning from the world." No. We just say that tools such as words and statistics can be used to mislead as well as to clarify. The tools are designed to clarify, so we have to use them properly in order not to make mistakes and in order to safeguard ourselves against people lying with the tools. Whether those tools are words or statistics, it's the same thing: they're misusing the tool.

For example, you can take the analogy of a verbal argument. When someone is arguing something, you can more clearly see, "Well, the thing that you just said is a lie; it's wrong; you took a wrong step somewhere." But they usually start with something that's
true. There was a famous Monty Python sketch where they were trying a witch. They were making this whole thing about a witch trial, and they came to the conclusion that the witch should float, or something like that, because every time they made a statement they took some crazy logical leap that doesn't make sense. We get pretty good at seeing those wrong logical leaps when they're made with words. The reason people get frustrated with statistics is that we don't get as much practice with the statistical logical leaps being taken, so when people take a logical leap, we're more likely to miss it and then blame the statistics.

So someone might say, "Hey, look, the average is 1.06. You can't deny that; I just took the average with the Excel AVERAGE function and it gave me 1.06. Therefore, I need to implant an ovary into you, because you're short an ovary and you need one, because of the average." You see, that's just like the Monty Python shtick where wood floats, and therefore the witch should float, because she will burn like wood, or something like that. Obviously a logical step went awry somewhere in that.

Clearly, if we were to do a little more testing, whether the logical step is made with words or with statistics, we'd say, "Wait a second, there's something different going on here, because when I plot out the data, I would expect most of it to be falling around the average in the middle, and that doesn't seem to be the case. Everything seems to be on the edges." So it doesn't seem to be true that most people have one ovary. Something is wrong here. Well, the average
isn't wrong, but your conclusion about the average, that most people have one ovary, is clearly totally false. So, in any case, that's a
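To make the comparison concrete, here is a minimal Python sketch of two samples that share a mean of 1.06 but have very different spreads. Both data sets are hypothetical, constructed only so that the average matches the 1.06 quoted in the transcript. The bucket edges (0 to 1, 1 to 2, 2 to 3) follow the histogram described, and `bucket_counts` is an illustrative helper, not an Excel feature.

```python
from statistics import mean, pstdev

# Hypothetical data sets, constructed so that both means come out to 1.06.
ovary_counts = [0] * 23 + [1] + [2] * 26
test_scores = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.08, 1.11, 1.20]


def bucket_counts(values, edges):
    """Count the values in each half-open bucket [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


edges = [0, 1, 2, 3]
for name, data in [("ovary counts", ovary_counts), ("test scores", test_scores)]:
    print(name)
    print("  mean:   ", round(mean(data), 2))
    print("  std dev:", round(pstdev(data), 3))
    print("  buckets:", bucket_counts(data, edges))
```

Under these assumptions both means round to 1.06, yet the first sample sits almost entirely in the two outer buckets while the second sits entirely in the middle one. The standard deviation is one simple way to put a number on that difference in spread.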