 Statistics and Excel. Wages, data, box plot, or box and whiskers analysis. Get ready, taking a deep breath, holding it in for 10 seconds and looking forward to a smooth soothing Excel. First, a word from our sponsor. Well, actually these are just items that we picked from the YouTube shopping affiliate program, but that's actually good for you because these aren't things that were just given to us from some large corporation which we don't even use in exchange for us selling them to you. These are things that we actually researched, purchased, and used ourselves. Here we have a Western Digital WD Elements 20TB USB 3.0 Desktop External Hard Drive. This is part of our backup system, noting that if you lower the number of terabytes of storage, the price will lower dramatically as well. When you're thinking about a backup system, you usually think about an online system or an external hard drive system like this, or ideally some combination between the two, giving you some redundancy. You can also work directly from an external hard drive like this, but there are some drawbacks to doing that. One being if you use this as your primary drive you're working from, it's no longer a backup drive, and you're going to need a backup system, possibly another external hard drive and or some kind of cloud backup system. And if you're working on something that takes up a lot of short term memory, a lot of RAM as you're working on it, such as video editing, the external hard drive can slow up the system. So you might want to come up with some kind of system where you download the project you're working on to your computer, to your C drive, or possibly to a solid state drive, which is a much more expensive external hard drive as you do the work. Once the work is done, then save the project to an external hard drive such as this. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com where we have many different courses. You can purchase one at a time or have a subscription model giving you access to all the courses. Courses which are well organized have other resources like Excel files and PDF files to download and no commercials. Here we are in Excel. If you don't have access to this workbook, that's okay. You can go back to a prior presentation where we basically started this practice problem from a blank sheet, using a blank sheet from there. If you do have access to this workbook, three tabs down below, example, practice blank, example in essence, answer key, practice tab, pre-formatted cells within it, helping you focus in on the heart of the practice problem. The blank tab is where we started out with a blank sheet with just data on it and we're practice formatting the Excel worksheet as we go. Let's take a look at that example tab to see where we will be going. We started off with our data about wages we're imagining for a corporation or something like that. We sorted the wage data so we can get more information from it that way. Then we created our box and whisker chart which gives us a kind of pictorial analysis of it and now we want to kind of think about this box and whisker in more detail and plot out some of the components of the plot and whisker in an analytical type of format to get an idea of what is happening within it. Now notice we'll talk more about these calculations later, the average, the mean, the median and so on. Right now we just want to do it kind of in relation to the image so we can kind of get a picture or see what the picture is basically telling us in a bit more detail. Alright, let's go back on over to the blank tab. So what we have done thus far is we have created the data. We inserted a table around the data so I can sort it. It's currently sorted from the largest to the smallest. We then in essence basically just selected the data, went to insert and in the histogram area, in the chart area we added the box and whiskers and then we did a little bit of formatting within the box and whiskers and that's basically where we are at this point in time. So now we'll do some analysis of the box and whiskers. So I'm going to pull this out, I'm going to pull this out, put my cursor on it and just move it over to let's say like column G, let's see if I can make it a little bit smaller and fit it over here and so I could probably move this in a bit and still be good. Maybe not, I don't know. Alright and so then I'm going to make, so then let's make column B a little bit smaller. Now note when I start to actually do calculations, I don't want to do something right next to the table because then it looks like it's part of the table, Excel will get confused thinking that these two things are related. So anytime you start to use a table to do calculations, the common practice is make a skinny cell between the calculations and the table you're working with. I'm going to do that by putting my cursor between B and C so it looks like that and then I'm going to just hold down the left click and bring that down so it's a skinny B. Alright so then let's first take a look at the common calculation of the average or the mean calculation. So this is probably the most famous calculation. It's representing the X on the plot. So in other words, we're looking at this X, it's represented right there. We used the numbers to be able to see the numbers on the box and whiskers and you'll recall we did that by adding the data. So you can add that, we can say okay can we figure that amount. So let's do it a couple different ways, we can. Let's do it a couple different ways. I'm going to hold down, I'm going to make C wide as well. So I'm putting my cursor between C and D. I'm holding down the left click, I'm going to make it a little bit wider here. So I'll make that a little bit wider. And then let's say this is going to be the manual calculation. So if I was going to calculate the average manually, I would just add all these up and then divide by the number of them. Now if I was to count them, I'd have to then number them. I can see the numbers because I start with a 2 here. If I wanted to number them, I can add a column. Let's do that, I'm going to put my cursor on A and I want to put a column to the left of it. So what I'm going to do is select the entire thing. There's nothing underneath it. So I don't have to worry about messing anything up. So I can select the entire column and then right click and insert. Now Excel will always insert to the left because that makes sense. Otherwise you would never be able to insert something on this side. So it's always going to insert to the left. So I'm going to say insert and it makes a column. Now if I type in here just I'm going to put numbers and then enter. Then notice it didn't add it to the table. So if we want it to be part of the table, we'll have to add it to the table. Now maybe we don't want it as part of the table because maybe I just want to count of the cells no matter what I put in the table over here. So I'll keep it as is. I'm just going to say 1, 2, I'm going to select those two and now I'm going to use Excel's understanding that that's a pattern by selecting those two. You can't have just one. So I'm going to select two of them because the next thing you would expect is three, right? So would Excel. So Excel can see that. So I'm going to put my cursor on what they call the fill handle. Left click on it. That's that little square. And then notice you could see that running total as I pull it down. So you can see the running total. And so if I was to count them, I'd have 51 numbers, right? And so I can make this a little smaller. If I want this centered, I can go to the home tab and say the numbers, maybe I want those centered. Be careful with centering numbers by the way because, you know, obviously you want the decimals to line up. But in this case, I think we're okay to do that. So if I was to do this like in a manual method, what we would do is say I can use the sum function to sum up all this. Now to sum this up, I could put my cursor right here. So I have that drop down and notice that sums just that part. I could put my cursor on that first one, hold down control shift and the down arrow. Notice when I do that, I'm way down here. My formula is gone, but I can still see the formula and the formula bar. If I want to get back up to the formula, if I just hit enter, it'll take me back up to the formula. But if I want to get back up top before entering enter without using the scroll button, I can hold down control and then backspace, not delete backspace, right? So there's our sum. Now if I don't close up the brackets on the sum, the most famous formula by the way, the sum function. So this is, you have to know the sum function. Then it'll close it up for me, right? It'll do that for me. And then I can say, well, how many of these are there? Now I can see here that there are 51, but I can also use maybe a count function. If I wanted to count it, I could say equals the count and use Excel to do that. So there's a comment, I'm going to put brackets or you can click on it. So there's the count function and I'm just going to say just count these, right? Just count them. And now I'm just dragging down normally. Now I'm just going to hit enter and it'll bring me back up, right? If I just hit enter, then it brings me back up. If I double click on it, I can see the full formula. It closed up the brackets, so I'm just counting everything in there that comes out to 51. If I divide the two out, our mathematical formula for the average is this divided by this. And that gets us to that 71, 184, which is x on the square on this thing. Notice that this line here is a different number. That's the median, which we'll calculate too. Now if I'm going to combine that together, let's combine that together in one calculation now. So the manual calculation, a somewhat manual, a little bit more manual, so you can see what's going on to calculate an average equals the sum. I'm going to say it's the sum and then I'm going to put my cursor so I see that little drop down. And I know that that puts the dancing ants around all the numbers I want. And so I'm going to take that divided by the count. So now I'm combining two of these functions in one cell, right? Divided by the count, double click on it. And then once again, I'm going to select right when I have that little arrow drop down and the dancing ants are around the numbers that I want. And so there's my formula. So that's one way that you can notice when I hit entered, it's say we found a typo because I didn't close up the brackets. So usually you have to close up the brackets. If you don't, it'll do it for you still. You just say, okay, I just close them up and it does it. So then if I was to use the formula for it, I know this is kind of confusing because that was a formula. But if I use the straightforward formula, it would just be equals the average. So equals the average, double click on that. And once again, I'll put my cursor right up top. So I hit that pointer drop down, dancing ants around the number and boom. So that's the easiest way to do it, average, the mean, same thing. So we'll talk more about the manual calculation, the math later on these, but home tab alignment. Let's indent it here. So I'm going to make that a little indent. So we could see you also oftentimes we'll put like a colon so that we could see that this is the data kind of related to it. Now we want to talk about this bottom whisker, which is being calculated at the 67900. So here's the box. These are the whiskers. There's the bottom whisker. The bottom whisker gets a little bit more complicated when you have the outliers, which we will talk about later. But for now we're going to say this is going to be, in essence, the minimum not including outliers. And so it's the bottom whisker. So this represents the bottom whisker, the minimum not including outliers. So if we do that, if I was to do this manually, if I say manual method, we would say, all right, well, the minimum number, if I sort this from top to bottom is going to be right here. I can even sort it the other way. I can say let's sort it from the lowest to highest. There's the minimum number. However, that's an outlier. That's an outlier. So we're going to pick the one that's not the outlier, which is going to be then this number. So that one, and that's where the 67.9 is here. Now, if you wanted to do that with a formula, then I can kind of mark out the outliers. I can say, well, this one's an outlier. And we know that these two are outliers. So it's kind of the minimum of this middle bit down here. So I can use the min function one way we could do it function. And I can say this is going to be equal to another function, the min. So a common function to try to find the smallest number in a set of data. And we can select that. And then I'm just going to have to kind of pick the middle part here. So I'm going to pick this set of numbers. Now that min function could get thrown off when we start to sort the data in a different format. But that's the general idea with the min function. In other words, you can see that if I sorted the data this way from Z to A. So now it's picking up that same range even though I adjusted, you know, the data. So you've got to be kind of a little bit careful there if I if I sort it back and go from A to Z. So now we're back to that 67.9. So going forward, I'm going to keep the data set, you know, sorted this way from smallest to highest here. So I'm going to select these two again. I'm going to go to the home tab. We're going to go to alignment and indent again. So there's the 67.9. So now we'll take a look at quartile one excluding the median. So I'm going to say this is going to be Q1 excluding the median. See if I spelled that right. We'll go to let's go to the trustee spell checker. No, of course you didn't spell it right back to the home tab. Okay, so that's going to be the bottom of the box. So that's going to be the 69.7. So we could do that with a manual method or a formula type of method as well. So if we're going to use a formula, let's actually do that first. I'll put it on the bottom though because that's what we did up top. So I'm going to say this is the formula method formula. So we can use a function and then we'll try to analyze what the function is doing. So I'm going to say equals to get the function. This is going to be quartile type to start in typing quartile. We have the quartile ex see excluding the median is what that means. We'll talk about that shortly or including the median exclusion is the default. So that's what the box and whiskers is using. You could change the box and whiskers by the way to include the median if you so choose. But we're going to use the default here. There's our argument and then I'm going to put my cursor up top and where we have that arrow again selecting the entire data set. Now notice there's one other kind of argument that we need and we get to the next argument when there's more than one by selecting a comma. So now the next bit of information it needs to do this is the quartile. First median value or third quartile. So I'm going to say first quartiles what we want. Then I want to close it up shift nine closing up the brackets to complete the function and enter. We get the sixty nine seven. There's the sixty nine seven. So in order to kind of get that calculation. Let's first also go to the median and then we'll go back up and do the manual kind of calculation. So let's also think about the median which we can also say is quartile to Q2 and this is going to be the middle line of the box. Right. So it's the middle line of the box which is that seventy nine. Let's do that with a formula and do that one. Now there's a couple different different ways I could do formula for Q2. Or we can use the formula for just the median for the median formula. So if I use that same formula to look at quarter two for example equals quartile. So here's our quartile function double clicking on its same function. And then I'm putting my cursor on that arrow selecting the entire piece putting a comma to get to the second argument. But this time we want argument two which says right here that it's the median value. So I'm going to say number two close up shift nine and there we have our argument. So there's the seventy thousand nine. We can also calculate that with a formula using the median formula same kind of thing. Most probably most likely the formula most people will use because most people will call it the median right. It's the median. So we're going to say double click on that form that function. And then this argument will once again be the data we need is the entire data set. And if I don't close this one up I think we'll probably be OK. It'll close up the brackets. If I don't close up the brackets there's the seventy thousand nine again. So if we think about that well what is it doing it's taking the middle number when we take the median as opposed to the average. Which is taking the whole thing and dividing by two. We'll talk more about those concepts later but that's the general idea. If I count this data set we counted it from one to fifty one. If I say this equals fifty one divided by two the middle point is like twenty six right. So twenty six seventy thousand nine hundred seventy thousand nine hundred. So let's make that green as the middle point right. That's the middle point and we'll make that green. And then if I look from number one down to number twenty five not including the median right. That's what it means by not including the median. Then I can say well that's court that's court tile one. What's the middle point there well there's twenty five numbers. So we could say this is you know twenty five divided by two is going to be you know thirteen right. And we're going to say there's the sixty nine seven. So that's going to be that sixty nine seven. Let's make that like orange or something. So that matches this one and these two are the yellow. So if I did a manual calculation of the quartile that's what I would do right. The manual calculation boom it's right there and we'll make that orange too. And then if I did the manual calculation for the for the median or quartile too. Then we would just do what we just did right and say well there's there it is why did I make it yellow. It should be green should be green get your colors get your color scheme to make sense. You're going to confuse people you'll get bad messages. Let's indent these we're going to go into here home tab alignment indent and then I'll indent these home tab. Alignment and indent again. Okay so now let's let's we can do quartile three that's going to be the seventy two eight the box the top of the box. So we can get an idea what this box plot is doing. So this is going to be Q three excluding the median median. So this is the top of the box. So now we've got the top of the box not the whisker the top of the box. So we can do this with let's do the formula down here this is a form the formula method where we're going to say this is going to be equal to the quartile excluding again. And I'm going to select the entire thing and then tell it I'm scrolling down there it is I'm going to say comma the second bit telling it now we want quartile three quartile three or we can just type in a three or select number three closing it up shift shift zero. You have to close it up when you have more complex formulas oftentimes or else you'll confuse Excel simple formulas it'll often just add that bracket for you. So there's the seventy two eight serves the seventy two eight. Now clearly if you were going to count this right you would exclude the median you would be counting you know the numbers on the second quartile. So and picking the middle number which is going to be that seventy nine so so which we said was seventy two eight so seventy two eight. And then you can kind of count them you know find the middle number and there and there we have it so I'm going to say this is going to be let's make this blue so I'll make that blue. Let's make this blue see if I can get that to work and then the manual method would be simply that we exclude the median and we count the numbers counting you know we can count from a hold on a second. We can count from you know one to whatever you're right and then divide by two. Okay so we know that we get it we get it. Okay so there's that let's make that blue. Let's select these two and then indent a home tab alignment indent. All right and now we're on the maximum the top which whisker here so we're going to say okay. So then the max is q four q four top whisker or just say max max top whisker. Okay now the top whisker here would be the maximum amount if there were no outliers but we have these outliers which we'll talk about in a second. So if I so if I was to do this on a manual method we could say okay well these two are outliers. So it's going to be that's the seventy four two is going to be it right so we're going to say and this should be equal to the seventy four two which is the seventy four two. I could do that with my max formula or let's say formula this equals the max brackets but I have to I'm not going to include the outliers right. I'm going to take the middle bit here and then of course it'll take it'll pick pick that up but just to note how that max formula works because that's a common formula. I'm going to use these two home tab alignment and indent. So now we're we're there and so now we're go into the the calculation of what does it mean to be an outlier and this is kind of a heuristic kind of calculation that it kind of through these through these out here. When does it know to kind of throw these as a dot you know on on the outside. Do that little heuristic calculation which is kind of like a convention type of calculation. Let's call this and we need to know the enter quartile range which we might call the IQR which is basically simply going to be calculated as Q. It's going to be calculated as Q. Three minus Q. One brackets so I can calculate the enter quartile range by taking Q. Three this seventy two eight hundred minus Q. One which is the sixty nine seven and that's going to give us that three thousand one hundred. So then I can say OK well the lower outlier outlier limit is going to be a formula of Q. One Q. It's going to be Q. One minus the enter quartile range times I'm just going to put an X for time times the one point five. That's going to in essence the formula and of course these are the dots. These are the dots so I'm going to say the lower limit. Limit is going to be if I do that formula equals I'm going to now I'm going to use our Excel skills here so I'm going to scroll up to the Q. One so Q. One up top sixty nine seven. Notice I can see this happening in the formula bar and now I've got something in the formula but I can't see my actual cell so I could keep working from the formula bar. Most people like to see the formula so I'll scroll back down and see the formula you know within the cell and so I'm going to say minus. I'm just typing in this formula and and put it in the data cells that we can represent for numbers when I can the brackets. It's the IQ are I'm going to replace that with three thousand one hundred but I'm not going to type it in there because I would like to as much as possible when using Excel. Pick that up from my data source my data references so I'm going to be picking the three thousand one hundred which is represented as cell E twenty seven cell E. Twenty seven times which is the asterisk on the keyboard and then I'm just going to manually type one point five because that's kind of a hard coded number. I'm going to close up the brackets so now we've entered our formula QQ one minus the interquartile range times one point five. So if I enter that's going to give us our sixty five fifty. So basically as a convention any of our data that is over like around this line is going to be thought of basically as an outlier and it's going to throw it out. It's going to throw it out into the out into the cold way past even the whiskers where there's no protection from from the mob or anything. The Twitter mob will get them out there and then we're going to go it's it's horrifying. We're going to let's do some indentation home tab alignment and indent and we'll indent this one to home tab alignment indent. Now if I do the same thing for the upper so we'll say this is the upper outlier limit is going to be Q3 and this time plus instead of minus brackets the interquartile range times one point five. And these again are of course the dots. So now we'll say this is the we'll say this is the upper limit limit and this is going to be equal to same kind of thing we'll say Q3. So I'm going to say equals because it's a formula. I'm going to find Q3. There it is. Click on it. So that cell E 19 which I could see in the formula bar. But most people like to see it down here. Right. I like to see it in my actual cell and then plus brackets. I have and notice by the way if you didn't put brackets around it you'd probably be good right because it's going to multiply before it adds. But I think it makes a little bit clearer to have you know the brackets sometimes and we're going to say the brackets. It's going to be the interquartile range is going to be this cell. So the three thousand one hundred represented or shown in E twenty seven E twenty seven. And then we're going to say times with the asterisk hard coding one point five when I say hard coding. I mean just typing in the number instead of referring to a cell in the data. So that's going to be our seventy seven four fifty. So anything above the seventy seven you know around here. So it's going to throw out again out into the cold where where you know you know it's not where you want to be. Let's just put it that way. I don't want to get too descriptive. I don't want to give anyone nightmares but home tab and then alignment and indent. Actually those outliers are probably good for the salary purposes but that it's still bad because they well anyway I don't know. I'm not going to get into it. But let's go ahead and add some blue now just to format ourselves because usually I like to say that the data input for pieces are going to be blue. Right. So I'm going to say I'll just put a blue box. I'm not going to put some blue over the stuff that I already have stuff in. So I'm going to highlight this stuff and then I'm going to go into into the home tab font drop down. And I use this blue right there often times for the data input because that's what the Excel is fun guy used to use. And so I follow the tradition. I think he switched over to green now but I was doing blue for so long and I like the blue soothing soothing. The green makes me think about that worksheets before Excel which gives me which I don't find soothing. That's scary. So then we're going to go down here we can do the same thing I'll put some bright by the way this is the border. So home tab font group borders and now that blue is right there. Boom. I could put the blue right here and then I'm going to add borders around the whole thing. Borderizing it. So notice I can select the whole thing by holding down control. And so I can select non adjacent cells but you have to be careful with the scroll button. Because remember if you're holding down control and scrolling it'll lower the size of the screen. So you have to let go of control and then put the control back on. You have to let you have to hold on to control loosely a loose hold of control too much too much control when people are trying to hold on too tight. Then that's when when problems happen. So then we're going to say this is going to be there. So I just relax just relax man. Let go of the control from time to when you scroll you have to let go home tab font group and borders. Let's do a quick little review and spelling see if my spelling is okay. So there's a so now hopefully you have a general idea of how these box charts look what these numbers mean on the box charts. Now again we'll get into some of these a few of these calculations more will obviously we'll talk about the median and whatnot a whole lot more. But next time we'll add to this and look at this one of visualization of the data compared to compared to like a histogram. So we'll add a histogram to it next time.