 Statistics and Excel. Histograms with different bucket sizes. Get ready taking a deep breath holding it in for 10 seconds and looking forward to a smooth soothing Excel. Here we are in Excel. If you don't have access to this workbook that's okay because we'll basically build this from a blank worksheet so you can start with just a blank worksheet but if you do have access to this workbook there's three tabs down below we got the example practice and blank example answer key practice tab having pre-formatted cells in it so you can get right to the heart of the practice problem blank tab just having the data in it so we can practice formatting cells in Excel as we work the practice problem let's take a look at the example tab in essence looking at the end result we're going to be sorting our data we're going to imagine this is data of wage data for like a corporation for example and then we're going to make multiple histograms from that data set so we can focus in on how to adjust the histograms and mainly on the buckets so we can see what happens if we change the bucket size the bucket size is going to be quite important when you're trying to visualize your data in the format of a histogram so let's go to the tab the blank tab where we just have our data now remember that if you don't have this data you can use some of the methods we talked about before you can look for some data for wages if you so choose you can just type in this data set if you would like to just simply type in the exact numbers we have or you can use some of the methods that we talked about in prior presentations to build a data set so once we have the data in the system I'm going to format the entire worksheet like I do basically every time selecting the triangle to select the entire worksheet right clicking and then I'm going to go to the format cells and I like to go then to the number group we're in the I usually go to currency bracketed numbers are negative numbers are bracketed and read with no dollar sign and then I don't need the decimals at this point because we're dealing with data that's basically whole numbers so I'm going to remove the decimals as the default for my entire worksheet and then say okay so now it formatted it in a numbers type of format with a comma in the thousands position I want to put a header up top and make this into a table and then sort the data from top to bottom so I'll scroll in a little bit I'm currently at 190 holding control scrolling in on the wheel so now I'm at 220 now remember to put a header I need to put a row above this or a cell above this the easiest way to do that is to select the entire row but if I had something over here for example that might not be what I would want to do because it would push this down so let's try the other method of doing that this time I could select one cell right click on that cell and insert and I have an extra step because now it needs to know where which way I want to push the cells and in this case I want to push the cells down and so then I could say okay it pushes these cells down without disturbing the data on the right all right let's go ahead and type in wages that's going to be my header I'm going to tell excel it is a header by say center in it home tab alignment center you might underline it or something like that too I'm also going to select the entire data set and embolden it home tab font group bold it just so it might be easier to see in a screen recording so you may or may not need to do that I'm going to put a table into this like I usually do with my data sets putting my cursor in the data I could put it anywhere in the data because there are no blank areas in it and then go to the insert tab up top go to the tables group and then add the table and the dancing ants are doing their mamba dance around the around the proper area so I'm going to say okay there is our table we now have this nice drop down up top so the next step is often to sort to the data so I'm going to hit the drop down or just tap on it I'm not going to hit it too hard I'm not going to break it or anything it's going to go from let's take it from z to a we're going from z to a and so there is our breakdown of the data so now let's just go ahead and start building our histograms and and then we'll adjust some of our histograms so to select the data one way to do it is I could put my cursor up top and select the data this way and then I'm going to go into the insert tab as we saw in prior presentations the charts and remember we're not looking at the bar column the bar charts we're looking at the histograms here and that will make the buckets right so the whole point of the histogram is it makes these buckets so I'm going to add the histogram we have a default histogram which is usually pretty good I mean they usually they usually you know do the histogram fairly well so if I look at what it gave me here on the default histogram we could say all right uh let's go ahead so here's our buckets down below let's click on the buckets are really the main thing most of the time that we're focused on so let's click on those buckets and then I'm going to go to the uh this symbol and then the access options so we have access by category so we're which doesn't up isn't appropriate for us because we're looking at actual numbers we've got the automatic buckets that are being put in place here and the bucket size then if I if I look here it's kind of hard to see but because it's on automatic and this is grayed out because you're not typing into it but this default is at that 3400 and then nine buckets so you got one two three four five six seven eight nine buckets and the 3400 now we can customize this by changing either the bandwidth or the number of bends either one will will change the the buckets right either either of those will will change the other one right if in other words if I add more bandwidth it's going to change the the number of bends so for example if I go into here and I say I want to make this uh 2000 then it's going to change the number of bends to 15 so now we have 15 bends that went up if on the other hand if I went back down and I say I want the number of bends to go back down to like 10 notice this number up top is at that 1933 it's going to change that to now 2900 and now if I if I want to refresh this to what it was in the automatic I could always just simply go back to the automatic and then it goes back to its default position now you'll recall we talked a little bit about this before with the overflow bend and the underflow bend so we can kind of group together these outliers uh at the far ends so now just note that if I go down here to this number tick mark if we have the categories usually we're looking at our numbers of the categories but if I hit the drop down you can also say if it was a general currency accounting if it's date date time percent function scientific text and so on so in some cases that might be useful you can also take away the comma if you so choose in the buckets if you want the commas gone for whatever reason for formatting purposes but for and then uh negative numbers the formatting of negatives we don't have any formatting uh of the negatives but usually most of your formatting is going to be up top here now you can also adjust this column so this column has been chosen if I click on that column once again I get my information on the right I'm going to go to the tab or this icon on the right for the access options and usually the major options you're looking at are the minimum which is the auto and the maximum so it picked up here from zero uh 230 as as what it's going to be counting up from now then we also said if I close this back out that remember if you're clicked on the chart itself if I click off the chart these two items are going to be gone right so if I remove them they're gone if you're looking for those two tabs you have to be on the chart then you've got those two tabs where we have the add the chart elements so this is one place we can go to kind of add those chart elements the access uh titles and the date labels and so they'll so forth the quick layout options that we have here right so those are pretty nice you know quick adjustments to the chart the formatting of them which you can also kind of do yourself if you want to do your own formatting the colors of the of the graphs up top and then your default formatting this way of the charts which again gives you kind of another formatting you've got your data up top so here's our data if I wanted to edit the data there's our editing of the data closing this back out closing this back out and then you've got your formatting options as well so there's the formatting options for the chart now the other way you can go into that stuff is sometimes you can right click on on a particular item so oftentimes when I want to add the data labels I'll go here and then add data labels so then I can add the data labels like that and I can then format the data labels if we so choose possibly making them a little larger or bolding them or something and then you have your options on the right which once again you've got the axis you've got the axis titles you can see the titles kind of popped up there if you needed those you've got the chart title which is often which you might put into a chart like this because this is usually representing one you know kind of set of data data labels grid lines if you want the grid lines on or off you know you can remove the grid lines and we've got the legend we don't usually need a legend here and this one gives you more more another place to find you know kind of more options so there we have it all right so now let's make another one and let's just do some comparisons and we'll change some of those options the biggest change I want to think about is well if I was presenting this data what would it look like and remember there's two ways you want to think about this one is I want to make the histogram so it represents the data in such a way that I can get an understanding of it and two I want to know how if I start to change the bucket sizes and possibly the heights of these what would that do if someone was trying to manipulate me with or manipulate someone like advertising or something like that using the data to try to get to a certain narrative in other words if someone's trying to argue their point and they're presenting data to argue their point they might manipulate their graphs to more articulate whatever point that they're that they're looking at and if we're trying to be in an honors conversation with in that environment which is often the case we're going to have to be able to see that so that we can so that we can know what's going on so if I select the data again let's go and say we insert another histogram and I'm going to pull this down to the bottom and so now we can say all right let's let's take the title off of this one and let's change the bucket sizes now so I'm going to click on the bucket sizes and say well what would that do if I went into my categories here and if I if I say that I want a very large buckets like 500 or 5000 space between them that's going to change you know the size of the buckets a lot right so now you got from 55 to 60 if I bring that even even larger to like 7000 now it looks it looks quite centered now at this point in time now that might not be a this may not be like a misrepresentation of the data but it certainly kind of gives you a different feel than this chart right because because of the size of the buckets so so you know it you know if you were arguing here that yeah well there's no really if someone was arguing well there's no really outliers here everything is basically kind of everybody most of people fall in the middle you know kind of thing then if you increase the bucket size you know that might give you that kind of sense or or feel you know so that you know that's one thing that they could look different let's make another one and go the other way so I'm going to I'm going to make another one I'm going to insert my data set here again and say let's insert chart and another histogram let's pull this at the bottom and let's say now we're going to make a whole bunch of buckets so now I'm going to say all right let's let's increase the size of the buckets so I'll make this a little wider and so let's go to our data down below on this one and I'll go to my symbol on the right and we'll say well what if I I bring this down this this bend width down to like 500 so now I just have a $500 difference well now the data is getting so spread out that it might not be giving you relevant information it might it's still it's still giving you a kind of a feel of the of where everything is lying here so that gives you a pretty good feel but you can often you could see how you can go too far here right like if I made this $20 then that that's not all that useful right this looks like a mess right if I bring it up to like $30 then then again it's not exactly a useful thing if I bring it up to $100 still not very helpful $200 right now you're getting something that resembles a data set and again that might be useful to some degree but but notice you're getting a whole different kind of feel for the data when I look at it in this format then this format then this format and so you have so you so when you look and this is what leads people to start thinking that the data is just manipulative it's all relative right what's the point but but the point is that you need to look at it from different angles in order to get in order to get a sense of the data so one angle at the data usually isn't isn't the full thing if you're only getting one picture of the data then you might be being deceived right so if I go you know there might be some manipulation going on there should be multiple pictures to try to get a feel of what is going on for the data ideally right so if I select the data again if I insert again and let's insert another one and let's pull this down and let's say that we bring this down here and let's get rid of the title on this one and this time let's let's possibly adjust this side of things so I could say well let's go to double click it on this one and I go to the axis and it's going from zero you know if I brought this up to like a hundred then now you get like kind of a squat type of graph which gives you a kind of a different look and and feel for it there you might say well and again notice that you know like the difference between if I looked at this one you're saying ah that well it's a pretty tall there's a pretty tall difference between you know some of these but if you were to add a larger column size then everything looks a little bit more squat right you're like oh it's not that big of a difference is it I mean it's like that's like only like an inch difference and this was like two inches different right but but that's because the scale is different you have to know what the scale means you know uh on on the left hand side so that's another kind of thing that could happen if I if I was to another one if I insert another one and make another histogram and we'll like we'll pull this one down and pulling it down okay and so now oh I didn't mean to undo undo so now let's get rid of the title on this one and let's click on this side again and let's go over here and say uh if we if we make the minimum let's say we start the minimum at like at like 10 you know so now right so now it started at 10 so now these you know these these ones down here are kind of been wiped out that would be somewhat of an unusual that looks manipulative right to do that looks a little bit funny unless they were really tall uh type of ones right so then but again you can see how that of course would would give a different feel than than this one right you're like oh you know there so now let's try something where they where they we were we removed the outliers so let's add another one let's select the data again select the data uh and then I'm going to scroll down before I insert so hopefully it'll insert kind of down here and then I'm going to go to insert well I still got the data selected but I'm scroll down histogram insert and boom all right so now let's say let's say that I go to this bucket size but I want to get rid of these these outliers right that's another thing that might be useful to do but it also might distort the data a bit right because the outliers you know could skew the data we'll talk more about it kind of later but it could but it also is somewhat misrepresenting the data to some degree as well because they're outliers that's not the norm so let's let's go into the the information on on the right for this for this brackets and I'm going to leave well we could go from from automatic to like let's say we let's say we wanted it at like 2000 as the separation so there's that's a pretty decent separation so you're like all right that looks good but maybe these these are kind of messing it up on the outside and kind of making my graph look a little messy so we could say well what if I will not hear what if I go to my overflow bin and I say that every so it's currently on auto if I make everything that's over like like I'm going to bring it in here like 77000 77000 then then it kind of packs it together right so now you've got this all the stuff that was way on the side is kind of packed in it's kind of nice so on one side because now I've got a now I don't have this data with all these empty buckets in it but at the same time it's a little bit misrepresentative because you're like well how outside were those outliers right and then if I click on this one the other side this one's an outlier on this end let's bring it into like I don't know like 63000 right so we'll say this one 63000 so so now you've got everything over 63000 so you notice that kind of cleans up the data because you're like yeah those are outliers and I just packed them in so that you can actually read the bucket size which makes sense but it also distorts a little bit the look and feel of the outliers so you can see how even when you're trying to be as honest as possible when you're choosing the bucket size it can be difficult and when you're dealing with when you're dealing with people who you know have an agenda meaning they're trying to argue for a particular case you know that going into it then you're going to have to be a little bit skeptical as to how they're going to present the data right so then you got to be like well you know what why did they choose to present the data in this way are they really presenting the data accurately are they giving you multiple pictures of the data or they are or are they trying to kind of distort the data by possibly manipulating things such as the bucket size the outliers and the height of the graph by manipulating the y-axis so these are and this is why again what you really would like to do is say I would like to get a look at the actual data and and do some of this testing on my own so I can make my own you know representations of the data and get an accurate idea of what is what is being said here buy it