Statistics and Excel: height statistical inference data, Excel practice problem. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel.

Here we are in Excel. If you don't have access to this workbook, that's okay, because we'll basically build this from a blank worksheet, so you can just open a blank worksheet. But if you do have access to this workbook, there are three tabs down below: Example, Practice, and Blank. The Example tab is, in essence, the answer key. The Practice tab has pre-formatted cells so you can focus on the heart of the practice problem. The Blank tab has only the data, so we can practice formatting the cells in Excel as we work through the practice problem.

Let's go to the Example tab to see where we are headed, noting that we have the data on the left-hand side, related to heights in inches. It's a pretty long data set that we will be working with. We are imagining that this is the complete population of our data. We'll take some stats on it, like the average and the median, we'll make a histogram from the entire data set, and then we'll take samples of that data set. So we already know the answer for the entire population we are looking at, and now we want to think about how close samples will get us, so that we can make an inference about what the actual numbers are in the data set.

All right, let's get into it. Let's get stuck in by going to the Blank tab, noting that if you don't have some of the data sets, you could try pulling data sets to practice with from kaggle.com. It's a good resource, in my opinion. So we have our information on the left-hand side. Let's first format our entire worksheet, which is what I typically do every time, noting that the data set has multiple decimals, so we have a question of how many decimals out we want to take the data as we reference our cells. I'm going to scroll in a bit, and then I'm going to select the entire sheet, putting my cursor on the triangle, right-clicking on the sheet, and let's format the
cells. I usually go to Currency, negative numbers bracketed and red, dollar sign gone, and I'll keep the two decimals, which will actually lower the number of decimals shown. So remember, the data values are a little bit longer than two decimals, but I think two decimals will work for us. There we have it. Before I unhighlight, I'm also going to the Home tab, Font group, and let's make the whole thing bold as well.

All right, so the next thing I want to do is put this into a table, but I also want to be able to randomly mix up this data set. Remember, the goal here is to think of this as the entire data set, and then we're going to imagine that we're taking samples from it. So let's first just get an idea of the entire data set itself. I'm going to put a table into this, going to the Home tab, I'm sorry, the Insert tab, and then the Tables group. Let's make a table out of it. That should select the entire data set, because there are no missing cells in here. This is a pretty long data set, a whole lot of numbers. If I go down to the bottom of this thing, we're down to, you know, 25,000 numbers in it. Was that right? Yeah. So let's go ahead and say OK.

Now we've got a table. I can then sort the data, so I can see it from lowest to highest or highest to lowest, in inches. If you want to convert this to feet, then you'd have to do a conversion, dividing by 12, to get to feet. But the general idea is there.

Now, if I imagine this as my entire data set, then I would use the calculations we saw before. I can make a histogram of this, and I can do my calculations of the average, the median, and so on. Let's first make a skinny column B here. I'm going to put my cursor on B, put it in between the column headers, and make it skinny. And let's do our normal statistical calculations. Let's take the average, or mean, and I can use my AVERAGE function to do that: equals AVERAGE, brackets, and I'm going to put my cursor on the drop down
and select the entire data. Boom, there's that. This one, by the way, I might want to make a little bit thinner, and notice I might want to wrap the text up top: Home tab, Alignment, wrapping the text. And then maybe I'll put a space: I double-click in here and put a space so that it puts the space there. I might want to center it: Home tab, Alignment, and center. Okay, so there's the average.

Then we might take something like the median, using my trusty MEDIAN function we've seen in the past. I'm going to do this fairly quickly: MEDIAN, double-clicking on this and selecting the whole data set. That's picking the one in the middle. We might want the max, so let's do that one: equals MAX. These are my standard ones. That gives me the top value. And then we might want the min, give me the bottom value: equals MIN. We could also take the quartiles, but I'll stop here. There's the min value.

All right, I'm going to make this blue and bordered, which is my typical formatting for the data input areas: Home tab, Font group, drop-down on the bucket. If you don't have this blue, I find it by going to More Colors. You can use a different blue, by the way, but I like to use this blue right there. It's a nice, pleasant blue. And then I'm going to put some borders around it: Home tab, Font group, drop-down on Borders, and we want All Borders. So there are our borders.

Now we can take this and insert a histogram from it, selecting the entire data set. We're going to go to Insert, then Charts, and drop down on the histogram. I'm just going to insert the histogram. Boom, it just does it for us, and we get this nice, bell-shaped kind of histogram.

Now, when we're looking at different sets of data, we're not always going to get a shape that looks like this, but many sets of data will. When we're talking about natural things, and we're trying to measure the midpoint and how dispersed things are from it, such as height, such as weight, and those kinds of things, then oftentimes we do
get a distribution like this. And remember that if I see a distribution where I can think of a curve related to it, that could be useful if I can come up with a function for the curve, because then you have a mathematical calculation of it. We'll talk about that later. For right now, all we want to do is get the idea that this is the entire population. We are imagining this is the entire population.

So now let's imagine that from this entire population of data we take samples. Let's imagine that we could not get the entire population but rather could only get samples, and see how close those samples will get us to the actual number. Now, clearly, in real life we wouldn't have the entire population. We wouldn't know the real number; that's the point. But if we can test a situation where we know the actual numbers of the entire population, and then we do our inference testing, taking a sample and seeing how close it gets us to the average, then we're testing the process, which we can then possibly use in other situations where we don't know the answer for the entire population but can use our statistical tools to try to get an idea of where the middle might be and how confident we can be in it.

All right, so what I'm going to do then is, I'd like to be able to scramble this data so that I can come up with