 statistics and excel height statistical inference data excel practice problem get ready taking a deep breath holding it in for 10 seconds looking forward to a smooth soothing excel first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our accounting rocks product line if you're not crunching cords using excel you're doing it wrong a must-have product because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement the obligation a duty to share the tools necessary to properly channel the creative muse and the muse she rarely speaks more clearly than through the beautiful symmetry of spreadsheets so get the shirt because the creative muse she could use a new pair of shoes if you would like a commercial free experience consider subscribing to our website at accounting instruction dot com or accounting instruction dot think of it dot com here we are in excel if you don't have access to this workbook that's okay because we'll basically build this from a blank worksheet so you can just open a blank worksheet but if you do have access to this workbook three tabs down below example practice blank example in essence answer key practice tab having pre-formatted cells so you can focus on the heart of the practice problem the link tab only having the data so we can practice formatting the cells in excel as we work through the practice problem let's go to the example tab to see where we are headed noting that we have the data on the left hand side related to heights and inches pretty long data set that we will be working with we are imagining that this is the complete population of our data we'll take some stats on it like the average in the median we'll make a histogram from the entire data set and then we'll take samples of that data set so we kind of already know the answer of the entire population we are looking at and now we want to think about how close samples will get us to be able to make an inference about what the actual numbers are in the data set all right let's get into it let's get stuck in by going to the blank tab noting that if you don't have some of the data sets then you could try pulling data sets to practice with from coggle kaggl.com it's a good resource in my opinion so we have our information on the left hand side let's first sort our entire worksheet which is what i do typically every time noting that the data set is basically has multiple decimals so we have a question of how many decimals out do we want to take the data as we reference our cells i'm going to scroll in a bit and then i'm going to select the entire sheet put in my cursor on the triangle right clicking on the sheet and let's format the cells i usually go to currency negative numbers bracketed and red dollar sign gone and i'll keep the two decimals which will actually lower the amount of decimals so remember the data sets are a little bit longer than two but i think that will work for us two decimals there we have it i'm also before i unhighlight go into the home tab font group let's make the whole thing bold as well all right so the next thing i want to do is i would like to put this into a table but i also want to be able to kind of randomly mix up this data set so remember the goal here is to think of this as the entire data set and then we're going to imagine that we're going to be taking samples from that so let's first just get an idea of the entire data set itself so i'm going to put a table into this go into the home tab i'm sorry insert tab and then the tables group let's make a table out of it so that should select the entire data set because there's no missing cells in here this is a pretty extensively long data set whole lot of numbers if here if i go down to the bottom of this thing we're down to you know 25 000 numbers in it so let's was that right it's uh yeah so let's go ahead and say okay now we've got a table within it i could then sort the data i can see it from lowest to highest or uh highest to lowest uh in inches so if you want to convert this clearly two feet then you know you'd have to do a conversion divided by 12 and so on to get uh to beat but the general idea is there it is now if i imagine this as my entire data set then i would use our calculations we saw before i can make a histogram of this and i can do my calculations of the average and the median and so on let's let's first make a skinny b here i'm going to put my cursor in b and put it in between and make it skinny and let's do our normal statistical calculations let's take the average or mean and i can use my average function to do that equals the average brackets and i'm going to put my cursor on the drop down and select the entire data boom there's that this one by the way i might want to make this a little bit thinner and notice i i might want to wrap the text up top so home tab alignment wrapping the text and then maybe i'll put a space i double click in here put a space so that it puts the space there i might want to center it home tab alignment and center okay so there's the average then we might take something like the median using my trusty median function we've seen in the past i'm going to do this fairly quick median double clicking on this and selecting the whole data set that's picking the one in the middle we might want the max let's do that one equals the max these are my standard give me the top value and then we might want the min give me the bottom value equals the min we can also take the core tiles but i'll stop here there's the min value all right i'm going to make this blue and bordered which is my typical kind of formatting for the data input areas home tab font group drop down on the bucket if you don't have this blue i find that by going to the more colors you can use a different blue by the way but i like to use this blue right there it's a nice pleasant blue and then i'm going to put some brackets around it home tab font group drop down borders we want all borders so there are our borders now we can take this and enter a histogram from it select in the entire data set and we're going to go to the insert and then charts and drop down on the histogram i'm just going to insert the histogram boom and it just does it for us and we get this nice bell shaped kind of looking histogram now when we're looking at different sets of data we're not always going to get you know a shape that looks like this but certain sets of data many sets of data's will so when we're talking about natural things oftentimes and we're trying to measure the midpoint and how dispersed things are from it such as heights such as weight and those kind of things then oftentimes we do get like a distribution like this and we'll leave and then we'll get into remember that if i see a distribution where i can think of you know a curve related to it that could be useful if i can come up with a function of the curve because then you have a mathematical calculation of it we'll talk about that later but for right now the idea all we want to do is say is get the idea well this is the entire population we are imagining this is the entire population so now let's imagine that from this population of data this entire population we take samples of it let's imagine that we could not get the entire population but rather could only get samples and see how close those samples will get us to the actual number now clearly in real life we wouldn't have the entire population we wouldn't know the real number that's the point but obviously if we can test a situation where we know the actual number these are the actual numbers of the entire population and then we can do our inference testing taking a sample and then see how close it gets us to the average then we're testing the process that we can then possibly use in other situations where we don't know the answer to the entire population but we can use our statistical tools to try to get an idea of where the middle might be and how confident we could be of it all right so what i'm going to do then is i'd like to be able to scramble this data so that i can come up with like a random sample so the question would be then to be coming from how do we get a random sample now in practice of course you can somehow randomly choose the population of people of people that has its own problems in real life because we have to figure out how exactly are we going to do that if you're just working with the number statistically then the question is well here's my population how could i get a random sample of this now uh we do we've got that random number generator that we could use so what i'm going to do is i'm going to copy this entire thing and i'm going to put that over here and i'm going to paste it just one two three meaning just the values so i've got all the values they've been sorted but all the values are there and then i'm going to put next to it my random uh random number generator i'll just call it rand and then i'm going to then create my my random numbers over here so i can make a random number generator by see he's saying equals r a n d so there's a random number and i'm just going to hit enter it usually puts it in as a decimal if i go to the home tab numbers and add the decimals it's a very long decimal number and then i can populate this all the way down to to match the height data so i have a random number if i double click on the fill handle let me do that again control z i could like put my cursor here and drag it down but the easiest way to get it down because it's right next to this other set of data is to put your cursor on that little fill handle right there and then double click on it and that should take it all the way down to that to that bottom point of these numbers so then i'm going to enter a table around this to kind of connect these two things go into the insert tab up top and we'll go to the tables and i'm going to insert a table so now it should select the entire table because because there's no empty cells so hopefully that picked up the whole thing i could see the endpoint being at that 25001 so i'm going to say okay there it is i'll make this one i'll double i'll make this a little smaller again dragging that in i'm going to wrap the text on on these header cells home tab home tab alignment wrapping the text i'm going to center them and there we have it so now these are kind of connected together so now if i want a random selection i can sort by the random numbers these are all random numbers so every time i click on them they reshuffle so if i sort by the random numbers it's going to give us a a a random sorting to the numbers to the right now the problem here of course is that every time i do something it shuffles again so what i'm going to have to do is get a static number random number over here so now i can just simply copy these i can take these two if i and i can that's my random number generator and now i can paste them right here i'm going to paste them one two three though right clicking pasting one two three because i don't i don't want the actual random function to show up and then if i want to include include the formatting i could actually right click and insert the formatting as well but what i'd rather do is just make another table out of it i could put the formatting on it like this and so then i've got i've got it nice and formatted but i don't really need that second step because i'm going to add the table so i'm just going to click in it go to insert uh tables and add the table again and so boom so now we've got a random number generator that's not going to shuffle around all the time it's actually a very long number i can sort by the random numbers and that should give us kind of a random selection of the numbers on the right and i can select just how many of the sample that i want by by picking however long of a sample i want to be looking at let's start with a simple sample of 10 later we will get into concepts of how large does the sample need to be to provide a certain level of confidence but for right now let's just get the idea of picking the random sample and using our tools in excel to simulate those random samples so we have shuffled the items we're going to say if i just picked the first 10 now that is going to be our random sample we are currently in row number two so i can go from two down to 11 and that'll be 10 items so i'm just going to put my cursor on p2 select down to p11 right click and copy or you can say control c right click and copy i'm going to put that over in s2 here right click and i'm not going to paste them normal because i don't want the formatting instead pasting 123 and then i'll just label it maybe sample up top so so now if i just look at these numbers and to see whether or not they are representative remember that the actual numbers were here so there's the average the mean the median if i represent that data down below just so we can see it kind of side by side i'll put it down here somewhere in s17 i'm just going to say equals i'm going to scroll to the left to find that table and i'm just going to recreate that table i'm going to put my cursor in the average enter and now i'm going to put my cursor in this cell and copy it to the right and it should pull in the relative reference to the right so there's the average and then i'm going to copy it down and it should give us the relative numbers down so here's that middle point that we usually refer to on the average uh 67 99 the numbers we picked up 68 like you can take the difference between the two you could say the average is that so the difference so the average so the diff difference or is going to be equal to this minus this right and the average is the same all the way down so i can copy that all the way down say there's my average and each point on the average difference there's the difference on each point from the average notice that some are over and some are under that's kind of what we would expect if we did just a random sample on the averages and the tendency of course the idea would be that we we start to go towards the middle by doing the random uh samples of of of them right and if i took if i took the average of all the ones we took the average of the sample i can say this is going to be the average of the 10 that we pulled out and we get something we get the uh 6801 which you know it's pretty close in this case to the average now notice 10 out of this whole population is a fairly small you know numbers so we so we could we could run larger numbers unless and we want to do this for a couple different reasons obviously a larger sample uh could give us more confidence which we'll talk more about technically later but also we just want to kind of practice how we can use our statistical uh tools in excel to to to say well what if i wanted to kind of simulate the the idea that i ran uh that i ran 10 10 tests of 10 tests of 100 so now we can up we can up it a lot how can i do that kind of easily in excel there's different techniques i'm going to make this one a little bit skinnier i'll make this skinnier i'm going to put my formatting format by selecting all of this and go to the home tab font group borders i'll make them blue that's just my normal thing and i'll do the same down let's pull these i'm going to highlight these and cut them right click and cut instead of copy and then right click and paste that just moves them to the right so that i can see the average or mean and then home tab font group brackets and blue on that all right so so let's also make a skinny v so one technique i could use i could say well let me just copy the entire thing that has my random generator over here and then and i'll copy it over 10 times and then i'll randomly generate 10 sets of numbers and then i'll go through and i'll i'll pick up and i'll cut off like the extra numbers so let me show what i mean what i'm trying to get at is to make 100 uh i'm i'm sorry to make 10 tests of 100 so i can take this is my random generator of the entire population let me copy those two i'm going to copy l to m and then just simply right click and copy and then i'm still i'm just going to paste it as they are 10 times so i'm going to put my cursor on on w right click and paste i hold on a sec i copied again let me do it again copy these two and then put my cursor right click and paste just normal pasting and then i'm going to do that 10 times there's one i'm going to do it on z control v or paste there's two over here control v or paste i'm going to use the clicks uh two control v three control v four control v five control v six control v seven control v eight control v nine and control v ten so now we've got our data so now what i'm going to do is i'm going to i'm going to make uh all of the center points smaller so i've got these non adjacent cells we're just practicing using excel here i'm going to put my cursor on y hold down control put my cursor on a b let go of control when you're moving around usually but i hold down control so i can select these non adjacent or not next to each other columns so i'll select this one holding down control this one holding down control holding down control now notice i messed up because and if i do that it's not going to work because now i got the cell in here so if i try to manipulate this column they might not all manipulate together so i got to do it again i'm going to say okay i messed up why hold down control hold down control control i'm going to select all these middle ones boom boom boom and then boom boom boom and now when i make them skinny they should all become the same skinny we want them to be the same skinny all right so there we have uh all of our random numbers so now because these random numbers generate every time i can just sort by whatever the random numbers on so the 65 should change if i go from you know a to z so i'm just going to randomly sort these all of these from a to z and they should randomly sort up so i gotta do this 10 times so i think i'm on this one anyone that has that 65 up top so there's that one changed this i think i changed this one so randomly sort each one so i'm on a u randomly sort randomly sort and randomly sort okay so now now what i want to do is copy just the results and then i'll paste them into a table so i'll have my 10 result columns so i'm going to say all right i'm going to put my cursor on just the x the result i'm going to hold down control select the results holding down control letting go of control as i move to the right so i can hold to control again and select all of the columns letting go of control holding control again and then control c or right click and copy so now i've copied even though they're not all next to each other when i paste them they will be all next to each other so i'm going to put them in a in b e right click and paste and then i don't really need the table formatting so i'm just going to paste one two three and there and there we have it so now we have all of our our data and i might call this you know sample this will be sample one tab sample two and then excel will be able to read that even though i have a label in front of it so i select those two cells put my cursor on the fill handle drag it to the right and so now excel can see those i'm going to make this a header i'm going to instead of putting a table in i'm going to go to the home tab font group drop down on the bucket and i usually make the the headers black and white black and then the labels white if i don't insert a table and then i'm going to go to the centering right there so there we have it and then on this data what i want is it to go down to a hundred so let's do a count here and i'll count this let's make this black white and centered and i'll say this is simply going to be one two and so on i'll select those two put my cursor on the fill handle and drag it down and you can see it it's predicting the next number down to 100 so we'll bring that on down to a hundred so there we have it right there and then everything beyond that i want to delete the data it's because we just wanted a random sample of 100 so i'm going to copy i'm going to hold control i'm holding shift and selecting these you can do it this way or you can use the keyboard and hold and then i'm going to select control shift down arrow and that takes me directly to the bottom number and then i simply want to delete all of that stuff so now we've got basically a random sample of our population right so this is just some how we can kind of practice with this stuff in excel and then if i if i select all this data let's just format it like i normally do with the uh the blue so i'll select this thing and we'll go to the insert tab home tab i mean font borders and i'll make it blue all right and then we'll and then we'll do our we can we can take our average of all of them average and i can take so i can look at each of these samples and see how close you know they line up to the actual so but now and now let's take the average of all of the of the 100 samples that we took the average brackets i'm hitting the up arrow and then i'm holding down control shift and up and now i'm just simply holding shift so i don't pick up the header and down and so you can see the formula still up top in the formula bar average b e2 colon to be 101 and enter so now i can if i double click there's the formula if i put my cursor on that i can put my cursor on the fill handle now left click and drag that to the right and it will pick up all of the relative references all the way to the right now because this is my conclusion line maybe i select this and make it like a different color like say home tab font group drop down let's make it blue and white and then you might put like underlines here you might put underline select in this column home tab and font group and center so so now i can compare this to my to the actual this is these are my samples now i took 10 samples of 100 this time let's compare that to the actual data so i'm going to hit equals down here again and i'm going to go back to my and i'm going to go back to that where i calculated the the sample i did it over here somewhere there it is so i want to pick up this table it's in i put it in cell s 17 s 17 enter i'm just going to build that table again putting my cursor on this cell fill handle drag it to the right and then i'm going to drag it down i don't need this middle column because nothing's in it and then let's format this by going to the home tab font group bucket blue and border so there's the actual amount for the population remember again that in this case we have the entire population right and we're trying to look at a sample to see how close if i was to randomly take a sample we get to the numbers that we already know from the entire population so that we can so that we can then apply those concepts to times when we don't know the entire population right so this is the actual kind of answer in terms of the mean and we get pretty close right we got the the 68 67 and so on to to that with a with a sample of 100 so now you might want to represent this sample as a column so just to practice our excel skills here i could then copy this and say i would like to see this in the format of a column as the results of my 10 samples so let's copy that i'm going to go up top and say let's put that i'm going to put it over here in br br right click and i'm going to paste it just one two three because i just want the values not the formulas so so there there it is here and then if i want to take those numbers and flip it so it's on the vertical i can put my cursor in b q right actually i should have copied it let's copy it again let's copy those numbers copy those numbers i'm going to put my cursor in b q and i'm going to right click special because i want to switch it to be it to transpose it so i want to transpose down here here and now it'll put that vertical that's a useful tool to know so if i then select these i'm going to delete these don't need these because i got them vertical now and then again you can you can kind of see there's there's the average uh this is the this is the sample sample average and then this is the actual average let's say and then the difference difference and let's make this a table select or let's make it a a header home tab font group i'm going to make it black and white and then i'm going to go to the alignment and uh wrap the text and then center it and then uh the average i'm going to say this equals and i'm going to scroll down to that average that i got down below way down way down in my table there's the actual average and then that's the same all the way down so i'm just going to say this equals the same number equals the one above it and then if i copy that down it will always equal the one above it putting my cursor on the fill handle dragging down so it always equals the one above it and then the difference this is going to be equal to the sample minus the actual there's our difference which i can put my cursor on the fill handle drag it down i can copy and paste it or i can just double click and that'll take it down so now you've got some that are over and uh and some that are that are under on the average so remember what we did here we took we took samples of a hundred and we could analyze each of these one samples how close they are to the average but then we took the average of each of these 10 hundred uh there's actually 11 of them 11 samples and then uh and then and so these are the averages of the 11 samples and now we're comparing those to the actual and again we would expect some of the averages to be of above and below they're not going to be exact even though we took a larger sample from 10 to 100 this time but we would expect that they would approximate closer towards you know a middle a middle point with this kind of spread this type of data so if i select this and let's put our our tools around now notice you might also let's insert another column i'm going to insert a column between bp and bq so i'm going to do that to insert an entire column it always inserts to the left so if i select bq the entire column right click on the selected and insert it will insert a column to the left now you got to be careful of doing that if there's anything below it that's going to be messed up but there's nothing below here so that's the easiest way to kind of move this stuff to the right so the other way if there was something below then if i undo that uh hold on a second it's thinking if i undo that if there was something like below here i can select just these items i can pull it to the right like this or i could i could cut right click and cut which is more efficient and paste it to the right like that or one more time i could select these cells and then i want to shift these to the right by right clicking let's it's now i got to get rid of the dancing ants so i'm going to double click select these cells right click and insert and then i can say i want to shift them not down but rather to the right so then i can shift them to the right so a bunch of different ways you can do the same thing which will be useful depending on whether there's data to the bottom or to the right of it for example so then i could say let's take the average of the averages right i could say well i did i took the 100 tests and i got the average for each test let's take the average of the averages and so that gives us the the 67.99 which is which is pretty which is like right on basically you know we can add more decimals because this is all this is actually longer decimals than two so it's not going to be perfect but you know you get you get pretty close with that because that's pretty that's a lot that you know that we kind of come so let's make this font group and brackets and put some borders around this and then now of course if you wanted to you can also insert histograms from this information we could make like 11 histograms for each of the data sets of 100 data points we could make a histogram as well of the averages of the results but there's only 11 of them so that might not be enough data to do that but you know if you chose like the entire sample let's say sample 11 here and i'm going to hold down control shift down to the bottom and then i'm holding down shift to not pick up the total so that i and then i scroll up so i just want to go down to 101 not 102 and then i'm going to hold down control backspace taking me back to the top i know i'm using a lot of keystrokes here but i'm just trying to point out that if you have a large set of data that's going to be more efficient then then like scrolling down and then scrolling up although you can do either one even with a hundred data points it's not too bad to scroll down but when you get really large data sets then it's useful to use the keys strokes so then we can go to the insert and we can go to the charts and insert a histogram so now you've got a histogram of that last sample uh man i deleted the of that last sample so you could you could see it kind of approximates the actual histogram right here's the action you could say here's the actual histogram and then here's our sample that last sample uh of of 100 so it might be let's let's put uh the sample of 100 if i did a couple of these let's say we did number 10 control i'm gonna put in my control shift down shift up and then control backspace and then i'm going to put this one next to it over here insert charts histogram boom so here's here's another one of sample 10 so right so and then let's just make one more to get an idea we got a different spreads of the data of 100 shift down and uh and uh shift up and then control backspace and then i'm gonna scroll to the right shift to the lu scroll to the right and then skip to the lu and then scroll to the right that's a song skip to the lu my darlin i'm not sure what the lu i think that's a instrument but whatever uh then we're going to go to the insert tab and then charts drop down histogram and so here's another histogram and so you can get an idea of you know when i when i when i pull these what's the spread of each of these of the of each of these samples of uh of 100 and and notice one of the questions we come to is well how close are two we to the center point that's what we've been looking at and also you know what is the characteristic shape of the spread uh also kind of similar to the actual data set that's another kind of question that we would like to be able to know and it can help us to then also think about how confident we are that like the actual center point and and the distance you know the distances between them is is uh is correct which we'll get into more technically later but for now we want to just practice using our excel tools as well to kind of think about how we can create some of these random samples and manipulate uh uh some of the data and then and then do our calculations on them