 Statistics and Excel. Normal Distribution Heights of Baseball Player's Data Example. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth soothing Excel. Here we are in Excel. If you don't have access to this workbook, that's okay because we'll basically build this from a blank worksheet. But if you do have access, five tabs down below. Two green example tabs in essence answer key. The two blue practice tabs having pre-formatted cells in them so you can get right down to the heart of the practice problem. The blank tab having a blank worksheet with just our practice data in it so we can practice formatting the cells within Excel as we work through the practice problem. If you don't have access to any data, resources you can check out are Kaggle.com, K-A-G-G-L-E.com. You can also try to create your own randomly generated data using a tool we looked at in prior presentations which can be found under the data tab. Analysis and data analysis. We also showed how to get this analysis tab if you don't have it in your toolbar in Excel. Let's go to the example tab to get an idea of what we will be doing. We're working with a bell curve or normal distribution. This time we're going to look at an example of baseball players. So we're talking about baseball players and we have some data related to them and we want to think about the height data. So first we're going to take our data set and by the way you might be able to get something similar to this on a Kaggle for example to get a data set. And there's a lot of data out there for baseball players. Baseball is one of those types of sports that has a lot of statistics related to them. So in any case we're taking our data here. I'm going to put it on a separate tab because I might want to actually sort this data. In this case we're going to look just for the pitchers, the pitching position and drill down on that. Then I'm going to take that data and create our data from there so that we can use it to then calculate the mean, the standard deviation and so on. And then we'll see if it does approximate a bell curve type of scenario. If it does, which we hope it will here of course, then we'll actually plot out a smooth bell curve. And we'll practice plotting out a bell curve where we can draw a line to see the upper limit, the lower limit and a between type of scenario with this data. So we'll get kind of fancy with the bell curves. It'll take a couple presentations to get to there. Alright so let's go to the blank tab. Now so I'm going to remove this Kaggle here. We're going to imagine that we have downloaded this data set. So we have this information, the name, the team, the position, the heights and the weights and the age. Now we're imagining maybe that we want to be a pitcher when we grow up or something like that. When I grow up, I still have time. I'm still growing. No I'm not but I don't know. I could be a pitcher. How tall do I have to be in order to do that might be a question so we can see if it's within our range possibly of possibilities in the world. So we'd like to sort this by pitchers then and then maybe look at the heights for the pitchers and see what's the kind of body type of pitchers based on this data. So to do that first I might insert a table. Now I'm going to make the top row bold so that when I insert a table Excel will be able to know that the top bit is a header. So I'm going to make it home tab and bold. Then I'm just going to select one cell within this whole set of data. That should be enough because there's no blank cells in this data hopefully. So when I just then go into the insert tab up top tables group and insert a table. Hopefully it picks up the dancing answer dancing around the entire date table and they have their like it's like a rain dance but it's a table creation dance. And then I can just say OK and then it makes the table they magically danced the table into existence those dancing ants and then of course I can sort it by the heights. So I'm going to sort it by Z to a and so there we have it now it's useful to look at the data set say is there anything that's that's like an outlier here like they left a blank data set or something like that. And here that would not be useful to our data set or would throw us off we can sometimes see that by selecting the drop down up top and see if there was a zero here. Then you would say well the zeros they must have not entered anything or if it says none or blank. Then we can remove those and the reason oftentimes you might want a data set or one reason you might want a data set in a separate. Tab is because you might start filtering in that way if you didn't want like the low ones. Now you're filtering and if I was to do something on the right I might have some hidden columns. So if I if I filtered out some of these large ones you can see the columns are now go from one to five. So if I start doing something on the right hand side and I filtered then that could cause me problems. So I'm going to say all right let me just let me just unfilter that but I'm going to keep my data in a separate tab this time and just copy over what I need. Now this is also an area where pivot tables can kind of be useful sometime to grab the data but in this case I can just grab this column of data quite easily. So what I'm going to do is I'm going to add another tab I'm going to say another tab and this first tab I'm going to say double click on it to name it. I'm going to say it's my blank data tab and then the second one I'll double click on it and I'm just going to call this one my blank tab. And then I usually right click on it and I'm going to make it white a different color. So now there's that's how I do those tabs. So then I'm going to select the whole column D put in my cursor on the drop down arrow control C or copy dancing ants doing their copy Mambo dance. They all their dances look the same by the way but they're totally different if you zoomed in because you know obviously if you wanted a copy they'd have to dance different than you know doing the other stuff. But if we paste it we're going to paste it one two three paste it one two three so there we have it. Now once again I might want to put a table just around this data so I can sort this data I don't really want to filter this data because I'm going to be working on the right hand side. I still like to have that sort in field so I can take it from top to bottom or bottom to top. So I'm going to go to the insert tab tables insert a table dancing ants doing their table dance table dance boom and then I'll make this a little smaller. I'm going to wrap the title home tab alignment wrap it wrap it wrap it I say because that's what I want you to do today. That's my wrap. Here we go. All right so then we're going to then we'll do some calculations here so I'm going to say the mean the standard deviation I'm just going to call sd we could have sigma which would be more formal but it's easier just to put sd for now. And then I'm also going to do the median and maybe even the mode maybe sometimes the mode if you're in the mood for the mode. Then sometimes it's useful. I'm also going to select this entire thing and let's do some formatting before I get I'm getting ahead of myself. Slow it down hold your horse hold your horse for crying out loud the horse needs a hug. That's why you need to hold it right click we're going to format.