Statistics and Excel: Bell Curve, People's Weight Data Example. Got data? Let's get stuck into it with Statistics and Excel.

You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side: OneNote Presentation, 1626 Bell Curve, People Weight Example tab. We're also uploading transcripts to OneNote, so you can go to the View tab, Immersive Reader tool, and change the language if you so choose, so that you can either read or listen to the transcripts in multiple languages, tying in the timestamps to the actual video presentations.

OneNote desktop version here. In prior presentations, we've been thinking about how we can represent different data sets, both with mathematical representations such as the average or mean, the median, the mode, and quartiles, and with pictorial representations like the box and whiskers and the histogram. The histogram is the primary tool we visualize when thinking about the spread of the data, and it lets us use descriptive terms for that spread, such as "the data is skewed to the left" or "the data is skewed to the right."

We then wanted to think of formulas to graph lines or curves that may, in certain cases, approximate the actual data sets. Whenever we can do that, it's great, because the formulas give us more predictive power over whatever the data set is representing. We've talked in the past about different kinds of data that might conform to different curves, such as a uniform distribution, binomial distribution, Poisson distribution, or exponential distribution. We're continuing on with the most famous of them, the normal or bell curve type of distribution, remembering that not all data is going to conform to any of these types of distributions.
We could have data that just has no pattern that could be represented by a simple curve. But many things in nature can be, and when we're thinking about bell curves, things like heights, weights, or how close estimates are to a particular value are often things that might be represented by a bell curve.

Now, if you don't have any actual data to practice with, you might want to check out Kaggle.com to get some data sets to practice with in Excel, or you can create your own data sets using a tool in Excel that we have talked about in prior presentations.

So this time, here's our data on the left, where we have height data and weight data. One of the primary differences between this data set and prior data sets we've looked at is that we have a whole lot more data this time. That means that if we're talking about something that will conform to a bell curve, it's more likely that a histogram, the pictorial representation of the data, is going to look bell shaped. If we have less data, it still might be something that should conform to a bell-shaped distribution, but with less data to represent it, it's not going to look as much like a bell shape; it will be clustered more jaggedly in a bell-shaped type of position.

We're also going to do slightly different graphs of our bell curve this time, to ask questions such as: what is the area under the curve at the top of the curve, at the bottom of the curve, or in the middle?

All right, so this is our data. Same kind of thing that we've done in the past: we want to look at the data and ask, does this data conform to a bell curve type of distribution? If it does, then we can plot the actual bell-shaped distribution and ask questions about it from there.
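The point about sample size can be checked outside Excel with a small sketch. This is not the data from the video: it draws synthetic weights from a normal distribution, and the parameters (mean 127.08, standard deviation 11.66), the sample sizes, the bin width, and the random seed are all assumptions chosen for illustration. It compares how far each histogram's bin fractions sit from what a true bell curve would predict.

```python
import random
from statistics import NormalDist

# Assumed parameters in pounds, chosen for illustration only.
MEAN, SD = 127.08, 11.66
BIN_WIDTH = 5          # hypothetical histogram bin width
LOW, HIGH = 80, 175    # roughly the mean plus or minus four standard deviations

def bin_fractions(sample):
    """Fraction of the sample in each [edge, edge + BIN_WIDTH) bin."""
    edges = range(LOW, HIGH, BIN_WIDTH)
    counts = [0] * len(edges)
    for w in sample:
        idx = int((w - LOW) // BIN_WIDTH)
        if 0 <= idx < len(counts):   # ignore the rare value outside the range
            counts[idx] += 1
    return [c / len(sample) for c in counts]

def expected_fractions():
    """Fraction a true normal curve puts in each bin."""
    dist = NormalDist(MEAN, SD)
    return [dist.cdf(e + BIN_WIDTH) - dist.cdf(e)
            for e in range(LOW, HIGH, BIN_WIDTH)]

def max_gap(sample):
    """Largest difference between an observed and an expected bin fraction."""
    return max(abs(o - e)
               for o, e in zip(bin_fractions(sample), expected_fractions()))

rng = random.Random(0)  # fixed seed so the run is repeatable
small = [rng.gauss(MEAN, SD) for _ in range(50)]        # hypothetical small data set
large = [rng.gauss(MEAN, SD) for _ in range(25_000)]    # hypothetical large data set

gap_small = max_gap(small)
gap_large = max_gap(large)
print(f"largest bin gap with 50 points:     {gap_small:.4f}")
print(f"largest bin gap with 25,000 points: {gap_large:.4f}")
```

The smaller the gap, the closer the histogram hugs the bell shape, which is the "less jagged" effect described above.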
So we're going to take the mean of the data, which is simply the average. This would be the formula in Excel, taking the weights (we're looking at the weights here): all of the numbers summed, divided by the count of the numbers. The average is about 127 pounds.

The standard deviation is then calculated this way: the standard deviation for the population, in this case over all of the weights, and that gives us 11.66, a measure of the spread of the data.

The median, which is the one in the middle, is calculated with this formula, equals the median of this data. If we were to sort the data from bottom to top, it would be the one in the middle, which is about 127 pounds. That's quite close to the mean, which is an indication that this data might conform to a bell curve. So at this point we're saying: it's weights, so that's something in nature, and I'm already thinking this might conform to a bell curve based on my intuition. Then the mean is about the same as the median; that's another indication.

Then the mode, the number that comes up the most times; I'm using the single-mode formula here. Note that because our measurements have decimal points, the mode would be less useful if we didn't have so much data: with such detailed measurements, a small data set is unlikely to contain many repeats of exactly the same value. If we didn't have the decimal points, the mode would quite likely be useful even with a smaller data set, because repeats of the same number would be more likely. But because we have such a large data set in this case, the mode is still relevant, and it's pretty close to the mean, which is another indication that this might conform to a bell-shaped curve.

So if we were to plot this (I think I plotted this down here), it looks like this. This is a histogram, with that middle point over here. You can see that it doesn't look like a smooth curve, because it's still a histogram, but because we have a lot more data, it's looking a lot more bell shaped than some of our other curves. Remember that even if we were looking at weights and had a lot less data, it would still be packed up in the middle, but it would look a lot more jagged, because we wouldn't have as much data to be representative.

So we're saying, okay, it looks like it's going to conform to a bell curve. Now let's plot this thing out and look at the smooth curve, the graph that we can make based on our formula. I'm going to plot this out using NORM.DIST.

We want to take the x's. Now, where am I going to start the x's? This is a common question we've run into in prior presentations. We're looking at pounds, so I might say, why don't I start at zero pounds and go up to a number of pounds like 500? It's unlikely anyone is going to hit 500 pounds, that would be quite a heavy individual, but it's also unlikely anyone is going to be down at zero, so we probably don't want to start at zero. So where do we want to start? Well, we know that if we go four standard deviations out, that's going to take in the vast, vast majority, almost a hundred percent, of the data. The bell curve in theory goes on forever on the tails, but the vast majority of the data will be in there. So let's use four standard deviations, which is what we've done in the past.

The lower end, then, is the mean minus four standard deviations: the standard deviation of 11.66 times four is 46.64, and subtracting that from the mean of 127.08 gives us 80.44. So I don't need to go down to zero pounds; I can go down to about 80. There are not many people that are 80 pounds.
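For readers who want to check the arithmetic outside Excel, here is a minimal Python sketch of the steps above. Two things in it are assumptions rather than part of the walkthrough: the short `weights` list is made up purely to show the summary functions (it is not the video's data), and the pairing of Python functions with Excel's AVERAGE, STDEV.P, MEDIAN, and MODE.SNGL is my reading of the formulas described. The bounds and the curve use the figures quoted above, mean 127.08 and standard deviation 11.66, and the upper bound is simply the mirror image of the lower one.

```python
from statistics import NormalDist, mean, median, mode, pstdev

# Hypothetical weights in pounds, for illustration only (not the video's data).
weights = [112.4, 118.9, 121.3, 127.1, 127.1, 130.6, 135.2, 141.8]

summary = {
    "mean": mean(weights),        # comparable to Excel's AVERAGE
    "stdev_p": pstdev(weights),   # population standard deviation, like STDEV.P
    "median": median(weights),    # comparable to MEDIAN
    "mode": mode(weights),        # most frequent value, like MODE.SNGL
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# Figures quoted in the walkthrough.
MEAN, SD = 127.08, 11.66
K = 4  # number of standard deviations on each side of the mean

lower = MEAN - K * SD   # where the x values start
upper = MEAN + K * SD   # mirror-image upper end (assumed, by symmetry)
print(f"x range: {lower:.2f} to {upper:.2f} pounds")

# Share of a normal curve that falls inside that range.
curve = NormalDist(MEAN, SD)
coverage = curve.cdf(upper) - curve.cdf(lower)
print(f"share of the curve inside the range: {coverage:.4%}")

# Curve heights at whole-pound x values, comparable to
# NORM.DIST(x, mean, standard_dev, FALSE) filled down a column.
xs = list(range(round(lower), round(upper) + 1))
points = [(x, curve.pdf(x)) for x in xs]
peak_x = max(points, key=lambda p: p[1])[0]
print(f"curve peaks at x = {peak_x} pounds")
```

Starting the x values at the four-standard-deviation mark keeps the chart focused on the range where essentially all of the curve's area sits, instead of stretching the axis from 0 to 500 pounds.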