 Statistics and Excel. Normal distributions, heights of baseball players, data example. Got data? Let's get stuck into it with statistics and Excel, although we'll be using OneNote this time while we talk about Excel. You're not required to follow along, but if you have access to OneNote, we're in the icon on the left-hand side: OneNote presentation 1620, normal distribution, heights of baseball players, data tab. We're also uploading transcripts to OneNote so that you can go to the View tab, Immersive Reader tool. You can even change the language if you so choose, and then either read or listen to the transcript in multiple languages, tying into the video presentations with the timestamps. OneNote desktop version here. In prior presentations, we've been thinking about how we can represent different data sets, both with mathematical representations, including the average or mean, the median, the mode, and quartiles, and with pictorial representations like the box and whiskers and the histogram, noting that the histogram is the primary tool when we're trying to envision the spread of data, allowing us to use descriptive terms for the spread of the data on a histogram, such as it's skewed to the left or it's skewed to the right. We then wanted to look at formulas that would give us lines or curves that might be able to approximate different data sets. If we can approximate a data set with a line or a curve that has a formula related to it, that would be great, because it might give us more predictive power over whatever the data set is representing into the future. We talked about different types of formulas, lines, and curves, such as the uniform distribution, binomial distribution, Poisson distribution, and exponential distribution, and we're continuing on with the most famous of them, that being the bell curve or the normal distribution. Last time we looked at it in relation to its most common application for students and instructors, that being grades.
Now we want to look at another area, this being heights, specifically heights of baseball players, and we're going to narrow it down a little bit more to pitchers. Now remember that when we're looking at the bell curve, just like all the other kinds of distributions, the goal is to ask whether the data we're looking at actually conforms to a bell curve, at least approximately, and if so, then we might be able to apply the bell curve. So we might first look at the data, as we have in prior presentations with things like the Poisson distribution, to see if it conforms, and then we'll go from there. Now the bell curve often shows up in things in nature: when you're looking at heights, when you're looking at weights, when you're looking at any kind of measure of length, like how tall trees are. It also shows up with errors, where you're trying to measure how close a bunch of estimates are to a particular point. These are the types of things that often follow a bell curve distribution. So if I'm looking at, in this case, heights of baseball players, I have in my mind an intuition that, because height is something in nature, it might follow a bell curve type of distribution, but I'm going to take the data and look at it a little more deeply to see if that can be confirmed. In practice, you can also think, well, what are we doing here in terms of our scenario? We might be imagining that we want to grow up to be a pitcher, and we want to see how tall we would have to be, what range we would have to be in to be a pitcher, for example. So to get our data, if you don't have any data sets, then you might use Kaggle.com, K-A-G-G-L-E.com, to get practice data sets.
You can also create data sets in Excel using a tool that we looked at in prior presentations and have used in our Excel presentations as well, so that you can get some data to work with. So this is the data that we are starting with. We've got the name, the team, the position, the height, the weight, and the age. Now our focus here is simply on the height. So if I'm taking this particular data set, we might want to apply a table to it, and then we might want to sort it, for example, by the heights. Now these heights are in inches, and we're sorting from highest to lowest. The highest is 83, so if I divide that by 12, 83 divided by 12 is about 6.9 feet. So that's the column that we're going to be focused on, and I'm going to copy that over here. Oftentimes in Excel, you might do this in a separate tab, because sometimes your data set might have, say, blank cells in it, or it might have some outliers that you want to trim off using filters or sorting. If you put something next to the data in Excel and then filter it, you're going to be working with some missing rows or columns. So sometimes you might want to take what you need and put it in a different table. You might also sometimes use things like pivot tables to do that. But we've got our data now, so we just want the height data, which we can now sort from lowest to highest or highest to lowest. We'll do some of our normal stats on it, starting with the mean or the average, which means we would just say =AVERAGE of this data, and we get 73.7. So what's that in feet? 73.7 divided by 12 is about 6.14 feet on average. The standard deviation is 2.3, which is calculated with the STDEV.P function on this data. Then we calculated the median. That's the one in the middle.
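As a rough cross-check outside of Excel, the same steps can be sketched in Python with the standard `statistics` module. This is only a sketch: the heights below are made-up sample values, not the data set from the presentation, and the names `raw_heights` and `inches_to_feet` are hypothetical. It copies the usable heights into their own list, sorts them, and computes the mean, the population standard deviation, and the median.

```python
import statistics

# Hypothetical sample of pitcher heights in inches (made-up values,
# NOT the data set used in the presentation). None marks a blank cell.
raw_heights = [74, 72, None, 75, 73, 76, 74, 71, None, 77, 74, 73]

# Keep only the usable values in a separate, sorted list, similar to
# copying the height column to its own tab before sorting or filtering.
heights = sorted(h for h in raw_heights if h is not None)


def inches_to_feet(inches):
    """Convert inches to decimal feet (12 inches per foot)."""
    return inches / 12


mean_in = statistics.mean(heights)      # Excel counterpart: =AVERAGE(range)
stdev_in = statistics.pstdev(heights)   # Excel counterpart: =STDEV.P(range)
median_in = statistics.median(heights)  # Excel counterpart: =MEDIAN(range)

print(f"tallest = {heights[-1]} in ({inches_to_feet(heights[-1]):.2f} ft)")
print(f"mean    = {mean_in:.1f} in ({inches_to_feet(mean_in):.2f} ft)")
print(f"stdev   = {stdev_in:.1f} in")
print(f"median  = {median_in} in")
```

`statistics.pstdev` is the population standard deviation, which matches STDEV.P. If the data were treated as a sample instead, `statistics.stdev` would be the counterpart of STDEV.S.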
If we sorted all our data and picked the one in the middle, we would get 74. That's the MEDIAN function in Excel. Note that the median is close to the mean. That is an indication that this might conform to a bell curve type of distribution. In this case, we also took the mode, which is the value that repeats the most. This would be the =MODE function, and we're looking at the single mode here. I just want one number, and that's 74. That also ties out to, or is close to, the mean and median, which further indicates that this might be a bell or normal type of distribution. Note that the mode this time is more likely to be useful than in our prior presentation because