 Statistics and Excel. Bell Curve batting average comparison part number two. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel. Here we are in Excel. If you don't have access to this workbook, that's okay because we basically built this from a blank worksheet, but we started in a prior presentation. So if you're starting from a blank worksheet, you may want to begin back there. However, if you do have access to this workbook, there's six tabs currently down below. The two example tabs in essence answer keys, the two practice tabs having pre-formatted cells so you can get to the heart of the practice problem. The two blank tabs is where we started with a blank worksheet and are continuing at this point in time. Quick recap of what we did last time. We went online to find some resources for baseball statistics. We downloaded them in a CSV or comma deliminated type of format, and then we put them into our blank data so that we can sort the information populated in a table, filter it, pick up what we need, and then copy it over to our blank tab looking at just basically the batting averages. We converted the batting averages to a decimal format so that we have whole numbers with them now. So now we have the 60 representing basically the 60% here for 2019-20 and 2022. So we have the two years so that we can compare the two. We did our standard data calculations, the mean standard deviation median mode for 1920 and 2022. Looking at the differences between the two, we noted that the mean is similar to the median and mode for both years, which would indicate that we could have a bell shaped curve which might help us with our comparisons. So now we are constructing our bell shaped curve starting with the 1920 data. We plotted from negative 5, which is unusual. You might say why would we have an X of negative 5? The batting averages would need to be between 1 and 100, but four standard deviations goes down to negative 5 here, and so we want to have all of the data that we could on the bell curve so we have a nice smooth bell curve and we can kind of double check that our data adds up to 100 as we did here when we did our norm.dist calculation. This set of data basically adds up to 100% now and that is useful to be able to see that, and in part it's doable because we converted the stats which were in percent or decimal format to whole numbers. Okay, so now let's also do our Z score, which is a nice thing to be able to look at, especially when we're comparing two different years, because when we're talking about people's batting averages, we might say, well look at the things were a lot different in 1920 than 2022, but we can see where they were relative to the field at that point in time relative to the mean, and that's going to give the Z score how far away from that middle point. So I can go to the home tab, font group, let's make this black, white, let's center it and do the Z score. So the Z score is going to be equal to brackets. So I'm going to take the X value, which is a negative five in this case, minus the mean, the distance from the mean for 1920 data, close up the brackets and divide that by the standard deviation. So there we have it and enter. Now I want to copy that down, so I'm going to double click on it. Anything that's not in the current field I'm working in, then I want to make absolute. That's going to be these two in column J. So the J2, I'll put my cursor in there, F4 in the keyboard, dollar sign before the J in the two, put my cursor on J3, F4 in the keyboard or dollar sign before the J in the three, enter, back on it, double click on the fill handle to bring it on down. So there we have it. So if we're looking at like a 32% batting average, then that's 3.08 likelihood for that exact area and it's 1.06 on the Z score or 1.06 away from the middle point, the middle point being right here at around 24 to 25. That's where the Z score hits zero because of course that is the mean for the 1920 data. So then we can also take a look and try to compare this to our actual data. So what I'm going to do now is pick up our actual data to compare to the actual data. We could take these percentages and multiply them times the count over here. I'm going to calculate the count, the count equals count and then I'll take my data for 1920. So it's just going to count all of the data and enter. So there's 629 count over here. This is going to be equal to the count and we'll count this data and here we have 822. So we could multiply these percentages times the 629 and get numbers that we would think would basically represent if I had that sample using like the bell curve data or I could try to convert my data into percentages so I can compare and that's probably the better way to go here. So what I'm going to do is call this the frequency. So this is going to be the frequency and I'm going to say let's make that home tab. Let's say actual actual data frequency and let's wrap the text home tab alignment. Wrap the text center it and make it black and white. And so there we go that should work. I'll check the spelling hopefully later. And the frequency we're going to see how many times in this data set do we get a count of X. So X being in this case above 5 and less than or equal to 4. If I was talking down here, which would be more common for batting average, right? It would be up to one up to and including one from zero zero to one. That's what the frequency is going to give us. It's our buckets. So I'm going to say this equals the frequency which is a spill array, the data set. I'm going to pick over the data which is over here and say boom, there's our data and then comma. And now we want the bins array, which is going to be this information control shift down and control backspace. There we have it and it closed up the table and enter and it spills it on down. So there we have our accounts. It's going a little far here. So I'd like to remove that last one or actually it has a number in it. So maybe I should make this go down one more.