 Statistics and Excel. Bell curve batting average comparison. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel. Here we are in Excel. First, a word from our sponsor. Yeah, actually we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us. But, but that's okay whatever because our merchandise is better than their stupid stuff anyways. Like our crunching numbers is my cardio product line. Now, I'm not saying that subscribing to this channel, crunching numbers with us, will make you thin, fit and healthy or anything. However, it does seem like it worked for her, just saying. So, subscribe, hit the bell thing and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. Well, if you don't have access to this workbook, that's okay because we'll basically build this from a blank worksheet. But if you do have access, currently five tabs down below, the two example tabs in essence answer keys. The two practice tabs having pre-formatted cells so you can get to the heart of the practice problem. The blank tab, blank worksheet so we can practice formatting cells within Excel as we work through the practice problem. Let's go to the example tabs to get an idea of what we will be doing starting with the data tab where we will be pulling baseball statistics information in from an online resource, building them into a table so that we can sort and filter the table, pulling what we want from the table, in this case batting averages, and putting them into a separate tab so we can focus just on those items that we want. We'll do some of our standard statistical calculations, the mean, the standard deviation, the median, comparing two years of batting average information, probably won't be doing 2020. I think we'll do 2022 and then 1920. And then if it conforms to a bell curve distribution as we expect it would, then we'll do some of our graphing of the bell curves, plotting the relative bell curve information. And since we have two sets of data, this is what's different in part from some practice problems we've done in the past, one being we're taking the data from an online resource and filtering it, and two, that we have this comparative information that we can look at the differences and analyze a comparative information and make some charts from that comparative information. All right, let's go to the blank tab over here and we're going to start by pulling our information in from an online resource. So I just looked up statistics, baseball statistics for 1920, and I found a website which is baseball reference, and I'm going to scroll down and say I don't want the team standard batting. What I want is the player standard batting. So a player standard batting, and then over here it says that we can download this. Where did it go? Ah, here it is. Now I'd like to download it as an Excel worksheet, but it only gives me like 500 lines. So it's going to give me all the data if I convert it to a CSV file, which is often the case we have to convert from a CSV. So I'm going to say give me the CSV file. It looks kind of weird because it's just comma deliminated information as we saw way back when we were just practicing uploading data. What I can do is copy this whole thing though. I can say I'm just going to copy this because all the data is there and then I'm going to convert it from a comma deliminated data set to a table. So I'm going to just copy all of that, control C, and go to my worksheet. I'm going to put that in A1, right click and just paste it. I'll paste it. Can I paste it? Does it matter which way I paste it? I don't think so. Match, keep source, let's match it. And then I'm going to make it a little bit smaller here so we can see what's going on. Now it's separated by commas now. So what I want to do now is go to my data up top. I'm going into the data tab and then we're going into the group of data tools and then we're going to go into text to columns. Well, this notice that only column A is highlighted, even though the data is kind of spilling out over the other, all the data is in column A. So I'm going to select the text to columns and then it's deliminated here, characters such as commas, so that looks correct. So I'm going to say next, and I don't want the tab to be the deliminating factor. I instead want the comma and you can see it changes the data down here to more of a table type format. So I'm going to say, all right, next, that looks good. And I think that's good. So I'm going to say let's finish it, finish it. So there we have it. So now we've got our data broken out here. We've got more than the stuff that we need because we really just want batting averages. I'm going to make the top part of it bold, home tab, font group. Let's embolden it so that the system knows I want to create a table from it. When I insert a table and they can see that that will be the header. Then I'm going to go to the insert. I'm just going to select one cell in here. Be careful with this table though, because there's some blank spots. There's some unfilled areas which can throw off the table. So let's see if I can insert the table. If it'll pick up the entire range, insert tables, group, insert a table. Now let's go all the way down. So does it pick everything up? Is it doing the dancing ants doing the whole lapping the entire place or not? So it looks good. So I'm going to say, OK, table has been inserted. So we're going to mainly just focus on, in this case, the batting averages. So we're going to go over here to the batting averages. But first, let's add another set of data. I'm not going to add the data to the right this time. I'm going to add it below, because I might filter the data. Now it might even be better to put the second data on a whole other worksheet. But I'd rather put it, but I want to keep just one other worksheet. So I'm going to put it below instead of next to. So I'm going to say control shift down. Oh, that's not down. Control shift down to get to the bottom of the table. I'm going to put the second data set right here. So I've done a similar search for 2022. Same thing, baseball references. So not the team, but I want by player. So here are the players. So that looks good. The standard batting. Here's the dropdown. I'm going to convert it to CSV files. There they are. And then I'm going to pick up all of this data in the comma deliminated format. All of this great information that we can work with and do comparisons about and talk about who's the best and who's the worst and like who's in between not even worth talking about because they're like in the middle. You've got to be either good or bad. If you want people to be talking about you, maybe it's better not to have people talking about you anyways. I don't know. But I'm going to paste that here and then I'm going to go into the data up top and then we'll go into the data tools and the text to columns and then once again deliminated by the commas. Next. And I want it comma deliminated. Looks good. It looks like it's doing what we want down here. Next. And okay general. Okay. Finish it. Let's make the top bit bold so that we can then hopefully excel. We'll know that that's the header home tab font group bolden. And then I'm going to put my cursor somewhere in the data and go to the insert tab and I'm going to be picking up the table and the table here. So let's insert the table and boom. So now we've got our two data sets for 1920 and 2000. I'm going to insert another one more row above it insert. So I can say this was 2022 data and let's do that up here too just so I can make sure I know which data is which which is which insert. And I'll say this was 1920. Those are the two years we did. Am I right? Am I right? Okay. I think so. Okay. So there's our data. So now what I'd like to do is just get the batting averages over. So I want to pick up the batting. So I'd like to get the player's name as well possibly. So do I need their age or any of this other stuff? No, let's just get the batting average. So I'm going to put my cursor from C and I'm going to scroll over. Now you possibly might do this with pivot tables as well, but I'm just going to do the good old normal table. I'm going to right click and hide. So now we've got our batting averages. Now when we sort the batting averages, I'm going to make column wider so we could see the player's names. Notice that we might have like some outliers with the batting averages. So if I drop it down and go from Z to A, all these zeros and the batting average, when you're looking at the actual data sets, then that might not be applicable. You might not want the zeros and the batting averages because that might be indicating that these people didn't have any at bats or anything like that. So maybe we trim off the zeros and on the high side as well, you could say, well, one or 100% possibly that's because they had a very low at bats. They might not have that many at bats. So these are kind of skewing our data if they had one at bat and they got hit. So oftentimes we might say maybe it would be better for me to trim off the extreme numbers and so I can filter by hitting the dropdown and say maybe I should trim off all of the zeros. Maybe I'll keep everything else and maybe I trim off like the 100s and anything that's a blank that doesn't have anything in it. Then I want to remove those as well. So I'll trim down my data to that thinking that maybe that's a more better representation of the people in the data set. So now notice what happened is it hid some rows because I filtered. I filtered those out so it goes from two to six. That's why I don't want to have anything to the right of my data set because if I have this filtered it's going to squish up all of the stuff I do to the right because it removed these rows. That's why I'd rather copy this information now that it's filtered and paste what I want to deal with in another tab. So I'm going to open up another tab. I'm going to double click on it. I'm going to call it blank two or let's call this one blank data. This is the blank data. It's not blank data but it's the data for our blank worksheet. And then I'm going to copy this. I'll select these two and then I'll say control shift alone. So it goes down to the bottom. I'm going to right click and copy it and then go to my blank worksheet, paste it. I'm going to call control shift and scroll in a little bit and a one. I'm going to place it just 123. All right. So there we have it. And then the top bit. I'm just going to put in boldened home tab font and in boldened and then I'll put a table around this table. After notice I've removed those top bits. Like if I go to the last one, you might think, you might think, hey, maybe it's still pulled in those ones, but it didn't because like if I copy this control C and you look at the dancing ants and I unfilter and I say, remove the filters or select all actually that's not where the filter was. Remove the filters. You can see the dancing ants are not around these ones because we filtered them out. So the filtered ones are not included here. So now I'm going to insert a table, insert table and then I'm going to say, okay. And then batting average. This is for 1920. Let's say. So there we have it names a little bit larger. Let's do the same thing for the 2022 control shift down. And so now I want to pick up the 2022. There's my batting averages. I'll do the same thing. I don't want these blank numbers and I don't want the zeros and I don't want the hundreds because those are probably skewed stats possibly because of the number of bats or whatever it. So here's like this one even has a 0.04. That's, you know, pretty low. You know, maybe we remove the 0.04 and the 0.05. And I'll keep the ones after that. And then I'll go down to the bottom and this one goes up to 0.75. Let's get rid of the blanks 0.75 is pretty out there. It's probably probably probably one person that didn't have that many at bats, but I'll keep it and we'll say, okay. So then I'm going to copy these two. Control shift down and then control C copy. I'm going to put this on the blank tab and we'll put it up here in D one. Right click and pasted just one, two, three. Select in the top bit. Home tab font group data or bold. And then I'm going to put my cursor somewhere in the table and go into the insert tables and put a table around it. The dancing ants doing their table dance. It's not like a not like a table dance. It's a dance that makes a table. They're not like dancing on the table. The table like makes rain. It's not like, you know what I'm talking about. The table dance. But in any case, this is going to be batting average for 2022. I think it was right now. Note that you can see these data sets are in a decimal format. You can put them in a percent format. But sometimes when we do our norm.dist, it's useful to see them as whole numbers. So I'm going to multiply these times. I'm going to insert a tab here. Insert. I'm going to multiply these times a hundred. So this is going to be, I'll say it's the same thing. Batting average. Batting average. B. A. 1920 again. Batting average. 1920. But I'll say multiply times a hundred. This equals this times 100. And then let's add a decimal. Home tab. Number group. Actually, do I need to add decimals? No, I keep it like that. I'll put an H after it for whole numbers. Okay. And then, and the reason it won't let me do a decimal is because it's in general format. So let's actually select these cells. Let's select the entire worksheet. Actually, I'm going to select the entire worksheet. Right-click. Format the cells. And I'm going to make them currency. Negative numbers. Bracketed. No dollar sign. And let's add the two decimals. So now I'm going to say, okay. So now we can see them a little bit more uniform numbers. Let's do the same thing over here. I'm going to call this BA 2022 H for whole numbers. That's what I was thinking. Or I'm not even sure that's a W. But it's going to be this times 100. Okay. So now we've now we're representing it. Not at 0.25 or 25%. But rather 24.6, which represents the same thing. 24.6%. Okay. So we can compare the data now this way. I can look and sit and look at the look at the data comparisons for the two. I can try to sort them and whatnot and look at them this way and compare them. But it might be useful to then build some tables with it. Now, another thing I could do is make a histogram with my data thus far. So I can select this data set. And so notice one data set is longer than the other. That also causes some difficulties. But that's what statistics is kind of forward to kind of help to deal with these kind of difficulties. I'm going to say control back to get back to the top. Let's insert charts and make a histogram. So I'll say this is the histogram for the 1920. So 1920. And you can see we have some outliers. This one we maybe could have trimmed off possibly. But you can see that it's hovering around the 22. So we'll say, okay, it looks like it's somewhat symmetrical. So I'm going to say that probably could lend itself to a bell curve that we'll generate. And then this data over here, let's do it for this one. 2022 control shift down control backspace back up to the top insert charts histogram boom. And so we have that one. And this one has all of these buckets out here because we had some more kind of outliers on the high end. Let's see if I sorted this from Z to A. Yeah, we had these. So probably should have trimmed off maybe the 75 out there. But in any case, you could see it's somewhat still kind of a bell shaped type of curve. So we're going to try to use our analysis of the bell curve to analyze these data sets. Because that'll help us to give us uniform things like the Z scores and whatnot that we can make comparisons with. So 2000, let's say this is going to be 2022. So I'm going to say, all right, so now we'll see if we can start to build some stats with this stuff and our tables. So if I'm going to go back on over, I'm going to say, all right, let's do, let's make a skinny H. And we're going to do our calculations for like, let's say this is going to be our data and we're going to have 1920 and 2022. I'm going to make this general formatting so I don't have a number format. Home tab numbers, drop down general and then selecting the top bit because I want it to be a header, home tab, font group, black, white. I'll center just these two maybe, home tab, alignment and center. So we're going to do the mean, we'll do that standard deviation, the median, possibly the mode. I don't know if that will be helpful, but let's try it. We'll say this is the mean equals the average for the 1920s. I'm going to say average of this information, control shift down and enter. So now we've got 2417. Let's put a difference in here, difference just for the fun of it. We're going to say home tab, font group, black, white, center it. So then we'll do the average here, average for 2022, which is this data, control shift down, enter. And then the difference, here's the difference. So we're going to have an average, a little bit different average standard deviation, standard deviation. We're going to say for the population and I'll pick up this data set, control shift down and enter. So there we have that. Let's do it again. This standard deviation for the population, this data set, control shift down, holding control backspace. There's our calculation and enter. The difference equals this minus this. Okay. And then we've got the median, middle point equals the median. And we're going to say this data from here, control shift down, control backspace back up and enter. And it's pretty close to the mean. That's an indication that the bell curve might give us, might be a good tool to use. And then we're going to say the median on this one is going to be this data set, control shift down, control backspace and enter. Pretty close to the mean, which is usually an indication that the bell curve is something we can use. There's the difference between the two and the mode equals the mode, single mode. And from here on down, we got 25, which again is pretty close to the mean. Another indication that the bell curve might be useful mode, single mode here, control shift down, backspace, enter. So again, that is close to a little bit further off, but still relatively close to the mean. So it's an indication that the bell curve might be a tool to use. And the difference between those two is zero. All right. So then I could say, okay, let's start with our data for for 1920 and start to build our bell curve. So we'll build our table to create our bell curve. So I'm going to make a small M skinny M. I'm going to see I need my X's and my P of X's. And then these tables are getting in the way again. You need to get way out of here. Get out of here. You're in my way. Like it's like a pet little Chihuahua under my feet all the time. It's like I'm trying to walk. You're trying to kill me. This thing's trying to kill me. Keeps on walking under my foot. And then they anyways. Home tab font group black, white, let's center it. Now we need to know the starting point. So where should I start the X's at? You might say, well, if we're talking about averages, which we're representing in whole numbers, I could just go from zero to 100. Because you would think that would be the entire spectrum. But let's do our thing where it's usually the four standard deviations. So I'm going to say, let's do the number of standard deviations. Let's take it for standard deviations out for both data sets, because that will encompass the vast majority of the data. So the lower X and then the upper X would then be what? The lower X for 1920 would be the mean times the standard deviation, or I'm sorry, plus the standard deviation times four. So standard deviation times four minus the mean. The mean minus the standard deviation times four, four of the standard deviation below. Alright, enter. Notice it goes below zero, which in reality isn't going to happen, because you can't have a negative batting average. But I'm going to keep it because once we graph the curve, it'll show all of our data points, which gives us that kind of double check that we have all of our data in there. So I'll keep it there. And then we're going to say the upper is going to be equal to the mean plus the standard deviation times four. Enter. Boom. And then let's do on this one. This is going to be for 2020 the mean minus the standard deviation times four. And this is going to be equal to the mean plus the standard deviation times four. So there's our upper and lower X's. So let's go down to the lowest X because I want it and then up to the highest. So for both of these. I'm going to go from five to 50 to 53 so that we have X's that will line up under both data sets. So I'm going to say the X is going to be at negative, let's say negative five. We're not going to have negative batting averages, but I'm going to start it there so that we cover the whole data set. And then I'm going to go from to negative four, negative three. I'm going to select that group. I'm going to bring it down to 54. So I'm going to put my cursor on the fill handle. Take it on down till we get to 54, 50, 54 right there. Boom. I don't need the decimals on this one. Let's go to the home tab number group. Remove the decimals. Get out of here decimals. You're not needed or welcome in this on these cells. And then we can do our P of X. So this is going to be equal to the norm dot dist tab. The X is going to be this comma the mean we're looking at the 1920s. So we're picking up this one and not not the that's the mean. And then f four to make it absolute dollar sign before the J and the two so I can copy it down comma the standard deviation. We want this one f four dollar sign before the J and the three comma. Do we want it to be cumulative? No, false or zero. Therefore closing it up enter putting my cursor on it. And we're going to just double click to let's make it a percent home tab number percent if I adding some decimals. And then I'll put my cursor on the fill handle and double click copying it down. So notice all of this data is now selected and adds up to 100% because we went, you know, all the information is kind of within the four standard deviations. That's what's kind of nice about picking that up even the negatives because then you get that full kind of curve or pretty much the full curve the full data. Okay, so I'm going to stop it here and we'll continue on with this next time building our curve and then doing the same for 2022 and then looking at the differences between them.