Statistics and Excel: correlation, baseball statistics. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side, OneNote presentation 1670, Correlation Baseball Statistics tab. We're also uploading transcripts to OneNote. You can go to the View tab, open the Immersive Reader tool, and change the language if you so choose. You can then read or listen to the transcript in multiple languages, using the timestamps to tie it into the video presentations. We're in the OneNote desktop version here.

We're thinking about correlation, where we have different data sets and want to see whether there's a mathematical relation, or correlation, between them. In other words, are the dots in the different data sets moving together in some way, shape, or form? If there is a correlation between the data sets, the next logical question is whether a cause-and-effect relationship is producing it. And if there is a cause-and-effect relationship, the next logical question is what causal factor is driving it, and therefore driving the correlation between the dots in the different data sets.

We're now going to be looking at baseball statistics. We'll imagine our stats were pulled from the Baseball Reference website, as they were when we worked this in Excel. We're not advertising for them; we're just getting our data there. You can check this out in Excel if you so choose. The website gives us the option of downloading to Excel, but that option is limited. So we converted the data to a CSV file, which is comma-delimited, then simply copied the entire thing and pasted it into Excel. Finally, we changed it from comma-delimited text into a table. That's a common theme, because oftentimes data sets
might be in a CSV, or comma-delimited, file. When we pulled it in, we got something that looks like this. We have, of course, baseball stats. Note that baseball stats are similar to job stats. Baseball is great because the nature of the game produces a whole lot of stats, but it's a job for the players. Many of the concepts we apply when analyzing baseball players can be applied to other jobs as well. We try to break down the essence of the job: what can we measure, and how can we use ratios? Then we can apply statistical analysis to compare one person's performance to another's. And you would think compensation should be based on performance measured by this type of analysis.

So we've got the age, and we've got all of these stats up top. We're going to focus on just a couple of them: the age and the batting average. The batting average represents how many times someone gets a hit versus how many times they were at bat, that is, hits divided by at-bats. If you're not familiar with baseball, players at bat are trying to hit the ball and get on base. That hit might get them to first, second, or third base, or be a home run. They are much more likely to get out than to get a hit, because there are many different ways to make an out: a pop fly, striking out, grounding out, being thrown out, and so on.

Then we're going to compare the batting average to the age. The hypothesis might be that the two are correlated: as players get older, they tend to have a lower batting average. So we'll check that out. I'm going to take my data and focus on just those components. We pulled in the name, the age of the players, and the batting average, and because the data is formatted as a table, we're able to sort it. Now, note that when we look at the batting average, the
next common kind of issue that comes up is that there might be some batting averages that shouldn't be in our data set, possibly because those players didn't have many at-bats. If a player had one at-bat and got a hit, their batting average would be 100 percent, one out of one. However, that's generally not a useful stat, because it will skew the data, and the player didn't have enough at-bats for us to make a judgment. So we trimmed down the data set, cutting out the batting averages that were very high and very low. Those players most likely didn't have many at-bats, which makes their batting averages outliers. Keep that in mind when you're looking at a lot of data: think about how you can adjust the data so you get to the heart of what you're looking for. Now, if I was to plot this