 statistics and excel perfect negative correlation got data let's get stuck into it with statistics and excel first a word from our sponsor yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but but that's okay whatever because our merchandise is is better than their stupid stuff anyways like our crunching numbers is my cardio product line now i'm not saying that subscribing to this channel crunching numbers with us will make you thin fit and healthy or anything however it does seem like it works for her just saying so yeah subscribe hit the bell thing and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine if you would like a commercial free experience consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com you're not required to but if you have access to one note we're in the icon left hand side one note presentation 1718 perfect negative correlation also uploading transcripts to one note so you can go to the view tab the immersive reader tool change the language if you so choose be able to either read or listen to the transcript in multiple different languages using the timestamps to tie in to the video presentations one note desktop version here thinking about correlation where we have different data sets to see if there's a mathematical relationship or correlation between them in other words are the different dots and the different data sets moving together in some way shape or form if there is a correlation or mathematical relationship between two different data sets the next logical question would be is there a cause and effect relationship that's causing the correlation or mathematical relationship between the two different data sets and if there is a causal relationship the next logical question would be what's the causal factor that's causing the causal relationship which is causing the correlation or mathematical relationship between the two different data sets. We're now thinking about a perfect negative correlation which is not something we typically look at when thinking about correlations because usually it's not a perfect correlation but more of a trend but we're gonna first start off with the perfect negative correlation we looked at in a prior presentation the perfect positive correlation so the example that we will look at then is going to be distance and distance traveled versus the distance remaining so you can imagine of course if you're going from point A to point B then you can think about the full amount of distance that you have to travel versus the distance that is remaining how much distance have you traveled versus the distance remaining to travel and those relationships between those two distances will have a perfect negative correlation as you cover that distance so we're gonna build our data set of this time imagining in Excel that we used a random number generator so we're gonna imagine that we had random numbers that were generated from 1 to 100 and so the formula would be equals random between let's say 0 and 100 between 0 and 100 and that's going to give us our data set on the left that we can imagine that we could just create and that of course is not being would not conform as our positive or prior example did to a bell curve but it would conform more to if we were to try to graph a curve with it a function to it to a uniform distribution so then once we have the distance traveled if we say that the total distance is 100 then we're going to generate our distance remaining by taking 100 minus the 94 so if I take each of these data points and take 100 minus each of the data points then the two of these will add up to the to 100 and you can see there's a perfect relationship between the data points they add up to 100 that's going to be showing to be a perfect negative correlation relation so but we'll imagine here that we don't know that yet we have these two data sets and possibly we're trying to test them out to see if there's any relationship between them so what could we do to do that we could graph each of these I could say here's the distance traveled I would do this in Excel by selecting the entire thing entering a histogram and looking at the data points in the histogram and this is just numbering in the buckets how many of those items are falling in the histogram now note if we did this an infinite amount of times given the fact that we did a random number generator the histogram you would think would tend towards a uniform distribution or a straight line as opposed to the prior example when we looked at a distribution that happened to have a bell curve type of distribution now the fact that we have more of a uniform distribution as opposed to a bell shape distribution or any other kind of distribution does not necessarily mean that there's not a correlation between this data set and the other data set but it might give us some idea of what's happening in general with the data so the second one you can see if I select this whole data set the distance remaining and make a histogram again you get a relationship which if we did this infinitely would tend towards a straight line or uniform type of distribution so the fact that these both tend towards possibly a uniform distribution doesn't really tell us that there's a correlation or not but we just want to show that you can have when we you're looking at these correlations each of the data sets could have you know different types of distributions that may or may not lend themselves to some type of curve like a bell curve or Poisson distribution or uniform distribution and so on so let's now take the mean and the the standard deviation so if I take the mean of my data sets we're just going to select the entire data set if we were in excel and take the average of it which is the mean all of the data sets divided by the number of items we'll take the standard deviation for a sample this time that would be the standard deviation dot s for the sample we would select the entire data set and pick this function to get our standard D for both the distance traveled and the distance remaining now just looking at these two points it might not give us a whole lot of information as to whether or not there's a relationship between these two it could but the next thing we could do is say let's do our formula by the way we could of course plot this and that would give us very a good indication of what's happening but before we do that let's do our formula so here's our good old correlation formula we have been working on basically taking the z score of the first data set times the sum of the z scores of the second data set so we're gonna take all the z scores meaning we're gonna take each data point minus the mean divided by the standard deviation of one data set same thing for the second data set multiply them together sum them up and then divide by n minus one so where you can see as we saw in the prior presentation that this has heavily to do with kind of the z score to help us out with what is happening with the two different data sets so this is gonna be the the distance traveled and this is gonna be our the z of the distance traveled so if we were to think about the z it's gonna be in this case 94 minus and then I'll pick up the mean which is gonna be for this one the 53 so minus 53.92 about there's rounding involved here divided by the standard deviation 28.64 and that's gonna give us our about 1.4 here and then we're gonna and we're gonna do that all the way down so we'll do the z score all the way down for the 65 give us the z the 14 will give us the z and so on and so forth remembering that the z kind of tells us how close it is to that basically the middle point to the mean we're comparing to that middle point okay so then we're gonna say let's do this for the remaining distance same kind of thing we're gonna take each point and do the z score in this case we have six minus we're gonna take six minus the mean this time is 46.08 divided by the standard D 28.64 and that's gonna give us the negative 1.4 about so there we have that one and I'm gonna say okay we're gonna do that all the way down we're gonna do that all the way down and so there there we have it you can see a pretty heavy relationship between the z scores here right right the z scores seem to be tying out which could give us some indication that there's some kind of correlation so then I'm gonna say this is gonna be that's my z score calculation and then we multiply these together so we did this for each data point now I'm just gonna multiply the 1.4 for example times the 1.4 on the negative so we're gonna say 1.4 times negative 1.4 or let's do that again 1.4 times 1.4 is negative 1.96 and we do that all the way down that means we have almost our entire numerator all we need to do is sum up this outer column so I'm gonna go over here and say let's do that let's sum up the outer column and we sum that up it comes out to 204 then the denominator is n minus 1 and that's gonna be in which is 205 that's gonna just be counting all of all of the data points how many data points were there there are 205 data points that we made and less one that gives us 204 notice the formatting that I'm putting this into this formula into this is kind of a useful tool especially in Excel because it allows you to kind of be able to map out a formula like this in something more of a table which is another way to visualize the data so to do that I said this is the denominator I'm gonna say the denominator is n minus 1 colon or you could call it denominator and colon and then we're gonna say the n and I'm gonna pull this into the inside indent it so that we have our sub calculation less 1 and then I indent it twice and I put the answer on the outside leaving us with the numerator on top denominator on the bottom and then of course we can divide the numerator and the denominator boom boom and that will give us one a negative one this time which indicates that we have a perfect negative correlation now if we plot this out then of course we'll also see that we have a perfect never negative correlation because it's gonna be a straight line to plot this out all we did is select the two data points distance travel distance remaining when we do this we're gonna use a scatter plot and we always have if you by default it'll put the distance traveled on the x-axis which we typically think of as the independent variable but you can plot it either way because sometimes you might know not know whether it be dependent or independent in our in this case we probably would think of basically the distance travel as us physically kind of doing something so we probably think of that as you know like the the independent variable right and as our distance traveled goes up as this gets higher the distance remaining goes down so if we go 20 miles let's say the distance remaining is at 80 if we travel more we go to 40 miles then the distance remaining has now gone down to 60 so you've got this negative relationship it's a perfect negative relationship because all of the points of course fit on that exact line now you could plot it the other way and you might say well what if I reverse this distance traveled on the wide distance remaining on the x will that make it a positive relationship no it'll still be a negative relationship so it'll look it'll look like this and so you can still see you know the relationship and it still has a negative relationship this way and so you could plot either way and again over here you could say well you know you might think here like the the causal factor the thing that is is the traveling that's why you might think of it as on the x as the thing that's causing the other one the distance remaining to change possibly in this case but again you could see the relationship as a perfect negative relationship whether you graph it on the left or the right so you can see like if you were to look at it this way you can see okay if I had 20 miles remaining that means I drove 80 miles and if I have 40 miles remaining or I drove or I want whatever we're doing 40 miles remaining we had 60 miles so it's going down and if I have more miles remaining more miles remaining that means we have less distance travel but again usually we would think about it this way most likely we would be saying okay if I went if I drove 20 miles the distance remaining would be 80 if I drove 40 miles the distance remaining would go down to 60 if I drove 60 miles the distance remaining is down to 40 and so on now we could double check this using our super cool analysis toolkit and if this isn't open by default in Excel so if you don't have the analysis toolkit you could turn it on in the options which we do in the Excel if you want to do that but we can open up that analysis toolkit and we can use the the well we could do the correlation up top the correlation will give us something like this so I didn't do the correlation but the correlation will give us will spit out an answer that will tie into the perfect negative correlation so I could use Excel just to give me that negative correlation without doing the whole calculation like this but doing the calculation like this will probably give you better intuitive understanding and if you're building a bigger worksheet then this will be a dynamic worksheet see how you can see what's happening with the Z scores and whatnot that gives you more understanding than than just speeding out that the correlation is negative one but you can use that of course to double check your answer or possibly to give you a preliminary understanding of what's going on you can also do the descriptive statistics which is quite common which is nice and it'll give you this general information about the two different data sets so distance travel distance remaining gives us the mean gives us the standard error because we picked up this the confidence level which we're not focused in on right now median mode standard deviation sample variance kurtosis skewness range minimum maximum sum count and the confidence level so this is a great tool but again it's not dynamic it doesn't change when you change like the data so if you're constructing like a dynamic worksheet then you might use this as the starting point to give you to give you an understanding about the data sets before you build whatever you're building and or as a checkpoint at the end to kind of double check the calculations that you've made