 Statistics and Excel Poisson Distribution Roller Coaster Line Example Got data? Let's get stuck into it with statistics and Excel First, a word from our sponsor Really? Yeah, actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us but that's okay whatever Because our merchandise is better than their stupid stuff anyways Like our trust me I'm an accountant product line Yeah, it's paramount that you let people know that you're an accountant because apparently we're among the only ones equipped with the number crunching skills to answer society's current deep complex and nuanced questions If you would like a commercial free experience consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com You're not required to but if you have access to OneNote we're in the icon left-hand side OneNote presentation 1536 Poisson Distribution Roller Coaster Line Example We're also uploading transcripts to OneNote so you can go into the View tab Immersive Reader Change the language if you so choose Either read or listen to the transcript in multiple different languages Being able to tie into the video presentations with the timestamps OneNote desktop version here in prior presentations we've been thinking about how to represent data sets Both mathematically using formulas like the average or mean quartiles the median As well as pictorial representations like a box and whiskers or a histogram The histogram typically what we visualize when we're thinking about the spread of data using terminology such as It's skewed to the right or it's skewed to the left We're now looking at families of curves that can be represented with equations Which could give us more predictive power if those curves could approximate our actual data set In prior presentations we looked at the simple curve of the uniform distribution We're now talking about the Poisson Distribution It's more complicated but it's not poison It's not that complicated It's Poisson Poisson Distribution So we have usually this is going to be related to like line examples When you're talking about oftentimes business type of scenarios So that's what we will be using here Imagining that we're waiting in line for a roller coaster ride So x equals the arrivals during one minute So we're trying to show how many people are arriving within a one minute time frame That's the time span roller coaster ride That's what we're dealing with And then the mean is what we're generally going to need If we're going to be using the Excel functions So let's imagine that in practice We were trying to determine the data related to how many people are showing up within a line And then we'll see if there's any observations We can make about it to see if it fits a Poisson type of distribution In which case we might be able to use a formula To give us more predictive power about what might be happening in the future in relation to the line Now in Excel we can have a random number generator here To approximate the Poisson distribution with an element of randomness in it So if you want to check out Excel you can do that We'll work this problem in Excel as well But instead of just using a random number generator We're going to have to give Excel the mean In order to create random numbers in accordance with the conditions of a Poisson distribution And so these are the numbers that we get from that random generation I think we generated like a thousand numbers in the example in Excel Although we didn't put all those numbers here in our practice problem in the one note So if we had all of our numbers what do these numbers mean? Well it means if we're sitting there with our stopwatch Seeing how many arrivals are happening during each one minute time interval We're waiting as each one minute passes and saying Okay this time and that one minute four people arrived In another one minute time interval three people arrived In another one minute time interval two people arrived In another one minute time interval six people arrived And so on and so forth I'm going to skip over here to the left now Once we have our data we can put our data into our bins or buckets So these are the arrivals that have shown up And we're going to say okay let's count the arrivals This represents the number of arrivals The people that have been showing up Within the time intervals which we said was once again the one minute So now we're going to say of the thousand counts that we had How many times were zero people arriving How many times was one people arriving two people And we're just going to count these items over here In Excel we can we might call that a frequency And you might use an equation like this to do it Now note that you also might say Why don't I use a count if function You could you could say this equals count if Meaning if you find zero in this number of fields over here Excel I would like you to count the number of times a zero shows up However sometimes when you use these random number generators It doesn't give you a whole number or something like that So sometimes that formula doesn't give you the right result So the frequency often picks it up more accurately I have found which is a spill array function A little bit more fancy frequency And then we're picking up the array over here And then the data bins here And this actually just spills out the formula So this means that I believe we did a thousand X's a thousand one minute samples To see how many people arrived and zero people arrived In sixty two one minute intervals One person arrived in one hundred and seventy one One minute intervals Two people arrived in two hundred and forty three one minute intervals And so on and so forth So if I total this up this number down here Should tie out to if I sum up this column The number of of samples that we had If we did this a thousand times the number of random number Generations that gives us a double check That everything is is properly calculated This is going to be the percent of the total So if sixty two people Sixty two times zero people arrived Then that means that sixty two out of the thousand times Sixty two that divided by a thousand or six point two times Is the percent That that zero people arrived out of the thousand So we can represent this as a percent of the total Right so if one seventy one Times one person arrived in the one minute interval Out of a thousand one minute intervals I could take one seventy one divided by a thousand And we would get the seventeen point one and so on So then we can say okay well if I look at the mean of the data We can we can calculate the mean And I could just use my formula in Excel to do that And that would be an average formula So I would take the average of this data set The whole thing I'm just taking the average of the whole data set Adding them up and dividing by the by the number of events Which was a thousand and then the variance This is the variance using the population formula And the variance using the sample formula And the point that we want to come out to is the mean Of course is close to the mean that we had when we started out Now remember that if you were doing this in practice You were sitting there with a stopwatch counting the arrivals You might not have any you might not know the mean You might have an idea of it but you wouldn't know what the mean is The reason we started with a mean here is because I need the mean In order to to generate these random numbers In accordance with a Poisson distribution So so notice when I take the mean over here Obviously it's going to be somewhat close to but not exact To the mean that we put in when we generated the random numbers The point we want to look at however is that if I took my data And also did my common calculation of the variance of the data Which would be this formula then if it's close to the mean Then that is an indication that it might be following a Poisson distribution So if I look at this if I looked at my data And I said okay is this giving me any predictive power On how I'm going to regulate my lines at the roller coasters And what not and how many people are going to show up Well then I can start to say let's analyze the data with the mean of the data Let's take a look at the variance of the data if those two are similar Then possibly I can approximate what is happening here With a Poisson distribution representing it with a formula Which will make it easier to kind of make predictions into the future We can also plot this information so if I plotted the frequency So we're plotting the frequency here So here's the numbers zero up to 29 Remember that when we're looking at a Poisson distribution The idea is that it could go infinitely up in this direction It's not going to in practice because it's not like an infinite number of people Are going to show up in a one minute time interval That's very unlikely but in theory it's like going up to infinity That's why you will typically have a skew to the right So this isn't exactly bell shaped It's going to be basically skewed to the right is the general idea Now if I plot this data we're saying hey this looks kind of like a Poisson type of distribution And we know the mean is equal to the variance So that's giving us more evidence that this looks like a Poisson type of distribution This is the same thing but instead of representing the numbers in terms of the count The frequency column we're looking at the percent of the total So you can see you can in essence represent the same concept with the percent column As opposed to the and you get the same kind of distribution look and feel Quick look over here if I was to remember calculate our If I was to think about the mean and the variance calculations You'll recall that if I take all my data sets and subtract out the mean Which is this number we're looking at the distance between each of our data points The number of arrivals that showed up in a one minute interval Minus the mean and there's our difference And if I was to add up all the differences I don't have all my data But if I added up all the differences in a thousand samples it would come out to zero So you'll recall that we had to do something to get our number of spread Which is usually the standard deviation and the variance We squared it so we squared it so this is the difference squared Making all the numbers positive but also a lot larger as they are now squared Summing that up that gives us our two five five three So the squared sum of the difference and then if we divide that by the count Which was the thousand we did a thousand samples that gives us our variance Just to show you how we're getting the variance over here Another format to get the variance and then if I take the square root We get the standard deviation Okay so then let's now say okay well it looks like it might be a Poisson distribution So now if I take my same data the number of arrivals I'm going to use the mean This time I'm going to use the mean that we calculated in our actual data Two point three seven three And I'm going to use that to generate our Poisson dot dist Function so now I'm going to say let's take my same number of arrivals could be Either zero one two three four the number of people that arrive in a one minute interval And I'm just going to now estimate it not with my actual data But with a Poisson dot dist because I've now come to the determination That it might follow that function which gives me more predictive power And so all I need with a Poisson dot dist You have the X which is going to be the one the number of arrivals zero one two And then comma the mean all you really need is the mean And the mean we have over here at the two point seven three And then comma the cumulative the cumulative argument is going to be If it will be cumulative you're asking What's the what's the likelihood of say like zero to three people would be cumulative Up to three which would add up these percents Whereas if you say it's not cumulative which we're doing here false or zero Then you're going it's going to say what's the likelihood of just one person arriving Just two people arriving in this case not zero to two people And then we get our percent likelihoods this percent likelihoods Is is kind of similar to our percent of total column we got over here We're not getting the actual frequency because we didn't actually tell Excel How many tests to run we just use the Poisson distribution to give us this percent This percent column and then however many tests we run whatever the sample is We can multiply times the percent to get the actual numerical values right So that's going to be our Poisson formula If I look at the difference between what we got here with the with the Poisson Versus over here I got two point six point two seventeen point one twenty four point three Versus six point five two seventeen point eight twenty four point three So this would be another indication if I'm doing this in practice I'm like okay I got my my data set and look like the mean is close to the variance Which means it might be a Poisson distribution the actual graph looks kind of like a Poisson distribution If I plot out a Poisson distribution and look at what I got in my actual data set Compared to the Poisson distribution in terms of percentages it looks pretty close Which means that the Poisson distribution might give me some good predictive predictive power So so this is another I'll I'll go over this in a second let's first go to the right here So if I if I take the mean of the Poisson distribution obviously it's going to be that two point seven five And I can get that mean by and so remember when we when we calculate the Poisson We put in that it was two point seven five but if you look at this column here This column is taking column x times column p of x So if I multiply those two columns out all the way down so this is like this one is like two times point two four three Gives us about point four nine and then I sum up this column That's one way that we can get the mean over here so we get the calculation of the mean And we can also do that with a formula which is a sum product formula Just to get an idea on the Excel formula this is going to do the same thing So it's kind of a fancy formula that could be useful from time to time Some product and then we're picking up the two basically arrays And that will in essence do the same thing right we're taking the sum product of this column and this column So we'll test that out in our Excel presentation if you want to test out the formulas And then the variance if I take this and then this is x minus the mean on this column And then this is going to be squaring that number that's going to give us this column And if I sum up that column that's one way that we can basically calculate the variance And I can also do that with a sum product formula as well which we'll do in Excel if you wanted to do that And put everything into a formula here Notice it's a little bit kind of weird you might be saying well that looks a little bit strange to calculate the mean and the variance Notice it's a little bit different than when we actually had our data over here Because when I actually had my data then we're working with the data Whereas instead of working with like the percentages of the totals Whereas when I calculate the Poisson distribution over here We're not looking at like a sample of a thousand data units We're looking at the percent of the percent likelihood So we're basically saying this is the percent likelihood that one person shows up in the one minute time period So in any case those are going to be those items And then we can ask questions like What's the likelihood that less than that you have less than three arrivals So the likelihood that you had less than three arrivals If I look at my data set now you have to be this is equal or less than So if I went to my arrivals over here I can say well you got 6.52 of zero people 17.8 of one, 24.3, 22.12 So we come out to that what we get over here 70.75 about because those numbers were rounded And then you can also use the Poisson distribution using the cumulative in order to calculate that So you can say well it's going to be the Poisson distribution The X is going to be the number that we want which I just hard coded as three And then comma the mean is going to be the mean that we calculated up top or the one we started with at the left And then comma and then this time the one represents cumulative true or one So now it's counting up to that point zero, two, three instead of just giving us the result for three itself And then here we've got another question that could come up between if we want to be between say two and five Now that's a little bit tricky because in practice problems you want to say okay are you including the two or the five So you have to determine if it's between two and five and you're not including the two or the five Then you're just talking about the three and the four which would be 22.12 plus 15.09 So in like a book problem you have to be very specific about whether or not they're including the two and the five So we put in between the two which means we're not including the two or the five in this case Or you can use the Poisson distribution, this one gets a little tricky because you might say okay how is that going to work Because if I look at my Poisson distribution if I say the cumulative function If I say for five would be how many times it would be this up to five So but how do I get between two starting at two? Well I have to go from this up to five and then subtract out the amount that goes up to two Right so I have to do a subtraction so I could do that with two Poisson distributions in one formula Poisson distribution and then we're going to go up to four in this case because we're not including five So that's where it gets a little tricky because this isn't including five so I'm going up to four Comma the mean and then comma it's going to be cumulative minus we're subtracting another Poisson distribution here Which is going to be going up to two and then the mean and it's also going to be cumulative Alright so if we graft these two things together then these are our graphs two different formats of the graphs So this is going to be blue is the actual Poisson and then the the orange is from our data set So you can see they're not exact in other words if I looked at my actual data set that's what we generated over here That's going to be in essence this chart that we grabbed that we got with our data set or this one with the percentages And then we compared that to the data that we got from the Poisson the actual smooth curve that we got from our formula And they're pretty close so in practice we can say okay we were going to gather our data we then took our data and we said Does the mean equal the variance if it equals the variance it might be a Poisson distribution we then grafted It looks kind of like it might be the shape of a Poisson distribution we then calculated the Poisson distribution Looked at the difference between it and the shape and it looks pretty close and then we can graph it on top of each other And say hey look these two things look they're not exact but they line up fairly close and therefore if I can use the Poisson distribution Then it's going to give me more predictive power into the future because I can just plug numbers into an equation That I have an equation for as opposed to a random outcome of numbers that I have no real way to extrapolate into the future That would be easy without you know more complicated kind of methods