 Statistics and Excel, Poisson Distribution, Potholes and Road Example. Got data? Let's get stuck into it with Statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon left-hand side, OneNote Presentation, 1546 Poisson Distribution, Pothole and Road Example. First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one because apparently the merchandisers, they don't want to be seen with us, but that's okay, whatever, because our merchandise is better than their stupid stuff anyways. Like our, trust me, I'm an accountant product line. Yeah, it's paramount that you let people know that you're an accountant because apparently we're among the only ones equipped with the number crunching skills to answer society's current deep complex and nuanced questions. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com. We're also uploading transcripts to OneNote so that you can go into the View tab, Immersive Reader Tool, change the language if you so choose, be able to then either read or listen to the transcript in multiple different languages using the time stamps to tie in to the video presentation. OneNote desktop version here. In prior presentations, we've been thinking about how we can represent different data sets using both mathematical calculations like the average or mean, the median, quartiles, and with pictorial representations like the box and whiskers and the histogram, the histogram being the primary tool we envision when thinking about the spread of the data, and we can describe the spread of the data on a histogram using terms such as it's skewed to the left or it's skewed to the right. We're now looking at formulas which will give us a line or a curve, which if we're lucky, will approximate the actual data sets in certain scenarios that we are working with. If we can approximate our data sets with a line or a curve, that will generally give us more predictive power about and over whatever the data set is representing. So in prior sections, we talked about the easiest line or curve, which is a uniform distribution. We're now looking at the Poisson distribution. In a prior example, we talked about a Poisson distribution as it applies to kind of it's one of its most common applications in business settings, which is a line waiting situation. And we were thinking about how many people might show up within a certain interval of time. However, you can also go over distances. So in this case, we're going to think about a pothole situation, and we're going to be thinking about how many potholes are present in not a space of time, but rather a distance in miles. So that's what we'll look at this time. We'll kind of do it in a similar fashion as we did with the line waiting situation where you might imagine that you first are going to actually count the potholes. You might be looking at data that you have had in the past about how potholes are occurring and look at that data. And then once we examine the data, we're going to say, Hey, does this data match up to any basically common curve? Can we put a curve in place that would simulate this data? If we can, that could help us with future decision making in terms of how much maintenance we might want on a road, for example. Now in Excel, you can actually generate the results of a poisson distribution as if it's a random experiment. So we're imagining that we're going out, we're actually going to every 100 miles of road and counting the potholes in the road, or we might be looking at past the data that gives us this similar kind of information of how many potholes are in the road over time. So if we went out and we actually counted the potholes, then we're going to say that, for example, in the first 100 miles, there were 18 potholes and the second 100 miles, we counted 26 potholes and then 21 potholes and then 26 potholes in the 100 miles of road. Now we generated these numbers from Excel, but we're imagining that this is simulating kind of like the random generated cards that we talked about in our card playing example or the random dice or coin flipping. And so these are going to be all of our sample data. Then we can organize our sample data into bins. So once we've counted all the potholes that are located in every 100 mile span, then we can say these are going to be the bins that we're going to put them in. How many times did we have zero potholes and the 100 mile span? How many times did we have one pothole and the 100 mile span? How many times did we have two potholes and the 100 mile span and so on? We're going to call that a frequency distribution. In Excel, you can use the formula. This is an array formula. Now you might think that you can use the count if formula. You might say I can say equals count if you find this zero. We're going to tell Excel if you find that zero and this set of numbers. And I think we generated 1,000 numbers, which would represent 1,100 miles tests that we had and counted the potholes. Every time you see a zero in there, then give me the number. Count them. But the count doesn't always work great when we use these random number generators because sometimes I think the number is not exact. And so the frequency distribution, although a little bit more fancy in a spill array function usually picks up all the numbers. So we're going to use that. We're going to say the frequency of the array, the set of numbers, and then the bin array. I'm going to select all of these numbers and then it'll actually spill out the frequency. So down here, we had eight potholes in the 100 miles one time and all the 100 mile counts that we had, we had nine potholes two times in the thousand. I think we did it a thousand times of 100 mile counts. We had 10 potholes five times and we had 17 potholes 40 times. Now we can also, if I go down to the bottom of this, I didn't include all of the data, but I believe that in Excel, we actually took this data set down, I believe 500. So we are envisioning that we had 500 counts of 100 mile spans of road counting the number of potholes in each of those spans of road. Now this number here is representing our bins, representing the number of potholes. In theory, we could have like an infinite number of potholes when we're looking at a Poisson distribution type of situation. But obviously in practice, you would think there would be somewhat of an upper limit of the number of potholes that you're going to be finding in any 100 mile span. So if I was to add up all of these numbers here, then you would think that the total down here would get up to 500 because that's the number of data sets. I might have said a thousand before, but I believe it was 500. We did 500 counts. So if this ties out to the number of counts we did 500, that's evidence that our bins have been calculated properly. So then we can take the percent of the total. So for example, how many, what's the percent of times that we found nine potholes in our 500 counts of 100 mile spans, right? So I could say that would be, we found two times that that happened. So two over 500 is going to give us our point 4%. And if I scroll down, we could say what's the percent likelihood that we had 20 potholes in the 100 mile span. Well, apparently that happened 56 times out of 500. It seems like, so that's 11.2% that likelihood that we had the 56 potholes. So that's going to be our data set, which we can represent then in terms of the frequency, the count, as well as the percent of the total, which is there. So then if I took some standard calculations, if I took the mean of the data set, for example, and the mean is just simply taking the average of this data set, so the Excel formula would be equals the average of this data set, we have a mean number of 20.14. And if I take the variance of the data set, so now I'm looking, this is the var dot p, and then dot as for a sample and a population just to practice both of those of this entire data set, we get to 20.49. So one of the characteristics that's interesting with a Poisson distribution is that the variance, if it was a perfect Poisson situation would be equal to the mean. So if it's approximating the mean, the variance and the mean are pretty close, then we're going, oh wait, this might be like a Poisson kind of distribution situation, in which case we might be able to use a line in order to represent the data. So then, so we had the mean potholes are, before we go there, no, we can also plot the data. So if I plot this data, so this is a plot of the frequency, so we're taking the frequency here, and we're counting, so how many times did we have, you know, the 16 or the 20, you know, potholes, the highest one is 50 something, 55 or something like that. So that's the count that's happening here. So we had, again, the frequency represents the number of times it happened in the 500 tests that we ran. And if we're looking at 20 potholes, we had the 56, right? That's that one that's going way out there. It looks a little bit kind of like an outlier a bit. And if I look at, I can also do it on a percent basis. So if I take the percent, the percent of the total, notice you get in essence the same shape here, but now we're looking at it as a percent of the total. So that's going to, so now we're saying, hey, that kind of, that doesn't look perfectly like a Poisson distribution, but it looks like it might be a little skewed to the right, right? It might be looking kind of normal. Poisson sometimes looks like a bell shape, but it's slightly skewed to the right, generally, is the general idea. So we're thinking maybe it would be a Poisson. So now we could say, what if I took that mean, so now I'm taking the mean that we got here and the miles of 100. And I did an actual curve using the Poisson distribution. So this would be like the perfect representation, as opposed to our approximation with our, with our data set, if it wasn't a perfect line, right? So I can take my data set here, as if we're seeing how many times the potholes would show up representing zero potholes in 100 miles, one pothole, two potholes in 100 miles. And then if I do a Poisson calculation, which is this formula, Poisson dot dist, the x is representing the zero, the one, the two, and so forth. The mean is now going to be the 20.14 we got from our example. And then do I want it to be cumulative? No, I don't want it to be cumulative. I want you to give me the percent likelihood of each of each of these numbers. What's the percent likelihood that we get nine potholes in the 100 miles 0.27. And then I can compare this to what we actually got over here. Now remember, the Poisson distribution isn't giving us an actual frequency, the number of potholes, it's giving us basically the likelihood of the number of potholes, right, which is this column kind of represents this column. So I can compare this column, then if I subtract this column to this column, here's our differences. So I can look at those differences and say, does that look pretty close? It looks like it's pretty close to a Poisson distribution. An exact Poisson distribution as the data that we generated. And so then I could say, all right, we could then of course ask questions if we had our Poisson distribution, such as what's the likelihood of having zero to five. So if I said zero to five, let's pull out the trusty calculator. And we can say that would be, now I would have a cumulative from here to here. It's still pretty low. I've got 0.006. I can't count it here because I needed an actual other decimal to pull it up. But the idea is you could generally sum it up if I had another decimal to do it to get to the point 0.06, which would be the sum this way, or we can use the Poisson dot dist formula. This way I could say Poisson dot dist, the x is now going to be the five, the mean is still the mean over here 20.14. But now I do want it to be cumulative. So I put a one instead of a zero, and that'll basically take sum it up. So you got that nice Poisson distribution to give you the cumulative up to that point, which would be a common kind of question. What if I want seven to 14? So if I go from seven to 14, here's seven, and then down to 14. So I could add those up, I could sum them up being careful to say, well, are we including seven, or are we not including seven in a practice problem, or in any situation you have to be pretty careful and say, are we including those two ends, or are we really only saying that we want to include the eight to 13? But I could then add those up here. I won't actually do it, but you get the concept of it here. And then if I did this with a Poisson distribution, it's a little tricky. Because because what I have to do then is I have to say, well, the cumulative function will take me up to the upper limit, which I'm saying is 14, which I'm going to assume we're including the 14 here. So that would go up to the 14. And then I'd have to subtract out up to the lower limit, which we have six. So I'm assuming we're including the seven. So the lower limit would be six. So it'd be Poisson dot dist x would be 14, the upper limit, which I'm including the 14 in this case, and then comma and the mean would be the same mean over here. And it's going to be cumulative with a one minus Poisson dot dist up to six. So I'm not including the seven, because I want I don't want to subtract out the seven I want the seven to be included in the range we're talking about comma. And then the mean is going to be this one again, and it's going to be cumulative so we can do the calculation that way as well. Now this is a histogram plotting the and we're doing it with a bar chart by the way in Excel, but it's a histogram kind of format, we're plotting the different the the p of x, which is the perfect Poisson distribution curve histogram versus this graph over here, which was our actual data. And you could see they line up pretty close. You got that weird outlier right there. But they line up pretty close. This is the same thing with a line chart, as opposed to like a histogram type of chart. And so now if I'm looking at my data, and we're saying, Okay, these potholes, we counted all that can we get some predictions about these potholes. Let's count all the potholes that come up in every 100 mile, and see if we can analyze that data and see if there's any trends to this that might help us with our maintenance policy or something like that. And then we counted the potholes, we noticed that the mean is close to the variance. So we're saying, Hey, that looks like it might be a Poisson distribution. It looks kind of like it might be a Poisson distribution too. And then we actually plot out the Poisson distribution, compare it. And it's like, Yeah, the difference on their data versus our data is pretty close. So maybe we could use the Poisson distribution, let's plot them together on top of each other. And we could say, Yeah, it looks pretty much like the Poisson is approximating our data. Therefore, maybe going forward when we make decisions about how many potholes might be showing up in any 100 mile radius or something, we can use the Poisson distribution to make some approximations about that and plan accordingly. Notice that if the Poisson distribution did not correlate well, couldn't approximate the actual data, then we're left with a problem. Because then we have to say, Well, how am I going to figure out how many potholes are going to, we have to do something different to kind of extrapolate what we're going, what the data is going to mean going out into the future.