Statistics and Excel. Poisson distribution. Potholes and road example. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon on the left-hand side: OneNote presentation 1546, Poisson distribution, pothole and road example. We're also uploading transcripts to OneNote so that you can go into the View tab, open the Immersive Reader tool, and change the language if you so choose. You can then either read or listen to the transcript in multiple languages, using the timestamps to tie in to the video presentation. OneNote desktop version here.
In prior presentations, we've been thinking about how we can represent different data sets using both mathematical calculations, like the average or mean, the median, and quartiles, and pictorial representations, like the box and whiskers and the histogram. The histogram is the primary tool we envision when thinking about the spread of the data, and we can describe that spread on a histogram using terms such as skewed to the left or skewed to the right.
We're now looking at formulas that give us a line or a curve which, if we're lucky, will approximate the actual data sets in certain scenarios we are working with. If we can approximate our data sets with a line or a curve, that will generally give us more predictive power over whatever the data set is representing. In prior sections we talked about the easiest line or curve, which is a uniform distribution. We're now looking at the Poisson distribution.
In a prior example, we talked about the Poisson distribution as it applies to one of its most common applications in business settings, which is a line-waiting situation, where we were thinking about how many people might show up within a certain interval of time. However, you can also count over distances.
So in this case we're going to think about a pothole situation, and we're going to be thinking about how many potholes are present not in a span of time, but rather in a distance in miles. So that's what we'll look at this time. We'll do it in a similar fashion to the line-waiting situation, where you might imagine that you first actually count the potholes, or look at data you have from the past about how potholes are occurring. Once we examine the data, we're going to ask: does this data match up to any common curve? Can we put a curve in place that would simulate this data? If we can, that could help us with future decision-making, for example about how much maintenance we might want on a road.
Now in Excel, you can actually generate the results of a Poisson distribution as if it's a random experiment. So we're imagining that we're going out along every hundred miles of road and counting the potholes in the road, or that we're looking at past data that gives us similar information about how many potholes are in the road over time. If we went out and actually counted the potholes, we might say that, for example, in the first hundred miles there were 18 potholes. In the second hundred miles we counted 26 potholes, and then 21 potholes, and then 26 potholes in the next hundred miles of road. Now we generated these numbers from Excel, but we're imagining that this simulates something like the randomly generated cards we talked about in our card-playing example, or the random dice rolls or coin flips. These are going to be all of our sample data.
Then we can organize our sample data into bins. Once we've counted all the potholes located in every 100-mile span, we can say these are the bins we're going to put them in. How many times did we have zero potholes in the 100-mile span?
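The simulated counting described above can be sketched outside Excel as well. The following is a minimal Python sketch, not the presenter's method: it draws 500 Poisson counts (the number of spans the speaker eventually settles on) using Knuth's multiplication method from the standard library. The mean of 20 potholes per 100 miles, the seed, and the helper name `poisson_draw` are all assumptions made for illustration; the walkthrough does not state the mean it used.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    # Keep multiplying uniform draws until the product falls to the limit;
    # the number of extra multiplications is the Poisson count.
    while product > limit:
        k += 1
        product *= rng.random()
    return k


ASSUMED_MEAN = 20  # potholes per 100 miles; an assumption, not given in the walkthrough
NUM_SPANS = 500    # number of 100-mile spans counted

rng = random.Random(1546)  # fixed seed (arbitrary) so the run is repeatable
potholes = [poisson_draw(ASSUMED_MEAN, rng) for _ in range(NUM_SPANS)]

print(potholes[:4])               # first four simulated spans
print(sum(potholes) / NUM_SPANS)  # sample mean, expected to sit near ASSUMED_MEAN
```

Each entry in `potholes` plays the role of one counted 100-mile span, the same way each generated Excel cell does.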
How many times did we have one pothole in the 100-mile span? How many times did we have two potholes in the 100-mile span, and so on? We're going to call that a frequency distribution. In Excel, you can use the FREQUENCY formula. This is an array formula. Now you might think that you can use the COUNTIF formula. You might say, I can use =COUNTIF and tell Excel to look for that zero in this set of numbers. And I think we generated a thousand numbers, which would represent a thousand 100-mile tests where we counted the potholes. Every time you see a zero in there, count it and give me the total. But COUNTIF doesn't always work great when we use these random number generators because sometimes, I think, the number is not exact. The FREQUENCY function, although a little fancier as a spill array function, usually picks up all the numbers. So we're going to use that.
We're going to say FREQUENCY of the data array, the set of numbers, and then the bin array. I'm going to select all of these numbers and then it'll actually spill out the frequency. So down here, we had eight potholes in a 100-mile span one time. In all the 100-mile counts that we had, we had nine potholes two times. In the thousand, I think we did it a thousand times, of 100-mile counts, we had 10 potholes five times. And we had 17 potholes 40 times.
Now, if I go down to the bottom of this, I didn't include all of the data here, but I believe that in Excel we actually took this data set down to 500. So we are envisioning that we had 500 counts of 100-mile spans of road, counting the number of potholes in each of those spans. Now this number here is representing our bins, the number of potholes. In theory, we could have an infinite number of potholes when we're looking at a Poisson distribution type of situation.
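The binning step can be illustrated with a short Python sketch that mimics the way Excel's FREQUENCY counts: each value goes into the first bin it is less than or equal to, and anything above the last bin lands in one extra overflow slot. The helper name `frequency` and the upper bin of 30 are assumptions for illustration; the sample is just the first four counts quoted in the walkthrough, not the full data set.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin in the style of Excel's FREQUENCY.

    bins must be sorted ascending. A value x is counted in the first bin b
    with x <= b; values above the last bin go into a final overflow slot.
    """
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts


# First four counts quoted in the walkthrough (potholes per 100 miles).
sample = [18, 26, 21, 26]
bins = list(range(0, 31))  # 0..30 potholes; the upper limit is an assumption

counts = frequency(sample, bins)
for b, c in zip(bins, counts):
    if c:
        print(f"{b} potholes: {c} span(s)")

# Sanity check: every observation lands in exactly one slot.
print(sum(counts) == len(sample))
```

Because each value is matched to an interval rather than to an exact number, a count stored as something like 25.9999999 would still land in the 26 bin. That may be why the range-based approach is more forgiving than an exact match when generated numbers are not exact.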
But obviously in practice, you would think there would be somewhat of an upper limit on the number of potholes you're going to find in any 100-mile span. So if I were to add up all of these frequencies, you would think that the total down here would come to 500, because that's the number of counts we took. I might have said a thousand before, but I believe it was 500. We did 500 counts. So if this ties out to the number of counts we did, 500, that's evidence that our bins-