 Statistics and Excel. Poisson distribution formula and graph. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to one note, we're in the icon left hand side. One note presentation 1520 Poisson distribution formula and graph. We've also been uploading transcripts to one note. So if you go to the view tab, immersive reader tool could change the language if you so choose being able to then read or listen to the transcript in multiple different languages using the timestamp to tie in to the video presentation. One note desktop version here in prior presentations. We've been thinking about how to describe different data sets using both mathematical descriptions, such as the average or mean quartiles and the median and pictorial representations of data like the box and whiskers and the histogram. The histogram is what we usually visualize when thinking about the spread of data, which we can further describe by saying the histogram is skewed to the right or skewed to the left. For example, now we're thinking about families of curves that might be useful to represent different data sets. And if we can represent a data set with a curve that has a mathematical equation related to it, that would be great because it might give us more predictive power about whatever that data set is related to. So last time we looked at the easiest kind of curve or family of curves, the uniform distribution. Now we want to look at a little bit more complex one, this being the Poisson distribution. So it's not that complex, it's not like it's poison, and we're not poisoning you here. It's a Poisson distribution, the emphasis on the second bit there. So similar kind of concept here though, if we look at the data and we see that it's characteristic of a Poisson distribution, if we can get the formula, even though it's more complex of a formula, then again, that formula can give us predictive power. Now oftentimes, in a business setting, the Poisson distribution is related to like lying waiting situations, which can come up quite often in different kind of business settings. So it can also be something that could be distributed over space as well. We might look at an example like that in the future, such as how many potholes are in so many miles of road, might be an occurrence that happens to follow a Poisson distribution. And if we recognize that, then again, it might make us a little bit easier decision making. So let's first think about the conditions that would be present in order for a Poisson distribution to happen. So when we're thinking about the examples, we're thinking about real life examples that might have a distribution of occurrences similar to a Poisson distribution, in which case we can simulate those occurrences with the function and the curve. So an event can occur any number of times during a time period. So if we think about our classic lying waiting type of situation, we're trying to think about how many people might arrive in a certain interval of time. That's what we're usually thinking about. Now, obviously you would think practically there's an upper limit to it, but theoretically an infinite amount of people could arrive at any time interval in a lying waiting type of situation. So events occur independently. So we're not talking about events that are going to be dependent upon each other. They're going to be independent type of events. So the first event doesn't affect the second event. So for example, if you're talking about a card game, and you were to pull one card out of the deck, then and then pull a second card out of the deck, the second card odds would be dependent on the first card that was pulled out of the deck in some way, right? Because now there's 51 cards in the deck. But if you put the first card back in the deck, now you have two and you shuffle it, right? Now you have two events of pulling the second card, which are kind of independent. So we're talking here about a Poisson distribution where the events that are happening in a lying waiting situation, people coming into the line are independent. So the rate of occurrence is constant. That is the rate does not change based on time. So we're not going to have a rate change on the time. So if we're thinking about a lying waiting situation, we might be thinking about that situation during a certain interval of time such as rush hour, where like the rate is constant during a certain interval of time, or if it was a restaurant like lunch hour, or if it's a roller coaster, like in the morning or something when it's most busy or something like that, the probability of an event occurring is proportional to the length of time period. So that's what we're looking at here. We're trying to think about time frames and how many times an event is going to happen within a time frame, which we'll get into in more depth in a bit here. Here's the formula for it. Now note this is a kind of a complicated looking formula, but you don't want to be too intimidated by the formula because in Excel, what we want to do is basically use the functions in Excel. And what we really want to be able to understand is when would a Poisson distribution be applicable and how to use the function in Excel. And the idea would be similar to what we had with the uniform distribution, which is that even though the formula is complex, if I know what the formula is, I can plug the numbers in and get an approximation based on the curve that will be representative of the actual data in real life. Now to actually draw the formula, remember that you can always go into the insert tab, and you can go into the equations up top. And then you can go into your ink equation, and you can draw the formula out here. And so and so I think this is the easiest way to do it. If you want to kind of represent a formula, go boom, I'll just do this quickly because we've seen in the past. And then this equals lambda is kind of hard to do lambda. It's got it. See, I got it. You picked up lambda with that crazy mess, lambda. Any case, I won't draw the whole thing out, but you can you can plot it out this way as we've seen in a prior presentation, and we do do that in the Excel presentation if you want to check that out as well. But you can use that any Microsoft product will basically have a have that similar kind of a feature to actually draw a formula. So why is this here? I don't know what this is doing here. All right, other other things. Notice if I look at this formula, we've got we've got then the lambda to x times e to the negative lambda over x factorial. So let's take a look at some of those components of the formula just so we can get an idea of what's included in it. Now e is a constant, it's kind of similar to pi. So it's got that number that goes on forever. It's around 2.71828 and so on. So so that's going to be part of the formula, a factorial. What that means if I have, for example, five factorial, that would be like five times four, you know, times three times two. So if I put the calculator, what this would look like is five times four is 2020 times three is 63 times two is the 120. And that's the factorial. So that's what the exclamation is representing in the formula. And then we have the e in excel, how can we get to that e where you could type in that that but it's approximation. So if you can get much closer to what e actually is since since it goes on forever with the equals exp and then put a one here and that'll give you this number. So we can use that if we were to use the equation, although we won't be using it much, because we're going to focus on the excel function, which will do this for us. And then the factorial, if you wanted to do a formula, a function in excel for the factorial, it would be equals F A C T. So if I put in equals F A C T five, it would give us the 120. So we have those. So if I wanted to type this into excel in just a formula without using the function, we could do so using the factorial for the for the exclamation and the exp for e. Okay, so other items note that the mean when we think about the mean in prior presentations, we saw the mean often represented as a mu or sometimes an x bar. Here it's often represented with a lambda. So when we're talking about a Poisson distribution, you might see a mu, but you'll often see the lambda. Now remember that when you're typing these in, it can be kind of hard to find them sometimes. So if you're trying to represent this, you could put a u, but it's hard to find a lambda. But if to get a mu or a lambda, you're going to go into the symbols up top. And then notice these are in my my favorite symbols because I've been using them. If they're not up there for you, you can go to more symbols. And then you want to go to the Greek and Coptic. And I usually put this side to just plain normal text. But I think it'll work there as well. Anyways, we're going to go, we're on Greek and Coptic. And then you can find your lamb as it should look like this. And this once you find them, they'll be in the recently found areas and you can add them fairly easily. So the variance, the variance is sigma squared that we've seen in prior presentations, you could find the sigma there in the same Greek area. And then the squared is a subscript that we talked about in a prior presentation, if you need to do that for the poisson, the mean and the variance are equal. So so that's one of the characteristics of a poisson type distribution. So for example, if you were to generate your data, and you did your calculations of the variance, and it was somewhat equal to the mean of your data, that's one indication that hey, this might be be be following a poisson distribution, in which case I might be able to approximate the data with a formula, which has a function in Excel, which might give me predictive power about, you know, the what's going to happen in the future. Alright, so let me let's give it let's get some example of some data here to get an idea of what's going on. If I want to use this poisson function in Excel, then we just need that mean or the lambda mu or lambda. So we're going to say here it's five. And then the rows when I type in the rows in Excel, I'm going to put 70 rows. So then our data, if I if I plot this out, I'm going to say we have x on the left. And then we've got x of P, which is going to be our function on the right, which is going to be our poisson dot dist function. So, so we're just plotting it at this point in time, we'll take a look at more specific examples in the future. But again, usually oftentimes, you're looking at a line waiting situation, which would be something would look something like this, if it was a line waiting situation, you might say, whatever the time interval is, let's say it's a minute or something like that. We're saying what's the likelihood that four people will arrive within that time interval. So in this case, we have 17.55% likelihood that four people arrive within the time interval of a minute. What's the likelihood that five people arrive within the time interval of a minute. And then you can answer questions such as what's the likelihood that somewhere from one from zero up to six people arrive within the time interval of a minute. And that would be the sum of all of these 10.44 plus 14.62 plus 17.55. That's, that's kind of the idea of the outcomes that we'll get. But right now, let's just graph it. So up top here, notice, I have the sequence formula up top. So we'll do this in Excel, the sequence formula, when you actually just put the X's, I'm going to put X's from zero on down to 70. I could use that by just formula, copying the formula bar down or the auto fill down. But you can also use the sequence formula. And that's useful, because if I put a sequence formula in this cell, saying that I would like to have the number of rows be 70, plus one, because I want to start not at number one, but at zero. And then I'm going to say comma, comma, start at number zero, start at number zero. Then I can change this cell, and it will change my spill array over here, which is great. Now the arrays have pros and cons to them. We'll talk more about that in Excel, but that's a useful formula to know because it'll allow you to increase or decrease the number based on a data input formula. And then the Poisson distribution for each of these cells is going to be equal to poisson.dist. The X that we're picking up is going in the first case will be zero, then one, and so on. If you did a spill array, you can pick all of them up and let it spill out as we are doing here, comma, and then the mean, which is represented by lambda here is going to be five, and then comma, do we want it to be cumulative or not? Now this argument has to do with am I trying to add up everything up to a certain point, as we said with what's the likelihood between of one, zero to six people coming, or do we want just the likelihood of one individual spot like the likelihood of just zero, one, two, and so on. Here we're just going to put likelihood of each individual spot. So we're going to put a zero to represent that, which I believe would be false as well. You will go into that more in Excel if you want. And then if you want to look at that presentation, and then that gives us our data set, you can see that it that it tapers off towards the bottom. Now remember, when we looked at our requirements that in theory, it could go on forever, right? Because because you could have an infinite number of people come in in theory. But in practicality, it's going to generally taper off. And as we as we go, so if I want to approximate this, there's actually a tool within Excel, which will give us a random number generator, which will play with will practice with an Excel, which will give me a random number generation based on the mean, the mean number here. So in other words, similar to what we did with a dice ruling, where we said I would like to get a random number generation between one and six for every 1000 rolls of the die. You could say I would like a random number generation from Excel, but I would like it to be such that it follows a Poisson type distribution, meaning the data has a certain mean. And and the data, you know, follows these conditions, these conditions happen to be met, which often could happen in real life, right? If these conditions are met, and you know the mean, then you would think that data is going to give you a Poisson distribution. So there's an element of randomness here from these numbers that were generated from Excel, but they're representing also a randomness of a finite number of numbers, instead of like the infinite number of numbers that we think about like in theory. So so so then so this would be kind of like, as if I went with a stopwatch to my line waiting situation, and actually counted how many people show up, for example, in whatever interval I'm talking about, right? I went there and counted that 15 people showed up, and then 10 people showed up in the next time interval, three people showed up in the next time interval, and so on and so forth. So we might first actually pick up this data to look at it to see if it actually does approximate a Poisson distribution. So we're imagining we actually gathered this data from our line waiting. Now we're going to do our bends, we're going to count them and say how many of these numbers are 012? How many of these numbers represent zero people showing up in the time interval one person, two people, three people, and so on. And then this is a frequency distribution. In order to get that the frequency distribution formula in Excel is equals frequency brackets, the data array, which is all of these numbers, the actual data comma, and then the bends, which are all of these numbers. Now note that you might say, I don't why am I using a frequency? I could use a count if formula, meaning I could say count if you find in this data, this number one zero one two, but with this random number generator, the count if sometimes doesn't pick it up, because I think maybe some of these random numbers aren't exactly whole numbers sometimes. So you might try to make these numbers whole numbers, you could round them to be whole numbers. But I find that this frequency distribution picks up the proper numbers quite well. And you can tell because if you sum up all the numbers down here, it should sum up to 1000. Now remember that I only went up to 30. Like you it could be possible that one of these numbers are more than 30 people showed up, if that's the example we're looking at, because it because the upper limit is infinite. But it's unlikely that that's going to happen, given given given the characteristics of the Poisson distribution. So it's in theory possible, but it's likely that you're going to so there's no really upper limit. But in theory, there's going to be basically an upper limit for the most part, right? So then we can look at the relative frequency the relative frequency is any of these numbers divided by the total. So for this number 12, it's 12 divided by 1000. And that's going to be 1.2%. This is 14 divided by 1000. And that's 1.4. By the way, this 1000 down below is a check figure, because I generated 1000 numbers, I didn't I didn't paste them all here. So you don't see the end of it. But I generated 1000 numbers in Excel. And so they all showed up, I can tell in our boxes and our groups and our bends, because it adds up to 1000. Now this is the difference between the relative frequency we got in our example, and the Poisson distribution. So here in the Poisson for one, I got 3.37 for two 8.42. And here for one, I got point one and point one. And then for three, I got I got 1.2. And over here for three, I got 14.04. And and then 3.3 for five, and 17.55 over here, and so on. Now I don't think these are exactly tying out because I actually changed, I changed this one to to the lambda of five. And when it's because I was I was changing the data sets over here, whereas over here, I've got the data that was generated with a lambda of 10. So this difference column isn't isn't really helping us out right now. But if I changed in Excel, I'll change this to 10. And then and then I'll change the rows to match as well. And then when we look at this difference column, it should be a closer difference column. But in any case, if I was to plot these these two on the same graph, and again, they don't look the same at this point, because I use the different midpoint. But you could see the the the two distributions one one of them is generated from the data that we observed. And when we look at the data that is observed, you can still say, Well, that might follow a Poisson distribution, it looks kind of like it's following a Poisson distribution. It's got that characteristic look to it, meaning it's a little bit skewed to the right. But it's but close to bell, you know, bell shape depends, we'll talk more about the the the shape of the curve as you change as you change these numbers in our excel presentations, but it's got somewhat of a bell shape, but it's always going to be slightly skewed to the right. And so we see that and we can also as we'll see in future presentations, look at some other characteristics, such as is the data we're looking at following these criteria. And then we can also see whether or not with the data set, the the mean is equal to the variance or similar or close to and if that's the case, then we're going to say, Okay, maybe a Poisson distribution is relevant. And then we can actually do our Poisson distribution, which will give us predictions, predictive power into the future more so than taking past data, right? So in other words, if I looked at this data, and I said, I don't think this data conforms to any line, whether that be a uniform distribution, or a Poisson, or any other distribution we'll look at in the future, if that's the case, then it's going to be difficult for us to make predictive power into the future, because we're gonna have to use more complex modeling to do that. But if it does, many things do conform to some of these families of curves. And if it does, then that equation is going to be easier for us to make predictive power in the future. Now in Excel, we'll also graph these two things together. So you can see how to put these two items on the graph together, the one we did with the Poisson, and the one we did with the actual data. And again, the Poisson would be fairly close to the actual data. If we change the Poisson data here to be 10, and the number of rows to be equivalent, which was like 30, I think, or but at least, and then we would then we would come up with these two that would be somewhat on top of each other, which again would be another pretty clear indication that the data set would be approximated in a Poisson distribution. And then we can possibly use that to make predictive power. If it was a line waiting situation, possibly, you know, about how many people we need at certain time periods and so on and so forth.