Statistics and Excel: Exponential Distribution. Create and compare sample line-waiting data to the exponential distribution. Got data? Let's get stuck into it with statistics and Excel.

First, a word from our sponsor. Actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, because our merchandise is better than their stuff anyway, like our "Crunching Numbers Is My Cardio" product line. Now, I'm not saying that subscribing to this channel and crunching numbers with us will make you thin, fit, and healthy. However, it does seem like it worked for her, just saying. So subscribe, hit the bell, and buy some merchandise so you can make the world a better place by sharing your accounting instruction exercise routine. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com.

You're not required to follow along, but if you have access to OneNote, we're in the icon on the left-hand side: OneNote presentation 1580, the "Exponential Distribution create and compare sample line waiting data to Exponential Distribution" tab. We're also uploading transcripts to OneNote so that you can go to the View tab, open the Immersive Reader tool, and change the language if you so choose. You can then either read or listen to the transcript in multiple languages, tying the transcript to the video presentation using the timestamps.

We're in the OneNote desktop version here. In prior presentations, we've been thinking about how we can represent different data sets both numerically, with calculations like the average or mean, the median, and quartiles, and pictorially, with things like the box and whiskers and the histogram.
The histogram is the primary tool we envision when thinking about the spread of data, and we can then describe the data on the histogram using terms such as skewed to the left or skewed to the right. We're now looking at those families of curves that have functions related to them and that often approximate data sets in the real world. If we can approximate a data set with a curve, that's great, because the actual formula gives us more predictive power.

We've been looking at different types of curves that often show up in real-life scenarios, such as the uniform distribution, the Poisson distribution, and the binomial distribution. Now we're looking at the exponential distribution, remembering that, as we saw in a prior presentation, it's often related to the Poisson distribution. In practice, in a business scenario, we often see the Poisson distribution in line-waiting situations, asking questions such as: what's the likelihood of so many people arriving within a certain time interval, like one minute or one second? We could also do it over space or distance: what's the likelihood of so many potholes being in so many miles of road? For the exponential distribution, we flip things on their head and ask: what's the average time between one arrival and the next, for example in a line-waiting situation?

Sometimes it's a little more difficult to envision the exponential distribution and how it fits into a line-waiting situation. So now we're going to do a problem similar to what we did in the past for Poisson and binomial, meaning we'll generate some random numbers that stand in for us actually going out and collecting the data, so that we can then compare that data to the exponential distribution, which will be the smooth curve.
So we're going to start off and imagine that the mean arrival rate per hour is 10. We're imagining a line-waiting type of situation, or a meeting, where we ask how many people are going to show up in a certain amount of time. We're going to say the mean number of arrivals, customers to a restaurant or something like that, is 10 per hour. Note that this is generally the question we ask when dealing with a Poisson type of distribution: what's the likelihood that so many people arrive in a certain time interval? In this case the interval is an hour, and the average number of arrivals in an hour is 10.

Now, whenever we're dealing with time, we always have to ask whether we're thinking in terms of hours, minutes, or seconds. In this case, let's break it down to arrivals per minute. If 10 people arrive in an hour, I can take 10 divided by 60, which gives about 0.1667 people arriving per minute. That can feel like a bit of an abstract number, because you can't have less than a whole person arriving, but it's an average.

Then we flip this around to go from the Poisson type of question to the exponential question, which is the interarrival time. Let's first think about it in hours. Now we're thinking about how much time, on average, passes between arrivals. If we have 10 people arriving per hour, I can take 1 over 10, which gives 0.1 hours, so a fraction of an hour. If we look at it in minutes, there are about 0.1667 people arriving per minute, so I take 1 over 0.1666 repeating, which is six minutes.
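The rate conversions in this step can be written out as a short worked calculation. The symbol λ for the arrival rate and its subscripts are notation chosen here for illustration; they are not taken from the presentation itself.

```latex
\begin{align*}
\lambda_{\text{hour}} &= 10 \ \text{arrivals per hour} \\
\lambda_{\text{min}} &= \frac{10}{60} \approx 0.1667 \ \text{arrivals per minute} \\
\text{mean time between arrivals} &= \frac{1}{\lambda_{\text{hour}}} = \frac{1}{10} = 0.1 \ \text{hours} \\
&= \frac{1}{\lambda_{\text{min}}} = \frac{60}{10} = 6 \ \text{minutes}
\end{align*}
```

Both forms describe the same average gap: 0.1 hours is six minutes.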
So we have about six minutes, on average, between arrivals. Now what we want to do is imagine that we're actually going out there and timing the intervals between arrivals. We're at the restaurant, or wherever the line is, with the stopwatch going: one person arrives, and then we time how long it takes for the next person to arrive.

We're going to generate this with a random number generator, which is a little more complicated here, because you can't just use the normal random number generator, and we don't have an Excel function that gives us the random numbers the way we did with BINOM.DIST and the Poisson distribution. So we'll make up our own formula, which is this: negative LN, the natural logarithm, of one minus RAND, divided by the mean arrival rate per minute. I won't go into it in detail, but what we're trying to do is generate numbers that still have a random element to them but follow the conditions that would be present in a natural setting with an exponential distribution. The one minus RAND part uses the normal random number generator that we've seen in the past for coin flips and whatnot, and then we're dividing by the mean arrival rate per minute, the 0.1666 repeating.

So don't worry too much about that formula. The concept is that we're imagining going out there and testing this with our stopwatch, and this random number generator gives us an approximation of what would happen in real life if our line followed a Poisson distribution, in which case you would expect the interarrival times to follow an exponential distribution. For example, for the first customer, the interarrival time was 14.
The next interarrival time was 1.16 minutes. This is in minutes, by the way, instead of seconds as in our prior example. The third took 2.83 minutes, and then the next arrivals took 5.81 minutes, 4.15 minutes, 1.1 minutes, 0.15 minutes, and 0.07 minutes. Notice the trend: you had that fairly large one at the start, and then a lot that are pretty low. The five is kind of big, but not too large. Then you've got 6, 3, 4, 7, and it jumps up to 10. So now you had intervals between customers of 10 and 12, which are fairly high, and then it goes all the way back down to 4. Then 12, so it popped back up, then 4, 3, 9, 4, and it jumped up to 14. So you've got a lot in the lower range, and then it jumps all the way up to 21, and we haven't seen anything that high for some time. Then you've got a lot of lower values again. That's what we would generally expect in these line-waiting situations, and it's what gives the exponential distribution its characteristic downward-sloping curve.

If I take the mean of this data, just the average, we get 6.49, which is fairly close to 6. That's what we'd expect, because 6 minutes is the mean interarrival time implied by the 0.1667 rate that we put into our calculation. So the mean is about what we would expect from our randomly generated numbers.

Then what I would do in Excel is copy this whole thing, because these cells have the random number generation in them, which means they're always going to regenerate. So I just copy the same data over. The numbers are different, because you can see it basically juggled the numbers around when it recalculated.
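As a rough sketch of the sampling formula described in the walkthrough, negative natural log of one minus a uniform random number, divided by the arrival rate per minute, here is a Python version. The function name, the variable names, and the seed are assumptions made for illustration; the presentation itself works in Excel cells.

```python
import math
import random

ARRIVALS_PER_HOUR = 10                        # mean arrival rate from the example
RATE_PER_MINUTE = ARRIVALS_PER_HOUR / 60      # about 0.1667 arrivals per minute


def sample_interarrival_minutes(rate, rng):
    """Draw one interarrival time, mirroring -LN(1 - RAND()) / rate."""
    u = rng.random()                          # uniform on [0, 1), like RAND()
    return -math.log(1.0 - u) / rate          # 1 - u is in (0, 1], so the log is defined


rng = random.Random(1580)                     # arbitrary seed, assumed for repeatability
samples = [sample_interarrival_minutes(RATE_PER_MINUTE, rng) for _ in range(300)]
sample_mean = sum(samples) / len(samples)

print(f"first five gaps (minutes): {[round(s, 2) for s in samples[:5]]}")
print(f"sample mean: {sample_mean:.2f} minutes (theoretical mean is {1 / RATE_PER_MINUTE:.0f})")
```

Seeding the generator plays a similar role to hard-coding the values in the spreadsheet: the numbers stop changing from run to run. With 300 draws, the sample mean should land near six minutes without matching it exactly.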
But the idea is that this data was still generated with this function; we've just hard-coded the numbers now so they don't change with the random number generation.

Okay, so now we can count the frequency of these items with our standard frequency setup. Here are our bins, which now represent the minutes between arrivals, from zero minutes on up to 40 minutes. What's the count in each one? Note that you can't really use a COUNTIF function to do this, because the numbers over here are not whole numbers. So we use the FREQUENCY function with bins. We take the data array over here, then the bins array over here, and that array function gives us the frequency, putting the items into our buckets.

So for the first bin, one minute, how many times did we have an interval of up to one minute in our data set? 51 times. For two minutes, meaning above one and up to and including two minutes, we had 30. Three minutes, 31; four minutes, 19; five minutes, 20; six minutes, 18; seven minutes, 14; eight minutes, 14; nine minutes, 18; and on to ten minutes and beyond. You can see the counts start to go down as we get up to the higher numbers, with a lot of them on the lower side. So when things follow this exponential distribution, the times between arrivals tend to bunch up on the shorter side, with a few intervals that take a lot longer. That's how you can imagine what's happening with our curve.

Then I can also represent this as a percent of the total.
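The binning step can be sketched in Python as follows. This is an illustrative stand-in for the FREQUENCY array function, not the workbook itself: the helper name `frequency` is hypothetical, and the input is only the first eight interarrival times read out in the walkthrough, not the full 300-row data set.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin, with each bin including its upper bound.

    Returns len(bins) + 1 counts; the last one holds values above the top bin,
    which is how Excel's FREQUENCY function is documented to behave.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # Index of the first bin whose upper bound is >= value
        counts[bisect_left(bins, value)] += 1
    return counts


# First eight interarrival times (minutes) read out in the walkthrough
gaps = [14.0, 1.16, 2.83, 5.81, 4.15, 1.1, 0.15, 0.07]
bins = list(range(0, 41))                 # upper bounds 0, 1, 2, ..., 40 minutes

counts = frequency(gaps, bins)
shares = [c / len(gaps) for c in counts]  # percent-of-total view

for upper, count, share in zip(bins, counts, shares):
    if count:
        print(f"up to {upper:>2} min: {count} ({share:.0%})")
```

Because the values are not whole numbers, each one is placed by comparing it with the bin upper bounds rather than by matching exact values, which is the reason the walkthrough gives for avoiding COUNTIF.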
These frequency bins, if I add them up, should add up to the number of observations we made over here, the number of customers whose interarrival time we recorded, which was 300. So that looks correct. I can then divide each bin by the total of 300. So 51 over 300 gives us 0.17, or 17%. We can represent this as a percent as well, which is how it's going to be represented when we do the actual exponential distribution.

Okay, so now let's do it the other way. We have our x values, the minutes between arrivals, and this time we use the actual EXPON.DIST function. So we're doing the same thing, but not with our randomly generated numbers, which represent us going out there with a stopwatch. Now we're drawing the smooth curve using EXPON.DIST, where I take the x here, then the lambda, and then the cumulative argument. It's not going to be cumulative, so we put a zero.

Now we can plot this out as our actual curve. Notice it's giving us percentages. When I use this curve, I don't get an actual frequency. So if I look at the one-minute value and ask what the likelihood is, and then imagine doing it 300 times, you would expect 300 times 0.1411, about 42, to be the frequency. That's why we needed the percent column over there, so we can compare. So this is what we get from the smooth curve, the curve generated from our function, and you can compare the two columns: one against one, two against two, three against three, four against four, five against five.
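For reference, the non-cumulative form of EXPON.DIST evaluates the exponential density, lambda times e to the minus lambda x. A minimal Python sketch of that formula, using the example's rate of 10/60 per minute, is below; the function and constant names are assumptions for illustration.

```python
import math


def expon_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


LAMBDA_PER_MINUTE = 10 / 60      # mean arrival rate per minute from the example
N_OBSERVATIONS = 300             # number of simulated interarrival times

for minutes in range(1, 6):
    density = expon_pdf(minutes, LAMBDA_PER_MINUTE)
    expected = N_OBSERVATIONS * density
    print(f"x = {minutes}: density {density:.4f}, expected count {expected:.1f}")
```

At x = 1 the density comes out at roughly 0.1411, the figure quoted in the walkthrough. Multiplying it by 300 gives an approximate expected count for a one-minute-wide bin, which is why the curve is compared against the percent column, not the raw counts.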
And you can see they're somewhat similar. If I plot this out, these are the interarrival times from our actual data set in a histogram, which looks like this. You can see it's approximating the shape that we would expect. It's not perfect, of course, because we only generated 300 numbers. Here it is with another type of graph. And then here it is in comparison to the actual curve, which is the blue curve in this case. The blue curve is nice and smooth compared to the randomly generated one, and we can see that the data approximates what we would expect from the exponential distribution.

So the general idea with these line-waiting situations is: why does that happen? You can see why here. The intervals are often short, but then you have some intervals that are long, and that's what gives it that characteristic shape, which often shows up in line-waiting situations. So if you saw the Poisson distribution in a line-waiting situation, then oftentimes you would expect the time between arrivals to follow this characteristic exponential shape as well.
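The remark that the histogram is not perfect because only 300 numbers were generated can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not part of the presentation. It compares simulated bin shares against exact bin probabilities taken from the cumulative form, 1 - e^(-lambda x). That is a swap from the walkthrough, which compares against the density at each bin's upper bound. The function names and the seed are hypothetical.

```python
import math
import random


def bin_probability(lower, upper, lam):
    """P(lower < X <= upper) for an exponential variable, from the CDF 1 - exp(-lam * x)."""
    return math.exp(-lam * lower) - math.exp(-lam * upper)


def max_gap_from_curve(n, lam, rng, top=40):
    """Largest absolute gap between simulated bin shares and exact bin probabilities."""
    counts = [0] * top                               # one-minute bins (0,1], (1,2], ..., (top-1, top]
    for _ in range(n):
        gap = -math.log(1.0 - rng.random()) / lam    # same sampling formula as in the walkthrough
        index = max(math.ceil(gap) - 1, 0)           # bin whose upper bound is ceil(gap)
        if index < top:
            counts[index] += 1                       # gaps beyond `top` minutes are left out
    return max(abs(counts[i] / n - bin_probability(i, i + 1, lam)) for i in range(top))


lam = 10 / 60
rng = random.Random(1580)                            # arbitrary seed, assumed for repeatability
gap_small = max_gap_from_curve(300, lam, rng)
gap_large = max_gap_from_curve(100_000, lam, rng)
print(f"300 samples:     largest gap {gap_small:.4f}")
print(f"100,000 samples: largest gap {gap_large:.4f}")
```

With many more simulated arrivals, the largest gap between the histogram and the curve should shrink, which matches the intuition that a bigger sample traces the smooth exponential shape more closely.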