Welcome. This lecture is about creating random variables for use in simulation studies. The method we use is one you already know of as Monte Carlo sampling. As you know, in simulation we have to have a way to generate values from a particular random variable's distribution, and the way we do that is with Monte Carlo sampling. This numerical technique goes back pretty far. The name Monte Carlo was the code name applied to a top-secret piece of the Manhattan Project that involved these techniques. In fact, the increasing popularity of Monte Carlo methods is what spurred the development of pseudo-random numbers and pseudo-random number generators, which we talked about in the last lecture. Monte Carlo sampling became so widely used that reading real random numbers from a magnetic tape, or entering them in some other way into the program, just wasn't going to cut it anymore; it was too inefficient. So it was actually the popularity of Monte Carlo sampling that drove all the work that's been going on in pseudo-random number generators. The word simulation is sometimes used interchangeably with Monte Carlo sampling, but in this course, and in this lecture in particular, I want to maintain the distinction between the two. Simulation is experimenting with a model or version of reality. Monte Carlo methods are a particular branch of mathematics: a numerical, algorithmic technique used to solve mathematical or statistical problems. The term Monte Carlo simulation really refers to distribution sampling: it uses repeated sampling from the distribution of interest to determine properties of some phenomenon, typically by summing or averaging the samples to get estimates of the distribution's parameters.
And as we know, simulation is a broader technique. Discrete event simulation, which is what we're studying in this course, involves a time element, and that definitely distinguishes it from Monte Carlo sampling, which is a cross section at a moment in time: a distribution that's being sampled from repeatedly, but the distribution itself doesn't change over time. Sometimes the term Monte Carlo is used in the sense of employing randomness, or probabilistic activity, to solve problems that are deterministic in principle. We might know the distribution, and therefore we can sample from it, but solving for the mean, the parameter of the distribution, with an expected value calculation may not be possible, or may not be easy. If the mean is intractable, it's very easy to get an estimate of it by many, many repetitions of sampling from the distribution of interest, then summing the values and dividing by the sample size, just as you would in any field test. We could call it a probabilistic, numerical approach to solving something that in principle should be solvable analytically. In fact, one of the earliest uses of Monte Carlo methods was to empirically approximate the value of pi. And of course, since the advent of large computers, Monte Carlo methods have been tied to computer programs, unlike the historical Monte Carlo methods of, say, the 1700s or 1800s. Here are some examples meant to show the distinction between simulation, Monte Carlo methods, and Monte Carlo simulation. A simulation might be drawing one pseudo-random uniform variate on (0, 1) to simulate the tossing of a single coin.
The value you get is obviously between zero and one. If it's less than or equal to 0.5, let's say the coin is heads; if it's greater than 0.5, let's say it's tails. This is definitely a simulation, and we've done things like this in class, but it's not a Monte Carlo simulation. A Monte Carlo method might be taking a whole box of coins, pouring them out, and figuring out the ratio of heads to tails. That could be a Monte Carlo method of determining what happens in repeated coin tosses, but it's not a simulation; it's repeated sampling. A Monte Carlo simulation might be drawing a large number of pseudo-random variates from the uniform (0, 1) distribution, assigning values less than or equal to 0.5 as heads and values greater than 0.5 as tails, like repeatedly tossing the coin. Personally, as you know, I wouldn't call that a Monte Carlo simulation but rather Monte Carlo sampling, which we use in simulation all the time, but I got this from Wikipedia and I wanted to be true to the source. So what are Monte Carlo methods? They are techniques we can use to generate data. What do we need in order to do this? We need a random number generator, which we know we can get, and we need the cumulative probability distribution of the distribution we're interested in sampling from; the method works with the cumulative distribution, as we'll see in a minute. We always assume we have a steady, reliable source of uniform (0, 1) random numbers: a stream of random numbers between zero and one. There are various ways to get them. Nowadays we're most likely to get them from a program that generates pseudo-random numbers, but we could be using a table of random digits, or any probabilistic device like a roulette wheel, maybe even a deck of cards. Now let's illustrate Monte Carlo methods.
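The lecture shows no code, but the three coin examples above can be sketched in a few lines of Python. The function names and the sample size here are my own, not from the lecture; the only assumption carried over is the rule that a uniform draw at or below 0.5 counts as heads.

```python
import random

def toss_coin(rng=random.random):
    """Simulation: one coin toss from a single uniform(0, 1) draw.
    Values <= 0.5 count as heads, values > 0.5 as tails."""
    u = rng()
    return "heads" if u <= 0.5 else "tails"

def monte_carlo_heads_fraction(n_tosses, seed=42):
    """Monte Carlo simulation: repeat the toss many times and
    estimate the probability of heads from the sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() <= 0.5)
    return heads / n_tosses
```

With a large number of tosses, `monte_carlo_heads_fraction` settles near 0.5, which is exactly the "repeated sampling to determine properties of a phenomenon" idea described above.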
We're going to use the inverse transform technique. It's easy to understand and easy to demonstrate. It may not be the best technique around, especially in terms of efficiency, but it definitely works and it's easy to explain. First, you need the cumulative probability distribution function, like the graph you see on the slide in front of you. You use your random number generator, applying a uniform (0, 1) distribution, to get a random number between zero and one; decide, for whatever reasons, with whatever justification, how many decimal places you want. You can see the blue lines on that graph. A random number between zero and one is a value on the y-axis. Wherever it lands, we draw a horizontal line over to the curve and then drop a vertical line down, so we can pick out the value of x that corresponds to that uniform (0, 1) random number. A very simple, easy-to-understand process. That x is the sample value; that's really the whole thing. Then you repeat, over and over again, until you have the sample size you're interested in. So here's how we would use the inverse transform method to sample values from the exponential distribution with a particular parameter. The cumulative exponential distribution is on the graph in front of you. Again, the vertical axis is F(x), a number between zero and one, since it's a cumulative probability distribution. The x-axis shows values from this exponential distribution; maybe it's times, maybe it's interarrival times in your simulation. The formula for the cumulative distribution function of the exponential distribution is listed there: F(x) = 1 − e^(−λx). So how do we figure out x?
Obviously, we don't want to do this with graph paper. First, we wouldn't be as accurate as we could be; and second, well, we just don't want to use graph paper. We want to manipulate the formula so we can solve for x. That's the objective: solve for x. We enter this algorithm with a uniform (0, 1) random number, and the output of the algorithm is the value of x; then we go and do it again. The first step is the cumulative distribution for the exponential: F(x) = 1 − e^(−λx). The second step is to move the one to the other side. If F(x) is a number between zero and one, then 1 − F(x) is also a number between zero and one, so that's where we put the random number. Now the random number R is on the left side of the equation, and the right-hand side is pared down to just e^(−λx). Steps four and five are intermediate algebraic steps, taking the natural log of both sides, to turn everything around and solve for x, which we do in step six: x = −ln(1 − R)/λ. Now we have something we can use repeatedly. We come in with a stream of random numbers between zero and one, and we want to generate a stream of exponential random values; each random number that comes in goes into that formula, and out comes an x. This is to remind you of the relationship between the Poisson and the exponential distributions. The Poisson is a discrete distribution; the exponential is a continuous distribution. We run into these a lot in discrete event simulations. Suppose we know that on average four customers arrive at our system, whatever it is, per hour. That almost exactly fits the definition of a Poisson process: discrete events in a continuous interval. So this four must be lambda, the parameter of the Poisson process, which is a constant, not a function; it's a stationary Poisson process. Look at the units: if the units on lambda are customers per hour, that works for the Poisson process.
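The derivation above can be sketched directly in Python. This is my own minimal implementation of the formula from step six, not code from the lecture; the lambda of 4 arrivals per hour is taken from the example that follows.

```python
import math
import random

def exponential_variate(lam, rng=random.random):
    """Inverse transform sampling for the exponential distribution.
    From the derivation:
        R = F(x) = 1 - exp(-lam * x)
        1 - R = exp(-lam * x)
        x = -ln(1 - R) / lam
    """
    r = rng()                      # uniform(0, 1) random number
    return -math.log(1.0 - r) / lam

# Example: interarrival times with lambda = 4 arrivals per hour.
random.seed(1)
sample = [exponential_variate(4.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)   # should land near 1/lambda = 0.25 hours
```

Each incoming uniform random number produces one exponential value, exactly the stream-in, stream-out pattern described above, and the sample mean comes out close to the theoretical mean of 1/λ.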
If lambda is four, then one over lambda is one quarter, and the units on that are hours per arrival. So the average interarrival time is a quarter of an hour, 15 minutes per arrival. That has to be the mean of the exponential distribution. That's why, especially in simulation, where we use the Poisson and the exponential so much, we'll often use different designations, different notation, for them: we'll call lambda the mean of the Poisson distribution and mu the mean of the exponential distribution. But the notation is not uniform, so take a look at the units and listen to the words; you have to know what's being talked about. In this next example, we're using the inverse transform method to sample from a distribution that's discrete: a customer arrival distribution. And it's an empirical distribution; in other words, we're just using the data as it is and calling that the distribution. We've collected some data and found out how it's distributed. We're not giving it a name, and we're not trying to fit it to any theoretical distribution; we're just using it as is. So in this case, customers arrive in 10-minute intervals. We have a probability of .4, a 40% chance, of getting zero customers in any 10-minute interval; a 25% chance of one customer; a 20% chance of two customers; and a 15% chance of three customers. That's the distribution: a discrete probability distribution where we enumerate the possible outcomes and the probability of each. And in this case, because we need it, we also compute the cumulative probability by adding each probability to the previous total. The cumulative probability of zero is .4. The cumulative probability of one includes the probability of one and the probability of zero, so it's .4 plus .25, or .65, and so on. The cumulative probability of three includes all of the other probabilities.
And that's the totality: one. With this discrete empirical distribution, the cumulative distribution looks like steps, and that's typically what you'll see. When you plot it, the x-axis contains the x-value, the number of customers arriving, and the y-axis is the cumulative probability, a number between zero and one. And we can do the same thing with the inverse transform method: generate a uniform (0, 1) random number, find it on the graph, draw a horizontal line until it bumps into the curve, then drop a vertical line to get the corresponding x-value. So for instance, suppose the random numbers we generated are .09, .54, and so on. For .09, anything between zero and .4 maps to zero, so that's zero arrivals in the first time period. In the second time period, .54 gives you one arrival. .42, slightly above .4, also gives you one. In the fourth time period, we get .80, and that matches an x-value of two: two customers arriving in that time period. And then in the fifth time period, we go back down to zero. So what looked rather complicated for the exponential is used quite simply here; since this distribution is discrete and fairly simple, it shouldn't present any intellectual challenges for us at all, at least I hope not. But we're going to look at something soon that will give us a challenge. All right, one of our favorite distributions: the normal distribution. We know so much about it; it should be easy as pie to sample from the normal distribution and generate normal variates. Let's see. First of all, before even going further, we're going to set aside the general normal distribution and stick to the standard normal, because it's been studied thoroughly: we don't need to derive any formulas, and we don't need to re-create everything for every single pair of mu and sigma. The standard normal is usually called Z.
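The table-lookup version of the inverse transform can be sketched as follows. The probabilities and the first four random numbers are the ones from the lecture's example; the fifth random number is my own assumption, since the lecture only says it maps back to zero.

```python
# Empirical arrival distribution from the lecture:
# customers per 10-minute interval and their probabilities.
outcomes = [0, 1, 2, 3]
probs = [0.40, 0.25, 0.20, 0.15]

# Build the cumulative distribution: 0.40, 0.65, 0.85, 1.00.
cum = []
total = 0.0
for p in probs:
    total += p
    cum.append(total)

def sample_arrivals(r):
    """Inverse transform for a discrete distribution: return the first
    outcome whose cumulative probability covers the random number r."""
    for x, c in zip(outcomes, cum):
        if r <= c:
            return x
    return outcomes[-1]

# The random numbers walked through in the lecture
# (0.03 is assumed for the fifth period).
for r in [0.09, 0.54, 0.42, 0.80, 0.03]:
    print(r, "->", sample_arrivals(r), "customers")
```

The horizontal-line-to-the-step-curve picture becomes a simple linear scan over the cumulative probabilities: 0.09 falls in the first step and gives zero, 0.54 and 0.42 fall in the second step and give one, and 0.80 falls in the third step and gives two.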
Z is distributed as normal with a mean of zero and a standard deviation of one. Great, but of course that's not really what we want to generate values from. Still, if we can generate the Zs, that's great, because we can take any Z and transform it back into an X, again solving with algebra: X = μ + σZ. So, as a side note, when we talk about sampling from a normal distribution, we're really talking about the standard normal, because it's a very easy transformation to go back and forth between what we'll call X and Z. The probability density function for the standard normal distribution is given in the middle of the slide. It's rather complicated, but not that terrible: f(x) = (1/√(2π)) e^(−x²/2). The formula for the cumulative probability distribution of the standard normal is more complicated, and not something we would really want to work with algebraically to solve for x; in fact, it's pretty frightening, with no closed form to invert. The graphical solution would, I suppose, be an answer, because the graph is a regular S-curve, typical of symmetric mean-equals-median-equals-mode distributions like the normal. But the graphical approach is very inefficient and time consuming, and it's hard to write a computer program to do it. So do we really want to use the inverse transform method here? Probably not. Thankfully, there are a lot of other alternatives, and I'm going to show you one on the next couple of slides. This method of sampling is based on the central limit theorem. We know from the central limit theorem that if you take the sum, or actually the average, of variables drawn from a distribution, even if that distribution is not normal, then as long as it's relatively symmetric, it won't take long for the sums or averages to take on the shape of a normal distribution.
The example we see on this slide is tossing a die, which is basically a uniform (rectangular) distribution. When you toss a die you get one, two, three, four, five, or six, all equally likely, so the probability of any of those outcomes is one sixth. But if you generate, say, two tosses of a die and your outcome is the sum of the faces, or three tosses, or four tosses, eventually you start to get something that looks more like a triangular shape; and if you keep going it smooths out, looking less like a triangle and more like a typical bell-shaped curve. Now, it's not enough just to know that you can create this approximately normal distribution by sampling from, say, a uniform distribution, though it's nice to know you can. You also have to know what the mean and the standard deviation are, because what we really want is the standard normal distribution. So we at least have to be able to figure out the parameters of the normal distribution we're counting on having created by summing values generated from a uniform (0, 1) distribution. As far as the mean goes: for any uniform distribution from A to B, the mean is (A + B)/2. So for the specific case of the uniform (0, 1), the expected value is (0 + 1)/2, or one half, .5. We know that the expected value of a sum of random variables is the sum of the expected values: E(X + Y) = E(X) + E(Y). And by extension, if we have more than two random variables, we can add those as well. So the method goes like this. Let's say we have 12. What would happen with 12? For one random number from a uniform (0, 1) distribution, the mean is one half. So what happens if we have 12?
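The die example can be checked empirically with a short script. This is my own sketch of the slide's idea, not the lecture's code; the seed and sample sizes are arbitrary choices.

```python
import random

def die_sum(n_dice, rng):
    """Sum of n fair die tosses; each face 1..6 is equally likely."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

rng = random.Random(7)
# As n_dice grows, a histogram of these sums moves from flat (n=1)
# to triangular (n=2) and on toward a bell shape, while the sample
# mean tracks 3.5 * n, the sum of the individual expected values.
for n in (1, 2, 4, 8):
    sums = [die_sum(n, rng) for _ in range(50_000)]
    mean = sum(sums) / len(sums)
    print(n, "dice: sample mean", round(mean, 2))
```

Plotting the `sums` lists as histograms reproduces the progression shown on the slide, and the sample means illustrate the E(X + Y) = E(X) + E(Y) rule the slide relies on.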
Well, it's one half plus one half plus one half, 12 times; in other words, the mean will be six. If we sample 12 times from a uniform (0, 1) distribution and sum, we'll have something approaching a normal distribution, an approximation of a normal distribution, with a mean of six. Now what about the variance? Using the same tools we used for the mean, the variance of a uniform (0, 1) distribution reduces to 1/12. And when you want the variance of a sum, you can sum the variances, just as we did with the means. That's a nice feature. So for two it would be two times the variance; for 12 it would be 12 times the variance. That's one good reason for using 12: 12 times 1/12 is one. If the variance is one, the standard deviation is one, and we're halfway to a standard normal distribution. Finally, here's where we draw it all together: what's the actual process for using this sampling method to generate values from a normal distribution? First, we start with our typical uniform (0, 1) generator. We get 12 of those, add them up, and then subtract six. Why? Because we need the mean to be zero; the variance is already one, and the standard deviation is already one. So once we're finished with step two, we have a value generated from the Z distribution, the standard normal with a mean of zero and a standard deviation of one. Step three is then simply using the transform X = μ + σZ to go from a Z to an X, using the parameters of whatever normal distribution we're interested in, the one we started out with. And that's it. It's done. It's very, very simple. It's not great, but at least it's simple, easy to do, easy to understand. Okay, so what's so not great about it?
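The three steps above can be sketched in a few lines. This is a minimal implementation of the sum-of-12-uniforms method as described; the function names and the check at the end are my own.

```python
import random

def standard_normal_variate(rng=random.random):
    """Steps 1 and 2: sum 12 uniform(0, 1) numbers (mean 12 * 0.5 = 6,
    variance 12 * 1/12 = 1) and subtract 6 to shift the mean to zero."""
    return sum(rng() for _ in range(12)) - 6.0

def normal_variate(mu, sigma, rng=random.random):
    """Step 3: transform Z back to the normal distribution of
    interest via X = mu + sigma * Z."""
    return mu + sigma * standard_normal_variate(rng)

# Empirical check: the Zs should have mean near 0 and variance near 1.
random.seed(3)
zs = [standard_normal_variate() for _ in range(100_000)]
z_mean = sum(zs) / len(zs)
z_var = sum(z * z for z in zs) / len(zs)
```

Note that each call consumes 12 uniform random numbers, which is exactly the efficiency complaint raised next.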
Well, by limiting ourselves to summing 12 random numbers to get a value from a normal distribution, even if it's a decent approximation, we're limiting the values it can take on: the sum can never be less than zero or more than 12, so Z is confined to the range from −6 to +6. Sometimes we do want to sample very low-probability values out in the extremes, and this method won't work very well for that. In addition, really, 12 random numbers for every normal random variate? We're going to run through our random number generator's period very, very quickly, so we also need a pseudo-random number generator that doesn't start cycling too soon. What do we use all of this for in simulation? We use it to generate values that serve as input to the simulation. If we have customers coming into a bank, we need to generate interarrival times. If we have a machine shop, we need to generate the time until a machine goes out of service, for whatever reason. These are the inputs to the simulation. This all assumes that we know the distribution we want to sample from, and that's a whole other lecture; it's what's coming up next.