Welcome to a lecture on the random numbers we use in simulation models. The idea here is to put randomness into our simulations. Why do we want randomness in our simulations? Because there's complexity in the real world. Things aren't constants in the real world, so they shouldn't be constants in our simulation or in the implementation of our simulation model. If we know the distribution that our input data follows, great, we can sample from this distribution. In a later lecture, we'll be looking at how we know that the distribution is what it is, or how we can figure it out. Right now, all we know is that we need randomness and we want to figure out a way to get that randomness into our model. Whether you're sampling from a normal distribution, a uniform distribution, or an exponential distribution, all of this sampling of random variates has to be based first on the generation of a random number, and we're going to see how that's done.
In a previous lecture, the one on simulating by hand, we used devices that were truly random to put randomness into the simulation: tossing a die, putting chips labeled 1 to 10 in a hat and drawing randomly, that sort of thing. That created artificial randomness that mimicked the randomness in the real world. We could have used other ways of modeling true randomness, for instance, a roulette wheel. Just as a little sidebar, if you flip a coin and consider that random, or if you toss a die or a pair of dice and consider that random, why might it not be truly random? Whenever you have human intervention, you're introducing the possibility of bias. We're not even talking about bias on purpose. Human arms get tired. A robot arm does not. Generally speaking, when you have a truly random number, there is some real physical process that generates the randomness, let's say radioactive decay. Basically, we want a process that results in true randomness and that can be measured.
This sounds like something we don't want to do on our own, and indeed it is. It could get very messy. It could get expensive. We really don't want to do it from scratch. We want to use other people's truly random numbers. If you look at a book that you may have sitting around, one you may have used in the past in probability or statistics, it's pretty much a sure thing that one of the tables at the back of the book will be a table of random numbers, and you've probably used it before in other classes. What are these random numbers? How did they get here? Where did they come from? The truly random numbers in this sort of table were compiled by the RAND Corporation in the 1940s. They were generated by an electronic simulation of a roulette wheel, and that's where the randomness came from. It was attached to a computer, which took the measurements and generated the values. But they didn't stop there. They tested and filtered the digits before producing the final version that you see when you look at the table. RAND made these random numbers available to people on punched cards initially and then on magnetic tape. The table of random numbers at the back of your book comes from a 1955 book, published by RAND, containing those random numbers.
What you see there on the screen is a clip of a piece of the random number table. You can see that the random digits are grouped in units of five digits. It looks two-dimensional, but it's really not. It's one long stream of random digits. For instance, if you start at the top left, we've got seven, three, seven, three, five, four, five, nine, six, three, and so on across the row, and then continuing in row-major order. These are just a bunch of digits that were generated by a random process, tested, and found to be random. They're arranged in a two-dimensional format just to make it easy to publish in a book, but it's really just a stream of truly random digits.
So we could put true randomness into a simulation model, and indeed into other programs, using these digits, because they were generated by a truly random process. We don't even have to enter them from a keyboard. We could probably get them in a file. I'm sure they're available in digital form somewhere. But we still don't do that. We would still rather have a formula that computes something that's almost random than use these numbers that we know are random for sure. So that's something we may want to think about. We're going to talk more about it on the next slide as well, but for now, just think about how we would access these random digits in a program. If you know anything about coding, whether it's simulation coding or any other kind of program, if you have a file, you read from the file. So let's say we read a digit at a time from the file, or we read, let's say, a set of 10 digits in sequence from the file. We use it, we discard it, we go and read again, use it again, discard it, go and read again, and so on. I don't know if you know this, but the read instruction, input from an external source, is one of the most expensive instructions in any program. It takes the most computer time, and therefore it costs the most, because we're paying for computer time in one way or another. So we don't want to use truly random numbers, even though we could and we have them available. If you're paying for computer time, it's a very inefficient use of your time and your money.
Let's look at something that might be almost as good as, and sometimes even better than, truly random numbers. This is what we actually use in simulation programs, in Excel, and in just about any calculator or program that uses random numbers or random digits. They're not truly random. They're pseudo random numbers. Even if the function that generates them in the software you're using is called RAND, it's not really random. These are called pseudo random numbers.
Anything that's generated by a formula can't possibly be random, because you can predict what the value will be. If you look at the diagram, you can see how a pseudo random number generator works. It outputs a random number, and then it uses that random number as input into the same formula in order to generate the next number, and the next, and the next. There's a stream of random numbers, each one generating the next one in line. How does it all start? It starts with an external random number. This external random number that starts the whole thing is called the seed. It's used once as input to the random number generator, which basically serves to boot it up. In effect, we can say that this particular seed created the entire stream of random numbers that comes after it. It's nice to know this because, especially in simulation, we very often want to duplicate the random number stream. Suppose we're testing two systems, for example, modeling customers in a bank. If we use the same random numbers, we basically have the same exact customers going through two different alternate universes, and we can take a look at the outputs, at the metrics, and see what happens. The control that you have with pseudo random numbers in a simulation model, as opposed to in the real world, is definitely something to be desired.
Truly random numbers are inefficient and more expensive to use because of the time and cost, as mentioned earlier. Many simulation programs are fairly large and take a very long time to run. Some of them even have to be run on a supercomputer. Of course, it's still doable, but we can't always assume that we're going to have unlimited time and space for our programs. And by space, of course, I mean storage space, which also costs money.
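To make the seed idea concrete, here is a minimal sketch. It assumes Python and its standard `random` module, neither of which the lecture names, and the seed values and the `stream` helper are made up for illustration. It only shows that the same seed duplicates the same stream and a different seed gives a different one.

```python
import random

def stream(seed, count):
    """Return the first `count` pseudo random numbers produced from `seed`."""
    rng = random.Random(seed)  # the seed is used once, to start the generator
    return [rng.random() for _ in range(count)]

first_run = stream(seed=12345, count=5)
second_run = stream(seed=12345, count=5)  # same seed: the stream is duplicated
other_run = stream(seed=54321, count=5)   # different seed: a different stream

print(first_run == second_run)  # True
print(first_run == other_run)
```

Saving the seed alongside the results of a run is usually enough to regenerate the whole stream later, as long as the same generator is used.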
Pseudo random numbers are faster, which not only means that I don't have to pay for that lengthy computer time, but also that I don't have to wait around such a long time for my simulation run to complete and produce results. In addition, as long as I've saved my seed, I can duplicate, not merely replicate, the pseudo random numbers that were generated, and I can repeat what I've done before. In simulation, that's a tremendous advantage. Think about it. If I have the same model and the same random numbers, I'll get exactly the same thing I got before. Why might I want to do that? Here's an example. Suppose I'm planning to renovate my hospital emergency room. Wouldn't it be nice if I could take the customers that I had yesterday, do the renovation, and then run the same customers with the same illnesses through the emergency room tomorrow, after the renovation, and look at the differences? I certainly can't do that in the real world, and as we know, it's very inefficient to do that with truly random numbers. But I can do that easily with pseudo random numbers, using the same seed for both systems so that we're looking at the same customers and the same illnesses. In fact, everything that's modeling complexity, everything that's using randomness in the simulation model, will be exactly the same. I can do that through this feature of reproducibility, as long as I have the same pseudo random number generator and the same seed. Of course, with truly random numbers, I can save them and use them again, but as we mentioned before, that's inefficient and it has its own issues.
A disadvantage of pseudo random numbers is that every pseudo random number generator has what's called periodicity. There's a finite period during which every random number generates another new random number, but after a while, it cycles back to the beginning and you have the very same stream of random numbers all over again from that point on.
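The emergency room comparison can be sketched in a few lines. This is only an illustration under assumptions of my own: Python's standard `random` module, a single-server first-come-first-served queue, exponential arrival and treatment times with made-up means, and a "renovation" modeled as a hypothetical `speedup` factor on treatment time. None of those specifics come from the lecture. The point is that both systems see exactly the same patients because they share a seed.

```python
import random

def generate_patients(seed, count):
    """Draw (interarrival_time, treatment_time) pairs from a seeded generator."""
    rng = random.Random(seed)
    patients = []
    for _ in range(count):
        interarrival = rng.expovariate(1 / 10.0)  # assumed mean: 10 minutes
        treatment = rng.expovariate(1 / 8.0)      # assumed mean: 8 minutes
        patients.append((interarrival, treatment))
    return patients

def average_wait(patients, speedup):
    """Average wait in a single-server FCFS queue; `speedup` divides treatment time."""
    wait = 0.0
    total_wait = 0.0
    previous_treatment = 0.0
    for interarrival, treatment in patients:
        # Lindley recursion: wait of this patient from the previous patient's wait,
        # the previous patient's treatment time, and the gap between arrivals.
        wait = max(0.0, wait + previous_treatment - interarrival)
        total_wait += wait
        previous_treatment = treatment / speedup
    return total_wait / len(patients)

SEED = 2024  # arbitrary; what matters is that both runs use the same one
before = average_wait(generate_patients(SEED, 1000), speedup=1.0)
after = average_wait(generate_patients(SEED, 1000), speedup=1.25)
print(before, after)
```

Because the patient stream is identical in both runs, any difference between `before` and `after` comes from the change to the system and not from a luckier draw of patients.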
Certainly that's a problem, but it's especially a problem if you're not expecting it, if you don't know to look for it and to cut off your random number generator when it gets to that point. That could be a huge problem, of course.
So you want to see what a pseudo random number generator looks like? Here's one. It's simple to use and simple to understand. It's also a pretty old one. It was used as early as 1946 by John von Neumann for cryptography. It's old and primitive, and presumably not what we're using in our calculators, in our programs, or in Excel, because there are better, more efficient ones that are available and easy to use too. The nice thing about this one is that it's simple to do by hand and simple to understand, and it still represents the kind of thing that's done by just about every pseudo random number generator. This is called the middle square method. Let's see how it works. We start, as always, with a random number of a certain number of digits. In this case, what we're looking for is an even number of digits, call it N. You can even take the initial sequence of random digits from the random numbers table, if you like. Take this random number and do some arithmetic. This is what all pseudo random number generators do. They take a random number as input, apply some arithmetic operations to it, output a new random number in the sequence, and then start all over again. In this case, what we do is square it and then add zeros on the left to bring it up to two times N digits.
Let's take a look at the example. Suppose we use, as our random number seed, the wonderful one that many people use on their combination locks, 1111. It's maybe just a bit better than 0000. But suppose we use 1111. What do you do using this pseudo random number generator, this method? You square it. When you square it, you get, let's see, one, two, three, four, three, two, one. That's seven digits, and we need two times N digits. N was four.
We need an even number of digits. We want eight. So you add a zero on the left. It's not going to change anything. If this is used to represent a value, it's not going to change it. But at any rate, it doesn't even matter, because what we do is extract the middle four digits, two, three, four, three, and we end up with the same number of digits, N, that we had before, and we can start the process all over again. The more digits you use, obviously, the better off you'll be, especially if this is used for cryptography. And once again, this isn't displayed here so that you can program it yourself, although it might be a nice exercise. It's really just to show you an example of how pseudo random number generators work. It's one that's simple, easy to do by hand, easy to understand, and easy to explain to other people. This is the one we're using as an example. We're not going into any others, but you can certainly find them online. There are a lot of good sources for pseudo random number generator programs.
What should a good pseudo random number generator have? This is pretty much common sense. I think if you look at this list, you may have been able to figure all of these out on your own. But let's take a look. Naturally, the pseudo random numbers should be random, or at least they should pass the tests of randomness. They're not going to be truly random, but we want them to act as if they are random. How do we know they're not truly random? Well, we can generate them. If you can predict what the output will be from a pseudo random number generator, then clearly the output is not random. It's predictable. There is some kind of relatedness. However, as long as the pseudo random number streams pass the tests of randomness, of not being related and not being biased, we're going to be able to use them and we'll be happy with them. We'll talk about testing in a bit.
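For anyone who wants to try the exercise, here is one way the middle square steps from the 1111 example might be written. It's a sketch in Python, with function names of my own choosing (`middle_square` and `steps_until_repeat` are not from the lecture). The second function also gives a rough look at periodicity, by counting how many values come out before one repeats.

```python
def middle_square(value, n_digits=4):
    """One step of the middle square method for an n_digits-digit number."""
    if n_digits % 2 != 0:
        raise ValueError("n_digits must be even")
    squared = str(value * value).zfill(2 * n_digits)  # pad with zeros on the left
    start = n_digits // 2
    return int(squared[start:start + n_digits])       # keep the middle n_digits

def steps_until_repeat(seed, n_digits=4):
    """Count the distinct values in the stream (seed included) before one repeats."""
    seen = set()
    value = seed
    steps = 0
    while value not in seen:
        seen.add(value)
        value = middle_square(value, n_digits)
        steps += 1
    return steps

print(middle_square(1111))       # 2343, matching the worked example
print(steps_until_repeat(1111))  # how long this seed lasts before cycling
```

With four digits there are only 10,000 possible values, so the stream has to repeat within 10,000 steps no matter which seed is chosen. A seed of 0 repeats immediately, since 0 squared is still 0.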
That's why they're called pseudo random, because they're not truly random, but we want them to behave like random numbers. Clearly, we want the stream to be reproducible, especially in simulation, because one of the wonderful things about using random numbers in simulation is that we can do the same thing over and over again by using the same random number stream, starting with the same seeds. It's definitely critical for simulation. We want the algorithm to be fast. We want it to work quickly. We don't want to have to wait to get the random numbers. Even a few seconds of delay gets pretty annoying. We want to make efficient use of computer resources. It's not only time, but also space. How much storage space is used by the computer algorithm? Whenever we say efficient in relation to computer resources, we're talking about issues of time and space. Even though we know there will always be cycling, that there will be periodicity, we want to make sure that the algorithm we use has a long period, or at least as long as we can get, so that it doesn't cycle too soon, because once it does, it's pretty much useless. Not only do we want numbers that are random, or appear to be random, but we also don't want to get the same number over and over again. When that happens, we say that the algorithm has degenerated. Clearly not useful. Pretty much useless.
We know that pseudo-random numbers are not random numbers. We're hoping that they behave like random numbers, and indeed we do testing to make sure of this. Random numbers generated by pseudo-random number generators can be tested for randomness, for lack of a pattern, and for lack of bias. Several statistical tests are used for this, including the goodness of fit test with the uniform distribution, applied to the random digit stream output from the pseudo-random number generator.
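As a rough illustration of that goodness of fit idea, the sketch below computes a chi-square statistic for a stream of digits against the uniform distribution on 0 through 9. The details are my assumptions and not something specified in the lecture: Python's standard `random` module as the generator being tested, a sample of 10,000 digits, a 5 percent significance level, and the function name `chi_square_statistic`. The critical value 16.919 is the standard chi-square table value for 9 degrees of freedom at that level.

```python
import random

# Chi-square table value for 9 degrees of freedom at the 5 percent level.
CRITICAL_VALUE_9_DF_5_PERCENT = 16.919

def chi_square_statistic(digits):
    """Chi-square goodness-of-fit statistic against a uniform distribution on 0..9."""
    counts = [0] * 10
    for d in digits:
        counts[d] += 1
    expected = len(digits) / 10  # under uniformity, each digit is equally likely
    return sum((observed - expected) ** 2 / expected for observed in counts)

rng = random.Random(42)  # arbitrary seed, so the test run can be repeated
digits = [rng.randrange(10) for _ in range(10_000)]
statistic = chi_square_statistic(digits)

# A statistic below the critical value means we fail to reject uniformity.
print(statistic, statistic < CRITICAL_VALUE_9_DF_5_PERCENT)
```

A test like this only checks that each digit shows up about as often as the others. It says nothing about patterns in the order of the digits, which is why several different tests are normally used together.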
There have been some studies that add another layer of complexity, a multi-dimensional hyperplane graphical approach to testing random digits. I'm not doing that, and you're not going to do it in this course, that's for sure. But there have been people studying this. What they say is that at the simple level of looking at just a stream of pseudo-random digits output from an algorithm, they're fine. They pass all the tests with flying colors. But once you look at them at a different level, on a different plane, multivariate patterns start to emerge. And all that says is what we knew already, that these numbers are not truly random. We use them. They've been validated in many, many ways, and they're fine for lots and lots of applications, including simulation. But we always have to remember that this isn't true randomness. It's just like how a simulation is not the true system. The pseudo-random number generators create random numbers that are not random. It's all make-believe. They're simulated random numbers.
Thank you for attending this lecture on randomness in simulation. We looked at truly random numbers, pseudo-random numbers, and how these numbers are generated and tested for use in simulation. We use them to model real-world complexity. So this is a very, very critical topic that's part of the material on simulation model implementation, specifically, modeling random phenomena in our simulation models. Again, thank you for joining me in this lecture.