What does it mean to be random? And how do you do it in a computer? Randomness is a weird kind of thing, because it's defined more by what it isn't than by what it is. If you see something happening and it's got some kind of pattern to it, you say, well, that's not random. If you could predict what's going to happen next, if you could even just predict it better than chance, you'd say, well, that's not random. So something random is supposed to have no pattern and not be predictable. Well, that's a real problem for a computer, because the entire way we have designed, architected, and built computers is to make them deterministic. A typical computer is designed for serial determinism: it does one thing at a time, step by step by step. Each step is guaranteed, as well as we can build the physical machine, to be completely predictable, completely determined: the inputs determine the outputs. So in a computer program, if you could supply all of the same inputs, you would get the same output. It might not seem that way these days, when you go to your social network and everything's constantly changing; you hit reload and it changes. But that's because there are all kinds of inputs going into the program producing that output that we're not necessarily aware of. If we could actually control all of the inputs to the program, it would have to produce the same output. And if it didn't, the computer would be broken. OK, so if that's what computing is like, step by step, same input, same output, then how could we ever get a computer to be random? How could we ever make it unpredictable? If we know the program and we know the inputs, we can predict it completely. We can run another computer with the same program and the same inputs, and we have to get the same output. So in computers we end up using what's called pseudo-randomness.
It's not really random, in the sense that if you know all of the inputs, you can predict the output. So it doesn't deliver the unpredictability part. But the idea that there's no pattern is what pseudo-randomness tries to get at. Let's look at a demo. All right, here's a machine I submit to you: an amazing, magical random number generator machine. We've got a button down here that we can press. So we feed in the spark of execution, the machine cranks, and there it produced a number: 90. Is this a random number? Well, it says it's a random number. Let's make another one. Actually, let's just lock the switch down so it makes a bunch of them. Now, the first thing is, did you expect... all right, 53. Did you expect a random number to be 90, as opposed to heads, or a six, or one of 52 cards, or something like that? You can't even talk about randomness unless you have some universe of possible events, some set of things that might happen. You can't even ask for a uniformly selected random number between zero and infinity; it doesn't make sense. So you already have to have some set of events in mind. This looks like two-digit decimal numbers, perhaps. Now, that might not be what you need, right? If you really just wanted a coin flip, heads or tails, what are you going to do here? Well, you could say, I'll call even numbers heads and odd numbers tails. Nothing says I have to use the whole number the thing is giving me, if it's really random, really unpredictable, really without a pattern. All right. Well, 90, that's even; that would be heads. 53, that's odd; that would be tails. So heads, tails. 74 is even. 81 is odd. 50 is even. Wait a minute. 73 odd, 14 even. These are just alternating: even, odd, even, odd, even, odd. So how can that be random? It's a pattern. Well, this isn't random. Big surprise: it's running in a computer. Let's pop the hood. What we actually have under here is not a random number generator.
Not in any real sense of randomness, whatever that means. What we have is a pseudo-random number generator: P-R-N-G, a PRNG. This particular kind is called a linear congruential generator, and this specific one is a particularly bad linear congruential generator. But LCGs are important, at least historically, because they were used for decades. They were the basis of much of the computer simulation done of all sorts of things, simulations that may have led to policy decisions, or led people to believe things work certain ways, all under the assumption that these were decent random numbers. Well, wait a minute. 90, 53, 74. Look at this. The first number was 90. Here's another 90. The second number was 53, then 74, 81, 50, 73. The next one's going to be 14. There it is. The next one's going to be 61. This hasn't only got a pattern to it; it's predictable. I know the sequence that's coming, and that's why this is such a terrible random number generator. But still, let's pop the hood again and see what's actually going on inside. All right, single-stepping it now. This looks confusing, but it's really very simple. This red star shows what's going to happen next. The first thing is a multiplication. What do we multiply? This number, 4567, times this number, 93. And the result is going to be... I don't know what it's going to be. The result is 424,731. Okay? Whatever. Then we move on to the next step. This is serial determinism: step by step by step. The next step is to add this number, 23, to that result, which gives 424,754. There it is. Execution moves on. The next step is clock arithmetic, with 100 o'clock here: a modulus operation with this number, which ends up keeping just the last two digits. So it gives us 54. And that 54 gets written back to this state box. That's the key. So what we have here is a cycle.
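The single-stepped calculation is one step of the recurrence next = (state × multiplier + increment) mod modulus. Here is a minimal Python sketch. Treating 4567 as the multiplier and 93 as the current state is an assumption, but it is consistent with the numbers the demo printed: 4567 leaves 67 mod 100, and multiplying by 67, adding 23, and keeping the last two digits carries 90 to 53, 53 to 74, and so on. The name `lcg_step` is hypothetical.

```python
def lcg_step(state: int, multiplier: int = 4567, increment: int = 23,
             modulus: int = 100) -> int:
    """One linear congruential step: multiply, add, then reduce modulo `modulus`."""
    # Worked numbers in the comments are for state = 93.
    product = state * multiplier   # 93 * 4567 = 424,731
    total = product + increment    # 424,731 + 23 = 424,754
    return total % modulus         # mod 100 keeps the last two digits: 54


print(lcg_step(93))  # 54
```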
The state gets read out, multiplied, added to, modded, and written back. Okay? And as the final step, that number gets output as our random number. That's it. Let's go again. The state variable does not get reset between cycles of producing random numbers, and that's how it works. A computer is deterministic: same inputs, same outputs. We thought this program didn't have any inputs. We thought we just sort of made it go and it produced an endless stream of numbers. But really, the way to think about it is that the state variable is an additional input, one the program writes and then reads back in. As a result, we get a sequence of numbers that changes over time. In this case, though, it's not a very random sequence. Number one, it alternates between even and odd. With different multipliers and increments we could change that; we could get them all to be even, as if that's better. Number two, it loops: it has a period, so no matter where it starts, it comes back around to the beginning. We can also put it back at the beginning ourselves; that's what this reset button does. Which shows, by the way, that the only thing this thing has going for it is its state. Resetting it is called reseeding, because the starting number that begins this whole process is called the seed, and setting the seed back to a known quantity is reseeding. Now we're getting the same thing again: 90, 53, and so on. We could make this a little better by increasing this limit, something like that. Before, the state was limited to two digits, 0 to 99; now it can go up to at least a thousand. Normally, when we make a linear congruential generator, we let the modulus be something in the billions. Then we get these big numbers out that look, ooh, must be really random, but they have the same flaws in their last bits: even, odd, even, odd, and so on. So these are not great. Okay, let's turn this off.
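A minimal Python sketch of the whole machine, with the state written back after every step. The parameters (multiplier 4567, increment 23, modulus 100) follow the single-step walkthrough, and the seed of 1 is an inference: it is the value that makes the first output 90 under those parameters. The class name `ToyLCG` is hypothetical. The last lines swap in a modulus of 2**31, an arbitrary illustrative choice, to show that a bigger modulus does not fix the even/odd alternation.

```python
class ToyLCG:
    """A tiny linear congruential generator whose only memory is `state`."""

    def __init__(self, seed: int, multiplier: int = 4567, increment: int = 23,
                 modulus: int = 100) -> None:
        self.multiplier = multiplier
        self.increment = increment
        self.modulus = modulus
        self.reseed(seed)

    def reseed(self, seed: int) -> None:
        """The reset button: put a known value back into the state box."""
        self.state = seed

    def next_number(self) -> int:
        # Read the state, multiply, add, mod, write it back, and output it.
        self.state = (self.state * self.multiplier + self.increment) % self.modulus
        return self.state


gen = ToyLCG(seed=1)
first = [gen.next_number() for _ in range(8)]
print(first)  # [90, 53, 74, 81, 50, 73, 14, 61]

# Even -> heads, odd -> tails: the "coin" strictly alternates.
print("".join("H" if x % 2 == 0 else "T" for x in first))  # HTHTHTHT

# Reseeding replays the sequence, and with only 100 possible states it must
# cycle; with these particular parameters the period is 20.
gen.reseed(1)
cycle = [gen.next_number() for _ in range(40)]
print(cycle[:8] == first, cycle[:20] == cycle[20:])  # True True

# A huge modulus does not help the low bit: an odd multiplier plus an odd
# increment flips parity on every step, whatever the (even) modulus.
big = ToyLCG(seed=1, modulus=2**31)
print([big.next_number() % 2 for _ in range(6)])  # [0, 1, 0, 1, 0, 1]
```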
And these random numbers are terrible, so let's get rid of them. The state of the art in pseudo-random number generation has moved well beyond the linear congruential generator, and today you really do not want to be using one. What do we use random numbers for? Well, there are at least two whole categories of uses for random numbers in computers, and they're quite different, and we need to not confuse them. One group is models and simulations: computer games, shuffling a deck for solitaire, simulating traffic flow, where, when you come to a light, is the light going to be green or red? Well, if you can throw a random number, you can make it red 60% of the time and green 40% of the time, or whatever it is. In that case, we're essentially using random numbers to express our ignorance. If we had made a more detailed model of the traffic, whether the light was red or green wouldn't actually be random. It depends on the light's circuitry and timing, who pressed the buttons, and so on. We could build a more complicated model that wouldn't use randomness there. That's how our brains work. When stuff is extremely large and slow, we tend to assume it's constant. When stuff is incredibly tiny and changes rapidly relative to what we care about, we tend to think it's random. And for the stuff in the middle, at about the sizes and time scales we're interested in, we tend to think there are causes: one thing causes the next. So the big, slow stuff is constant, the tiny, fast stuff is random, and causality is in the middle. Randomness is deeply tied to how we look at the world. When we build models in computers using pseudo-random numbers, we're doing essentially the same thing. We're deciding, okay, this part is causal: I walk down to the corner because I'm walking to work.
Whether the light says walk or wait is random, because I'm not modeling it in detail, and so on. The second category of uses of randomness is secrecy: encryption. There we have a very different set of needs. When we're dealing a deck of cards to play solitaire on the screen, the stakes are low. When we're playing video poker in a casino, on the other hand, it's very different. If we could predict what was going to happen, we might be able to make money, and the casino wouldn't like that. Similarly, if we're making encryption keys to send secret messages or secure our documents, and anybody can predict what those keys are going to be, they can break our encryption. So in essence there are two views of randomness. One view says a random sequence must be completely unpredictable, because if it can be predicted, it can be broken. The casino wants unpredictable; encryption wants unpredictable. But in the bigger, more common use, all we really care about is that whatever determines the sequence of numbers is uncorrelated with the purpose we're putting the numbers to. Even though that LCG we looked at had patterns and wasn't a good one, if we used its numbers to make simulated traffic lights turn red and green according to some odds, it's not clear that this would actually make our simulation inaccurate in the long run, on average. And this is the whole crazy business of pseudo-randomness. These aren't my words: John von Neumann said, essentially, that anyone who considers arithmetical methods of producing random digits is in a state of sin. What did he mean? He meant that the whole point of software is to be deterministic, to be not random. So pseudo-randomness is all about that hidden state.
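A minimal Python sketch contrasting the two categories. For the simulation case it uses the standard `random` module, which is built on the Mersenne Twister; the function name `fraction_red`, the 60% figure, and the seed are illustrative assumptions. For the secrecy case it uses the standard `secrets` module, which draws from the operating system's cryptographically secure source and is what the Python documentation recommends instead of `random` for anything security-related.

```python
import random
import secrets


def fraction_red(n_arrivals: int, p_red: float = 0.6, seed: int = 2024) -> float:
    """Simulation use: stand in for the light's real circuitry with a weighted coin.

    Seeding the generator makes the whole run reproducible, which is what we
    usually want from a model.
    """
    rng = random.Random(seed)
    reds = sum(1 for _ in range(n_arrivals) if rng.random() < p_red)
    return reds / n_arrivals


print(fraction_red(100_000))                       # close to 0.6
print(fraction_red(1_000) == fraction_red(1_000))  # True: same seed, same run

# Secrecy use: key material must not be predictable from any seed, so draw it
# from the operating system's cryptographically secure source.
key = secrets.token_bytes(32)  # 32 bytes = 256 bits
print(len(key))                # 32
```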
And so one of the reasons this pseudo-random number generator, the LCG, is so bad is that it has very little state. Once the state repeats itself, the whole sequence has to repeat, because the state is the only extra input it's got. Here's a picture of a more modern pseudo-random number generator called the Mersenne Twister. It's been around for 20 years or so now, and it's pretty good. I'm not going to show the whole step-by-step algorithm because it's a little involved, but let's look at how much state it's got. So we'll pop the hood here. This is the state of the Mersenne Twister, shown with a white square for each zero bit and a black square for each one bit. And here we're producing numbers. You might see these colors moving through here; they're showing things about how the algorithm works internally. But the weird part is that we're getting all these numbers, big numbers in the billions, positive and negative, and yet it doesn't look like the state is changing anywhere. Except, if you look (let me speed this up a little), at the very beginning here you can see some stuff changing. What's really going on is that there's a counter up there, and the rest of this state is 624 numbers (625 counting the counter), random numbers that the Mersenne Twister has precomputed. So when you ask for the next number, it doesn't actually change any of that state. But we're about to get to the end, when the green bar gets to the end there. Boom: the whole state changes. So it's all batched up. That makes it a little weird, because when you call for random numbers you get 624 of them very rapidly, and then all of a sudden there's a big wait while it goes through and mixes all the bits. And the reason it keeps so much state is that when it mixes the bits, we want to be able to mix them from far apart.
It doesn't just use the previous value of the state; it mixes the current number with the number 397 positions away in the batch. Now, one thing we can do here is cheat a little. Again, this is a pseudo-random number generator. If we set the seed to a known quantity, we get outputs like 1.7 billion, minus 12 million, and so on. If we set the seed back to the beginning, we get 1.7 billion, minus 12 million, and so on again. So after reseeding, the behavior is completely predictable. Now we can really cheat. The Mersenne Twister is designed so that you can't set the seed to anything that doesn't look like this, a sort of hash of black and white. But I went in and cheated: here I initialized the entire state array to all zeros except for a single 1 right in the middle. Again, a regular Mersenne Twister will never do this, but we abused it. Now we're getting outputs of 0, 0, 0, which is obviously not very random at all. But the reason I did this is, let's speed it up: when it rebuilds itself, it gradually starts to smear out the bits. It takes quite a while, so let's speed it up still more. Okay. Now, if you look at this, you might be able to see a pattern where it looks like it's moving up and to the right. Up and to the right. Can you see it? That's because the Mersenne Twister is essentially using all of those nearly 20,000 bits as a giant shift register, where the bits move out of this word up into the next one, up to the top of the last column, then down to the first, and so on. So it's like a giant multiplication, where it then takes a bunch of stuff and adds it in. In some loose sense, it really is quite similar to the multiply and the add of the simple linear congruential generator. It just has much, much more state, and more carefully thought out state as well. All right, so that's it. Randomness is this weird thing because it's about what it isn't.
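The batching and the "397 away" mixing can both be seen in Python, whose `random` module is MT19937. This sketch relies on a CPython implementation detail: `getstate()` exposes the 624 state words plus a position counter (625 numbers in all), which is not a documented, stable API. The `twist` and `temper` functions follow the constants of the published MT19937 reference code; those function names, and `split_state`, are hypothetical. The real mixing uses shifts and XORs rather than an arithmetic multiply, which is why the multiply-and-add comparison is only a loose analogy. The last part repeats the demo's cheat of an all-zero state with one bit set.

```python
import random

N, M = 624, 397  # state size in 32-bit words; the "397 away" offset
MATRIX_A = 0x9908B0DF
UPPER_MASK, LOWER_MASK = 0x80000000, 0x7FFFFFFF


def split_state(rng: random.Random) -> tuple[tuple[int, ...], int]:
    """CPython detail: getstate()[1] holds 624 words plus a position counter."""
    internal = rng.getstate()[1]
    return internal[:N], internal[N]


def twist(mt: list[int]) -> None:
    """Regenerate all 624 words in place (one batch)."""
    for i in range(N):
        # Top bit of word i joined with the low 31 bits of word i+1...
        y = (mt[i] & UPPER_MASK) | (mt[(i + 1) % N] & LOWER_MASK)
        # ...shifted and XORed with the word 397 positions further along.
        mt[i] = mt[(i + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)


def temper(y: int) -> int:
    """Scramble a stored word into the 32-bit number handed to the caller."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y


# 1. Batching: the words only change when the counter runs off the end.
rng = random.Random(2024)
seeded_words, index = split_state(rng)    # index == 624: next draw regenerates
rng.getrandbits(32)
batch, index = split_state(rng)           # index == 1
for _ in range(N - 1):
    rng.getrandbits(32)
unchanged = split_state(rng)[0] == batch  # only the counter moved
rng.getrandbits(32)
changed = split_state(rng)[0] != batch    # the whole state changed at once

# 2. The twist above reproduces CPython's outputs from the same seeded words.
fresh = random.Random(2024)
mt = list(split_state(fresh)[0])
twist(mt)
matches = [temper(w) for w in mt] == [fresh.getrandbits(32) for _ in range(N)]

# 3. The demo's cheat: all zeros except a single 1 bit in the middle.
cheat = [0] * N
cheat[N // 2] = 1 << 16
rng.setstate((3, tuple(cheat + [N]), None))  # counter N: regenerate on next draw
first_batch = [rng.getrandbits(32) for _ in range(N)]
zeros = first_batch.count(0)  # nearly every output in the first batch is 0
print(unchanged, changed, matches, zeros)
```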
When you're using a random number generator, you've got to avoid the bug of seeding it over and over again. The whole point of seeding is to set the initial value of the state; then you ask for numbers and let the state inside the random number generator keep track. There are also random number generators I haven't talked about, built into computers, that are claimed to be real random number generators. They work by using the timings of mouse clicks, keystrokes, arriving network packets, disk activity, and so on, mixing it all together on the assumption that there's randomness in there. That's probably true, or may be true, but who knows? It's just saying there's state in the outside world, state in the user's head about when they do things, that we can exploit to make numbers that are unpredictable. Finally, the bottom line: if you're building a model of some sort and using random numbers, make sure you've got the Mersenne Twister, or at least something reasonably modern. There are still bad random number generators out there; don't use them. Okay, that's it.
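A minimal Python sketch of the reseeding bug and its fix. Both function names are hypothetical, and the fixed seed 42 stands in for whatever seed a real program would use. A common variant of the same bug reseeds from the clock on every call, which produces repeated values whenever two calls land in the same clock tick.

```python
import random


def rolls_buggy(n: int) -> list[int]:
    rolls = []
    for _ in range(n):
        rng = random.Random(42)  # BUG: reseeded on every iteration
        rolls.append(rng.randint(1, 6))
    return rolls


def rolls_fixed(n: int, seed: int = 42) -> list[int]:
    rng = random.Random(seed)  # seed once, then let the state evolve
    return [rng.randint(1, 6) for _ in range(n)]


print(rolls_buggy(10))  # the same value ten times
print(rolls_fixed(10))  # a varied, but reproducible, sequence
```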