Last week we had a light introduction to random numbers, so let's recap some of the concepts we introduced. First, random numbers are very important for many aspects of security. We want to generate keys: we'd like secret keys to be random, so they are very hard to guess, rather than some structured key. We'll often use random numbers in authentication and key exchange, where the algorithms again make use of random numbers that the attacker should not be able to guess. One way to make a number that someone cannot guess is to choose a random number that is long enough, so that all the attacker can really do is a brute-force attempt. So if we have a large enough random number, it's hard for the attacker to guess. The point is that generating random numbers is hard for computers. A true random number generator needs some source that is not commonly available in our computers: something that can measure the environment, such as electrical disturbances, noise or other physical characteristics. When those characteristics exhibit randomness, we take that information and convert it into binary. So we need special hardware to do that. Since that's not so common on most computing devices, what we do instead is use pseudo-random number generators (PRNGs). They generate what appear to be random numbers, but they're not truly random; there is some pattern in the numbers they generate. But if we use good enough algorithms and start with a good enough source, we can use pseudo-random number generators for security purposes. We'll look at a couple of example PRNGs with some simple algorithms, and then mention a few others, but not in much depth. Pseudo-random number generators usually have an initialization value, which we call the seed: an input to the algorithm that starts things off. The term random seed is commonly used.
You may have seen this when programming and calling a random function. In a number of implementations of that function, depending on the software, if you run your program multiple times it can produce the same sequence of numbers. If you want a different sequence, you use a different seed, a different input. We'll see an example of that shortly. We think of our pseudo-random number generators as generating a sequence of bits: they keep producing bits as output, and we'd like that sequence of bits to appear random. We had a very simple example of what makes a sequence of bits appear random. We said we want an equal number of zeros and ones in the sequence, and in any sub-sequences as well. In the first example sequence, we think it does not appear random because there are many more ones than zeros. In the second, the 12 bits contain an equal number of zeros and ones overall, but the sub-sequences don't: if we look at half of them or a quarter of them, we don't have this property. In the third example, we have 12 bits with an equal number of zeros and ones, but again, many people would not say that it is random. If we knew, say, the first six bits, we might be able to predict that the next two bits will be zero, one, and so on. So it's not just about having an equal number of zeros and ones; there are other properties we expect from a random sequence, and such short examples do not capture all of them. We need longer sequences, which we'll see in a moment. We're not going to talk about pseudo-random functions. A true random number generator takes a source of true randomness, something from the environment, measures it (usually with special hardware) and converts it into binary, and the output is a sequence of bits. We can then use that in our applications, for whatever security mechanisms we need.
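A minimal sketch of the seeding behavior just described, using Python's standard `random` module as the example library (it is a Mersenne Twister generator, not an LCG). The seed values 2024 and 2025 are arbitrary choices for illustration. Python's documentation warns that `random` is not meant for security purposes; the `secrets` module is intended for that.

```python
import random

def first_values(seed: int, count: int = 5) -> list[int]:
    """Draw `count` integers in [0, 99] from a generator seeded with `seed`."""
    gen = random.Random(seed)  # independent generator with its own internal state
    return [gen.randint(0, 99) for _ in range(count)]

run_1 = first_values(2024)
run_2 = first_values(2024)  # same seed: exactly the same sequence
run_3 = first_values(2025)  # different seed: draws from a different sequence

print(run_1 == run_2)  # True
```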
A pseudo-random number generator is just an algorithm that we can implement in software or hardware. It takes a seed as input to get things started, follows some algorithm, produces some bits as output, and then iterates, feeding its output back in as the next input and continuing, with the idea of producing a stream of bits that appears random. In practice, true random number generators are good, in that they produce a truly random bit stream, but (a) they normally need special hardware, or some special way to measure the source of true randomness, and (b) the number of bits they produce over a period of time is usually much smaller than what a pseudo-random number generator can produce. So when our application needs many random bits, maybe to generate a very long key or many long keys, a true random number generator may not be able to supply them within a reasonable amount of time. That leads us to pseudo-random number generators. The problem with PRNGs is that the sequence is not truly random, but it needs to be good enough for our applications. There are different tests we can apply to a sequence to determine whether it exhibits the characteristics of a random sequence. The slide mentions some of those tests, but there are others. One is a test for uniformity, that is, that we have an equal number of zeros and ones. Others look at the frequency of zeros and ones, and at runs, that is, sub-sequences. Another test is: can you compress the sequence? If you have a sequence of bits and you try to compress it using zip, what do you think you'll get: a larger output, a smaller output, or the same length? When you zip something, usually you'd like it to get smaller, correct? But if you zip a random sequence, it shouldn't shrink: you should get roughly the same size output, because compression generally works by looking for patterns in the input and replacing those patterns with shorter sequences of bits.
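The frequency and compression tests can be tried quickly. This sketch assumes Python's `zlib` (DEFLATE, the same family of compression used by zip) and uses `os.urandom`, the operating system's randomness source, as the random input. The repeated `0x0F, 0xF0` byte pattern is an arbitrary structured input chosen for contrast. These are rough practical checks, not a substitute for a proper statistical test suite.

```python
import os
import zlib

def fraction_of_ones(data: bytes) -> float:
    """Frequency test: proportion of 1 bits (about 0.5 for random data)."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones / (8 * len(data))

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (about 1.0 for random data)."""
    return len(zlib.compress(data, 9)) / len(data)

random_data = os.urandom(100_000)           # OS randomness source
patterned = bytes([0x0F, 0xF0]) * 50_000    # structured input, same length

for name, data in [("random", random_data), ("patterned", patterned)]:
    print(f"{name:9}  ones={fraction_of_ones(data):.3f}  ratio={compression_ratio(data):.3f}")
```

The patterned input has exactly half of its bits set, so it passes the simple frequency test, yet it compresses to a tiny fraction of its size. This is the point made earlier: an equal number of zeros and ones is not enough on its own.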
So compression usually shrinks things, but if you try to compress a random file, you'll find that it does not shrink. If that happens, it indicates that the source exhibits randomness; it's just a practical test. There are other tests which we will not talk about. Let's go straight to an example of a simple pseudo-random number generator. There are many; we'll just deal with one simple example to get started: the linear congruential generator, or LCG, which is easier to say. The algorithm is this equation: X(n+1) = (A × Xn + C) mod M. It generates a sequence of numbers, integers. We have some parameters, A, C and M (we'll see some example values), and we have the current value in the sequence. Our random number generators generate a sequence of numbers, and what we want is for that sequence to appear random. This simple generator takes the current value, denoted Xn, multiplies it by some constant A, adds some other constant C and takes the result mod M, and you get the next value in the sequence. Then you do that again and again, and the result is some sequence of numbers that we want to appear random. To illustrate that, we'll go through some examples with different values of A, C and M, just to see how this works. So the algorithm is just called LCG. To repeat: the next value in the sequence, X(n+1), is obtained by multiplying the previous value Xn by A, adding some constant C and taking the result mod M. Let's choose some values. For example 1, let's choose simple values so we can calculate: A is 1, C is 1, M is 100. We take some initial value of X, which we'll call X0, and apply the operation to produce the next value, X1, then do it again for X2. The values of X that we get out form our sequence, and we'd like that sequence to appear random.
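The LCG equation X(n+1) = (A × Xn + C) mod M translates directly into a generator function. This is a sketch for experimenting with the examples in this lecture; the function name `lcg` is an illustrative choice, and it yields X1, X2, ... rather than the seed X0 itself.

```python
from itertools import islice
from typing import Iterator

def lcg(seed: int, a: int, c: int, m: int) -> Iterator[int]:
    """Yield X1, X2, ... where X(n+1) = (a * X(n) + c) mod m and X0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# Example 1 from the lecture: A = 1, C = 1, M = 100, seed 23
print(list(islice(lcg(23, a=1, c=1, m=100), 5)))  # [24, 25, 26, 27, 28]
```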
So let's try it with those values. We need an initial value of X; the seed is the initial value, which I'll denote X0. Let's say I choose as the seed my favorite number, 23. How do we choose a seed normally? We should choose a random seed. And how do we choose a random seed as input to our random number generator? Use a true random number generator. So what may be done in practice is this: if we want the seed for a pseudo-random number generator to be random, we use a true random number generator to generate that value, the seed, and then use the pseudo-random number generator to generate many more values. The idea is that the true random number generator can generate a few values easily, and our pseudo-random number generator can then generate many values, as long as we start with a good, random seed. But I chose 23 just for this example. What's the next value, X1? Going the long way: X1 is X(n+1) with n = 0, so we take A times the current value, 1 × 23, plus C, which is 1, mod 100. What do we get? 23 plus 1 is 24, and 24 mod 100 is 24. That's our first number in the sequence. What's X2? 25: just the previous value times 1, plus 1. What's X3? 25 times 1 plus 1, mod 100, is 26, and X4 is 27. Do you see a pattern here? We can write our sequence X, the set of values starting from X0, as 23, 24, 25, ... and you'd keep going to where? With mod 100, it gets to 99; the next value is 0, then 1, and then where? It won't go to infinity. It will get back to 23 eventually: 0, 1, 2, 3, up to 22, and then we're back at the start, because if we take 23 as input again, we get 24 again. So the sequence of unique values ends at 22, and from then on it just repeats.
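Example 1 can be checked mechanically. This sketch runs the generator from seed 23 until it returns to the seed and counts the distinct values. The loop relies on the sequence coming back to the seed, which is true for these parameters (adding 1 mod 100 just cycles through every residue) but is not guaranteed for arbitrary A, C and M.

```python
A, C, M, SEED = 1, 1, 100, 23   # example 1 parameters

sequence = [SEED]               # X0, X1, X2, ...
x = (A * SEED + C) % M
while x != SEED:                # stop when the seed reappears
    sequence.append(x)
    x = (A * x + C) % M

print(sequence[:4])     # [23, 24, 25, 26]
print(sequence[75:79])  # [98, 99, 0, 1]
print(sequence[-1])     # 22
print(len(sequence))    # 100, the period
```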
It goes 22, 23, 24, ..., 99, 0, 1, up to 23 again, and just comes back to the start every time. So is this a good random sequence? Does it look random to you? No, this is an example of bad output from a pseudo-random number generator. It's easy to see that if the first three values are 23, 24, 25, you can predict that the next value will be 26. There's a dependence between the values, and it's easy to see that relationship. So this is an example of a pseudo-random number generator producing output that is not at all appropriate. Another characteristic: how many numbers are in this set? 100. So we say this sequence has a period of 100: there are 100 values, and then it repeats, coming back to 23 and so on. That's a characteristic of the sequences we generate: we can measure how many values we get before we come back to the start. And once we get back to the start, it will always repeat, because the algorithm receives exactly the same inputs. In this example, the sequence has a period of 100, but it is definitely not a good one. Let's try some different values of our constants and look at a couple of other possible sequences. Example two: let's set A to 7, C to 0, M to 32, and I'll give you a different seed in this case: set X0 to 1. Same algorithm, different constants. Work out the first few values, look at the length of the sequence, its period, and see whether you observe any pattern in the numbers. Does it look random or not? What's X1? Again, we take A times our current value of X, which is 1, plus C (I've chosen C to be 0, so it has no effect), mod 32, which is just 7.
Now try X2, just to see some characteristics of this algorithm; I hear people already getting the next value. X2 is 17, for people who don't see it: A times the previous value, 7 × 7 plus 0, mod 32, which is 49 mod 32. X3 will be 17 times 7, mod 32; the calculator gives 23 (I've done it before). For X4, someone help me with this one: try it with your calculator or in your head. What's the remainder when we divide by 32? 7 times 23 is 161, and 5 times 32 is 160, so the remainder is 1. We actually get back to the start with X4. So we write the sequence X as 1, 7, 17, 23. What's our period? 4: there are 4 unique values before we get back to 1, and then it repeats: 1, 7, 17, 23, 1, 7, 17, 23. The numbers: do they appear random? At least more so than our previous sequence. We cannot see any simple pattern in 1, 7, 17, 23. If you didn't know the algorithm and I gave you the numbers 1, 7 and 17, would you be able to predict that the next value is 23? That would be harder. So this one maybe appears more random, but one disadvantage is that the period is much shorter. We'll try one more, and then we'll talk about the desired characteristics. We won't go through all the details; I'll give the answer. Try A = 5 to keep things simple, with C still 0, M still 32 and the same seed. If we start with a seed of 1, what's the next value? 5. The next value? 5 times 5, mod 32, is 25. The next? 25 times 5, mod 32. Basically, it's the previous value times 5, mod 32, which turns out to be 29. Then 29 times 5 mod 32 is 17 (I've done it before), followed by 21, 9 and 13, and then the next number comes back to 1. What's our period? 8: there are 8 values in this sequence. What about the numbers? Do they appear more random than the first example? Definitely. In fact, it's hard to see a pattern in this case.
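A small helper reproduces examples 2 and 3 and their periods. The name `lcg_cycle` is hypothetical; it collects values starting from the seed until some value repeats. For these parameters the repeated value is the seed itself, so the length of the list is the period.

```python
def lcg_cycle(seed: int, a: int, c: int, m: int) -> list[int]:
    """Return X0, X1, ... up to (not including) the first repeated value."""
    values, seen = [], set()
    x = seed
    while x not in seen:
        seen.add(x)
        values.append(x)
        x = (a * x + c) % m
    return values

example_2 = lcg_cycle(1, a=7, c=0, m=32)
example_3 = lcg_cycle(1, a=5, c=0, m=32)
print(example_2, len(example_2))  # [1, 7, 17, 23] 4
print(example_3, len(example_3))  # [1, 5, 25, 29, 17, 21, 9, 13] 8
```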
So here we're getting closer. We want an algorithm that generates a sequence of numbers which, when we look at it, exhibits randomness, and this one is getting closer to what we desire. We'd need to do some actual tests to confirm that. One thing we could do is convert each number to a 5-bit value, look at the number of zeros and ones, look at the runs, and so on, to measure the randomness. But it appears more random than our other sequences. So in general, we want an algorithm that produces a sequence like this, which appears random. One other characteristic we want is that it goes for a long time before it repeats, that is, a long period. If it repeats very often and we have to generate many numbers, we'll get a lot of structure in the output. Say we need to generate 10 numbers: with a period of four, we'd cycle through the sequence two and a half times, whereas with a period of eight, only the first two numbers of the sequence would repeat. A longer period is better for our pseudo-random number generator. [In response to a question from the audience:] Yes, it could be. So generally, we want a long sequence that appears random, and these are just simple cases using small numbers. Back to our general algorithm, LCG: how do we get a long sequence? Look at our constants A, C and M. If we want a long sequence, which value should we set, and how? M, the modulus, should be larger. The larger M is, the longer the potential sequence. If we mod by 32, the answers can only be between 0 and 31, so the maximum possible period is just 32. In general, the maximum possible period is M. Therefore, to get the longest possible period, make M as large as possible. That's one design characteristic of this algorithm. How big can you make it on your computer?
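Following the suggestion of converting each number to 5 bits, this sketch looks at example 3's sequence bit by bit. Overall it has exactly 20 ones out of 40 bits, so a simple count of zeros and ones looks fine. Counting per bit position, however, shows that the lowest bit is always 1 and the next bit is always 0: a sub-sequence pattern that a single overall count misses.

```python
example_3 = [1, 5, 25, 29, 17, 21, 9, 13]   # A = 5, C = 0, M = 32, seed 1

bit_strings = [format(x, "05b") for x in example_3]
total_ones = sum(s.count("1") for s in bit_strings)

# ones_by_position[k] counts how many values have bit k set (bit 0 = least significant)
ones_by_position = [sum((x >> k) & 1 for x in example_3) for k in range(5)]

print(bit_strings)   # ['00001', '00101', '11001', '11101', '10001', '10101', '01001', '01101']
print(total_ones, "of", 5 * len(example_3))   # 20 of 40
print(ones_by_position)   # [8, 0, 4, 4, 4]
```

The two low bits are fixed because 5 leaves remainder 1 when divided by 4 and the seed is 1, so every value in the sequence also leaves remainder 1 when divided by 4. With a power-of-two modulus, the low-order bits of an LCG are known to be weak in this way.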
Well, it depends on how you represent your numbers. On a 32-bit system, say, you could use M = 2 to the power of 32, one more than the largest 32-bit value, so that the generator can potentially produce a sequence containing every number representable on your system. But there are other characteristics too. People have analyzed this algorithm and its parameters A, C and M, and have come up with recommended values that do generate long sequences that appear random. The slide gives one example. A practical implementation would choose M to be large and prime. Large, so it gives us a large potential period. Why prime? If M is composite, for example an even number such as 32, the sequence can get back to its start much sooner: in example 3, only 8 of the 32 possible values appeared. We will see some characteristics of prime numbers in the next topic on number theory, and they become important not just here but in other aspects of security. An example of a large prime is 2 to the power of 31 minus 1. If you set C to 0 to make your life easier, people have come up with recommended values of A; one of them is 7 to the power of 5. How did they come up with it? I don't know the details, but they've done the analysis and found that some values are good for this algorithm, and if you use them you get a long sequence that appears random. If you change the seed, you'll get a different sequence. Let's try one last example: the same parameters as example 3, but with a different seed. In practice we'd use a true random number generator to choose the seed; here let's pick an easy one we can calculate, say 3. What's the next value? 3 times 5 is 15, and 15 mod 32 is 15. Then 15 times 5 is 75, and 75 mod 32 is 11. Then 11 times 5 is 55, and 55 mod 32 is 23, and we keep going. I don't know the remaining values offhand.
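This sketch finishes the seed-3 calculation and also runs the recommended parameters mentioned above: A = 7^5 = 16807, C = 0, M = 2^31 - 1. These constants are often called the Park-Miller "minimal standard" generator, and C++'s `std::minstd_rand0` uses the same ones. The helper names are illustrative.

```python
def lcg_cycle(seed: int, a: int, c: int, m: int) -> list[int]:
    """Return X0, X1, ... up to (not including) the first repeated value."""
    values, seen = [], set()
    x = seed
    while x not in seen:
        seen.add(x)
        values.append(x)
        x = (a * x + c) % m
    return values

# Same parameters as example 3, different seed
seed_3 = lcg_cycle(3, a=5, c=0, m=32)
print(seed_3)   # [3, 15, 11, 23, 19, 31, 27, 7]

def minimal_standard(seed: int, steps: int) -> int:
    """Apply X(n+1) = 16807 * X(n) mod (2**31 - 1) `steps` times; seed must be in 1 .. M - 1."""
    a, m = 7 ** 5, 2 ** 31 - 1
    x = seed
    for _ in range(steps):
        x = (a * x) % m
    return x

print(minimal_standard(1, 1))        # 16807
print(minimal_standard(1, 10_000))   # 1043618065
```

The seed-3 sequence also has period 8 but shares no values with the seed-1 sequence from example 3, so changing the seed really does select a different sequence. The value 1043618065 after 10,000 steps from seed 1 is the check value the C++ standard specifies for `std::minstd_rand0`, which makes it a convenient self-test.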
The point is (and it will have some period that I don't know offhand) that a different seed gives us a different sequence, still hopefully one that appears random and has a long period. It need not have the same period, but we want a random number generator that gives a different sequence with a long period whenever we change the seed, and the suggested values for LCG do that. So when you call the rand function in your program, all it does is use a pseudo-random number generator, usually with some predefined seed (depending on the software), and grab the next value from the sequence. Call rand again in your code and it gets the next value, then the next, and so on. If you change the seed for that random number generator (most programming languages and applications allow you to), it will draw numbers from a different sequence. Therefore, if you want to produce different random numbers, it's important to use different seeds. LCG is one of the easier pseudo-random number generators to implement, but it's not as good as some others at generating random sequences: (a) in terms of length, and (b) because it's quite easy for the attacker, given one number in the sequence, for example the value 21, to work out the next value. That should not be possible: an attacker given one value in the sequence should not be able to predict the next value. Predictability is not a characteristic of randomness, so it should be hard for them to do that. It turns out it's not so hard with LCG: if the attacker knows the parameters and one number, they can easily find the other numbers. There are other pseudo-random number generators. There are a couple of slides on the Blum Blum Shub generator, named after its three inventors, Blum, Blum and Shub. It is better, but we will not go through it.
It again uses modular arithmetic: it starts with some prime numbers, uses exponentiation and mod, and produces a sequence of bits as output. There are many other generators, and LCG is just one simple example we could go through. They all need to produce good random numbers with long periods and be simple to implement. Another way to generate random numbers is to use block ciphers like DES and AES, because the idea of encryption is to produce ciphertext that appears random. We can use existing block ciphers in different modes of operation; counter and OFB are common examples. We take a key and some initial value, encrypt the initial value, and get the first set of bits as output. Then, in counter mode, we increment that value and keep encrypting, and we get a sequence of bits as output. The encryption key and the initial value combined are what we call the seed; if we change the seed, we get a different output sequence. So these are other ways to generate random sequences: just use an existing cipher. What's the problem with doing this? Compare, say, DES (remember the simplified DES algorithm) with LCG. Which one is easier to understand? LCG is easier to understand and easier to implement. Block ciphers are often slow at encryption, so if we use them as pseudo-random number generators they may not perform as well as dedicated algorithms. They work well in terms of producing random sequences; they're just inconvenient or slow. Again, there are different ways to do it: you can use different modes of operation, and there are some special algorithms or approaches. The slide shows just one example, which uses DES in an encrypt-decrypt-encrypt sequence with different keys, together with the date and time and some initial value. We will not go through these other algorithms.
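The counter-mode idea (encrypt an initial value, increment it, encrypt again) can be sketched as follows. Python's standard library has no block cipher, so this sketch substitutes SHA-256 from `hashlib`, truncated to 16 bytes, for the block-cipher encryption E_K; a real design would encrypt the counter with a cipher such as AES. The function names and the 16-byte block layout are assumptions for illustration only.

```python
import hashlib

BLOCK_BYTES = 16  # mimic a 128-bit block, as in AES

def stand_in_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_K(block): SHA-256 of key || block, truncated. NOT a real block cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]

def counter_mode_stream(key: bytes, initial_value: int, num_blocks: int) -> bytes:
    """Output E_K(IV) || E_K(IV + 1) || ...; the key and IV together act as the seed."""
    output = bytearray()
    for i in range(num_blocks):
        counter = (initial_value + i).to_bytes(BLOCK_BYTES, "big")
        output += stand_in_encrypt(key, counter)
    return bytes(output)

stream = counter_mode_stream(b"example key", initial_value=1000, num_blocks=4)
print(len(stream), stream.hex()[:32])
```

Because each output block depends only on the key and the counter value, starting one counter later simply skips the first block, and changing the key gives an unrelated stream.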
So there are special algorithms for generating pseudo-random numbers, or we can just use existing ciphers to do so. Five minutes to go. Let's finish with stream ciphers; we just need to say what they are. We'll get as far as the RC4 example, and that will lead us to our next topic. Stream ciphers are easy now that we know about random number generators. A stream cipher, remember, encrypts the plaintext a bit or a byte at a time. The general model of a stream cipher is this: at the sender, we take some key and use it to generate pseudo-random numbers. The block on the slide is some function or algorithm that generates pseudo-random numbers, like LCG, like one of the block-cipher approaches, or anything else that generates a random sequence. To encrypt, we XOR k bits from the random sequence with k bits of our plaintext, and we get ciphertext. To decrypt, we take the ciphertext. The receiver has the same key and the same random number generator, so with the same key and the same algorithm they get the same sequence out: if you start with the same seed, you always get exactly the same sequence. So the receiver XORs the k output bits with the ciphertext, using this property of XOR: XORing k with the plaintext m gives the ciphertext c, and XORing the ciphertext with the same k returns the plaintext m. That's the general design of stream ciphers; they differ in the algorithms used to generate the sequence. The output here is called the key stream. There may be questions on stream ciphers in our next quiz, so keep up to date, since we can have impromptu quizzes at any time. Any questions on stream ciphers? What's the one-time pad? Remember the one-time pad: the binary implementation we discussed was the exclusive or of the plaintext with a random sequence. We're doing almost the same thing here: the plaintext XORed with a random sequence.
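The general stream-cipher model fits in a few lines. In this toy sketch the key is used as the seed of the LCG with the recommended parameters (A = 16807, C = 0, M = 2^31 - 1), and each key-stream byte k is taken as the low 8 bits of an LCG output; both choices are assumptions for illustration. As discussed above, LCG output is predictable, so this is not for real encryption.

```python
from typing import Iterator

def lcg_keystream(key: int) -> Iterator[int]:
    """Toy key stream: low 8 bits of each LCG output, seeded with the key (insecure)."""
    a, m = 16807, 2 ** 31 - 1
    x = key              # key should be in 1 .. m - 1; a seed of 0 would stay 0 forever
    while True:
        x = (a * x) % m
        yield x & 0xFF   # one key-stream byte k

def xor_cipher(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt: XOR each data byte with the next key-stream byte."""
    stream = lcg_keystream(key)
    return bytes(byte ^ next(stream) for byte in data)

message = b"attack at dawn"
ciphertext = xor_cipher(message, key=12345)     # sender
recovered = xor_cipher(ciphertext, key=12345)   # receiver: same key, same key stream
print(ciphertext.hex())
print(recovered)   # b'attack at dawn'
```

The same function both encrypts and decrypts, because XORing with the same key-stream byte twice cancels out.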
The only difference between this and the one-time pad is that with the one-time pad we said the random sequence must be as long as the plaintext. That's possible if we have a short plaintext, but say we have a continuous stream of plaintext: at some point this sequence is going to repeat. Why? Because our pseudo-random number generator has a period; it eventually gets back to its start value. So the difference between a one-time pad and a stream cipher is that here the key stream, lowercase k, will eventually come back to the first value it produced, and we'll then be XORing plaintext with the same key stream values as we did in the past. So we'll get repetitions in the output ciphertext. But that's the idea: XOR a random set of bits, lowercase k in this diagram, with our plaintext m. Assuming we can generate random numbers well and quickly, this can be very fast; again, XOR is very fast to implement. We usually do it a byte at a time, so the lowercase k would be 8 bits. So for stream ciphers, the key stream should have a large period so it doesn't repeat very often. The sequence should approximate a true random number generator, so it should come from a good pseudo-random number generator. And of course the key we use should not be guessable or subject to brute-force attacks; otherwise the attacker can do the decryption easily. Stream ciphers are usually simpler to implement and faster than block ciphers. If you reuse keys with stream ciphers too often, you'll get the same output sequence, and that's bad, so you usually want to change keys quite often with stream ciphers. With block ciphers you can reuse a key much longer. RC4 is an example of a stream cipher, one of the common ones you may see, but there are many others. It was used in old Wi-Fi encryption and other systems. It's still around today but has some weaknesses. We will not go through RC4; there are a few slides on it.
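Why a repeated key stream is bad can be shown in a few lines. If two messages are encrypted with the same key stream (the same key reused, or the generator wrapping around its period), XORing the two ciphertexts cancels the key stream and leaves the XOR of the two plaintexts, without needing the key. This sketch reuses the toy LCG key stream idea (an illustrative assumption, insecure); the messages are made up.

```python
from typing import Iterator

def lcg_keystream(key: int) -> Iterator[int]:
    """Toy key stream: low 8 bits of each LCG output (insecure, illustration only)."""
    a, m, x = 16807, 2 ** 31 - 1, key
    while True:
        x = (a * x) % m
        yield x & 0xFF

def xor_cipher(data: bytes, key: int) -> bytes:
    stream = lcg_keystream(key)
    return bytes(byte ^ next(stream) for byte in data)

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

key = 424242                              # the same key used twice
m1 = b"PAY ALICE 100 DOLLARS"
m2 = b"PAY MALLORY 9 DOLLARS"
c1, c2 = xor_cipher(m1, key), xor_cipher(m2, key)

leak = xor_bytes(c1, c2)                  # the key stream cancels out
print(leak == xor_bytes(m1, m2))          # True: the attacker learns m1 XOR m2
print(xor_bytes(leak, m1))                # b'PAY MALLORY 9 DOLLARS' once m1 is known or guessed
```

Wherever the two plaintexts share a character (here "PAY " and " DOLLARS"), `leak` contains a zero byte, which already tells an eavesdropper something about the messages.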
You may look at the RC4 slides in your own time if you want, but we will not cover RC4 in this course. In the next lecture on Friday we'll start a new topic: some basics of number theory. Everything up until now has been shared, or secret, key cryptography. The next form will be public key cryptography, and to understand it we need some basic mathematics, so in preparation the next topic is number theory. We'll continue then.