Let's recap what we know about the modes of operation, because last week we finished with some examples where we encrypted using, I think, ECB, CBC and a quick example on counter mode. The idea for the modes of operation is that our block ciphers operate on, say, a 64-bit or 128-bit block, but we normally have more than 128 bits of data to encrypt. The simplest approach is to apply the block cipher multiple times and concatenate the ciphertext blocks together. We know that that's what ECB does, and that it's insecure in that if we have repetitions in our plaintext then we'll get repetitions in our ciphertext. So what ECB, electronic code book, does is what we'd expect as the default: simply encrypt each plaintext block and we get output ciphertext blocks. The resulting ciphertext just combines these n blocks that we generate. The problem is if P1 is the same as P2 then C1 will be the same as C2. So there's repetition in the ciphertext, that's ECB. So the other modes try to overcome that problem and they have different advantages or disadvantages. So we went through an example, I think we saw CBC, cipher block chaining. So now we can take the output of one block, so the ciphertext C1, and feed it into the input of the next stage of the chain. So in this case we take some initialization vector IV, some number that we choose, known by both sender and receiver, XOR with the plaintext and then encrypt. And then in the next stage, instead of using the IV, use the ciphertext from the previous stage, XOR that with the next plaintext block, encrypt using our cipher and get C2. So the idea compared to ECB: if P1 is the same as P2, C1 will not be the same as C2, because in the first stage we use the initialization vector and in the second stage we use something that is different, or at least most likely different, from the initialization vector. We use some ciphertext, which should be some random-looking value.
So since we have two different inputs between stage one and stage two, we'll get two different outputs. If we have the same input we'll always get the same output, which we don't want to happen. So each ciphertext feeds into the next stage. So we have this chaining of the blocks together. We had an example of that, and of decrypt, and we made the note that a good thing about XOR is that the inverse operation of exclusive OR is also an exclusive OR. So we can use that in the decrypt, where we take the ciphertext and apply the decrypt operation of our cipher. So if our cipher for encrypt is DES, then we apply DES here in the encrypt rectangle and we apply the decrypt operation of DES here, and we get some value, and then XOR with the initialization vector and we'll get the original plaintext back. You can check that. You can try and decrypt different values and make sure you get the plaintext back. And there are other modes of operation. We skip through this one, we'll come back to it, and just recap on what other ones we looked at. Output feedback mode, similar, we have some connection between each stage, except here we take the output of the encrypt operation and feed it into the next stage, and note that the plaintext is simply XORed with this output of the encrypt. So even though the second block depends upon the first block, we can get some additional advantage compared to the one we skipped and cipher block chaining, in that if there are errors in the ciphertext that is received, then that error will not propagate onto subsequent blocks. And that's what you'll see in your assignment or your homework, your mini assignment. You have a homework task and I'll show you the software that you'll use, I will show you some demonstration today, but what you will do is you'll encrypt some plaintext, get some ciphertext, and you'll try it with different modes of operation.
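The ECB and CBC behaviour recapped above can be checked with a few lines of Python. This is only a sketch under stated assumptions: `encrypt_block` and `decrypt_block` are a made-up four-round Feistel construction built on SHA-256, standing in for DES purely so the example runs without extra libraries. The key, IV and message are illustrative values, and the message length is assumed to be a multiple of the block size.

```python
import hashlib

BLOCK = 8  # block size in bytes (64 bits, the same size as a DES block)
HALF = BLOCK // 2


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (hypothetical, NOT DES).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):  # undo the rounds in reverse order
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right


def split(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    # ECB: every block is encrypted independently.
    return [encrypt_block(key, p) for p in split(plaintext)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    # CBC: XOR each plaintext block with the previous ciphertext (IV first).
    prev, out = iv, []
    for p in split(plaintext):
        prev = encrypt_block(key, xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> bytes:
    # The inverse of XOR is XOR, so decrypt the block and XOR again.
    prev, out = iv, []
    for c in blocks:
        out.append(xor(decrypt_block(key, c), prev))
        prev = c
    return b"".join(out)


key, iv = b"demo key", bytes(BLOCK)   # illustrative values only
message = b"SAMEBLOK" * 2             # two identical blocks, P1 == P2
ecb = ecb_encrypt(key, message)
cbc = cbc_encrypt(key, iv, message)
print("ECB repeats:", ecb[0] == ecb[1])
print("CBC repeats:", cbc[0] == cbc[1])
print("CBC round trip:", cbc_decrypt(key, iv, cbc) == message)
```

With identical plaintext blocks, the two ECB ciphertext blocks come out identical, while the CBC blocks differ because the second stage is fed C1 rather than the IV.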
And you'll see how those modes of operation impact on the ciphertext. And then let's say I send my ciphertext across a network. So I send C1, C2, up to CN across the network and the receiver needs to decrypt. So the receiver does the decryption phase. What the receiver does is take the ciphertext blocks and XOR with the encrypted value here and get the plaintext. A problem that can happen in some cases, if we send the ciphertext across some network or link, is that there may be bit errors. It's a characteristic of our network: I send 64 bits, and maybe one of those bits at the receiver is wrong. A bit one was sent at the transmitter, and the receiver receives a bit zero because of a link error or a network error. In that case, when we use this chaining or connection of stages together in some modes of operation, if we have an error in, say, ciphertext block C1, we know that the plaintext block will be wrong, because if there's an error in here, we will not get the correct plaintext out. But we'd like it not to impact the next stage as well. And in output feedback mode, it doesn't. That is, if there's an error in C1, we will get the wrong plaintext here, but it will not affect the next stage. And we should get the right plaintext in the next stage. If we go back now to cipher block chaining, if there's an error in C1, of course P1 will be wrong, because when we decrypt and the bits are wrong in the ciphertext, we will get the unexpected plaintext. We cannot avoid that. But also, because C1 is taken as an input into this next stage, we'll also get an error in P2. Sometimes we'd like to avoid that problem. And that's what output feedback mode does. It contains the error in that one block. It doesn't feed into the next stage, whereas with CBC, and we'll see the other one, CFB, if there's an error in one ciphertext block, there'll be errors in at least two plaintext blocks. That's a small feature of output feedback mode.
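The error-propagation difference between CBC and output feedback mode can be sketched as follows. As an assumption for illustration, the block cipher is a made-up Feistel construction on SHA-256 (not DES), the key, IV and three plaintext blocks are invented, and the "link error" is a single flipped bit in C1.

```python
import hashlib

BLOCK = 8  # bytes per block
HALF = BLOCK // 2


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of a toy Feistel cipher (hypothetical stand-in for DES).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right


def cbc_encrypt(key, iv, blocks):
    prev, out = iv, []
    for p in blocks:
        prev = encrypt_block(key, xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    prev, out = iv, []
    for c in blocks:
        out.append(xor(decrypt_block(key, c), prev))
        prev = c  # the received ciphertext feeds the next stage
    return out


def ofb_apply(key, iv, blocks):
    # OFB: the cipher output is fed back, never the ciphertext, so the same
    # routine both encrypts and decrypts.
    feedback, out = iv, []
    for b in blocks:
        feedback = encrypt_block(key, feedback)
        out.append(xor(b, feedback))
    return out


def flip_first_bit(block: bytes) -> bytes:
    # Simulate a single bit error on the link.
    return bytes([block[0] ^ 0x80]) + block[1:]


def damaged(sent, received):
    # Indices of plaintext blocks that differ after decryption.
    return [i for i, (a, b) in enumerate(zip(sent, received)) if a != b]


key, iv = b"demo key", bytes(BLOCK)
plain = [b"block #1", b"block #2", b"block #3"]

cbc_ct = cbc_encrypt(key, iv, plain)
cbc_ct[0] = flip_first_bit(cbc_ct[0])
cbc_damage = damaged(plain, cbc_decrypt(key, iv, cbc_ct))

ofb_ct = ofb_apply(key, iv, plain)
ofb_ct[0] = flip_first_bit(ofb_ct[0])
ofb_damage = damaged(plain, ofb_apply(key, iv, ofb_ct))

print("CBC damaged blocks:", cbc_damage)  # prints CBC damaged blocks: [0, 1]
print("OFB damaged blocks:", ofb_damage)  # prints OFB damaged blocks: [0]
```

The indices are zero-based, so `[0, 1]` means P1 and P2 are wrong under CBC, while under OFB only P1 is affected.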
I do not have an example. I don't have any examples that we're going to go through for that, because that's what your homework will show you. You'll see that example. It will be quite clear if you can do the homework correctly. Let's go to the one we skipped over, cipher feedback mode. Actually, no, we had one more, counter mode. Then we'll come back to cipher feedback mode, sorry for skipping through. Counter mode. Quite simple. Take some counter value, let's say we initialize it to zero, so 64 bits, 64 zeros. Encrypt that value using our cipher, say DES. We get some ciphertext as an output. If we encrypt some value, we should get some random-looking value as an output. XOR with the plaintext and we get our ciphertext block. Now, increment the counter from zero up to one in decimal. Encrypt the next value and we get some random-looking output. XOR with plaintext two, we get ciphertext two. Then increment the counter to two and do the same for the next block, and so on. Quite simple, and it can be implemented to perform much faster than the others because we can implement this in parallel, because the second stage does not depend upon the output of the first stage. So let's say you have a computer with four CPUs or four cores, a quad-core computer. Then what you could do is this: you have your plaintext that's known in advance, and you can calculate what the counter values will be, because you know in advance the first value will be zero, then one, then two, and three. And then on one CPU or one core of your computer, you can encrypt the first block, and at the same time you can be encrypting the second block on the other core of your computer. So you can do it in parallel because this stage does not depend upon the output of this stage. So that's one advantage of the counter mode of operation. It's quite simple and can be implemented quickly or implemented in parallel. And it's considered just as secure as the others.
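A minimal sketch of counter mode, showing why the blocks can be processed independently. The assumptions are loud ones: `encrypt_block` is a SHA-256 based stand-in for the DES encrypt box (counter mode only ever uses the encrypt direction), the counter starts at zero as in the description above, and the thread pool is only there to show that no stage waits for another. Python threads will not necessarily give a real speed-up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8  # bytes per block (64 bits)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for the block cipher's encrypt operation (hypothetical, not DES).
    return hashlib.sha256(key + block).digest()[:BLOCK]


def keystream_block(key: bytes, counter: int) -> bytes:
    # Encrypt the counter value itself; the result looks random.
    return encrypt_block(key, counter.to_bytes(BLOCK, "big"))


def ctr_sequential(key: bytes, blocks: list) -> list:
    # Block i is XORed with Encrypt(counter = i); the same call decrypts.
    return [xor(b, keystream_block(key, i)) for i, b in enumerate(blocks)]


def ctr_parallel(key: bytes, blocks: list, workers: int = 4) -> list:
    # Every counter value is known in advance, so each block can be handed
    # to a different worker without waiting for the previous block.
    def one(item):
        i, b = item
        return xor(b, keystream_block(key, i))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, enumerate(blocks)))


key = b"demo key"
plain = [b"block #1", b"block #2", b"block #3", b"block #4"]
cipher = ctr_parallel(key, plain)
print(cipher == ctr_sequential(key, plain))   # same result either way
print(ctr_sequential(key, cipher) == plain)   # XOR again to decrypt
```

In a real deployment the counter values must never repeat under the same key; starting at zero here just mirrors the lecture example.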
Whereas with the others, we see that the output, or part of the output, of the encrypt operation of the first phase is needed as the input of the second phase. So we cannot run the second phase until this encrypt operation is finished. So implementing in parallel is not so beneficial in this case, because we have to do this encrypt and then we can do the next phase. And once we've done this, then we can move on to the third phase, and we'll see that's the same with the other modes of operation, except ECB and counter mode. Note that in the implementation, what's the slowest part in all of these modes of operation? What's the slowest part in this diagram? Well, what are the operations you need to look at in all of these? This is just one example of the encrypt. The slowest parts are these grey boxes, the encrypt or decrypt operations. That is, applying DES, triple DES or AES or whichever cipher we choose. Compared to performing an XOR or reading the input or reading the plaintext, this operation is by far the slowest. So we'd like to implement the encrypt operation in parallel, say on different CPUs or on different cores of a CPU. It's not possible in this case because this one, to run, needs the output of the previous one. So we're just identifying some of the advantages and disadvantages of the different modes of operation. Let's return to one more. Those ones we've seen examples of before; the one we skipped over is cipher feedback mode. It looks complex, but in fact it's not too hard, it's similar to the others. It's a way to turn our block cipher into a stream cipher. So it's commonly used for encrypting a small set of bits at a time. Let's say our block has 64 bits and we're using DES as the encrypt operation, a 64-bit block cipher. We can apply cipher feedback mode to encrypt a stream of bits. What's the difference between a stream and a block cipher? Well, in practice, with a stream cipher we need to encrypt faster.
For example, in real-time applications, as the data is generated, as the plaintext is generated, we need to encrypt it and send it across a network, like encrypting your voice on the phone. But with a block cipher, we encrypt a block at a time. And first, the cipher itself, the encrypt block, may take time. And second, we must wait for 64 bits before we can encrypt. Whereas with a stream cipher, if we just encrypt 8 bits at a time, we encrypt those 8 bits, send them across the network, then encrypt the next 8 bits and send them. With a block cipher, we'd have to encrypt all 64 bits and then send. It adds an additional delay in sending the data across the network. Even the difference between 8 bits and 64 bits can be significant for some real-time applications. So with a stream cipher, we normally encrypt, say, one or up to usually 8 bits at a time. So let's say we have a 64-bit block cipher, DES, for example. With cipher feedback mode, we take some initialization vector, some initial value we choose. We encrypt using DES and some key. We get 64 bits as an output. Then we select the leftmost bits that we want. So if we want just the first 8 bits, because we want to encrypt 8 bits of plaintext at a time, and we've got 8 bits of plaintext, then we select the first 8 bits of the output and simply XOR. And we know an XOR can be implemented quickly, so when 8 bits are generated, we take 8 from here and XOR and we've got the ciphertext. So the idea is not to use all bits of the output of the encrypt operation, just a selection, generally S bits. So select the leftmost S bits and discard the remaining bits, where S is the size of the plaintext that you want to encrypt in every stage. And then for the next phase: we started with some initialization vector, let's say all zeros, as an example. Instead of keeping that same value, we shift it along and at the rightmost end put the S bits of ciphertext that we just created. So it's a shift register. We have our B bits.
We discard the leftmost S bits and attach these S bits to the end. And then encrypt again, select the leftmost S bits, XOR and keep going. And what we're doing is we're encrypting S bits of plaintext at a time in each stage, not the full 64 or full B bits. Any questions on how to do cipher feedback mode? Of course, S and B are variables here. The shift register box is the order of N, KIP. I will not go through a full example, but a quick one. Let's say, for simplicity, we've got a four-bit block cipher. So let's say B equals four and S equals two. Then the initialization vector, let's say I choose it to be four zeros. I encrypt, so I apply my cipher. I'll get some output. Let's say the output, which I just randomly choose at this stage, is zero one zero one. It depends upon the cipher and the key used. Then I take the first two bits and I XOR with the plaintext. Let's say the first two bits of plaintext were one one. One one XOR zero one, we get one zero. So that's C1. And then I feed that value back. So now we treat this as a register or a space in memory and we shift it along so that these two bits will come to the end. The first two bits disappear or are discarded and these two bits are moved to the left. And then we apply the same step. We encrypt this and we'll get some output, whatever it is. I just made that value up. S bits, two in this example. No, in general it can be any value up to B. So B in my example is four, a four-bit block cipher. We want to encrypt a stream of two bits at a time. Another example: maybe we're using a 64-bit block cipher, B equals 64, and we want to encrypt eight bits at a time. So S would be eight in a more realistic scenario. So we selected S bits, or two bits, and then add them to the end here, and that shifts these along. Encrypt, we get some value. Take the first two bits, XOR with P2, and whatever the value of P2 is, we'll get some output.
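The shift-register steps with B = 4 and S = 2 can be traced in code. This is a sketch with made-up values: `TOY_CIPHER` is an invented 4-bit lookup table chosen only so that encrypting 0000 gives 0101 as in the example, and the second plaintext segment (01) is an assumption, since the lecture leaves P2 unspecified.

```python
B, S = 4, 2  # register (block) width and segment size, in bits

# Hypothetical 4-bit "block cipher": a fixed permutation of 0..15 standing in
# for encryption under some key. Entry 0 is 0b0101 to match the example.
TOY_CIPHER = [5, 12, 9, 0, 7, 14, 3, 10, 1, 15, 6, 13, 2, 8, 11, 4]

MASK = (1 << B) - 1


def cfb_encrypt(iv: int, segments: list):
    register, out = iv, []
    for p in segments:
        keystream = TOY_CIPHER[register] >> (B - S)  # keep the leftmost S bits
        c = p ^ keystream
        out.append(c)
        # Shift left by S bits, dropping the oldest bits, and append C.
        register = ((register << S) | c) & MASK
    return out, register


def cfb_decrypt(iv: int, segments: list) -> list:
    # Decryption also uses the cipher in the encrypt direction only.
    register, out = iv, []
    for c in segments:
        keystream = TOY_CIPHER[register] >> (B - S)
        out.append(c ^ keystream)
        register = ((register << S) | c) & MASK
    return out


ciphertext, register = cfb_encrypt(0b0000, [0b11, 0b01])
print([format(c, "02b") for c in ciphertext], format(register, "04b"))
# prints ['10', '11'] 1011
```

The first stage reproduces C1 = 11 XOR 01 = 10, and after the second stage the register holds the last two ciphertext segments.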
Let's say the output was one, one. Then what's the next value here? So the first two bits are discarded. These two shift along, so it becomes one, zero, and these two bits are added to the end, one, one. And we just keep going like that. That's the idea of that shift register. So we've gone through some different modes of operation. There are others. These are not the only ones. CBC, we'll see, is quite popular in practice. There are others available. Some are used for special purposes or special applications. But we'll see, for example, CBC is used in file encryption and hard disk encryption, and then there are the specialized ones built for individual ciphers. Counter mode is used in some network applications, and there's some summary of each of those given there. Again, I think you don't need to remember how they work. If you looked at past exams, sometimes there's a question... actually, maybe you do have to remember. I don't think so. For example: here's a picture of CFB, decrypt this ciphertext given this key. Similar to the examples we went through last week in the lecture. And maybe more importantly, you should be able to compare some of the advantages and disadvantages of them. And your homework will point out two or three of those trade-offs. And that's all I want to say about those modes of operation. Counter mode we covered. They all have about an equivalent level of security, except ECB. ECB is not good in that when you have a large plaintext, you may get repetitions in the ciphertext. And that's bad. The rest we're going to skip over. It's almost a repetition of what we've already talked about, I think. Just some pictures that do not help much. And the last one, again, we will not cover. We'll see that AES is now a very common cipher. So we spoke about DES. But we said there was a flaw with DES. The key length is not good. And then they implemented triple DES. So we've got a longer key, which is good.
But it turns out triple DES is not so good in terms of performance. So we mentioned that the standards organization in the US developed the Advanced Encryption Standard, AES. Long key length, 128 bits and up. And it's generally considered secure. And it's used in a number of different applications. There are some special modes of operation that are designed to work just with AES. And one of them is called XTS. We're not going to explain or cover how that works. You could look in your own time if you want to. What we will do is have some quick examples of the ciphers. Any questions as I prepare for the next topic? Let's find an example. You don't have this, but it's just some notes on some examples of ciphers in use. So for example, disk encryption. Most operating systems today allow you to encrypt your entire disk. So from the user's perspective, there's nothing different. You do not see anything, except if someone steals your laptop, for example, then they need your password to access what's on the disk, because everything's encrypted on the disk. That's an example of full disk encryption. Without the password, even if they take out the hard disk, they cannot read the contents because it's encrypted. So here are some examples of disk encryption on different operating systems and what ciphers they use. They all use AES by default. So these are some of the default configurations. In Ubuntu Linux, for example, there's software called dm-crypt. That by default uses AES with a 256-bit key. AES, in fact, can be used with different length keys. And the mode of operation it uses by default is cipher block chaining, CBC. macOS, I understand, has something called FileVault. And that uses AES, 128-bit key, and it uses this specialized XTS, which is built just for AES. Windows has BitLocker, also using AES, 128-bit key, CBC. So you see, just in this quick example, AES is the default block cipher used in current applications. In most of them, you may be able to change the cipher.
But if you do not make any changes, this is what will be used. What about another example? What about wireless LANs? We all use wireless LANs. There's three. In wireless LANs, you can either have your data encrypted or not. If it's not encrypted, anyone can intercept and see what you send. So it's very easy to see what I'm sending when I'm accessing the wireless LAN in SIT because there's no encryption. When you choose encryption, there are, in fact, different protocols available. The original encryption protocol for wireless LANs was called Wired Equivalent Privacy, WEP. That was the original one. It turned out that it was not so secure, in that if you used it, there were some known weaknesses, and it takes a few minutes to find your key. It uses RC4, and we're going to see an example of RC4 and go through some details in the next topic on stream ciphers. RC4 is a stream cipher. And there are two variations. There was a 64-bit RC4, which used a 40-bit key that the user chose and a 24-bit known initialization vector, and then the 128-bit version, which uses a 104-bit key. We'll go through RC4 and maybe, if we have time, return to the problem with WEP. It's related to RC4 and how it's used. So since WEP wasn't considered secure for wireless LANs, the organizations who developed the standards developed improvements, and the common one used or recommended today is Wi-Fi Protected Access version 2, WPA2. It uses AES in counter mode, and AES is a 128-bit block cipher, and it uses a 128-bit key. So just an example again. The Advanced Encryption Standard is common in practice. Any more do I have? What about web browsing? Another example: you access a secure website. You log into Facebook, and normally, or if it's configured, it will be HTTPS colon slash slash. So that S is indicating you're using a protocol called SSL, Secure Sockets Layer, or now known as Transport Layer Security, TLS.
We're going to cover those protocols and secure web browsing after the midterm, towards the end of the course. What ciphers do they use? In fact, it depends upon the client and the server in this case. For example, for web browsing, it's up to your web browser and the web server to choose a cipher. So what normally happens is that the web browser contacts the web server, and in contacting the web server, it sends some hello message. And in this hello message, the browser indicates what ciphers it supports, so a list of ciphers. And then the server, which supports some ciphers, chooses one that, of course, both the client and the server support, and in some response message tells the client which cipher to use. So that's done. It's auto-negotiated when they connect. So on different operating systems, different browsers and servers may support different sets of ciphers. This morning, I captured the traffic from my Firefox browser to Google, and I looked at the packets, this request, this hello message from client to server, and the response back. Hard to see. Let's see if we can. This is the packet. It's taken from Wireshark in some text format. It's one packet. It's using Secure Sockets Layer over TCP. This is the list. You don't have to understand it all. This is the list of ciphers that my browser supports. In fact, there are different parts of the security setup. There's both encryption and things to ensure the integrity of data and to authenticate the users. And different algorithms are used for data confidentiality (encryption), data integrity, and authentication. So the first one's an empty case. There's no cipher. The second one, AES 256-bit key, CBC. So that's identifying the cipher used for encryption in that case. AES 256-bit key with cipher block chaining. The other parts of that string indicate the algorithms used for authentication and for data integrity.
There's a hash algorithm, SHA, and there's authentication performed using DSA. We're going to cover hash algorithms and the authentication techniques again later, after the midterm. But my browser supports different block ciphers: Camellia, AES, well, not many other ones, and RC4. It supports RC4 for encryption, and AES with a 128-bit key. So in fact, this is given in order of preference of my browser. I would prefer to use AES 256-bit key with CBC. But if the server doesn't support it, I can support any of the others. This is sent to the server. The server sends back a response and selects one of them. So it selected RC4 with a 128-bit key in this case. So it didn't select AES. It depends upon the server. So we see AES, RC4, and some other ciphers are commonly used in the everyday things that we do which involve encryption. The server chooses it based upon the preferences of the client and based upon what the server supports. So the server has some software that implements a selection of ciphers. It depends upon the software that the server runs as to what ciphers the server implements. Does it implement AES? Maybe, maybe not. If not, then, of course, they cannot choose AES. In this case, it implements RC4. And the server's preference in this case is to use RC4, 128-bit key. That may be because of performance. RC4 as a stream cipher may be faster than AES in this case. So it depends upon the software that's running as a web server. Assuming the server's set up the same, yes. When the client contacts that server again, it will use the same cipher. The server administrator may be able to change the preferences. If they change the preferences, then maybe it would swap to something else. But in the same scenario it should pick the same cipher. So those are just some quick examples of ciphers used in practice, things that you may use on a regular basis. I'll put those notes on the website. No need to copy them down. What's next? Stream ciphers.
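The negotiation just described, where the client offers a list and the server picks one that both sides support, can be sketched in a few lines. The suite names below are illustrative labels, not real TLS cipher suite identifiers, and `choose_cipher` is a hypothetical helper. It assumes the server walks its own preference order, which matches the server here preferring RC4; a server could equally honour the client's order.

```python
def choose_cipher(client_offer, server_preference):
    """Return the first cipher in the server's preference order that the
    client also offered, or None if the two sides have nothing in common."""
    offered = set(client_offer)
    for suite in server_preference:
        if suite in offered:
            return suite
    return None


# Client list in its own order of preference (illustrative labels only).
client_offer = ["AES-256-CBC", "CAMELLIA-256-CBC", "RC4-128", "AES-128-CBC"]
# This hypothetical server implements two ciphers and prefers RC4.
server_preference = ["RC4-128", "AES-128-CBC"]

selected = choose_cipher(client_offer, server_preference)
print(selected)  # prints RC4-128
```

Because the function is deterministic, the same client contacting the same unchanged server gets the same cipher every time, and the result only changes if the administrator reorders `server_preference`.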
Stream ciphers, and before we get to them, random numbers or pseudo-random numbers. Why do we want to talk about random numbers? Note that with our ciphers, when we encrypt some plaintext which has structure, the goal is to get some output ciphertext which has no structure, which is random. So if we can have a cipher that produces a random sequence of bits, a random string, then we can use that to encrypt information. And we see that it's in fact similar to what's used in the modes of operation. What we did, in some cases, is take some initialization vector and encrypt. The output of that encrypt should be some random sequence of bits. Then we XOR with the plaintext. So simply by taking plaintext and XORing with some random sequence, the ciphertext should have random characteristics. It should have no structure. So random numbers are important in cryptography. In our ciphers, we'd like to produce random strings or random numbers as output. Let's discuss a little bit about how we produce random numbers. And then see how they're used in stream ciphers, finishing on RC4. Choose a random number between 0 and 100. And tell me. Eight. How did you choose that? Now, implement a piece of software that does that. Get your computer to choose a random number between 0 and 100 without using any libraries. What do you do? When I say without using any libraries, you can't call the RAND function. Implement that RAND function. How do you do that? Relate it to the time. The time's not random. The time is sequential. What could you do with the time? How do you get your computer to choose a random number between 0 and 100? It's very hard. Without some physical source of randomness, how does it do it? A computer is programmed. A computer does what we tell it to do. So we must write some code, implement some random number generator. And it must be some algorithm that will follow some deterministic steps.
We know what steps it will take in advance. So when we write a random number generator in software, it is not truly random, because we know what steps it will take to produce the output number. It's what we call pseudo-random. It is not truly random. It is producing an output which looks random, close to being random. So producing random numbers for computers, by computers, is a difficult task. But it's important for cryptography, because random numbers are used in many different places in security. Where do we use random numbers? When we want to distribute keys, although we haven't spoken about the details. Often when we want to distribute keys, first we may need to generate keys, to choose a key to encrypt using DES. Well, you shouldn't choose the key 0, 0, 0, 0, 0, 0. It's better to choose a random key. Now, get your computer to generate a random number. So we need some algorithm to create random numbers. So random numbers are used in the generation of keys for different ciphers, RSA is one. We'll see later that in distributing keys and authenticating users, random numbers are important, and in stream ciphers. A stream cipher is essentially: take a random sequence, XOR with the plaintext. For that to work, we need a random sequence. So we need a random number generator. So random numbers are important in security. What do we mean by random? Well, there are different ways to measure whether a number, or maybe easier, a sequence of bits, is random. Consider a long sequence of bits, say 64 bits. Then the question is, is that sequence random or not? Well, one characteristic you can look at is the distribution of the zeros and ones. If I write down this sequence of 10 bits, does it look random? And let's say I have an algorithm that generates this sequence of 10 bits, 10 ones. And I apply the algorithm again, and I get another sequence. And I keep applying my random number generator, and I get sequences, and I can keep going.
So if we take the output of some random number generator, the output, we'd like to have a uniform distribution of zeros and ones if we're dealing with binary. In this case, we would say the output is not random. Because what about the cases where we have zeros at the start? In this case, we have more ones than zeros in all of them. Here we have 10 ones and zero zeros. Here we have 7 and 3, 8 and 2, 9 and 1, 9 and 1. I would say that those sequences are not random, or at least not 10-bit random numbers. Because if something's random, in binary, we'd like to have the equal number of zeros and ones. So the distribution of the characters, if we deal with binary zeros and ones, should be approximately equal. In some cases, they will not be equal, but if we take many different values from the same random number generator and take the average, then we should get 50% ones and 50% zeros. Here, that's not true. We get many more ones than zeros if we took the average of these cases. Another measure of randomness, if we have a sequence of zeros and ones, is the independence of the bits in that sequence. We'd like the second bit to be independent of the first bit. Another way to draw that, or give an example, let's say I choose, I have a random number generator that generates numbers between zero and nine, generates decimal numbers between zero and nine. If that random number generator generates this sequence, do you think it's random? Well, no, there's some structure here, zero through to nine. We'd say that, well, there's some dependence between this number and the previous number. We can see some pattern there. This is one more than the previous number. A random number sequence should be... The characters should be independent of other characters. So if this was shuffled in some way, I'm missing some number, one, two, six. What's the dependence between the second number and the first number? Well, it's not so easy to see a pattern in this case. 
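The two measures discussed here, the balance of zeros and ones and the dependence of a value on the previous one, can each be turned into a rough check. The sequences below are made up to match the counts mentioned (10, 7, 8, 9 and 9 ones out of 10 bits), and the shuffled digit sequence is also invented. These are crude indicators only, not proper statistical tests.

```python
def ones_fraction(sequences):
    """Fraction of 1 bits across a list of bit strings such as '1101'."""
    total_bits = sum(len(s) for s in sequences)
    return sum(s.count("1") for s in sequences) / total_bits


def successor_fraction(values):
    """Fraction of positions where a value is exactly one more than the
    previous value, which is a very simple sign of dependence."""
    pairs = list(zip(values, values[1:]))
    return sum(1 for a, b in pairs if b == a + 1) / len(pairs)


# Hypothetical generator outputs with far more ones than zeros.
biased = ["1111111111", "1101101101", "1111011011", "1111111011", "1011111111"]
print(ones_fraction(biased))  # well above the 0.5 we would expect

counting = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # obvious pattern
shuffled = [3, 1, 6, 2, 9, 0, 5, 8, 4, 7]   # invented shuffled order
print(successor_fraction(counting))  # prints 1.0
print(successor_fraction(shuffled))  # prints 0.0
```

A low score on the second check does not prove a sequence is random; it only says this one particular pattern is absent.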
We may consider this sequence random in terms of independence of characters from other characters. So we shouldn't be able to work out what, for example, the next value will be from the previous values. It should be independent of the previous values. So that's two ways where randomness can be measured in a sequence of bits or a sequence of numbers or values. And that's summarized here. A random number generator should produce a sequence of numbers which is unpredictable. It's hard to predict the next value in the sequence. In the first example, I would say it's predictable in this case. If you see this sequence, you guess if the value is four, if the current value is four, then the next value is most likely going to be five. Whereas in this case, it's hard to predict what the next value would be given one of the previous ones. So how do we get a computer to generate random sequences? We can use what's called a true random number generator. A true random number generator has some non-deterministic source, some source of information that the values are not predictable. Usually, almost always, from some physical environment. Some examples, detecting radiation events in the atmosphere or off some device. In electronics, detecting the energy coming from circuits or across capacitors. Noise coming from electronics. The noise generated from... So we pass electricity through the different devices inside your computer. There's some noise generated. If we can measure that noise, it's known that the noise exhibits random characteristics. We cannot predict what the future value of that noise will be. Same with radiation. If we're detecting radiation in some environment, it's known to be random or as random as we can measure. To do this, we need some physical device to measure the environment. Most computers do not have devices to measure the radiation surrounding us. They do not have devices to measure the noise coming from the circuits inside the computer. 
Mobile phones don't have devices to measure things about the physical environment. So to generate true random numbers, we need some input from the physical environment around us. Some input that exhibits true randomness. A very close approximation to randomness is to take a combination of some of the inputs from the user and from the different devices on a computer. So if you think about my laptop, what's it doing? Well, the hard drive is performing operations all the time. It's performing read and write operations. When the user is using it, the mouse is moving. The keyboard buttons are being pressed. There are different input and output devices, so the USB and so on. So there are a number of operations. If we can measure all those operations, then in some cases, if you combine all of them, you get random behavior. Not just looking at one of these, not just looking at the behavior of the mouse, but combining all of them: from your computer's perspective, you can see that that exhibits very close to randomness in terms of what's happening inside your computer. If we can use that information, then we could use it to generate a random number. If we have something that exhibits random behavior, and we measure it, then we can get random outputs from it. The problem with these sources is that either we need special devices to measure what's happening in the physical environment, or, if it's our computer, then even though we can use them to generate random numbers, they usually cannot generate a large set of random numbers. So there are algorithms to measure all the keyboard presses, all the mouse clicks and the movement of the mouse, all the hard disk operations and other input/output operations, and combine all that information. If you look at that information, it looks random, and from that, you get a random number as output.
And the problem with that is that the number of values it produces is usually quite small. Most operating systems have a feature to get a random number from these events happening on your computer. We may see an example later. But often in security we want a very easy way to generate many random numbers, usually large random numbers. By small I mean, let's say, tens of bytes per second. Often we want to generate thousands or hundreds of thousands of bytes per second. And we need to do it quickly, and any computer should be able to do it without any external hardware. Therefore, we have pseudo-random number generators. We implement software, or hardware, that applies algorithms to calculate random numbers. They are not true random numbers because they are deterministic: because we use an algorithm, anyone who knows the algorithm and its input can determine in advance what the values will be. But we'll see that in most cases, if we do it well, we can get close to randomness, and enough randomness for most applications. So we're going to focus on some of those pseudo-random number generator algorithms, algorithms to calculate numbers that are relatively random. For example, an algorithm that calculates this sequence would be a pseudo-random number generator. We will see that the algorithm has an input. That input is called the seed of the algorithm, or the seed of the number generator. And the algorithm produces a continuous stream of random bits. We can look at the sequence from the perspective of decimal numbers or letters, but normally it is a stream of bits, zeros and ones. What's down the bottom there? There's also a pseudo-random function, which is the same as a pseudo-random number generator except that it produces a fixed-length sequence of bits rather than a continuous stream. Let's look at the random number generators.
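As a rough illustration of the two ideas in this part, here is a minimal Python sketch. It assumes Python's standard `os.urandom` as the operating system feature for random bytes, and uses `random.Random` (the Mersenne Twister, which is not suitable for security purposes) purely as a stand-in pseudo-random number generator. The variable names are made up for the example.

```python
import os
import random

# Ask the operating system for 16 bytes from its pool of collected randomness.
seed_bytes = os.urandom(16)
seed = int.from_bytes(seed_bytes, "big")

# A pseudo-random number generator is deterministic: two generators started
# from the same seed produce exactly the same stream of values.
prng_a = random.Random(seed)
prng_b = random.Random(seed)
stream_a = [prng_a.getrandbits(8) for _ in range(10)]
stream_b = [prng_b.getrandbits(8) for _ in range(10)]

print("seed:", seed_bytes.hex())
print("same stream from same seed:", stream_a == stream_b)
```

The point of the sketch is only that the seed is the one unpredictable input; everything after it is calculated.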
This is a comparison between those three. A true random number generator takes some physical source of true randomness, something from the environment, from nature, measures it and converts it to binary. And the binary output is a random number, a random bit stream. Whereas in the other two, we use some algorithm. We take as an input some seed, some initial value, apply an algorithm, and get as output some stream of bits. We usually take that output and also use it as an input, and keep doing that in some iterative manner. And we get a continuous stream of bits. The difference with a pseudo-random function is that we usually get a fixed-length value as the output, as opposed to a continuous stream of bits. We're going to focus on the middle one, the pseudo-random number generator. So now what we need is an algorithm that we can implement and that will produce a sequence of bits or numbers that appears random. So how can we test that it's a good algorithm? Well, we test the output and see if it appears random. What we require of a good random number generator is that if you know the algorithm but you don't know the seed, it should be hard to work out what the output stream is. There are some things we can measure to determine if that's the case. For example, we can count the zeros and ones. If my random number generator produces one million bits as output, I would expect about half of them to be ones and half to be zeros. That's one measure of the randomness. And it's not just over the whole one million bits. Let's say I take the first 100 bits. In those first 100 bits, I would expect approximately half, about 50 to be zeros and 50 to be ones. So in sub-sequences of the total sequence, we'd expect the same characteristics. So there are ways to test those measures of randomness.
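The counting test just described can be sketched in a few lines of Python. This is only an illustration: the generator is Python's built-in `random.Random` with an arbitrary fixed seed of 2024, and `ones_fraction` is a hypothetical helper name, not something from the lecture.

```python
import random

def ones_fraction(bits):
    """Return the fraction of the given bits that are equal to 1."""
    return sum(bits) / len(bits)

# Fixed, arbitrary seed so that the run is repeatable.
rng = random.Random(2024)
bits = [rng.getrandbits(1) for _ in range(1_000_000)]

whole = ones_fraction(bits)            # over all one million bits
first_100 = ones_fraction(bits[:100])  # over a short sub-sequence

print(f"fraction of ones in 1,000,000 bits: {whole:.4f}")
print(f"fraction of ones in first 100 bits: {first_100:.2f}")
```

Both fractions should come out near 0.5, with the short sub-sequence wandering a little further from it than the full million bits.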
The frequencies of zeros and ones, and the runs. A run means a continuous sequence of the same bit. So here we have a run of seven ones. In a long sequence of bits, the number of runs of seven ones should be about the same as the number of runs of seven zeros. And the number of runs of three ones should be about the same as the number of runs of three zeros. Another way to measure randomness is whether you can compress the sequence. Compression relies upon structure in the input. If there's no structure in the input, it should not be possible to compress it. So if you apply some compression algorithm to a random input, the output will not be smaller than the input. And you could try that. Generate a large random file and then try to compress it with zip. See how large the output is. It should be about the same size as the input. So that's one way to test the randomness: try to compress it. If it compresses to something very small, then it's not very random. Compression takes advantage of structure. Let's say we have text and we can find a pattern in it, such as in the word hello. If the input is all letters, we could have a rule that every time we find two Ls together, we replace them with some special character. So hello becomes H, E, percent, O, where we know the percent means two Ls. We've just reduced the size from five characters down to four. That's the idea in compression: find structure in the input and replace it with smaller values that represent that structure. So if we try to compress a random input, there should be no structure, and therefore we should not be able to reduce the size. Our random sequence should also be unpredictable. Given the current value, it should be hard to predict the next value in the sequence, and it should also be hard to predict the previous value. That's forward and backward unpredictability.
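The runs check and the compression check can both be tried with a short Python sketch. The assumptions here are mine: `os.urandom` stands in for "a large random file", `zlib` stands in for zip (it uses the same family of compression), and `count_runs` is a hypothetical helper written for this example.

```python
import os
import zlib
from itertools import groupby

def count_runs(bits, value, length):
    """Count maximal runs of `value` that are exactly `length` bits long."""
    return sum(
        1
        for bit, group in groupby(bits)
        if bit == value and len(list(group)) == length
    )

# 100,000 random bytes from the operating system, unpacked into 800,000 bits.
random_bytes = os.urandom(100_000)
bits = [(byte >> shift) & 1 for byte in random_bytes for shift in range(8)]

runs_of_three_ones = count_runs(bits, 1, 3)
runs_of_three_zeros = count_runs(bits, 0, 3)

# Highly structured input for comparison: the same word repeated many times.
structured_bytes = b"hello " * 16_000

random_ratio = len(zlib.compress(random_bytes)) / len(random_bytes)
structured_ratio = len(zlib.compress(structured_bytes)) / len(structured_bytes)

print("runs of three ones: ", runs_of_three_ones)
print("runs of three zeros:", runs_of_three_zeros)
print(f"compressed size / original size (random):     {random_ratio:.3f}")
print(f"compressed size / original size (structured): {structured_ratio:.3f}")
```

The two run counts should be close to each other, the random data should not shrink, and the repeated text should shrink to a tiny fraction of its size.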
That is, here's my sequence. What we'd like is that if you know the current value is five, it should be hard to predict what the next value in the sequence will be. I've shown you the next value here, so you can see it. But let's say I have numbers between zero and 100 and I have in my head a way to generate them. What's the next value? Well, if it's truly random, it should be hard to predict the next value given the previous values. What's the next value? That's a nine. And in the other sequence, 77: it's just plus 19 each time. One plus 19, plus 19, plus 19. So in this case I had 1, 20, 39, 58. You can see a pattern there and you could predict the next value, so it's not random. But in this case, predicting the next value is hard. We'd consider that better in terms of randomness. In a number of applications, we'd also like the seed value, the input, to be random. Going back, the seed is an input to our pseudo-random number generator. In some security applications, to get a better random sequence where it's hard to predict what the initial value was, the seed should be random itself. And how do we get a random seed? Use a true random number generator. So combine the two. The problems with true random number generators are that either you need some physical device to measure the randomness, or they produce only a small amount of output per second. But if we only need to produce a seed, which may be, say, a 64-bit value, then we can use a true random number generator for that and then use the seed to produce a long sequence of random bits, as shown here. A true random number generator generates a seed, and then the seed is used in a pseudo-random number generator. That's needed in some cases. Before the break, let's look at one random number generator: the linear congruential generator. A very simple one.
Just to demonstrate, here we have an algorithm, an equation that we can use to generate pseudo-random sequences. Here we're generating a sequence of decimal values. We have some parameters: a modulus m, a multiplier a, an increment c, and an initial value, the seed x0. We choose values of those four parameters and then calculate the next value in the sequence: multiply the current value by a, add c, and take the result mod m. So knowing x0, a, c and m, we calculate x1. And then to get the next value, x2, we take x1, multiply by a, add c, mod m. And we just keep applying this algorithm until we no longer need random numbers. Let's do some examples on the board. The inputs are the values of those parameters. Let's start with a very simple example. The seed is 23, some random value. Generate the first three values: find x1, x2, x3. Anyone have an answer? So use our pseudo-random number generator, this equation, with those parameter values and generate, in this case, just the next three values of x. Ignore this part here for now, we'll discuss that later. That's our equation. So x1: a is one, so we multiply the previous value, 23, by one, add one, and then mod 100. What do we get? 24, easy. x2 is 24 plus one, mod 100, which is 25. And x3 is 26. So our x values are 23, that is the seed, then 24, 25, 26. Where's it going to get to? 99, and if you keep going it'll come back to zero, one, and so on. Not a good random sequence, okay? This is a bad case. We're using this random number generator, but with these parameters it doesn't work very well. It produces a sequence, but we just increment by one. If you knew the current value was 25, you would predict the next value is 26, easy. Let's try a different set of values.
Let's set c to zero to make life easier, a to be 7 and m to be 32. And let's set our seed, x0, to 1. What's x1? a equal to 7 is our multiplier, so we get 7 times x0, which is 7, mod 32. So the answer is 7 in this case. What's x2? We take x1, 7, multiplied by our multiplier, 7. We get 49. We're not adding anything now. 49 mod 32 is 17. Then x3: 17 multiplied by 7 is 119, and 119 mod 32 is 23. Someone can check. And then 7 times 23 is 161, and 161 mod 32 is 1, because 5 times 32 is 160. So the next number is 1, and because we're back to 1, where we started, it repeats: 1 times 7 mod 32 is 7, 7 times 7 mod 32 is 17, then 23, and so on. So effectively our sequence has just four numbers and then it repeats. Let's note that our sequence really ends here and not worry about the rest. So with input values of c equal to 0, a equal to 7, m equal to 32 and a seed of 1, our random sequence is 1, 7, 17, 23. Here we get four numbers. What's the maximum number of values we could get, in theory? With m equal to 32, the maximum is 32. In theory we could have 0, 1, 2 up to 31, because we mod by 32. In this case we get just four values. And that's another measure of the randomness or the security of the sequence. We'd like to get as many values as possible. Here we get 4 out of 32. We'd like more. Ideally we'd like to use all 32 possible numbers. Let's try one more example and then we'll have a break, to see whether changing the values has some impact. We have a equal to 5. Still keeping it simple, c equal to 0, m still 32 and the same seed, x0 equals 1. x1 is 5, easy. And keep going. What's next? Our multiplier a is now 5. So it's 5 by 5, 25, mod 32, and we get 25. And I won't require you to do them all.
I've got the answers. After 25, 25 times 5 mod 32 is 29. Then, I'll give the answers, 17, 21, 9, 13, and then it goes back to 1. I calculated them before. So in this sequence we use 1, 2, 3, 4, 5, 6, 7, 8 values. We'd say that's better than the previous case because it uses more of the possible values. We've got 32 possible values, and here we get a sequence of 8 numbers. So a pseudo-random number generator generates a sequence of values, and they always repeat. That is, we get up to some value and then it comes back and repeats. So that sequence has some length, and we'd like that length to be as long as possible before it repeats. In these very simple cases, the best we've got is 8. What parameter can we change to make the length potentially longer? Our parameters are a, c and m. The answer is m, especially. Because with mod 32, in theory we can have a length of at most 32 numbers. Here we only got 8, but in theory we could have a maximum length of 32 and then we'd repeat. Make m large and you've got much more chance of having a longer sequence of pseudo-random values. So this algorithm is very simple, but choosing the right parameters is important, and one thing to do is make sure m is large. So to finish: the choice of a, c and m, as we saw through those three examples, is important. We saw one that produces 23, 24, 25 and so on. We saw one that produces just 4 values, and the third one produces 8 values. And the last two looked reasonably random. Can you see a pattern here? If you just look at the sequence of numbers, does it look random? Well, you know it comes from some equation, but if you look at just the sequence it's hard to see any pattern in there. So that's good. Of course we would like it longer. So to get it as long as possible, make m as large as possible. And what's the largest number?
Well, it's limited by your computer normally. So on a 32-bit computer, for example, make m a large prime number. One example is 2 to the power of 31, minus 1. Making it a prime number is good: when you mod by that prime number, it should give you different outputs at the end. If you make it an even number, then modding by the even number is not going to be as good as the prime in this case. And for the values of c and a, people have done some studies to find out what good values are. If, for example, c is 0, it makes life easy because you don't add anything. Then a good value of a turns out to be 7 to the power of 5, which is 16,807. So 16,807 times xn, mod 2 to the power of 31 minus 1: try that and you'll get a long sequence of random-looking numbers. So that's an example of a pseudo-random number generator. Let's show another example and a different algorithm after the break. Let's stop now and continue a bit later.
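The three board examples and the large-modulus suggestion can be reproduced with a short Python sketch of the linear congruential generator. The function names `lcg` and `values_before_repeat` are made up for this example, and the first example's parameters (a = 1, c = 1, m = 100) are read off the arithmetic worked in the lecture rather than stated explicitly.

```python
from itertools import islice

def lcg(a, c, m, seed):
    """Yield x1, x2, ... where x_{n+1} = (a * x_n + c) mod m and x_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def values_before_repeat(a, c, m, seed):
    """Return the seed and the values after it, stopping at the first repeat.

    Only intended for small m: it keeps every value seen so far in a list.
    """
    seen = []
    x = seed
    while x not in seen:
        seen.append(x)
        x = (a * x + c) % m
    return seen

# Example 1: just increments by one, so it is easy to predict.
first_example = list(islice(lcg(a=1, c=1, m=100, seed=23), 3))

# Example 2: only 4 of the 32 possible values are ever used.
second_example = values_before_repeat(a=7, c=0, m=32, seed=1)

# Example 3: 8 of the 32 possible values are used.
third_example = values_before_repeat(a=5, c=0, m=32, seed=1)

# Large prime modulus with a = 7^5 = 16807 and c = 0, starting from seed 1.
large_modulus = list(islice(lcg(a=7**5, c=0, m=2**31 - 1, seed=1), 3))

print(first_example)   # [24, 25, 26]
print(second_example)  # [1, 7, 17, 23]
print(third_example)   # [1, 5, 25, 29, 17, 21, 9, 13]
print(large_modulus)
```

Changing only `m` and `a` in the last call is enough to see how quickly the values stop looking like a simple pattern.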