 So, in the previous lecture, we finished on some example of a very, very simple encryption, the Caesar cipher. We'll return to that and define it. And this topic is classical encryption techniques. So we want to spend some time explaining what is encryption? What are the different ways that we can encrypt data? Because encryption is a key part of many security mechanisms. Many of the encryption algorithms in use today are quite complex. So it's hard to go through them and explain them. But the concepts that they use are similar to some of the very simple encryption algorithms and very simple and even very old encryption algorithms. And this topic, classical encryption techniques or classical ciphers, really means the old ones, which are no longer used today, but the concepts are still used. By old, maybe thousands of years old, all right, maybe not that old, but up until probably the mid-50s, mid-1950s, some of these ciphers were used. So we'll go through them with the intention of demonstrating some different concepts of encryption. And then in the next topic, we'll look at an example of a real cipher and we'll see how those concepts are applied. The classical ones are nice because we can do some examples by hand. With real ciphers, it's very hard to do an example by hand. You need computer software to do it. Before we go through some example ciphers, we'll just explain what do we mean by encryption and especially encryption for confidentiality. Our first topic we mentioned, there are six main services that we want to provide. We want to remember them. So maybe that's confusing. There were six attacks that we went through and then we also listed six services. So those services, data confidentiality, keep your data secret or confidential, authentication, make sure that the person you're communicating is who they say they are. Access control is to control who can access your network, like a firewall. We'll see an example of access control. 
Data integrity: make sure the data that is sent is not modified. What you receive must be identical to what was sent. If it's modified along the way, that's a problem. So we need data integrity. Then there's non-repudiation. Make sure people cannot deny that they communicated. I cannot deny that I received a message. You cannot deny that you sent me a message. That's non-repudiation. And what's the last one? Availability: make sure that our network or computer system is available for normal use. So those are six main services that we want to provide, especially in network security and generally in computer security. We don't always want to provide all of them, but usually a selection of them. We're going to look at confidentiality. That is, how do we keep things secret? And we said that one way, or the way really, is to encrypt the data first. So if we have confidential information, our aim is to make sure that people cannot access that confidential information if they're not authorized to do so. Only the people who are authorized can access that confidential information. And how do we do that? Well, we take the information or the original data and we encrypt it. Anyone may be able to find and see the output of the encryption, the encrypted data, but only authorized people should be able to decrypt and get the original data. That's what we want to do. Encrypt the data. Assume that anyone can see the encrypted form. You, me, the attacker can see the encrypted form, but the encryption should be such that only authorized people can decrypt. The attacker cannot. And we'll see that that relies on using some form of key, such that the authorized people are those that have the key. Someone with the key can decrypt. If you don't have the key, you cannot decrypt. So we'll introduce the concept of a key. Encryption for confidentiality is used for sending data across a network.
I want to send data from one computer to another, such that people who have the ability to intercept across the network, even if they do intercept, they will not be able to see my original data. And it's also used for file storage. I want to save data on my computer. So I encrypt a file such that if someone gets access to my computer, even though they have access to my files, they cannot decrypt it and get the original data. So it's used for networking and data storage. We commonly use examples about sending data to someone, but the same concepts apply for data storage. So our model for encryption is we have two normal users, A and B, Alice and Bob, for example. A wants to send confidential data to B. So they're going to send it across a communications link or a network. And we have another user, user C, who's the attacker in this system. So this is our model where A is going to send confidential information to B. We assume C is some malicious user, some attacker. Their aim is to find that original information. And we'll assume that the attacker has the ability to intercept and see any messages sent from A to B and vice versa in any direction. So assume the attacker can see the thing sent from A to B. In practice, that may mean that depends upon the network that's in use. So if A is my computer, B is a server in the US, then the attacker needs some ability to, somewhere between my computer and the server in the US, along the path that the data flows, needs some ability to intercept the message. In practice, it's not too hard for the attacker to do that. So we assume that they can intercept. So instead of sending the data as is to B, if A sends the confidential information to B, it's easy for the attacker to intercept and see that confidential information. That doesn't work. So we introduced encryption. Before A sends the data, they encrypt it. And the data or say a message that we want to send to B, we call that the plain text. 
Plain text in that it's not encrypted. It doesn't have to be text. It can be a picture, a video, but the name when we talk about encryption is plain text. So, user A has plain text that he wants to send to B. It doesn't want anyone else to know what that plain text is. Before they send, they encrypt the plain text. The encryption is some function that takes two inputs. The plain text is one input. A key is the second input, key one in this example, or in this case. So it's a function or an algorithm that takes the plain text, it takes a key, and it produces an output, which we'll call the cipher text. So the encrypted plain text is called the cipher text. And the cipher text is what is sent to B. So sent across the network is the cipher text, not the plain text. B receives the cipher text and then applies some algorithm called the decryption algorithm. It decrypts that cipher text. And to decrypt, we of course need the cipher text as input and another key. So this function takes two inputs. And if the algorithms are designed correctly and if the keys are used correctly, say if we do everything correct, then the decryption will produce the same plain text as what A started with. So that's how B gets the plain text. A has the plain text to send to B. They don't actually send the plain text. They send a cipher text version of that plain text. B gets the plain text by decrypting the cipher text. And the plain text that B gets after decryption must be identical to the plain text that A encrypted. Otherwise it's not achieving a goal of getting the plain text from A to B. So that depends upon really two things that we get the plain text. It depends on how the encryption and decryption algorithms work. We need algorithms such that that is true. And also depends upon what these keys are and how they are used. With respect to the keys, there are two different approaches really. One approach, A and B, the keys that they use are identical. 
So when we say key one and key two, in fact, the key that A uses is the same value as the key that B uses. The keys are the same, or we can say that the keys are symmetric across the two users. That form of encryption is called symmetric key encryption. A and B use the same key to encrypt and decrypt. And we'll focus on that first because that's the one that's been used for the longest time. It's the original form of encryption and was the only form up until the 1970s. But then a second approach was invented where we can achieve the same thing, but A and B use different keys. So when we say key one and key two, they're actually different values, but they're related in some way. There's usually some mathematical relationship between those keys such that A encrypts with one key and gets the ciphertext, and B decrypts that ciphertext with a different key and gets the original plaintext back. So there are algorithms for encryption and decryption and for creating the keys such that that will work. That's called asymmetric key encryption. The two sides use different keys. There's no symmetry between the keys. It's asymmetric. It's maybe better known as public key encryption. We're going to focus mainly on symmetric key encryption and then later, maybe towards the midterm, we'll look at public key encryption. What else can we say? So what about the attacker? A now sends ciphertext to B. The attacker can intercept and observe that ciphertext. So we assume the attacker knows the ciphertext. Their goal is to find the plaintext and/or the key. So if the attacker wants to find the plaintext, what they have is the ciphertext only. They need to work backwards somehow and take that ciphertext and get the plaintext. If you look at it from the perspective of the encryption function, the ciphertext is the output, the inputs are the plaintext and the key.
If the attacker knows, so one way is the attacker to work backwards to try and work out, well, given this output, what was the original input? Easier from the decryption side. Again, the attacker knows the ciphertext. The attacker knows that user B uses some key to decrypt and gets the plaintext. So if the attacker wants to know the plaintext as well, they need to know the decryption function and they need to know the key used here. If they do, it's easy for the attacker. So there's some knowledge that the attacker needs to be able to decrypt successfully. In practice, we normally assume the attacker knows the functions. It knows what decryption operation was used here. It's very hard to keep them secret from people because the algorithms need to be implemented in software and hardware. It's hard to keep the algorithm secret, how they're implemented, and it's also hard for the users to keep it secret as to what algorithms they're using. There are many algorithms to choose from, but normally we'll assume that the attacker knows the algorithms used by A and B. In that case, the attacker knows the ciphertext. They know the decryption algorithm. They want to find the plaintext, therefore to find the plaintext one way is to find the key. If they can find the key, then it's easy for the attacker. So we must make these algorithms such that the attacker cannot find the key and cannot find the plaintext. If you don't know the decryption algorithm but you know the key, what can the attacker do? You cannot find out what the decryption algorithm is from the key. The keys are usually random, for example, but in practice, there are not so many algorithms to choose from, okay? So it's likely that user B and A used one of the popular algorithms. Maybe there are 10 or 100 to choose from. So it wouldn't take long for the attacker to try them all and eventually get the right one, okay? 
So in practice, there are not many algorithms to choose from, so trying to hide which one you use has no benefit because the attacker could just try them all. Or the attacker could maybe find out in other ways what software B used to decrypt or what software A used to encrypt, and the software implements the algorithms. So we'll make some assumptions. And importantly, the attacker can find the ciphertext. That's a given for the attacker. The attacker knows the algorithms used. They know the encryption algorithm. They know the decryption algorithm. When I say the encryption and decryption algorithm, usually they are related or the same, okay? That is, they have the same name, but they're just slightly different implementations. So we usually just talk about the encryption cipher. The attacker doesn't know the keys, or at least in symmetric key encryption, it doesn't know the key used by A and B. And it doesn't know the plain text. It wants to find them. So that's the challenge for the attacker. So we need to spend some time looking at what these algorithms are. How do we encrypt? So we usually use this model and we'll see it come up through the course. That just lists some of those terms there. We've said plain text is the original message, ciphertext is the encrypted or coded message. Encryption is the process of converting plain text to ciphertext, sometimes called enciphering. Decryption is the reverse, restoring the plain text from the ciphertext. The key is some information that we use to encrypt and decrypt. And we'll talk about different cases where it's only known by the sender and/or receiver. The attacker cannot know the key, but we'll come back to the names of the key shortly. Cipher refers to a particular algorithm. So we say the algorithm is a cipher. Cryptography is the study of those ciphers, the study of the algorithms used for encryption. So the people who design new algorithms are really working in the field of cryptography.
Cryptanalysis is the study of the techniques for decrypting from the attacker's perspective. That is, if you don't know the plain text and you don't know the key, but you know the algorithm, how do you find the plain text? That's cryptanalysis, so the attacker performs cryptanalysis in some cases. And cryptology is the field that covers both cryptography, creating the algorithms, and cryptanalysis, breaking the algorithms. I think this states what we've probably said already, at least in part: the requirements for secure use of symmetric encryption. And symmetric encryption means both sides use the same key. We'll define that more in the next few slides. We need a strong encryption algorithm. A strong encryption algorithm is one where if the attacker knows that algorithm and they know the ciphertext, they cannot find the key or plain text. If that's true, then we say it's a strong algorithm. If it's easy for the attacker to find the plain text or key, then it's not a strong algorithm. It's weak and we shouldn't use it. We require that the sender and receiver know some key. So in our picture, for this to work, A must have a key and B must have a key, and in symmetric key encryption they're in fact the same value. So somehow they must know that value. So we assume that they do. And of course, if the attacker knows the key, they can decrypt. So therefore the key must be kept secret. If A and B have the key, but then they tell everyone else what the key is, it's no longer a secret key and the system fails. So when we say it's a secret key, you can't come along and say, oh, what if the attacker knows the key? Well, that's outside our assumptions. Our assumption is the attacker doesn't know the secret key. If the attacker knows the secret key, then it's not a secret key. It's just a key, okay? So we'll assume the attacker knows the cipher. And, well, A and B need to know the key. A uses a key to encrypt, B uses it to decrypt.
Let's say A chooses the key. A starts this communications, they have a plain text, they choose a key, a random key. They encrypt using that random key. They send the ciphertext to B. How does B know the key? How can B learn the key? If B needs to know the key to decrypt, how do they learn it? They could send it to B, correct? No, why not? Because C can intercept. If we send the key to B across the network, again, this attacker can intercept that key and it learns the key. And if the attacker learns the key, it's no longer secret. So we can't send the key across the network because, or across the same network that we're sending our data because the attacker would learn the key. What if we encrypt the key? Can we do that? Encrypt the key, which key do we use? Okay, we have a problem. We take the key, we encrypt it with another key and send the encrypt the key and but now we need to get that other key across the network. So we have a problem there. All right, we assume that one way is that there's some other channel that the A and B can communicate. Let's say that they are two friends, they want to communicate securely across a network but yesterday they got together and they decided on a key, wrote it down on a piece of paper and exchanged it with each other. So we could assume that there's some other manual form for key exchange. That's possible in some cases. So two people agree upon a key and use that. They don't need to send it across the network so that's very hard for someone to intercept that key. It's not very convenient. You want to communicate with someone in another country. You can't just fly there, visit them, write down the key, give them, come back and then send them an email. So we in fact need some automatic way to exchange keys across the network. And when we look at public key cryptography, we'll see that we can do that. We can encrypt the key and send it but we'll return to that. In fact, it will probably be after the midterm. 
How to distribute keys is an entire topic. For now, let's assume when we talk about symmetric key encryption, both sides have the key already. Somehow they got the key. So we say that there's some secure communications channel to distribute keys. So how do we do encryption? What are those algorithms or what are the characteristics of these systems? So encryption, that function, what does it do? We'll see through the next set of examples of classical ciphers that there are two main operations that we do. We take the plain text. The plain text is made up of a set of elements. Think of, if it's an English message, it's made up of set of letters or characters. One operation is called substitution where we take one element in the plain text and substitute that for another one from the character set. So let's say I take the letter S and I replace it with a letter B. So substitution is one operation. Transposition is another operation where we take the letters in the plain text and rearrange them. Substitution is taking the letters in the plain text and replacing them with other possible letters, not just those in the plain text but others from the alphabet. So if my plain text is Steve, S-T-E-V-E, substitution, I can replace S with any of the other 26 letters or any of the 26 letters. Transposition is that I take those five letters and rearrange them. So the output still has those same five letters. Product systems, and we'll return to this later, product systems really combine these operations in that maybe we do a substitution and then a transposition. And we'll see when we get to the real ciphers, what they do is they do substitution and transposition and then repeat. Again, another substitution and transposition and repeat multiple times. And that can add to the security of the cipher. Another way to characterize cryptographic systems is the number of keys used. And I've mentioned that, that there's symmetric key cryptography, both sides use the same key. 
So in fact, there's one key. Even though on our picture, I said key one and key two in symmetric key encryption, they are the same value. It's sometimes called secret key encryption, shared key encryption or conventional encryption. It's the old one. It's still used and relevant, but it's the original. The other one is asymmetric key encryption or public key encryption, where A and B use different keys. So key one and key two are different. That is, we have two keys as opposed to one. So different ciphers or different algorithms can be classified as symmetric or public key. We'll return to block and stream ciphers later after we go through some examples. I think in the next topic, what is a block and what is a stream cipher? Basically, they differ on the amount that they process or encrypt at a time, but not yet. So let's focus on symmetric key encryption. Both sides use the same key. So I'll introduce some notation here that we'll commonly see. Same model, plain text, and a key is used to encrypt. We get ciphertext as output. Ciphertext is sent across the network and that ciphertext is used as the input to the decryption and also a key and we get plain text as output. In symmetric key encryption, the keys used by both sides are the same. So we call it a shared secret key, key K. We often denote the plain text as P. The encryption operation is a function, E. It takes two inputs, K and P. So the ciphertext is the output of applying the function E on K and P. So we usually use this shorthand notation to write encryption operations. And decryption, we take as an input the key and the ciphertext and as the output, we get the plain text P. So we require for this to work, for this to be secure, we require algorithms which are strong. That is, if the attacker knows the algorithm and ciphertext, they shouldn't be able to find the plain text or key. And later we'll also go into how to measure the strength. 
And maybe even if the attacker knows more than just the ciphertext, maybe they know some past ciphertext and even some past plain text values, that they still cannot find the current plain text or key. And the other requirement is that the secret keys are secret. Let me just, we'll finish on this slide and then we'll go to some examples. So the attacker wants to find the plain text or the key. I say finding the key is better than the plain text because if we find the key, finding the plain text is easy. We just decrypt the same as user B does. But if we find the key, we can not just decrypt the current ciphertext, but we can decrypt all future ciphertexts which were created using the same key. So if I am communicating with Dr. Tanarak, we're sending messages each day to each other encrypted. Every day I send him a message. If you manage to intercept one of those messages, you can find the ciphertext and maybe you somehow you break the cipher and you find the plain text that we sent. So you've defeated the system, you found the plain text for today. But you didn't find the key, you just found the plain text. So that's good, you're successful. But it would be better for you if you find the key that we're using because not only do you find today's plain text, the message I sent at Dr. Tanarak tomorrow, which is encrypted using the same key, you can easily decrypt and find the plain text for tomorrow and subsequent days. So sometimes it's better to find the key than the plain text. But finding either of them, we say if the attacker does, then they have defeated the security of the system. What the attacker knows, currently we assume the ciphertext and algorithm, later we'll talk about other information the attacker may know and also some other types of attacks. But let's go towards some real ciphers and we'll come back to these brute force attacks later. 
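As a small aside before the ciphers themselves, the two operations named earlier, substitution and transposition, can be sketched in Python on the word "steve" used in the lecture. This is only an illustration: the function names, the substitution table and the rearrangement order are made up for this sketch and are not any particular classical cipher.

```python
def substitute(plaintext, mapping):
    """Substitution: replace each letter with another letter from the alphabet."""
    return "".join(mapping.get(ch, ch) for ch in plaintext)


def transpose(plaintext, order):
    """Transposition: rearrange the existing letters.

    order[i] is the index in the plaintext of the letter that ends up
    at position i of the output.
    """
    return "".join(plaintext[i] for i in order)


# Hypothetical substitution table covering only the letters of "steve".
mapping = {"s": "b", "t": "q", "e": "x", "v": "a"}

print(substitute("steve", mapping))          # bqxax
print(transpose("steve", [4, 3, 2, 1, 0]))   # evets
```

The substituted output can contain letters that never appeared in the plain text, while the transposed output always contains exactly the same five letters in a different order.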
We're going to go through five or six different classical ciphers, old ciphers but useful to demonstrate some important concepts, the concepts of substitution and transposition. Usually they operate on letters. We will use, as example, English letters, but they could be applied to any alphabet, any language. In substitution ciphers, the letters of the plain text are replaced by other letters or by numbers or symbols. So a letter could be numbers as well or punctuation characters for example. By other letters, not in the plain text but in the entire character set. We can do that as a sequence of bits as well. So if the plain text is a sequence of bits, we could consider bit patterns. So let's say I have a thousand bits. I could say we break it into eight bits at a time. So we replace one sequence of bits, one eight-bit sequence with another eight-bit sequence. But all our examples will use letters. It's easier to consider. On Tuesday, you saw the Caesar cipher. Remember we did this one. I gave you some cipher text. I gave you eventually the correct key. And the Caesar cipher is some shift of the letters. So let's see the definition. I think the earliest known cipher used by Julius Caesar. The concept is that if we have the English alphabet, the sequence of 26 letters, just consider just one case, a lower case for example, then what the original form did was to get the cipher text to replace each letter in the plain text by the letter three positions to the right in the alphabet. So if the plain text had the letter A, then the output cipher text would be the letter D because it's three positions to the right. A generalized Caesar cipher shift by k positions, not just fixed to three, but allow the users to choose how many positions they shift by k, where k is the key. Now just some notation. Often you'll see the plain text written in lower case and the cipher text in upper case. 
That's just to distinguish between plain text and cipher text. Think of it as case insensitive here. Lower case A and upper case A are the same character. We're just sometimes writing upper case to distinguish. So in general the Caesar cipher shifts by k positions to the right to encrypt. To decrypt, we must get the original plain text back, so the decryption algorithm must shift by k positions to the left. Take the cipher text, shift to the left k positions. And we wrap around. So if we shift the letter Y three positions to the right, it becomes the letter B. When we encrypt and when we decrypt we need to wrap around. So we can express that as an equation. To get the cipher text c, the encryption function taking k and p as input is c = E(k, p) = (p + k) mod 26. The mod 26 is to implement that wrap around. Here p and k are numbers, where A is 0, B is 1 and so on up to Z, which is 25. So if we think of it from a mathematical perspective, letters are just numbers, 0 to 25. We can write the encryption function as this: (p + k) mod 26. And the decryption function is p = D(k, c) = (c - k) mod 26. So, an example. We'll use a different one from this one. Where can we start? Let's say we have some cipher text. I think everyone understands Caesar. Instead of encrypting, let's go with a harder one, decrypting. I give you the cipher text: A, B, M, D, M. Find the plain text. Don't guess it, find it. Cryptanalysis is using some knowledge to try and guess it. And some people did and can guess quite easily. What will you do to find the plain text? What do you need? You need the key. If you have the key, finding the plain text is easy. We'll do it in a moment. So you don't have the key. You could try all the keys. Let's try. Let's take the dumb approach. Maybe some others will think of a better approach later. But as the attacker, we know the cipher text. We know the algorithm. It's the Caesar cipher.
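For reference, the two Caesar formulas can be written as a short Python sketch. The function names are chosen here for illustration, and the convention of lower case plain text and upper case cipher text follows the lecture's notation. It assumes English letters only and leaves any other character unchanged.

```python
def caesar_encrypt(plaintext, key):
    """Shift each letter key positions to the right: c = (p + k) mod 26."""
    out = []
    for ch in plaintext.lower():
        if "a" <= ch <= "z":
            p = ord(ch) - ord("a")            # a=0, b=1, ..., z=25
            c = (p + key) % 26                # mod 26 implements the wrap around
            out.append(chr(c + ord("A")))     # cipher text written in upper case
        else:
            out.append(ch)                    # leave spaces, digits, etc. as they are
    return "".join(out)


def caesar_decrypt(ciphertext, key):
    """Shift each letter key positions to the left: p = (c - k) mod 26."""
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            c = ord(ch) - ord("A")
            p = (c - key) % 26
            out.append(chr(p + ord("a")))     # plain text written in lower case
        else:
            out.append(ch)
    return "".join(out)


print(caesar_encrypt("a", 3))   # D
print(caesar_encrypt("y", 3))   # B (wraps around past z)
print(caesar_decrypt("D", 3))   # a
```

A key of 0 leaves the message unchanged apart from the change of case, and any key gives the same result as that key mod 26.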
So we know that someone obtained the cipher text by shifting the plain text to the right by k positions. But we don't know k. So let's try some values. And I may interchange between numbers and letters. So when I write the number zero, that actually means the letter A. We number the letters from zero to 25. A is zero, B is one. So if I say the key is A, it really means the key is a shift of zero positions. So as the attacker, one thing I could do is say, okay, if the key was zero, what is the plain text? Shift zero positions. We don't shift at all. In the Caesar cipher, if you ever use it, don't use a key of zero, okay? Because you encrypt your message and the cipher text will be the same as the plain text. Not a nice key to use. So what does the attacker do? Is this the correct key? Why not? It doesn't make sense, okay? We expect, and this is an important point, that the attacker will be able to recognize the correct plain text. So in this case it's safe to assume that the plain text is an English word, and we assume the attacker knows that. So if we get something which is not an English word, and A, B, M, D, M is not, as far as I know, we assume that's wrong. That didn't work. And that's the same in practice with all ciphers. From the attacker's perspective, let's assume that when they decrypt and they get plain text, they have some way to know if this is the correct one or not. That's important. And usually they do, even if it's not in English. So if that one's not correct, we'll try a different key, B. We decrypt, what do we get? Be careful. We're decrypting, so shift left. So A wraps back around. When we decrypt A, don't look at the second line, just the sequence: A shifted one to the left wraps back around to the end, Z. So decrypting, we need to do the opposite of encrypting. Do you think this is the plain text? No. And we keep trying. Do one more. Shift two positions to the left, so each letter is going to be the one before what we just got.
Again, we don't get anything that makes sense. So we keep going. Eventually we'll get one that makes sense, and we'll get there in a moment. How many possible keys are there? 26. 25 sensible keys. Zero is not a very sensible key to use, but in fact we say there are 26 possible keys. Why not 27? If we used a key of 27, as the number, then it means shift 27 places to the right when we encrypt, which means the same as what? Shifting one place to the right. So a key of 27, in fact, gives us the same ciphertext as a key of one. So there are only 26 possible keys that will produce different ciphertext in this case. So we could try them all. We will not do it here, but you could. You could write some software that would do it for you and try every key. You could keep going and do them all. We will not. Hopefully it's not the last one that's the correct one. I think it's not in this case. What's the correct one? Eight. So you can try it. You shift eight positions to the left. So people have found it is eight. Let's try that using the equation. The first letter of the ciphertext was A. Actually, we'll do it down below to have some more space. So this was our ciphertext: A, then B, M, D, M. What numbers are they? A is zero. B is one. M is, I never remember, 13, is it, or 12? I've got it written down somewhere, which will save us trying to remember the actual numbers, so you can quickly look them up. M is 12. And the other one, D, is three. So let's map them to the numbers and use the formula for decrypting, just to make sure that this makes sense to everyone. So the formula was that to decrypt, we take the ciphertext value c, the number, subtract k, eight in our example, and then mod by 26.
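The decryption formula just stated can be checked step by step in Python before working it through by hand. This is a sketch with variable names chosen for illustration. It relies on the fact that Python's `%` operator with a positive modulus returns a result from 0 to 25, which matches the definition of mod used in this lecture.

```python
ciphertext = "ABMDM"
key = 8

# Map letters to numbers: A=0, B=1, ..., Z=25.
numbers = [ord(ch) - ord("A") for ch in ciphertext]
print(numbers)         # [0, 1, 12, 3, 12]

# Subtract the key from each number; some results are negative.
differences = [c - key for c in numbers]
print(differences)     # [-8, -7, 4, -5, 4]

# Apply mod 26; in Python the result is never negative for a positive modulus.
plain_numbers = [d % 26 for d in differences]
print(plain_numbers)   # [18, 19, 4, 21, 4]

# Map numbers back to letters.
plaintext = "".join(chr(p + ord("a")) for p in plain_numbers)
print(plaintext)       # steve
```

Some other languages define the remainder of a negative number differently and would return -8 rather than 18 for the first value, so the mod step needs care when porting this.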
We get a number, and that will map back to a letter. So let's do it the mathematical way. Let's try the key I, or eight. So we subtract eight from each of these. Zero minus eight is minus eight. We're going to do them one at a time, but let's do the subtraction first. And remember, to decrypt it was p = (c - k) mod 26. So minus eight mod 26. That's maybe the tricky part. What's a negative number mod 26? Anyone? You can try a calculator. Minus eight mod 26. You know mod? What does mod mean? How do we describe mod? Remainder. Okay, the remainder. Well, there are in fact two variants of mod. And the one that we assume here always returns a non-negative number, from zero to 25. Okay. How do we think of mod? We'll use it a number of times in different ciphers. This is the way to think of mod. We said mod is the remainder. And the way that we can think of a remainder here: some integer times the modulus, 26, plus the remainder, gives us minus eight. Let's take a simpler one first. Let's say it was 12 mod 10. What's the answer? Two. Why? Well, with the same approach, mod 10, some number times 10 plus two equals 12. One times 10 plus two equals 12. So 12 mod 10 equals two, the remainder. So the same approach here. When we multiply 26 by some integer and add the remainder, we should get minus eight, where the remainder is not negative. So can someone find values for these two empty places? Something times 26 plus something equals minus eight, where the second something is not negative. Integers. We're not using decimals here. We have mod. An integer times 26 plus a non-negative number equals minus eight. Anyone? Try some integers. One times 26 plus something equals minus eight. Possible? Not if the remainder cannot be negative. One times 26 plus a non-negative number will never give us a negative number.
Zero times 26 plus a non-negative number will not give us a negative number either. So in fact, it must be a negative integer times 26. Minus one times 26 is minus 26, and minus 26 plus 18 gives us minus eight. In other words, minus eight mod 26 is 18. The remainder when we divide by 26 is 18. That's what is left over. So with our definition of mod, we only deal with non-negative answers. You may find other definitions of mod which allow negative answers, so that's a slight difference. So minus eight mod 26, the answer is 18. Minus seven mod 26, what's the answer? 19, because minus one times 26 plus 19 is minus seven. Four mod 26 is an easy one: four. Minus five mod 26 is 21. And you see the pattern here: 26 minus eight is 18. Shifting to the left eight positions is subtracting. And then convert them back to letters. What do you get? 18, 19, four, 21, four. 18 is S, 19 is T, four is E, 21 is V, four is E. So when we get to the key equal to i, or number eight, we get the word Steve, and you'd be smart enough to recognize this as a correct or valid plaintext. I don't have it here, but if you tried all key values, you'd find that all the others just look like random characters combined. They don't make any sense. So it'll be very easy to pick out the correct one in that case. So just an aside on mod: the way that we use mod here, we only return non-negative numbers. We'll see that come up again towards the midterm. Any questions on the Caesar cipher? I think I have another example that shows multiple keys. It may be hard to see, but we'll zoom in a bit. Here's a larger one: a long sequence of ciphertext that was encrypted using the Caesar cipher. You don't have to read it. What's the plaintext? Well, you could try all possible keys. And I did that. Again, it's hard to see, but I tried all possible keys.
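Trying all possible keys is easy to automate. The following is a minimal sketch, not the software used in the lecture: the function names caesar_decrypt and brute_force are hypothetical, and the ciphertext is assumed to contain only lowercase letters. It decrypts the short five-letter ciphertext from the worked example under every key from 0 to 25 and prints each candidate, so a person (or a dictionary check) can pick out the one that makes sense.

```python
import string

LETTERS = string.ascii_lowercase  # assumed character set: 26 lowercase letters


def caesar_decrypt(ciphertext, key):
    """Shift every letter key positions to the left: P = (C - K) mod 26."""
    return "".join(LETTERS[(LETTERS.index(ch) - key) % 26] for ch in ciphertext)


def brute_force(ciphertext):
    """Return the candidate plaintext for each of the 26 possible keys."""
    return {key: caesar_decrypt(ciphertext, key) for key in range(26)}


candidates = brute_force("abmdm")
for key, candidate in candidates.items():
    print(f"{key:2d}  {candidate}")
```

With only 26 candidates the list can be checked by eye. For the ciphertext above, the entry for key 8 is the readable one.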
It goes down to the last key there and I give the potential plain text there. What's the key? 11. If you look at all of them, only one of them makes sense. All the rest are random arrangements or random-looking arrangement of characters. So this is this point of if we try all possible keys, one of them will give us something that we can understand, and that's how we can find the right key. This is called a brute-force attack. The attacker tries all possible keys. The name of such attack is brute-force. It's the dumb approach, but can be very successful. Very easy, just try them all. In this case, we try 26 keys. So to break the Caesar cipher, a brute-force attack is easy because there's only 26 possible keys. It doesn't take long to try them all. Well, of course, this assumes that we can recognize the correct plain text. So we need to know the language. What if it was a different language and we didn't know the language? Well, still, it wouldn't take long to try different languages. There are not many languages that we have to try, really. There are hundreds of languages, but if we had a computer, we could try them and check the words in a dictionary or even better, if we know who's communicating, it's likely that we can guess which language was used in the plain text. So that's usually not... using a different language doesn't add security. Sometimes we compress data. Compression involves... you take the plain text first, maybe apply the zip algorithm to compress it, and then encrypt that. So now a brute-force attack, when we decrypt using each key, we get the compressed version, which looks random again. But again, the attacker can overcome that by trying some different... decompression algorithms. So again, try some decompressed... using zip and other algorithms, and then you will still get plain text. One of them will make sense. So usually... 
well, usually, I think always, when we communicate, the plaintext that we're communicating to someone has some structure. Not very often do I send random messages to people, because that's not really communication. If I send a random sequence of characters to you, what does it mean? So every message we communicate has some structure. Therefore, it's possible for the attacker to recognize the correct plaintext by identifying that structure. The structure in English is the arrangement of letters. So how do we beat brute force in the Caesar cipher? Maybe we could try to hide the encryption algorithm. Let's say you didn't know it was the Caesar cipher. You were just given the ciphertext, but not the name of the algorithm. Well, again, in practice that's hard to hide from people, because when we have computer implementations of the algorithms, it's usually possible for the attacker to find out what software you're using, and then work backwards from that to work out what encryption algorithm was used. So it's not practical to hide encryption algorithms. Using a different language or compression doesn't help much. It helps a little bit, but not much. So the way to defeat a brute-force attack is to increase the number of keys. Here we only had 26. I can do it by hand with 26. It takes me 10, 15 minutes, and I can beat it. With a computer, it's almost instantaneous to break that. So the Caesar cipher is not so good with respect to a brute-force attack. Let's try a different classical cipher, from a group called mono-alphabetic, that is single-alphabet, substitution ciphers. With the Caesar cipher, we had this fixed arrangement. We always moved each letter in the plaintext to the right by K positions. In this mono-alphabetic cipher, what we do is choose, for every possible plaintext letter, A through to Z, and I'll list some of them here, a mapping before we encrypt to any of the other letters, or really to any letter.
That is, I choose to map my plaintext letter A to D. If my plaintext letter is B, it will become Z. C to G, D to L, E to S, and so on, Z to Q. I choose that mapping at the start. Then when I encrypt, I take my plaintext, hello, and then I just look up in the mapping and I find, okay, it's not in here, sorry, but H maps to some letter, E maps to S, so the plaintext letter E becomes ciphertext letter S, and so on, and we get some ciphertext, and I send the ciphertext to you. You must also know this mapping. So I must have somehow told you that we're going to use this mapping of A to D, B to Z, and so on. And since we both know that, when I send you the ciphertext, you can decrypt because when you find the letter S, you know that means the plaintext letter E. So in this cipher, we map each letter to any of the letters in the alphabet. The Caesar cipher maps it always to the right by K positions. This is more general. Sorry? There's no pattern in this arrangement here. I choose what I want. So in effect, the pattern or the sequence here is the key. So I say to you, I'm going to use the substitution of D, Z, G, L, S, B, T, F, Q. That's the key. I tell you that. I give you that key, because before you know, whenever you receive ciphertext of S, that the corresponding plaintext was E. It's too long to do an example because we need a list for all 26 characters, and I'm too lazy to do that, but we'll do some a little bit of analysis. So the idea is that let's choose a key right now just for some of the letters. So choosing a key involves choosing a mapping of 26 letters of plaintext, and we choose, let's choose A maps to some letter. Choose a letter that you want to map A to. L. I'll write it in uppercase just to distinguish that this would be the plaintext and this would be the ciphertext. When you chose that letter L, how many could you choose from? You could have chosen from 26 possible values. A can map to A, B, C through to Z. 
You chose L, but there were 26 to choose from. So can we choose a combination of lowercase and uppercase? Yes, but what we should do at the start is define the character set, the set of characters that we allow in our plaintext and ciphertext. In my example I'm assuming all one case. That is, all our messages just use these 26 lowercase characters. But if you want to say it's either uppercase or lowercase, then you've got 52 characters. Or uppercase, lowercase and digits, then you've got 62. You can do that. It's the same concept. In this example, we're just using 26 lowercase characters. Now we need to choose what B will map to. Anything? H. How many did you get to choose from? 25. Because you can't map B to L, since we've already used L for A. Suppose we had A map to L and B map to L, and then we receive ciphertext of L and try to decrypt. What's the corresponding plaintext letter? A or B. You don't know. So to be able to decrypt, you must have a unique mapping here. That is, we can't have A and B both mapping to the same letter, because the decryptor would not be able to get the correct plaintext back. So we cannot allow that. Next, C. And you had 24 to choose from. D. 23 to choose from there. And you keep going. This is really what you do to choose the key. The key is this sequence of letters, and maybe you end up with T, A and Q. You need to do it for the rest. And how many choices at the end? 3, 2, 1. That is, for Z, when you choose a letter to map to, there's only one to choose from because there's only one left. Of course you can choose a different mapping. That really corresponds to a different key, because when you encrypt, you'll get different ciphertext. So the mapping corresponds to the key with this cipher. And for someone to communicate with you, they need to know what this specific mapping is.
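Choosing the key letter by letter, never reusing a ciphertext letter, is the same as picking one arrangement of the alphabet. A rough sketch of that idea follows. The helper names generate_key, encrypt and decrypt are made up for this example, the character set is assumed to be the 26 lowercase letters, and the fixed seed is there only so the example is repeatable. A real key would need to come from an unpredictable source.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # the agreed character set: 26 lowercase letters


def generate_key(rng):
    """Return a random arrangement of the alphabet.

    Position i of the key is the ciphertext letter for plaintext ALPHABET[i].
    Each letter appears exactly once, so the mapping is unique.
    """
    return "".join(rng.sample(list(ALPHABET), len(ALPHABET)))


def encrypt(plaintext, key):
    mapping = dict(zip(ALPHABET, key))   # plaintext letter -> ciphertext letter
    return "".join(mapping[ch] for ch in plaintext)


def decrypt(ciphertext, key):
    inverse = dict(zip(key, ALPHABET))   # ciphertext letter -> plaintext letter
    return "".join(inverse[ch] for ch in ciphertext)


rng = random.Random(2024)                # fixed seed for a repeatable illustration only
key = generate_key(rng)
ciphertext = encrypt("hello", key)
print(key, ciphertext, decrypt(ciphertext, key))
```

Because the key is a permutation, the inverse dictionary in decrypt is well defined. If two plaintext letters mapped to the same ciphertext letter, building that inverse would silently lose one of them, which is the decryption ambiguity described above.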
The numbers here, 26, 25 and so on, are the number of possible values that we could have chosen at each step. When we map A to a letter, there are 26 possible values. When we choose for B, there are 25. So in total so far, there are 26 times 25. And if we keep going, what is the answer? 26 factorial. That is, there are 26 factorial possible arrangements of those 26 letters, which means with this cipher there are 26 factorial possible keys. I may use this key. That's one possible one. You may use a different one: instead of L, H it may be H, L. That would produce different ciphertext when you encrypt. And someone else may use a different one again. In total, there are 26 factorial possible keys. Now, brute force. How many keys? 26 factorial is more than 4 times 10 to the power of 26. The Caesar cipher has 26 keys. This mono-alphabetic cipher, which is not very complex, allows for more than 4 times 10 to the power of 26. I can't even think of the name for that. It's roughly a billion billion billion keys. And we'll see some numbers later. It would probably take centuries to try them all. Not years, but centuries. Maybe several lifetimes of the universe, depending on how fast you can test keys. So for a brute-force attack this cipher is good. It's not subject to a brute-force attack because, even though in theory someone can try all keys, in practice there's not enough time. So this is a better cipher in terms of security against brute force. How long does a brute-force attack take? I'll just jump back. There's a slide that has the number. I always forget it. This one, this last row: 26 factorial. Say I was trying 10 to the power of 15 decryptions per second. That's what, a million billion decryptions per second. If my computer decrypts a million billion times per second, it would take about 10,000 years. Okay. So it's a large enough key space that brute force is ineffective. We'll see some other numbers later. So in fact there are two types of attacks on ciphers.
One we call a brute-force attack: try all keys. The other, the more intelligent attack, is to do some analysis of the ciphertext, of the relationship between the plaintext, key and ciphertext, and of how the algorithm works, to try to calculate the key. That's called cryptanalysis. And when the plaintext comes from some language, such as English, cryptanalysis usually involves exploiting the regularities in the language. Some letters occur more often than others in each language. In English, what's the most common letter? E is usually the most common letter. If you look at a large set of text, many different plaintext values, and count all the letters in there, or if you look at all the words in a dictionary, you'll see that some letters occur more often than others. This is from some analysis of a large database of text, I think legal writings. They just counted all the letters and worked out the percentages: about 12% of the letters were an E. E occurs much more often than some of the other letters. The next most frequent was what? T. About 9% of the letters were T in that large set of plaintext, and then A, and so on. The least frequent are Q, X, J and Z. So some letters occur more frequently than others. And cryptanalysis can take advantage of that, to look at the ciphertext and try to map it back to the plaintext. It's called frequency analysis. What's more, some digrams occur more frequently than others. What's a digram? Not a diagram, a digram. A digram is a pair of letters, like AB, CD, TH, AN. So again, if you look at a large set of English texts, you'll see that some pairs of letters occur more frequently than other pairs. The pairs AN, ON and IN, I think, are quite frequent. QX doesn't occur very often. So there are some pairs of letters that are much less frequent.
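Counting letters and pairs of letters is the first step of frequency analysis, and it is simple to sketch. The code below is an illustration only: the function names ngram_counts and relative_frequencies are hypothetical, the sample sentence is made up and far too short to show real English statistics, and pairs are counted inside words only, which is one possible convention. On a large body of text the same counting would give percentages like those quoted above.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count runs of n letters within each word, ignoring case and non-letters."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if "a" <= ch <= "z")
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts to fractions of the total, most frequent first."""
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.most_common()}


sample = "the cat and the hen are in the tent"  # made-up text for illustration
single = ngram_counts(sample, 1)   # single letters
pairs = ngram_counts(sample, 2)    # pairs of letters
print(relative_frequencies(single))
print(pairs.most_common(3))
```

To attack a mono-alphabetic cipher, the same counts would be taken over the ciphertext and compared with the known statistics of the language: the most frequent ciphertext letter is a good first guess for E, and so on.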
And trigrams are triples of letters. T, H, E occurs often. Other triples of letters don't occur so often. We can use those statistics to break ciphers which are not possible to break with brute-force attacks. I will point you to an example of that before next week. It's hard to go through in the lecture because it's quite long. Let's go back and summarize the different classifications of attacks. Brute-force attacks try all keys in the key space. The key space is the set of possible keys that we can use for some particular cipher. With the Caesar cipher there were 26 possible keys, so the size of the key space was 26. With the mono-alphabetic cipher the size of the key space was 26 factorial. In real ciphers today, the keys we choose are usually random binary values. So a k-bit key has 2 to the power of k possible values. When we measure attacks on ciphers, we want to know how long it takes to find the key or find the plaintext. Usually we measure in terms of number of operations. That is, how many times do we need to decrypt until we get the right plaintext? That is usually proportional to time, because each decrypt operation usually takes the same amount of time. The time depends upon the computer speed, if we're using a computer to do such an attack. Cryptanalysis is the intelligent approach for breaking ciphers, but much harder. It's about finding weaknesses in algorithms. There are different approaches that we'll see as we go through real ciphers, but one of them with the classical ciphers is what's called frequency analysis, where you look at which letters, digrams, trigrams and words occur more frequently and use that to try to find the plaintext or key from a ciphertext. So let's just explain this so you can make sense of the numbers. The columns are the key lengths.
Well, not the last one, but if we use binary keys, a 32-bit key, a 128-bit key, then the key space is the number of possible keys. In fact, in the last one, 26 factorial is also the key space. So if we have a 32-bit key there are two to the power of 32 possible keys. And then I give some example durations it would take a computer to find the key if it did a brute-force attack. That depends upon the key space and how fast my computer can test the keys. If my computer can do 1 billion tests per second, and with some of the real algorithms today they can get up to that speed, then we get these values. For example, if I have a 32-bit key, there are about 4 billion possible keys to try. If I can do 1 billion per second, it takes me about 4 seconds on the computer. Brute force is easy. If I have a 128-bit key, which is common for some systems today, the same computer would take about 10 to the power of 22 years. So by increasing the key length by a factor of 4, from 32 bits up to 128 bits, which is still not very long, I've made a brute-force attack impossible. If my computer can run faster, or if I have not just one computer but a thousand computers or a million computers, so 10 to the power of 15 tests per second, my 128-bit key is just down to 10 to the power of 16 years. It doesn't effectively help. That is, even if I have all the computers in the world, and over the next 10 years they increase in speed and are very, very fast, I'm still not going to succeed with a brute-force attack. So we just need to make the key space large enough to stop brute-force attacks, and typically today more than 80 or 100 bits is sort of the limit. Usually people recommend 128, maybe 256 bits just to be safe. 26 factorial is still not possible with a brute-force attack. What we'll do next week is come back and continue on the classical ciphers, keep going through them, and return to some of these brute-force attacks and other attacks.
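The durations quoted above can be reproduced with simple arithmetic: divide the size of the key space by the number of tests per second. The sketch below does that for the three key spaces mentioned and the two speeds mentioned. The function name worst_case_seconds and the 365-day year are assumptions for the example, and the figures are for trying every key. On average an attacker would find the key after about half that time.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def worst_case_seconds(key_space, tests_per_second):
    """Time to try every key in the key space at the given testing rate."""
    return key_space / tests_per_second


key_spaces = {
    "32-bit key": 2**32,
    "128-bit key": 2**128,
    "26! (mono-alphabetic)": math.factorial(26),
}

for label, size in key_spaces.items():
    for rate in (1e9, 1e15):  # 1 billion and a million billion tests per second
        seconds = worst_case_seconds(size, rate)
        years = seconds / SECONDS_PER_YEAR
        print(f"{label:22} at {rate:.0e} tests/s: {seconds:.3g} s = {years:.3g} years")
```

The point the numbers make is that each extra key bit doubles the work, so going from 32 to 128 bits multiplies the effort by 2 to the power of 96, far more than any realistic increase in computer speed can recover.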