So we're going to focus on symmetric key encryption. In our general model, we encrypt plaintext with a key and get ciphertext. We send the ciphertext to user B, and they decrypt using a key to get the original plaintext. With symmetric key encryption, both A and B use the same key, which we call a shared secret key. It's shared between the two users and it must be secret: only those two users may know that key. If anyone else discovers the key, they can also decrypt. So with symmetric key encryption for confidentiality, we use some encryption algorithm, and we're going to talk about some of those algorithms shortly. We take the plaintext and apply the algorithm to it. The other input is the shared secret key, and we get ciphertext as output. That's what we send across our network, or what we save on the disk if we want to encrypt a file. The receiver, who wants to decrypt the file or the message they received, uses the same key and applies a decryption algorithm to that ciphertext, and gets the original plaintext: the exact same message that the source started with. If the algorithms are designed correctly, we must get the original plaintext out. Now consider an attacker. We assume they know the algorithm used, that is, the steps you used to encrypt and decrypt, and they know the ciphertext, because they can intercept it. Their goal is then to find the plaintext or the key. If they find the plaintext, we no longer have the service of confidentiality: from the attacker's perspective, they've defeated it. If they find the key, they can also easily find the plaintext. If I'm the attacker and somehow I find the key K, I can easily decrypt: I know the algorithm, the ciphertext and the key, so I simply apply the decryption algorithm to the ciphertext and get the plaintext. So the attacker aims to find the key or the plaintext.
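In the usual textbook notation for this model (the symbols E, D, K, P and C are assumed here, not taken from the slides), encryption and decryption with the shared secret key can be written as below. The second line is the correctness requirement: decrypting with the same key must give back exactly the original plaintext.

```latex
\begin{aligned}
C &= E(K, P) \\
P &= D(K, C) = D\bigl(K, E(K, P)\bigr)
\end{aligned}
```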
They are given the ciphertext and the algorithms. They are not given the key; we assume they don't have it. For encryption to provide confidentiality, we need strong algorithms. So what we're going to do is start from some very old and weak algorithms, look at the principles they use to build encryption algorithms, and eventually arrive at some of the common techniques and algorithms used today. We need a strong algorithm so that the attacker, even if they have the ciphertext, cannot work backwards to find the plaintext or the key. In fact, in practice, the attacker may have the ciphertext and also some old plaintext: they know that in the past you encrypted this plaintext and got this ciphertext. So they have a plaintext-ciphertext pair, but they don't have the key. It should still be hard for the attacker to find the plaintext or the key. If it's easy, we say the algorithm is not strong. The other important thing: because we use shared secret keys, the way in which the key is shared matters. How does B get the key K from A? Somehow we have to give B the key, and the way we distribute that key must be secure. We'll talk about that with some examples as we go through; for now we assume the key is distributed securely. So what do the encryption and decryption algorithms actually do? They use two main operations: substitution and transposition. I'll use some of the old ciphers, substitution ciphers and transposition ciphers, to demonstrate those operations. After going through some examples, we'll introduce some different attacks. Let's jump to the substitution technique, starting from one of the very first known ciphers. So how do we encrypt? Remember substitution?
We take one element of our plaintext and replace it with another possible value. Well, what are our elements, and what are the possible values? In the classical ciphers we're going to go through, we assume they operate on the English alphabet. To keep our examples simple, the possible values of the plaintext will normally be a to z: 26 possible values, lowercase only. Sometimes I'll write answers or ciphertext in uppercase just to distinguish between plaintext and ciphertext, but the set of letters we are operating on is only the lowercase letters. So we assume the plaintext is made up of just the 26 lowercase English letters. We could extend the algorithms to operate on different character sets: we could include uppercase and lowercase, add punctuation, or use a different language. The same principles apply. So substitution ciphers take one letter in the plaintext and replace it with another of the 26 possible letters. We can also extend this to operate on binary data; it doesn't have to be English letters. Using the ASCII encoding, a letter maps to a sequence of bits, so we could convert to binary if needed. But let's start with English-language examples. The first cipher is the Caesar cipher, named after Julius Caesar, the Roman general and statesman. It's about 2,000 years old, so it's a very old cipher, and it was supposedly used by Julius Caesar to send secret messages to other people. It's a very simple cipher. You have a plaintext message you want to send, a sequence of letters. The rule is: take each plaintext letter and replace it with the letter three positions to the right in the alphabet. So if the first letter of your plaintext is A, the output ciphertext letter will be D: one, two, three positions to the right of A.
If the second letter of your plaintext is T, then the second ciphertext letter, three positions to the right, will be W. The slide shows this mapping directly in two lines, plaintext above ciphertext. If you get a Z as input, then to move three positions to the right we wrap around back to the start: one, two, three takes us to C. Wrapping around means we can handle any letter. So that's the encryption algorithm. It's a substitution: we substitute each letter of our plaintext with one of the 26 letters, and which one is defined by the rule: the letter three positions to the right in the alphabet. This could easily be extended to other character sets; it doesn't have to be 26 letters. We could have uppercase and lowercase, a different language, and so on. Let's have an example. Here is the ciphertext: P R Q G D B. You're the attacker in this case: you've intercepted a message sent from A to B, and this is what you've intercepted. We assume the attacker knows the algorithm the two users are using; somehow they've realised the users are using the Caesar cipher. In all our analysis and attacks, we'll assume the attacker knows the algorithm. In practice, how could they know which algorithm two people are using? Usually the algorithms used for encryption are standardised. The algorithm your computer uses to encrypt files may come as part of the operating system, or be part of a standard network protocol. So it's known in advance which algorithms are used for encryption, and it's not hard for the attacker to find out which one you're using; it's not kept secret. So we know the Caesar cipher was used. Find the plaintext. You're the attacker: what's the plaintext?
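A minimal sketch of the Caesar encryption rule in Python, assuming the plaintext contains only the 26 lowercase letters (any other character makes `str.index` raise a `ValueError`). The function name `caesar_encrypt` is hypothetical; the shift of three and the wrap-around come straight from the rule described above.

```python
import string

ALPHABET = string.ascii_lowercase  # "abcdefghijklmnopqrstuvwxyz"

def caesar_encrypt(plaintext: str) -> str:
    """Replace each letter with the letter three positions to its right, wrapping z around to c."""
    out = []
    for ch in plaintext:
        i = ALPHABET.index(ch)               # position of the letter, 0..25
        out.append(ALPHABET[(i + 3) % 26])   # three to the right; % 26 wraps past z back to a
    return "".join(out)

print(caesar_encrypt("at"))   # dw
print(caesar_encrypt("xyz"))  # abc
```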
The cipher is the Caesar cipher in this case. What's the plaintext? Calculate it for yourself: decrypt. If you look at the lecture slide, the alphabet there helps you quickly work backwards to find the original plaintext. What's the first letter of the plaintext? Someone says Z. Now, in all the examples we'll go through, the plaintext will make sense, and we'll see that this is one way we can perform attacks. If you get a plaintext message in an exam or quiz answer that doesn't make sense to you, usually because it's not English, then you've done something wrong. So as a hint, make sure you get a plaintext that makes sense. Someone says "monday". Yes, it's Monday today, and that is indeed the plaintext. For whoever said Z: if you go backwards, you get M as the first letter. Why? Decryption is the opposite process of encryption. When we encrypt, we take, say, the plaintext letter M and get the ciphertext letter P. When we decrypt, we must get the original plaintext back. So the decryption algorithm is not the same as the encryption algorithm; it does the opposite. To decrypt the ciphertext letter P with the Caesar cipher, we move three positions to the left and get the original plaintext letter M. Encrypt: move three positions to the right. Decrypt: move three positions to the left. So the ciphertext letter P, moved three positions to the left, gives M. The second ciphertext letter is R; three positions to the left gives O. Then Q: one, two, three to the left gives N. Then G goes back to D, D goes back to A, and B wraps around back to Y. So P R Q G D B decrypts to M O N D A Y, and the plaintext is "monday". To decrypt, we shift to the left.
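The decryption step can be sketched the same way (again a hypothetical helper, lowercase letters only): subtract three instead of adding it. Python's `%` returns a value from 0 to 25 for a modulus of 26, so moving left from a wraps around to x.

```python
import string

ALPHABET = string.ascii_lowercase

def caesar_decrypt(ciphertext: str) -> str:
    """Undo the Caesar cipher by moving each letter three positions to the left."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - 3) % 26] for ch in ciphertext)

print(caesar_decrypt("prqgdb"))  # monday
```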
When we describe ciphers, we'll often only mention the encryption algorithm; the decryption is the opposite, such that we get the original plaintext out. The Caesar cipher takes the plaintext and shifts each letter three positions to the right to get the ciphertext. Jumping back to our general model: our ciphers take plaintext and encrypt it to get ciphertext, but we said there's also a key. So far we haven't mentioned a key with the Caesar cipher. But we can generalise it: instead of shifting three positions to the right, shift K positions to the right, where the number of positions is a parameter that the users choose. That becomes the key. So the generalised Caesar cipher encrypts by shifting K positions to the right, where K is the key, and decrypts by shifting K positions to the left. In the example above, the key was three, since we shifted three positions to the right. How many letters do we have? 26. To write the cipher as an algorithm, we usually think of the letters as numbers from 0 to 25: A is 0, B is 1, C is 2, D is 3, and so on. So when talking about ciphers we'll often map letters to numbers, starting from zero. When I say the key is three, sometimes we'll say the key is D, the letter D: D is the fourth letter, so counting from zero it is number three. The key may be given either as a number or as a letter, and that will become useful when we talk about other ciphers built on the Caesar cipher. So there's our very first cipher. It's very simple: it substitutes each plaintext letter with one of the 26 letters, and which one is defined by the key and the rule "shift to the right".
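A sketch of the generalised Caesar cipher, assuming lowercase letters only. The helper `key_to_shift` is hypothetical; it accepts the key either as a number or as a letter, using the mapping a = 0, b = 1, ..., so key "d" and key 3 behave identically.

```python
import string

ALPHABET = string.ascii_lowercase

def key_to_shift(key) -> int:
    """Accept the key as a number 0..25 or as a letter ('a' = 0, 'd' = 3)."""
    if isinstance(key, str):
        return ALPHABET.index(key.lower())
    return key % 26

def shift_encrypt(plaintext: str, key) -> str:
    """Move each letter k positions to the right."""
    k = key_to_shift(key)
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in plaintext)

def shift_decrypt(ciphertext: str, key) -> str:
    """Move each letter k positions to the left."""
    k = key_to_shift(key)
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in ciphertext)

print(shift_encrypt("monday", "d"))  # prqgdb  (key 'd' is the same as key 3)
print(shift_decrypt("prqgdb", 3))    # monday
```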
Unfortunately, most of you found the plaintext without knowing the key; in fact, the key was built into the algorithm: three positions. Let's consider another example with a different key value. Here is another ciphertext, still using the Caesar cipher, but this time I'm not giving you the key. Find the plaintext. In the generalised Caesar cipher, the key can be one of 26 possible values. Instead of shifting three positions to the right, I shifted K positions to the right, and you don't know K. So I took my plaintext, shifted K positions to the right, and got A B M D M. You're the attacker, so you don't know the key. What's the plaintext? Or more importantly, what approach would you use to find it? Brute force. What does brute force mean? Try all possible keys. I'm the attacker: I know the ciphertext and that it was a Caesar cipher, but I don't know the number of positions shifted to the right. So let's try them all and see whether one gives us a plaintext message that makes sense, because we expect the plaintext to make sense and to be English. Let's go in order, starting with a key of zero. If we label the letters A as 0, B as 1 and so on, a key of zero means a key of A. We decrypt to get a candidate plaintext P. Decrypting means moving K positions to the left, and moving zero positions to the left gives us exactly the same message. Does this message make sense? Is it an English word or phrase you recognise? Probably not. This is the first point: in this attack, the attacker needs some way to recognise whether a candidate plaintext makes sense, that is, whether it's correct or not.
We'll use the structure of the expected message, in this case the language, to do that detection. I don't know any word "abmdm", and I can't see that it forms a phrase, so I'll assume it doesn't make sense. That is, I assume this is the wrong key, because it gives me a plaintext that doesn't make sense. So as the attacker I try a different key. Going in order: K = 1, equivalent to B. That is, I assume the user encrypted by shifting one position to the right, so to decrypt I move one position to the left. What's the letter one position to the left of A? Wrapping around, it's Z. So the first plaintext letter is Z, the letter to the left of B is A, and so on: L, C, L, giving "zalcl". I try this key as the attacker and get this plaintext. Does it make sense? Not obviously, so I try another key, and we keep going. How many keys do I need to try in the worst case? The Caesar cipher works by moving K positions to the right to encrypt, or K positions to the left to decrypt. We saw that if we move zero positions, the output is the same as the input. Moving 26 positions gives the same result, because it wraps around: 26 positions to the right of A is A again, the same as zero positions. If we move 25 positions, A becomes Z; if we move 26 positions, A becomes A. Moving 52 positions is the same as moving zero or 26 positions. So there are only 26 distinct variations: 26 possible keys, 0 to 25. In the worst case, the attacker in this brute force attack needs to try all of them. So we keep trying keys; I won't write them all down. For the next key you'll find some plaintext that also doesn't make sense.
And you keep trying. Does anyone know the key? Eight. Let's see what happens with key 8, which is the letter I in our alphabet. What's the first plaintext letter? The letter A shifted eight positions to the left brings us to S. B shifted eight positions to the left brings you to T. M, eight positions to the left (one, two, three, four, five, six, seven, eight) gives you E. You may now see the plaintext. D, eight positions to the left, goes to V, and the final M again maps to E. Now you guess: ah, this is something that makes sense. It's the lecturer's name, "steve". So that's probably the correct plaintext, and indeed it was. Those are the steps of a brute force attack: to find the plaintext and the key, the attacker tries all possible keys. In this case we only needed to try nine keys (0 through 8) before we got something that made sense. We could keep going just to confirm and try all 26; if you do, you'll see none of the others make sense. It becomes easier for the attacker to identify which plaintext makes sense as the message gets longer: the longer the message, the less chance that two different keys both give something that makes sense. And most plaintext messages we're trying to decrypt are not just five characters; usually they're much longer. In this example, the attacker decrypted the ciphertext nine times before getting something they recognised. What's the best case from the attacker's perspective? Say they don't go in order but guess keys randomly. The best case is that I choose the correct key the first time.
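The brute force attack above can be sketched in a few lines, assuming a lowercase-only ciphertext. The function names are hypothetical; the code only generates the 26 candidates, and recognising the one that makes sense is still left to the reader here.

```python
import string

ALPHABET = string.ascii_lowercase

def shift_decrypt(ciphertext: str, k: int) -> str:
    """Generalised Caesar decryption: move each letter k positions to the left."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in ciphertext)

def brute_force(ciphertext: str) -> list[tuple[int, str]]:
    """Decrypt with every possible key 0..25 and return (key, candidate plaintext) pairs."""
    return [(k, shift_decrypt(ciphertext, k)) for k in range(26)]

for k, candidate in brute_force("abmdm")[:9]:
    print(k, ALPHABET[k], candidate)  # key 1 (b) gives zalcl ... key 8 (i) gives steve
```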
I randomly choose a key and it's the correct one. So the best case is one decryption: one attempt. What's the worst case? 26: I need to try all possible keys. This is from the attacker's perspective. What about the average number of attempts? Imagine we repeat this many times with different plaintexts, maybe with a different key each time, and each time we need to find the key. Sometimes we'll get the best case, sometimes the worst case; sometimes it takes two attempts, sometimes 25 or 26. We've got 26 keys to choose from, so on average it takes about half of 26, roughly 13 attempts (13.5 to be exact, if every key is equally likely). All we're doing here is guessing a number from 0 to 25, one of 26 possible values, and on average it takes about 13 guesses to get the right one. So we can now say something about the performance of this brute force attack from the attacker's perspective. Sometimes it's quick. In the worst case, it means trying all possible keys. On average, it takes half the possible keys: the number of possible keys, 26, divided by 2. In this particular case it took 9 attempts, better than average, but with another plaintext it may take more than 13. So in general, a brute force attack tries every possible key in the worst case and half of the keys on average; here, with 26 keys, that's 13. The same applies to other ciphers. So how do we stop the attacker from doing a brute force attack, or at least make it hard for them?
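A quick check of the best, worst and average cases, assuming the key was chosen uniformly at random and the attacker tries keys 0, 1, 2, ... in order, so key k is found on attempt k + 1. The exact average works out to 13.5, which the lecture rounds to "half of 26".

```python
NUM_KEYS = 26

# Attempts needed when the secret key is k and keys are tried in order 0, 1, 2, ...
attempts = [k + 1 for k in range(NUM_KEYS)]

best = min(attempts)                # 1: the first guess is right
worst = max(attempts)               # 26: the key is the last one tried
average = sum(attempts) / NUM_KEYS  # (1 + 26) / 2 = 13.5

print(best, worst, average)  # 1 26 13.5
```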
Assume I give you a computer, so you don't have to calculate on paper, and some software to decrypt with the Caesar cipher, which does one decryption very fast. How can I stop you from doing a brute force attack on the Caesar cipher? Is it even possible? No: 26 attempts with a computer will take less than a second every time. So the key problem with the Caesar cipher is the number of possible keys: there are only 26. If we have more possible keys, a thousand, a million or more, then the attacker needs to make more attempts. So the number of possible keys is an important design parameter of our ciphers. We need a key space large enough that a brute force attack is not possible, or takes too long. We'll return to what a good size is after a few more example ciphers. Let me give you another example of the Caesar cipher, but using a computer to save some time. I have a ciphertext message; I'll just squeeze it onto the screen. This is the ciphertext I've intercepted. I'm the attacker; I know the Caesar cipher was used, but I don't know the key. As the attacker, I decrypt the ciphertext with every possible key. I can try the keys in order or randomly: assuming the users chose their key randomly, either approach works equally well. I won't do it on paper; I have some simple software that uses the Caesar cipher to decrypt that long ciphertext. Don't be too concerned about how the software works. First, I decrypted the ciphertext using the key 0.
Key 0 is not a good key to use for encryption, because the plaintext and the ciphertext will be the same. It's a possible key, but not a smart one to use, because nothing changes. I can't make any sense of this plaintext, so as the attacker I try the next key, key 1. Does it make sense? I keep trying keys until I get a message that makes sense. I won't do them all here, but I've done them before, so let me bring them up. Here are the 26 possible plaintext messages. They're hard to read, I know, but with key 0, or A, I got this; with key 1 I got "DROY...". Which one is the correct key? Only one of them reads as English: "The only truly secure computer is one buried in concrete with the power turned off and the network cable cut." All the rest are random characters. So we can easily detect the correct key by finding the plaintext that makes sense. With reasonably sized plaintext messages like this one, if you use the wrong key you get effectively random characters, characters that don't form an English phrase or message. If you use the right key you get a message that does. So that's one part of a brute force attack, or any attack: the attacker needs some way to recognise whether a plaintext is the correct one. When the message is long enough and written in a normal language, that will almost always be possible. The attacker doesn't even need to read them: they can apply some software to determine which of these 26 is correct, so detecting the correct plaintext can be automated. So the Caesar cipher is not good from a brute force attack perspective: there are only 26 possible keys, and it doesn't take long to try them all and find the correct plaintext. But there are other possible attacks. Let's try a slightly more intelligent attack on the Caesar cipher.
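One way the detection step might be automated, as a sketch: score each of the 26 candidates by how many very common English words it contains and keep the best. The word list `COMMON_WORDS` is a hypothetical choice for illustration, and this sketch assumes spaces are left unencrypted so that words can be split apart; real tools often score letter frequencies instead.

```python
import string

ALPHABET = string.ascii_lowercase

# Hypothetical scoring list: a few very common English words.
COMMON_WORDS = {"the", "and", "is", "in", "of", "to", "a", "one", "with", "off"}

def shift(text: str, k: int) -> str:
    """Shift letters k positions right (negative k shifts left); other characters are unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + k) % 26] if ch in ALPHABET else ch
        for ch in text
    )

def english_score(text: str) -> int:
    """Number of words in the text that appear in COMMON_WORDS."""
    return sum(word in COMMON_WORDS for word in text.split())

def crack(ciphertext: str) -> tuple[int, str]:
    """Try all 26 keys and keep the (key, plaintext) pair that scores most like English."""
    candidates = [(k, shift(ciphertext, -k)) for k in range(26)]
    return max(candidates, key=lambda pair: english_score(pair[1]))

message = ("the only truly secure computer is one buried in concrete "
           "with the power turned off and the network cable cut")
secret = shift(message, 10)  # key 10 chosen arbitrarily for the demo
print(crack(secret))         # key 10 and the original message
```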
Here's another example where we know the ciphertext and we know it was encrypted with the Caesar cipher. There it is: a longer ciphertext that we've intercepted. We could do a brute force attack: try all 26 possible keys, taking at most 26 attempts, and we'd find the correct plaintext. Fine. But we can do a smarter attack, using our knowledge of the algorithm and of the language being used. Let's assume the plaintext is in English. Then we expect the plaintext message to have some structure; in particular, some letters occur more frequently than others. What's the most frequent letter in English? You could find out by taking a large set of texts, for example downloading many documents from a website, and counting all their letters. As one example, I've downloaded a book, an old book with hundreds of thousands of characters. I'm going to count the letters in that book; the software used to count doesn't matter. I count the letters and sort them by percentage. Ignoring punctuation, spaces and so on, the most frequent letter in the book is E, occurring about 12% of the time. The next most frequent letter is T, at about 9%. I've only listed the first ten or so; if you look at all of them, I think you'll find Q, Z and X are among the least frequent letters. That commonly holds for other texts too. The exact numbers differ, but E, T, A and O are usually the most frequent letters in English texts. Other languages have other characteristics, but some characters will be more frequent than others. So let's assume I expect the most frequent letter in the plaintext I'm trying to find to be E.
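The counting step can be sketched with `collections.Counter`. The function name is hypothetical and the sample sentence is made up for the demo; in the lecture the same kind of count was run over a whole book, which would be read from a file instead (the file name in the comment is hypothetical).

```python
import string
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Percentage of each letter a..z, ignoring case, spaces and punctuation, most frequent first."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    return [(ch, 100 * n / total) for ch, n in Counter(letters).most_common()]

# For a real analysis, read a large English text, e.g.:
# with open("book.txt", encoding="utf-8") as f:   # hypothetical file name
#     sample = f.read()
sample = "Meet me here at seven; the tea is ready."
for ch, pct in letter_frequencies(sample)[:3]:
    print(f"{ch}: {pct:.1f}%")
```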
If E is the most frequent letter in the plaintext, let's look at some statistics for the ciphertext. As the attacker, I count the letters in my ciphertext, which is shorter, and I see the most frequent letter is J, about 13% of the time, followed by Y, about 9%. These percentages are similar to the ones we saw for the book, but the letters are different: there we had E and T, here the most frequent letter is J. Remember, the Caesar cipher is a substitution cipher: we substitute each plaintext letter with another letter of our alphabet, and because it's such a simple cipher, one particular plaintext letter is always substituted with J. Which letter do you think was substituted to get J? Most likely E: if E was the most frequent letter in the plaintext, every E is substituted with the same letter, and J ends up the most frequent letter in the ciphertext, then E most likely maps to J. That is, E is the plaintext letter and J is the ciphertext letter. If that's true, what is the key? E moves five positions to the right to get J, so we would assume the key is five. Let's try it: let's decrypt our ciphertext, which is in a file in this case, with key five. This is my first decryption attempt as the attacker. It's a little slow because my software isn't very efficiently implemented. And do we get it right? Yes, we get a plaintext that makes sense. What's the point here? In this attack, the attacker used one decryption to get the correct plaintext. With a brute force attack, the worst case is 26 decryptions, and the average is 13. So I've sped up my attack by using some knowledge of the algorithm and the language: I know the cipher maps each letter to one other letter, and I know something about the characteristics of English, namely that E is the most frequent letter.
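A sketch of this frequency analysis attack, assuming a lowercase ciphertext long enough for letter counts to be meaningful. `guess_key` is a hypothetical helper: it assumes the most frequent ciphertext letter encrypts `assumed_plain` (E by default; passing "t" gives the fallback guess mentioned in the lecture). The demo sentence is made up so that E dominates.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def shift(text: str, k: int) -> str:
    """Caesar-shift letters right by k (negative k shifts left); other characters pass through."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + k) % 26] if ch in ALPHABET else ch
        for ch in text
    )

def guess_key(ciphertext: str, assumed_plain: str = "e") -> int:
    """Guess the key by assuming the most frequent ciphertext letter encrypts assumed_plain."""
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    top_letter, _ = counts.most_common(1)[0]
    # e.g. ciphertext 'j' (9) and plaintext 'e' (4) give key (9 - 4) % 26 = 5
    return (ALPHABET.index(top_letter) - ALPHABET.index(assumed_plain)) % 26

plaintext = "we meet here every seven days when the sea seems serene"
ciphertext = shift(plaintext, 5)
k = guess_key(ciphertext)
print(k, shift(ciphertext, -k))  # one decryption instead of up to 26
```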
Therefore I guessed that the most frequent letter in my ciphertext corresponds to the letter E in the plaintext. Once I know that mapping, I know the key is five. I try it, and it works. It may not have worked: even though E is the most frequent letter in our book, it may not be the most frequent letter in this particular plaintext. But commonly it will be close, and if E doesn't work, I would try T, the second most frequent letter. This is called frequency analysis: we analyse the plaintext and ciphertext based on the frequency of letters. It's a simple way to attack the cipher. So we've found two problems with the Caesar cipher. First, the key space is too small: there are only 26 possible keys, so a brute force attack is possible. Second, it's weak in another way: even without a brute force attack, we can analyse the letter frequencies to find the key quite quickly. Any questions on the Caesar cipher so far? Just to make it more interesting, we can represent the cipher as an equation. We map our letters to numbers: A is 0, B is 1, C is 2 and so on. Shifting to the right is then equivalent to adding the key. The ciphertext letter C is obtained by taking the value of the plaintext letter P, adding the key value K, and implementing the wrap-around using mod 26: C = (P + K) mod 26. We'll often look at ciphers from an algorithmic perspective like this, and here it's a simple equation. Let's confirm it works on our previous example by mapping the ciphertext to numbers. Same ciphertext, A B M D M. A is the number 0 (if you can't remember your alphabet, start remembering it), B is 1, M is 12 and D is 3. So our ciphertext numbers are 0, 1, 12, 3 and 12. In this case, we found the key to be 8, or I.
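Written out as equations, with letters mapped to the numbers 0 to 25 and the key K also in that range, the generalised Caesar cipher is:

```latex
\begin{aligned}
C &= (P + K) \bmod 26 \\
P &= (C - K) \bmod 26, \qquad P, C, K \in \{0, 1, \dots, 25\}
\end{aligned}
```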
To encrypt, we take the plaintext value, add the key, and take the result mod 26. To decrypt, we take the ciphertext value, subtract the key, and take mod 26: P = (C - K) mod 26. We do that for each letter. Take the first letter: the ciphertext value C is 0 and the key K is 8, so C - K is -8. What's -8 mod 26? Someone says "the same", that is, -8. Here we have to be careful about what mod means. There are in fact two different variations of mod. Here we use the one that only returns a number from 0 to 25 when we mod by 26. Some implementations of mod can return a negative number, so you might think the answer is -8. But in this course, when we write mod, the answer must be in the set 0 to 25. Think of mod as the remainder: some integer times 26, plus a remainder, equals -8, where the remainder must not be negative. What values can we put in the blanks? The integer, which can be positive or negative, is -1, and the remainder is 18: -1 times 26 is -26, and -26 plus 18 gives -8. So the answer is 18: -8 mod 26 is 18, the remainder when we divide by 26. So be careful with mod here and elsewhere in the course: the answer is never negative; it's between 0 and the modulus minus one, 0 to 25 in this case. What is letter 18 in our alphabet? It's the 19th letter, which is S. You can do the same for the other letters.
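The two variations of mod can be seen directly in Python. Python's `%` operator already follows the course convention (the result is in 0 to 25 for a modulus of 26), while `math.fmod` follows the C convention, where the result takes the sign of the dividend and can be negative; C's own `%` operator on integers behaves the same way, giving -8 for `-8 % 26`.

```python
import math

# Course convention: for a positive modulus the result is always in 0..25.
print(-8 % 26)            # 18 -> the letter s
print(-7 % 26)            # 19 -> the letter t

# The "other variation": the result keeps the sign of the dividend.
print(math.fmod(-8, 26))  # -8.0

# The remainder definition: q * 26 + r == -8 with 0 <= r < 26
q, r = divmod(-8, 26)
print(q, r)               # -1 18
```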
The second letter: 1 minus 8 is -7, and -7 mod 26 you'll find is 19, which is T. In fact, the mod implements the wrap-around. The third letter: the ciphertext value 12 minus the key 8 gives 4, and 4 mod 26 is an easy one: 4, which is the letter E. The fourth: 3 minus 8 is -5, and -5 mod 26 is 21, which is V. The last letter is M again, giving E. So although we think of the Caesar cipher as shifting letters, we can implement both encryption and decryption as equations, and then apply them to any alphabet: order the set of characters and mod by the size of that set. For example, if we wanted 27 characters in our alphabet, including the space as the 27th character, so that Z is 25 and space is 26, we could still use the Caesar cipher, but mod 27. Remember the characteristic of mod that we use in this course: we always end up with a non-negative number. All right, let's try a better cipher. With the Caesar cipher, the key space is too small: there are only 26 possible keys, so a brute force attack is possible. On average it takes 13 attempts, and with a computer that takes essentially no time. Another problem, even without a brute force attack, is that the Caesar cipher maps each letter to exactly one other letter, which makes it easy to analyse the frequency of letters. We see it here: the ciphertext M corresponds to the plaintext E, so whenever we have an M in the ciphertext, it is the same letter, E, in the plaintext. So by counting the frequency of characters in the ciphertext and comparing with the frequencies expected in the plaintext, we can guess the key, and that's quite successful. So how do we improve? How do we improve against brute force? One way is to make the attacker guess which algorithm we use: don't tell them we used the Caesar cipher. But in practice that's hard, because, first, there aren't many algorithms for them to try, and second, the implementation is usually known.
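A sketch of the 27-character variant described above, assuming the hypothetical character set a to z followed by the space (so space is number 26) and reducing mod 27 instead of mod 26. Note that the space now gets encrypted like any other character.

```python
import string

# Hypothetical extended character set: a..z (0..25) followed by the space character (26).
ALPHABET_27 = string.ascii_lowercase + " "
N = len(ALPHABET_27)  # 27

def encrypt(plaintext: str, k: int) -> str:
    """C = (P + K) mod 27"""
    return "".join(ALPHABET_27[(ALPHABET_27.index(ch) + k) % N] for ch in plaintext)

def decrypt(ciphertext: str, k: int) -> str:
    """P = (C - K) mod 27"""
    return "".join(ALPHABET_27[(ALPHABET_27.index(ch) - k) % N] for ch in ciphertext)

c = encrypt("hi there", 3)
print(repr(c))        # 'klcwkhuh'
print(decrypt(c, 3))  # hi there
```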
Again, it's hard to hide the implementation, whether it's in the operating system or in the encryption software, so we usually assume the algorithm is known, or is easy to find. Another option is to compress the message before we encrypt, so that the frequency of characters changes. But again, if the attacker can guess the compression algorithm, ZIP, RAR, 7z or whatever, then they can include that in a brute force attack. Using a different language has the same problem: there aren't many languages to try. Even if there were 100,000 different languages, a brute force attack would only increase from an average of 13 attempts to 1.3 million attempts, and with a computer 1.3 million attempts won't take long. So the main way to defend against a brute force attack is to make sure the key space is large: increase the number of possible keys.

So let's look at a variation, called in general a mono-alphabetic substitution cipher. Instead of shifting by K positions in our alphabet, we define in advance which letter maps to which other letter, tell the other side that mapping, and then encrypt in the same way. For example, I list my 26 plaintext letters, and as user A, the encryptor, I define that every time I encrypt the letter A it comes out as D, every time I encrypt the letter B it comes out as ciphertext Z, and C comes out as G. I can choose a random mapping here, but I cannot reuse letters, so each of the 26 letters appears exactly once in the output. Then, when I want to encrypt a plaintext message like bad, B-A-D becomes Z-D-L, so in this mapping D comes out as L. With the Caesar cipher, the ciphertext letters had to stay in the same order as the alphabet, just shifted. Here we allow them to be in any order, chosen by the user. Let's consider a different example of how this cipher is used by our two users, where user A chooses the mapping.
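A sketch of encrypting and decrypting with a mono-alphabetic key in Python, using the built-in str.maketrans and str.translate. The first four key entries follow the example (A to D, B to Z, C to G, and D to L, which is implied by bad becoming ZDL); the remaining 22 letters are arbitrary hypothetical filler, chosen only so that each letter appears exactly once.

```python
import string

PLAIN = string.ascii_uppercase
# Hypothetical key: "DZGL" follows the example mapping for A, B, C, D;
# the rest is arbitrary filler so every letter is used exactly once.
KEY = "DZGL" + "ABCEFHIJKMNOPQRSTUVWXY"
assert sorted(KEY) == list(PLAIN), "key must be a permutation of A-Z"

ENCRYPT_TABLE = str.maketrans(PLAIN, KEY)   # plaintext letter -> ciphertext letter
DECRYPT_TABLE = str.maketrans(KEY, PLAIN)   # the inverse mapping, used by user B

def encrypt(plaintext: str) -> str:
    return plaintext.upper().translate(ENCRYPT_TABLE)

def decrypt(ciphertext: str) -> str:
    return ciphertext.upper().translate(DECRYPT_TABLE)

print(encrypt("bad"))  # ZDL
print(decrypt("ZDL"))  # BAD
```

The key here is the whole 26-letter string, and decryption just uses the same mapping in reverse.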
Think of all the letters of the alphabet, which I won't write out in full. We'll define our own mapping. A can map to any of the 26 possible letters in our English alphabet, and the user chooses one randomly. Anyone want to choose a value? L. That's just a random character. How many values could I choose from? There were 26 possible letters: I chose to map A to L, but it could have been any of the 26. I just record that when I chose L, I could choose from 26 possible values. Now I choose a letter that B maps to. I choose it randomly, but I cannot reuse L. This is what we mean by a mono-alphabetic cipher: we use just the one instance of the alphabet. It doesn't matter for the example, so I choose H. The point is that the number of letters I could choose from was 25, because L was already taken. For C, I map to one of the remaining 24 characters, so I could choose from 24. You start to see the pattern and why it's useful. When it gets down to the last three characters, there aren't many to choose from. Say the letters I haven't used yet are T, A and Q. For X, I'll choose T; there were three to choose from. For Y, I've got A and Q left, so let's use A; there were two to choose from. And for Z there's only one to choose from, the last remaining letter in the alphabet. Note that this is not encryption yet. This is defining the mapping from plaintext to ciphertext. User A defines this mapping in advance and tells user B what it is, so they share the mapping. The mapping is in fact the key. User A shares it with user B, saying, for example, whenever you get ciphertext I, it really means plaintext E. So user B also knows the mapping, and when they want to send a plaintext message, they simply use that mapping.
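The step-by-step choice of a mapping can be sketched in Python as follows, assuming the standard secrets module as the source of randomness. The counts list records how many letters were still available at each step, which is where the 26, 25, 24 pattern comes from. The function name choose_mapping is hypothetical.

```python
import secrets
import string

def choose_mapping() -> tuple[dict[str, str], list[int]]:
    """Pick a ciphertext letter for A, then B, and so on, never reusing one.
    Returns the mapping and how many letters were available at each step."""
    remaining = list(string.ascii_uppercase)
    mapping: dict[str, str] = {}
    choices_available: list[int] = []
    for plain_letter in string.ascii_uppercase:
        choices_available.append(len(remaining))
        cipher_letter = secrets.choice(remaining)  # uniform pick from what's left
        remaining.remove(cipher_letter)            # mono-alphabetic: no reuse
        mapping[plain_letter] = cipher_letter
    return mapping, choices_available

mapping, counts = choose_mapping()
print(counts[:3], counts[-3:])  # [26, 25, 24] [3, 2, 1]
```

Picking uniformly from the remaining letters at each step gives every possible mapping the same chance of being chosen, so it is equivalent to shuffling the alphabet.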
I haven't written out all 26, so let's say the plaintext is the word dead. With this particular mapping, D becomes R, E becomes I, A becomes L, and D becomes R again, so the ciphertext is RILR. I send this ciphertext to user B. User B, who also knows the mapping, sees that R is D, I is E, L is A and R is D, and gets the plaintext back. That's how it's used. Compared with the Caesar cipher, the important point is how many possible keys we can choose from. I could use this mapping, or I could have chosen a different mapping, which would give a different ciphertext; a different mapping is equivalent to a different key. So how many possible mappings are there? What's the answer? 26 factorial: 26 times 25 times 24, and so on down to 1. For the first letter I can choose from 26 values. For the second letter, 25, so the number of possible combinations so far is 26 times 25. For the next letter, 24. To get the number of possible mappings, you multiply all those numbers together, which gives 26 factorial, a big number. Let's not worry about this particular plaintext. We use the same mapping for all of our plaintext. If I have a long message, like a book that I want to encrypt, then most likely all of the letters appear and the entire mapping is used. If I have a different plaintext, I use that same mapping again. So all we care about at this stage is how many possible mappings there are, because in a brute force attack on a long plaintext, the attacker has to try all possible mappings, and there are 26 factorial of them. The Caesar cipher had 26 possible mappings: a shift of 0 positions, 1 position, 2 positions, and so on up to 25. This mono-alphabetic cipher has 26 factorial possible mappings.
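The counting argument multiplies the number of choices available at each step. A short Python sketch that does this multiplication and checks it against math.factorial; count_mappings is an illustrative name, not something from the lecture.

```python
import math

def count_mappings(alphabet_size: int = 26) -> int:
    """Multiply the choices available at each step: n * (n-1) * ... * 1."""
    total = 1
    for choices in range(alphabet_size, 0, -1):
        total *= choices
    return total

caesar_keys = 26                # shifts of 0 to 25 positions
mono_keys = count_mappings()    # 26 * 25 * 24 * ... * 1
assert mono_keys == math.factorial(26)

print(caesar_keys)         # 26
print(mono_keys)           # 403291461126605635584000000
print(f"{mono_keys:.2e}")  # 4.03e+26
```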
A brute force attack that tries all possible mappings needs 26 factorial attempts in the worst case and 26 factorial divided by 2 in the average case, because on average you only have to try half of them. Let's see what my calculator does. 26 factorial is about 4 times 10 to the power of 26; it's just a coincidence that the exponent is also 26. Divided by 2, that's about 2 times 10 to the power of 26. That is, on average, a brute force attack that tries all possible substitutions would take about 2 times 10 to the power of 26 decrypt operations. How fast is your computer, maybe a tablet, a laptop or a PC? How many decrypt operations could it do per second? I don't know how fast it would be with this cipher, but let's say it's very fast and put a number to it: say my computer could try 10 to the power of 12 decrypts per second, which is a million million, or a trillion. A clock rate of 1 gigahertz is 10 to the power of 9 cycles per second, so that would mean 1,000 decrypts per clock cycle, which is unlikely, but you might have many CPUs; it's just an example. Then how long would the brute force attack take? Let's get our calculator. How many seconds? 26 factorial divided by 2, divided by 10 to the power of 12, is about 2 times 10 to the power of 14 seconds. Convert to minutes, then hours, days, years and centuries: about 63,941 centuries. The point is that with this large key space, a brute force attack is no longer possible, because it would take so much time to try all the possible keys. You might say my computer is slow, so why not use more computers? Let's say I have access to a million computers at this speed, using computers from other people. If we speed up by a factor of a million, the time decreases by a factor of a million, and it becomes about 6.4 years.
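The same back-of-the-envelope calculation in Python. The rate of 10 to the power of 12 decrypts per second is the lecture's hypothetical figure, and years are taken as 365 days, which reproduces the 63,941 centuries figure; the million-machine and billion-machine lines apply the same speed-ups the lecture discusses.

```python
import math

average_tries = math.factorial(26) // 2    # on average, half the keys
rate = 10**12                               # hypothetical decrypts per second
seconds = average_tries / rate
seconds_per_year = 365 * 24 * 60 * 60       # 365-day years
years = seconds / seconds_per_year
centuries = years / 100

print(f"{average_tries:.1e} tries")                             # 2.0e+26 tries
print(f"{seconds:.1e} seconds")                                 # 2.0e+14 seconds
print(f"{centuries:,.0f} centuries")                            # 63,941 centuries
print(f"{years / 10**6:.1f} years on a million machines")       # 6.4 years ...
print(f"{years * 365 / 10**9:.1f} days on a billion machines")  # 2.3 days ...
```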
If we speed up by another factor of 1,000, then maybe we can bring it down to a couple of days. But by then we've probably exceeded all the computers we could get access to. So it's easy to stop a brute force attack: make the key space very large. When we look at the real ciphers, we'll return to this and give some numbers for what's a good key size, such that a brute force attack today is not possible. To summarize: the Caesar cipher has 26 keys to brute force, while this mono-alphabetic cipher has 26 factorial keys, so it's secure from that perspective. But it's still easy to break if we look at the frequency of letters. I can't go through an example in class because it takes several steps, but there is one on the website which is worth reading through. In that example we start out with a ciphertext. We can't do a brute force attack because it would take my computer millions of centuries, as my computer is quite slow. But what we can do is look at the frequency of letters again. If you count the frequency of letters, t is the most frequent in this ciphertext and z is the next most frequent, and you compare that with the expected frequency in the plaintext, with e being the most frequent, and so on. And you don't have to look only at the frequency of single letters; you can also look at the frequency of pairs of letters, called digrams. In English, the most frequent pairs of letters are things like th, an, on and so on, while q followed by x is very infrequent. So comparing the frequency of digrams in the ciphertext with the expected frequency in the plaintext can also be used to start guessing the mapping.
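The first step of frequency analysis is counting. A minimal Python sketch, assuming the ciphertext is English letters with spaces and punctuation ignored, so digrams are counted across word boundaries. ENGLISH_ORDER is an approximate textbook ranking of English letter frequencies (exact orderings vary by corpus), and initial_guess is a hypothetical helper that only produces a starting mapping, which you would then refine by trial and error as in the website example.

```python
import string
from collections import Counter

# Approximate ranking of English letters, most to least frequent.
# A common textbook ordering; exact rankings vary by corpus.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def letters_only(text: str) -> list[str]:
    """Uppercase the text and drop everything that is not A-Z."""
    return [ch for ch in text.upper() if ch in string.ascii_uppercase]

def letter_counts(text: str) -> Counter:
    return Counter(letters_only(text))

def digram_counts(text: str) -> Counter:
    # Adjacent pairs in the letter stream; spaces were removed,
    # so a pair can span two words.
    letters = letters_only(text)
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def initial_guess(ciphertext: str) -> dict[str, str]:
    """Map the k-th most frequent ciphertext letter to the k-th letter of
    ENGLISH_ORDER. Ties keep first-appearance order, so this is only a
    rough starting point to refine by hand."""
    ranked = [letter for letter, _ in letter_counts(ciphertext).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

sample = "HELLO THERE"
print(letter_counts(sample).most_common(1))  # [('E', 3)]
print(digram_counts(sample).most_common(1))  # [('HE', 2)]
```

On a short text these counts are noisy; the guess becomes more useful as the ciphertext gets longer, which is why the attack works well on long messages.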
The example on the website is too long to go through in class. It starts with the ciphertext and, with a few guesses, you start trying some letters and replacing them. If a guess doesn't work, you go back and try a different letter, and after a few attempts you eventually get to the correct plaintext. On paper this example may take a couple of hours. With a computer, if you automate the process, it would take a few seconds. So a brute force attack would take tens of thousands of centuries, but a frequency analysis attack on a mono-alphabetic cipher can succeed in seconds. So this cipher is still not secure. For homework: once I've added everyone to the course list, I'll send out an email summarizing some homework, unassessed in this case. Read the course website, where you'll find a link to this example, and make sure that you understand the Caesar cipher and the mono-alphabetic cipher. Your first quiz will test you on them, but that quiz may not be until later in the second week.