 So let's recap on what we've done so far, we've introduced security and we're saying that we need some encryption techniques to provide the mechanisms for securing our computer systems and we're looking at what we call the classical encryption techniques. The very old approaches, the reason for looking at them is that they are easy to demonstrate on a piece of paper in an example and importantly they show the main concepts that we use in encryption and we've gone through two simple cases but the main concepts with which we're trying to go through it, substitution techniques where we take one character from our plain text and substitute it with another one from the alphabet and then we'll get to transposition techniques where we take our characters in the plain text and we rearrange them. We do a transposition. The first substitution technique we looked at is called the Caesar cipher. Very simple. Given our plain text, we have a plain text message we want to encrypt and we're doing all our examples in English that we take our letter and we shift to get the ciphertext, we shift K positions to the right to get the ciphertext letter. The example here is K is 3. If the plain text letter I want to encrypt is D then the output of this encryption will be the letter G and I do that for each letter in plain text message and I'll get a ciphertext message as output. By changing the number of positions I shift I'll get a different ciphertext output and the number of positions I shift is the key in this case. So we went through an example of the Caesar cipher. We saw that there are two problems with the Caesar cipher. First the number of keys possible is very small and it's not hard for an attacker given ciphertext to try and guess all or to guess the correct key. Try them all. A brute force attack. All 25 keys well in theory there are 26 keys because if we shift 27 positions to the right where we wrap around we come back to the letter A. 
So 27 positions is the same as shifting one position. So in fact there are 26 keys, but one of the keys is not a good key to use because nothing shifts: a shift of zero positions. But another problem with the Caesar cipher we saw is that even without trying all possible keys we can do an intelligent attack where we look at the frequency of letters. So we'll recap on that one as well. And if you did your homework you would have also read about an example attack on mono-alphabetic ciphers. So the second task of your homework was to read about that frequency analysis attack on the mono-alphabetic cipher. It's too hard to go through in lecture because it requires a number of manual steps, but you read up about that and see how easy it is to attack a mono-alphabetic cipher. What is a mono-alphabetic cipher? Well, in fact a Caesar cipher is one. I've got some examples here which you don't have in front of you. There's no need to copy them down, but let's just go through and you'll see the idea quickly. The green shows us the plain text letters. Not the plain text message; this is illustrating the algorithm for encryption. If we have one of these plain text letters then the output from the encryption will be one of the cipher text letters, the blue at the bottom, where the mapping is directly down. That is, if the plain text letter is A the output will be D. What cipher is this? It's the Caesar cipher. And what's the key in this case? Three, in that you can think we've shifted three positions to the right to get the output. A becomes D, which is the letter three positions to the right, and the same for every letter: M becomes P, three positions to the right. So we say the key is three in that case, or sometimes we write the key as a letter, where A is zero, B is one, C is two and D is three, so D is the key. So that's the Caesar cipher with a key of three, and the second one is also the Caesar cipher, with a key of four. Okay, we've just shifted four positions. Not so hard.
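The shifting just described can be written down in a few lines. This is only a minimal sketch: the function names `caesar_encrypt` and `caesar_decrypt` are made up for illustration, and writing the cipher text in uppercase is just a notational choice.

```python
import string

ALPHABET = string.ascii_lowercase  # a=0, b=1, ..., z=25


def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter `key` positions to the right, wrapping around after z."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            shifted = (ALPHABET.index(ch) + key) % 26  # mod 26 does the wraparound
            out.append(ALPHABET[shifted].upper())      # uppercase marks cipher text
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Undo the shift by moving `key` positions to the left."""
    out = []
    for ch in ciphertext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - key) % 26])
        else:
            out.append(ch)
    return "".join(out)


print(caesar_encrypt("a", 3))  # D
print(caesar_encrypt("m", 3))  # P
print(caesar_encrypt("a", 4))  # E
# A shift of 27 gives the same result as a shift of 1.
print(caesar_encrypt("a", 27) == caesar_encrypt("a", 1))  # True
```

With a key of zero nothing shifts, which is why that key is not a useful one.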
So the Caesar cipher we've gone through in an example and the third one a Caesar cipher with a different key wrapped right around so that A becomes Z. So three instances of the Caesar cipher with different keys so when we encrypt the same plain text with a different key we'll get different cipher text as output. If the word I want to encrypt is Steve with the first key the first letter out will be V but if I use the different key the second key the first letter out would be W. If I use the third key the first letter out would be R. So changing the key will give us generally a different cipher text value. That's the Caesar cipher. What about a general mono-alphabetic cipher? Well we don't shift by K positions to the right. We allow the the encryptor to choose the mapping from letters from plain text to cipher text letters. Any mapping and we'll see I've chosen one here. Let's find it the bottom one in this case. Here's a particular mapping I've chosen. I'm choosing that if I'm going to encrypt the letter A in plain text I'm going to output the letter H. B becomes W and so on. So I choose this particular mapping for a mono-alphabetic cipher and you note the difference from the Caesar cipher. With the Caesar cipher we have the mapping defined by the number of positions we move. Here we effectively have random arrangement of the characters and that needs to be chosen and in fact that arrangement of the characters is the key. So I could have chosen a different arrangement of those characters. If I choose this one I send the cipher text to the the other side and they decrypt using the same arrangement and you can see the encryption is really a mapping from plain text to cipher text. So we map A to H, B to W and so on. I chose that one randomly and we could choose another one. The one at the bottom is just a different mapping from plain text to cipher text letters, which represents a different key. So the mapping is defined by the key in this case. 
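To make the idea of "the arrangement is the key" concrete, here is a small sketch. The names `make_key`, `mono_encrypt` and `mono_decrypt` are hypothetical, and the random arrangement it produces is not the one on the slide. It uses Python's `random` module only for illustration; that generator is not suitable for real keys.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def make_key(seed=None) -> str:
    """Return a random arrangement of the 26 letters; this arrangement is the key."""
    rng = random.Random(seed)
    return "".join(rng.sample(ALPHABET, len(ALPHABET))).upper()


def mono_encrypt(plaintext: str, key: str) -> str:
    """Map a -> key[0], b -> key[1], and so on; other characters pass through."""
    table = str.maketrans(ALPHABET, key)
    return plaintext.lower().translate(table)


def mono_decrypt(ciphertext: str, key: str) -> str:
    """Apply the same mapping in the reverse direction."""
    table = str.maketrans(key, ALPHABET)
    return ciphertext.translate(table)


key = make_key()
ciphertext = mono_encrypt("steve", key)
print(key, ciphertext, mono_decrypt(ciphertext, key))

# The Caesar cipher with key 3 is just one particular arrangement.
caesar3 = (ALPHABET[3:] + ALPHABET[:3]).upper()
print(mono_encrypt("steve", caesar3))  # VWHYH
```

Both sides need the same arrangement, so the 26-letter string plays the role that the number K played for the Caesar cipher.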
The point with the mono-alphabetic cipher compared to the original Caesar cipher: with the Caesar cipher there are 26 possible mappings, 26 possible keys. With a mono-alphabetic cipher the number of mappings we have available is the number of possible arrangements of 26 letters. Here's one, here's a second, and in fact the Caesar cipher is also an instance of the mono-alphabetic cipher. It's another possible arrangement, but we have many other arrangements. It's the number of permutations of these 26 letters, which we said last week was 26 factorial, which we calculated as about 10 to the power of 26 (more precisely, about 4 times 10 to the power of 26). The Caesar cipher has 26 keys. This mono-alphabetic cipher has about 10 to the power of 26 keys, and with that many keys an attacker, if they don't know the key and they want to guess the key, has many to try before they'll find it. So we can make a brute-force attack difficult by increasing the number of keys, and that's what the mono-alphabetic cipher does. What's the problem with this mono-alphabetic cipher? Brute-force attack is hard, that's good. What's the problem still if you're trying to attack it? What did we see last week? Frequency of what? The frequency of the letters can be useful to the attacker, and let's go back to the slides. We're going to switch between them a little bit. I'm just going to go back a few slides to define the attack methods at the bottom. We've said a brute-force attack involves trying every possible key on the cipher text. What the attacker does: they know the cipher text, they decrypt with one key, and if they get the correct plain text they are successful. If they don't, they try again with a different key and keep trying. That's under the assumption that the attacker can recognize the correct plain text, which we said is normally the case. We'll look at some extreme cases later. But the more intelligent approach to performing an attack is what we call cryptanalysis.
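The difference in key space, and what a brute-force attack on the Caesar cipher looks like, can be sketched as follows. The helper names `shift_left` and `brute_force_caesar` are invented for this example, and the sketch assumes the cipher text contains letters only.

```python
import math
import string

ALPHABET = string.ascii_lowercase

caesar_keys = 26
mono_keys = math.factorial(26)  # number of arrangements of 26 letters
print(caesar_keys)              # 26
print(mono_keys)                # 403291461126605635584000000
print(f"{mono_keys:.2e}")       # 4.03e+26


def shift_left(ciphertext: str, key: int) -> str:
    """Caesar decryption with one candidate key (letters only)."""
    return "".join(ALPHABET[(ALPHABET.index(c) - key) % 26] for c in ciphertext.lower())


def brute_force_caesar(ciphertext: str):
    """Try every key and return all (key, candidate plain text) pairs."""
    return [(key, shift_left(ciphertext, key)) for key in range(26)]


for key, candidate in brute_force_caesar("VWHYH"):
    print(key, candidate)  # the attacker looks for the candidate that makes sense
```

For the Caesar cipher the list has only 26 entries, so recognizing the correct plain text by eye is easy; the same loop over 26 factorial arrangements is not practical.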
We try to find the key in a shorter time than a brute-force attack by exploiting characteristics of the algorithm to try and work out what the key or plain text is, and a common approach is called frequency analysis. Frequency analysis attacks take advantage of the fact, in our English examples, that some letters in English occur more frequently than others. What's the most frequent letter, we said? E. E is very common in English texts. Therefore the attacker assumes, okay, E is the most likely letter in the plain text. Let's say we're using the key in the middle example here. If we assume E is the most common letter in the plain text, what's the most common letter we expect to see in the cipher text if using this one? We expect to see N as the most common in the cipher text. That is, if we have a cipher text, we count the letters, and if we see N is the most common letter, then we'll make a guess that N maps back to E, because if N is the most common letter in the cipher text, it's likely that it corresponds to the letter E. Now it may not always be the case, because we say E is the most common letter in a large set of English texts, but if we take a selected one, E may not be the most common letter. It's going to be one of the most frequent, usually. So it requires a little bit of trial and error in practice. So the problem with the mono-alphabetic cipher is that each plain text letter always maps to the same one of the 26 cipher text letters. So if in the plain text E was the most frequent, then in the cipher text N will be the most frequent. And the attacker uses that to try and work out what the original key was. And you can use more than just the letters. You can also look at pairs of letters, IN, ON and so on, called digrams, because they occur more frequently than other pairs like QZ, TS and so on.
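The counting step of a frequency analysis attack is simple to sketch. The function names below are hypothetical, and as a simplification the digram count ignores spaces, so pairs can straddle word boundaries.

```python
from collections import Counter
import string


def letter_frequencies(text: str):
    """Return (letter, relative frequency) pairs, most frequent first."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]


def digram_counts(text: str) -> Counter:
    """Count overlapping pairs of adjacent letters (digrams)."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    return Counter(a + b for a, b in zip(letters, letters[1:]))


# Toy cipher text: if N is the most frequent letter, guess that N maps back to E.
sample = "NNNAB"
print(letter_frequencies(sample)[0])  # ('n', 0.6)
print(digram_counts("inon"))
```

On a real attack the counts from the cipher text would be compared against known English letter and digram frequencies, with some trial and error on the guesses.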
So you can look at the frequency of letters, the frequency of pairs of letters, triples of letters and so on, digrams and trigrams, to work out the key or the plain text without trying all possible keys. So we got to that last week. To put that attack in perspective: to break a long plain text encrypted with a mono-alphabetic cipher with 10 to the power of 26 possible keys by brute force, it would take your computer centuries, we calculated. But by hand, maybe with a bit of computer support, it would take maybe half an hour or an hour to break it. So with the example that's written up, you can go through and see an attack on the mono-alphabetic cipher. So we need a better cipher. Mono-alphabetic is good against the brute-force attack, but not so good against cryptanalysis. So let's keep going. Caesar cipher: easy to break. Mono-alphabetic: brute force is hard, but you can exploit the regularities in the language to try and break it quite easily. So the problem that comes out with the mono-alphabetic cipher is that the frequency of letters in the cipher text maps to the frequency of letters in the plain text. There are two approaches to try and overcome that. Encrypt multiple letters at a time: don't map one letter to another letter, but take, say, a pair of letters and map it to a pair of letters, and we'll see that in the next example. Or use multiple cipher alphabets, which we'll call a poly-alphabetic cipher, which we'll see in the upcoming examples. First, let's try another cipher that encrypts multiple letters at a time. This is what I was looking for before. This is an analysis people have done of a large set of English texts, where they counted the letters in those English texts. I think they took a database of law articles, but you can do it on other texts. They counted the letters, and they found out that the letter E occurs about 12% or 13% of the time.
So that's what we say is the most frequent, and then the other frequent letters are T and A, and then what have we got? O and a few others here are the most frequent letters. The least frequent, as you can probably guess, are J, Q, X and Z. So that's what we expect in the plain text, so we use that to try and determine the key or plain text if we just have the cipher text. That one maybe doesn't make sense until we've gone through the other ciphers, the next couple. So let's look at a different cipher called the Playfair cipher, and it will just be an example of encrypting multiple letters at a time. With Caesar and the mono-alphabetic, we took one letter and replaced it with another letter. Let's do it on a pair of letters. Quickly, the algorithm: we start by drawing a five-by-five matrix, so we have 25 elements, and we write a keyword. Here we don't have a key as a number or just a letter, we have a word. Normally, we choose a word that we can remember, like you choose a password. So we choose a keyword, and we write that keyword in the matrix from the top row, then the second row, left to right. And once we have finished writing the keyword, we fill in the rest of the matrix with the remaining letters in the alphabet in order, with the rule that we don't repeat any letters. So we're going to have a 25-element matrix. The English alphabet has 26 letters, so we can't fit all the letters from the alphabet in, so we'll do a special case and treat i and j as the same. We'll see why that's okay as we go through an example. So we're going to write the 25 letters of the alphabet into this 5x5 matrix, where the order of those letters is determined by the keyword and then the alphabet. That's the initialization step. Then we'll encrypt by looking at pairs of letters in our plain text and looking up the matrix to find a pair of letters for the ciphertext, and those steps will become clear from an example. So let's go through an example.
This is the one we're going to try. So draw a 5x5 matrix. So we have a plain text message, and again for the examples we keep things short, but often we'll have long plain text, and there it's easier to analyze, but the examples will keep things short. So we have a 5-character plain text message, hello, and the key word, the secret word I've chosen is Thailand. So what I will do is I'll encrypt the word hello using the key word Thailand. I'll get ciphertext, and we should get this ciphertext. We'll go through and see if we get it. We would send that ciphertext to the other person, and that other person must have the key word as well to decrypt. So let's see how we derive that ciphertext. The 5x5 matrix, I'm not going to draw the squares, I'll just write the key word. So we start with our key word in this case. So what I do is I write that in five rows across five rows. So it will start with here, t, h, a, i, and I'll just note that we're going to treat i and j the same. So I'll just write a j down here as well, l, and now the next row, a. We don't repeat letters. We already have an a in our matrix, so we'll skip the second a in Thailand and write n, d. So that's our key word, and now we fill out the rest of the matrix using the alphabet in order, of course not repeating letters. So we already have an a, so the next letter from the alphabet is b. We don't have a b, so we write that in, c, e. We have the d already. We just keep going until we fill out the matrix. This is the initialization step. We're not encrypting yet. Remember we don't repeat letters. So with our 5x5 matrix, 25 elements, treating i and j the same, we have the 25 letters filling up that matrix. The decryptor, the person decrypting would do the same when they initialize the process if they receive ciphertext. So now we want to encrypt. We use this matrix to encrypt, so we take our plain text and do the encryption on the plain text, and there are a few special cases we'll need to deal with. 
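The initialization step can be sketched in code. This is an illustrative version only: `build_matrix` is a made-up name, and folding j into i is one way of implementing "treat i and j the same".

```python
import string


def build_matrix(keyword: str):
    """Write the keyword, then the rest of the alphabet, into a 5x5 grid
    without repeating letters; i and j share one cell."""
    seen = []
    for ch in keyword.lower() + string.ascii_lowercase:
        if ch == "j":
            ch = "i"  # i and j are treated as the same letter
        if ch in string.ascii_lowercase and ch not in seen:
            seen.append(ch)
    # 25 distinct letters, split into five rows of five
    return [seen[row * 5:(row + 1) * 5] for row in range(5)]


for row in build_matrix("Thailand"):
    print(" ".join(row))
# t h a i l
# n d b c e
# f g k m o
# p q r s u
# v w x y z
```

The second a in Thailand is skipped because it is already in the matrix, and the person decrypting builds exactly the same grid from the same keyword.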
So let's say the plain text we have is hello. What we're going to do is work on pairs at a time. So we'll encrypt he. That's the first pair. But to work on pairs at a time, there are two special cases we need to deal with. When we have a pair, they must be different letters. That is, h and e are different letters. They are okay. But the next pair, ll, they are the same letter, and the encryption will not work if we have the same letters in the pair. So what we do when we notice that, before we do the encryption, is we separate those two letters, ll in this case, with another special character, and we'll use a letter that's not very common in plain text. What's not very common? Well, a few from the previous slide: q, x, z. Let's use x. The letter that we choose here must be known by the encryptor and decryptor, okay? And it can be known by the attacker. It's part of the algorithm. We'll choose x, which is commonly used as a separator. That is, if we think of this plain text in pairs, the first pair is he. The second pair, ll, we'll separate by inserting an x in between. So we insert an x here such that we'll have he, lx, lo, because we don't want a pair of the same letters. And the other rule, not needed right now, is that we must have an even number of letters to encrypt if we're going to operate on pairs. Well, we do in this case. We started with five letters. We inserted an x, so we now have six letters, or three pairs. So now to encrypt, we operate on a pair at a time. Let's start with he. And the way that we encrypt is we look up those letters in the matrix, h and e. I lost it for a moment. And the ciphertext that comes out will be the letter on the same row as the current letter and the same column as its partner. That is, we're dealing with he, so let's think of h as the first one. For h: the same row as h, the top row, and the same column as the partner e, gives l.
The first output ciphertext letter will be L. H becomes L as output. And the second letter in the output, on the same row as e and the same column as h, will be D. So the first pair of letters that come out are L and D. Questions so far? Can I repeat? Good. Find the pair in the matrix, the normal case. We find the pair h and e. And in order, let's deal with h first. Find the letter on the same row as h, so it's going to be in the top row, and the same column as its partner e: L. And then for e, the same row as e and the same column as h: D. So the pair that comes out is LD. So when we encrypt he with this cipher, which is called the Playfair cipher, out comes LD. I write it as uppercase just to distinguish between the plain text and cipher text. It's common notation to use uppercase for cipher text and lowercase for plain text. We don't need to. Just sometimes it's easier to distinguish. So repeat for the next pair, lx. Note it is lx in the next pair, not ll. We inserted the special character. Encrypt lx. And the last pair: lx should be easy, lo will be another special case. Of course, you have the answer on your slide, so work out how to get to it. Encrypt lx. Look it up in the matrix, l and x. Where are they? And look at the same row and the same column. What letters come out? l and x gives us A and Z. Okay? Question? Where does lx come from? My plain text message is hello, but if I treated that as is, then the pair ll will not work with this cipher. Our rules won't work. So what we do is we introduce a special letter, let's say x, because x is not very common. So instead of ll, it's lxl. Hence he, lx, lo, and lx, as people see, becomes AZ. Next, lo. What's wrong with lo? l and o are in the same column. So here we treat this as a special case. If our plain text letters are in the same column, then the rule is move down. Where'd we go? For lo, take the letters which are one position down, so it will become EU.
So when the plain text letters are in the same column, the cipher text letters will be the ones one position down. We wrap around if necessary. If the plain text was oz, then down would be UL. We wrap where necessary. So lo becomes EU. And that's it. The other special case, which we don't see in this example, is if they're in the same row; then we move to the right. If we're encrypting the letters ha, then it will become AI as output. Move to the right if they're in the same row; move down if they're in the same column. So we have our cipher text. We send our cipher text to our friend. The friend has the keyword. They must have the secret key. Our friend creates the same matrix, with exactly the same procedure. And the friend takes the cipher text letters, splits them into pairs, LD, AZ, EU, and decrypts. And remember, with decryption we essentially use the opposite operations. Now sometimes the opposite is the same operation. For example, to decrypt LD, you look up L and, where's D? Here. For LD, we get as output, using the same row and the partner's column, h and e. We then look up AZ. We get l and x. We look up E and U, EU, to decrypt. When we encrypted, we moved down, so to decrypt we do the opposite. We'll move up. E becomes l, U becomes o. The receiver receives h, e, l, x, l, o. Any questions? When we deal with a pair like he, we consider h first and look at the same row as h and get L, and then consider e, look at the same row as e, and get D. So we must do it in order within the pair. Right. How does the opposite side know that I chose the letter x? I told them. That is, as part of this algorithm, what must the encryptor and decryptor know? They must know we're using the Playfair cipher, so that's the algorithm, and they must know the keyword, and that's secret. They both know Thailand. And part of the algorithm is which special letter to use. So in the algorithm, we would specify that whenever there's a pair of the same letters, like ll, always insert the letter x.
It would be agreed upon in advance by the encryptor and decryptor. So in the quizzes or the exam, if you see a question about the Playfair cipher, you can assume, let's say, use x, unless I say otherwise. What if the plain text is not an even number of characters? If we had Steve, for example, s-t-e-v-e, five letters: first pair, s-t, second pair, e-v, last pair, e-x. We'll pad out with a special character. So we'd use the same special character to make a pair. So it'll be s-t-e-v-e-x that we encrypt. What if we want to encrypt l-e? Move down, and it becomes E-O. It's still okay in that case. l-e becomes E-O. Is it necessary to use x as the special letter? It's necessary that both the encryptor and decryptor agree upon a value. So it's not necessary to use x. We could use j, z, some other letter, as long as both sides know the letter. Now, why do we choose x, or why would we choose one particular letter? Well, let's see. The receiver, when they receive the ciphertext, and the ciphertext in this case was L-D-A-Z-E-U, they decrypt and they get plain text of h-e-l-x-l-o. This is what the receiver gets when they decrypt. What's the message that the receiver thinks they've got? Well, it's not a word that makes sense, but they know that here we see an x in between two letters which are the same. If we took the x out, we would see a word that makes sense. So that's why we usually choose a letter which is not common to insert as the special character, such that when we do insert it, we don't get a word that also makes sense in English. So remember, when we're communicating, we're always sending messages that make sense to the sender and receiver, English words in this case. So the idea of choosing a special character: choose one that, when we add it, will not cause confusion when we decrypt. It will not make a word if we add it. So x is a good one there. Any other questions on Playfair? If we have an even number of letters, then we don't need to pad.
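Putting the Playfair steps together, a rough sketch of the whole procedure might look like this. All function names are invented for illustration, the filler letter is assumed to be x as agreed above, and awkward edge cases (for example a plain text that itself contains a double x) are not handled.

```python
import string

FILLER = "x"  # agreed in advance by encryptor and decryptor


def build_matrix(keyword: str) -> str:
    """Return the 5x5 matrix as one 25-letter string, read row by row."""
    seen = []
    for ch in keyword.lower() + string.ascii_lowercase:
        ch = "i" if ch == "j" else ch  # i and j share a cell
        if ch in string.ascii_lowercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def to_pairs(plaintext: str):
    """Split into pairs, inserting the filler between doubled letters and
    padding with the filler if one letter is left over at the end."""
    letters = ["i" if c == "j" else c
               for c in plaintext.lower() if c in string.ascii_lowercase]
    pairs, i = [], 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            pairs.append(first + letters[i + 1])
            i += 2
        else:
            pairs.append(first + FILLER)
            i += 1
    return pairs


def transform(pairs, matrix: str, step: int):
    """step=+1 encrypts (right / down), step=-1 decrypts (left / up)."""
    out = []
    for a, b in pairs:
        row_a, col_a = divmod(matrix.index(a), 5)
        row_b, col_b = divmod(matrix.index(b), 5)
        if row_a == row_b:        # same row: move along the row, wrapping
            col_a, col_b = (col_a + step) % 5, (col_b + step) % 5
        elif col_a == col_b:      # same column: move along the column, wrapping
            row_a, row_b = (row_a + step) % 5, (row_b + step) % 5
        else:                     # normal case: own row, partner's column
            col_a, col_b = col_b, col_a
        out.append(matrix[row_a * 5 + col_a] + matrix[row_b * 5 + col_b])
    return out


def playfair_encrypt(plaintext: str, keyword: str) -> str:
    matrix = build_matrix(keyword)
    return "".join(transform(to_pairs(plaintext), matrix, +1)).upper()


def playfair_decrypt(ciphertext: str, keyword: str) -> str:
    matrix = build_matrix(keyword)
    text = ciphertext.lower()
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    return "".join(transform(pairs, matrix, -1))


print(to_pairs("hello"))                       # ['he', 'lx', 'lo']
print(to_pairs("steve"))                       # ['st', 'ev', 'ex']
print(playfair_encrypt("hello", "Thailand"))   # LDAZEU
print(playfair_decrypt("LDAZEU", "Thailand"))  # helxlo
```

The decryptor gets back helxlo, not hello; removing the inserted x is left to the reader of the message, which is why an uncommon letter is a sensible choice for the filler.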
We only add our special character if one of the pairs is the same letter or we have an odd number of letters at the end. We pad at the end. Those are the only two reasons. The best way for you to be experts on these simple ciphers is to practice, so the upcoming quizzes will usually give you ciphertext and you decrypt. So that will give you practice. Last thing on the Playfair cipher. One thing that we notice is that if we encrypt the same letter in the plaintext, we don't necessarily get the same letter in the ciphertext. Here we have two Ls in the plaintext. The first L becomes A. The second instance of L becomes E. That wasn't the case with Caesar or mono-alphabetic. That was one of their problems. That is, now if the attacker counts the number of As and finds that's the most frequent, it doesn't necessarily mean it maps back to the most frequent letter in the plaintext. So that's the benefit of the Playfair cipher and operating on two letters at the same time. We no longer have this one-to-one mapping of plaintext letter to ciphertext letter, and that makes it harder to attack, harder to use frequency analysis. There's seats down the front if you need. Make sure you can see the screen. Any questions on the Playfair cipher? If the plaintext is an odd number of characters with no letters the same, then we'll add an x at the end. We'll pad out the end and make it an even number with a special character. If the pair was H and L we would move to the right and wrap around. I'll show you on the screen. Last special case: what if the pair was H and L? I don't know why we'd have H and L. But if the pair was H and L, move to the right, A, and wrap around, T. Same on columns, we wrap around where necessary. So if we encrypted H and L it would come out as A and T. The pair would be AT. It turns out that the Playfair cipher is better than the mono-alphabetic. We always must end up with an even number of characters.
So once we do our insertion of special characters we still may have to pad at the end. So if we have end up with an odd number of characters always add an x at the end. So we do our insertion of special characters and if we still have one character left over then add the special character at the end. We must have pairs always. And it's a common thing in other ciphers as well. We'll often operate on a fixed length of characters or bits. So we may encrypt 64 bits. If we only have 63 bits to encrypt we'll add some special bit at the end which would be agreed upon. We'll see that in other real ciphers. It's called padding. Is it breakable from the brute force attack? Then the attacker needs to guess the keyword. If we choose a word which is say from a dictionary like a word which is known then the attacker would need to try all possible words in a dictionary. How many words in a dictionary? Millions? Not quite millions in many cases. Most English dictionaries are hundreds of thousands. Maybe a few million words. And other dictionaries of other languages or you add special words. Maybe hundreds of thousands, millions, tens of millions is maybe the upper limit of the number of words. Tens of millions isn't many to try if you have a computer trying it. It won't take long to try. Tens of millions of possible keywords. So that's one limitation. If we do use a word from a dictionary, a known word then it's possible for the attacker to guess that eventually. So you could use a random keyword. What's the problem with me choosing a random keyword? Hard to remember. And I must tell the other person, let's use this random keyword. So the longer it is, the less structured it is, the harder it is to use. And you know that from passwords. If I require you for our login system to use 20 character passwords which must be random, you won't remember it. And you'll write it down on a piece of paper and the attacker will come and read it from your piece of paper. 
So that's another issue that comes up. We'll talk more about how to share the keys and keywords in a later topic. From a cryptanalysis approach, Playfair is better than the mono-alphabetic cipher because we operate on pairs, but people have done analysis and found that if you look at digrams, the frequency of pairs of letters, it's still possible to attack. And it's still quite easy, relative to today's ciphers, to attack: on the order of seconds for a programmed attack on the Playfair cipher, using digrams, that is pairs of letters and their frequency, trigrams, triples of letters, and even expected words when you have a long plain text. So let's keep moving. Let's look at another cipher. And we'll return to really something based on the Caesar cipher. With a mono-alphabetic cipher, the output could be only one of the 26 letters for each position. Here we allow repetitions of the alphabet, multiple alphabets. And it's best explained again from an example. There are different variations of this. The one that we'll go through is called the Vigenère cipher. And it really extends upon the Caesar cipher. And then a small extension of that gets us the one-time pad. The Vernam cipher we'll not look at in this course. Vigenère cipher. Really we use a set of 26 general Caesar ciphers. Remember the Caesar cipher: shift by K positions to the right, where K is the secret key. Here what we'll do, for every letter of plain text we want to encrypt, is shift by K positions to the right, where the value of K is determined by a keyword. So with the Caesar cipher, K was fixed for every letter. When I encrypt H, shift by three positions to the right. When I encrypt E, shift by three positions to the right. L, three positions, and so on. Hello. With the Vigenère cipher, when I encrypt H, I can maybe shift by four positions to the right. When I encrypt E, shift by seven positions to the right. We'll use a different key for each letter.
The example is my plain text, internet technologies. And the keyword that I choose, so it's no longer a single value, we're going to have a keyword: Sirindhorn. So I'll choose a word that I can remember. And I'll tell my friend, let's use the keyword Sirindhorn. And we may have a plain text which is longer than the keyword, which is common. Usually we'd like a short keyword, so it's easy for me to tell them. But we'll often have a long message. So when we have a long plain text, longer than the keyword, we will repeat the keyword. We'll repeat the letters so that the key is the same length as the plain text. And the way that we encrypt is, for each plain text letter, we use the corresponding letter in the key and we use the Caesar cipher there. That is, plain text letter I using key S. And if you check, and we will in a moment, we get the cipher text letter A. Check that. We'll do the first few letters. I'll not do them all, but we'll just do the first few and you'd keep going. What we do is we use the Caesar cipher where the plain text letter is I and the key is S. That is, take the letter I and shift by S positions to the right. Can anyone remember what S is? Let me bring up the notes from last week. Take the letter I, shift by S positions means take the letter I, which is letter eight, since we're starting at zero, and shift by 18 positions to the right. You can try it there. Take I, move 18 positions to the right. What do you get? A. Your answer is on the slides. You can see I is the number eight, or the ninth letter, since we start from zero. So I is eight. S is 18. We add them together. Eight plus 18 is 26. Mod by 26, we get zero. Letter zero is A. Remember, the Caesar cipher can be expressed as an equation: plain text plus key, mod 26. The mod 26 implements the wraparound, or you can check manually. So I is really eight. S is 18. So P is eight, K is 18, and P plus K gives 26. And the Caesar cipher is P plus K mod 26.
26 mod 26 gives us zero, which is the cipher text letter A. Try for N and I. N is 13. I'll write it here. N is 13. I is, of course, eight. T is 19. R is 17. Find the cipher text for the next two plain text values. We're just using the Caesar cipher, but the key is changing for every plain text letter with the Vigenère cipher. If you wanted to use Thai language here, you could. Instead of mod 26, you would mod by the number of characters in the alphabet you choose. It works for any language as long as we define the alphabet. Okay? So rather than shifting to the right and having to count, okay, 18 positions to the right of I, it's sometimes easier just to do the mathematical approach. Eight plus 18 is 26. Mod 26, because we have 26 characters, gives us zero. Output is A. Second letter: 13 plus 8 is 21. 21 mod 26 is still 21, which gives us the letter V. The last one: 19 plus 17, we get 36. Mod 26 gives us 10. Letter 10, I never remember. K. And keep going for the rest of the letters. Questions on how to encrypt with this Vigenère cipher? Did we get AVK? And the rest of the answers are there. So we take our plain text. We have a keyword, which is known by the sender and receiver. And we must make sure the key is as long as the plain text, so we repeat the keyword when necessary. And then we simply use the Caesar cipher, where the key letter determines the key of the Caesar cipher. If the plain text had another letter at the end, then the next key value would be S. If there were another two letters at the end of the plain text, the next key values would be SI. So we just keep writing the keyword such that the key becomes the same length as the plain text. Look at the letter, let's say, E in the plain text. See what the letter E in the plain text becomes in the cipher text. Look at all the instances of E in the plain text and see what each becomes in the cipher text. What does the first E become? It becomes M. The second E becomes L.
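The Vigenère procedure, P plus K mod 26 with the key letter changing at each position, can be sketched as below. The function names are hypothetical, and the keyword spelling `sirindhorn` is an assumption: it is the spelling consistent with the key letters S, I, R and the cipher text letters A, V, K worked out above.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase  # a=0, ..., z=25


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Caesar-shift each letter by the matching keyword letter,
    repeating the keyword as often as needed."""
    letters = [c for c in plaintext.lower() if c in ALPHABET]
    out = []
    for p, k in zip(letters, cycle(keyword.lower())):
        out.append(ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 26])
    return "".join(out).upper()


def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """Subtract the key letter instead of adding it."""
    out = []
    for c, k in zip(ciphertext.lower(), cycle(keyword.lower())):
        out.append(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26])
    return "".join(out)


plaintext = "internet technologies"
keyword = "sirindhorn"  # assumed spelling, see the note above
ciphertext = vigenere_encrypt(plaintext, keyword)
print(ciphertext)  # AVKMEQLHKRUPEWYRNWVF

# Where does each plain text e end up?
letters = plaintext.replace(" ", "")
print([c for p, c in zip(letters, ciphertext) if p == "e"])  # ['M', 'L', 'R', 'V']
```

The same plain text letter e lands on four different cipher text letters, because each occurrence lines up with a different key letter.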
The third E becomes R. Just look at the E's: the first E becomes M, the second becomes L, the third becomes R, and this last one becomes V. So here our common plain text letter, E, is mapping to different letters in the cipher text. So we no longer have a frequency of letters in the cipher text that matches the frequency of letters in the plain text. The most frequent letter, E, maps to different letters, and which letters it maps to depends upon the keyword. So that's a good characteristic of this cipher. When we saw the mono-alphabetic cipher, E always mapped to the same letter, and that makes it easier for the attacker. So what the Vigenère cipher does is spread out the letter-frequency statistics across the cipher text. And that's a good feature. What about a brute force attack? What would the attacker need to do? What's secret? The keyword is secret. The attacker would need to guess the keyword, the same as with the Playfair cipher. So we need to make it hard to guess. In terms of a frequency analysis attack, it's much, much harder because a letter in the plain text doesn't always map to the same letter in the cipher text. Yes? Right: how long should the key be, and what should the structure of the key be? Let's assume in practice our plain text is a long message, say a document, a text file we want to send with thousands of characters. As with passwords, if I want to share a secret with someone, for convenience I would like to choose a short value, and one that I know or can remember. Not random; maybe a dictionary word or a variation. But from the security perspective, the key should be long, so that a brute force attack is not possible, and it should be unstructured, that is, random, preferably. So we have two conflicting requirements. Security says the key should be long and random.
Convenience says the key should be short and structured. And that's a problem that we always have in security. And if we do choose a short, structured key, it turns out that frequency analysis attacks are still possible, although much, much harder than with the previous ciphers. There are known attacks where, given a long plain text which has been encrypted, you can find that there are still repetitions in the output cipher text. Because we keep repeating the key, whenever the letter E in the plain text lines up with the same key letter, it becomes the same letter in the output cipher text. It's hard to describe and I don't recall the details of the attack, but there are algorithms that will do it. First they estimate the length of the keyword, and then they go through and try to break it in the same way that they break mono-alphabetic ciphers. And again, with computer support it's not hard to break the Vigenère cipher. The problem is that the keyword repeats when we have a long plain text, and that it's structured, like a dictionary word. So what do we do? You don't want a short keyword, so make it longer. How long should the keyword be? Equal to the plain text length. So if I have a plain text of a thousand characters, choose a keyword which is a thousand characters. That's the best case from a security perspective. And don't choose a keyword from a dictionary; choose a keyword which is random. Okay? So generate a random keyword. And that leads to the one-time pad. The last classical cipher we look at among the substitution techniques is called the one-time pad. It's essentially the same as the Vigenère cipher, but the rule with the one-time pad that differs is that when you choose your keyword it must be random. You can't choose a word you know. You'd usually use a computer to generate a random set of letters. And it must be as long as the plain text.
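As a rough illustration of those two rules (random key, same length as the plain text), here is a minimal Python sketch. The helper names `generate_pad` and `shift` are hypothetical, and the choice of Python's `secrets` module as the source of randomness is an assumption made for the example; the lecture does not say how the random letters are produced.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase  # a = 0 ... z = 25


def generate_pad(length: int) -> str:
    """Return `length` letters, each chosen independently at random."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shift(text: str, key: str, sign: int) -> str:
    """Add (sign=+1) or subtract (sign=-1) the key, letter by letter, mod 26."""
    if len(key) != len(text):
        raise ValueError("the key must be exactly as long as the text")
    return "".join(
        ALPHABET[(ALPHABET.index(t) + sign * ALPHABET.index(k)) % len(ALPHABET)]
        for t, k in zip(text, key)
    )


message = "internettechnologies"
pad = generate_pad(len(message))        # random, same length as the message
ciphertext = shift(message, pad, +1)    # encrypt: C = (P + K) mod 26
recovered = shift(ciphertext, pad, -1)  # decrypt: P = (C - K) mod 26
print(pad, ciphertext, recovered)
```

The pad changes on every run, so the cipher text does too; only the recovered message is always the same.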
If you use the Vigenère cipher with a keyword which is random and as long as the plain text, it becomes what we call the one-time pad. And it becomes unbreakable. So this is an important cipher. With the one-time pad, if we use it correctly, it's unbreakable. It's a very easy cipher, based on the cipher we've just seen, and very secure. What do we mean by unbreakable? Here we mean that even with a brute force attack we'll not be able to break the cipher. Okay? So not only are no frequency analysis attacks possible, a brute force attack on a one-time pad will not be able to determine the correct plain text. So it's unbreakable in that it's what we call provably secure. We can prove that there's no way for an attacker, given just the cipher text, to find the key or the plain text. And in fact it's the only known cipher that has this property. Of all the ciphers, even the ones we use in practice today, this is the best one known in terms of security. It provides unconditional security. When we use the one-time pad, the cipher text that we get as output has no statistical relationship to the plain text. The cipher text is random whereas the plain text is structured. And if the cipher text is random, there's no way for the attacker to work backwards and find the structure of the plain text. And so a brute force attack does not work against the one-time pad. The attacker takes the cipher text and decrypts it with one key they guess; they may decrypt it with another possible key; and they may get two plain text messages which both make sense, say in English. Therefore there's no way for the attacker to identify the correct message. So we also say it's unbreakable with respect to a brute force attack. Here's an example of that; it's from the textbook, because it's time consuming to go through and encrypt, and we need long examples to demonstrate it. The attacker has intercepted a cipher text. Here it is at the top: A, N, K, Y and so on.
So they have this cipher text and they're going to try a brute force attack on it. So what they do is try all possible keys. How many possible keys are there? Well, remember, with a one-time pad the user would have chosen a key the same length as the plain text. Can someone count them for me? How many letters? Let's say there are 30 letters there. So the key is 30 letters and each letter can be one of 26. So we have 26 to the power of 30 possible keys, which is many. The longer the plain text, the more possible keys. So the attacker must try all possible keys, and with 26 to the power of 30 of them they may not be able to complete in reasonable time. But even if they had a computer that could try all possible keys, here are two examples: they decrypted the cipher text using this key, key one, some random key, and they decrypted the same cipher text with a different key, and they got the plain text shown below each of those. Which one is the original plain text? Plain text one or plain text two? You're the attacker; you've decrypted this cipher text with two different keys and you get two different plain text messages. Now you must choose which one is the correct plain text. Does anyone know which one is the original? Why don't you know which one is the correct one? Because they both make sense. Plain text one makes sense as an English phrase, it's from a game or a book, and plain text two also makes sense. And the attacker, what can they do? They could guess, but that's no good, especially because as you try other keys you may find others that make sense. So there's no way for the attacker to know for sure which one was the original plain text. Even if they try all possible keys, many keys will produce plain text messages that make sense, and the attacker cannot determine the correct plain text message with certainty.
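The same ambiguity can be reproduced on a small scale. The sketch below is illustrative only: the two messages and the key are made up for the example (they are not the textbook's), and the helper names `add` and `subtract` are hypothetical. It shows that for a cipher text produced with a one-time pad, there is always a key that decrypts it to any other message of the same length, simply K = (C - P) mod 26 for each letter.

```python
import string

ALPHABET = string.ascii_lowercase  # a = 0 ... z = 25


def add(text: str, key: str) -> str:
    """Letter-by-letter (text + key) mod 26, i.e. one-time pad encryption."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) + ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


def subtract(text: str, key: str) -> str:
    """Letter-by-letter (text - key) mod 26, i.e. one-time pad decryption."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) - ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


real_message = "attackatdawn"      # made-up message, 12 letters
real_key = "xqzjmwpbkyuf"          # stands in for a random 12-letter pad
ciphertext = add(real_message, real_key)

# Another sensible message of the same length, and the key that "explains" it.
decoy_message = "retreatatten"
decoy_key = subtract(ciphertext, decoy_message)  # K = (C - P) mod 26

print(subtract(ciphertext, real_key))   # decrypts to the real message
print(subtract(ciphertext, decoy_key))  # decrypts to the decoy message
print(26 ** 30)                         # number of 30-letter keys
```

An attacker trying every key will meet both `real_key` and `decoy_key`, and nothing in the cipher text says which of the two sensible outputs was actually sent.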
And that's why we say it's unconditionally secure: even with a brute force attack, the attacker cannot find the correct plain text. Questions on the one-time pad? After the break we'll look at the definition of unconditionally secure again, but in the last couple of minutes, any questions? Encrypted message. Right, so the weakest link of this is how to deliver the key safely to the other side. So this is secure, it's good when we encrypt, but it requires the sender to choose a long random key. Let's say I want to encrypt a one megabyte text file, a Word document, which is made up of letters. So I must choose a key which is also one megabyte in length. Before I communicate with the other side I must somehow get that one megabyte key to them, and it must be done securely so that no one else can intercept it and get the key. So we have a few problems there. A, I must generate a large random key, and there are some problems with that. B, I must transfer this large key across a network, if we're communicating across a network, and that involves some overhead. And C, how do I get this large random key to the other person without the attacker discovering it? What do I do? I could write it down on a piece of paper or print it out, and when I see that person I give it to them secretly in a room. Maybe that works, but if they're on the other side of the world and we want to use the internet to communicate, it's not so convenient to fly over to them, give them the key, come back and then send them the message. So the convenience of distributing that key is a problem. What can we do to send a key across a network such that if someone intercepts it they cannot read the key? We encrypt the key. We treat the key as plain text. So we have the key as the message we want to send to someone, and we encrypt it using which key?
We encrypt that key with another key and send it, but that other key, how do I get it to the other person first? We have the same problem. We need to encrypt that other key and send it to them first, but how do we encrypt that? With another key. So that doesn't work, because we need to somehow get that key to the other side, and if we use the same approach we have the same problem, just with another long key that needs to be distributed. So this problem of key distribution is real. Up until now we've assumed, magically, that both sides have the same key. In later topics we will see that there are some cryptographic algorithms that do this in a much more convenient way, which don't rely on both sides sharing the same key in advance. There are ways to do it. For now let's assume that one side has passed the key to the other securely on a piece of paper, which I think you recognise is inconvenient. Let's stop there. What we'll do after the break is return to the definition of unconditionally secure, maybe another example of the one-time pad, and look at transposition ciphers.