 So, the topic on Tuesday just introduced some concepts and terminology for computer and network security. Now we're going to start looking at the, well, and one thing that we mentioned from Tuesday, we distinguished between security services. So we listed six important services like authentication, data confidentiality. Let's go through them all. So authentication, making sure you know who you're communicating with, data confidentiality, making sure your messages are secret, integrity, making sure your messages don't get modified. Non-repudiation, making sure no one can deny communications that have happened in the past. Access control, controlling who can access particular resources. So there's five, and number six, the service that is often used is availability, making sure your system, your network resources are available to the normal users. So there are six main services. So when we build a computer system, a computer network, and we want to provide security, we look at which of those six services do we want to provide? Maybe one or more. Okay? Maybe all in some cases. And then to provide them, we use mechanisms, security mechanisms. And that's what we're starting to go through now. And most of this course is about the different mechanisms we have available. And the mechanisms should be able to be used such that the attacks will be unsuccessful. And we listed some attacks. So the most mechanisms are based upon encryption or cryptography. So we'll spend some lectures talking about what is cryptography and what are the different encryption techniques. And this lecture, what we start with is to first introduce some concepts for what is cryptography and then look at, we look at classical encryption techniques, means old, 2,000 years old, up until maybe 100 years old, 50 years old. So encryption techniques which are in most cases no longer considered secure, but the development of them and the principles behind them are still applied in the ones that we use today. 
So they're much easier to discuss than the actual ones we use today. So let's go through and explain encryption. And one of the most common reasons we use encryption is for confidentiality. Remember I send a message from A to B. The message is intended for B to read, but no one else should be able to read that message. We want the data in that message to be confidential. It's very hard to stop other people from being able to intercept on some network and not be able to receive the message. So when I send a message from my laptop to some server, if I want it to be confidential, it's very hard to control who is between my laptop and the server and who may intercept that message. The most obvious place where any of you could intercept is as I send from my laptop to the server, my laptop sends via a wireless link to this access point. And most of you did in an assignment last semester, you had two laptops sending a data to each other and you had a third laptop that monitored and captured the file that was sent. So you've done it. You can intercept what we send across a wireless LAN. So it's very easy to do that. So if I want to send data to a destination and for that data to be confidential, we use encryption. We encrypt the data first. So we encrypt the data so that anyone can still intercept the message and see the encrypted data, but only the authorized users, the receiver, should be able to decrypt and see the original data. So I take my original message, encrypt it, send it, someone can intercept and see the encrypted message, but they shouldn't be able to see the original message. Only the person who that data is intended to should be able to receive and see the original message. So encryption is used for confidentiality for sending across networks and also storing, for example, storing files on a computer system. 
If you want to save a file on your hard disk and maybe your computer is going to be accessed by multiple people, one way to make sure that that data is confidential is to encrypt the file. So that's not sending across a network, that's just storage, but the same technique is used, encryption. We'll focus mainly on examples based upon sending across a network. So a model for using encryption. Let's say we have two users that want to communicate, A wants to send some data to B and we have some other user, the malicious user or attacker, user C, and we assume anything that's sent from A to B can be seen by C. So anything that comes out of A's computer and goes into B's computer that can be seen and user C can see that, the malicious user. We want to make sure that they cannot see the original message. The way we do that is using encryption. And this introduces some notation. The original message which we start with is, we refer to as plain text. So we take, it doesn't have to be text, but it's just the name of the original message. It can be a word file, an image, a video, a stream of packets, but our plain text is the original data. User A encrypts that data where the encryption block is some algorithm to transform that data into what we'll call cipher text. So we take plain text, encrypt it and get cipher text as an output. But the other input to the encryption is some key. So there's two inputs. We need a key and the plain text. Then we apply some algorithm and as an output we get cipher text. We will spend some time looking at different algorithms that we can apply here, but let's keep going. We send the cipher text to B, which means user C can see the cipher text. Whatever the cipher text is, we assume C knows the value. At B, we apply a different, or a decryption algorithm, usually different from encryption but related. It takes cipher text as input, another key which may be the same or may be different from the original one, but a key. 
And if everything's designed correctly, the decryption algorithm will produce the original plain text as output. That's the goal. So we want to get a message from A to B, the plain text. A takes that message but encrypts it and sends the cipher text and B decrypts the cipher text to get the original plain text. So we need to design these algorithms such that if we have cipher text as input and an appropriate key, we'll get the original plain text as output. We must get the same plain text, otherwise it won't work. From the attacker's perspective, they know the cipher text but don't know the decryption key and the algorithm should be designed such that if the attacker doesn't know the decryption key, even if they know the cipher text, even if they know the algorithm, without the decryption key, they should not be able to obtain the plain text. That's what we require for encryption to work. The attacker's goal is to find the plain text but if they don't have the key and we have a good encryption algorithm, it should be too hard for them to find the plain text. Of course, another goal is to find the key because if you find key 2, if the attacker finds key 2, then they can decrypt and immediately get the plain text. So we need to look at what algorithms satisfy these properties such that this will work. The property is such that decryption of the cipher text produces the original plain text and it will only be possible if we use the key. Without the key, we cannot decrypt and key 1 and key 2 are related. In some systems called symmetric key cryptography, they are the same, key 1 and key 2 are identical. So A has a key and somehow gives it to B. In other systems we'll see later, public key cryptography, they are different but still related. So that's our model that we'll use for encryption and it applies in most cryptography systems or simplified if we remove the users. So we need to look at especially what are the algorithms for encryption and the related decryption. 
This topic, and the next several topics as well, will focus on those boxes. This lists some of that terminology. Plain text is our original message. Cipher text is the encrypted output, or the coded output if we talk about encoding a message. Encryption is the process of converting that plain text to cipher text, decryption is the opposite, cipher text back to plain text. Sometimes they're called enciphering and deciphering. We have a key which should be in most cases known only to some entities, for example the sender and receiver. If the attacker knows the decryption key then our system will not be secure. A cipher is a particular algorithm. So what is this encryption algorithm? There are many to choose from, some better than others. We call those algorithms ciphers. Cryptography is the study of those algorithms. Designing ciphers is part of cryptography. From the attacker's perspective, their goal is to take the cipher text and find the plain text without knowing the key. And the techniques for doing that are called cryptanalysis. So that's like breaking the cipher. Cryptography is the study of building the ciphers, cryptanalysis is the process of breaking the cipher, defeating it. And cryptology is the combination of those two areas. So some requirements for this to work, for our system to be secure. And these are the requirements for symmetric encryption. I haven't introduced it yet, so it's a bit out of order, but symmetric encryption is the case when key one and key two are identical. We'll see another case later, but this is the case where both keys are the same value, but still secret. Then we need a strong encryption algorithm. So for this to work, for the attacker not to be able to get the plain text, our encryption algorithm needs to be strong. What does strong mean? 
It means it should be practically impossible for the attacker: if they know the algorithm and they know the cipher text, the attacker should not be able to get the key or the plain text. If that's true, then we'd say it's a strong algorithm. So if the attacker knows what was used in these blocks and they know the cipher text, then we consider the blocks to be strong if the attacker cannot find the plain text or find the key. If they find the key, they can immediately find the plain text. It's not strong if it's easy for the attacker to do that. Later we'll look at some more specific measures of strength and security. The other requirement is that for symmetric key encryption, both sender and receiver know the secret key. We call it a secret key because no one else should know that value. Whoever knows that value can decrypt. So if we don't want others to be able to decrypt, then we cannot let them know the secret key. We must keep it secret between the entities communicating. We assume that. If I ask you in an assignment or homework to generate a key and pass it to another student and exchange some information securely and then you post that key on some website or tell everyone by sending it in an email, then the system is not considered secure because the key is no longer secret. It needs to be secret between the entities communicating. A related assumption is that the algorithm, the cipher, we normally assume is known. That is, the people who encrypt and decrypt must know the algorithm. They must know what algorithm they're using. And normally we assume everyone else knows the algorithm, the design of the algorithm and which algorithm is being used, including the attacker. The algorithm is normally not kept secret. So the attacker, we assume, knows the algorithm or the cipher. 
Keeping algorithm secret is difficult because in practice you need to implement the algorithm in software or in hardware and someone needs to implement it unless it's a very simple algorithm and few people are using it, then usually it's hard to keep the algorithm secret. And there are other reasons for making algorithms public. People can test them and see if they're secure or not. So normally the algorithm or the cipher is known, we say it's public. The other thing is we need some way to distribute the keys between the users. I am user A. I want to use encryption to have a secure communication with the Facebook web server. I'm using HTTPS to encrypt my data sent between my web browser and web server. If we're using symmetric key encryption, both my computer, my browser and the Facebook web server, they need the same key. In symmetric key encryption, these two values are the same because when I encrypt with my key and send the cipher text to Facebook, they must be able to decrypt and they need the key to decrypt. And the question arises, how do I get my key to the Facebook web server? Any suggestions? How do you get a secret to the Facebook web server or any web server? Yeah? Comment? Question? How do you get a secret to someone else? I'm sure everyone exchanges secrets on a regular basis. How do you get secrets to another person? Type on the keyboard and send. I type a secret key out and I send it in the email to the Facebook administrator and they program it into their server. But just someone intercepted my email and now they know the key. The key is no longer secret. So that's not a secret. How do you get a secret to someone else? I can tell them, I can walk to that person and maybe make sure no one's watching and write it down on a piece of paper and give them a piece of paper. We could do that. We can keep that reasonably secure, but not so easy if we want to communicate on the internet. 
I mean, I cannot go to the Facebook in the US and give them the actual key on a piece of paper. So how do we give a key to someone? Encrypt it. Encrypt the key, but we use another key to encrypt it. How do we get that other key to them so they can decrypt? You see the problem, it goes round and round. If I want to send that key securely across the internet using the same encryption approach, well, I need another key to encrypt it with. And how do I get that other key across the internet? It becomes a difficult problem. We have a topic on key management is about different ways to distribute keys. And the challenges involved, we'll see that there are some, a technique called public key cryptography, which is commonly used. But for now, we're assuming somehow magically we can distribute keys. There are other, there are different ways to do it. We will see that the protocols for doing it in a later topic. For this to work, we assume that somehow we got a key between both entities. We have some secure channel, I mean some secure means for getting the key from A to B. Whether it's sent by the post, sent by personal delivery, or more practical is to use different encryption techniques, which don't require two keys which are the same to encrypt. Okay, so that's what we require. What we spend this and most of the next few topics on is, how do the encryption algorithms work? And many encryption algorithms use two basic operations. So we take plain text and a key is input, we produce ciphertext as output. How do we produce that ciphertext? Well, we modify the plain text somehow. We perform some operations on it. The two main operations used, which provide some form of security, substitution and transposition, and they're very simple. Substitution, you take one element in your plain text and replace it with some other possible element. The best example is, let's say our plain text is some English message, subset of characters, H-E-L-L-O, hello, that's our plain text. 
Substitution would be taking, say, the letter H, one of the elements, and replacing it with another letter from the alphabet. Let's say I take H and replace it with the letter T. So that's a substitution operation and that should follow some rule. So we don't just replace it with any element, the algorithm specifies what elements you replace them with. But that's the operation. Transposition is even easier, rearrange those elements. You've got H-E-L-L-O as the plain text. Transposition is rearranging those five letters. So maybe you get L-E-O-H-L, just a reordering of the elements, a reordering of the letters. They're the two main operations we'll see in most or in many ciphers, both the classical ones and the real ones which are used today. Except we don't just use them once. What we do to add more security is to repeat them: do a substitution, maybe do a transposition, a rearrangement, and then do another substitution and another transposition. And we'll see the strengths of that with some examples. When we combine them in multiple stages, we get what's called product systems. Most ciphers or many ciphers use this concept. Not all though. The other part is the key. So our algorithm takes plain text and a key as input, performs some operations and produces ciphertext as output. What about the keys used? Well, there are two main types of cryptography. There's symmetric key cryptography and public key or asymmetric key cryptography. And the main difference: in symmetric key cryptography, back to our picture, key one and key two are identical. They are the same. So the symmetry is in the keys. Both use the same value. And in public key cryptography or asymmetric, key one and key two are different values, but related. There's some relationship between them. So symmetric key cryptography we're going to focus on first. It's been used for the last one or 2,000 years, up to the last say 50 or 60 years. 
And then this new technique, public key cryptography, was designed and developed and has been used in the last, as well as symmetric key cryptography in the last 50 odd years. So we'll start with symmetric key cryptography. They have advantages and disadvantages. The other names that you'll hear for symmetric key cryptography include single key, crypto, secret key, because the key that we use must be kept secret. Shared key, because it's a key that's shared amongst the two entities and conventional. And public key cryptography also called asymmetric cryptography. Processing of plain text. With symmetric key cryptography, some often we distinguish between how much plain text do we process at a time. We'll see our algorithms take a chunk of plain text, encrypt it, and then move on to the next chunk of plain text and encrypt it. So if you have a one megabyte image that you want to encrypt, you don't encrypt it all at once, you encrypt chunks at a time. How big are the chunks? Well, it depends upon the cipher used. We can talk about block ciphers, which process one block at a time. The block we'll see is usually 64 bits, maybe 128 bits. Stream ciphers usually process one element at a time, a bit or more commonly a byte at a time. So it's just different size of chunks that they process at a time. It leads to differences in the complexity of the algorithms and how fast they can run in real time. Stream ciphers are usually faster. We'll return to them later. We'll focus mainly on block ciphers and one of the later topics we'll mention, stream ciphers. This summarizes what we know so far. But I know all of you like mathematics, so we've introduced some mathematical notation that we'll use in our examples and in our slides. Just to keep things short, we say that plaintext is p. We apply some function on the plaintext, e. Think of the encryption algorithm as a function. That function takes two inputs, the plaintext and the key. 
So p and k are inputs to the encryption function. And the output of the encryption function is the ciphertext c, and that's shown here. C equals E of K and P. It's just a shorthand way to write what we're doing. And D is the decryption function, and the plaintext equals the decryption of ciphertext C using key K, where with symmetric key encryption, these two values of the key are identical. They're the same. The rest is what we've said before. Before we go into the algorithms, how can we attack the algorithms? Actually, let's go to the algorithms. We'll come back to how we attack. After we see a few examples. Let's go and look at some very, very basic encryption algorithms, some ciphers. We call them classical ciphers because they're the ones which were first developed and are no longer used, except maybe one. They're no longer considered secure for our current day computer systems. First, we'll look at techniques which use substitution, replacing one element with some other element. And then we'll look at a couple of transposition techniques which rearrange the elements. Steganography, that's something different. And we'll use some examples to show them. Classical substitution ciphers. We think of the plaintext as made up of letters. We will use English language as the examples, but in general, it can be any language. Any plaintext that we can represent with a particular character set. We can convert any character set into binary if we want to. For example, if we have English characters and some punctuation marks, we can use ASCII encoding to convert those characters to binary and operate in binary if needed. The examples we'll use are all using the English alphabet. So, first example, the Caesar cipher. The earliest known cipher, used about 2,000 years ago. The simple version is, we start with a set of characters, our plaintext, say from a particular character set. In this case, the character set is the set of 26 lowercase English letters. 
And we take the first letter of the plaintext and we replace it with another letter from the character set where the rule is, shift three positions to the right. So if the input plaintext is the letter A, then the output ciphertext will be the letter D because D is three positions to the right. And if the input letter is N, the output would be Q. If the input is Z, the output is, we wrap around and we get to C. So we have this wrap around at the end. That's the Caesar cipher. Shift to the right, to encrypt, shift to the right by three positions. We can generalize that to, instead of shifting to the right by three positions, shift to the right by K positions. So a generalized Caesar cipher, which covers the specific case. So in general, we take our letter and shift to get the ciphertext. We shift to the right by K positions, wrapping around at the end where necessary. And we can express that as mathematical function or as an equation, the encryption process. Assume now each letter is assigned a number. So instead of A, B, C, think of zero, one, two. So starting from zero, so going up to 25. Z is 25. Then the encryption process, we take the input plaintext letter, P, and the number of positions we shift by is K. So the original Caesar cipher was three, K is three. The input plaintext letter, P plus three, mod 26 is used to consider the case when we wrap around. But let's forget that for a moment. But if letter A is the number zero, if K is three, then P plus K, we get three as the output. And the ciphertext is the value three, which is D. So this equation just expresses our operation mathematically and the decryption we'll see. The easiest cipher you'll see in this course. Some examples, just to make sure everyone's awake. I know everyone's find this easy, so we'll start with decryption. Let's say the ciphertext K-H-O-O-R, and the key is D, where remember we can map out each letter to a number. So a key of D means really a key of three, a shift of three positions. 
What's the plaintext? Find the answer. The ciphertext is K-H-O-O-R, the key is D, what is the plaintext? And just to make it a bit easier for those who can't remember the alphabet or the ordering, that's the mapping of letters to numbers. P is plaintext, C is ciphertext, K is key. Anyone? Hello. Okay. Any problems? Is it a word? Yes, it's a word. I'm not that mean to give a plaintext which doesn't make sense. Although in the IT security class this morning I made a mistake and it didn't make sense. So sometimes I'm wrong. So with decryption you go backwards. With encryption we shift to the right. With decryption we take the ciphertext and to get the original plaintext we must shift to the left. That's the concept here. So the letter K, the key of D means shift three positions. D is three. So if we start at the letter K, that's the ciphertext. Therefore the plaintext must have been the letter three positions to the left, H in this case. Because if we had H as the plaintext and we encrypted it by moving three letters to the right, we would have got K. When we decrypt we must get the same plaintext as the original. So we get H as the first letter, and then ciphertext H, seven. Go back three positions to E, and so on, and you'll get hello. Nothing complex about that. Now just some notation that I'll use. We're assuming in this case the character set is just the set of 26 lowercase English letters. No spaces, no punctuation, no uppercase. Although note that often we write the ciphertext as uppercase. Our characters are only in lowercase, except sometimes on the slides, especially in some examples, I'll write plaintext as lowercase to show that's plaintext and ciphertext as uppercase. But that's just to distinguish between the ciphertext and plaintext. Really in this example we don't convert from lower to uppercase. The output must be from the same character set as the input. Let's look at it mathematically. Okay, everyone was fast at that. So one more. 
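The shift-right-to-encrypt and shift-left-to-decrypt rule from the K-H-O-O-R example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names caesar_encrypt and caesar_decrypt are made up for this example, and it assumes the input contains only the 26 lowercase English letters, as in the lecture.

```python
import string

# The character set from the lecture: 26 lowercase letters, a=0 ... z=25.
ALPHABET = string.ascii_lowercase

def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter key positions to the right, wrapping around at z."""
    return "".join(ALPHABET[(ALPHABET.index(p) + key) % 26] for p in plaintext)

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Shift each letter key positions to the left, wrapping around at a."""
    return "".join(ALPHABET[(ALPHABET.index(c) - key) % 26] for c in ciphertext)

# A key written as the letter d means a shift of three positions.
key = ALPHABET.index("d")
print(caesar_decrypt("khoor", key))  # hello
print(caesar_encrypt("hello", key))  # khoor
```

Decryption undoes encryption for any key, which is the property the model requires: the receiver must get back exactly the original plaintext.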
Another piece of ciphertext. No, let's try plaintext, even easier. Okay? But try and do it using the equation. I know it's very easy using the shift, but just look at the equation for encryption. Make sure you can remember how mod works with the alphabet. Yep, it's here somewhere. So the plaintext is steve, the key is I, eight. Just encrypt that. But think of it from the numbers perspective, the mathematical perspective, where the equation that's on the slide is C equals P plus K, all mod 26. What's the first ciphertext letter? Let's have a look. Sorry, I'll go back in a moment. So we can think of these as numbers now. We have our plaintext, s, t, e, v, e: s is 18, t is 19, e is what, five, four. Go the right way, four. We start at zero. And v is 21. So that's our plaintext, expressed as numbers as well. Our key, which was I, is eight. So the equation, we just add the two numbers together, and mod by 26. So eight plus 18 is 26, mod 26 is zero. Eight plus 19 is 27, mod 26 is one. Eight plus four is 12, mod 26 is simply 12. Eight plus 21 is 29, mod 26 is three. Eight plus four, again, we get 12. And that gives us what? Zero is a, one is b, what's 12? 12 is m, good. Three is d, and m again, so the ciphertext is a, b, m, d, m. So we can do it mathematically, all right? This is very easy to do as a concept, but we'll see that most of our encryption algorithms we can express as an equation or as some steps in an algorithm. And go backwards, decrypt. Decrypt b, you should get t. Just back to the slide. The equation for decryption is the ciphertext value as a number minus the key, mod 26. I always go the wrong direction. We take the ciphertext value, for example a, which is zero. Zero minus eight is minus eight, mod 26. And that's the first bit of confusion that people come across. What's minus eight, mod 26? So for the first letter, C is zero, K is eight, we get minus eight, mod 26, which is, anyone remember minus eight, mod 26? 
Anyone know what mod means? Modulo? Remainder. Okay, how do you deal with minus eight? This leads to some confusion sometimes. There are really two different interpretations of mod, okay? In some, the answers may be negative numbers. In this interpretation, our answer is never a negative number. Our answer is a number between zero and the modulus minus one, zero and 25 in this case, okay? So there are different ways to perform a mod. And the mod in our case implements this wrap around. It becomes 26 minus eight in this case, which is 18. Why? That's essentially our definition of modulo. You take the modulus times some integer, plus some extra, the remainder. Oh, sorry, not eight, 18. Someone said modulo is the remainder. Got this right? Minus eight equals minus one times 26, plus 18. We don't care about this number, minus one. Minus one times 26 is minus 26, plus 18 is minus eight. So if this is true, then the remainder for minus eight, mod 26 is 18. So that's how we deal with negative numbers in mod. It's easy to see the concept. You just go back and wrap around where necessary. But we'll see later mod comes up with different algorithms. What's next? Caesar cipher, very easy. Is it good? Is it secure? You're an attacker. I know some of you may want to be malicious. How do you break the cipher without the key? Try all the keys. Easy. How many keys are there? 25 or 26? 26 in theory, 25 makes sense to use. One of them is not a good key to use. Because if you encrypt with a key of A or zero, then your cipher text will be the same as the plain text. But still, we have 26 keys. Therefore, what the attacker needs to do, given the cipher text, try all 26 keys. One of them should give you the correct plain text. One of them will give you the correct plain text. 
I say on this slide, try all 25 keys, because with the key of A, equal to zero, the cipher text is just the plain text, so you'd notice that straight away. But in theory, all 26 keys, and the plain text should be recognized. Let's show another example of that. Where's the example? Here's a cipher text produced from some plain text using the Caesar cipher. Okay, all right, it's small up here, but it was using English as input, plain text, and here's the cipher text as output. So as the attacker, you have this, you wanna find the plain text, but you don't have the key. So you try all keys. Key from A through to Z, or zero through to 25, and I've done that on that cipher text, and these are all the potential plain texts, all 26 values. What's the key? If you can read some of that. L or 11, I know it's hard to see, it's very small font, but if you look through all of these, only one of them is English. The rest is random looking letters. The only truly secure computer is one buried in concrete with the power turned off and the network cable cut. So that was the original plain text. So that's what we call a brute force attack. One attack that we can do in theory against any cipher is try all possible keys. The attacker doesn't know the key, but there's a limited set of keys, so try them all. One of them will give you the plain text, and in most cases, you'll be able to recognize that plain text. In most practical cases, if you get a set of plain texts like this, one of them will be recognizable, the others will be random or look like garbled output, and therefore the attacker knows the plain text and knows the key in this case. So the Caesar cipher is not good against a brute force attack because there are only 26 keys. We'll see there are some other ways to break it as well when we look at other ciphers. What can you do? Well, people sometimes suggest, okay, well, to do a brute force attack, you must be able to recognize the plain text. 
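To check the arithmetic from the steve example and the question about minus eight mod 26, here is a small Python sketch. The helper names to_numbers and to_letters are made up for this illustration. Python's % operator happens to follow the interpretation used in the lecture (the result is never negative for a positive modulus), while math.fmod follows the other interpretation, where the answer keeps the sign of the first number.

```python
import math

def to_numbers(text: str) -> list[int]:
    """Map lowercase letters to numbers, a=0 ... z=25."""
    return [ord(ch) - ord("a") for ch in text]

def to_letters(numbers: list[int]) -> str:
    """Map numbers 0..25 back to lowercase letters."""
    return "".join(chr(n + ord("a")) for n in numbers)

key = 8  # the letter i

plain = to_numbers("steve")                    # [18, 19, 4, 21, 4]
cipher = [(p + key) % 26 for p in plain]       # C = (P + K) mod 26
print(cipher, to_letters(cipher))              # [0, 1, 12, 3, 12] abmdm

# Decryption: P = (C - K) mod 26. For the first letter, 0 - 8 = -8.
print((0 - 8) % 26)                            # 18, since -8 = (-1 * 26) + 18
print(math.fmod(0 - 8, 26))                    # -8.0, the other interpretation

recovered = [(c - key) % 26 for c in cipher]
print(to_letters(recovered))                   # steve
```

If you write this in a language where the remainder can be negative, such as C, one common fix is to add 26 before taking the remainder.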
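The brute force attack on the Caesar cipher can be sketched as follows. The ciphertext from the slide is not reproduced in this transcript, so as an assumption this sketch rebuilds a ciphertext by encrypting the quoted plain text (lowercased, spaces and punctuation removed) with key 11. The function name shift is made up for the example. Recognizing which candidate is English is left to the reader's eye, as in the lecture.

```python
def shift(text: str, k: int) -> str:
    """Shift each lowercase letter k positions (negative k shifts left), mod 26."""
    return "".join(chr((ord(ch) - ord("a") + k) % 26 + ord("a")) for ch in text)

# Assumption: the quoted sentence with spaces and punctuation stripped.
plaintext = ("theonlytrulysecurecomputerisoneburiedinconcrete"
             "withthepowerturnedoffandthenetworkcablecut")
ciphertext = shift(plaintext, 11)  # key L, which is 11

# The attacker does not know the key, so they decrypt with every key 0..25.
candidates = {k: shift(ciphertext, -k) for k in range(26)}
for k, candidate in candidates.items():
    print(k, candidate[:30])

# Only one of the 26 candidates reads as English: the one for key 11.
```

With only 26 keys the loop finishes almost instantly, which is the whole weakness.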
And in most plain text messages, you can. They have some structure that you can recognize, so it's not too hard in practice. Some people say, well, don't tell anyone what encryption algorithm you used, but again, usually they're only a limited set of algorithms so someone can guess even if you don't know. Compress or use a different language, okay? What if you don't know what language the plain text is? Again, they're only a limited set and usually with some context of who's communicating, you can work out what the language is. So the only real way to protect against a brute force attack is increase the number of keys. 26 is not enough. You can do it manually in tens of minutes. You can do it on a computer almost instantaneously. Increase the number of keys so that a brute force attack which requires trying all keys would take too long. And let's go back now to some of the earlier slides and look at attacks, specifically brute force attacks. So from the attacker's perspective, what they wanna do is discover the plain text because with confidentiality, someone sends a ciphertext across a network. If you intercept that ciphertext and can find the plain text, then you've defeated the security, especially for that message. Or find the key. Because if you can find the key, you can automatically find the plain text. And often finding the key is better because in practice what many people do is send many messages encrypted using the same key. So once you know the key, you can easily decrypt future messages. So the attacker find the plain text and or key. If they find either of one of them, we say the system is defeated, it's not secure. What does the attacker know? We assume they know the ciphertext that they can intercept and find the ciphertext. We assume they know what algorithm is being used, the details of that algorithm, the parameters of that algorithm. And in some cases we'll assume that they know other pairs of past plain text ciphertext. 
A pair is, let's say in the past, user A has taken a plain text, encrypted it with their key and sent the ciphertext. It may be possible in some cases that the attacker has of course gained that ciphertext. Also they may have gained what the plain text was for that corresponding ciphertext without getting the key. If they got the key, that's no problem. But in some cases, the attacker may learn about pairs of plain text and ciphertext without knowing the key and then use those pairs to help them find what the key is or what future plain text is. We'll see some examples of how that's of use and how that's possible a bit later. So the methods the attacker has: brute force, try all keys. Brute force almost always works if you've got enough time and enough computer resources. So think of that as the dumb approach, just try them all. Then the other approach is cryptanalysis. Use a little bit more intelligence. Consider the algorithm, consider the structure of the ciphertext and try and use those characteristics to work out what the plain text or key is. So some analysis. And we've seen a quick example of a brute force attack on the Caesar cipher, just try all 26 keys. We'll see an example of cryptanalysis shortly. And we're gonna assume from now on that the attacker can recognize the correct plain text. If they do a brute force attack, we assume that they can pick out, ah, this is the correct plain text. The others will be random-looking characters. And it's normally true in practice. Okay, more on those attacks, brute force attack. All right, try all keys, we talk about the key space now. The key space is the set of all possible keys. In the Caesar cipher, there were 26 possible keys. So the key space was zero through to 25, 26 values. So we had to try them all. The way we measure a brute force attack is in the number of operations or time. So if we use a computer to do that, we think, okay, how many decryption operations do we need to apply?
Well, in this case, we needed to apply the Caesar cipher 26 times. Decrypt with the first key, the second key and so on. So we care about, from the attacker's perspective, how long it will take. We usually don't measure actual time. We measure how many operations, how many decryption operations. The time would depend upon the computer speed. In practical ciphers today, keys are binary values, usually random binary values. So we measure them in number of bits. So a K bit key has a key space of two to the K. There are two to the K possible keys and therefore a brute force attack on a system that uses a K bit key would take, in the worst case, two to the K operations. If my key can be a three bit number, so I can choose any value from those three bits, then there are eight possible values. So if the key is three bits in length, K is three, then there are two to the power of three or eight possible keys. A brute force attack would require eight operations. I decrypt the ciphertext with key one, key two, key three through to key eight. And after doing that, for sure I've got the plain text because I've tried all keys. So in practice, the success of a brute force attack depends upon the key length, the longer the key, the more operations we need in the brute force attack, and the computer speed, because each operation of decrypting takes some time on your computer. And in practice, we usually don't consider the computer speed, we look at just the key length. Computer speeds vary across computers and of course increase over time. We'll see with the key lengths that we look at, usually it doesn't depend upon the computer speed. We'll see some examples on the next slide. Cryptanalysis, we'll return to that after we see an example of cryptanalysis. Let's just stay on brute force attack. Here's some examples of different key lengths. So a 32-bit key gives us a key space of two to the power of 32 possible keys.
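To make the three-bit example concrete, here is a small Python sketch that lists a key space explicitly and then just counts it for longer keys. The helper name `key_space` is made up for this illustration, and listing every key is only sensible for tiny key lengths.

```python
def key_space(key_bits: int) -> list:
    """Return every key of the given length as a bit string (tiny key lengths only)."""
    return [format(value, f"0{key_bits}b") for value in range(2 ** key_bits)]

keys = key_space(3)
print(keys)        # the eight possible 3-bit keys
print(len(keys))   # 2 to the power of 3

# For realistic key lengths we only count the keys, we never list them.
for bits in (3, 32, 56, 128):
    print(f"{bits}-bit key: {2 ** bits} possible keys")
```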
So I choose a 32-bit value, two to the power of 32 possible values to choose from, which is about four billion, four times 10 to the nine. That's the key space. And now I've given some numbers in these three columns. What if my computer could decrypt and try those keys at a particular speed? And I've given three different speeds. The column here is what if I had a computer that could decrypt at a speed of one billion times per second, 10 to the power of nine, one billion. So if I could decrypt one billion attempts every second, and the key space is about four billion, then to try all keys, it would take about four seconds. I've just approximated these things. So for a brute force attack on a cipher that has a 32-bit key, with a computer of this speed of 10 to the power of nine operations per second, the worst case time to try all possible keys is four seconds. If we had a faster computer, up to 10 to the power of 15 operations per second, which is a million times faster than this one, four microseconds, okay, trivial. So to stop a brute force attack, we need to increase the key lengths. And I give some different values here. We'll see one of the ciphers we'll look at was called DES, Data Encryption Standard, which effectively used a 56-bit key. And we see the times it would take to break using a brute force attack for those different cases of computer speeds. Common ciphers used today, things like AES, use at least 128 bits. So a common system used today, we'll see later, uses 128 bits. So the key space is two to the 128. And at different computer speeds, we see the number of years it takes. Why? That's a good point. That unit is wrong, I would say. The seconds here, let's calculate and check. Two to the 64. So there's something wrong there because the longer the key, the slower it should be. One of my numbers is wrong. Let's find out which one. I need my calculator. So with the key length of 64, we have two to the power of 64 values.
That's how many we need to try in this row. And at the speed of 10 to the power of 15 per second, how many seconds does it take? Well, two to the 64 divided by 10 to the power of 15. That's the number of seconds. How many minutes? Divide by 60. How many hours? Divide by 60 again. I think this should be five hours, not five seconds here. You can fix that on your slide. Good. Five hours is not long. But who has a computer that can do 10 to the power of 15 decryptions per second? We'll look at some numbers for some computers soon. But for example, with practical ciphers, usually about 10 million decryptions per second. Not my computer, but other computers, around 10 million decryptions per second. Some dedicated computers, and I've got some numbers I'll show in another lecture, the fastest machines built for decrypting at a cost of about $10,000 or $15,000, can decrypt, from memory, around 500 million or a billion. I've got a number somewhere. 500 million decryptions per second. I'll show you the details at a later lecture, but we're talking about one billion per second with dedicated computers at about $10,000 or $15,000. So this speed is typical for maybe not my laptop, but for multiple machines, maybe $10,000 or $15,000 of hardware. What about a thousand times faster? Well, it would cost about a thousand times as much, $10 million. Another thousand times faster, well, about another thousand times as much, $10 billion. So if you've got $10 billion to buy and build a computer, these speeds may be possible. If you could decrypt at these speeds and you've got a 128-bit key, you'll still have to wait longer than the age of the universe to get the result, okay? 10 to the power of 16 years, one with 16 zeros. The age of Earth is about four billion years. So we see by using large keys, 128 bits or larger, effectively, we defeat brute force attacks. In theory, they're possible, but in practice, they take too long or too much money.
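The numbers in that table can be rechecked with a short calculation. The sketch below is an illustration only: it assumes the three speeds are 10^9, 10^12 and 10^15 decryptions per second, that the worst case means trying every key, and the function names are made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SPEEDS = (1e9, 1e12, 1e15)  # assumed decryptions per second for the three columns

def worst_case_seconds(key_bits: int, ops_per_second: float) -> float:
    """Time to try every key in a key space of 2**key_bits."""
    return 2 ** key_bits / ops_per_second

def human_readable(seconds: float) -> str:
    """Pick a sensible unit so the rows are easy to compare."""
    if seconds < 1e-3:
        return f"{seconds * 1e6:.1f} microseconds"
    if seconds < 1:
        return f"{seconds * 1e3:.1f} milliseconds"
    if seconds < 60:
        return f"{seconds:.1f} seconds"
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / 86400:.1f} days"
    return f"{seconds / SECONDS_PER_YEAR:.1e} years"

for bits in (32, 56, 64, 128):
    row = [human_readable(worst_case_seconds(bits, speed)) for speed in SPEEDS]
    print(f"{bits:3d}-bit key: " + " | ".join(row))
```

The 64-bit row at 10^15 per second comes out at roughly five hours, which agrees with the correction made in the lecture, and the 128-bit row at the same speed is on the order of 10^16 years.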
So 128 bits is typical, 256 is also common. 64 bits, well, we're starting to be in the realm of possible for a lot of money. Let's look at another cipher, moving on from the Caesar cipher. A problem with the Caesar cipher: we only have 26 keys. What about a similar cipher that replaces letters with other letters? Generally, a mono-alphabetic substitution cipher. Mono-alphabetic, we use a single alphabet for both input plaintext and output ciphertext. So with the Caesar cipher, remember, we shifted by K positions to get the ciphertext. In this mono-alphabetic cipher, we determine, for each possible plaintext letter, which letter is the ciphertext output. For example, and I don't show them all here, but we have the 26 plaintext input letters. A particular key with this cipher would say that if your input is A, the output is, and we choose this, D, and if the input is B, the output is one of the other 25 letters, which is, say, randomly, Z in this case. If the input is C, the output is G and so on. So in this particular instance, if the input plaintext was, I've only got a few letters there, BAD, for example, then for the input BAD the output would be ZDL. What is the key in this case? The key is this list, which tells us which input letters map to which output ciphertext letters. So I could write the key as this list of 26 values. Of course we can use a different key. I can choose a key where A maps to some other letter and B maps to some other letter, with the condition being that each input letter must map to a unique output letter. We cannot have two letters mapped to the same ciphertext value, A to D and B to D, because then when we decrypt, we wouldn't know which value to decrypt to. So what we do is we define this mapping in advance, tell the other person, okay, we're gonna use this mapping. I haven't listed them all here.
And then we can encrypt and the other person can decrypt, because when the receiver receives ZDL, they see from the mapping, Z becomes B, D becomes A and L becomes D. How many possible mappings in this case? 26 factorial possible mappings. Why? When you choose a mapping, start with the letter A. A can map to any of 26 possible values. We've got 26 letters in the alphabet. All right, I chose D for this specific instance. B can map to any of the 26 possible values minus the one that we've already used, minus D in this case, so 25 possible values. A can map to 26 values, B to 25 values, and similarly, C can map to any of the 24 remaining values, D to 23 and so on. So how many possible mappings? 26 times 25 times 24 and so on down to one, which is 26 factorial. So in this case, there are 26 factorial possible keys that we could use. And if you calculate that, that's greater than four times 10 to the power of 26. That's a lot, okay? The 10 to the 26 here is not related to 26 factorial, it's just a coincidence. The Caesar cipher allowed for 26 keys. Our mono-alphabetic cipher allows for four times 10 to the power of 26 keys. We'll come back to that in a moment, just to show you in the brute force attack, that's this row. If we have 26 factorial possible keys, or it's about two to the power of 88, with our superfast computer, 10,000 years to do a brute force attack on this mono-alphabetic cipher. So a simple cipher, but a very large key space. So we've defended against a brute force attack already, but it's still a weak cipher in that it's still very easy to break. And I'm not gonna do it here. I've got some, I'm not sure if I printed it, but I'll point you to a website that describes some steps where you can, almost by hand, break a mono-alphabetic cipher. You use cryptanalysis. That is, you use the structure of the cipher text and the algorithm to find the plain text. Even though it takes 10,000 years to do a brute force attack, it takes a few minutes to do an attack by hand.
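The mapping idea and the size of the key space can be sketched in Python. This is a minimal illustration, not a full cipher: only the four letter pairs mentioned in the lecture (A to D, B to Z, C to G, D to L) are taken from the source, and the helper names `substitute`, `invert` and `random_key` are made up for this example.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase

def substitute(text: str, mapping: dict) -> str:
    """Replace each letter using the mapping; anything not in the mapping passes through."""
    return "".join(mapping.get(ch, ch) for ch in text.upper())

def invert(mapping: dict) -> dict:
    """Build the decryption mapping; this only works because each output letter is unique."""
    return {cipher: plain for plain, cipher in mapping.items()}

def random_key(rng: random.Random) -> dict:
    """Pick one of the 26! possible keys: a shuffled alphabet paired with the normal one."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

# The part of the key that the lecture spells out.
partial_key = {"A": "D", "B": "Z", "C": "G", "D": "L"}
ciphertext = substitute("BAD", partial_key)
recovered = substitute(ciphertext, invert(partial_key))
print(ciphertext, recovered)

# A complete random key round-trips any message.
full_key = random_key(random.Random(1))
message = "ATTACK AT DAWN"
assert substitute(substitute(message, full_key), invert(full_key)) == message

# Size of the key space: 26 * 25 * 24 * ... * 1.
key_count = math.factorial(26)
print(key_count, math.log2(key_count))
```

The last line shows where "about two to the power of 88" comes from: the base-2 logarithm of 26 factorial is a little over 88.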
So we'll look at why that's the case. And it's true for other ciphers as well. The attack takes advantage of the fact that all languages have some structure and all input plain text has some structure normally. And in the English language, for example, and in other languages, some letters occur more frequently than others. What's the most frequent letter in English? 26 letters. E is the most frequent letter. If you look at a large set of texts, a lot of documents, and you count all the letters, you'll find that E occurs the most. I'll give you some statistics, they're on the next slide, but we'll see some others. Not just letters, digrams. Digrams are pairs of letters. So some digrams appear more frequently than others. And trigrams, triples of letters. T-H-E occurs very frequently, much more frequently than Z-Q-F, okay? So the plain text has some structure. What an attacker can do is look at the cipher text. If there's some structure in the cipher text, then map that back to the expected structure of the plain text and use that to try and assist in finding the key or the plain text. Let's see some, oh, first. This is a plot. Someone's done some counting of some English text, some large documents, and they counted all the letters and they see that the letter E occurs 12%, or 12.7% of the time, about 13% of the time. The next most frequent letter is T, and then A, and then O. And the least frequent letters are J, Q, X, and Z. So that's statistics of one particular text. But many English texts follow that same pattern, not exactly the same, but you can always distinguish the most frequent letters and the least frequent letters. Then, if the cipher text has some structure, we can use this information to work out what the plain text is. In the last few minutes, let's finish with one example.
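Counting letters, digrams and trigrams is simple to do on a computer. The sketch below is an illustration with made-up helper names (`letter_frequencies`, `ngram_counts`); it counts n-grams within words only, which is one possible convention, and the short sample string is far too small to reproduce the percentages on the slide.

```python
import re
from collections import Counter

def letter_frequencies(text: str) -> list:
    """Return (letter, fraction of all letters) pairs, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    return [(letter, count / len(letters)) for letter, count in counts.most_common()]

def ngram_counts(text: str, n: int) -> Counter:
    """Count runs of n letters inside each word (n=2 gives digrams, n=3 trigrams)."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

sample = "the theme"  # tiny assumed sample, just to show the mechanics
print(letter_frequencies(sample))
print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3).most_common(3))
```

Run on a large English text instead of the tiny sample, the same functions would give the kind of ranking described in the lecture, with E and T near the top and J, Q, X and Z near the bottom.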
So, using our mono-alphabetic cipher here, this specific instance, and I haven't listed all the letters, if we had a large plain text, thousands of letters in the plain text, and we encrypted using this particular key, what do you think would be the most frequent letter in the cipher text? What letter is going to occur most in the cipher text? S. So we take a large input plain text, we use this mapping and we map each letter of the plain text into the corresponding cipher text letter, and we see every E in the plain text will always map to an S in the cipher text. So when the attacker has the cipher text, they will count the letters. And they'll see, of the letters in the cipher text, S occurs the most frequently, most likely. And therefore they'll make the assumption, okay, S is the most frequent in the cipher text, that means it's most likely that S maps back to E. And then they'll do it with the second most frequent letter, T, whichever value it was, I don't have it here, and see what that was in the cipher text. And they can do it with the least frequent letters, and they can then start to fill in some of the letters in the plain text. And it doesn't take much to get all of the letters. It takes a bit more than I can do in the lecture, but one example to finish. And I think there's a link on the website, but if not, I'll point you to it later. If you don't see why that's the case, I've got some examples on this website, I'll point you to that later, about the Caesar cipher and brute force, okay, we've seen that. Letter frequency analysis, analyzing the frequency of letters. And for example, there was an example, I took a book, like a large book in text, counted the letters in there and it turned out E was about 12%, T, 9%, A, 8%, and so on. That's typical structure. And similar for digrams, in that book, TH was the most frequent pair of letters. HE, the next most frequent and so on.
And trigrams, THE is the most frequent. So you can do such analysis. And where are we? From those statistics, you can look at a particular cipher text. So here's some cipher text. And this was using the Caesar cipher. Instead of doing a brute force attack, you can do the same as with the mono-alphabetic cipher: we count the letters in the cipher text, we see J is the most frequent, about 13%. So I make the guess, the educated guess, J in the cipher text maps to E in the plain text. And I try that. And if you try it, then you immediately get the plain text. That's with the Caesar cipher. And you can do the same with a mono-alphabetic cipher. It's just a little bit more involved. And you can read through, I won't show it all here, but there's some cipher text. The attacker is given that only. They count the letters. They see T, Z, O and L are the most frequent. And then we start trying, T maps to E. But maybe it doesn't. It's not guaranteed, but we'll try different ones. And I go through some steps of trying both with letters and digrams. And then I start filling in by changing them to uppercase to represent the plain text value and trying different combinations. And after a few steps, it's not so easy to see, you start to see patterns, possible words. Look at the uppercase letters, they are the plain text values if we've done the replacements. So maybe you start to guess, this is "internet". Okay. And then you see that, okay, K maps to R, and start filling in other letters. And you keep going and it doesn't take too many steps. And once you're at this stage, then you use the context of the message, this course, and then you can fill in the rest quite easily. And you eventually get the plain text. So that's done manually, it takes, all right, by hand with a bit of computer support, it takes minutes to do that.
So something that takes 10,000 years to break with a brute force attack takes minutes to do by hand and is almost instantaneous with some computer support. So we'll look at other ciphers next week and see how we can improve upon the ones we've looked at so far.
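As a closing illustration, the frequency-based shortcut on the Caesar cipher (guess that the most frequent cipher text letter is E, then undo the shift) can be sketched as follows. This is a hedged example: the names `caesar_shift` and `guess_key` are made up, the sample reuses the quote from the brute force example, and key 5 is chosen here so that E maps to J as in the lecture's example. The guess is not guaranteed to be right for short or unusual texts.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def caesar_shift(text: str, key: int) -> str:
    """Shift each letter forward by `key` positions; other characters pass through."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key) % 26] if ch in ALPHABET else ch
        for ch in text.lower()
    )

def guess_key(ciphertext: str) -> int:
    """Assume the most frequent cipher text letter stands for 'e' and derive the shift."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in ALPHABET)
    most_frequent = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_frequent) - ALPHABET.index("e")) % 26

# Assumed sample: the earlier quote, encrypted with key 5 so that E becomes J.
plaintext = ("the only truly secure computer is one buried in concrete "
             "with the power turned off and the network cable cut")
ciphertext = caesar_shift(plaintext, 5)

key = guess_key(ciphertext)
print(key)
print(caesar_shift(ciphertext, -key))
```

One count and one decryption replace the 26 trial decryptions of the brute force attack. For a mono-alphabetic cipher the same counting only gives a starting guess for a few letters, and the rest is filled in step by step as described above.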