 And today we must finish this topic, it's longer than I hoped. And I've given you, if you haven't grabbed a copy, there's a sheet floating around, just a one-page two-side sheet which summarizes a lot of the things that you need to know from this large topic. This large topic covers many different subtopics. We got onto public key encryption, where we have a public and a private key. So we generate a key pair. Each user has their own public-private key. And we use algorithms such that if we encrypt with one key, we can only successfully decrypt with the other key in the key pair. That's how the algorithms are designed. So if I take some message, I encrypt it with my public key, then the only way to decrypt the ciphertext is using the corresponding private key. And usually they work in the opposite order as well. That is, if I encrypt something with a private key, then the only way to decrypt is using the corresponding public key. If we use some wrong key, some other key, then when we try to decrypt, we'll get an error and we'll be able to detect that. Similar with symmetric key encryption, when we encrypt some message with a secret key, we can only successfully decrypt with the same secret key. If we try some other key, then we'll get an error. And those assumptions, or those basic assumptions are listed, and I'll return to that handout in a moment, are listed in that handout. So with public key cryptography, every user, we assume, generates their key pair. And that's one of the first tasks of your homework. Generate your own key pair. We'll use it. And we can provide two main services with public key encryption. Confidentiality and authentication. Remember, confidentiality is making sure that no one can read the contents of the message. I send a message from A to B. I don't want anyone else to be able to receive or read that message. Let's keep it confidential. Authentication is sending a message from A to B. 
I want to make sure from B's perspective that this message came from A. It didn't come from someone else pretending to be A, and no one modified it along the way. So two different aims. One, keep the message secret or confidential. The other, make sure the message we receive is authentic. And we generally can do both with public key cryptography, and we see by using the keys in the opposite order. With confidentiality, the idea is that user A to send a message to B, we encrypt with B's public key. Why? Well, the nature of our public key algorithms is that if something's encrypted with B's public key, the only way to decrypt is using B's private key. And the only person who has B's private key is B. So the only person who can decrypt this ciphertext and get the original message is B. No one else can find the message. So that provides confidentiality. If user C intercepts a ciphertext to decrypt, they need the private key of B. But by definition, that must be kept private. So that provides confidentiality. Encrypt with the destination's public key. So if you want to send me a secret, you need to know my public key, and you encrypt using my public key. Of course, that needs to be made available some way. I publish it on a website, or I give it to you in class, or we distribute it across a network in some way. The other service, authentication, we don't care if someone sees the message in this example. What we care about that B can confirm who sent the message. So in this case, the source encrypts the message with their private key. To decrypt, to successfully decrypt, we need the public key of the source, public key A to decrypt. And that means if it successfully decrypts with the public key of A, that implies the message must have been encrypted with the private key of A, which implies it must have come from A because the only person who has the private key of A is A. And that provides our authentication. 
If I decrypt with the public key of A, and it doesn't decrypt successfully, then that implies that it was not encrypted with the private key of A, or it was modified along the way. And that shows the receiver that something's gone wrong. It's not authentic. If an attacker pretends to be A, but encrypts a message with the private key of C, sends it to B, B thinks it's from A, will try to decrypt with the public key of A, it will not decrypt successfully, the receiver will detect that and assume something's gone wrong. So since A is the only one with the private key of A, then this provides authentication. And in fact, this is the most practical way where public key cryptography is used. It's not used so much for confidentiality, but it's used mainly for authentication. And we'll see that leads to signing something. There are different algorithms for public key cryptography. The main one is RSA, but there are others. RSA maybe has been around the longest and is the most widely used. And you'll see in the handout that one of the principles that is often applied in security is that you gain more trust in algorithms that have been used more. So a new advanced algorithm that was released last year may be faster. We may think it's better, but an algorithm that's been around for 10, 20 years and been used and analyzed, we may consider more secure. So something that we have experience with, we generally trust more. Because security is complex and people need to do some analysis before we can build up the trust. So RSA's been around for many years, and even though there are potentially faster algorithms, RSA is still used a lot. Diffie-Hellman, the Digital Signature Standard and elliptic curve cryptography are other common techniques. And that puts the operations that are in those diagrams in a set of requirements. So we need some algorithm, encryption algorithm and decryption algorithm such that these requirements are met.
Three just means it's practical to encrypt and decrypt if we have the right information. From a security perspective, four and five are important. From the attackers' perspective, if they know the public key, they know the algorithm and they know the ciphertext, it must be impossible to find the private key. And note that the public and private key, the key pair, are related. There's a relationship between the two values. They're not random. There's some algorithm that determines the values. So even though we know the algorithm that relates these two values, we need it to be such that if I know the public key of someone else, I know the ciphertext and I know the algorithm, it still should be practically impossible to find the other person's private key. If I do, it's not private and our system fails. And similarly, if I know a public key, a ciphertext, it should be hard to find the plaintext, the message. This last requirement six just says, often we want algorithms to be able to allow to use the keys in either direction. If we encrypt with a private key, we can decrypt with a public key, or if we encrypt with a public key, we can decrypt with a private key. Some algorithms have that property, some don't. There are some more details about the requirements, which I think are not necessary for us to cover. That slide is one of them. Attacks, it turns out attacks often for the algorithms that we're discussing come down to brute force attacks, trying to guess the key. And if we make the key large enough, then it becomes too hard to guess the key. But an interesting thing about the public key algorithms is a lot of them are based upon some mathematical operations, and the security really relies in that some of those mathematical operations are hard to compute, solving the mathematical equations are computationally hard. And you need to study the algorithms and some of the mathematics behind it to start to understand the difficulties there. RSA is the main algorithm. 
You'll use it to generate your key pair. What you do, you generate keys, a public and a private key, and then you can encrypt and decrypt with those values. Without going through the steps for generating the keys, so there's some mathematics behind it, it involves using some large prime numbers. You select some large prime numbers, perform some operations, and from that generate your keys, which really are made up of some value n, some value e, which are public, they form your public key, and some other values, p, q and d, private values. p and q are your initial prime numbers, d is generated from them. So the idea, what you'll do with OpenSSL in your first step is the software chooses two large prime numbers, p and q, almost randomly chooses two large prime numbers, does some calculations, and from that generates your public and private key, where we can think of the public key, I'll have it, maybe we'll draw it on the board. The public key, PU, is denoted as the values e and n, and the private key is d, and often we include n as well. That is, we choose two large prime numbers, generate n, e and d. How we generate them, you need to study the algorithm to see how that works, and we're not going to go into that detail. But we generate these values, and n and e, we can tell everyone, they're public, and they form our public key. So anyone else in the world can know those two values. d is really the secret value, that's generated as well, and what's commonly done is that we say that d with n is considered the private key, but note n is public, so d is the secret value. So when I generate these values, e, d and n, I can tell you the values of e and n, and in fact that's what you do with OpenSSL in your homework.
OpenSSL chooses two large prime numbers, p and q, generates e, d and n, and saves them in a file, as well as a few other parameters, but saves them in a file, and when you extract your public key, you'll see it contains the value of e and n, and you send them to me. You submit the file which contains these values, and now I know these values I can use to encrypt and decrypt. And then with RSA, the way that these values are generated is designed such that if you have some plaintext, you treat it as a number, an integer. And to encrypt, you take the value, you raise it to the power of e, mod by n, and you get your ciphertext value. So encryption mathematically is very simple or beautiful in that here's an equation to encrypt some plaintext. AES, DES, many of the symmetric ciphers are very complex in the design of the algorithms. RSA is just a simple equation with four parameters. It says take your plaintext or your message, we often use m to mean the plaintext because p often is confused with the public and private key. So m, the message, treat it as an integer, raise it to the power of e, the value that we generated, mod by n, the value that we generated, and the answer is our ciphertext and we can send that ciphertext across the network. When someone receives that ciphertext, note that we used e and n to encrypt. So we used the public key to encrypt that message. So to decrypt, we need to use the private key. So to decrypt, we take the ciphertext, raise it to the power of d, the secret or private value, and mod by n, the same n as we used to encrypt, and we will get the original message back. That will work because of the way that the key generation was designed, and you need to study that algorithm to see why that's true. It's an interesting design to see why if you apply these algorithms, you'll get the original message back, but we will not go into that detail.
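To make the two equations concrete, here is a minimal sketch in Python. The primes 61 and 53, the exponent 17 and the message 65 are made-up toy values chosen so the numbers stay readable; they are not what OpenSSL would generate, and real RSA also adds padding that is left out here.

```python
# Toy RSA key: tiny hypothetical primes so the numbers are readable.
# Real keys use far larger primes, and real RSA adds padding (omitted here).
p, q = 61, 53
n = p * q                  # public modulus
e = 17                     # public exponent (toy value, assumed for illustration)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi (Python 3.8+)


def rsa_apply(value: int, exponent: int, modulus: int) -> int:
    """Raise value to exponent, mod modulus: the single RSA operation."""
    return pow(value, exponent, modulus)


m = 65                          # the message treated as an integer, must be < n
c = rsa_apply(m, e, n)          # encrypt with the public key (e, n)
recovered = rsa_apply(c, d, n)  # decrypt with the private key (d, n)

# The keys also work in the opposite order: apply d first, then e.
s = rsa_apply(m, d, n)
checked = rsa_apply(s, e, n)

print(n, d, c, recovered, checked)
```

Both directions hand back the original 65, which is the property the confidentiality and authentication diagrams rely on.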
So RSA, one of the most commonly used public key algorithms, is simply for encryption, this mathematical operation, for it to work, you need to generate keys according to some steps, and the keys we denote as the public and private key, e and n and d and n, where n is in fact public. Rather than doing that by hand on paper, we use some software to do that for us, and that's what you're using OpenSSL for in the first homework task. You go back to the instructions, you generate your own RSA key pair, 2048 bits determines the length of n. It's 2048 bits long, a long number, a very large number. And e is in fact fixed to 65,537. There's some reasons for doing that. So your key pair will contain e and n. n would be 2048 bits, some large number, e would be 65,537, and also it will generate your private value, d, some other number. Of course, you cannot tell anyone what the value of your d is, because then it won't be private. It actually generates some other values that help to speed up the encryption and decryption, but these are the main values. So that's step one. It generates these values for you. It actually combines them into one file. For you, step two is to extract these two values from it and put them into a second file so you can submit it to me at step two. We'll see the other steps later. So RSA, designed by Rivest, Shamir and Adleman, RSA. It's considered secure today. There may be a few special instances where if you use particular values, it's insecure, but in practice it's considered secure. And it turns out that the way to break it, the best known way to break RSA, is to, if you know as the attacker the value of n, which is in fact generated from two prime numbers, p and q, n is in fact just multiplying p and q together, if you know n, if you can factor it into the two prime values as the attacker, you can find d. And once you can find d, you've broken RSA.
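As a rough illustration of that attack, the sketch below factors a toy modulus by trial division and recomputes d from the public key alone. The values n = 3233 and e = 17 and the function name are hypothetical; the loop only finishes because n is tiny, and nothing like it is practical for a 2048-bit n.

```python
from math import isqrt


def break_toy_rsa(n: int, e: int) -> int:
    """Recover d from the public key (e, n) by factoring n with trial division.

    Only feasible because n is tiny; this is the step that is believed to be
    impractical for a 2048-bit modulus.
    """
    p = next(k for k in range(2, isqrt(n) + 1) if n % k == 0)  # smallest factor
    q = n // p
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # same calculation the key owner did (Python 3.8+)


n, e = 3233, 17                  # toy public key, known to the attacker
d = break_toy_rsa(n, e)          # attacker now holds the private exponent

ciphertext = pow(65, e, n)       # someone encrypts 65 for the key owner
stolen = pow(ciphertext, d, n)   # attacker decrypts it
print(d, stolen)
```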
So the strength of RSA depends upon the fact that it's hard to factor large numbers into their primes. Large numbers, n for example, 2048 bits. So 2 to the power of 2048 is the maximum value of n. So take that number. If you can then factor it into primes, then you can break RSA. And there's no known way at the moment to do that fast. Some of the past attempts, for example, in 2009 when n was 768 bits, so there was some competition. And with n, 768 bits, so if you write that down, it was about 232 decimal digits, so that's how long the number is. Someone did an attack using some computers and effectively took, if you had a single computer, they distributed across many computers, if you had a single one, it would take 2,000 years to factor this number into its primes, or 10 to the 20 operations. Of course they actually factored it, but they used many, many computers at the same time in parallel. So that was the best known in 2009. Nowadays it's assumed that still 1,024 bits is secure, 2048 is better and commonly used or recommended, 4,096 even better. Whereas symmetric key ciphers are measured in the length of their secret key, often, especially for RSA, we can talk about the length of the value of n. Because the difficulty is, given n, find its prime factors. You will use a 2048 bit n with RSA. But we're going into too much detail. Again, if you want to know this, you need to take security and cryptography to know how RSA works. That's not important for your objectives. A problem. Most public key crypto systems, especially RSA, are much, much slower than symmetric key encryption. And you can try it in OpenSSL. I won't do it on my computer, but you can do a speed test. Take a file, encrypt it with AES. Take the same file, encrypt it with RSA. And RSA will be much, much slower. It's too slow to encrypt large files.
So in practice, because it's not very good for encrypting large files, public key cryptography is only used to encrypt small amounts of information, like we'll see to sign information. So we've talked about symmetric key encryption, one of the main algorithms being AES. We spoke about authentication, making sure that we can confirm what we receive is authentic and different methods for doing that. And we introduced public key encryption, RSA being an example there, when we have two keys. Key management, let's skip over. I want to finish this topic today. The issue is, well, the issues are, how do we share a secret? If I'm going to encrypt with symmetric key encryption, I'm going to use a secret key, and you need to have that same secret to decrypt. How do I get that secret to you? I can write it down on a piece of paper and give it to you. But what if you're on the other side of the world and we're using the internet to communicate? What can I do? Send it in an email to you, the secret key. Well, what if someone intercepts the email, they'll find the secret. So distributing secrets is not easy. Similar with public key encryption. The idea with public key encryption is, for example, I encrypt some value with someone else's public key, and they decrypt with their private key. But how do I get their public key? I give you on a piece of paper, here's my public key, or you submit in the homework your public key. But what if you want to send it across the internet? In your email, you send your public key to your friend, and they will use that value. But what if someone fakes that public key and say, here's the public key of Steve, and in fact it's the public key of some malicious user, but you think it's my public key, then that can cause problems. So they become issues of how do you get someone else's public key in a trusted manner, such that you know it's theirs, it's not someone pretending to be, say, this is their public key. 
And issues of, okay, when do you change keys? Do you keep the same key for the rest of your life? Or do you change it on a regular basis to be more secure? There are ways to practically overcome some of these problems, sharing secret keys, we can use public key encryption to do so, and that's what you'll do in your homework. And there are ways to verify public keys, and we'll see that in later topics after the midterm about digital certificates and the issues with verifying public keys. So that's it, key management. Interesting topic, but we're not going to go into the details. This one I want to cover to finish this topic, digital signatures. And it's related to public keys and authentication. So we've basically done it already. The aim is authentication but a special case. When you sign something, a piece of paper, so there's a document and you sign your name on it, that is some confirmation that you approve of that document. So if you sign some contract, then that acts as some proof that you, that person, approve of that contract, agree to that contract. And then later when we see and you say, no, I don't agree to the contract, we can come back and see, well, here's your signature. If you signed it, we can prove that it was from you. That's the idea of a normal signature. We want to have the same service in computing and communications and that's where digital signatures come in. To be able to prove that, prove to anyone in the world that a message came from or is approved by a particular user. That is, I send a message to the class saying there'll be, the extension, the homework deadline is extended by another week. You receive an email. Then one thing you'd like to do is to be able to prove, did it really come from me or did it come from someone pretending to be me and just tricking you to submit late and then you get zero for the assignment. So can we sign a message? Well, that's where we use digital signatures. What about using symmetric key cryptography? 
The idea we saw that we can use symmetric key cryptography for authentication. What happens is that we have two users, A and B. They share some secret key K, that's symmetric key cryptography. So we take our message that we want to send and one of those users encrypts with the secret key and let's say user A receives the message. User A receives the message. If it decrypts successfully, it means it must have been encrypted with that secret key and the secret key is known by A and B. So if A receives the message, they can verify that this message must have come from B because only B has that secret key. It's shared between A and B. So that's okay, that's authentication. A can prove from its perspective the message came from B. But another user, some user C, they receive a message. Can they prove that it came from B or with symmetric key cryptography they cannot? Some message encrypted with a key shared by A and B could have been encrypted by either A or B. We don't know which one. How can we prove that A encrypted it or B encrypted it? Both A and B have that same secret. So there's no way for someone else to prove exactly which user created this message. And with a signature, we want to be able to prove that one particular user created a message, not one of two. With symmetric key crypto, we can't do that because a key is shared between two users. So if we have a message encrypted with that key, we don't know which of the two users created that message. It could be either of one of them. The message that says it's from A, signed by A, but maybe it was B pretending to be A and performing some attack on A. So symmetric key crypto cannot be used for a digital signature because we cannot prove who this message originated from. We use public key crypto. Because with public key crypto, with a key pair, every user has their own public and private key. Only one user in the world has this private key. So by the definition of being private, only the user knows that value. 
So let's see how it's used. The concept is, and we've actually seen it before, we have a message. I want to sign that message. I'm user A. I have a contract in a Word document and I want to digitally sign it. So what I do is I take that message, that document, and I encrypt it using public key cryptography and using my private key. So I sign with my private key and I get some value as an output and we call that the signature, S. And usually what I do is I attach the signature with the original message, the document. So I send you, here's the Word document and here's the attached signature. And then you need to verify that it was signed by me. And the way to verify is you take the signature and you decrypt it using the public key of me, of A in this case. If it successfully decrypts using the public key of A, then it means it must have been encrypted with the private key of A which means it must have come from user A because only user A has the private key of A. So this is proof that this message came from user A. So what we do is to verify as we decrypt the signature using the public key of the sender, we get some value as an output. If that value matches the message received, then we assume that the signature is valid. It's verified. So this is the concept of signing something. Anyone in the world can now prove when they receive the message that it came from A. So A sends this signature and the message to everyone. They know A's public key. So to verify the message, they decrypt with A's public key, get some message and compare against the original message. If they match, then it means that it must have come from A. And that's a digital signature, or the concept of a digital signature. In practice, so the E and D are public key algorithms like RSA, for example. But we've said before that public key cryptography is very slow. So if my message, I want to sign a large file like a DVD, 5 gigabytes. 
RSA is so slow that it takes a long time to apply the encryption operation on a large file. So what we do in practice is we take a hash of the message and sign the hash value. So this is the practical way that digital signatures are used. There's no need to encrypt the entire message, we encrypt just a hash of a message. And remember we defined hash values, hash functions before, we said one of the properties is that the hash of two different messages will give us two different hash values. It should be practically impossible to find two messages which are different, which produce the same hash value. So what I do to sign something now is I have my DVD, the message. I want to sign it so that when someone receives it, they know it's original, it came from me. I take the content, the message, I calculate a hash of that message. Remember, hash functions produce a relatively short hash value. MD5 produces 128 bits. SHA-512 produces a 512-bit hash value. So my 5 gigabyte DVD, the hash value is just hundreds of bits. So very small. I take the hash value, encrypt it with my private key, encrypt the hash value with my private key and the result is called the signature. The signature of this message. Then what I do is I send the message and the signature to whoever wants to receive it, B for example, and it's then their task to verify. And same to verify, we take the signature, decrypt with the public key of the signer, A signed, decrypt with A's public key. We get some hash value as the output. If this value H matches the hash of the received message, then we assume that this has been verified. That's the practical way that signatures are applied. Let's try and draw that as an exchange and see what can go wrong and see how our hash functions work. So we have our user A and say sending a message to B. The aim is for B to verify the message came from A. So the basic approach is that A calculates the signature of a message S by encrypting using A's private key.
Only A has this private key. What A encrypts is the hash of the message, and A sends that along with the message to B. So we say we send the message concatenated with the signature. So that's sent across the network. A has a message to send to B. They sign the message and send both the original message and the signature to B. So the two vertical bars again are concatenation. We combine them together. So we send both of them in the packet, for example, or the email or the file. This doesn't provide confidentiality. Anyone can see the message. That's not our aim. I don't care if someone sees the message. What we want to do is confirm that it came from A, not someone pretending to be A. To verify, B receives the message. They must know the hash and encryption algorithms used. They take the received message and calculate the hash. So M that's received, they calculate the hash of that. And then they decrypt the signature. So they decrypt using the public key of A, the signature. And what do they get with public key cryptography? If something is encrypted with a private key of A, this was encrypted with a private key of A. If we decrypt the ciphertext with the public key of A, we get the original plaintext. So we'll get a hash of M as the output. So if nothing has been changed and we're using the right key, then we'll get the original input from the encryption, which is the hash of M. And they should be the same. So that's the normal operation. A signs the message, sends the message and the signature to B. B verifies by taking a hash of the received message and checking the signature by decrypting it with the public key of A. And if those hash values match, then it means the messages match because our assumption about hash functions is that two messages which are the same will produce the same hash value, and two different messages will produce different hash values.
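The normal exchange, S = E(PR_A, H(M)) at A and a hash comparison at B, can be sketched roughly as follows. Everything about the key is an assumption made for illustration: it is built from two known Mersenne primes so that the block runs without a key generator, it uses unpadded textbook RSA, and the function names are invented here. Real signatures from OpenSSL use random primes and a padding scheme.

```python
import hashlib

# Hypothetical key pair for user A, built from two known Mersenne primes.
# Illustration only: real keys use secret random primes and padded RSA.
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to A


def hash_int(message: bytes) -> int:
    """H(M): SHA-256 of the message, read as an integer (256 bits, so < N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes, d: int, n: int) -> int:
    """A's side: encrypt the hash with the private key to get the signature S."""
    return pow(hash_int(message), d, n)


def verify(message: bytes, signature: int, e: int, n: int) -> bool:
    """B's side: decrypt S with A's public key and compare with H(received M)."""
    return pow(signature, e, n) == hash_int(message)


message = b"The homework deadline is extended by one week."
signature = sign(message, D, N)         # A sends message || signature
print(verify(message, signature, E, N))
```

Only the short hash goes through the slow public key operation, however large the message is.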
So if the hash values are the same, the messages must have been the same. Let's see what an attacker can try to do and see where it goes wrong. So let's say an attacker pretends to be A. So B is going to receive a message. An attacker, our malicious user, is going to send a message to B pretending to be A. So they're going to send a message concatenated with some signature. What is the signature? Well, we need to encrypt the hash of that message. What can the malicious user encrypt with? They want to pretend to be A. Normally we encrypt with the sender's private key. So the malicious user doesn't have A's private key, so let's say they use their own private key, the private key of the malicious user. So the malicious user has some message, they want to pretend to be A. They send the message to B, M, and they send a signature of that message which was obtained by taking a hash of M and encrypting with the private key of the malicious user. That's sent. B, when they receive a message, when they think it's from A, they verify. The verification steps take the received message, calculate its hash, and then decrypt the signature. And how do they decrypt the signature? How does B decrypt the signature? What key do they use? Public key of A. If you think the message comes from A, then to verify you decrypt with A's public key. And then we compare the result of the decryption with the hash value, we compare these values. They will not match. Because note that we've encrypted with the private key of the malicious user, we did not decrypt with the corresponding key in the key pair. And when we decrypt with the wrong key, we will not get the original plaintext out. That's our assumption, and it's generally true. That is, here's our plaintext. We encrypt with a private key. If we decrypt with the corresponding public key, we'll get the plaintext back. But if we decrypt S using the wrong public key, we will not get the plaintext back.
That is, we'll not get the hash of M as the output. We'll get some other value. And when we compare it to here, they'll be different. So the output of the decryption will not equal the hash of M. And that's how B knows something's gone wrong. This message didn't come from A. Someone's either pretending to be A, or someone's modified something along the way. So it's some assumption that we're relying on here. And I've tried to capture these assumptions in the handout that I'll pass around this morning, is that both with symmetric and public key encryption, if we decrypt with the wrong key, we will not get the original plaintext. We'll get something else. So here we've decrypted with the wrong key. So we do not get the original plaintext, which means it doesn't match the hash of the message. B detects something's gone wrong. What else can you do as an attacker? What can you try? That was trying to pretend to be A. Any other attempts? I'm trying to change the public key of A to be the public key of... Okay. That doesn't know that they are changing. Correct. So in this case, B thought that the message was from A. A malicious user was pretending to be A, so B decrypts with the public key of A. If somehow the malicious user could make B think that the public key of A is not in fact public key of A, but is the public key of the malicious user, this would be a successful attack. That is if... Let's say B has the public key of A. It has a list of public keys. Maybe it was published on a website. A's public key listed on a website. But in fact, it wasn't A's public key. It was the public key of the malicious user. B thought this was the public key of A. Then what B does when they receive a message from A, they decrypt with the public key of A. But in fact, the public key of A is the public key of the malicious user. If we use that in the decryption, the result of the decryption will be the original hash value. 
And B would be fooled into thinking that this message came from A. So a successful attack is possible if we can somehow make B think that the public key of A, or that this public key of the malicious user is in fact the public key of A. That's the issue of key management. I say, here's Steve's public key. How do you know it's really mine? Or you see on a website, this is Steve's public key. How do you know it's my public key? Well, we need some way to verify to manage keys that way. So yes, that's an issue. We'll come back to that issue and we look at digital certificates and see still it's a challenging problem. But assuming we can distribute keys correctly, assuming that we can't do this attack, what else could the attacker try to do? Let's try and see if we can modify a message first, a simple attack. Let's say A sends a message to B and malicious user tries to modify the message. So A takes some message and the signature and we send the message, but it's intercepted by the malicious user. And the message is again, as original before, the message and the signature, where S was calculated by A as encrypt with the private key of A, the hash of M. So this is the same as our original scenario. A generates the signature, attaches that to the message, sends it to B, but malicious user intercepts. Before it gets to B, they get a copy and can they modify the message and send it to B? Send it to B and make B think that the message came from A? Well, let's try. Let's say we change the message to M prime, a different value than M, the original message M prime. We don't know, of course the malicious user doesn't know the private key of A. So we will not try to recreate a new signature because it won't work, we saw from the previous case. If we try to encrypt something with the private key of the malicious user, it won't work. So let's just attach the original signature. 
That is, let's say the signature is 512 bits and the message is a megabyte. What I do, as a malicious user, is change the first one megabyte to my new message M prime, and I just copy the last 512 bits, attach them to the end, and send that to B. B receives it, thinks it's from A, and verifies. And the verification, same as before: we take a hash of the received message M prime and we decrypt the signature. What key do we decrypt with? Again, we think it's from A, so we decrypt the signature with the public key of A. When we decrypt the signature, what's the output? So this is B's verify step. Receive a message, take the hash, receive a signature, decrypt the signature with A's public key, and we will get H of M. The signature was created by encrypting H of M with the private key of A, so when we decrypt it with the public key of A we'll get the original plaintext back, we'll get H of M, the hash of the original message. Do they match? Are they the same? That's the verification step. Hash the received message, decrypt the signature, compare. Are they the same? Hands up for yes. Yes, they are the same. Hands up for no. Okay, they're not the same. And this comes back to our hash functions: M and M prime are different, so the hash of M and the hash of M prime will be two different values. So when we compare them, we'll see they're not the same, and therefore we've detected something's gone wrong. So this relies on our hash function property. The hash of two different messages gives us two different hash values. If the malicious user could find a message M prime which was different from M but had the same hash value, if they could do that, this attack would be successful. But with strong hash functions, it's practically impossible to find two messages which have the same hash value. So that attack's unsuccessful. So signatures generally work by using public key encryption and hash functions. The hash functions are used to reduce what we encrypt. Going back to our slides.
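To make the three attempts we just walked through concrete, here is a minimal sketch in Python. It is a toy, not a real signature scheme: it uses textbook RSA with no padding, the primes are small Mersenne primes picked only because they are easy to write down, and the hash is reduced mod n so that it fits. The names `make_keypair`, `sign` and `verify` and the example messages are made up for illustration. It tries a genuine signature, a forgery signed with the malicious user's own private key, a modified message with the original signature attached, and finally the key-substitution case where B has been tricked into using the malicious user's public key as if it were A's.

```python
import hashlib

E = 65537  # public exponent (a common choice, assumed here)

def make_keypair(p, q):
    """Return (public, private) toy RSA keys built from primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent, needs Python 3.8+
    return (E, n), (d, n)

def h(message, n):
    """SHA-256 of the message as an integer, reduced mod n (toy shortcut)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message, private):
    """S = E(PR, H(M)): encrypt the hash with the signer's private key."""
    d, n = private
    return pow(h(message, n), d, n)

def verify(message, signature, public):
    """Decrypt S with the public key and compare with the hash of the received message."""
    e, n = public
    return pow(signature, e, n) == h(message, n)

# Toy key pairs for user A and for the malicious user (illustrative primes only).
pu_a, pr_a = make_keypair(2**61 - 1, 2**89 - 1)
pu_m, pr_m = make_keypair(2**107 - 1, 2**127 - 1)

m = b"Pay Bob 100"
s = sign(m, pr_a)  # A signs the original message

genuine = verify(m, s, pu_a)                  # untouched message from A
forged = verify(m, sign(m, pr_m), pu_a)       # signed with the wrong private key
tampered = verify(b"Pay Bob 9000", s, pu_a)   # M replaced by M', old signature kept
substituted = verify(m, sign(m, pr_m), pu_m)  # B wrongly believes pu_m is A's key

print(genuine, forged, tampered, substituted)
```

Only the genuine case and the key-substitution case should verify. The last one is exactly why we need to assume, for now, that B really has the correct public key of A.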
In concept, we only really need to encrypt the message. But with public key crypto, encrypting large messages is too slow. So in fact, we encrypt the hash of the message, and the hash of the message is quite short. It makes it convenient in terms of performance. Verification: decrypt the signature and compare it to the hash of the received message. If they match, verified. If they don't match, it's not authentic, there's a problem. There are different algorithms for digital signatures. RSA is common, but there are other algorithms which are also used: DSA, the Digital Signature Algorithm, which is part of the Digital Signature Standard, DSS; elliptic curve DSA; ElGamal; and there are others. But RSA is still quite common. And you can use different hash algorithms. The common ones were MD5 in the past, and now SHA, SHA-2. So there are different hash functions. And that's why we need the properties of hash functions that we introduced on Tuesday, because we use them for digital signatures: the properties that they are one-way and that it's hard to find collisions. That almost gets us to the end of this topic. Questions about digital signatures before we move on? Everyone's got a copy of that one-page handout? If not, there's a few more hanging around. Make sure you have one. Anyone? A few more spare. I put it on the website this morning. I've tried to summarize many of the key assumptions from this topic on cryptography that we're going to use in later topics. I'm not going to go through them all, but let's just look at a few that will be useful. So cryptography is a very wide topic. We don't need to know all the details to look at how it's used. What we often do is make some assumptions, such as that some algorithms are strong and some aren't. And I've tried to list some of the key assumptions. They normally hold. They are assumptions. They're not always true. There may be special cases when these assumptions I state here are not true. If they're not true, then we may have security flaws.
But usually we'll know when they're not true. So we'll normally make assumptions. For example, A7. Assumption 7 here. Say some ciphertext was created by encrypting plaintext with one key. If we try to decrypt that ciphertext using the wrong key, then we assume that the output will not be the original plaintext. And the decrypter, the person doing the decryption, will know. They'll recognize that the key that they just used is wrong. What is the wrong key? If we're using symmetric key encryption, the wrong key is any key other than the shared secret that was used for encryption. If we're using public key encryption, it's any key that is not the other key in the corresponding key pair. That's actually covered in the previous assumption, the wrong and right keys. So this is an assumption that we make. If we decrypt something and we're using the wrong key, then we will not get the plaintext as output. Some other assumptions. From now on, we're going to assume that brute force attacks are not possible when we're using strong ciphers. The easy way to stop a brute force attack is to make your key longer or make the parameter longer. It varies as to how long a key should be to stop a brute force attack, but a 128-bit key, impossible. A 100-bit key is considered impossible. So I've said that for anything above 2 to the 80 operations or attempts, let's assume that it's not possible to do that in a reasonable time, that is, under 100 years. And some other assumptions. We assume the attacker knows everything that's public. The algorithms, the language that the plaintext was in, let's assume that they know that, or that they can use some way to find that out. So we can't make it more secure by saying, oh, let's make the plaintext in Thai instead of English. That doesn't help. Because in practice, the attacker can find that out. Yeah? Okay. Okay.
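A quick calculation shows why we can treat brute force as not possible once the key is long enough. This is only a rough sketch: the rate of one trillion attempts per second is an assumed figure for illustration, not a measurement of any real attacker, and the function name `years_to_try_all_keys` is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_to_try_all_keys(key_bits, attempts_per_second):
    """Years needed to try every key of the given length at the given rate."""
    return 2**key_bits / attempts_per_second / SECONDS_PER_YEAR

RATE = 1e12  # assumed attacker speed: 10^12 key attempts per second

for bits in (56, 80, 100, 128):
    years = years_to_try_all_keys(bits, RATE)
    print(f"{bits}-bit key: about {years:.3g} years")

# Doubling the attacker's speed only halves the time.
doubled = years_to_try_all_keys(128, 2 * RATE)
print(f"128-bit key at double speed: about {doubled:.3g} years")
```

At this assumed rate, 2 to the 80 attempts already takes tens of thousands of years, and each extra bit of key doubles the work, so making the attacker's computers twice as fast buys them only one bit.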
You asked, what if the attacker can work out the private key from the public key? Before we continue, let's go to one of the assumptions. An attacker does not know secret values. So if we say we have a secret key, we assume the attacker cannot find that secret value. If they can, it's no longer secret and our whole system fails. So in public key cryptography, the public key is public. The attacker does know it. The private key, and especially D, is private or secret. And we assume the attacker doesn't know it. And from now on, we're going to assume that the algorithms that we use are strong enough such that there's no way for the attacker, even if they know the public key, to find the private key. If the attacker somehow manages to find the private key, then the system fails from a security standpoint. So the attacker can't find the private key from the public key. You say, what if they can? Well, if we analyze the algorithms, if we use strong algorithms, they can't. There's no known way. It's secure. If we use an insecure algorithm, maybe they can, but if we use secure algorithms like RSA, or AES for symmetric key encryption, there's no way for the attacker to find the secret. No practical way. Brute force, again, we don't put exact values, and this 2 to the 80 is not an exact value. My point is that in practice, to avoid a brute force attack, we can make the key large enough such that it will take billions of years. How can you reduce that to be manageable? Well, even if you double the speed of your computers, then billions of years goes down to, well, still billions of years. That's because I saw in the... Okay, that's a different thing. And that comes in the next topic, passwords and locking accounts, but that's not a brute force attack on the cipher. Although we can use that technique to slow down brute force attacks. But we'll see when we come to passwords, brute force attacks are possible.
But when we're talking about our encryption algorithms, which is all we're talking about at the moment, then they're not possible. But yes, we'll come to your question and comment in the next topic. There are the assumptions about hash functions and digital signatures. Key management we didn't really cover. Well, quite simply, let's assume that we have a way to exchange a secret between two entities. If we both need to have a secret, let's assume we have some way to get that secret to the other person. And there are practical ways to do that. Physically delivering the secret is not very convenient, but there are protocols that allow us to do it across the internet. And similarly, we'll assume from now on that we can obtain the correct public key of an entity. So if you have Steve's public key, it really is Steve's public key, it's not someone pretending to be Steve. That's what we'll assume in our subsequent discussions. And some other assumptions. The principles are just a few other issues. It's definitely not all principles used in security, but some that may have come up from our discussion. I mentioned this one before. Let's assume that generally the better known an algorithm is, and the longer it's been used in practice, the more secure we may consider it. The less chance it will have flaws. Say you come and design a new algorithm. You say it's 10 times faster than RSA and you write some software and distribute it to the world. Then people will not trust it until they've used it a lot and done a lot of analysis of it. Not just used it for one day, but used it and analyzed it over a period of years usually. So the longer it's been used with no known flaws, the more secure we consider it to be. So we don't just upgrade to the newest version as soon as it's released. Performance: symmetric key algorithms are much faster than public key algorithms. You don't use public key crypto to encrypt large amounts of data. It's too slow.
Increase the plaintext and you increase the time to encrypt. That's number three. We should distribute keys using automatic means. For example, I ask you to give me your public keys. One approach: you write them on a piece of paper, I tell you where my office is, and all 40 students come to my office over a period of a week and I get your public keys. Well, that's a manual means. An automatic way would be to, let's say, publish the public key on some website, or use some software or some protocol to exchange public keys automatically. Manually doing things is inconvenient, especially over a large area, across the world. Coming to my office is okay for you, but what if we want to distribute to users in other countries? We need automatic means. We haven't really covered this, but a principle that we'll see come up is that the more times you use a key, a secret key, generally the greater the chance of that key being discovered by an attacker. We'll come to that with passwords, where the same thing applies. You should change your password on a regular basis. Change your secrets. It decreases the chance that they are discovered. Use multiple security mechanisms that overlap in what they try to do, so that if one fails you have some backup. Don't put all your trust in one security mechanism, because if that one fails your whole system will fail. We'll see that principle come up, and we'll see some other principles over the topics. Have a read through those assumptions if you can, including the rest of them that we haven't covered, because we'll use them as the basis for the next topic. So even if you don't understand all the details of the cryptographic techniques we've used, we can still understand the subsequent topics by using those assumptions. Random numbers. It's hard to create random numbers with computers, but let's assume that we've got some ways to do so. So we're not going to cover random numbers. They're important. Very important in security.
Turns out it's not easy to create random numbers. Computers are deterministic. They follow some steps, some algorithm. So how do you get randomness in a computer? Most random numbers created in software are what we call pseudo-random numbers. They're not truly random. They follow some sequence. So we have pseudo-random number generators. True random number generators use some non-deterministic source like some radiation events or measure some physical event in the environment. But it becomes very inconvenient to get your computer to measure radiation to generate random numbers. So we usually use what's called pseudo-random numbers. We will see random numbers are used in many different parts in IT security. That is, we rely on them to be secure. Sorry. Time. Random... Get your computer to generate a random number. How do you do it? With a time. Time's not random. Time's predictable. But how? You need some algorithm. An algorithm that a computer uses is deterministic. We know the algorithm in advance. Again, how do you produce a random number? Write a piece of software that creates a random number. Just choose a random number. But how do you choose that random number? How do you get a computer to choose a random number? When we use software, we call some function RAND or something, and it returns a random number. But how is that function implemented? That's what we care about. How does that function choose a random number? Well, in many cases, there's just an algorithm and it uses some equations to generate one. But it's not truly random. In many cases, there's a sequence of values it selects from. And it's predictable as to what the values are. So that is a challenging problem. Okay. So better random numbers start to use some other source of events. For example, a computer operating system, if you measure keyboard presses, you measure activity on the hard disk, you measure noise leaked from electrical circuits, then some of those activities exhibit randomness. 
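To see what "not truly random" means in practice, here is a small sketch. The `lcg` function is a linear congruential generator, one simple kind of pseudo-random number generator; the constants are just a classic textbook choice used here for illustration, and nothing about it is suitable for security. The point is that the same starting value (the seed) always gives the same sequence. For contrast, the last lines call Python's `secrets` module, which asks the operating system for random bytes rather than running a fixed, seeded algorithm in the program itself.

```python
import random
import secrets

def lcg(seed, count, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator: next = (a * current + c) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values

# Same seed, same algorithm: the "random" sequence repeats exactly.
first = lcg(seed=42, count=5)
second = lcg(seed=42, count=5)
print(first == second)

# Python's general-purpose random module is also a seeded PRNG.
r1 = random.Random(2024)
r2 = random.Random(2024)
same = [r1.randrange(100) for _ in range(5)] == [r2.randrange(100) for _ in range(5)]
print(same)

# For secrets such as keys, ask the operating system's randomness source instead.
token = secrets.token_bytes(16)
print(len(token))
```

If an attacker knows the algorithm and can guess the seed, for example because it was taken from the current time, they can reproduce every value the generator outputs.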
And if you can measure those values, then you can generate close to true random numbers. But that requires measuring the hardware, and what I can do on my computer to measure hardware may be different from what you can do on your mobile phone. So making a random number generator that measures, for example, noise from your CPU circuits on your motherboard is not easy. And not all devices support that. So yes, using some hardware inputs can lead to true random number generators. But we don't have them available on all devices. So often, we use some algorithm to generate them, and they are not true random numbers but pseudo-random numbers. And you find that many attacks on security systems have in the past taken advantage of poor implementations of random number generators. We may point to some as we come through different topics, some example random number generators, but we're not going to cover that. But for now: random numbers are important in security. In theory, they are hard to generate. It's hard to create random numbers in computers in a way that works across many different systems. But from now on, we're going to assume that we've got some algorithms that will produce random numbers. This is a long topic covering many different things. A bit of a summary, but maybe the best summary is the handout of the assumptions that you've got. That summarizes the main points that we're going to need for the rest of the topics. Many things we haven't covered. Key management is hard. We've sort of skipped over it. Making sure that keys are from the right person. We use all these super secure algorithms, and then someone implements them in software and they make a mistake. They have bugs in their software, and it leads to flaws in the implementation of the algorithm. That's common. And that leads to avenues for attacks. So we may have a perfect algorithm, and then someone goes and implements it in software and they implement it wrongly, and that may lead to attacks that are possible.
So that's a practical avenue of attack. It's often difficult to prove the security of algorithms to say 100% this is secure. So making a judgment of this is secure or this is more secure than something else is challenging. And there are many other topics that we haven't touched upon which are interesting but we will not explore. Okay. Next topic, next week we'll move on to passwords. And we'll start to move into practical things of IT security now. The cryptography will be used throughout the rest of the topics but mainly the assumptions that we have here will be used.