Hash functions take a variable-size input and produce a usually short, fixed-size output called the hash value, and that hash value should look random. We saw some of those characteristics last week. Let's continue with how we use hash functions, starting with authentication. We went through an example last week, but let's go back to the first example, which combines a hash function with symmetric key encryption. The picture we'll analyze is the same one as on the slide. The source A wants to send a secret message to the destination B and also wants that message to be authenticated. So A first calculates the hash of the message, concatenates that hash value with the message, and then encrypts the lot: it encrypts the message concatenated with the hash. The destination has two tasks. First, B decrypts; since B has the secret key K, it can decrypt successfully and obtain the message. Then B must verify the message, and it uses the hash to do that: it calculates the hash of the received message and compares it with the received hash value. If they match, B accepts the message. Why do we need a hash value here? Why not just encrypt the message and rely on symmetric key encryption to provide authentication as well? What's the benefit of also using a hash? Someone suggests it's more secure, but that's not specific enough: under what conditions, and what exactly is more secure? Remember that with symmetric key encryption, we said that if the message successfully decrypts with the correct key, this implies that the correct person sent it and that the message hasn't been modified. So why incorporate a hash? In fact, symmetric key encryption doesn't always work like that. For symmetric key encryption to be used for authentication, the message we obtain when we decrypt must be recognizable.
For example, it might be an English message, or a structured message where we can tell that this is the correct plaintext. If so, we don't need a hash value. But in general we cannot always decrypt and obtain a message that B will recognize as correct or incorrect. For example, if the message is a random secret key, then when B decrypts and gets a random-looking sequence of bits, how does it know whether that is the correct value? That's why the hash value is important here. Including the hash inside the encryption lets B be sure, with very high probability, that the message hasn't been modified and that it came from A. So let's consider what the attacker can do. The way to understand these schemes is to ask: what if an attacker changes something? Will the authentication still succeed? What A sends to B is the message concatenated with the hash of that message, all encrypted with the key K, that is, E(K, M || H(M)). Let's write that again and see what the malicious user can do. Suppose the malicious user intercepts it, makes some changes, and sends the modified version on to B. What would you try, as the malicious user, to fool B into thinking the message is correct? Can you see the message? No, because it's encrypted with K, and the malicious user doesn't have K, so we have confidentiality. What else can we try? We could try to find the key, but under our assumptions the key is long enough that we can't. What if we change the message? What's sent is a sequence of bits, say 1,000 bits long, and it looks random, because the output of encryption should be a random-looking sequence of bits.
Let's say the malicious user sends something other than the original ciphertext. They create a completely different message M', calculate its hash H(M'), and encrypt M' || H(M') with their own key. They must use a different key, because they don't have the key shared between A and B. What does B do? B tries to decrypt. Does it work? How would B know that it doesn't? That's the important point. Let's assume B can't tell directly whether the decryption worked. B decrypts everything it received using K, computing D(K, E(K_M, M' || H(M'))), where K_M is the malicious user's key. What's the output? Something encrypted with the malicious user's key and decrypted with a different key gives random bits. We don't get M' || H(M') back, because a different key was used. But B doesn't know that yet: it can't tell that the wrong key was used, because as far as B knows, the random bits that come out could be the correct bits. So B decrypts and gets some output. Let's call the first part M'' and the last part h''' (we'll see in a moment why I use different notation). B treats the first sequence of bits as the message and the last sequence of bits as the received hash value. Now, to verify, B calculates the hash of the received message and compares it with the received hash value. So B computes H(M''): is it equal to h'''?
Again: B decrypts, gets some message and some hash value, calculates the hash of the received message, and compares the two hash values. Under what conditions are they the same? Does H(M'') equal h''' here? It comes down to what h''' actually is. Because we decrypted with the wrong key, the bits M'' and h''' are random, with no relationship between them. When we decrypt with the wrong key, even though we don't yet know it's the wrong key, we get a random sequence of bits. So it would be extraordinarily unlucky for h''' to turn out to be the actual hash of M''. When we compare them, they will not match: h''' was never calculated from the message, and both parts are just random bits. Therefore B detects an error. So the reason we use a hash value here, rather than just encryption, is that after decrypting we might not be sure whether the result is correct. If the message were an English phrase, sentence, or paragraph, we'd know at this step that something was wrong. But if the message is a secret key, and we decrypt and get some bits, we can't tell whether they're correct. Comparing the hash values tells us, with very high probability, that the result is incorrect in this case. We'll look at those probabilities later, when we analyze cryptographic hash functions. So we can use hash values together with secret key encryption. Can the malicious user pretend to be A? No, because they don't have the key, for the same reason as above.
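To make this first scheme, E(K, M || H(M)), concrete, here is a minimal Python sketch under loudly stated assumptions. Python's standard library has no block cipher, so the hypothetical `toy_encrypt`/`toy_decrypt` functions build a toy stream cipher from SHA-256 in counter mode with a random nonce; this is for illustration only and is not a secure cipher (a real system would use a vetted cipher such as AES). The message is a random 16-byte key, exactly the case where B could not recognize a correct plaintext on its own, so the hash check does the work.

```python
import hashlib
import hmac
import os

HASH_LEN = 32    # SHA-256 digest size in bytes
NONCE_LEN = 16

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Hypothetical toy cipher (NOT secure): random nonce, then XOR with the keystream.
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

def send(key: bytes, message: bytes) -> bytes:
    # A computes E(K, M || H(M)).
    return toy_encrypt(key, message + hashlib.sha256(message).digest())

def receive(key: bytes, ciphertext: bytes):
    # B decrypts, splits the result into M'' and h''', and checks H(M'') == h'''.
    plain = toy_decrypt(key, ciphertext)
    if len(plain) < HASH_LEN:
        return None
    message, received_hash = plain[:-HASH_LEN], plain[-HASH_LEN:]
    if hmac.compare_digest(hashlib.sha256(message).digest(), received_hash):
        return message
    return None  # hash mismatch: reject

k_ab = os.urandom(32)           # key shared by A and B
secret_key = os.urandom(16)     # random-looking message that B cannot "recognize"
assert receive(k_ab, send(k_ab, secret_key)) == secret_key

k_mal = os.urandom(32)          # the malicious user's own key
forged = send(k_mal, os.urandom(16))
print(receive(k_ab, forged))    # None: decrypting with K_AB yields random bits
```

The attacker's ciphertext is perfectly well formed under their own key; it is rejected only because decrypting it with K_AB produces bits whose two parts do not satisfy the hash relationship.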
Without K, even if the malicious user encrypts something with the wrong key, they cannot produce a ciphertext that, when B decrypts it, yields a message part whose hash matches the hash part, because decrypting with the wrong key produces random bits. So this scheme is considered secure for confidentiality, source authentication, and data authentication (data integrity). Any questions before we move on to the next scheme? On the question of whether the malicious user could just generate the hash value: they can calculate a hash value, but they don't know the message, because it's encrypted, so they can't reproduce it. All they can do is create some new message and calculate its hash. But once B decrypts that with K, we won't get their input back; we'll get other random values, and because those are random, the hash of the message part will not match the received hash value. So in this case, the failure comes from the decryption, and the hash value simply confirms that the decryption did not work. That scheme encrypts the message together with its hash code using symmetric key encryption. The next scheme is a slight variation: instead of encrypting the entire message, encrypt just the hash value. A takes the message, calculates its hash, encrypts the hash with a secret key shared between A and B, and sends the message plus the encrypted hash to B, and B verifies. The message is sent in plaintext, in the clear, so this scheme does not provide confidentiality: anyone can see the message. But it should provide authentication. Again, let's consider what the attacker can do. A sends M || E(K_AB, H(M)). Our picture doesn't write K_AB, but we know the key is shared between A and B.
The malicious user intercepts this, so let's see what they can do before sending it on to B. The message is in the clear, so the malicious user knows its contents. What can they try? First, change the message and see what happens. Say they change it to M' but keep the encrypted hash value the same; they don't change those bits. Note that the malicious user doesn't need to encrypt anything here. Say the message is 1,000 bits and the encrypted hash value is 100 bits, 1,100 bits in total. The malicious user replaces the first 1,000 bits with a message of their choice and keeps the last 100 bits unchanged. That's easy: copy those bits and change the message. Now B verifies. Let's look at the steps B takes to check whether anything has changed and whether this came from A. B decrypts the second part, the encrypted hash value, using K_AB: D(K_AB, E(K_AB, H(M))). What comes out? We started with H(M), encrypted with K_AB and now decrypted with K_AB, so H(M) comes out. Now B compares H(M) with the hash of the received message, H(M'), where the prime denotes a message different from M. Does H(M) equal H(M')? Our working assumption about a cryptographic hash function is that two different messages produce two different hash values. So H(M) will not equal H(M'), and that's how B knows there's an error. Any questions so far on using an encrypted hash value instead of encrypting the entire message?
So the point in this case is that B detects the error because of the properties of the hash function: two different messages give two different hash values. When the malicious user modifies the message but not the encrypted hash value, B ends up comparing the hash of the original message with the hash of the modified message. Two different messages give two different hash values, so they're not equal and B recognizes the error. What could the malicious user try instead to fool B into accepting the message? If you modify the message, you must modify the hash value as well; otherwise B will easily detect it. That's an important point: if the malicious user changes the message, then under most conditions they must also change the hash value. I say "most conditions" because in special cases there may be two messages that produce the same hash output; that's called a hash collision. At this stage we assume finding one isn't possible. So how can the malicious user change the hash value along with the message? They send M' concatenated with an encryption of H(M'), but encrypted with what key? They can't use K_AB; that's the point. So yes, they can change the hash value, but it must be encrypted with a key shared between A and B. Now let's see what B does to verify. B believes the message comes from A, so it decrypts the second part with K_AB and gets some output. Let's denote it H_D (D for decrypted); the notation doesn't matter, it's just some value. B compares H_D with the hash of the received message. Are they the same? If not, why not? In the encryption and decryption process, the plaintext was the hash of the modified message, H(M'). It was encrypted with the malicious user's key, but B decrypted it with the key shared between A and B.
So we get a different output; H(M') will not come out. If we encrypt H(M') with one key and decrypt with another key, the original plaintext does not come out. With the correct key we'd recover the original plaintext; with an incorrect key we get a different output. So H_D will not be the same as H(M'), B detects that they differ, and B knows there's an error. In all of these cases, B doesn't necessarily know what was changed. B doesn't know whether the malicious user changed only the message, or changed both the message and the hash value using a different key. All B knows is that something went wrong. But that's sufficient for authentication, because B now concludes: don't trust this message; it's not authentic. I hope you're starting to see a pattern: for authentication to work, we rely on a few conditions or assumptions. In this case, coming back to symmetric key encryption, decrypting with the wrong key produces an output different from the original plaintext. That's what happens here: the plaintext is H(M'); encrypt with one key, decrypt with a different key, and the output is not H(M'). As a result, H_D does not equal H(M'). In the previous case, we relied on the fact that two different messages won't have the same hash: hash two different messages and you won't get the same hash value. If that holds, authentication works; that is, we can check whether the message is authentic. Anything else you'd want to try as the malicious user? You can't modify the message without modifying the hash value, and if you try to modify the hash value, you don't have the key to encrypt it so that it decrypts correctly.
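The two attacks on the encrypted-hash scheme, M || E(K_AB, H(M)), can be sketched in Python as follows. Because the standard library has no block cipher, `toy_block_encrypt`/`toy_block_decrypt` are hypothetical: a 4-round Feistel network over one 32-byte block with HMAC-SHA256 as the round function, for illustration only (a real system would use a vetted cipher such as AES). A block cipher is used rather than a simple keystream because H(M) is computable by anyone who sees M, so XORing it with a fixed keystream would expose that keystream.

```python
import hashlib
import hmac
import os

BLOCK = 32  # one SHA-256 digest fits exactly in one toy block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function: HMAC-SHA256 truncated to 16 bytes, used as a keyed PRF.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:16]

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Hypothetical toy 4-round Feistel cipher on a 32-byte block (NOT for real use).
    left, right = block[:16], block[16:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right

def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    # Undo the Feistel rounds in reverse order.
    left, right = block[:16], block[16:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right

def send(key: bytes, message: bytes) -> bytes:
    # A sends M || E(K_AB, H(M)); M itself travels in the clear.
    return message + toy_block_encrypt(key, hashlib.sha256(message).digest())

def verify(key: bytes, packet: bytes) -> bool:
    message, enc_hash = packet[:-BLOCK], packet[-BLOCK:]
    h_d = toy_block_decrypt(key, enc_hash)  # H_D in the lecture's notation
    return hmac.compare_digest(h_d, hashlib.sha256(message).digest())

k_ab = os.urandom(32)
packet = send(k_ab, b"Pay Bob 10 dollars")
assert verify(k_ab, packet)

# Attack 1: change M, copy the encrypted hash bits unchanged, so H(M) != H(M').
attack1 = b"Pay Eve 9999 dollars" + packet[-BLOCK:]
# Attack 2: change M and encrypt H(M') with the attacker's own key, so H_D != H(M').
attack2 = send(os.urandom(32), b"Pay Eve 9999 dollars")
print(verify(k_ab, attack1), verify(k_ab, attack2))  # False False
```

Each attack fails for the reason given above: the first relies on the hash of two different messages differing, the second on decryption with the wrong key not recovering the plaintext.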
Similarly, if the malicious user wants to send a fake message claiming it's from A, when A never sent anything, they cannot encrypt it with the right key, because they don't have K_AB. Therefore B will detect that the message is not from A. Compare these two schemes, A and B. Both provide authentication. The first also provides confidentiality, because the message is encrypted; scheme B does not. Scheme B is useful when we don't care about the confidentiality of the message, only about authentication. Encrypting just the hash value can be much faster than encrypting the entire message. If your message is one gigabyte, then in the previous scheme, encrypting that gigabyte takes some time. In scheme B, we just calculate the hash of the one gigabyte and encrypt that small hash value, say 256 bits, which is much faster than encrypting a gigabyte. So avoiding encryption of large inputs where possible improves performance. The third scheme is the one we covered last week. Again there's no confidentiality, since everyone can see the message, but it provides authentication in a different way: A and B have a shared secret S. It's like a key shared between A and B, but it isn't used for encryption. Instead, it's combined with the message and the combination is hashed. The hash value is sent with the message, and the receiver verifies by recomputing and comparing. We saw last week that in this scheme, modifications by the attacker will be detected by B. The fourth scheme combines the third with encryption, so we also have confidentiality of the message. It's essentially the same as the previous one: take the secret S, combine it with the message, hash, but then encrypt everything with a shared secret key, which is a different key from S, so that the entire message is secret. So these are just different ways to provide authentication, with or without confidentiality.
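A minimal Python sketch of the third scheme (shared secret S, no encryption), assuming SHA-256 and assuming the secret is appended to the message before hashing, H(M || S); the slide only says S is combined with M, so that ordering is an assumption. The fourth scheme would additionally encrypt the whole packet under a separate key. For real use, HMAC is the standardized construction for keying a hash function.

```python
import hashlib
import hmac
import os

def tag(secret: bytes, message: bytes) -> bytes:
    # H(M || S): assumed ordering; S itself never travels over the network.
    return hashlib.sha256(message + secret).digest()

def send(secret: bytes, message: bytes) -> bytes:
    # A sends M || H(M || S).
    return message + tag(secret, message)

def verify(secret: bytes, packet: bytes) -> bool:
    # B recomputes H(M || S) over the received message and compares.
    message, received = packet[:-32], packet[-32:]
    return hmac.compare_digest(tag(secret, message), received)

s = os.urandom(32)  # secret S shared by A and B
packet = send(s, b"Meet at noon")
assert verify(s, packet)

# Without S, the attacker can only hash what they can see, e.g. H(M'), which B rejects.
forged = b"Meet at midnight" + hashlib.sha256(b"Meet at midnight").digest()
print(verify(s, forged))  # False
```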
All of these use hash functions, and three of the four also use encryption, symmetric key encryption to be precise. We've said several times now that symmetric key encryption alone, without hash functions or message authentication codes, may provide authentication. But there are several cases where we want to avoid encryption, especially of large inputs. One reason is that encryption can be slow: encrypting a large file takes time that we may not want to spend if the encryption isn't necessary. Encryption in hardware is usually much faster than in software, but you need that hardware, so there are financial costs involved. Some hardware designed specifically for encryption works well for large amounts of data but may be quite inefficient, with a large overhead, for small amounts of data. Another problem is that some encryption algorithms may involve licensing costs: several algorithms have been patented, which means that, in principle, you may need to pay for a license to use them. That's another reason why, if we don't need encryption, we try to use other approaches. For authentication, we've gone through some approaches using hash functions, and in the previous topic we went through message authentication codes (MACs). I've rearranged these slides, so where the slide says MACs are covered in the next topic, they were actually covered in the previous topic. MACs and hash functions provide a similar capability: authentication. We sometimes refer to message authentication codes as keyed hash functions: they're like hash functions but also take a key as input. A hash function takes just the message; a MAC function takes the message and a key, hence a keyed hash function.
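Python's standard `hmac` module shows the "keyed hash function" view directly: `hashlib.sha256(message)` takes only the message, while `hmac.new(key, message, hashlib.sha256)` takes a key as well. A minimal sketch (the key value is hypothetical):

```python
import hashlib
import hmac

def plain_hash(message: bytes) -> bytes:
    # Hash function: takes only the message, so anyone can compute it.
    return hashlib.sha256(message).digest()

def keyed_hash(key: bytes, message: bytes) -> bytes:
    # MAC, i.e. keyed hash function: takes the message and a key (HMAC-SHA256).
    return hmac.new(key, message, hashlib.sha256).digest()

def mac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(keyed_hash(key, message), tag)

key = b"hypothetical key shared by A and B"
msg = b"Meet at noon"
tag = keyed_hash(key, msg)
print(len(plain_hash(msg)), len(tag))      # 32 32
print(mac_verify(key, msg, tag))           # True
print(mac_verify(b"wrong key", msg, tag))  # False
```

Note that anyone holding `key`, A or B, can compute a valid tag; that symmetry is exactly what matters when we turn to signatures.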
In fact, HMAC simply takes an existing hash function and introduces a key, so there are many similarities between hash functions and MACs. The authentication techniques we've seen with hash functions all use symmetric key cryptography so far. Let's look at using public key cryptography, which leads us to digital signatures, commonly used nowadays for authentication. What do we mean by a signature? We sign a message with the intent of later being able to prove that the message came from, and was approved by, whoever signed it. When I sign a document, someone may see that document with my signature a year later and use it to prove that I agreed to the document and authorized it. That's a handwritten signature. Digital signatures use similar concepts, but on data, on files for example. The person who receives a message would like to be able to prove that it came from a particular entity and was approved by that entity; you would only sign a message you approve of. Importantly, symmetric key cryptography cannot be used for this purpose. We need public key cryptography, and providing digital signatures was one of the motivations for developing public key cryptography. Why can't symmetric key cryptography be used? With symmetric key cryptography, two users have the same key: a secret key shared between A and B is known by two parties. If A sends B a message, B can verify that it came from A; that works fine. But we can't provide the service where someone else, say user C, can verify that a particular message came from just one person in the world.
With a shared secret key, a message encrypted with the key shared between A and B may have come from either A or B, because both have that key. So a third party, user C, has no way to prove that a particular message came from just one user; with symmetric key cryptography, C can only establish that it came from one of two users, A or B. That's not good enough in some cases; we'd like to prove that it came from A only. For example, if A encrypts a message with the key shared between itself and B and sends it to B, B can decrypt it and knows that it came from A: B knows it didn't send the message itself, so the only other party in the world who could have created it is A. But B cannot prove to someone else, such as user C, that the message came from A, because it could have come from B. B could be faking it, claiming "A sent me this message" when in fact B created it. Because two parties hold the same key, there's no way to prove that the message came from one particular user. That's one of the benefits of public key cryptography: only one person in the world should have a given private key, and that person uses it to encrypt. Let's look at the concept of signing, or digital signatures. We sign a message at the sender and verify it at the receiver. Signing involves encrypting the message with your own private key: if I want to sign a message, I encrypt it with my private key. The signature is then usually attached to the message; on this slide, S is the signature, concatenated with the message M. The message and signature don't have to be encrypted; that depends on whether you want confidentiality. For verification: if the signature was encrypted with the private key of A, then verification involves decrypting with the public key of A. We know that with public key cryptography, we use the two keys in a key pair to encrypt and decrypt.
So a user verifies a message by decrypting the signature. The signature was encrypted with A's private key; we decrypt it with A's public key. We should get a message as output, and we compare it with the received message. If they're the same, the signature is verified and the message is accepted. If not, we don't trust the message. So the concept of digital signatures relies on encrypting with a user's private key and verifying with that user's public key. But in practice it's very common not to do it this way, and instead to introduce a hash. Rather than encrypting the entire message and sending that signature along with the original message, we take a hash of the message and encrypt just the hash value. To verify, we decrypt with the sender's public key, get a hash value as output, and compare it with the hash of the received message. So the concept relies on public key cryptography, but in practice we also use hash functions to make performance much better. Let's go through an example of using a digital signature in practice and then see what the attacker can do. We have users A and B, each with its own key pair. In addition, we assume users know other users' public keys: A knows the public key of B, B knows the public key of A, and so on for other users as needed. The malicious user, as we'll see, also knows the public keys of both A and B, since those values are public. Say A wants to send a message to B and sign it. Let's look at the steps. A takes the message, calculates its hash, and then encrypts that hash value with a key. For A to sign a message to send to B, which key does A encrypt with? Some of you are saying the right thing. It's not PU_B.
There are only four keys to choose from: PU_A, PR_A, PU_B, and PR_B. The answer is PR_A. If I want to sign a message, I sign using my private key, just as I sign a document with my own handwriting. The idea is that the only person in the world with the information needed to produce my signature is me. Likewise, the only party that can produce this signature is A, because we use A's private key. That's the signature component; we sometimes denote it S. But we don't send just that: we also send the message, concatenating the message with the signature. Sign the message and attach the signature, just as with a document, where you have the document and attach your signature at the bottom. In the normal case, with no attacker, B receives this and verifies it. To verify, B takes the signature and decrypts it using which key? The public key of the party it claims to come from, PU_A. What does B get as output? H(M). S was obtained by taking H(M) and encrypting it with A's private key. Decrypting with A's public key, if nothing has been modified, gives back the plaintext; remember, with public key cryptography, you encrypt with one key and decrypt with the other key in the pair. So the output is H(M). B then compares that with the hash of the received M. Nothing has been modified here, so of course they match: decrypting the signature gives the hash of the original message, and hashing the received, unmodified message gives the same hash value, because hashing the same input always produces the same output. That shows B that the message is OK.
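The normal, no-attacker flow just described (A signs H(M) with PR_A; B decrypts with PU_A and compares with the hash of the received M) can be sketched in Python with textbook RSA. This is a toy under loudly stated assumptions: the primes are the publicly known Mersenne primes 2^521 - 1 and 2^607 - 1, and there is no padding, so it is insecure; real systems use a vetted library with a scheme such as RSA-PSS or Ed25519. `make_keypair`, `sign`, and `verify` are hypothetical names.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    # Textbook RSA key pair from primes p and q (toy: no padding, public primes).
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)     # modular inverse of e, requires Python 3.8+
    return (e, n), (d, n)   # (public key PU, private key PR)

def hash_int(message: bytes) -> int:
    # H(M) as an integer; 256 bits, far smaller than n.
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(private_key, message: bytes) -> int:
    # S = E(PR_A, H(M)): "encrypt" the hash with the private key.
    d, n = private_key
    return pow(hash_int(message), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    # D(PU_A, S) should equal the hash of the received message.
    e, n = public_key
    return pow(signature, e, n) == hash_int(message)

# Toy primes: the Mersenne primes 2^521 - 1 and 2^607 - 1 (publicly known, so insecure).
PU_A, PR_A = make_keypair(2**521 - 1, 2**607 - 1)

message = b"I, A, agree to this contract"
packet = (message, sign(PR_A, message))  # M || S
print(verify(PU_A, *packet))             # True
```

Only the 256-bit hash is exponentiated, regardless of how long the message is; the hashing itself is the only step whose cost grows with message size.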
Why not avoid the hash function and sign the entire message instead of its hash? If we encrypt the message itself with A's private key, we get the same effect, but with a significant performance penalty. Say the message is one gigabyte. If we hash that one-gigabyte message, we get a short, fixed-length hash value; for example, SHA-256 produces a 256-bit output. With a 256-bit hash value, the signature is just the encryption of that 256-bit value: it's quick to encrypt a small input, and only a small amount needs to be sent across the network. Without the hash function, A would need to encrypt the entire one-gigabyte message, which takes much more time. And we'd need to send the message plus the entire message encrypted again, a large overhead. We could get away with that, but encrypting a large value is much slower than encrypting a small one, especially with public key cryptography. So using the hash is a significant benefit, and from now on, when we talk about a digital signature, that's how we'll define it. Now let's consider what an attacker can do. A signs the message; let's write the signature as S = E(PR_A, H(M)), and A sends M || S each time. The malicious user intercepts, modifies something, and sends it on to B, intending to fool B into accepting the message. What can you do as the malicious user? Again, let's try modifying things, and we'll go through all the cases. First, modify just the message: the malicious user changes the message to M' and leaves the signature unchanged.
Again, say the message is one megabyte and the signature is 1,000 bits. All the malicious user does is change the first megabyte; the last 1,000 bits are copied from the original. So the signature is unchanged. This is sent on to B. What does B do? It decrypts to verify. Decrypt using which key? The message claims to come from A, so B uses PU_A to decrypt the signature. What's the output of the decryption? Note that the signature is computed over the hash of the message, S = E(PR_A, H(M)): conceptually we could sign the message itself, but in practice we always sign its hash. So the input was H(M) encrypted with A's private key, and decrypting with A's public key gives H(M). Now B compares that with the hash of the received message, H(M'). Do they match? Since M' is not the same as M, the properties of the hash function mean that these two different messages produce two different hash values. So we get an error: they're not the same. How could the malicious user defeat this and go undetected? If they could change M to M' such that H(M) does equal H(M'), B would not detect it. So a successful attack requires the malicious user to find some message M', different from M, that produces the same hash value. That's a hash collision, correct. If we can find such a collision, this scheme fails. Note that, because it leads to an important requirement: it should be hard for anyone to find hash collisions. We'll state the requirements formally later, but you're right: if the attacker can find some M' that differs from M but has the same hash, the signature is ineffective.
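To see why hash length matters, here is a brute-force sketch against a deliberately weakened, hypothetical `weak_hash` that keeps only the top 16 bits of SHA-256. Given M, the attacker tries variants M' of their choosing until the hashes match; with 16 bits this takes about 2^16 tries on average, while the same search against the full 256-bit output would need on the order of 2^256 tries. (Finding any colliding pair, where the attacker chooses both messages, is easier, roughly 2^(n/2) tries for an n-bit hash by the birthday paradox, which is one reason collision resistance calls for long hash values.)

```python
import hashlib

def weak_hash(message: bytes, bits: int = 16) -> int:
    # Hypothetical weak hash: only the top `bits` bits of SHA-256.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)

def find_matching_message(original: bytes, bits: int = 16, max_tries: int = 1 << 22):
    # Brute force: try attacker-chosen M' until weak_hash(M') == weak_hash(M).
    target = weak_hash(original, bits)
    for i in range(max_tries):
        candidate = b"Pay Eve 9999 dollars, ref " + str(i).encode()
        if candidate != original and weak_hash(candidate, bits) == target:
            return candidate, i + 1
    return None, max_tries

m = b"Pay Bob 10 dollars"
m_prime, tries = find_matching_message(m)
print(m_prime, tries)
```

A signature over `weak_hash(m)` would verify equally well for `m_prime`, which is exactly the failure described above; with SHA-256's full output the loop is hopeless.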
That is, if the malicious user could find such an M', with equal hash values, this digital signature scheme would fail: B would receive the message and think it's fine, unmodified and from A, when in fact it has been modified. So the security of this scheme depends on how hard it is to find hash collisions. For secure hash algorithms, the difficulty of finding collisions by brute force depends on the length of the hash, so you need a hash value long enough that finding a collision would take too much time. Hence SHA-256 and SHA-512, with hash lengths of 256 and 512 bits, are considered secure, in that it takes an attacker too long to find a collision. There are two ways to attack: a brute-force search for a collision, or finding a weakness in the algorithm. MD5 is considered weak: as we saw in the previous lecture, people have found MD5 collisions, so you shouldn't use MD5 here. But with a strong algorithm and a sufficiently long hash value, brute force is not feasible. Can the malicious user change the signature? They can change it, but will B detect it? Yes, because the signature uses A's private key. To change the signature so that it decrypts successfully when B uses A's public key, the malicious user needs A's private key, which they don't have. So, as with the other schemes, changing the signature component won't work either. Let's check. The malicious user receives M concatenated with A's signature. They change M to M' and try to change the signature to S', calculated by encrypting H(M') with what key? PR_A? The point is that they don't have PR_A.
We'd like to use PR_A, but it is private, known only to A. So the malicious user has to use some other value, such as their own private key. B receives M' and S' and verifies: decrypt S' with the public key of A. What was the received signature S'? It was the hash of M' encrypted with the private key of the malicious user, or at least with some key that is not the private key of A. What's the output of this decryption? Or maybe easier: what is it not equal to? It's not equal to the hash of M'. We started with the hash of M', encrypted it with the private key of the malicious user, and then decrypted with a public key that doesn't match, so we don't get the original plaintext back. The output is not the hash of M', so when B compares it with the hash of the received message M', they will not match. B detects the modification and again reports an error. Can someone create a message pretending to be A, that is, sign a message claiming to be A? What do you need to do? You need the private key of A. So if A doesn't send a message but the malicious user sends one pretending to be A, the same problem arises: we don't have the private key of A, so when someone tries to verify that message, it will be detected as an error.
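A minimal sketch of the forged-signature case, again assuming toy textbook RSA keys built from public Mersenne primes (toy_rsa_keypair, rsa_apply, PU_MAL and PR_MAL are hypothetical names for illustration). The malicious user signs the hash of M' with their own private key; decrypting that S' with A's public key does not give back the hash of M', so B's comparison fails.

```python
import hashlib


def toy_rsa_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair ((e, n), (d, n)). Toy only: the primes are public."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (e, n), (d, n)


def rsa_apply(x: int, key) -> int:
    """Encrypt or decrypt x with a key (k, n): computes x**k mod n."""
    k, n = key
    return pow(x, k, n)


def H(message: bytes) -> int:
    """SHA-256 as an integer; both toy moduli below exceed 2**256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


PU_A, PR_A = toy_rsa_keypair(2**521 - 1, 2**127 - 1)     # user A
PU_MAL, PR_MAL = toy_rsa_keypair(2**607 - 1, 2**89 - 1)  # malicious user

# The malicious user swaps in M' and signs H(M') with the only private key they have.
m_prime = b"Transfer $900 to the attacker"
s_prime = rsa_apply(H(m_prime), PR_MAL)

# B verifies using A's public key: decrypting S' does not give back H(M').
print(rsa_apply(s_prime, PU_A) == H(m_prime))    # False: B detects the forgery
print(rsa_apply(s_prime, PU_MAL) == H(m_prime))  # True: S' only matches the attacker's key
```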
So to defeat digital signatures, the malicious user needs to find hash collisions; if they can, they can defeat the scheme. That leads to a security requirement for digital signatures, and for hash algorithms in general: it should be computationally hard to find hash collisions. We've stated that at a couple of points along the way. Here we said that if the malicious user can find a different message such that the hash of the modified message equals the hash of the original message, they defeat the security of digital signatures. I can't find last week's notes right now, so you can look them up; let's go to the slides, that'll be easier. We've gone through digital signatures, and we'll say a bit about the algorithms shortly, but this leads us to some requirements on hash functions. We just saw that it should be hard to find hash collisions. Last week, from memory, we saw a case where it should be hard to go backwards, the one-way property: given the hash value, it should be hard to find the original message. That was used in scheme B of authentication, where we hash the message together with a secret. So we've seen two requirements on hash functions, hard to find collisions and hard to go backwards, which lead to the collision-resistance and one-way properties. We'll introduce them today, then look at the algorithms and finish. Let's define some terminology we'll need to understand the properties of hash algorithms. We say the hash value h is the output of taking the hash of some input message x, that is, h = H(x). Another way to say that is that x is a pre-image of h. So h is the hash value, x is the message, and x is a pre-image of h. The hash function is a many-to-one mapping: there are many possible inputs that map to one output. Because the hash function takes a variable-size input and produces a fixed, small output, we know that
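A small sketch of the many-to-one mapping, using Python's standard hashlib.sha256 as the example hash: inputs of very different sizes all produce a 32-byte (256-bit) value, and a simple pigeonhole count shows that some hash value must have more than one pre-image.

```python
import hashlib

# Inputs of very different sizes all map to a 256-bit (32-byte) SHA-256 value.
inputs = [b"", b"x", b"a short message", b"\x00" * 1_000_000]
for x in inputs:
    h = hashlib.sha256(x).digest()
    print(f"{len(x):>9} input bytes -> {len(h)} hash bytes")

# Pigeonhole: there are 2**(8*33) possible 33-byte messages but only 2**256
# possible hash values, so some hash value must have more than one pre-image.
print(2 ** (8 * 33) > 2**256)  # True
```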
there will be some inputs that map to the same output. There must be: some hash value h must have multiple pre-images. Another way to think of it is that multiple messages will map to the same hash value, so we will have collisions. We define a collision as occurring when two messages x and y are not the same but their hash values are, that is, H(x) = H(y) with x ≠ y. We call that a hash collision, or simply a collision. In theory our hash functions produce collisions, but as we're seeing in the security requirements, it should be hard to find them. So collisions are undesirable: in theory they exist, but in practice it should be hard for someone to find one. How many collisions are there? We can calculate that as an example; I think we may have done this before. Take a 1,000-bit input and a hash value that is 20 bits long. How many collisions are there on average? First, we need to think about how many possible messages there are. Let's say the message size is fixed in this case, so every message is 1,000 bits long: there are 2 to the power of 1,000 possible messages. And there are 2 to the power of 20 possible hash values, or possible outputs. Note that the hash function should produce random outputs, so we talk about how many messages map to the same hash value on average. With 2 to the power of 1,000 inputs and 2 to the power of 20 possible outputs, on average 2 to the power of 1,000 divided by 2 to the power of 20 messages will map to each hash value, so that many collisions will occur on each hash value. So there will be collisions. And remember, messages are normally variable-size inputs, so a message may be a gigabyte, while hash values are usually short, say 256 bits.
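The counting argument can be checked with a few lines of arithmetic. This sketch confirms that 2 to the power of 1,000 divided by 2 to the power of 20 is 2 to the power of 980, and then runs an assumed small-scale version: every 16-bit message is hashed to an 8-bit value by keeping only the first byte of SHA-256, so on average 2 to the power of (16 minus 8), that is 256, messages share each hash value.

```python
import hashlib
from collections import Counter

# The lecture's example: 1,000-bit messages, 20-bit hash values.
b, n = 1000, 20
avg_preimages = 2**b // 2**n
print(avg_preimages == 2 ** (b - n))  # True: on average 2**980 messages per hash value

# Toy check (assumed setup): all 2**16 two-byte messages, hash truncated to its first byte.
counts = Counter(hashlib.sha256(i.to_bytes(2, "big")).digest()[0] for i in range(2**16))
print(sum(counts.values()) / 2**8)                 # 256.0 messages per hash value on average
print(min(counts.values()), max(counts.values()))  # actual counts spread around that average
```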
The security of hash functions depends on it being hard for someone to find those collisions. Even though we know they are there, finding two messages that make sense and produce a collision is not as easy as it looks, even though there are many possible collisions. So on average, there'll be 2 to the power of (b minus n) messages per hash value, where b is the number of bits in the message and n is the number of bits in the hash. For our example, that is 2 to the power of (1,000 minus 20), which is 2 to the power of 980. So in theory, there are collisions. Let's look at the requirements for hash functions for cryptographic uses; we've mentioned them along the way. H can be applied to an input of any size: hash functions should be designed to work on any size input. The hash value that comes out has a fixed length, and for practical purposes we usually make it small. For performance, it should be easy to compute the hash function; it shouldn't take too long to calculate the hash, even for large messages. Even if it takes seconds or longer for a very large message, that's okay, but it shouldn't take years to calculate the hash value. Before the next three requirements, let's jump to the bottom one: we expect the hash function to produce random-looking hash values. Two slightly different inputs should produce two completely different, random-looking outputs. The remaining three are security requirements. We need them in different cases to make sure that when we use hash functions for security, they cannot be defeated. Let's just state them today. Any questions before we finish up? Any questions at the back? Let's look at these three security requirements to finish today, and we'll analyze them on Friday. First, the one-way property: if we know the hash value h, it should be computationally infeasible to find an input y such that H(y) = h.
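A quick sketch of the "random output" requirement, using hashlib.sha256 as the example hash: two inputs differing in a single character produce hashes that differ in many bit positions. The helper bit_difference is a hypothetical name for illustration; for a good hash the count is typically near half of the 256 bits, but the exact number depends on the inputs.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


h1 = hashlib.sha256(b"Pay B 100 dollars").digest()
h2 = hashlib.sha256(b"Pay B 101 dollars").digest()  # one character changed
print(h1.hex())
print(h2.hex())
print(bit_difference(h1, h2), "of 256 bits differ")  # typically somewhere near 128
```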
That is, it's easy to calculate the hash of y to get lowercase h, but it's hard to take the hash value and get back the original message. The hash function is easy in one direction but hard, or practically impossible, in the opposite direction. This is called the one-way property, or sometimes pre-image resistance. We'll talk about the meaning of these definitions later; so far we've called it the one-way property. Another property we need for security purposes: given some message x, it should be hard to find some other message y such that the hash of y equals the hash of x. It should be hard to find collisions, and this is referred to as the weak collision-resistance property. The hash algorithm should be resistant to collisions, but this is the weak form. The next property is strong collision resistance: it should be hard to find any two messages x and y whose hashes are the same. Maybe the hardest thing in understanding these three security properties is the difference between weak collision resistance and strong collision resistance. They seem the same, but they're slightly different, and there's an important difference, especially when measuring the security of hash algorithms. Weak collision resistance means that I give the attacker some message x, and their challenge is to find some other message y, a different message, which produces the same hash value as x. So the attacker must find a message that produces the same hash value as a given value. Strong collision resistance says the attacker can choose any two messages x and y, and tries to find any pair that produces a collision. If you're an attacker, which one's easier for you?
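A minimal sketch of attacking the one-way property by brute force, assuming a deliberately weakened toy hash that keeps only the first 16 bits of SHA-256 (truncated_hash and find_preimage are hypothetical names). Computing h is one call; going backwards means trying about 2 to the power of n candidates. Because the mapping is many-to-one, the input found is a pre-image of h but almost certainly not the original message.

```python
import hashlib


def truncated_hash(message: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of SHA-256: a deliberately weakened toy hash."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - bits)


def find_preimage(target: int, bits: int, max_tries: int = 2**24):
    """Brute force: try candidate inputs until one hashes to `target`.

    Expected work is about 2**bits hash computations.
    Returns (pre-image, tries), or None if max_tries is exhausted.
    """
    for i in range(max_tries):
        y = b"candidate-%d" % i
        if truncated_hash(y, bits) == target:
            return y, i + 1
    return None


BITS = 16  # hypothetical, very short hash length so the search finishes quickly
h = truncated_hash(b"the original message", BITS)
result = find_preimage(h, BITS)
print(result)  # some pre-image of h and the number of tries used
```

Each extra bit of hash length doubles the expected work, so the same loop against a full 256-bit hash is hopeless.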
Is it easier to be given a message and asked to find another message that produces the same hash value, or to have the freedom to find any two messages that produce the same hash value? The second one is easier for the attacker: it's easier when you have that freedom. From the attacker's perspective, finding any two messages that produce the same hash value (breaking strong collision resistance) is easier than the challenge "here's a message, go find some other message that produces the same hash value" (breaking weak collision resistance). And it turns out that if we need strong collision resistance, that is one of the limiting factors on the security of a hash algorithm, and we'll compare hash algorithms with respect to it. What have we missed? We skipped over this slide on digital signatures: when we encrypt with the private key and then decrypt with the public key, what algorithm do we use to encrypt and decrypt? RSA is a common one, but there are others: DSA, the Digital Signature Algorithm, and some variants, ElGamal, and a few others. RSA is very common, but the others are also used in some cases. So there are different public key algorithms used for digital signatures. We'll stop there. Have a think about those three security requirements, especially the difference between weak and strong collision resistance, and we'll come back to them on Friday and do a little analysis of their security.
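The difference between the two collision games can be seen by counting attempts. This sketch assumes the same kind of weakened 16-bit toy hash (truncated SHA-256; all function names are hypothetical). The weak-collision search must hit one fixed target, so it needs on the order of 2 to the power of n tries; the strong-collision search just remembers every hash it has seen and stops at the first repeat, which typically happens after roughly 2 to the power of (n/2) tries. This square-root effect is usually called the birthday attack.

```python
import hashlib


def truncated_hash(message: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of SHA-256: a deliberately weakened toy hash."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - bits)


def second_preimage(x: bytes, bits: int, max_tries: int = 2**24):
    """Weak-collision game: given x, find y != x with the same hash (about 2**bits tries)."""
    target = truncated_hash(x, bits)
    for i in range(max_tries):
        y = b"y-%d" % i
        if y != x and truncated_hash(y, bits) == target:
            return y, i + 1
    return None


def any_collision(bits: int):
    """Strong-collision game: find any x != y with equal hashes (about 2**(bits/2) tries)."""
    seen = {}  # hash value -> first message that produced it
    for i in range(2**bits + 1):  # pigeonhole: 2**bits + 1 messages must contain a collision
        m = b"m-%d" % i
        h = truncated_hash(m, bits)
        if h in seen:
            return seen[h], m, i + 1
        seen[h] = m
    raise RuntimeError("unreachable: pigeonhole guarantees a collision")


BITS = 16
given = b"given message x"
y, weak_tries = second_preimage(given, BITS)
x1, x2, strong_tries = any_collision(BITS)
print("weak (second pre-image) tries:", weak_tries)  # on the order of 2**16
print("strong (any pair) tries:", strong_tries)      # on the order of 2**8
```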