So we're going through three or four different examples of using hash functions for authentication, but they have different purposes. In this example, the purpose is not confidentiality, it's just authentication. So there's no encryption of the message. In the previous example, there was also encryption of the message. Sometimes we want to encrypt the message, sometimes we don't need to. In practice, if we don't need to encrypt the message, it's more efficient in terms of performance not to, because encrypting a message takes time and effort. So in this case, we encrypt only the hash value. In all of these mechanisms, the way to understand them is to look at what an attacker can do. How can they break it? So what we went through, and I'll try to draw it again, is two different approaches. In the first approach, the attacker intercepts the message. So this is sent across the network. The attacker intercepts it before it gets to B, and as we saw on the board, one option is that the attacker modifies the message. So instead of being M, it's M prime, a different message. They change the message, but they don't change the rest. That stays the same. And then they send all of this on to B. Then we look at what B does. Well, all B does is follow these steps. When they receive the message, they take a hash of that received message, and then they compare that calculated hash value with the received hash value, which is H of M. In fact, in this case, they must first decrypt the received value, and they'll get H of M. They'll compare that with the hash of M prime. And one of our desired properties of hash functions is that if we have two different input messages, we'll get two different hash values. So here, M is the input of the hash function at the sender, and at the receiver, M prime is the input of the same hash function. Two different inputs, so we'll get two different hash values.
When we compare, they will not be the same. And when they're not the same, the receiver assumes something's gone wrong. They don't trust what they receive. If they are the same, they assume it's okay, nothing has gone wrong. So there's no way to detect what has gone wrong; we are just detecting that something has gone wrong. That's sufficient to provide security. So the top is showing one simple approach for an attack: just change the message. It doesn't work, in that the receiver detects it. The bottom case is, well, can we do something else? Let's change the message and recalculate the hash of that message. So take M prime, calculate the hash. But we need to encrypt that with some key, because the receiver expects it encrypted with some key. Here, the attack fails in that the attacker doesn't know the secret key K. So they must use some other key. Let's say they randomly guess a key. They try and guess K. Of course, we assume that K is large enough that they cannot guess it in reasonable time. So it will be some other key here, K prime. This is sent to B, the receiver. B follows the same steps. The received message is M prime, so the hash calculated at this side is the hash of M prime. But here, this is what's received. B decrypts using K, but it was encrypted using K prime. When we decrypt using a different key, either we'll detect that something's gone wrong, because it doesn't successfully decrypt, or we get a different output. Because when you decrypt a ciphertext with the wrong key, the plaintext you get out will not be the same as the original plaintext. So in this case, the original plaintext was H of M prime. If we decrypt this with the wrong key, we'll get something different from H of M prime. We'll get the wrong plaintext, and therefore when we compare it to H of M prime, again they'll be different. So the receiver detects, because they're different, that something's gone wrong. Is there any other way that the attacker can try and defeat this security mechanism? What else can they try?
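To make those two attacks concrete, here is a rough sketch in Python of the mechanism where only the hash is encrypted. Everything named here is an assumption for illustration: SHA-256 stands in for H, and `toy_encrypt` is a made-up XOR keystream cipher standing in for E, chosen only so the example runs with the standard library. It is not a secure cipher, so this shows the data flow at sender and receiver, nothing more.

```python
import hashlib

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for E(K, .): XOR with a keystream derived from the key.
    Illustration only, NOT a secure cipher."""
    stream = hashlib.sha256(key).digest()          # 32-byte keystream
    return bytes(d ^ s for d, s in zip(data, stream))

toy_decrypt = toy_encrypt                          # XOR is its own inverse

def send(key: bytes, message: bytes):
    """A sends M together with E(K, H(M)); the message itself is not encrypted."""
    return message, toy_encrypt(key, hashlib.sha256(message).digest())

def verify(key: bytes, message: bytes, encrypted_hash: bytes) -> bool:
    """B hashes the received message and compares with the decrypted value."""
    return toy_decrypt(key, encrypted_hash) == hashlib.sha256(message).digest()

k = b"shared secret key K"
m, tag = send(k, b"pay Bob 10 dollars")
genuine = verify(k, m, tag)

# Attack 1: change M to M' but leave the encrypted hash alone.
attack1 = verify(k, b"pay Eve 10000 dollars", tag)

# Attack 2: change M, recompute the hash, encrypt with a guessed key K'.
m2, tag2 = send(b"guessed key K'", b"pay Eve 10000 dollars")
attack2 = verify(k, m2, tag2)

print(genuine, attack1, attack2)
```

With these inputs the genuine message is accepted and both forged attempts are rejected, matching the two cases on the board.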
So here it was: change M without changing the rest. Here: change M, recalculate the hash, but we don't have the key to encrypt correctly. Anything else? Encrypt with a different E? That will not help, because we'll get strange output here. And remember, the algorithms are known and agreed upon. If A uses AES with a 256-bit key, B will use AES with a 256-bit key. And the attacker knows they're using AES with a 256-bit key. The algorithms are publicly known, so we assume that. So even changing the algorithm will not help. It'll make things worse. Anything else? You're an attacker, how can you defeat this system? Well, I think we've covered the two main approaches. What else can we do? What else can we change such that it will not be detected at the receiver? I don't think there's anything in this case, unless you're lucky when you change the message. Okay? Let's say in this case we change the message, and the hash of M prime is the same as the hash of M. If the hashes of these two different messages, M and M prime, are the same, then it will go undetected. The attack will be successful. So that leads us to why we have one of these properties. If, using the top attack, we just change the message, and the hash of M prime equals the hash of M, then the receiver compares them, finds them the same, and assumes everything's okay. And that's a failure of this security mechanism: the receiver has accepted a message M prime which is different from what was sent. So that's why we require hash functions to produce different hash values for different inputs. If one produced the same hash value for different inputs, this mechanism would not work. Or, as a more practical requirement, it should be hard for the attacker to find another message M prime with the same hash value as M. So if you know some message M and you know its hash value, it should be hard to find another message with the same hash value.
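As a small illustration of why that requirement matters, the sketch below uses a deliberately weak hash: just the first 16 bits of SHA-256. That `weak_hash` function and the message strings are assumptions made up for this example, not anything from the slides. With only 16 bits of output, trying candidate messages until one matches takes on the order of tens of thousands of attempts, so the "lucky" case becomes easy.

```python
import hashlib
from itertools import count

def weak_hash(message: bytes) -> bytes:
    """Deliberately weak 16-bit hash: first 2 bytes of SHA-256 (illustration only)."""
    return hashlib.sha256(message).digest()[:2]

def find_second_message(original: bytes) -> bytes:
    """Try candidate messages until one has the same weak hash as the original."""
    target = weak_hash(original)
    for i in count():
        candidate = b"pay Eve 10000 dollars, ref %d" % i
        if candidate != original and weak_hash(candidate) == target:
            return candidate

m = b"pay Bob 10 dollars"
m_prime = find_second_message(m)

# B would compare these two values, find them equal, and accept M'.
print(m_prime, weak_hash(m) == weak_hash(m_prime))
```

With a full-length hash the same loop would, as far as anyone knows, not finish in any reasonable time, which is the practical form of the requirement.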
In theory it's possible; what we need is that, in practice, it is practically impossible. That is, it takes too much time to find. So we'll return to that when we look at our properties. The previous case, example A here that we went through first, is effectively the same, except we now encrypt the entire message. So again, if you look at what the attacker can do, you'll find that there's nothing they can do that will go undetected. If we want to modify the message, well, it's all encrypted. We cannot do that. In fact, in this case, we don't really need the hash function, because the encryption itself provides a form of authentication. If we encrypt something and we send the ciphertext and someone changes that ciphertext, then when the receiver decrypts, they'll get an error or get the wrong output, assuming they can detect that the output is wrong. That's where the hash function comes in useful. The key in this case provides the authentication that it came from the right user. If it successfully decrypts, then it must have come from someone who knows key K. And from B's perspective, the only other person in the world that knows key K is user A. Because by definition, K is private. It's shared between the two. No one else has K. There are other variations as well. We'll just go through two or three examples. This is a different one. Here we're going to use a hash function to provide authentication. No confidentiality. The idea is that both A and B have some shared secret S. A has the value S, like a secret key. So does B. No one else knows S. We're not going to use S for encrypting anything though. As you see in here, there's no E or D. There's no encryption. What we do is we take our message. A takes that secret value S and concatenates it with the message. And we take a hash of that. So the hash of the message and the secret. Combine that with the original message and send it across the network.
What B receives is the message concatenated with a hash of the message and the secret. And what B wants to do is confirm this message came from A. That's the objective of this one. Make sure the message came from A. So what they do is they take the received message combined with their known secret S, take a hash and compare with the received hash value. If they match, B assumes this came from A. If they don't match, assume something's gone wrong. Try and defeat this. You're the attacker. Show me how to defeat this. Try it. Look and try and find the different things the attacker can do such that when B receives a message, it thinks the message came from A, but it came from someone else. Think about what can possibly go wrong. So the way to think about it is try and draw on here on your own notes and think, what if the attacker changed this value or changed this other value? What would happen at receiver? Would it be detected or not? Try and work through it for a few minutes. Discuss with your friends and see if you can defeat this. Remember, the objective here is for the receiver to confirm that the message came from A, that we cannot perform a masquerade attack. Pretend to be someone else. Anyone have a way to attack this? There's one theoretical way that we can defeat this. In this case, we're not trying to modify the message as an attacker. We're trying to, let's say, send a message and B should think it came from A, but in fact it came from the attacker, if we can draw some notes. So in this case, the attacker, if they generate a new message and send it to B, what can they do? So the attacker generates a new message and send to B. What can they send such that B thinks it came from A? Well, let's just make note. What is sent here? It's the message M concatenated. So we take M and we concatenate here with a hash of the message concatenated with a secret. That's what's sent normally. 
The hash function is applied on the secret and the message, and we also send the message. What can the attacker do? Anyone want to attempt? So let's say the attacker wants to send a message and pretend to be A. They take some message, any message. It doesn't have to be the same as M; call it M prime. So what they would do is concatenate that with the hash of M prime concatenated with what? S prime, let's say, because we don't know S. So if that's what the attacker sends, the receiver follows the procedure: take the message received, M prime, and combine it with their value of S. So here we would have M prime combined with the secret S, known by B and shared with A, and take the hash. So here we would have the hash of M prime and S. And we compare with the received hash, which would be the hash of M prime and S prime. That is what's received by B. So B takes the received message, M prime, combines it with their value of S, concatenates, takes a hash, and now compares these two hash values. Are they the same? No, because S is different. Here we have S, there we have S prime. The attacker needs to know S for this attack to be successful. If they don't know S, then when B receives the message, they will compare, and since B uses its own value of S, their secret, they will know that it came from someone else; it didn't come from A. So that's how this one works. How can we defeat it? What does the attacker need to do to defeat this? Find S. How do we find S? We cannot guess. Assume S is a 200-bit value. Guessing is a brute force attack, which won't work. So how do we find S? Try and find S, given the public information here. How could the attacker find S? Again? Remember, this is what was originally sent. If the attacker intercepts that, they know M. That's fine. M is not supposed to be private. They know M. Let's assume that they know the length of the hash value, because hash functions have a fixed-length output.
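Here is a minimal sketch of this shared-secret mechanism, assuming SHA-256 as the hash function (so the fixed hash length is 32 bytes) and a made-up 25-byte, 200-bit secret. The function names `send` and `receive` are invented for the example. Because the hash length is fixed, the receiver, and equally an attacker, can split what arrives into the message and the hash value.

```python
import hashlib

HASH_LEN = hashlib.sha256().digest_size   # fixed output length: 32 bytes

def send(message: bytes, secret: bytes) -> bytes:
    """A sends M || H(M || S). No encryption anywhere."""
    return message + hashlib.sha256(message + secret).digest()

def receive(data: bytes, secret: bytes) -> bool:
    """B splits off the fixed-length hash, recomputes H(M || S), and compares."""
    message, received_hash = data[:-HASH_LEN], data[-HASH_LEN:]
    return hashlib.sha256(message + secret).digest() == received_hash

s = bytes(range(25))                      # shared secret S (200 bits), known to A and B
genuine = receive(send(b"hello from A", s), s)

# Masquerade attempt: the attacker does not know S, so uses some S' instead.
s_prime = b"attacker's guess at S"
forged = receive(send(b"hello from 'A'", s_prime), s)

print(genuine, forged)
```

The genuine message passes B's check and the forged one fails, because B recomputes the hash with its own S rather than the attacker's S prime.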
So if the hash function is known, the length of the hash value is known. So the attacker knows this value. So the hash value is 128 bits, and the message is the rest, let's say 10,000 bytes. So they want to find S. What do they do? Try every value of S that takes you 10,000 years. I'll wait for you to finish. Anyone else want to do it a bit faster? It's not a... We'll assume that brute force attacks will not work. So on a key or on a secret, we just say it's large enough that trying every possible value is not a feasible approach. What else could we try, though? Interrogate. Again, we assume that if A has S and B have S, they've somehow secretly exchanged them. They don't have to send them across the network. So let's say, in a private letter, they exchange the value of S. So we assume that secret is secret, and there's no way to find that through other means. We cannot interrogate someone and ask them for value of S. Again? Inverse H. The attacker knows H of M concatenated with S. If we could calculate the inverse, what could we do? Yeah, if we could calculate the inverse, we'd get MS. That is, the attacker knows H, the hash of M concatenated with S. If we could calculate the inverse, then we would find M concatenated with S. And then it's easy to find S. Because if you know M, and we do, and if you know M concatenated with S, then S is the leftover. Okay, and that leads to our requirement. But we cannot calculate the inverse. We've said one of our desired properties of the hash function is that it's one way. We cannot calculate the inverse. And here's an example of why or when we'd require that property. So if we could calculate the inverse, let's see how it works, and then come back to our requirements. So this is known. This is known by the attacker. So what we do is we take the inverse, and let's say this value here is lowercase H. So that's uppercase M, the message, and lowercase H, the hash value is known. 
So what the attacker does is take the inverse hash function of lowercase h, and what does that return? M concatenated with S. So if we could calculate the inverse, we'd get M concatenated with S. Therefore, it's easy to find S. Because if M is 10,000 bytes, M concatenated with S is 10,000 bytes plus the bits of S, 200 bits with the secret we assumed. Those trailing bits are S. There's no encryption going on there. It's just joining two values together. So it's easy to find S if we can calculate the inverse. And once the attacker knows S, they can pretend to be someone else. It's no longer secret and no longer secure. So this is an example, or a demonstration, that if we want this mechanism to work for authentication, we need a hash function where calculating the inverse is impossible, or practically impossible. So our hash functions have different requirements, depending upon their use. In this case, we require the one-way property to hold: the inverse must be hard to calculate. And we saw in the previous example that we required the property that two different messages should not lead to the same hash value. If that is true, then the previous mechanism works. If it isn't true, the attacker can break the previous mechanism. So that demonstrates those properties. Any questions on that one before we move on? So this is only broken if the inverse hash can be calculated. And in practice, for most hash functions, it cannot be calculated. It takes too much time. This next one is similar. We will not go through it in any detail. It's the same as the previous one, but we then also encrypt the message. So we use the secret in the same way, but we also encrypt the message. So we can combine them. There are other combinations as well. Why would you choose one over the other? Well, there are different advantages; in particular, avoiding encrypting the message is good for performance.
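Going back to the shared-secret mechanism for a moment, the sketch below shows what the attacker gains once S is known. Real hash functions are not invertible, so this example does not invert anything; instead it deliberately breaks the other assumption and uses a tiny 2-byte secret, so that trying every value finishes quickly. SHA-256 and all the names here are assumptions for illustration only.

```python
import hashlib
from itertools import product

def tag(message: bytes, secret: bytes) -> bytes:
    """H(M || S), with SHA-256 assumed as the hash function."""
    return hashlib.sha256(message + secret).digest()

def recover_secret(message: bytes, observed: bytes, secret_len: int):
    """Try every secret of the given length until H(M || S) matches what was seen."""
    for candidate in product(range(256), repeat=secret_len):
        guess = bytes(candidate)
        if tag(message, guess) == observed:
            return guess
    return None

secret = b"\x5a\xc3"                       # deliberately tiny: only 2**16 possibilities
m = b"hello from A"
intercepted = tag(m, secret)               # attacker sees M and H(M || S) on the network

recovered = recover_secret(m, intercepted, 2)

# With S recovered, the attacker can produce a hash value B will accept.
forged_message = b"hello from the attacker"
accepted = tag(forged_message, recovered) == tag(forged_message, secret)

print(recovered == secret, accepted)
```

With a 200-bit secret the same loop is hopeless, which is why the only remaining route to S would be inverting the hash, and why the one-way property is required.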
So if you want to provide authentication and you need to encrypt the entire message, that takes time. If we can provide authentication without encrypting the entire message, that can lead to better performance. So sometimes we don't want to encrypt the message. Either we encrypt a hash of the message, which is smaller, or we don't use encryption at all, like we saw in the previous one. So that's the comment here, that sometimes we'd like to do authentication without having to do encryption. Because encryption takes time. It requires resources. It can be slow in software. If you need hardware to do it, then there's an extra expense. In some cases, even encryption hardware is not very efficient if we have a small amount of data. It's only efficient for large amounts of data. And of course, some encryption algorithms have patents. You need to pay a license to use them. So there are costs involved with doing encryption. Therefore sometimes we want to avoid that. So the previous four used hash algorithms to provide some form of authentication. In a later topic, the next topic, we'll see an alternative form, which is called message authentication codes. We will not talk about that now. That's in the next topic. But there's a relationship with the hash function. In the previous cases, when we did use encryption, we used symmetric key encryption. Here we'll use encryption, but public key encryption. And we'll arrive at a digital signature. In the first one we're going to provide a signature. And in the second one we're going to provide a signature and also a secret, or confidential, message. So the second diagram here also provides a signature, like in the first diagram, but we also encrypt the message. So let's just go through the first one. Here we're trying to send a message from A to B. And when B receives the message, we want to confirm that it came from A. We don't care if someone else sees the message.
And we'd like to be able to prove that it came from A. It's like a signature: when you sign a document with a handwritten signature, that's some form of proof that it came from you, that you recognize that document or that text. And later it can be proved that you agreed to that document. Because in a year's time, when someone sees that piece of paper with your signature on it, that's proof that you agreed to it. We want a similar thing with a digital message. Let's see how it works. We take our message and we want to sign this message. To sign it, we take a hash of the message, and the main reason to take a hash of the message here is performance. We take a hash of the message and then we encrypt it with a public key algorithm, using the private key of the source. So this is the source, user A. We encrypt using, for example, RSA and the private key of user A. So we encrypt the hash value of that message. We concatenate the message with this encrypted hash value and send them across the network. The encrypted hash value is referred to as the signature, the digital signature. So in fact, we have a message and a corresponding signature. We send them. The receiver receives them and uses the public key to verify that signature. So we take the received message, calculate a hash of the received message, and compare it with the received signature, which we decrypt using the public key of A. If they match, then everything's okay. If they don't match, then we don't trust that signed document. First, before we go and look at what an attacker can do: here we're using the private key of A to encrypt the hash and then the public key of A to decrypt. We say that using the private key provides a signature of a document, of a file, of a message. What if we used a symmetric key algorithm here? Instead of RSA, use Triple DES for example, and use secret key K here and secret key K here as well, with a symmetric key algorithm. What's the difference?
It's the same, except here is K and here is K, and a symmetric key algorithm is used. Okay, you're correct. If we use a symmetric key algorithm with the same approach, take a hash of the message, encrypt with secret key K, decrypt with secret key K, then since they both have the key K, it will work. Except, how many people have the key K? Two. Two people have the key K. Now, this is a problem, because B has this document, and later we want to prove to someone else that this message came from A. Can we do that? The message could have been encrypted by B, because B also has key K. So with symmetric key cryptography, two users in the world have key K, and hence the message could have come from either A or B. There's no way to prove that the message came from A, because both A and B have the key. So maybe B is a malicious user. They create this signed document using the key K. There's no way for anyone else to prove that that signed document was from A, because it could also have been from B. Whereas with a public and private key pair, if we encrypt with a private key, only one user in the world has this private key: user A. So now if we have this signed message, and we decrypt with the public key and it successfully decrypts, it confirms that the signed message originated at A. That's proof that the message came from A and no one else in the world. And that's what a signature is. Assuming that your signature cannot be forged, your signature on a document means that this document is signed by you; no one else in the world can sign it. With symmetric key cryptography it possibly came from two users; with public key cryptography it must have come from just one user. So that's the difference between using symmetric and, in this case, public key cryptography. So a digital signature is provided just by using public key cryptography. We cannot use symmetric key cryptography.
Of course, the other problem with using symmetric key cryptography arises if someone else, a third party, wants to confirm. Here's a signed message. User C wants to confirm that it came from A. With public key cryptography they just need the public key of A. That's public. But if we used symmetric key cryptography, they would need the secret key K, and even then they could only confirm that it came from either A or B. So here, by having a public key, anyone can confirm that this message is from A. And that's what a digital signature is: confirmation that some message comes from one particular user. Any questions about that concept of a digital signature? It's important. Signatures are used in a lot of real network communications. So, importantly, a signature is when we encrypt using the private key of the source. This is user A, so we encrypt with A's private key. A signs the message. In fact, we don't need the hash function. We could encrypt the entire message. But for practical reasons we encrypt the hash value, which is much smaller than the message. If the message is 10 megabytes and the hash value is 128 bits, encrypting the hash value is much faster than encrypting the entire message. So a signature is the hash of a message encrypted with the private key of the source. And it can be verified by anyone that has the public key of that source. Try to attack that scheme. Think about what an attacker can do to make someone think that the message came from A, that it was signed by A, when in fact it's a different message, for example. Change M, see what happens. Or see what's required to change M. If you change M, then you're right, you need to determine the private key. So one way is to determine the private key. Anything else? There's another approach. Try and work out what an attacker can do such that B has a message and thinks that message has been signed by A.
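For those who want to experiment with the attacks, here is a rough sketch of signing and verifying, assuming SHA-256 for the hash and "textbook" RSA with no padding. The primes are well-known Mersenne primes picked only so the example runs instantly; the helper names are invented for this sketch, and nothing here is how a real signature library works or is safe to use in practice.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from two primes. Toy sizes, no padding, NOT secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (needs Python 3.8+)
    return (e, n), (d, n)                  # (public key, private key)

def digest_as_int(message: bytes, n: int) -> int:
    """H(M) as an integer, reduced so it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message: bytes) -> int:
    """Signature = hash of the message 'encrypted' with the private key."""
    d, n = private_key
    return pow(digest_as_int(message, n), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare with H(received M)."""
    e, n = public_key
    return pow(signature, e, n) == digest_as_int(message, n)

pub_a, priv_a = make_keypair(2**127 - 1, 2**61 - 1)   # user A
pub_c, priv_c = make_keypair(2**89 - 1, 2**31 - 1)    # attacker C's own key pair

m = b"I agree to pay 10 dollars"
sig = sign(priv_a, m)
genuine = verify(pub_a, m, sig)

# Attack 1: change M to M', keep A's signature.
m_prime = b"I agree to pay 10000 dollars"
tampered = verify(pub_a, m_prime, sig)

# Attack 2: sign M' with C's private key; B still verifies with A's public key.
wrong_key = verify(pub_a, m_prime, sign(priv_c, m_prime))

print(genuine, tampered, wrong_key)
```

Anyone holding `pub_a` can run `verify`, which is the point made above about a third party confirming the signature without any secret key.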
See what the attacker can do. I think you'll see quickly that if you know the private key of A, you can break it. But we assume that the private key of A is known only to A; no one else knows it. What else can we do? Try and work it out. See what an attacker needs to do to get something undetected past the receiver. I'll try and come up with a diagram. Anyone have an attack that you can perform? Or what's required for an attack to be successful? First let's see what happens if we try some simple things. What if we just change M? So the attacker intercepts; what happens? What was originally sent was M concatenated with the signature of M. So let's just change M to be M prime, and let's not change anything else, in a simple attack. So concatenate M prime with the previous value that was sent, which was the hash of M encrypted using the private key of A. So here's M prime. What the attacker has done is take the received value sent by user A and just change the message. I'll just write down what was sent by A. So that's what was originally sent by A; the attacker takes that and just modifies the message. They don't need to change the signature here.
Let's say the message, again as an example, is 10,000 bytes and the signature is 128 bits, because our hash function produces a 128-bit output, and when we encrypt that we get 128 bits of ciphertext. So we can say we have 10,000 bytes of message and 128 bits of signature sent by A. All the attacker does is take the first 10,000 bytes and change them to whatever message they like, M prime, then take the last 128 bits and concatenate that with their new message M prime. Then they send. What happens at the receiver B? B takes the hash of the received message, so we get here the hash of M prime. Then they take the received signature, which is E, the encryption using the private key of A, of the hash of M. They decrypt using the public key of A, so the output here is the hash of M. And they compare, and this one is the hash of M prime. Are they the same? No, assuming our property of our hash function is true, in that here we have two different messages, M and M prime, and when we take the hash of them we should get two different hash values. Since they differ, when we compare, B detects that something's gone wrong and therefore doesn't trust the message. If they were the same hash value, then B would assume that the message is signed by A and would trust the message. But in this case they should be different, assuming our hash function has this property that two different messages give two different hash values. So that attack of just changing the message doesn't work. It doesn't fool the receiver. What else can we do? It's similar to what we've seen in the other mechanisms. What else can we do as the attacker? Again, okay, if we receive this, we modify the message, recalculate the hash value, H of M prime, and then encrypt with some private key. But what private key? We don't have A's private key. It's private. If we encrypt with a different private key, the private key of C, and then send, you'll see something similar: the receiver will take M prime and get the hash of M prime, and then they will take the signature, which was encrypted with the private key of C. And when
they think it came from A and decrypt with the public key of A, something goes wrong: we'll get the wrong output here. So if the attacker tries to recalculate the hash and use a different private key, because B uses the public key of A, again we'll detect it, because we're using different keys for encrypt and decrypt. Anything else we can do? So we see that the attacks, and we've seen them in the previous examples as well, are unsuccessful. And what's a requirement on our hash function in this case? I think we've seen it before. This demonstrates the requirement that the hash values of two different messages need to be different. If they weren't, then we could defeat this signature. More precisely, if the attacker can choose a message M prime which has the same hash value as the hash of M, then they can defeat this security mechanism. So if the attacker can find M prime where the hash of M prime is the same as the hash of M, then the signature mechanism does not work. And that's one of the important requirements of hash functions. It's not just that we have different hash values. In fact, in theory some messages will have the same hash value. In practice, what we need is that it is practically impossible for the attacker to find another message with the same hash value as the existing message. Given M, or given the hash of M, it should be practically impossible to find another message M prime which has the same hash value. And that leads to some more details about our properties. We will not go through this simple hash function today. We will go through that another time. Let's return to our properties, but in a more formal manner. So far we have demonstrated two properties. The hashes of two messages should produce different hash values. And the other property is this one-way property: if you know the hash value, it should be practically impossible to find the original message. We have seen examples of how, if we don't
have those properties, the security mechanisms will fail. Let's look at them in more detail, and we introduce some new terminology: pre-images and collisions. For some hash value lowercase h, which is the hash function applied to x, x is called the pre-image of h. Some more terminology: we have said that the function H is a many-to-one mapping. In the way we always use hash functions, our input message can be larger than the size of the output hash, so we will always have multiple input messages that map to the same hash value, which is against our requirement for security. Let's see if we can draw that. This many-to-one mapping is simple to see with a first example. Let's say we have a hash function that produces a 2-bit hash value, so h is 2 bits in length, and our hash function, let's say, takes 4-bit messages as input. In fact, a hash function should be able to take a variable-length input, 4 bits, 3 bits, 5 bits, whatever, but let's consider this example: our hash function takes a 4-bit message and produces a 2-bit hash value. How many possible messages are there as input? 16, 2 to the power of 4. You can list them all, from 0000 all the way up to 1111. So there are 16 possible messages, say m1, m2, m3, up to m16 in this case, with a 4-bit message. How many possible hash values? 4. So our hash function H takes a message and produces a hash value as output, and the hash values are h1, h2, h3, h4. In this simple example, will we have two different input messages that produce the same hash value, yes or no? Yes, we must in this case, because we've got 16 inputs and 4 possible outputs. Some of these inputs must map to the same output. What is the mapping? Well, that's what the hash function determines. But we know for sure that we cannot have unique outputs for all the possible inputs. For example, this message with our hash function may map to this hash value, m2 maybe to a different value, m3 to another value. Some of the messages must map to the
same hash value. On average, how many messages map to one hash value? 4, because here we have 16 inputs and 4 outputs, so on average, assuming it's a random mapping, 4 messages map to the same hash value. We've just gone through mechanisms where we said a requirement for our security is that we do not map to the same hash value. But here in this simple case, if the message is larger than the hash length, then we will map to the same hash value. Using our terminology, we say for example in this case that m2 is a pre-image of h1. So in the slide, x is a pre-image of the hash value, and we have a many-to-one mapping: many messages map to one hash value. In this example, on average 4 messages map to one hash value. When multiple messages map to the same hash value we call it a collision, a collision of those messages. We have a collision. We don't want those in practice. So this is the definition: a collision occurs if the two messages are different but the hash values are the same. Collisions are undesirable. We've seen from our security mechanisms that if we have collisions, our signature and so on will not work. So what do we do about that? Well, how many pre-images do we have? In this case we saw 4 on average. In general, if H takes a b-bit input, so the input is b bits, in our case 4 bits, there are 2 to the b possible messages; in our case it was 2 to the power of 4, or 16 possible messages. And for an n-bit hash, in our case a 2-bit hash, where b is greater than n, which is normally the case because in practice we need a small hash value and we need to support many possible messages, usually variable-length messages, so the size of the message is larger than the length of the hash, there are 2 to the n possible hash codes or hash values. And on average, if we have a random distribution, there are 2 to the b minus n pre-images for each hash value. That is, 2 to the b minus n messages map to the same hash value. If m is 5 bits, how many messages? 32 messages on input. And h is still 2 bits, so still
we have 4 possible hash values. How many collisions? 32 map to 4, so on average 8 messages map to each hash value, and that is 2 to the power of 5 minus 2, equals 2 to the 3, equals 8. In general, 2 to the power of b minus n. We would like that value to be small. We will see some other requirements shortly. So we have a problem: in theory we will have collisions if the message is longer than the hash value. A short hash value is useful in practice, but for security we don't want collisions.
So let's look at some more specific requirements of our hash functions: cryptographic hash functions, hash functions used for security, for cryptography. For practice, we need a function H that takes any length input. The message can be 10,000 bytes or a megabyte; we would like our hash function to work on any length input. And we would like a fixed-size output, and usually a small output, because a small output reduces the overhead when we send it across the network. So for performance reasons, usually a fixed-size, small output. So the hash value is small, the message any size. The hash value is n bits and the message is b bits, as in the previous slide. It needs to be easy to compute the hash of some message: the hash of x should be easy to compute, so fast. And then we have these properties that we saw already are needed for security. We will define them in a bit more detail.
The last one, and we will go through it quickly: the hash function should produce a random hash value. So the mapping from the message to the hash value should be a random mapping, or a pseudo-random mapping. If you take the hash of 0000 and get 00, and then the next value and get the same 00, so the first four map to 00, the next four map to 01, the third four to 10, and the last four messages to 11, that would not be a random mapping. There will be some structure in that mapping, and that's a problem as well, because we can more easily predict what the hash value or what the message
would have been. So generally we require the hash function to produce random-looking hash values as output.
Let's look at these three properties. We will just introduce them today and cover them in more detail next week. The pre-image resistant property is also called the one-way property: two different names that mean the same thing. And that's what we've seen already, this requirement that if we know the hash value, lowercase h, it should be computationally infeasible, or practically impossible, to find a message y that hashes to h. So the inverse operation should be hard. If you know some value x, it should be easy to calculate the hash and get the hash value. But if you know the hash value, it should be hard to find the original message. That's called the one-way property: our hash function should work one way, easy one way, impossible the other way. Also called pre-image resistant.
The next two properties are about collisions, and again you will see different names. It's a bit confusing. I prefer to use the ones in brackets here: one-way property, weak and strong collision resistance. Weak collision resistance and strong collision resistance are about not being able to find collisions. The first one says, if you have some message x, it should be practically impossible to find some other, different message y which has the same hash value as x. So you give me message x, I can find the hash of x, that's easy. But it should be practically impossible for me to find some other message y that also has the same hash value as x. In this case, it's not a good example, but message m3, 0010, maps to 10. Given that I know the message m3, I know the hash value 10. If my hash function has this weak collision resistant property, it means it should be practically impossible for me to find some other message that produces the hash value 10. In this case it's easy for me to do it because
the number of messages is quite small and the hash value is small. But in general it should be hard for me to find some other message that produces the same hash value, a collision. We need this for our security mechanisms to work. Our signature relies on this property. If our hash function did not have this property, if I could find another message y with the same hash value as x, then I could defeat the security mechanisms that we went through in the previous slides.
The last property is an extension of that. It's called strong collision resistance: it should be computationally infeasible to find any pair of messages x and y which have the same hash value. And this is the confusing part, distinguishing between these two properties. With the first one, weak collision resistance, the attacker has some message x and needs to find some other message y with the same hash value. With the second one, strong collision resistance, the attacker has the opportunity to choose any two messages x and y that have the same hash value. Which one is easier for the attacker to break, weak or strong collision resistance? From the attacker's perspective, which one do you think is easier? Strong collision resistance? Why? The names get confusing, so think not from the name but from what the attacker needs to do. Which one do you think is easier for the attacker? Hands up for weak. Hands up for strong. Strong collision resistance is easier for the attacker to break. The attacker's objective is to find a collision. With weak collision resistance, the attacker is given some x, and what they need to do is find some other message with the same hash value as x. With strong collision resistance, the attacker gets to choose any two messages that produce a collision, and that's easier because they've got more opportunity to find any combination of messages that produce a collision. There are messages that produce collisions, and the attacker just needs to find any pair that produce a
collision. With weak collision resistance, they need to find one message that produces a collision with another, given message. So it's easier for the attacker to defeat this property, strong collision resistance.
What we'll see next week is that hash functions, depending upon where they are used, may need a selection of these three properties. Some don't need any of the properties. Some hash functions need the one-way property: we've seen an example, when we took the hash of the message concatenated with a secret, we required the one-way property. Some need the weak collision resistant property, and others need the strong collision resistant property, depending upon the scenario where they are used. There are some more examples and an explanation of strong versus weak collision resistance, looking at the birthday problem is one example, but there's a lot of hard thinking to understand that, so let's stop there and finish with one example about something else, just quickly.
First, before I show an example, just a packet capture with Wireshark, just a quick one: any questions about what we've covered? These properties we will cover next week; we'll go through some more discussion. For those four, actually there were six, diagrams showing the authentication mechanisms, it's useful if you can think about them from the attacker's perspective. These types of diagrams show an authentication mechanism. To see how they work, one way to analyse them is to think as the attacker: what can I do that will defeat the receiver? That's a way to see how secure they are and also to understand the properties of hash functions. You generally don't need to remember these mechanisms. For example, in an exam I'll give you this and then ask, what can the attacker do to defeat this mechanism?
Quick example to finish for today, an example of a hash function. Although it's in the later slides, some hash functions, the names of them: MD5 is one, SHA is another, the Secure Hash Algorithm. At the end of the
slides there's SHA-1 and there's SHA-2. Let's try them. What I'm going to do is, I've got a file on my computer. It's the lecture notes, the PDF of the slides that we've been presenting. It's 616,357 bytes long, that file. One place where you see hash functions applied is to check the integrity of files that you receive. Sometimes if you download a file, you may see on the website the hash, the MD5 hash of that file. It's a way to check that the file you received matches the file on the server.
Let's just quickly demonstrate that. On the command line in Linux there are some programs to calculate the hash of a file, not of the file name but of the contents of that file. One of them is md5sum followed by the file name. What this program does is use the MD5 hash algorithm, so that's a specific hash function, take the contents of this file, so the message is the 616,000 bytes, and produce in this case a 128-bit output. It's shown in hexadecimal, this d5210 up to 00c. There are 32 hexadecimal digits, each hexadecimal digit is 4 bits, so in binary this is 128 bits. So MD5 produces a 128-bit output hash value and takes a variable-size input. In this case the input was my file of 616 kilobytes.
Let's change the file. Let's just look at the contents of that file. The -l option in xxd shows me the first 50 bytes of the file. So this is the first 50 bytes of this PDF file in hexadecimal, and this is the ASCII, PDF, this is just the PDF version, and if we look further we'd see the PDF contents. Let's just change one bit in the file. You've used this in one of your homeworks: we can use sed to replace a value. I'll replace the value 50 in hexadecimal with 51. That is, I'm going to take this file where there's a hexadecimal 50, which is in fact the second byte, replace it with hexadecimal 51, and output, let's say, to a new file, new.pdf. And let's look at the start of that new file, this new.pdf file, the one I just created. It's the original file except this byte is different: here it's 50 in the original file, it's 51
in the new file. So same size, in fact just one bit is different. Just check: same size, so both files are now 616,357 bytes. Let's even rename the file, and just note, actually, we won't rename it. Let's do it now, we're out of time. I'll calculate the MD5 hash value of the original file, and we get this d5210 hash value. And now calculate, using the same algorithm, the hash of the new file, which differs by one bit. What am I going to get? A different hash value, a completely different hash value, and a random-looking hash value. So this is a simple demonstration that two different inputs, which differ by just one bit, produce two different hash values.
If I copy that file, so now I have three files, one is a copy of the other, and of course if we calculate, what am I going to get? Different or the same? What's the first letter going to be in the hexadecimal hash of this? Okay, MD5 is applied to the contents of the file, not the file name; that's got nothing to do with it. So all this is showing is that we're using a hash function, in this case MD5, there are others. This program takes the contents of the file and calculates the hash value with MD5. It's a 128-bit hash value, or 32 hexadecimal digits, d52 up to 00c. Modifying that file by just one bit produces a completely different hash value, 4e19 and so on. Same contents, different file name, of course we get the same hash value.
Okay, so we'll see in later examples, maybe you will see and use SHA-1, SHA-2, MD5. You've probably seen MD5 in a number of places on the internet, password files and so on. You will have a homework where you will need to use and understand hash functions and see how they are used in signatures. Enough for today. Next week we'll come back and look at these properties for collision resistance.
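The command-line demonstration can also be sketched in Python. This is only a rough sketch, not what was run in the lecture: it assumes Python's standard `hashlib` module, and the byte string `original` is a hypothetical stand-in for the PDF (only the `%PDF` header, whose second byte is hexadecimal 50, is taken from the demonstration). The helper names `md5_hex` and `bits_different` are made up for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of data as 32 hexadecimal digits (128 bits)."""
    return hashlib.md5(data).hexdigest()


def bits_different(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical stand-in for the lecture PDF; only the "%PDF" header is real.
original = b"%PDF-1.4 stand-in contents for the lecture slides"

# Replace the second byte, 0x50 ("P"), with 0x51 ("Q"): a one-bit change.
changed = bytearray(original)
changed[1] = 0x51
modified = bytes(changed)

# Same contents, as if the file had been copied under another name.
copy = bytes(original)

print("original:", md5_hex(original))
print("modified:", md5_hex(modified))
print("copy:    ", md5_hex(copy))
print("bits changed in the input:", bits_different(original, modified))
```

The hash is computed over the contents only, so the copy gives the same 32 hexadecimal digits as the original, while the one-bit change gives an unrelated-looking value of the same length.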