So we're looking at authentication: confirming that we know who we're communicating with, and making sure no one changes the data along the way. When you communicate across a network, there are many points between you and the destination where someone in the middle could change what you send, so we need to be sure that nothing has changed along the way. Similarly, there are many opportunities for someone to send you something while pretending to be someone else, so we need to be sure that whoever is sending us something is who they say they are. That's authentication. We started by saying that symmetric key encryption can provide authentication, and then we moved on to message authentication codes, MACs. We went through this example usage of a MAC. When you want to send a message to user B, you calculate the tag for that message. The top diagram uses some confusing notation, which I copied from the textbook: think of this C as the MAC function. If you replace C with MAC, it's clearer. We apply the MAC function to the message and a shared secret key, and we get what we'll call a tag. The output is sometimes itself called a MAC, but to be clearer we'll call it a tag. We send both the message and the tag to the destination, and the receiver uses the tag to verify that the message they received is authentic. They've received a message and a tag, and they use that information to gain confidence that the message hasn't been modified and hasn't come from someone else. And they always go through the same verification step whenever they receive a message and a tag.
In this diagram the message is the white rectangle, and the gray rectangle attached to it is the tag, that is, the MAC of the message M computed with key K. The receiver takes the received message and recalculates the MAC, using the MAC function and the key they share with whoever they think sent the message. They get a calculated tag, and they compare it with the received tag. The two should be the same. We need a MAC function such that if they are the same, we know the message hasn't been modified and that it came from someone who has key K. If they're not the same, then either the message has been modified or it came from someone else. We'll return to the properties the MAC function needs in a moment. What do the other two diagrams show? The first diagram sends the message and the tag in the clear. The message is not encrypted, so anyone can read it. If we don't care about that, that's fine, but sometimes we also want confidentiality for the message, and the next two diagrams show two different ways to also encrypt. So the top figure gives authentication only; the next two give both authentication and confidentiality. I won't go into the details; there are some subtle differences between them. Both use a MAC function (C in each diagram), and both encrypt with a symmetric key cipher (E in each diagram). And they use two different keys: the key for the MAC function and the key for encryption are different and independent. These are just examples showing that we can also encrypt our information. What matters is understanding the MAC function and why its properties are important. Where did we get to last time? I'll write down the properties, because I don't think they're clearly shown on the slide.
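A minimal sketch of the sender and receiver steps, assuming HMAC-SHA256 from Python's standard library as the MAC function (the lecture hasn't chosen a specific MAC at this point; HMAC comes up later). The function names `make_tag` and `verify` are hypothetical, and the key and message are made up.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: T = MAC(K, M), using HMAC-SHA256 as the MAC function."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Receiver side: recompute the tag from the received message and compare."""
    calculated_tag = make_tag(key, message)
    # compare_digest compares in constant time, so timing doesn't leak matching bytes
    return hmac.compare_digest(calculated_tag, received_tag)

k_ab = b"shared secret between A and B"
m = b"Increase Steve's salary by 10,000 baht"
t = make_tag(k_ab, m)        # A sends (m, t) to B
print(verify(k_ab, m, t))    # True: same message, same key
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not reveal how many leading bytes of the tag matched.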
What we had last time was the set of properties we require of our MAC function. One property is that if we apply the MAC function to two different messages, we must get two different tags. We saw why that was needed on Tuesday. Similarly, applying the MAC with two different keys must give two different tags. We wrote these down as a pair of conditions: different messages with the same key give different tags, and the same message with different keys gives different tags. We need a function for which those properties hold. Now, how do we calculate the MAC? We take the key and a message, and we get a small, fixed-size block of data called the tag. Note that the tag is small, say 128 bits, and fixed: a particular function always produces the same number of bits in the output tag. But we need to be able to authenticate any message. Whether the message is a short "hello" or a 5 gigabyte DVD image, we should be able to use the same MAC function on input messages of different lengths. So the message may be any length, while the tag is short and fixed, and shorter than the message at least. What does that tell us about the properties we require? Look at the first property: with two different input messages, we'll always get two different tags. Sorry? Different lengths, yes. In practice the tag is small and fixed length, say 128 bits, and the message can be any size. So I should be able to take my MAC function and apply it to a large message, or to a message of any size, really.
One gigabyte, one byte, one terabyte. So is this first property possible? It says that two different messages give two different tags, which means every different message always gets a different tag. That's not possible: if messages can be longer than the tag, some messages must produce the same tag. So we have a conflict. In practice the message M can be any size; as an example, say it's 1000 bits long, and for simplicity say the tag T is 100 bits long. Then we need a MAC function that, using some key, takes a 1000-bit message and produces a 100-bit tag. The property says two different messages should give two different tags, so every unique message would need its own unique tag. But that's impossible. How many messages are there? If each message is 1000 bits, there are 2 to the power of 1000 possible messages. The number of possible tags is 2 to the power of 100. The MAC function is a mapping: it takes one message as input and produces one tag as output, mapping those 2 to the power of 1000 messages onto 2 to the power of 100 tags. There are far fewer tags than messages, so some messages must map to the same tag. But our property said two different messages should produce different tags. So in practice we relax it: although in theory two messages may produce the same tag, what we need for security is that it is practically impossible for an attacker to find two messages that produce the same tag. It must be hard to find them. And the way you achieve that is to make sure there are many possible tags.
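The two desired properties and the counting argument can be written compactly. The "per tag" figure is an average, which assumes tags are evenly distributed (an assumption the lecture makes explicit later).

```latex
\begin{aligned}
M_1 \neq M_2 &\;\Rightarrow\; \mathrm{MAC}(K, M_1) \neq \mathrm{MAC}(K, M_2) \\
K_1 \neq K_2 &\;\Rightarrow\; \mathrm{MAC}(K_1, M) \neq \mathrm{MAC}(K_2, M) \\[4pt]
\#\text{messages} = 2^{1000} \;&>\; \#\text{tags} = 2^{100}
  \;\Rightarrow\; \frac{2^{1000}}{2^{100}} = 2^{900} \text{ messages per tag on average}
\end{aligned}
```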
2 to the power of 100 is probably enough in this case. So even though many messages map to each tag, it's very unlikely that an attacker can find two meaningful messages that produce the same tag, and that's what the attacker needs. We'll see this again with hash functions; it's the same concept, since a hash function also takes an input and produces a fixed-length output. So for a MAC function, our desired property is that two different messages produce two different tags. In theory we know that's not the case, because there are more messages than tags. But a secure MAC function is one for which, in practice, it is very hard, practically impossible, to find two messages that produce the same tag, and that is sufficient for security. So the attacker's goal is to find two messages that produce the same tag; if they can, they can defeat the authentication. Any questions on that concept? If you have more messages than tags, then some messages must produce the same tag. When we consider message authentication with MACs, and later with hash functions, we normally assume that finding such messages is impossible for the attacker, and the way to make it impossible is to make the tag long enough. It turns out that how hard it is for the attacker to find messages that produce the same tag depends on the tag length. On Tuesday we went through several approaches an attacker could use to try to defeat the authentication. In one, they changed the tag: it didn't work. In another, they changed the message but not the tag. Then we had a third attack, masquerading as someone else. Let's do a fourth attack, where the attacker modifies the message and tries to make B think it has not been modified. That's the fourth and, I think, final attack in this case. A sends a message to B, but it's intercepted by our attacker, C.
A calculates the tag of their message and sends the message combined with the tag, but it's intercepted. C gets a copy before it reaches B, modifies it, and then sends it on to B. In the previous lecture we covered the cases where they changed only the message, which didn't work, and changed only the tag, which didn't work either. Now let's change both the message and the tag. C changes the message from "increase Steve's salary by 10,000 baht" to "increase it by 1 million baht", giving a new message M' different from M. Then C attaches a tag calculated from that new message and sends the new message and the new tag to B. C, the malicious user, modified the message, so they need to recalculate the tag. What key are they going to use? A key they share with B; okay, good. Note that they cannot use key KAB, because they don't know it. We always assume that a shared secret key is known only by the parties who share it. So C cannot use KAB here; they must use another key. Let's call this other key KCB. C calculates the tag T', attaches it to the message, and sends it to B. What does B do? B verifies. To verify, B calculates the MAC of the received message M' with which key? KAB. Remember, C is modifying the message and trying to make B think it came from A. So B receives the message and thinks, "ah, it's from A": the from address says A. Therefore B uses KAB to authenticate it. If you think it's from A, you use the key you share with A. B gets a calculated tag; let's call it T double prime. B compares it to the received tag. Are they the same? No. Why? T prime was calculated with the MAC function on M' with KCB. T double prime is calculated on the same message but with a different key. And we stated in our properties that the same message with two different keys produces two different tags.
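A sketch of this fourth attack, again assuming HMAC-SHA256 as the MAC function. The key values are made up; `k_cb` stands for the key KCB that C uses, and `k_ab` for KAB, which C does not know. B verifies M' with KAB and compares the result against C's tag.

```python
import hashlib
import hmac

def mac(key: bytes, message: bytes) -> bytes:
    # Stand-in MAC function: HMAC-SHA256 (an assumption for this sketch)
    return hmac.new(key, message, hashlib.sha256).digest()

k_ab = b"key shared by A and B"     # C does not know this key
k_cb = b"key C shares with B"       # hypothetical key available to C

m = b"Increase Steve's salary by 10,000 baht"
t = mac(k_ab, m)                    # A sends (M, T); C intercepts it

m_prime = b"Increase Steve's salary by 1,000,000 baht"
t_prime = mac(k_cb, m_prime)        # C forges T' = MAC(K_CB, M')

# B believes the message is from A, so B verifies with K_AB
t_double_prime = mac(k_ab, m_prime) # T'' = MAC(K_AB, M')
accepted = hmac.compare_digest(t_double_prime, t_prime)
print("B accepts" if accepted else "B rejects: T'' != T'")
```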
So when B compares the tags, they see they're not the same and realise something has gone wrong: the calculated value is not equal to the received value, so B doesn't trust the message. So that's a fourth attack we can attempt, and we see that it doesn't work. That's why the authentication works. Of course, it depends on that property: two different keys produce two different tags. Suppose instead we had a MAC function where C could find a key that, used with M', produces the same tag as using KAB. If they could find such a key, they could trick user B into thinking the message is from A. That's why we require that an attacker can't find a different key that produces the same tag. Similarly, as we saw in the other attacks, the attacker must not be able to find a different message that produces the same tag. If they could, they could defeat the authentication system. Any questions about this, or about the other attacks from last lecture? A message that produces the same tag? Well, it depends on whether the tag produced by the malicious user is the same as the tag that B calculates. If it is, and the message or even the key was different, then the attack has been successful. But how can you produce a tag that matches? The properties of our MAC function mean we can only produce it using the same key and the same message, because a different key produces a different tag, and the same key with a different message also produces a different tag. So it's about making sure the attacker can't find a message and/or key that produces the same tag. Other questions? If we know the algorithm of the MAC function, can we find the key first? Yes, we know the algorithm: in all of our cryptography we normally assume that the algorithm, here the MAC function, is known to the attacker. Same with encryption: the attacker knows the cipher.
The next question is: if we know that, can we find the key? That leads to the next requirement. We must make sure that, given the function, the attacker can't find the key. If the attacker can find the key, they defeat the system, because it's no longer a shared secret. So how would they find the key? We need a function such that, given a message and its tag, you cannot work backwards to find the key. Let me repeat: even if the attacker knows the function, a message, and the corresponding tag, it must be hard for them to find the key. That's another property of our MAC function: even if the attacker knows the algorithm, a message, and a tag, it must be practically impossible to work backwards and get the key. That's the same requirement as for encryption. With encryption, we encrypt a message using some key to get ciphertext. If the attacker knows the ciphertext, the algorithm, and the plaintext message, it must still be practically impossible for them to find the key. Many MAC functions use the same concepts and actually reuse normal encryption ciphers, such as DES and others, so they follow the same principles. What's one way to find the key? What's the dumb way? Brute force, okay. Try all the keys. We can always try that.
If I know a message and the corresponding tag, which is easy to get because someone has just sent a message and a tag across the network, and I know the function, then I can try all the keys and work out which key produced that tag. If there are few enough keys, I'll quickly find the answer. Let's try a brute force. I have a very basic MAC function I've implemented for this demo. It takes a 3-bit key and an 8-bit message, and it produces a 4-bit tag. The first parameter is the key, the second is the message, and the output is the tag. We don't care how it works at the moment, as long as it has our properties. Assuming it has the properties we just listed, a brute force attack tries all keys to find the key someone has used. Say you've observed a message and its corresponding tag: someone previously sent this message and this tag across the network, and you, the attacker, intercepted a copy. You want to find the key, because if you find the key those two users share, it's no longer secret and you can defeat the authentication. So given this information and this MAC function, how do you find the key? I won't tell you the function; it's not important. Use the dumb way: try all the keys. We take the first key and apply the MAC to this message, and we get a tag. If it doesn't match the observed tag, that's not the key, so we try the next key, and the next, and so on. I've got a program that does this brute force for us: it applies our MAC function to the message using every possible key. How many keys? Eight in this case: it's a 3-bit key to keep things simple, so we just try all eight keys.
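The lecture's demo MAC function isn't shown, so this sketch substitutes a hypothetical `toy_mac` with the same sizes (3-bit key, 8-bit message, 4-bit tag), built by truncating SHA-256 so that the tags look random. Keys are numbered 0 to 7 here. It tries all eight keys and, because more than one key may reproduce a 4-bit tag, filters the candidates with a second observed message-tag pair.

```python
import hashlib

def toy_mac(key: int, message: int) -> int:
    """Hypothetical toy MAC: 3-bit key, 8-bit message, 4-bit tag.
    Truncated SHA-256 is used only so the tags look random."""
    digest = hashlib.sha256(bytes([key, message])).digest()
    return digest[0] >> 4               # keep the top 4 bits as the tag

def brute_force_keys(message: int, tag: int, key_bits: int = 3) -> list[int]:
    """Try every key and return all keys that reproduce the observed tag."""
    return [k for k in range(2 ** key_bits) if toy_mac(k, message) == tag]

# The victims' secret key: unknown to the attacker, used only to create the pairs
secret_key = 5
m1 = 0b10110011
t1 = toy_mac(secret_key, m1)            # first intercepted (message, tag) pair
candidates = brute_force_keys(m1, t1)

# Several keys may give the same 4-bit tag: confirm with a second intercepted pair
m2 = 0b01011100
t2 = toy_mac(secret_key, m2)
candidates = [k for k in candidates if toy_mac(k, m2) == t2]
print("candidate keys:", candidates)
```

With a 3-bit key the loop runs 8 times; with a 128-bit key the same loop would need up to 2 to the power of 128 iterations.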
It takes the MAC of the message with key one and gets this tag, then key two and this tag, and so on through all eight keys. What's the key? The tag we know for this message is 1101, so we look in this list for 1101. Here it is, so we've found our key. That's a brute force attack against the MAC function. How do you stop someone doing a successful brute force in practice? Make the key longer. In our case the key is 3 bits, so there are 8 keys to try. Make the key 128 bits, and there are 2 to the power of 128 keys to try, which will take forever. Same as with encryption: make the key large enough and a brute force is not feasible. Note that this attack may find the tag for more than one key. It didn't happen here, but two different keys could produce the same tag for that message. If so, we need to test the candidate keys on another message-tag pair, similar to what we did in the meet-in-the-middle attack, where we tried a second pair just to confirm. A brute force attack on the key normally takes about 2 to the power of k attempts, where k is the length of the key. So one attack on a MAC is a brute force on the key, with effort approximately 2 to the power of k. If you make k large enough, say 128 bits, the brute force won't succeed. But even if k is large, there's another potential attack. It targets the tag, and it leads to a requirement on the tag length. Suppose the attacker can get a user who knows the key to calculate tags for messages. For example, I'm the attacker and you have the key: I send you a message, you calculate the tag for that message, and you tell me the tag. I don't know the key you've used, but I know that for this message, with your key, you get this tag.
And I can send you another message, you use the same key to produce a new tag, and I learn that tag. So suppose some feature lets the attacker get the user to calculate tags for a chosen set of messages, some sort of service. Say it's a website: I submit a message and the website returns the tag. If that's possible, we can do another attack, and I've got such a service here. As the attacker, I know a tag but I don't know the key, and I want to find a message that produces that tag. If I can find one, I can use it to pretend to be someone else, or to modify a message without it being detected at the receiver. So that's useful for the attacker. What I do, without knowing the key, is submit a message to the user, and the user tells me the tag for that message. If it matches the target tag, I've found a message that produces it. If it doesn't, I submit another message and get its tag, and I keep going until I get a message that produces the tag I want. How many times do I need to do that? How many messages would I need to submit before I get this 4-bit tag? On average, about 2 to the power of 4 should be sufficient, because there are 16 possible tags, 2 to the power of 4. Maybe we haven't mentioned this, but we assume that messages produce evenly distributed tags: they don't all produce the same tag. So if I submit the first message, I get some 4-bit tag. If that's not the one, I submit another and get another 4-bit tag.
Since the tags are random, there's no guarantee that 16 messages will cover all 16 tags, but on average it takes about 16 attempts to hit one particular tag, here 0111, and that gives me the message. So our MAC function should be such that messages produce random-looking tags: the set of messages that produce any given tag should be about the same size for every tag, and the tags should be randomly distributed. How do we ensure that? In the same way as with encryption ciphers like DES. How does DES produce random-looking output? Through its design: the S-boxes, the XORs, the permutations. We design the algorithm so that the output looks random, and one characteristic of randomness is an even distribution. So it's the design of the MAC function that makes the tags randomly distributed. We haven't looked at specific MAC functions yet; I haven't shown you one. We'll see that many of them use DES, AES and similar ciphers, but there are others as well. Other MAC functions use hash functions, yes. One way to build a MAC is to use a hash function and somehow combine it with a key, correct. Everyone knows hash functions; you told me on Tuesday that you do. You take some input, calculate the hash, and the hash value is a short value. What you want from a hash function is this even distribution of hash values: two different messages produce different hash values. Those are similar properties to what we have here, and hash functions are the next topic. So the attack here is to submit messages until we get a tag that matches the target one, and on average we'd need about as many attempts as there are tags. I've got a program that does this: it submits 16 messages to this service and prints each message and its tag.
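A sketch of this chosen-message attack, assuming a hypothetical tagging service built by `make_oracle` that returns a 4-bit tag (HMAC-SHA256 truncated to its top 4 bits) without ever revealing its key. The attacker submits distinct messages until one comes back with the target tag. With 16 evenly distributed tags this takes about 16 submissions on average, though any single run may take more or fewer.

```python
import hashlib
import hmac
import secrets

TAG_BITS = 4  # toy tag length, as in the demo

def make_oracle(key: bytes):
    """Hypothetical tagging service: returns MAC(K, M) truncated to TAG_BITS,
    without revealing K to the caller."""
    def oracle(message: bytes) -> int:
        digest = hmac.new(key, message, hashlib.sha256).digest()
        return digest[0] >> (8 - TAG_BITS)
    return oracle

def find_message_for_tag(oracle, target_tag: int, max_tries: int = 10_000):
    """Submit distinct messages until one is tagged with target_tag."""
    for attempt in range(1, max_tries + 1):
        candidate = attempt.to_bytes(4, "big")
        if oracle(candidate) == target_tag:
            return candidate, attempt
    return None, max_tries

oracle = make_oracle(secrets.token_bytes(16))  # the attacker never sees this key
target = 0b0111
msg, attempts = find_message_for_tag(oracle, target)
print(f"found {msg!r} after {attempts} submissions")
```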
Were we successful? Yes, here and here; we may get more than one. What does this tell us? We still don't know the key, but with the key the user has, this message, 10111011, produces the target tag. In this case there are in fact several such messages, though that's not always the case. So we've found a message that produces this tag: working backwards, given the tag, find a message. Again, if we can do that, we can defeat the authentication. How do we make it hard for the attacker? Make the tag long. Here the tag is 4 bits, so on average it takes about 16 attempts. Make the tag 128 bits, and it would take far too long to find a message. That's the second part here. One attack is to try to find the key; another is, given a tag (a MAC value), to find a message. The effort depends on the length of the key, k, in the first case, and on the length of the tag, n, in the second. That's how we normally measure the security of MAC functions. So if your key is 50 bits and your tag is 60 bits, and you're the attacker, what do you attack? The key, because there are fewer attempts: just 2 to the power of 50. Attack the smaller one. That's why we say the effort to break a MAC is the minimum of 2 to the power of k, where k is the key length, and 2 to the power of n, where n is the tag length. As long as we make both the key and the tag long enough, a brute force on either will be too hard. I think those are the main points we want to cover about MACs at this stage. Any questions? With authentication, you need to start thinking like this: here's a scheme, understand it, and then ask what an attacker could do to try to defeat it.
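The brute-force effort comparison as a formula, with k the key length and n the tag length, applied to the 50-bit key / 60-bit tag example and to the toy demo (3-bit key, 4-bit tag):

```latex
\begin{aligned}
\text{effort to break a MAC} &\approx \min\left(2^{k},\, 2^{n}\right) \\
k = 50,\ n = 60 &:\quad \min\left(2^{50},\, 2^{60}\right) = 2^{50} \\
k = 3,\ n = 4 &:\quad \min\left(2^{3},\, 2^{4}\right) = 8
\end{aligned}
```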
In this case we've gone through the different attacks they might try: change the message, pretend to be A, change the tag, change both the message and the tag. And we saw that in all cases, if our properties hold, those attacks are detected. There are many different MAC algorithms. We won't go through any in detail; I'll just quickly mention a few. There are some attacks specific to particular algorithms, but generally, if you choose a good, well-known algorithm that is considered secure, the best way to break it is brute force. So you just need to make sure the tag and the key are long enough, and the MAC is considered secure. Which MAC algorithms? Some are based on ciphers. There was a popular one based on DES, which is now considered insecure. Others are based on triple DES and AES, and they work: CMAC, OMAC, PMAC, UMAC, VMAC and others. They use existing symmetric key ciphers, modified to produce a fixed-length MAC from an input of any length. Another popular one today is HMAC, which uses a hash function: it takes a key and a message and feeds them, in a particular way, into a hash function. It's very popular in network security today. We'll come back to it after hash functions, which are our next topic. Again, same as Tuesday: who knows hash functions? Hands up if you don't remember hash functions. Where might you have covered them? In an earlier subject, maybe data structures and algorithms? I think you've used them. What did you use them for? Storing data and operating on it, for example in databases and in arrays, so that you can look items up efficiently. You take the data and calculate some sort of index from it, the hash of that data.
We're using the same concepts, though possibly different functions, and we'll use them mainly for authentication. The functions differ, but they have similar properties to the ones you've used in data structures, and we often call them cryptographic hash functions. We'll look at our requirements for hash functions, and then, given such functions, how we can do authentication in a similar way to using a MAC. So it's an alternative to using a MAC, and it leads especially to digital signatures. A hash function H takes a variable-length block of data M, an arbitrary-size message, and returns a fixed-size, usually small, hash value. If M is a one megabyte file, our hash might return, say, a 128-bit hash value. What's an example of a hash function? Give me a name. Sorry? Yes, okay; give me one you often see on the internet or in applications. A three-letter acronym. If you've downloaded software, especially source code, you may have seen some extra information with the download. No one's ever downloaded anything useful off the internet, like some open-source software? What about MD5? You may see an MD5 value because, alongside the software, the website also gives a hash value of that software. When you download it, you can use that hash value to confirm, to authenticate, that nothing has been modified between the website and your computer, or that no one has put fake software on the website. So MD5 may be one you've seen? No? As with most things in cryptography, we deal with binary values, but we often represent them in hexadecimal. So you may have seen hexadecimal values on a website saying download this file, and then you can check that the file hasn't been modified. MD5 is one you may see.
SHA is another one. Our requirement is that M may be any size, so there are many possible inputs, and the hash function should produce evenly distributed, random-looking outputs. The idea is that you hash one message and get a hash value, lowercase h, as output; you hash a slightly different message and get a completely different hash value. That's what this means: the hash values are random-looking, not structured in any way, and evenly distributed, so just because two inputs are similar doesn't mean their hash values are similar. Let's see an example with two messages: one stored in the file plaintext.txt and the second in p2.txt. What's the difference between them? It's hard to see, and a bit confusing because there's no newline character at the end. The first message ends with "Goodbye." with a full stop; the second ends with "Goodbye" with no full stop. Just one byte different. Let's calculate the hash of those two files. There's common software that takes the input message, the contents of a file, and returns its hash, and md5sum is one of them. It calculates the MD5 hash of the contents of the file: MD5 is the algorithm, md5sum is just the name of the software. There's the hash: 911d2ac5..., in hexadecimal. MD5 produces a 128-bit hash value, so that's 32 hexadecimal digits. Now we run md5sum on the second message, which differs by just one byte, and we get a completely different hash value. Both look random, and inputs that differ by a small amount don't give hash values that differ by a small amount. That's the even distribution of the outputs. We modified the message by one byte, but even if you modify it by a single bit you'll see the same effect. So two different inputs produce two different outputs.
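A sketch of the md5sum experiment using Python's `hashlib`. The lecture's plaintext.txt and p2.txt aren't reproduced, so the two messages below are made up; like the files, they differ only by the final full stop. `hashlib.md5` over a file's bytes gives the same hex digest that md5sum prints. MD5 is used only because the lecture does; it is no longer considered collision-resistant.

```python
import hashlib

# Hypothetical stand-ins for the two files: identical except for the final "."
m1 = b"Hello. This is a test message. Goodbye."
m2 = b"Hello. This is a test message. Goodbye"

h1 = hashlib.md5(m1).hexdigest()   # 128-bit hash as 32 hex digits
h2 = hashlib.md5(m2).hexdigest()
print(h1)
print(h2)

# Count how many of the 128 output bits differ between the two hash values
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print(f"{diff_bits} of 128 bits differ")
```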
Can two different messages produce the same hash value? Again, in theory, yes; in practice, such messages should be hard to find. Same principle as before: the message can be much longer than the hash value. If the hash value is 128 bits and messages can be longer than 128 bits, then, since the hash function maps each message to a hash value and there are more possible messages than hash values, some messages must map to the same hash value. So in theory two messages can produce the same hash value, and in practice we require that it's hard to find such messages. That leads to one of our general requirements for a cryptographic hash function: two different messages should not produce the same hash value. As stated on the slide, it's computationally infeasible, practically impossible, to find two different messages that produce the same hash value. Two messages that produce the same hash value form what we call a collision, so we refer to this as the collision-free property (also often called collision resistance). We need a function for which it's practically impossible for anyone to find two messages with the same hash value. Yes, a question? Scratching your head? The other property is that it should be practically impossible to go backwards. As you saw, it was easy to use this software to take a message and produce its hash value. But if I give you just the hash value, it should be hard to find the message. That's the one-way property: easy to calculate the hash of a message, practically impossible to find a message given a hash value. So those are the two general properties we require of a hash function to use it in cryptography for authentication. Later we'll come back to these properties and look at them in more depth. Any questions about the two of them now?
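A sketch showing why collisions must exist but should be hard to find. It uses SHA-256 cut down to 16 bits (a deliberately weak, hypothetical `truncated_hash`), which has only 2 to the power of 16 possible values, so hashing distinct messages is guaranteed to hit a collision within 2^16 + 1 tries and in practice finds one quickly. With a full-length output, this same search is what the collision-free property says must be infeasible.

```python
import hashlib

def truncated_hash(message: bytes, bits: int = 16) -> int:
    """Hypothetical weak hash: the top `bits` bits of SHA-256."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash distinct messages until two of them share a hash value."""
    seen = {}                        # hash value -> first message that produced it
    counter = 0
    while True:
        message = f"message {counter}".encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

m1, m2, tries = find_collision()
print(m1, m2, f"collide after {tries} hashes")
```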
So for now, let's assume we have them: hard to go backwards, and hard to find messages with the same hash value. If we have a function with those properties, how can we achieve authentication? As with a MAC, we're going to use a hash function to determine whether data has changed, for message authentication. But hash functions are not just used for message authentication. We'll see them used in the more specific case of digital signatures, in storing passwords, in antivirus and intrusion detection systems to try to detect that something has changed or is different, and in random number generators as well. So hash functions are used not just in authentication but in other aspects of security. We can visualize it like this. We take a message of L bits as input, and often, for convenience, we attach the length at the end. So if my message is 1000 bits, at the end I attach the value 1000, and then we calculate the hash of that and produce a short, fixed-length hash value. If our properties hold, how can we use hash functions for authentication? Similar to how we use MACs; same concepts. We want to verify the integrity of the message: we receive a message and check that the data hasn't been modified and that the sender is who they say they are. The hash of a message is sometimes referred to as a message digest, so that's another name for the hash value. We'll go through a set of examples of using hash functions. Here's a first example, combining a hash function with symmetric key cryptography. The aim in this example is to perform authentication and keep the message confidential. A wants to send a message to B. A takes the message, calculates the hash of the message, and combines the message and the hash value together.
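Attaching the length happens inside standard hash functions, so you never do it yourself with hashlib. As an illustration only, here's a sketch assuming SHA-256's padding rule: append a single 1 bit, then zeros, then the original length in bits as a 64-bit big-endian number, so the total is a multiple of 512 bits. (MD5 pads the same way but writes the length little-endian.) The helper `pad_with_length` is a hypothetical name for this sketch.

```python
import hashlib

def pad_with_length(message: bytes) -> bytes:
    """SHA-256-style padding: a 1 bit, zero bits, then the 64-bit message length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"                     # a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero-fill to 56 bytes mod 64
    padded += bit_len.to_bytes(8, "big")           # original length in bits
    return padded

msg = bytes(125)  # a 1000-bit message (125 zero bytes), as in the lecture's example
block = pad_with_length(msg)
print(len(block), block[-8:].hex())      # 192 00000000000003e8
print(hashlib.sha256(msg).hexdigest())   # fixed-length output, whatever the input size
```

The last eight bytes carry the value 1000 (0x3e8), and the padded input is three 512-bit blocks.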
A then encrypts with a symmetric key cipher, using the secret key k shared with B. So what gets sent across the network is the message concatenated with the hash of the message, all encrypted with the secret key k. We send that to B, and B first decrypts and then verifies. So we do the opposite steps: decrypt the ciphertext with k to get the original plaintext, and then, similar to our MAC, verify. We have a message and a hash value. We calculate the hash of that message and compare the calculated value with the received hash value. If they match, we assume everything's okay; if they don't match, we assume something's gone wrong. This is the same concept we saw with the MAC, where we do this verification step. The difference is that with the MAC we calculate the MAC of the message with some key, whereas with a hash there's no key as input; it's just the message. We'll go through several schemes and then come back and look at attacks on some of them.

This first one encrypts the entire message. Sometimes we don't need to do that. Say the message is again a 10-gigabyte file: encrypting it takes time. If we don't care whether someone sees the message and just want to provide authentication, an alternative is to encrypt just the hash value. Calculate the hash of the message, encrypt the hash value, attach the encrypted hash value to the message, send the message in the clear plus that encrypted hash value, and then receiver B verifies. For this one, let's do an attack and see what the attacker can do. Any suggestions? What can you do as an attacker if this scheme is used? You can intercept the message and try to modify things. What could you try to do? The best way to think of it is: try to modify something and see what happens. Will it be detected at the receiver? Spend five minutes attacking this scheme.
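Both schemes can be sketched in Python. The standard library has no block cipher, so, as a loudly labeled assumption, this sketch uses a hypothetical toy cipher (`toy_encrypt`/`toy_decrypt`): a four-round Feistel network with HMAC-SHA256 as its round function. It stands in for E only to make the example runnable; a real system would use a vetted cipher such as AES from a proper library. SHA-256 stands in for H.

```python
import hashlib
import hmac
import os

ROUNDS = 4  # toy 4-round Feistel network; NOT a substitute for a real cipher

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round_fn(key: bytes, i: int, data: bytes, out_len: int) -> bytes:
    """Round function: HMAC-SHA256 over (round, counter, data), stretched to out_len."""
    out, ctr = b"", 0
    while len(out) < out_len:
        out += hmac.new(key, bytes([i]) + ctr.to_bytes(4, "big") + data,
                        hashlib.sha256).digest()
        ctr += 1
    return out[:out_len]

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical stand-in for E(K, .)."""
    half = len(plaintext) // 2
    left, right = plaintext[:half], plaintext[half:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round_fn(key, i, right, len(left)))
    return left + right

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Inverse of toy_encrypt: undo the rounds in reverse order."""
    half = len(ciphertext) // 2
    left, right = ciphertext[:half], ciphertext[half:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_fn(key, i, left, len(right))), left
    return left + right

HASH_LEN = 32  # SHA-256 digest size in bytes

def H(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()

# Scheme 1: A sends E(K, M || H(M)), giving confidentiality and authentication.
def send_scheme1(key: bytes, m: bytes) -> bytes:
    return toy_encrypt(key, m + H(m))

def verify_scheme1(key: bytes, ciphertext: bytes):
    plain = toy_decrypt(key, ciphertext)                  # decrypt first...
    m, received_h = plain[:-HASH_LEN], plain[-HASH_LEN:]
    return hmac.compare_digest(H(m), received_h), m       # ...then verify

# Scheme 2: A sends M || E(K, H(M)); M travels in the clear.
def send_scheme2(key: bytes, m: bytes):
    return m, toy_encrypt(key, H(m))

def verify_scheme2(key: bytes, m: bytes, encrypted_h: bytes) -> bool:
    return hmac.compare_digest(H(m), toy_decrypt(key, encrypted_h))

k = os.urandom(32)  # secret key shared by A and B
msg = b"Give all students in the class an A"
print(verify_scheme1(k, send_scheme1(k, msg)))   # (True, b'Give all students in the class an A')
print(verify_scheme2(k, *send_scheme2(k, msg)))  # True
```

In scheme 2 only the 32-byte digest goes through the cipher, however large M is, which is the saving described for the 10-gigabyte file.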
That is, looking at the slide, think: what if you change the message? What will happen? You intercept the message and change it from "give all students in the class an A" to "give all students in the class an F". Are you going to be detected? You changed only the message; if it is detected, why? Again, go through the verification. At the receiver, we take the hash of the modified message and get some hash value. We decrypt the encrypted part with the key, and the result of decrypting is the hash value that A calculated. So we have two hash values: the hash of the modified message and the hash of the original message. Our property of the hash function is that two different messages give two different hash values. So if the two values differ, we conclude that the message we received is not the same as what A sent, and we've detected a modification. But what if the modified message has the same hash value as the original message, that is, what if there's a collision? Let's look at that. Remember that one of the properties we require of the hash function used here is that it's computationally infeasible, practically impossible, for someone to find two messages that produce the same hash value. That's our requirement. We know that in theory such messages exist: with many possible messages, some of them will produce the same hash value. So what if someone can find two messages that produce the same hash value? Let's see what they can do if that requirement didn't hold. Whether it holds depends on the hash function. If the hash function is cryptographically secure, then in practice it will not happen; but if you're using a bad hash function, it may. So the function needs to be secure, and we'll look a little at specific functions later. In the last five minutes, let's say this is what A has sent to B.
From our slide, A sends the message concatenated with the encrypted hash value. I'll write KAB for the key shared between A and B. I've written it in the form M concatenated with the encryption of H(M), the hash of M, with KAB. That's sent to B. The attacker intercepts it and then sends a modified message to B. Let's say they change the message to M' (M prime). We don't change the second part. As the attacker, I cannot encrypt the hash of the new message, because I don't know KAB. If the attacker calculated the hash of M' and encrypted it, B would notice, because a different key was used. So what the attacker can do is take the modified message and the original encrypted hash value and send that part as is. That doesn't mean they know KAB; they just reuse the bits produced by A's encryption. They send that to B, and B tries to verify. Following our approach, the first thing B does is take the hash of the received message, H(M'). The second thing is to decrypt the received hash value with KAB, the key shared between B and whom we think sent it, A. What do we get when we decrypt that? (I made a mistake on the board: I should have included KAB here. Decrypt this part using KAB.) We get H(M) as output: H(M) was encrypted with KAB and we decrypt with KAB, so we get back the original hash of M. So now we have H(M) and H(M'), where M and M' are different messages. Now we compare: if H(M') equals H(M), accept; otherwise reject, that is, don't trust the message. If they are the same, we trust the message.
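Here's a sketch of that attack against the second scheme, under the same assumptions as before: a hypothetical toy Feistel cipher standing in for E (not a real cipher) and SHA-256 for H. The attacker replaces M with M' and forwards A's encrypted hash unchanged; B recomputes H(M'), decrypts the received value with KAB to get H(M), and compares. The sketch also tries the alternative, re-encrypting H(M') under a different key, KBC.

```python
import hashlib
import hmac
import os

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round_fn(key: bytes, i: int, data: bytes, out_len: int) -> bytes:
    """Toy Feistel round function built from HMAC-SHA256 (hypothetical)."""
    out, ctr = b"", 0
    while len(out) < out_len:
        out += hmac.new(key, bytes([i]) + ctr.to_bytes(4, "big") + data,
                        hashlib.sha256).digest()
        ctr += 1
    return out[:out_len]

def toy_encrypt(key: bytes, p: bytes) -> bytes:
    left, right = p[:len(p) // 2], p[len(p) // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round_fn(key, i, right, len(left)))
    return left + right

def toy_decrypt(key: bytes, c: bytes) -> bytes:
    left, right = c[:len(c) // 2], c[len(c) // 2:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round_fn(key, i, left, len(right))), left
    return left + right

def H(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()

def b_accepts(k_ab: bytes, m: bytes, encrypted_h: bytes) -> bool:
    """B's check: accept iff H(received M) equals D(KAB, received encrypted hash)."""
    return hmac.compare_digest(H(m), toy_decrypt(k_ab, encrypted_h))

k_ab = os.urandom(32)  # known only to A and B
k_bc = os.urandom(32)  # some other key the attacker could use

m = b"Give all students in the class an A"
m_prime = b"Give all students in the class an F"

sent = (m, toy_encrypt(k_ab, H(m)))                          # what A sends
forged = (m_prime, sent[1])                                   # M' + A's encrypted hash, unchanged
forged_other_key = (m_prime, toy_encrypt(k_bc, H(m_prime)))   # H(M') encrypted with KBC

print(b_accepts(k_ab, *sent))              # True
print(b_accepts(k_ab, *forged))            # False: H(M') != H(M)
print(b_accepts(k_ab, *forged_other_key))  # False: decrypting with KAB gives garbage
```

The forged message would only be accepted if the attacker could find an M' with H(M') equal to H(M), which is what the collision-free requirement rules out.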
If they're not the same, we don't trust it. This leads to our requirement: it should be practically impossible for the attacker to find some other message M' that has the same hash value as the original message M. If they could, they would trick B into accepting this message: when B compares the hash values, it finds they are the same and therefore trusts the message, which is a successful attack. So our property is that it should be practically impossible to find two messages like this. In theory it's possible; in practice it must take a very long time. For secure hash algorithms, how long depends mainly on the length of the hash value: the longer the hash, the more time it takes to find such a message. Any questions on this concept? So we've again used the hash to perform authentication. If the message is modified, B detects it because the hash values are different. The attacker couldn't recalculate and re-encrypt the hash because they don't have KAB. If they tried to recalculate the hash and encrypt it, they'd have to use a different key, say KBC. When B decrypts with KAB, they'll get something other than the hash value, because when you encrypt something with one key and decrypt it with a different key, the output is not the original plaintext; it's something different. So using a different key would also be detected. Any questions? Ready for a quiz? I think we should have a quiz next week, since we've covered a few new things on authentication. I'll check whether it will be an in-class or an online quiz. It will start to test whether you can look at these diagrams and ask yourself: what can an attacker do? If they try to change this, what happens? How is it detected? Next week, we'll continue through these examples, look at a few different ways we can use hashes, and see what an attacker can do and why it will be detected.
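To get a feel for why the hash length matters, here's a sketch of the attacker's brute-force search for an M' whose hash matches H(M). So that the search actually finishes, it assumes a hypothetical weakened hash (the top n bits of SHA-256); the candidate messages, "an F" plus a counter, are purely illustrative. Each candidate matches with probability 2^-n, so the expected number of tries is about 2^n, and every extra bit of hash length roughly doubles the work. With the full 256 bits, the search is hopeless.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bits: int) -> int:
    """Hypothetical weakened hash: the top n_bits of SHA-256(data)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - n_bits)

def find_matching_message(m: bytes, prefix: bytes, n_bits: int):
    """Try prefix + counter until its truncated hash equals that of m."""
    target = truncated_hash(m, n_bits)
    for tries in count(1):
        candidate = prefix + b" #" + str(tries).encode()
        if truncated_hash(candidate, n_bits) == target:
            return candidate, tries

m = b"Give all students in the class an A"
for n_bits in (8, 12, 16):
    m_prime, tries = find_matching_message(m, b"Give all students in the class an F", n_bits)
    print(f"{n_bits:2d}-bit hash: match after {tries} tries: {m_prime!r}")
```

Individual runs vary, but the try counts should grow by roughly a factor of 16 for each extra 4 bits.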