So up until now, we've focused on confidentiality: encrypting our data to make sure it's kept secret. We've just seen another technique, Diffie-Hellman, which is not about the confidentiality of messages. It's about sharing a secret between users. And we saw that it needs some form of authentication. So although this topic is titled Message Authentication Codes, we'll first introduce authentication in general, and the next topic is also about supporting authentication. We'll talk generally about what authentication is, what we need, and the current ways we know of authenticating, and then next week we'll introduce some new ways of authenticating. Just to remind you of some of the attacks: we had six attacks and six different services. There are eight listed here, which is just a slightly different breakdown. Disclosure of messages was one attack: we send a message across a network that is intended to be confidential, and someone gets it. It's disclosed. How do we stop that? Encrypt. So the mechanisms for stopping disclosure are mainly encryption: if we encrypt it, no one without the key can decrypt it. Although we haven't talked about it much or seen it in many examples, encryption can also help counter traffic analysis: if we encrypt our data, it's harder for someone to analyse. In fact we need other mechanisms to stop traffic analysis; encryption is just one of them. But there were many other attacks, in particular the active attacks we talked about. Masquerade: pretending to be someone else. Modification: someone sends a message and someone else modifies it in the middle, like the man-in-the-middle attack we just saw. Repudiation: someone denies sending a message or denies receiving a message. How do we stop such attacks? They require authentication. Message authentication is about checking, when we receive a message:
Did this come from who they say they are, or is it someone pretending to be them? Has this message been modified along the way? That's what authentication covers: checking that no one is masquerading and no one has modified the message. Modification may mean changing the content of a message, but if someone sends five messages, an attack may also change the ordering of those messages (sequence modification), or even the timing, for example by delaying a message. Authentication is the mechanism used to stop these. Non-repudiation, remember, deals with someone denying that they sent or received a message. The way we stop that is with digital signatures, which are really a subset of authentication. So the next few topics are on authentication. What do we mean by authentication? When the receiver gets a message, they want to verify that its contents haven't been modified, that is, that the message is exactly as the source sent it, and that the source of the message is who they claim to be, not someone pretending to be someone else. So there are two forms of authentication there. We usually want both, so the techniques often provide both services: authenticating the data, sometimes called data integrity, and authenticating the source, often just called authentication. How do we do it? There are different approaches. The symmetric key encryption techniques we've seen provide some inbuilt authentication. But there are other techniques that can be more efficient, perform better, and be easier to use than encryption. They include message authentication codes (MACs), hash functions, and hash functions combined with public key encryption to create digital signatures. We'll talk about MACs, hash functions and digital signatures in this topic and the next.
But today let's just look at how symmetric key encryption, which we've already studied, can be used for authentication, using an example we've done before. I have a plaintext message, and I've encrypted it with OpenSSL to get some ciphertext. Think of the person who receives the ciphertext: they decrypt it. It was encrypted using symmetric key cryptography, in fact just DES in ECB mode. When the receiver decrypts, what will happen if they use the correct key? We should get the plaintext back. What if they use an incorrect key? They'll get something else. Will they know that? Probably: if we decrypt with the wrong key, we'll see that the result is not the original plaintext. Let's see that by decrypting the ciphertext with some different keys. I use DES in ECB mode with the -d option to decrypt. The input is the ciphertext, and the output is, let's say, received1.txt; we'll look at it in a moment. OpenSSL also lets us specify an initialization vector (strictly, ECB mode doesn't use one), so I'll type the IV I used before. Let's say it's known; it's not secret. The other thing we need is, of course, the key. We'll type that in a moment, and we'll also use the -nopad option, which we'll explain later. What key do we use? The key is 16 hex digits: 64 bits for DES (56 of which are effective key bits). What key do you use when you decrypt? If you don't know it, well, let's first try any key, say this one. Let's say someone tries to decrypt this message with the wrong key. I know it's wrong because I know the real key. Let's see what happens, using xxd to look at the received message (I'll zoom out a bit). Was that the correct key or not? Why not? It's unreadable.
If you look at the ASCII column, the dots mean unprintable characters. We get some characters, but they don't make sense to us. So this is a form of authentication: the result doesn't decrypt into anything sensible, and we recognize that. What does it imply? Either the key is wrong, or, if the key was right, the message, that is the ciphertext, is wrong. We'll see that in a moment. Let's decrypt with what I think is the correct key, calling the output received2.txt to get a different output. What happened? What do we know: is the key wrong, or is the ciphertext wrong? We don't know; we only know something is still wrong. In fact it is still the wrong key: one digit was wrong. Let's try again. What do we know now? We get a message that makes sense, so we assume the key is correct and that the ciphertext hasn't been modified. Why? Let's modify the ciphertext and then decrypt with the correct key. How do I modify the ciphertext? Let me find a character in the ciphertext to change, and decrypt again with the correct key, calling the output received3.txt. I haven't changed the key or the IV; I know they're correct. What I did was change the ciphertext by just one character, and when we decrypt now, we again know something's wrong, because we don't get the expected English message, one we can understand. So the point is that with symmetric key encryption, if the ciphertext decrypts successfully, we know it was the original ciphertext and the correct key. That is, we know the ciphertext hasn't been modified, because if it were modified and we used the correct key, we would not get a message we can understand. That provides authentication of the data.
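Python's standard library has no DES, so as a stand-in this sketch uses a toy 64-bit Feistel block cipher (DES is also a Feistel cipher) with a SHA-256-based round function, run in ECB mode. It is purely illustrative and not secure; the keys, the message and the `looks_like_text` recognizer are hypothetical. It reproduces the two observations from the demo: a wrong key gives unreadable output, and changing one ciphertext byte garbles the block that byte belongs to.

```python
import hashlib

BLOCK_SIZE = 8  # 64-bit blocks, the same block size as DES (this is NOT DES)
ROUNDS = 8


def _round_function(key: bytes, rnd: int, half: bytes) -> bytes:
    # Toy round function: first 4 bytes of SHA-256(key || round || half-block)
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round_function(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    # Undo the rounds in reverse order
    left, right = block[:4], block[4:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_function(key, rnd, left)), left
    return left + right


def ecb(func, key: bytes, data: bytes) -> bytes:
    # ECB: apply the block function to each 8-byte block independently.
    # Like OpenSSL's -nopad, the data must already be a whole number of blocks.
    if len(data) % BLOCK_SIZE != 0:
        raise ValueError("data length must be a multiple of the block size")
    return b"".join(func(key, data[i:i + BLOCK_SIZE])
                    for i in range(0, len(data), BLOCK_SIZE))


def looks_like_text(data: bytes) -> bool:
    # The receiver's "recognizer": every byte is printable ASCII
    return all(32 <= b <= 126 for b in data)


key = bytes.fromhex("0123456789abcdef")        # hypothetical shared key
wrong_key = bytes.fromhex("0123456789abcdee")  # differs in the last hex digit
plaintext = b"This is a secret English message"  # 32 bytes = 4 blocks
ciphertext = ecb(encrypt_block, key, plaintext)

print(ecb(decrypt_block, key, ciphertext))  # b'This is a secret English message'
print(ecb(decrypt_block, wrong_key, ciphertext))  # random-looking bytes

tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]  # change one byte
print(ecb(decrypt_block, key, tampered))  # first block garbled, rest intact
```

Note the last line: in ECB mode only the modified block is garbled, and an attacker who swaps or reorders whole ciphertext blocks garbles nothing at all, so "it decrypts to something readable" is a weaker check than it first appears.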
If the ciphertext is modified, or the message is modified and re-encrypted with some other key, we would detect that at the receiver when we try to decrypt: we would get something that doesn't make sense. Similarly, if we decrypt with a key that doesn't match the key used to encrypt, the result will not make sense. That is, if we encrypt with one key and decrypt with a different key, the received message will look random. This acts as source authentication: if it decrypts successfully with the key I share with A, that implies A encrypted this message, not someone else pretending to be A. So symmetric key cryptography in fact has built-in authentication. If it decrypts successfully, we're assured that the message came from the person we share the key with and that it hasn't been modified along the way, because if either of those things had happened, that is, someone else encrypted it or someone modified it, it wouldn't decrypt successfully. Try to understand the logic of why symmetric key encryption provides authentication: using the wrong key or the wrong ciphertext produces something we cannot recognize, and that implies something's gone wrong. Now, there are some assumptions here. How do you know that this output is incorrect, that it's not the correct plaintext? Because you don't understand it. But what if the message I sent was a random key? Say I wanted to send someone a key, so I took a key, which is just random bits, encrypted it and sent it, and they decrypt it. What do they get as output? Random-looking characters. How do they know whether those are the correct random characters, that is, the correct key? They can't, so recognizing the plaintext doesn't work in that case. It's quite possible that the encrypted message is itself random, especially when we encrypt keys, since secret keys are usually random.
So if the message is random, we cannot tell whether the decryption is correct, because there's nothing to recognize. How can we get around that problem? How can I know, when I decrypt, whether the message is correct? We need to attach some other information to the message. For example, include an error detecting code inside it, so that when we decrypt we can check against, say, a parity check or a CRC. That can be used to almost guarantee that the message matches the original. So basically you add extra information inside the message so that when it's decrypted we have some way to check, and error detecting codes are one general way of doing that; there are others. So authentication is provided using symmetric key encryption. Going to the slide, that's just symmetric key encryption, which we've seen before. We assume that the person decrypting can recognize the correct plaintext; if they can't, we have a problem. So symmetric key encryption provides confidentiality: only A and B can recover the plaintext. It provides source authentication: if it decrypts successfully, the message came from A, because the only other person who has the key is A. And if it decrypts successfully, the message hasn't been modified: data authentication. The slide gives an example similar to the one we showed, a simpler one about recognizing the correct plaintext. I think if you receive this ciphertext, decrypt it with a key and get this plaintext, you will assume it's correct. What are the answers to those two questions? Was the plaintext encrypted with key K? Yes or no? That is, B receives the ciphertext, DPN and so on, decrypts it with the key K that it shares with A, and gets this plaintext. Do you think the plaintext was encrypted with key K? Yes. Do you think the ciphertext received is the same as the ciphertext sent, meaning no modification? Yes.
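A minimal sketch of adding structure before encryption, assuming CRC-32 from Python's `zlib` as the error detecting code and a 4-byte big-endian trailer (both choices are illustrative). The framed bytes are what would be encrypted; after decryption the receiver recomputes the CRC. Random garbage matches a 32-bit CRC only about once in 2^32 tries, and CRC-32 always catches a single flipped bit. The check only means something because the CRC sits inside the ciphertext: sent in the clear, anyone could recompute it. And because CRCs are linear, some cipher modes let an attacker adjust the data and the CRC together, so treat this as illustrating the idea rather than a robust design.

```python
import os
import zlib


def add_structure(message: bytes) -> bytes:
    # Append a CRC-32 of the message (4 bytes, big-endian) before encrypting
    return message + zlib.crc32(message).to_bytes(4, "big")


def has_valid_structure(data: bytes) -> bool:
    # After decrypting, recompute the CRC and compare it with the stored one
    message, stored = data[:-4], data[-4:]
    return zlib.crc32(message).to_bytes(4, "big") == stored


random_key = os.urandom(16)         # a random message: nothing to "recognize"
framed = add_structure(random_key)  # this is what would be encrypted
print(has_valid_structure(framed))  # True

# Simulate a decryption that went wrong: flip one bit of the message part
damaged = bytes([framed[0] ^ 0x80]) + framed[1:]
print(has_valid_structure(damaged))  # False
```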
Because if it had been modified, we wouldn't get a message that makes sense here. So if we can recognize the correct plaintext, authentication is provided. In the second case, with a different ciphertext and a different plaintext, do you think the correct key was used? What's the plaintext? Can you read it to me? Does it make sense? No. So here's the case: if it doesn't make sense, we think either the wrong key was used or something was modified along the way. We don't know which, but we know something's gone wrong, and that's sufficient for authentication. So if we can correctly recognize the plaintext, we can provide authentication. What about the third case? Now it's harder: we don't know what we're expecting to receive. It's binary; maybe it's a random key. So authentication is provided only when we can recognize the correct plaintext. In examples one and two we assumed the message was in English, and it worked. In example three it's much harder to know whether what we received is correct, and we won't cover that further. So the point, to finish today, is that symmetric key encryption can provide authentication in many cases, but not all. Where it can't, say when sending random messages, we can adapt by adding an error detecting code to help. But it turns out that isn't very efficient in some cases; it may not perform well. So people have developed other approaches: message authentication codes and, in the next topic, hash functions and digital signatures. We will cover them next week.

We introduced authentication last week under this topic, message authentication codes. Let's recap, because you didn't have the lecture notes last week. The idea is that when we receive a message, we want to check two things: that the person who sent it is who they claim to be, and that nothing has been modified since it was sent.
So that's authentication of the source and authentication of the data, sometimes called data integrity. In general we just talk about authentication. Today we'll look at one form of authentication, using message authentication codes. What we said last week is that message authentication is needed to prevent a number of attacks, and there are different approaches. We will look at MACs, message authentication codes. The next topic is on hash functions, and then we'll see how hash functions are combined with public key encryption to produce digital signatures for authentication. We also said last week that in some cases we can use just normal symmetric key encryption for authentication. When we decrypt something and recognize that it's correct, we know we've used the correct key, which means it must have been encrypted by the person who has that key. That's some proof that it came from that person and that it hasn't been modified, because if the message were modified or encrypted with the wrong key, we'd detect that something went wrong when we tried to decrypt it. But it doesn't always work. It relies on the assumption that the person decrypting can recognize the correct plaintext, which is possible in some cases and not in others. If we know it's an English message, we can; but if it's a binary or even a random message, we cannot detect whether we've decrypted successfully. So how do we make sure that, when we receive a message, we can determine whether anything has been modified and whether it came from the right person? One way is to make sure that, when we encrypt a message, we'll be able to tell after decrypting whether it's correct, and one way to do that is to add some structure inside.
Say I have a message to send that is a sequence of a thousand random bits, maybe a key a thousand bits long. It has no structure, so when we decrypt, we cannot tell whether it's correct. One way to support authentication is to add structure: a parity check, or a more effective error detecting code such as a CRC or a frame check sequence. Then when we decrypt, there's something to recognize: the structure we added. In network communications, most messages we send already have some inbuilt structure: the packet header. Even if the data in a packet is random, the header is structured, so when we decrypt the packet we should be able to recognize correct header fields, because only certain values are valid. So we can use symmetric key encryption for authentication by making sure there's some structure in the message. But we don't always want to use it, because of this limitation of having to recognize the correct plaintext, and also because symmetric key encryption can be slow. For performance reasons we may not want to encrypt an entire message, say a one gigabyte file. We'd like a faster algorithm that adds some information to support authentication and is fast to verify. So techniques have been developed that don't use symmetric key encryption but still provide authentication, and that's what we look at today. The technique is called message authentication codes, or MACs. What else does MAC stand for? M-A-C, not M-A-X. Medium access control, in computer networks: we sometimes talk about MAC protocols or the MAC address. That's not the same thing. This is a message authentication code, so be aware that the acronym comes up in multiple contexts.
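As an illustration of recognizing header structure, here is a rough check of three IPv4 header fields whose valid values are restricted (version, header length and total length, as defined in RFC 791). The function name is hypothetical, and real network stacks check more, such as the header checksum. It also shows why a little structure is weak evidence: roughly 4% of random 20-byte inputs pass these three checks, so more structure gives stronger assurance.

```python
def plausible_ipv4_header(data: bytes) -> bool:
    # A minimal IPv4 header is 20 bytes
    if len(data) < 20:
        return False
    version = data[0] >> 4   # must be 4 for IPv4
    ihl = data[0] & 0x0F     # header length in 32-bit words, valid range 5..15
    total_length = int.from_bytes(data[2:4], "big")  # header + payload, in bytes
    return version == 4 and 5 <= ihl <= 15 and total_length >= ihl * 4


good = bytes([0x45, 0x00, 0x00, 0x54]) + bytes(16)   # version 4, IHL 5, 84 bytes
garbage = bytes([0x9C, 0x13, 0x00, 0x01]) + bytes(16)  # version 9: not IPv4
print(plausible_ipv4_header(good))     # True
print(plausible_ipv4_header(garbage))  # False
```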
How does it work? We'll look at the general concepts of using MACs for authentication. We won't go through any detailed, real algorithms, although I'll mention the names of some. We'll just look at the concepts and the security requirements. The setting: we have a message to send from A to B, and we want B, on receiving it, to be sure it hasn't been modified along the way and that it came from the claimed sender. At the source, before sending, A calculates the message authentication code of the message M. MAC here is a function taking two inputs, the message and a secret key K, and producing a short output that should look random, which we call a tag T. Unfortunately the output is also called a message authentication code. I'll try to call the output the tag, but more generally you'll hear that the output of a MAC function is a MAC, which gets confusing. The tag is short and fixed-length: usually on the order of a hundred to a few hundred bits, and always the same length no matter how long the input message is. Messages of different lengths all produce an n-bit tag. In general we'd like the tag to look random, with no noticeable structure, and, similar to encryption, two different input messages with the same key should produce two different tags. That was our requirement for encryption: if we encrypt two different messages using the same key, we should get two different ciphertexts.
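The lecture doesn't commit to a particular MAC algorithm. As one concrete, widely used example, this sketch instantiates MAC(K, M) with HMAC-SHA256 from Python's standard `hmac` module (the key and messages are made up) to show the fixed-length property: inputs of 0, 5 and 10,000 bytes all give a 256-bit tag.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    # T = MAC(K, M), instantiated here with HMAC-SHA256
    return hmac.new(key, message, hashlib.sha256).digest()


key = b"hypothetical key shared by A and B"
for message in (b"", b"short", b"x" * 10_000):
    tag = mac(key, message)
    print(len(message), "bytes in ->", len(tag) * 8, "bit tag")
# 0 bytes in -> 256 bit tag
# 5 bytes in -> 256 bit tag
# 10000 bytes in -> 256 bit tag
```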
We still have that requirement for our MAC function. In fact, a MAC is similar to symmetric key encryption: if you write the equations, they look almost identical, and many of the requirements are the same. With symmetric key encryption, we encrypt a message (P for plaintext or M for message) using a key and get some ciphertext: C = E(K, M). For a MAC, we take a secret key and a message as input, and the function produces an output we denote T: T = MAC(K, M). The requirement about two different input messages is the same: with the same key and two different messages, we should get different outputs. Similarly, given the same message and a different key, we'd like a different output. With encryption, if we encrypt one message with key one we get a ciphertext, and if we encrypt the same message with key two we should get a different ciphertext. The same is expected of a MAC function. In fact, some MAC functions in use are built from symmetric key encryption algorithms, so there are many similarities. A big difference is that the tag is usually shorter than the message. With symmetric key encryption, the ciphertext and the plaintext are the same length (ignoring padding). With a MAC function, the input message can be any size, and we get a fixed-size, usually small, output tag. So we could calculate the MAC of, say, a 10,000-byte message and get a 128-bit tag, whereas encrypting a message gives ciphertext of the same size. Because there are far more possible messages than tags, some different messages must share a tag; the real requirement is that, without the key, it's infeasible to find such messages or to predict a tag. The other difference is decryption.
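The comparison above can be summarized side by side. Here |X| denotes length and n the fixed tag length; because there are more possible messages than n-bit tags, the two requirements should be read as "cannot feasibly be made to fail without the key", not as literal guarantees.

$$
\begin{aligned}
&\text{Symmetric encryption:} && C = E(K, M), && |C| = |M| \ \text{(ignoring padding)}\\
&\text{MAC:} && T = \mathrm{MAC}(K, M), && |T| = n \ \text{for every } M\\[4pt]
&\text{Same key, different messages:} && M_1 \neq M_2 \;\Rightarrow\; \mathrm{MAC}(K, M_1) \neq \mathrm{MAC}(K, M_2)\\
&\text{Same message, different keys:} && K_1 \neq K_2 \;\Rightarrow\; \mathrm{MAC}(K_1, M) \neq \mathrm{MAC}(K_2, M)
\end{aligned}
$$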
Remember the requirement for decryption in symmetric key encryption: decrypting the ciphertext with the same key, D(K, C), gives some M' as output, which should equal the original M. That's our requirement for encryption, that we can decrypt, so when we design a symmetric key encryption algorithm we must design it so that the ciphertext can be decrypted. That's not a requirement for a MAC function. A MAC function doesn't need to go backwards; an encryption function does. That's one reason it's generally easier to design better-performing MAC functions than encryption functions. We don't need to take the tag, apply the opposite of the MAC, and find the original message; we don't need that reverse operation. In fact, one of the security features is that it's hard to do. With encryption, it should be easy to encrypt and easy to decrypt, assuming we know the inputs. With a MAC function, it should be easy to calculate the MAC of a message but hard, given the tag, to find the message. To write it differently: given T and K as well as the function, finding M should be hard. That's a requirement of our MAC function, and we'll state it in the slides. Also, finding the key K should be hard: it's a secret key, the same as in encryption, so someone shouldn't be able to guess it. And calculating the tag without the key should be hard. We'll see these requirements of what's hard and easy stated on later slides. But we don't need to be able to compute the inverse of a MAC function. We'll see in the next topic that hash functions are an alternative for authentication.
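The easy and hard requirements stated above can be summarized as follows; note that, unlike encryption, no decryption-style inverse of the MAC is required.

$$
\begin{aligned}
&\text{Encryption must invert:} && D\big(K, E(K, M)\big) = M\\
&\text{MAC, easy:} && \text{given } K, M, \text{ compute } T = \mathrm{MAC}(K, M)\\
&\text{MAC, hard:} && \text{given } T \text{ (even with } K \text{ and the function)}, \text{ find } M\\
&\text{MAC, hard:} && \text{find the secret key } K\\
&\text{MAC, hard:} && \text{without } K, \text{ compute a valid tag } T \text{ for a message}
\end{aligned}
$$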
A MAC function is also similar to a hash function; it's sometimes called a keyed hash function. We'll return to that after we go through hash functions. So let's look at how MACs can be used. We'll go through maybe two of the cases on the slide; the bottom two are related. We'll go through the top one first and look at how a function with the properties we stated is useful for authentication, what an attacker can do, and why this provides authentication. First, let's explain the picture. Source A is sending a message M to destination B. In this case we want authentication and data integrity, but not confidentiality: I don't care if someone reads my message, I just want to make sure no one modifies it and that the receiver is sure it came from me. That's what this approach aims for. A is on the left and B on the right, and the steps on the left are the ones A takes before sending the message. A calculates the MAC of the message M. The slide uses different notation: C, short for calculating the code, so think of C as the MAC function. It takes two inputs, the message and a secret key, and the key is shared between A and B. The assumption is that A knows the key, B knows the key, and no one else does, the same as in symmetric key cryptography. Using the MAC function, we calculate the tag; in our notation the output is T. So be aware that different notation is used: sometimes it's called a tag, sometimes a code. We then concatenate the message with the tag.
The two vertical lines mean concatenation: just join them. Then we send the result across the network to B. So what's sent to B is the message concatenated with the tag, M || T, where the tag was calculated in the previous step as T = MAC(K, M). This is just a graphical illustration of the steps A takes. The message is sent to B in the clear with a tag attached; the grey box at the end of the message on the slide is the tag. B then verifies: they take the received message and, using the same MAC function and the same key K, calculate the tag of the received message and compare it with the tag they received from A. If they match, B assumes everything's okay; if they don't, something's gone wrong. Let's denote those steps, with slightly different notation. Denote the message received as M'. It should be the same as the message sent unless an attacker has modified it, which we'll see in a moment; that's why I write M', meaning the received message may not be the same as the message sent. B, who doesn't know yet whether it was modified, calculates T' = MAC(K, M') and compares T' with the received tag. What do we call the received tag? I used poor notation, so let's call it T''. I'll explain why in a moment. If they match, we assume everything's okay; if they don't, we assume something's gone wrong. Back to the notation: A sends the message plus the tag T, and B receives a message plus some tag, but if an attack takes place, these may differ from what was sent.
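A minimal sketch of this exchange, again assuming HMAC-SHA256 as the MAC; `send` and `verify` are hypothetical helper names. The fixed tag length is what lets B split the received bytes back into M' and T''.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces 32-byte (256-bit) tags


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def send(key: bytes, message: bytes) -> bytes:
    # A: compute T = MAC(K, M) and transmit M || T
    return message + mac(key, message)


def verify(key: bytes, received: bytes) -> bool:
    # B: split into M' and T'', recompute T' = MAC(K, M'), compare with T''
    m_prime, t_double_prime = received[:-TAG_LEN], received[-TAG_LEN:]
    t_prime = mac(key, m_prime)
    # compare_digest compares in constant time, avoiding timing leaks
    return hmac.compare_digest(t_prime, t_double_prime)


k = b"hypothetical key shared by A and B"
packet = send(k, b"Meeting moved to 10am")
print(verify(k, packet))  # True
```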
That's why I denote the received message M' and the received tag T''. I should have chosen better notation, but that's what we've got. B calculates the tag of the received message M' and compares it to the received tag T''. If they match, we assume both that the message has not been modified and that it came from A. If they don't match, we assume either the message has been modified or it did not come from A. That's the idea. Now let's look at what an attacker can do and see how this authentication works. I'll change the notation from double primes to something easier to read, and draw a shorter version. User A is going to send to B, and at the source A does the same thing again: T = MAC(K_AB, M), where, to be precise, K_AB is the key shared between A and B. A sends the message and the tag to B. Now let's perform an attack: a third party intercepts this before it reaches B and modifies the message. The message and tag haven't reached B yet; they're intercepted by someone in the middle, our malicious user, Mal. Say Mal modifies the message M and sends the modified one on to B. They don't modify the tag yet; they just change the message to M_mal. B doesn't know this yet. Say the message M was 1,000 bits long and the tag 100 bits. The malicious user changes the first 1,000 bits to the values they want but keeps the last 100 bits, the original tag. So they modify just the message, intending that B won't detect the modification. What does B do?
B goes through the verification steps: calculate the tag of the received message and compare it to the received tag. So B calculates T_calc = MAC(K_AB, M_mal), using a subscript to distinguish the value calculated at B, and compares it with T_received. Here's the question for you: does the calculated value equal the received value? Yes or no? If yes, why? If no, why not? What requirement would we rely on for the answer to be no? Think about the similar operations in encryption. Let's look at how T was calculated: T = MAC(K_AB, M). T_calc uses the same MAC function and the same key K_AB, but a different message, M_mal. It's similar to encryption: if you encrypt two different messages with the same key, you get two different ciphertexts, and we expect the same of a MAC function. If we apply the MAC function to two different messages, M and M_mal, with the same key, our security requirement is that it produces two different tags. If that requirement is met, the comparison is false: they don't match, and that tells B something's gone wrong. Don't trust the message you just got: either someone has modified it or someone is pretending to be A. We're not sure what went wrong, but we're sure something went wrong. So this relies on a requirement that I don't think is stated clearly on the slides, so let's state it: the MAC of two different messages with the same key produces different tags. That's our requirement of the MAC function.
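The same attack as a sketch, reusing the hypothetical `send`/`verify` helpers with HMAC-SHA256 as the assumed MAC (the salary messages are the example used in the lecture). Mal replaces M with M_mal but keeps A's tag T, and B's recomputed T_calc no longer matches.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 tag length in bytes


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def send(key: bytes, message: bytes) -> bytes:
    return message + mac(key, message)  # M || T


def verify(key: bytes, received: bytes) -> bool:
    message, tag = received[:-TAG_LEN], received[-TAG_LEN:]
    return hmac.compare_digest(mac(key, message), tag)


k_ab = b"key shared only by A and B"  # hypothetical K_AB

# A sends M || T, where T = MAC(K_AB, M)
packet = send(k_ab, b"Decrease Steve's salary by 100,000 baht")

# Mal swaps in a new message but keeps A's original tag T
t = packet[-TAG_LEN:]
m_mal = b"Increase Steve's salary by 100,000 baht"
forged = m_mal + t

# B computes T_calc = MAC(K_AB, M_mal) and compares it with the received T
print(verify(k_ab, packet))  # True: untouched message
print(verify(k_ab, forged))  # False: T_calc != T
```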
Same key, different messages: different tags. If that holds, the two values T and T_calc will not be the same. Any questions on authentication with a MAC so far? Then try an attack where you're the attacker and change the tag as well. I'll give you five minutes to try the same scenario: A calculates the tag and sends M || T, but the malicious user intercepts. Again the malicious user wants to go undetected, and this time they change not just the message but both the message and the tag. Try to draw it and see whether the attack can be detected.

Let's recap the first case. A sent a message to B, but the malicious user intercepted it and changed the message. Maybe the message was "decrease Steve's salary by 100,000 baht", which would take it down to zero or negative, and they changed it to "increase Steve's salary by 100,000 baht", intending that B thinks it's the valid message. In that case they changed just the message, not the tag. When B receives the message, they calculate the tag of the received message using the same MAC function A would have used and the same key A should have used. If nothing's been modified, the same key and the same message give the same tag. But because the message has been modified, we get a different tag, which I denote T_calc, the one calculated by B. It differs from the received tag because the two messages are different, so B detects that something's gone wrong and doesn't trust the message. That's how we achieve authentication: when we get a message, we check whether it can be trusted, and if not, we discard it or take some other action.
And the reason that worked is that we have a requirement on the MAC function: the MAC of two different messages, in this case M and M mal, using the same key, K_AB, should produce two different tags. So T calc will be different. What can the malicious user send on to B to try to go undetected? Let's say first they modify the message to their own message, and they need to attach a tag because B is expecting a tag. What do they attach? Not the old T; a different T. And what's the value of T mal? Well, we could try to calculate it as the malicious user. We know the MAC function; the algorithms are public. And remember, a MAC function takes a message and a key as input, and the message is our new modified message. What key do we use to calculate it? Not K_AB. That's the main point. We cannot use K_AB because the malicious user doesn't know that shared secret key. Remember, a secret key shared between A and B cannot be known by anyone else, otherwise it's not a secret. So that's where this attempt by the malicious user fails, because the key they use is not K_AB; let's denote it K mal, the key of the malicious user, some other value. B receives the message and checks. So what do they get? They apply the same MAC function using what key? Not K mal: they use the key that they think is shared with the person they think it came from. I'm B, I received a message from A, so I use K_AB. And I'm applying it to the received message, M mal in this case. Then I compare that calculated value with the received value. Do they match? Why don't they match? Here we have the same message, M mal and M mal, and the same MAC function, but a different key: K_AB was used in B's calculation, and K mal, which is not K_AB, was used in the attacker's. And our requirement of the MAC function, the same as for encryption functions, is that two different keys produce two different outputs.
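The second attempt, changing both the message and the tag, can be sketched the same way. This is a minimal sketch, again assuming HMAC-SHA256 as the MAC function; `k_mal` is a hypothetical key standing in for whatever the malicious user picks, since they don't know K_AB. The last line also shows why the scheme depends on B using K_AB: a check made with K mal would accept the forgery.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """Tag T = MAC(K, M), using HMAC-SHA256 as an illustrative MAC."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, received_message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag over the received message and compare in constant time."""
    return hmac.compare_digest(mac(key, received_message), received_tag)


k_ab = b"shared secret between A and B"  # known only to A and B
k_mal = b"attacker's own key"             # hypothetical: any key other than K_AB

# The malicious user forges a new message and a new tag with the key they do know
m_mal = b"increase Steve's salary by 100,000 baht"
t_mal = mac(k_mal, m_mal)

# B believes the message is from A, so B verifies with K_AB
print(verify(k_ab, m_mal, t_mal))   # same message, different keys: rejected

# Only if B could be tricked into verifying with K mal would the forgery pass
print(verify(k_mal, m_mal, t_mal))
```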
So B knows there's an error in that case: don't trust that message, something's gone wrong. The MAC of the same message using two different keys produces two different tags. This is similar to our previous requirement, that the MAC of two different messages using the same key produces different tags. Both are requirements of the MAC function. So what if the malicious user can make B think that the key to use is K mal? Then the malicious user will be successful. This relies on B knowing that they need to use K_AB, because B knows K_AB is shared with user A. If that doesn't hold, then yes, the malicious user can be successful; you can work through that case and see that it works quite easily. So there are some assumptions here, and that's the continuing assumption: a secret key is known only to A and B and kept secret between them. One last case. So far the malicious user modified the messages along the path from A to B. What about the malicious user pretending to be A, sending a message to B saying, here's a message, it's from A? It'll be the same, so I will not draw it in detail, but let's say A is not involved. It's just the malicious user sending a message to B saying, here's a message, it's from A. B will follow the same steps, but B will use K_AB to check, and the malicious user, who does not have K_AB, will not be able to fool B into thinking it came from A. We'll see it's the same as the previous case, but let's draw it to be complete. The malicious user sends B some message. I will not denote it as M mal; it is from them, but there's no other message in this case. T is calculated as the MAC of the message using what key? The one that the malicious user knows, not the one that A and B share. B receives it, thinking the message is from A.
This is a masquerade attack. Since B thinks the message is from A, to verify it they calculate a new T of the received message using which key? K_AB. Are they equal? Again, we have the same message with two different keys, therefore they will not be equal, and B detects there's an error. It's the same requirement: the MAC of the same message using two different keys will produce two different tags. So in this case, where A is not involved and the malicious user just sends a message pretending to be A, they cannot be successful because they don't know K_AB. So those are, I think, the three attempted attacks, and all of them are detected by B. As a result, the malicious user cannot change anything along the way without B noticing. We tried changing the message only. We tried changing the message and the tag. Neither worked. And we also tried pretending to be A. That didn't work either. You can try other cases, but none differ from these. So the end result is that if something's changed, or someone pretends to be someone else, B will detect it. That provides authentication: if the tags match, we can be sure that the message came from A and hasn't been modified. How can the attacker be successful then? Yes? Changing only the tag is the other option. So we had covered two of the three cases. We changed the message but not the tag, and it didn't work. We changed the message and the tag, and it didn't work because we didn't have the key. Now keep the same message and change the tag. What happens? It does not work either: B calculates the tag of the unmodified message with K_AB, gets the original tag, and that does not match the modified tag, so B detects it. So changing the message on its own doesn't work, changing the message and the tag doesn't work, and changing the tag only doesn't work. There's no other option; not changing anything is not an attack at all. And pretending to be A doesn't work. So that's all cases covered.
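The remaining two cases, masquerade and changing only the tag, fit in one short sketch. It assumes HMAC-SHA256 as the MAC function; `k_mal` and both messages are hypothetical values. In both cases B's check with K_AB fails.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """Tag T = MAC(K, M), using HMAC-SHA256 as an illustrative MAC."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, received_message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag over the received message and compare in constant time."""
    return hmac.compare_digest(mac(key, received_message), received_tag)


k_ab = b"shared secret between A and B"
k_mal = b"attacker's own key"  # hypothetical attacker key

# Masquerade: A is not involved; the attacker sends M || MAC(K mal, M) claiming to be A
m = b"transfer 500 baht to the attacker"
masquerade_ok = verify(k_ab, m, mac(k_mal, m))

# Tag only: A's genuine message arrives, but the attacker flips one bit of T
m_a = b"decrease Steve's salary by 100,000 baht"
t = mac(k_ab, m_a)
t_modified = bytes([t[0] ^ 0x01]) + t[1:]
tag_only_ok = verify(k_ab, m_a, t_modified)

print(masquerade_ok, tag_only_ok)  # False False
```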
Of course, those requirements that I stated in green must hold for our MAC function. The security of this scheme depends on them being true: two different keys give two different tags, and two different messages give two different tags. So the MAC function must be designed to meet those requirements. They are similar to the requirements for encryption, and there are algorithms that meet them. The first one said that the MAC of two different messages with the same key produces different tags. Is that statement true or false? With our MAC function, we need two different input messages, using the same key, to produce two different tags. Is it always true? Sometimes true? Always false? We would like it to be always true, because the security of our scheme depends on it. But in fact, it's not always true. Why? Because of one of the practical requirements: if we return to our slides, we said that the tag T is usually a small, fixed-size output. The tag that comes out is small, let's say 128 bits, and it's always that length no matter what message is input. And it should work for messages of different lengths, including ones longer than 128 bits. Let's see why our property will not always hold, using some example numbers to illustrate this new concept. Let's start simple and say our tag T is five bits long, always five bits. We calculate the tag using a MAC function. If we take one message and a key, we get tag one. If we take a different message with the same key, we expect, or we hope, that we'll get a different tag. That's what we said we require for this to be secure. But is it always true? Let's say the message M is longer than the tag. I'll just make up a number: a thousand bits. Sorry, let's use a smaller number: ten bits. We'll do a more realistic example in a moment.
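The fixed-size property is easy to observe. This sketch assumes HMAC-SHA256, whose tag is 256 bits rather than the 128 bits used as the example above (AES-based CMAC, by comparison, produces 128-bit tags). Whatever the message length, the tag length stays the same.

```python
import hashlib
import hmac

key = b"K_AB"  # illustrative key

# Tag length in bits for messages of very different sizes (in bytes)
tag_bits = {}
for size in (1, 16, 1000, 1_000_000):
    tag = hmac.new(key, b"x" * size, hashlib.sha256).digest()
    tag_bits[size] = len(tag) * 8

print(tag_bits)  # {1: 256, 16: 256, 1000: 256, 1000000: 256}
```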
So let's say we've got a MAC function that takes a 10-bit message and always produces a 5-bit output. How many possible input messages are there? How many different message values? Two to the power of ten. With 10-bit messages, there are 1,024 possible combinations. So we take a message as input, apply our MAC function using the same key, and get a tag as output. How many possible outputs? Two to the power of five, which is 32. What does that imply? We have a function with 1,024 possible inputs and 32 possible outputs. It means that some of those inputs must map to the same output. There is no way we can get a unique output for every possible message. So there will be some messages that produce the same tag. But we said our requirement for our MAC function was that two different messages produce different tags. So we have a conflict here. In theory, our requirement is not met: two different messages may, in fact will, produce the same tag in this case. Think of it this way: I draw all the messages, M1, M2, M3, up to M1024, and list all the tags, T1, T2, T3, up to T32. A function is a mapping from inputs to outputs, and we have more inputs than outputs, so some of those inputs will map to the same output. Let's say M1 maps to T1. There must be some other M that maps to the same tag, and on average 1,024 / 32 = 32 messages will map to each tag. So that's a problem. For security, we require that it is not possible to have two different messages with the same tag, but in theory, if the message length is larger than the tag length, it is possible. So how do we avoid this, or at least make it unlikely, that two different messages map to the same tag? What would you do? Set the input message length to be less than the tag length.
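The counting argument can be checked directly. This is a sketch built on a hypothetical 5-bit MAC, made by keeping only the top 5 bits of HMAC-SHA256, applied to every one of the 1,024 possible 10-bit messages under one key. By the pigeonhole principle at most 32 distinct tags can appear, so at least one tag must be shared by 32 or more messages.

```python
import hashlib
import hmac
from collections import defaultdict

KEY = b"K_AB"       # illustrative key
MESSAGE_BITS = 10
TAG_BITS = 5


def toy_mac(key: bytes, message: int) -> int:
    """Hypothetical 5-bit MAC: the top 5 bits of HMAC-SHA256 over a 10-bit message."""
    digest = hmac.new(key, message.to_bytes(2, "big"), hashlib.sha256).digest()
    return digest[0] >> (8 - TAG_BITS)


# Group all 2**10 messages by the tag they produce
messages_per_tag = defaultdict(list)
for m in range(2 ** MESSAGE_BITS):
    messages_per_tag[toy_mac(KEY, m)].append(m)

print(2 ** MESSAGE_BITS, "messages,", len(messages_per_tag), "distinct tags")
print("average messages per tag:", 2 ** MESSAGE_BITS / 2 ** TAG_BITS)  # 32.0
print("largest group sharing one tag:", max(len(g) for g in messages_per_tag.values()))
```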
That would let us avoid messages sharing a tag, but it's not very convenient. What we want in practice is a small tag, so that when we send the message and the tag across, there's only a small extra amount to send. And in fact, if we have a small message and a small tag, it's very easy to find the corresponding tag even if you don't know the key. So we have a problem: with a small tag, it will be easy for the attacker to guess that tag. So what you do in practice is make the tag longer. Let's put some more realistic numbers to it. Let's say the message can be any size and the tag is 128 bits. Then again, we will have messages that map to the same tag. How many possible tags? Two to the power of 128. What we rely on is that it must be hard for the attacker to find two messages that map to the same tag. Two to the power of 128 is a large number of tags, so it should be hard for the attacker to find two messages that produce the same tag. So in theory, two different messages can produce the same tag, but in practice, if we design the algorithm correctly and the tag is long enough to have many possible values, it should be hard for the attacker to find such messages. So the design requires the tag to be of a reasonable length in bits; 128 bits or longer is usually used, depending on the security requirements. We will return to this concept of messages mapping to the same tag when we look at hash functions. It's the same concept: when you hash some value, you get a hash value as output, and we want to avoid collisions of those hashes. We'll do some more analysis of that when we look at hash functions. Let's return to our slides and see what we've missed. We went through just the first case. Look at the second case. We do not need to analyse it in detail because it's almost the same; this one introduces encryption as well. On the slide, C is the MAC function.
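To see why a longer tag helps, this sketch counts how many other messages have to be tried before one lands on the same tag as a given message. It assumes a hypothetical MAC whose tag is the first n bits of HMAC-SHA256, and it holds K_AB itself so that it can evaluate the MAC; it only measures how quickly matches appear, it is not a realistic attack. Each candidate matches with probability 1/2^n, so the expected number of tries is about 2^n, and each extra tag bit doubles the work.

```python
import hashlib
import hmac

KEY = b"K_AB"  # illustrative key; the sketch can evaluate the MAC


def truncated_mac(message: bytes, tag_bits: int) -> int:
    """Hypothetical MAC whose tag is the first `tag_bits` bits of HMAC-SHA256."""
    digest = hmac.new(KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - tag_bits)


def tries_to_match(target: bytes, tag_bits: int) -> int:
    """Count candidate messages tried until a different one has the same tag as target."""
    goal = truncated_mac(target, tag_bits)
    tries = 0
    while True:
        tries += 1
        candidate = b"candidate %d" % tries
        if candidate != target and truncated_mac(candidate, tag_bits) == goal:
            return tries


for bits in (5, 10, 16):
    tries = tries_to_match(b"original message", bits)
    print(f"{bits}-bit tag: {tries} tries (expected about {2 ** bits})")

# At 128 bits the same search would need on the order of 2**128 tries
print(f"128-bit tag: expected about {2 ** 128} tries")
```

The actual counts vary from run to run of the search order; only their growth with the tag length matters here.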
In the first case, there was no confidentiality, just authentication. In the second case, let's say we want our message to be private as well. I don't want anyone to see my message. So this is a case where we calculate the MAC of the message using one key, concatenate it (join it) with the original message, and then encrypt all of that using some symmetric key algorithm like AES with a second key, K2. So the same concepts apply in terms of the security requirements of the MAC function, but in addition we encrypt the message so that the attacker cannot see its contents. So this is MAC then encrypt. And it's similar at the receiver: we decrypt and then do the same comparison as before. The third case is similar to the second, but in the opposite order. Here we encrypt first, encrypting the message using our symmetric key cipher, and then apply the MAC to the ciphertext. We send them both, the ciphertext and the MAC, across the network. So the difference between the bottom two pictures on this slide is that the middle one is MAC then encrypt, and the last one is encrypt then MAC. We're not going to explain the differences in detail. There are subtle differences in terms of security, and some are recommended for particular purposes in real applications. Those who did the lab yesterday, and the others who are doing it tomorrow: we did a capture of SSH. If you looked at the algorithms in there, at what was used for encryption, I think it was something like AES-128 with CBC (those who haven't seen it yet will see it later), and it was using HMAC and ETM. ETM stands for encrypt then MAC, and HMAC is a MAC function. So SSH, the secure shell protocol, used encrypt then MAC there; SSL we will come to later. But we're not going to analyse the differences between them.
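The two orderings can be sketched side by side. To keep the sketch runnable with only Python's standard library, it swaps AES for a toy stream cipher (a SHA-256 keystream in counter mode) that is purely illustrative and not meant for real use; `toy_encrypt` and the other helper names are hypothetical. HMAC-SHA256 plays the MAC with key K1, and the cipher uses the second key, K2.

```python
import hashlib
import hmac
import os

TAG_LEN = 32    # HMAC-SHA256 tag size in bytes
NONCE_LEN = 16


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """NOT AES: toy stream cipher XORing data with a SHA-256 counter-mode keystream."""
    keystream = b""
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, keystream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes encryption


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def encrypt_then_mac(k1: bytes, k2: bytes, m: bytes) -> bytes:
    """Send C || MAC(K1, C), where C = E(K2, M) (nonce included in C)."""
    nonce = os.urandom(NONCE_LEN)
    c = nonce + toy_encrypt(k2, nonce, m)
    return c + mac(k1, c)


def open_encrypt_then_mac(k1: bytes, k2: bytes, packet: bytes) -> bytes:
    c, t = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(mac(k1, c), t):  # check the tag before decrypting
        raise ValueError("authentication failed")
    return toy_decrypt(k2, c[:NONCE_LEN], c[NONCE_LEN:])


def mac_then_encrypt(k1: bytes, k2: bytes, m: bytes) -> bytes:
    """Send E(K2, M || MAC(K1, M))."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + toy_encrypt(k2, nonce, m + mac(k1, m))


def open_mac_then_encrypt(k1: bytes, k2: bytes, packet: bytes) -> bytes:
    plain = toy_decrypt(k2, packet[:NONCE_LEN], packet[NONCE_LEN:])  # decrypt first
    m, t = plain[:-TAG_LEN], plain[-TAG_LEN:]
    if not hmac.compare_digest(mac(k1, m), t):
        raise ValueError("authentication failed")
    return m


k1, k2 = os.urandom(32), os.urandom(32)  # K1 for the MAC, K2 for encryption
packet = encrypt_then_mac(k1, k2, b"decrease Steve's salary by 100,000 baht")
print(open_encrypt_then_mac(k1, k2, packet))
```

One visible difference in the design: with encrypt then MAC, the receiver can reject a tampered packet before decrypting anything, whereas MAC then encrypt must decrypt first and only then check the tag.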
Just be aware that these two both combine a MAC and encryption, but the ordering differs, and there are some subtle differences as to which one is more appropriate under different conditions. Usually encrypt then MAC is recommended, but it has some limitations. On Friday, we'll look at the security of MACs and do a little more analysis with a few examples. Today I'll just finish on the MAC algorithms; the security analysis requires a little more concentration than we all have today. What MAC algorithms are there? There are many; there's not just one standard MAC algorithm. The Data Authentication Algorithm, DAA, is one of the old ones, based on DES. So there are similarities to symmetric key encryption: MAC algorithms are based on the principles of, or directly use, symmetric key ciphers. DAA was based on DES but is no longer used. Cipher-based MAC, CMAC, allows you to use triple DES, AES, or other ciphers to create a MAC. There's OMAC, PMAC, UMAC, VMAC and others, so there are many different MAC algorithms in use in practice. There's HMAC, which we will commonly see in network protocols. HMAC actually turns a hash function, like MD5 or SHA, into a MAC function. It uses two constants, the ipad and the opad, which are XORed with the key. We will see HMAC in many examples, and we'll come back and explain it after we've looked at hash functions in the next topic. So we're not going to go through any specific MAC function here, but we'll discuss the concepts.
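As a preview of how HMAC turns a hash function into a MAC, here is a sketch of the construction built by hand from SHA-256, following the standard HMAC definition (RFC 2104): the key is padded to the hash's 64-byte block size, XORed with the ipad constant (0x36 bytes) for an inner hash over the message, then XORed with the opad constant (0x5C bytes) for an outer hash over that result. It is checked against Python's built-in `hmac` module; the key and message are illustrative.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC built by hand from SHA-256: H((K' XOR opad) || H((K' XOR ipad) || M))."""
    block_size = 64  # SHA-256 processes 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")   # then zero-padded to one block: K'
    ipad_key = bytes(b ^ 0x36 for b in key)  # K' XOR ipad
    opad_key = bytes(b ^ 0x5C for b in key)  # K' XOR opad
    inner = hashlib.sha256(ipad_key + message).digest()
    return hashlib.sha256(opad_key + inner).digest()


key, msg = b"K_AB", b"decrease Steve's salary by 100,000 baht"
print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())  # True
```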