With authentication, the idea is that we want to be able to check at the receiver that the message we've received has not been modified and that it comes from the person they claim to be. They're not pretending to be someone else. So we want to protect against modification and masquerade attacks. And the general approach is that we take the message we're going to send and we attach some extra information. That extra information allows the receiver to detect if anything's been modified or not. So we can't stop someone modifying the message, but we can check whether it's been modified. So this is detection.
So one approach for authentication is to use message authentication codes. We said before that, even without MACs, if we just use symmetric key encryption, we can have authentication if the receiver can recognize the correct message. But there are problems with that. So in practice, it's better to add some extra information like a MAC so that for sure the receiver can check if the message is modified or not.
So a MAC function takes as input a message and a key. So here in this example, the top one, we take a message as input and we apply a function on that message using a secret key, not an encryption key, but a MAC key in this case. And that function produces a fixed length, random looking output. And we have some required properties of that function: if we take two different messages in, it will produce two different outputs. Or if we use two different keys, it will produce two different outputs. So if we change M, even using the same key, the output MAC will be different. If we change the key using the same message, the output MAC will be different. Those are the properties that we want of our MAC function, such that in the top case, we calculate the tag, or the MAC value, and we send both the message and the tag, with the tag just concatenated to the end of the message.
And what the receiver does: they receive the message and they receive the tag at the end. They know the key, so this assumes that we've shared a secret key, a secret MAC key in this case, between sender and receiver. And what the receiver does is apply the same MAC function on the message using the known key, and they get a calculated tag as output. And that calculated tag should be the same as the received tag. Because of these properties, if the message was modified, so the two M's were different, then the tags would be different. So in reverse, if we see that the tags are the same, we assume the message has not been modified. And similarly for the key: if we're using the key shared between, let's say, user A and user B, and the received tag matches the tag we calculate with that key, then it tells us that the message hasn't been modified and that it was created by someone who knows the key. It came from user A or user B. But we are user B, and we know we did not send it to ourselves, so we can infer it came from user A. So that's what we do to apply a MAC. If they don't match, something's gone wrong and we don't trust the message. We discard it and maybe take some action to inform A that the message that was just sent didn't validate.
Now, we said we require the property that two different inputs produce two different outputs. But in practice, because the MAC function produces a short output but takes a long message as input, it's possible in theory that two messages will map to the same tag. The security of a MAC function depends upon how hard it is for the attacker to find two such messages. So if we have a large tag length, it's very hard for them to find two messages that map to the same tag. And we'll come back to the schemes at the bottom to finish this topic, but for the security of the MAC, there are two measures.
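The tag-and-check procedure of the top scheme, as described above, could be sketched roughly as follows. This is only an illustration under stated assumptions: the lecture does not name a specific MAC function at this point, so HMAC-SHA256 from Python's standard library is used as a stand-in, and the names `mac`, `send` and `receive` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = 32  # HMAC-SHA256 produces a fixed 32-byte tag


def mac(key: bytes, message: bytes) -> bytes:
    """Stand-in MAC function (assumption): HMAC-SHA256 of the message under the key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def send(key: bytes, message: bytes) -> bytes:
    """Sender A: calculate the tag and concatenate it to the end of the message."""
    return message + mac(key, message)


def receive(key: bytes, packet: bytes) -> Optional[bytes]:
    """Receiver B: recalculate the tag and compare it with the received tag.

    Returns the message if the tags match, otherwise None (discard).
    """
    if len(packet) < TAG_LEN:
        return None
    message, received_tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    calculated_tag = mac(key, message)
    # compare_digest compares in constant time
    if hmac.compare_digest(calculated_tag, received_tag):
        return message
    return None


shared_key = secrets.token_bytes(16)  # 128-bit MAC key known to A and B
packet = send(shared_key, b"pay Bob 10 dollars")

# Unmodified message, correct key: accepted.
ok = receive(shared_key, packet)
# Attacker changes the message but cannot recompute the tag: rejected.
modified = receive(shared_key, b"pay Bob 99 dollars" + packet[-TAG_LEN:])
# Tag checked with a different key (masquerade): rejected.
wrong_key = receive(secrets.token_bytes(16), packet)

print(ok, modified, wrong_key)
```

Note that the message itself travels in the clear here; only modification and masquerade are detected.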
If the attacker can find the key, they can defeat the system, because they could create a modified message and calculate the tag of that modified message using the secret key. So one way for the attacker to defeat the system is to find the key. How much effort's required? For a key of length k bits, a brute force takes about two to the power of k operations. So how to stop that attack? Make the key long, in the same way as with encryption. Another attack is to try to find a valid tag without the key. And there are some attacks that try to do that. Maybe the important point to summarize here: if the tag, the MAC value produced as output, is n bits long, then the amount of effort such attacks take is about two to the power of n. So the attacker has two ways to defeat a MAC: find the key, or find the correct tag for a different message. The amount of effort it takes depends upon the key length or the tag length. And the security of a MAC therefore depends upon the minimum of those two. So down the bottom, to defeat a MAC, if the key length k is 40 bits and the tag length n is 30 bits, the effective strength of this algorithm is equivalent to 30 bits, because that's the easiest approach for the attacker to take. So it's the minimum of those two. So in general, design a MAC scheme where the key is large enough that brute force is not possible, and the tag is large enough too. And large enough is typically in the order of 128 bits. Brute force on a 128 bit key is not practical, nor is a brute force attack on a 128 bit MAC value. And MAC functions can operate with 128 bits and also larger, like 256 bits.
Let's go back to those two schemes down the bottom. So, examples of applying a MAC. The top example we went through in some depth. We went through the cases: what if the attacker modifies a message? What if they try to masquerade as someone else? And we saw that we detect it if our properties hold. Note that the top scheme doesn't provide confidentiality.
It's only for authentication. The attacker can still see the message; it's sent in the clear. Sometimes that's okay for us. Sometimes I want to send a message and make sure it's not modified. I don't care if other people know what the message is, but I want to make sure that it doesn't change along the way. The bottom two schemes both also provide confidentiality. They encrypt the message. So the bottom two schemes are what we commonly refer to as authenticated encryption: we're encrypting the message for confidentiality, plus we're doing authentication. Remember, if we just encrypt a message, authentication is only provided if we can correctly recognize the plaintext, but we don't want to rely on that. So these two schemes at the bottom are quite common today. For encryption, we don't just encrypt the message, we also calculate the MAC on the message. And there are some subtle differences between the two with respect to security. Let's go through what they show first.
The middle scheme. Note that C in this picture indicates calculating the MAC, which produces a tag as output, and E is encrypt, let's say using AES, a symmetric key encryption algorithm. In the middle scheme, we take a message and we calculate the MAC using some key, K1, a secret key that's known by A and B. We take the message, concatenate it with the tag, the MAC value, and then we encrypt all of it together using a second key, K2. So we actually have two shared secret keys here. We send the ciphertext to B, and B decrypts and then validates the MAC. So in this approach, we're calculating the MAC and then encrypting. And this approach to authenticated encryption is called MAC then encrypt. Calculate the MAC, encrypt it all. So what's sent across the network is shown here: it's the message concatenated with the MAC of that message using key K1, and all of that is encrypted using key K2.
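As a rough sketch of MAC then encrypt, the steps above might look like this in code. Several things here are assumptions made only so the example runs with the Python standard library: HMAC-SHA256 stands in for the MAC function C, and `toy_encrypt` / `toy_decrypt` are a deliberately simple, insecure stand-in for E (the lecture suggests AES). All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = 32    # HMAC-SHA256 tag length in bytes
NONCE_LEN = 16  # random nonce prepended to each toy ciphertext


def mac(key: bytes, data: bytes) -> bytes:
    """Stand-in for C (assumption): HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter style. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in for E (assumption): XOR with the toy keystream; use AES in practice."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, ks))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))


def mac_then_encrypt(k1: bytes, k2: bytes, message: bytes) -> bytes:
    """Sender A: tag = C(K1, M), then send E(K2, M || tag)."""
    tag = mac(k1, message)
    return toy_encrypt(k2, message + tag)


def decrypt_then_verify(k1: bytes, k2: bytes, ciphertext: bytes) -> Optional[bytes]:
    """Receiver B: decrypt first, then split off the tag and check it."""
    plaintext = toy_decrypt(k2, ciphertext)
    if len(plaintext) < TAG_LEN:
        return None
    message, received_tag = plaintext[:-TAG_LEN], plaintext[-TAG_LEN:]
    if hmac.compare_digest(mac(k1, message), received_tag):
        return message
    return None


k1, k2 = secrets.token_bytes(16), secrets.token_bytes(16)  # two shared 128-bit keys
message = b"pay Bob 10 dollars"
ciphertext = mac_then_encrypt(k1, k2, message)

recovered = decrypt_then_verify(k1, k2, ciphertext)

# Attacker flips one bit of the ciphertext in transit.
tampered = bytearray(ciphertext)
tampered[NONCE_LEN] ^= 0x01
rejected = decrypt_then_verify(k1, k2, bytes(tampered))

print(recovered, rejected)
```

In this arrangement the receiver always has to decrypt before it can check the tag.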
So what the receiver B does: they decrypt, and as a result of the decryption, they get the message concatenated with a tag. And similar to above, they compare the received tag, this part, with the calculated tag. If they match, everything's authentic; if they don't match, don't trust the message. So this one's similar to the top scheme. Calculate the MAC of the message, but before we send the message and the tag, we encrypt them first. As a result, the attacker that intercepts this cannot see the message. It's encrypted with K2, and they cannot decrypt because they don't have K2. And at the receiver we can verify that nothing's been modified using the MAC. Any questions on the middle scheme? We're just combining our MAC from the top scheme with also encrypting our data, so we have confidentiality.
We're assuming the keys are secret, okay? So when we discuss this scheme, and the same with encryption, we assume that if we have a shared secret key, it cannot be discovered by the attacker. If we have a key that's long enough and it's chosen randomly, then they need to do a brute force. Now, we know from before that we have a problem of exchanging a secret, and we have Diffie-Hellman, and it has its limitations. But for now, in this scheme, we assume that somehow we have exchanged a secret, so there's no way for the attacker to find the secret. In here we have two secrets, K1 and K2. They could be just two random values. So user A generates K1 and K2, two random values, maybe 128 bits each, and somehow has in the past distributed those values to user B. So user B also knows K1 and K2. If the attacker finds either one, this scheme can fail.
Of course, using the MAC here is better than not using it at all. If we don't have the MAC at all, we just take the message, encrypt and decrypt. The problem with that approach is: how does the receiver know, when they decrypt, if the message they get is the correct one or not?
Well, because some messages have structure, like an image or a text message, we can check, but it's a lot of effort to manually check whether the message is correct or not. So the MAC provides this automatic detection. So including the MAC means the receiver can check for sure if this message is correct or not.
The bottom scheme provides about the same functionality as the middle one. There are some subtle differences in security, but let's first see what we do. In the middle one we calculated the MAC and then encrypted. Here we encrypt and then MAC. That is, we take our original message M, we encrypt it with a secret key, K2, and we get ciphertext as output. We calculate the MAC on the ciphertext, and we get a tag as output. We concatenate the ciphertext and the tag and send them across the network. So down the bottom, you see what is sent. This is the encrypted message, the ciphertext, and this is the MAC of the ciphertext. Whereas in the previous approach, we calculated the MAC first, the MAC of the plaintext, and sent the encrypted plaintext and MAC value across the network. Here we encrypt the message first, get the ciphertext, and calculate the MAC on the ciphertext. In both cases, the message is encrypted, so we have confidentiality. And in both cases, we have a MAC, so the receiver can confirm if anything's been modified.
In the second case, we receive the ciphertext and decrypt that ciphertext. Actually, we can check the MAC first. We receive the ciphertext, we receive a tag, and we calculate the MAC on the ciphertext using K1 and compare the values. If they match, everything's okay with the ciphertext, and then we can decrypt the ciphertext.
Which one's better? The middle one or the bottom one? What's different or what's better about the bottom one? It does it in parallel. We can decrypt and do the MAC separately. They don't depend upon each other. That's maybe one part, and there's something closely related to that.
One thing we can do is avoid the decryption if the MAC doesn't match. Okay, so let's say we calculate the MAC on the ciphertext, we compare the tags, and they are different. Then there's no need to decrypt, because we don't trust this message. We save maybe a little bit of computation in not having to decrypt. And it means we will not even look at the plaintext. We know something's gone wrong already. So yeah, there's a little bit of an advantage from the receiver's perspective there. Whereas in the middle approach, we must decrypt at the start. So we must do the decryption and then we do the check. And we may do the check and find that the tags don't match, but we had to do the decrypt to find that out. Often decryption can be slower than calculating a MAC. So there's a small performance difference there, but it's not significant in many cases.
There are some small differences in security, in that the bottom scheme can protect against modified ciphertext. So we have chosen ciphertext attacks. These are attacks where the attacker chooses a ciphertext of a particular value to try and take advantage of the decryption process. Well, in the bottom scheme, if the attacker modifies the ciphertext, we'll detect that straight away. Whereas in the middle scheme we do the decrypt on the modified ciphertext. That is, in the middle scheme we send the ciphertext, the attacker modifies it, and the receiver decrypts that modified ciphertext. And there are some attacks where, if you decrypt a modified ciphertext, the attacker can learn something. So that can be an issue in some specific attacks. In the bottom scheme, we get a modified ciphertext, first we check if it's been modified, and we find, yes, it's modified, so we don't decrypt. So in very special cases, when the attacker can do a chosen ciphertext attack, the bottom scheme becomes beneficial.
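The encrypt then MAC scheme, with the receiver checking the tag before decrypting, could be sketched along these lines. As assumptions for illustration only: HMAC-SHA256 stands in for the MAC function, `toy_encrypt` / `toy_decrypt` are an insecure standard-library stand-in for a real cipher such as AES, and the `stats` counter and all function names are hypothetical. The counter is there just to make visible that a modified ciphertext is never decrypted.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = 32
NONCE_LEN = 16
stats = {"decryptions": 0}  # counts how often the receiver actually decrypts


def mac(key: bytes, data: bytes) -> bytes:
    """Stand-in MAC function (assumption): HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (assumption); a real system would use AES."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, ks))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stats["decryptions"] += 1
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))


def encrypt_then_mac(k1: bytes, k2: bytes, message: bytes) -> bytes:
    """Sender A: ciphertext = E(K2, M), then send ciphertext || C(K1, ciphertext)."""
    ciphertext = toy_encrypt(k2, message)
    return ciphertext + mac(k1, ciphertext)


def verify_then_decrypt(k1: bytes, k2: bytes, packet: bytes) -> Optional[bytes]:
    """Receiver B: check the MAC of the ciphertext first; decrypt only if it matches."""
    if len(packet) < TAG_LEN:
        return None
    ciphertext, received_tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(mac(k1, ciphertext), received_tag):
        return None  # modified ciphertext: discard without decrypting
    return toy_decrypt(k2, ciphertext)


k1, k2 = secrets.token_bytes(16), secrets.token_bytes(16)
message = b"meet at noon"
packet = encrypt_then_mac(k1, k2, message)

# Attacker modifies one bit of the ciphertext part.
tampered = bytearray(packet)
tampered[NONCE_LEN] ^= 0x01
rejected = verify_then_decrypt(k1, k2, bytes(tampered))
decryptions_after_reject = stats["decryptions"]

recovered = verify_then_decrypt(k1, k2, packet)
print(rejected, decryptions_after_reject, recovered)
```

The receiver never runs the decryption on a ciphertext that fails the check, which is the property that matters for chosen ciphertext attacks.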
But in practice, in most cases, there's not much difference in security. And only in very special circumstances would you need to distinguish between the two. So both are okay. In theory, the bottom one is slightly better. In practice, sometimes people have made mistakes in implementing the bottom one. So even though it's better in theory, if you make a mistake and implement it wrongly, then you get no gain. So people make arguments for both of them. Both are okay. And in fact, both are commonly used when we encrypt data. When we encrypt data, we don't just encrypt it, we also use a MAC. In our last topic, we'll look at internet security, and we'll see some examples where we use AES and other ciphers, and we'll see that MACs are often applied as well. So the middle one is called MAC then encrypt: we calculate the MAC and then encrypt. And the bottom one is encrypt then MAC: we do the encrypt and then the MAC after. Any questions on those two schemes?
Do you have to know these? Well, you should be able to look at such a picture and explain how the schemes work, to explain, if the attacker does something like modify a message, why the receiver detects that. We went through several cases with the top scheme. What if the attacker modifies a message? What if they try to use a different key? Why does this scheme work? Why does it detect the modification or masquerade? And that's what I expect with these authentication schemes. Not necessarily to remember these schemes, but I may give you such a picture in an exam and ask you to explain why the receiver can detect a modification attack. So I don't require you to remember these pictures. You'll see that in past exams.
Let's finish by just briefly mentioning what MAC functions there are. We've talked generally: we apply a MAC function on the input message with a key. How do we implement MAC functions?
There are many different MAC algorithms, and there are some which are considered secure. That is, there are no known attacks which are better than the two we described, on the key and on the tag. So their security depends upon the length of the key and the length of the tag, or MAC value. What are the names of those algorithms? We will not go through any step by step. Some are listed here. Many of the MAC algorithms are based upon symmetric key ciphers. A MAC and symmetric key encryption are very similar. Symmetric key encryption: take a message and a key, produce a random looking output, the ciphertext. A MAC: take a message and a key, produce a random looking tag. And similar properties: two different inputs, two different outputs. So most of them are based upon symmetric key encryption. There was something called the Data Authentication Algorithm, DAA, based upon DES. It was one of the earlier ones, but it's now considered insecure. And then there are different algorithms that use symmetric key ciphers like triple DES or AES in different modes of operation. So the cipher-based message authentication code, CMAC, uses existing ciphers and applies a mode of operation such that a fixed length output will always be produced. And there are many others. There's OMAC, PMAC, UMAC, VMAC and so on. They usually take existing algorithms and modify them just for the purpose of a MAC.
And then HMAC. HMAC is one we'll see come up in some practical cases, and we'll need to talk about the next topic to understand it. HMAC uses existing hash functions, not existing encryption functions, but existing hash functions like MD5 in the past, and SHA. It takes those hash functions and turns them into a MAC function. What a hash function does is take a message only as input and produce a fixed length, random looking output, a hash value. So a hash function, as we'll see in the next topic, takes just a message as input. A MAC function takes a message and a key.
So what HMAC will do is use an existing hash function, the uppercase H here, take a key as an additional input, and do some operations such that it produces a fixed length tag as output. So we'll see HMAC come up in some examples, but it turns a hash function into a MAC function. One of the reasons for using hash functions is that people have studied some hash functions in a lot of depth and they understand their advantages and disadvantages, so it makes sense to reuse them. To understand HMAC, we need to understand hash functions. So that's our next topic.
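To make the contrast between a hash function and HMAC concrete, here is a small sketch using Python's standard `hashlib` and `hmac` modules. The choice of SHA-256 as the underlying hash function, the example message and the example keys are all assumptions for illustration; the internal operations HMAC performs on the key are not shown.

```python
import hashlib
import hmac

message = b"transfer 100 to account 42"

# A hash function takes only the message as input, so anyone,
# including an attacker, can recompute the same hash value.
hash_value = hashlib.sha256(message).digest()

# HMAC takes the message and a key. Without the key, the tag
# cannot be recomputed. Both keys below are example values.
key_ab = b"secret key shared by A and B"
key_other = b"some other key"

tag_ab = hmac.new(key_ab, message, hashlib.sha256).digest()
tag_other = hmac.new(key_other, message, hashlib.sha256).digest()

print(len(hash_value), len(tag_ab))  # both fixed length: 32 bytes for SHA-256
print(tag_ab != tag_other)           # same message, different key, different tag
print(hash_value != tag_ab)          # the keyed tag is not just the plain hash
```

The output length is fixed by the underlying hash function, regardless of how long the message or the key is.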