So, up until now, we've focused mainly on which security service? When we talk about DES, the classical ciphers, AES, RC4, which security service did we use in most of the examples? What security service do you think we've covered most? What are the six security services? That's an easy question from the first quiz, the first few lectures. I mean, remembering the six security services, even I have trouble sometimes. We've got confidentiality, keeping our messages secret; authentication, making sure that we're communicating with the right person; access control, controlling who can access the computer system or network resources; data integrity, making sure the messages are not changed between source and destination; non-repudiation, making sure someone cannot deny having sent a message; and availability, stopping denial of service attacks. Which of those six do you think we've addressed most so far? I hear many different answers. Which of the six services have we addressed the most? Anyone? I think the map won't tell you, it may lead you out of here, but that's not going to help. Which service have we been covering when we looked at the encryption techniques? What's the main idea when you think of encryption? The main idea, as we've used it in examples, is to keep a message secret, which is the service of confidentiality. I want to send a message to someone else, and I want to make sure no one else can read that message. This is the confidentiality service, and that's what we've used in most of our examples. We take our message, encrypt it with DES, send the ciphertext, and no one else can find the message. Instead of using DES, we can use AES, we can use RC4, we can even use RSA. But we've just seen with public key cryptography, with Diffie-Hellman, that we've got a different purpose, exchanging keys. And we also saw, and this was question one in the quiz, that with RSA we can also provide authentication.
With RSA, if we use the keys in the opposite order, private key, then public key, we can authenticate and confirm where a message came from. For the next few topics, starting from hash functions and MACs and then key distribution, we're going to move more towards these services of authentication and data integrity and even some of the others. That is, making sure what we receive comes from the person who they say they are, and making sure nothing is modified along the way. So data integrity and authentication. So we need to know the techniques that support them. Before we start those techniques, from now on, we're going to assume that the encryption algorithms we use are secure. So we will not say that we must use AES or DES or triple DES or RC4 or some other algorithm. We'll just assume that we have some algorithm and that it is secure. And the key length used is also considered secure. A brute force attack against the key would not be successful in practice. So from now on, when we say we use some encryption algorithm, assume it's a secure algorithm, which means that if an attacker has the ciphertext, then they cannot find the plaintext or the key, because to find the key they would need a successful brute force attack or the algorithm would need to be insecure. So when we analyze how hash functions are used, and other authentication techniques, we're going to assume that whenever we have an encryption function and corresponding decryption function, it works as we expect for a secure algorithm. There's no way to break it. So now let's explain what a hash function is and what it is used for in security. Everyone remembers hash functions? Where did you cover them? Hash functions. Have you seen them before? Yes? Maybe you've seen them in a different course. Something about passwords. Where did you see that? What course? PHP. Where did you study PHP? Which course? Lab last semester. Okay.
You may have seen a hash function used for storing passwords. We're not saying encrypting passwords, but storing passwords. In fact, we'll look at that in more detail in another topic. Have you seen them elsewhere, or heard of hash functions in maybe a programming course or algorithms? Possibly when you look at storing data, or data structures and algorithms, maybe a course on that where you store things in a table, you may use a hash function as an index to look things up in tables. So maybe even in first year or second year, you may have heard about hash functions. We're going to look at hash functions from the perspective of providing security and what they're used for. In simple terms, what is a hash function? Some function, here denoted as uppercase H, that takes some input M and returns some value called a hash value, lowercase h. So it's a bit confusing. Uppercase H is the function. Lowercase h is the result of that function, the value. Normally the input is a variable length block of data. So it may be 10 bytes, it may be 100 bytes, a million bytes. The length of the input may be variable. And the output, the hash value, is usually fixed length and usually small. And when we apply some hash function on some input, we'd normally expect to get some random looking output. That is, the relationship between the input and the output is not easy to determine. We'll see that, to use hash functions in cryptography, the way that they're used, we require them to have the following two properties. They don't necessarily need both of these properties, and we'll cover later which properties are needed in which cases, but sometimes they need these properties. The first is that it's hard, that it's practically impossible, if you know the hash value, to find the value M that produced the hash value. So a hash function takes some input M and returns the hash value h. It should be easy to take the hash of M and get lowercase h, and it should be hard to take lowercase h and find M.
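To make the basic definition concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard `hashlib` module as the hash function; the lecture does not name a particular algorithm, so that choice and the example inputs are illustrative only. It shows a short input and a very long input both producing a hash value of the same fixed length, and two similar inputs producing unrelated-looking hash values.

```python
import hashlib

def H(message: bytes) -> bytes:
    """Hash function H (SHA-256 assumed here): any-length input, 32-byte output."""
    return hashlib.sha256(message).digest()

# Variable-length inputs: 2 bytes and one million bytes.
h_short = H(b"hi")
h_long = H(b"a" * 1_000_000)

# The hash value has the same fixed length in both cases.
print(len(h_short), len(h_long))

# Two inputs that differ by a few characters give very different-looking hash values.
h1 = H(b"increase pay by 10000")
h2 = H(b"decrease pay by 10000")
print(h1.hex())
print(h2.hex())
```

Going forward from a message to its hash value is a single function call. There is no corresponding call that takes a hash value back to a message, which is the one-way behaviour we want.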
So we'd say it's a one-way function. It's easy to go in one direction, from M, and calculate h, but it's practically impossible to go the opposite direction, from h, to get M. Even if we know the algorithm, it should be hard to do the inverse. That is, it should be easy to calculate H, but it should be practically impossible to calculate the inverse of H. And we'll see some examples of that. It's not so hard to find algorithms that do that. So property one there: a hash function takes some message as input, this is the hash function, and returns a hash value as the output. What we're saying is that should be easy. If I have M, I should be able to calculate lowercase h easily. But if I have lowercase h, calculating the inverse, that is the opposite direction, taking h and finding M, should be hard. It's practically impossible if we have reasonable size values and a secure function. So we need a function that we can do in one direction but not the opposite direction. We'll see some reasons why that's necessary, or under which conditions it's necessary. Another property that we will often desire is that if we have two different messages, M1 and M2, and we use the same hash function, H, then when we hash those two different messages, the values that they produce are different. That is, if the hash of M1 produces h1, and a different message M2, using the same hash function, produces h2, what we'd like is that h1 should not be the same as h2. They should be different. That's what the second property is saying. When we hash two different messages, we get two different values. Will it always happen? Is that possible given what we've said about the hash function? It's impossible to guarantee. Come back to the first requirement, or the characteristic of a hash function. We said that our function should be able to take a variable length input and produce a fixed length output.
Assuming the input can be larger than the output, it means that some input values must map to the same output value. If two different inputs map to the same hash value, this property doesn't hold. We get what's called a collision of the hash values. For example, we said my input was variable length. Let's say my input message is 1,000 bits, the length of the message is 1,000 bits. How many possible messages are there? Two to the power of 1,000 possible messages. So my hash function must be able to take two to the power of 1,000 different messages as input, and for each one produce a hash value. If the hash value was also 1,000 bits, then there are two to the power of 1,000 possible hash values. In that case, it's possible in theory that we do not get such a collision. That is, we never get the same hash value for two different messages, because we have the same number of hash values as we have messages. Now the problem arises: what if our message is larger? What if it was 2,000 bits in length? Because we say we should allow a variable length, any size message. If our message was 2,000 bits in length, then there are two to the power of 2,000 possible messages. Our hash is still a fixed size, 1,000 bits. It means now there are going to be some messages that produce the same hash value, because we have more messages than hash values and every message needs to map to some hash value. Therefore some of those input messages, which are different, will map to the same hash value. So we've got a contradiction here, in that our requirement is that our hash function will take a variable length input and produce a fixed size, and normally a small sized, output. So what we say as the requirement is that it should be practically impossible that two given messages produce the same hash value. The important thing is practically impossible, or computationally infeasible. Yes, it's possible that two messages produce the same hash value.
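The counting argument can be tried on a small scale. The sketch below is an illustration only: it builds a deliberately tiny 8-bit hash by keeping just the first byte of SHA-256 (the function `tiny_hash` and the message strings are made up for this example). With only 256 possible hash values, hashing 257 different messages is guaranteed to produce at least one collision.

```python
import hashlib

def tiny_hash(message: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256, so only 256 possible values."""
    return hashlib.sha256(message).digest()[0]

def find_collision(num_messages: int = 257):
    """Hash distinct messages until two of them share a hash value."""
    seen = {}  # hash value -> first message that produced it
    for i in range(num_messages):
        m = f"message {i}".encode()
        h = tiny_hash(m)
        if h in seen:
            return seen[h], m, h
        seen[h] = m
    return None  # cannot happen when num_messages > 256

collision = find_collision()
m1, m2, h = collision
print(m1, m2, h)
```

A real hash function uses a much longer hash value, for example 256 bits, so the same collisions still exist in theory but finding one by searching like this is not practical.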
In theory, we just need to make it such that it's very unlikely. So in theory, two messages can produce the same hash value. We need to make it such that it's very unlikely that two messages will produce the same hash value and we'll come back to that and how to do that as we go through these slides. So we'll return to these two properties with some more detailed examples soon. What do we use a hash function for? Main reason is used to determine whether or not data has changed. We're going to use it and we take a hash of some data, M, we get a hash value and then when someone receives this data, they'll make use of the received hash value and the received data to try and check whether something has changed between the sender and receiver. In terms of security, the hash function is used in many different areas, not just in authentication, so it's used to authenticate messages, which means when I receive a message, I need to be able to confirm that that message came from someone else or came from who they say they are and has not been modified along the way. That's important. It's no good if I receive a message and it's been changed between the sender and me. I need to be able to confirm if it's the same or not and we use hash functions for that. Like you saw in your programming, your lab last semester, you can use hash functions in storing passwords and we'll see some examples in another topic. Digital signatures, when you sign a piece of paper, sign a document, that's some confirmation that it came from you. We have the same in computer networks and in computer systems, we have a digital signature where we can sign some piece of information to confirm it came from you and we'll use hash functions there. Things like detecting viruses. 
If there's a virus incorporated in a piece of software, we can apply hash functions to try to confirm if that piece of software or that file contains a virus or not, to detect signatures of viruses, viri, or however you form the plural. Hash functions have also been used in random number generators. So hash functions are used in many different areas in security, and that's why we have a topic on them. This is a visualization of a hash function H here. It takes some variable length, often large, input, some message, and produces a short fixed length output, the hash value. Because the message can be larger than the hash value, there's a chance that multiple messages will produce the same hash value. This also shows that in practice the input message will often have to be a certain length. For example, a multiple of some number of bits or some number of bytes. So the hash function may take something that is a multiple of, say, 8 bytes. So if your message is 7 bytes, you need to pad it out to 8 bytes. That's the concept there. If your message is 13 bytes, you need to pad it out to 16 bytes. So often algorithms will have a block size or a length that is required. And often the length of the original message may be included in that padding, to make it up to an integer number of blocks. But that's not an important part of the security at the moment. Let's go through some cases, especially related to authentication, of how a hash function is used. Here we want to verify the identity of the sender. I receive a message, and I want to make sure that the message came from the person who said they sent it. And also that the data received has not been modified. So verify that the data received is exactly the same as what was sent. And this is called message authentication. Authenticate a message that I receive.
And when we use a hash function to provide such authentication, sometimes the terminology used is that the extra information we attach to a message is called a message digest. Let's see that in some examples. So we're going to go through three or four examples that show a hash function in use, combined with our other security primitives. And let's first explain what this diagram shows. It shows our source, our sender, user A, sending something to user B at the destination. And you can imagine this side is at the source computer. And they send something across the network, which is this gray box in this example. And the destination receives that and performs some operations. What are the operations in this example? M is the message that we start with. H means apply a hash function. So this notation used here is saying we take a message and we apply a hash function on that message. And we get a hash value as output. Some other operations we see: a concatenation operation, these two bars, the two vertical bars. That is short for concatenation, which means simply combine the two inputs together to make one output. So here we take the message and add the hash value to the end of that message. We concatenate the message with the hash value. And the output goes into another operation. E is short for encrypt. We've seen this in other topics. So we apply an encrypt function. And we use, with this encrypt function, key K. Normally when we use the notation K, we're assuming symmetric key cryptography. And it's obvious here, because we'll see at the destination we use the same key K, symmetric key cryptography. D is, of course, decrypt. We take some input, decrypt using key K, and we get some output. And again, we have a hash operation and a compare. So that's just the notation we'll use and we'll see in these diagrams. What is happening here? In this case, we have an example where we're encrypting a message. And we're also using a hash of that message to provide some form of authentication.
So the source takes their message, calculates the hash of the message, gets a hash value as output, and then they combine the original message and the hash value. So we can write that. You can draw on your slides, but here's the concatenation operation on the diagram. The output is the message concatenated with the hash of the message. And that's the value at this point. Take M, take the hash of M, H of M, and then concatenate them together. Concatenate simply means, if M is 1,000 bits, for example, and the hash of M is 100 bits, just combine those bits together. So we get 1,100 bits. Just add one at the end of the other. And that's the output of the concatenation operation. And then we encrypt. And here we use a symmetric key cipher. For example, AES, triple DES, it doesn't matter. It's not specified. And using a secret key K. And our normal assumptions hold. Only A and B know the secret key K. If someone else knows it, it's not secret. And as the output of this, and we see it written under the message here, we have our message concatenated with H of M, all encrypted using secret key K. So that's the ciphertext that we're going to send across the network. We send that across the network. An attacker may intercept that. So again, another assumption we make is that anything that we send across the network, an attacker, a malicious user, can intercept and see that message. So the attacker can see this encrypted value sent across the network. They may not be able to see what's inside it, what the original message is, and we'll check that. What the destination does is decrypt that message with the secret key K. And when they decrypt, assuming they've got the right secret key, assuming nothing's gone wrong, what's the output? From the decrypt operation, what's the output? Write down the output. What do you get, in the notation we're using? It's M concatenated with H of M.
What we received was m concatenated with h of m all encrypted with key k, and then we decrypted using key k, therefore we'll get the original input as an output. So the output of this d operation here is m concatenated with h of m. Because again, our assumption of our encryption algorithms, we take some plain text, encrypt that plain text with some key k. When we decrypt the cipher text with the same key k, we'll get the original plain text out, which is what we get at the end of the output of the d operation. And then what the destination does is perform some authentication. It performs a check or a comparison between the message received. So now we split this up into two parts. So we have the message on its own and the hash value. How do we split it up? Well, this assumes that the receiver knows, for example, the length of the hash value. If the message was, if the length, for example, the message was 10,000 bytes and the hash value was 128 bits, which is an example length of the hash value for some common algorithm. And since the hash algorithm is known, the hash algorithm is not secret, it's known, the receiver knows the length of the hash value. So what they receive is some message which is 10,000 bytes plus 128 bits. All they do is take the last 128 bits and assume that's the hash value and the first or the remaining bytes are the message. And now they do a comparison. They take the hash value received and the message received, m, calculate the hash using the same hash function or hash algorithm as the sender used and get a hash value and compare them. That is, they compare the received hash value with the calculated hash value. The one calculated based on the received message. So in fact, we have two hash values here. H of the received m and the received hash value here which is H of the sent m. Assuming nothing's been changed, then this is the hash of the sent message. This is the hash of the received message. We compare them. 
If they are the same, we assume the message sent and received are the same. That's what we need to determine how this prevents the attacker from changing the message. So yes, we'll come to that. What happens or the question will be, can the attacker defeat the security of this system? What this example mechanism is trying to provide is the receiver should be able to authenticate the message. Make sure the message hasn't changed and what they do is they take the message received, calculate a hash of the message received and the received hash value. If they are the same, they assume that the message received is the same as the message sent. The reason being is one of our properties of the hash function. If we have two different messages, we shall get two different hash values. Or the other way we can think is if we get two hash values which are the same, it means the message must be the same. Let's follow that back. The lower gray box here, the hash of M is the hash of the original message, the sent message. And also we calculate the hash of the received message. So we write them down, slightly different notation. Let's say we have M sent, the message sent. And what we've received here, we have the message received, that's this M in the diagram. And this gray box at the bottom is the hash of the message sent. If you follow that back, we see the message sent, take the hash, concatenate with the message, encrypt and send, decrypt. So the output is the message received and the hash value received, or the hash of M sent. Lower case hash means the hash value here, so the hash sent, and we compare them. So we take now the hash of the message received, we'll get some hash value, and we compare these two. That's here. If they are the same, we assume that the messages must be the same, which means the message sent and the message received is the same. So this is providing a service of data integrity, but we'll see also it's providing the service of authenticating the user. 
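The steps of this first mechanism can be sketched in Python. Two things in the sketch are assumptions rather than anything from the diagram: SHA-256 stands in for the hash function H, and the functions `E` and `D` are a toy XOR keystream cipher written only so the example runs with the standard library. The toy cipher is not secure and is not a substitute for AES or another real cipher. The lecture's assumption of a secure encryption algorithm applies to whatever real cipher would take its place.

```python
import hashlib

HASH_LEN = 32  # length in bytes of a SHA-256 hash value; known to the receiver

def H(message: bytes) -> bytes:
    """Hash function H (SHA-256 assumed)."""
    return hashlib.sha256(message).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from hashing key || counter. Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def E(key: bytes, plaintext: bytes) -> bytes:
    """Toy symmetric encryption: XOR the plaintext with the keystream."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

D = E  # XOR with the same keystream undoes the encryption

def send(key: bytes, message: bytes) -> bytes:
    """Source A: compute E(K, M || H(M))."""
    return E(key, message + H(message))

def receive(key: bytes, ciphertext: bytes):
    """Destination B: decrypt, split off the last HASH_LEN bytes, compare hashes."""
    plaintext = D(key, ciphertext)
    message, h_received = plaintext[:-HASH_LEN], plaintext[-HASH_LEN:]
    return message, H(message) == h_received

K = b"secret key shared by A and B"
message = b"Hello B, this is A."
ciphertext = send(K, message)

recovered, ok = receive(K, ciphertext)
print(recovered, ok)

# Someone decrypting (or having encrypted) with a different key fails the check.
_, ok_wrong_key = receive(b"some other key", ciphertext)
print(ok_wrong_key)
```

The receiver splits the decrypted data purely by length, which works because the hash algorithm, and therefore the hash length, is public.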
We'll see the use of the secret key provides authentication of the user in this case. How do we know it came from user A? Because only user A has the secret key K. It would not, if user C sent this message, they would not be able to encrypt it with the same secret key K that we decrypt with, and therefore that provides authentication of the user. So in fact, here we have the symmetric key cryptography providing authentication of the user. So when we talk about authentication, we often mean both user authentication and data authentication or data integrity, the two services of authentication and data integrity. In fact, this one also provides confidentiality because we've encrypted the message. We may return to this one to answer the question, well, what can the attacker do to defeat this? How can the attacker change something along the way such that the receiver either thinks it came from user A, but in fact it came from the attacker, or the receiver receives a message which has been modified from what the source A sent. What can the attacker do to illustrate or to defeat that mechanism? Let's go through even a simpler example to show that because then we'll return to that one. Here's another variation of that mechanism, but without the encryption of the message. Here, similar, we take a message, we take a hash of that message, and we encrypt the hash value. We don't encrypt the message. We see the message here is sent. We encrypt the hash value with some secret key K, combine the original message and the hash, or the encrypted hash, and send them together. In this example, the message is not encrypted and therefore there's no confidentiality. In the previous example, the message was encrypted. We did have confidentiality. So let's go through the simpler one to see the attacker and what they can do, and then we'll return to the more complex one. So no confidentiality, but can we provide data integrity and authentication of the sender? 
We send the message and this encrypted hash value. What the receiver does is take the encrypted hash value, decrypt it with the same key, take a hash of the message received, and compare those hash values. And they, again, make the assumption that if they are the same, then the message has not been modified and the message came from source A and not from someone else. So the way to see if that works is to see what happens if the attacker modifies something. What can the attacker do in this case? Or what can they attempt to do? Modify the message. Okay, let's see. Let's be the attacker, and what we'll do is intercept this message and change something. So if we write the message down, what was sent? Here, the output of the concatenation is the message M concatenated with the encrypted hash of that message. So I'd write that as M concatenated with E, using key K, of the hash of the message. That's what is sent across the network at this point here. Let's say the attacker now modifies the message. So this is intercepted by the attacker. They change the message and then they send it on to B, now with the changed message, not M but M prime. It's a different message. The original message was reduce Steve's pay by 10,000 baht. The new one is increase Steve's pay by 10,000 baht. And let's say they do not modify anything else. So they just change the message and send the same value here, which is the encryption using key K of H of M. That is sent to B. So all that's different is the message. M has changed to M prime. And let's see what B does. B takes the received message, in this case M prime, that's here, and calculates the hash of that received message, H of the received M. So they calculate the hash of M prime and they get some value as an output here. And they also take the received encrypted hash value here and decrypt that. So they decrypt this part. What do we get when we decrypt this part with key K? We get the input here, H of M.
In fact, so here what was encrypted was the hash of M, uppercase H. So when we decrypted, we get the hash of M as the output. And that's here. Now we compare the two values. Are they the same? This is the hash of M prime. This is the hash of M, where M prime is different than M. Are the hash values the same? Should be unlikely. Remember, we say one of the desired properties of our hash algorithm is if we take two different input messages, we will get two different hash values. Let's assume that for now. In theory, they may be the same, but in practice we'll see that they're very unlikely. If they're not the same, and they will not be in this case because we get two different hash values, then the receiver doesn't trust what they receive. They recognize something's gone wrong. So they don't know what's gone wrong, but if they are not the same, when they compare these two values, hash of M prime and hash of M, if they're different values, they assume what they've received is not trustworthy and then must take some other action to fix that. For example, they just discard the message or they contact A in some other means. So this is detecting that something went wrong. Because the hash values don't match, because the messages are different. So in this case, our security mechanism has been successful. The attacker modified the message, the receiver detected that. They didn't detect what the real message was, but they at least detected that there's been something modified. Something went wrong. So this attack was unsuccessful. That is, the security mechanism was successful. So what else could the attacker do to defeat this? What could they try? Okay, so we see the problem that arise here was that we changed the message, the attacker changed the message, but the encrypted hash value was still the same as the original. So what could they try and do? Well, they could try and encrypt the hash of the modified message. 
That is, change the message, calculate the hash of that modified message, and then encrypt that. With what key? They don't know the same key K. Okay, so there's the problem. If they knew the key K, they could encrypt the hash of the modified message and then send that and the receiver would accept that. But our assumption is the attacker doesn't know the secret key K. So the only thing they could do is encrypt with some other key. If they encrypt it with some other key, when we decrypt over here, we'd get some output and we'd compare to the hash and that'd be wrong. Because when we decrypt with the wrong key, we get some different output. We don't get the original input. So again, we would detect that at the receiver if they use the wrong key here. And we assume the key K is secret to A and B. The attacker doesn't know it. So this modification attack is unsuccessful. This mechanism detects it at the receiver. And that's one example of how using a hash function is used to provide authentication of messages. Let's have a break and we'll come back and continue with some of these similar examples to illustrate hash functions in use and what the attacker can do.
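The two attacks just described can be replayed in a short Python sketch. As before, the choices here are assumptions for illustration: SHA-256 stands in for H, and `E` and `D` are a toy XOR keystream cipher that is not secure and only exists so the example runs without extra libraries. With this toy cipher an attacker who can read M could actually recover the keystream, which is exactly why the lecture assumes a secure cipher. The sketch only replays the two specific attacks from the discussion.

```python
import hashlib

HASH_LEN = 32  # bytes in a SHA-256 hash value

def H(message: bytes) -> bytes:
    """Hash function H (SHA-256 assumed)."""
    return hashlib.sha256(message).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from hashing key || counter. Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def E(key: bytes, plaintext: bytes) -> bytes:
    """Toy symmetric encryption: XOR with the keystream."""
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

D = E  # the same XOR decrypts

def send(key: bytes, message: bytes) -> bytes:
    """Source A: M || E(K, H(M)). The message itself is not encrypted."""
    return message + E(key, H(message))

def receive(key: bytes, packet: bytes):
    """Destination B: decrypt the received hash and compare with H(received M)."""
    message, enc_hash = packet[:-HASH_LEN], packet[-HASH_LEN:]
    return message, D(key, enc_hash) == H(message)

K = b"secret key shared by A and B"
m = b"Reduce Steve's pay by 10,000 baht"
m_prime = b"Increase Steve's pay by 10,000 baht"

packet = send(K, m)
_, ok_original = receive(K, packet)

# Attack 1: change M to M', keep the original E(K, H(M)).
tampered = m_prime + packet[-HASH_LEN:]
_, ok_tampered = receive(K, tampered)

# Attack 2: change M to M' and encrypt H(M') with a key the attacker made up.
forged = m_prime + E(b"attacker's guess at the key", H(m_prime))
_, ok_forged = receive(K, forged)

print(ok_original, ok_tampered, ok_forged)
```

In both attacks the comparison at B fails, so B knows something went wrong, although not what the original message was.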