So, we've gone through the concepts of encryption. Some of the details that we mentioned yesterday about all the different algorithms, you don't need to remember. The key points that you should get out of this part so far are those concepts of encryption and some of the assumptions that we're going to make. So, with encryption we take our plain text, we apply an algorithm and we get ciphertext. And we're currently talking about symmetric key encryption. We take a key as an input to that algorithm, the ciphertext is produced as output, and that same key is used to decrypt. We arrived at a set of assumptions after going through a few examples, but I'll just jump through to the assumptions slide. Using the notation: we encrypt using one key, say the key shared between A and B, and we get ciphertext. When we decrypt using the same key, we assume we'll get the original plain text. Similarly, if we decrypt using the incorrect key, we assume we'll get the wrong plain text and that the person decrypting will be able to recognize that it's wrong. Okay, so, someone gives me ciphertext and I don't know the key. I try a key, which is the wrong key. When I decrypt, we assume that the operation will produce something such that I can recognize that the key didn't work and the plain text is wrong. Equivalently, we'll be able to recognize the correct plain text. Let's look at that a little bit and extend on the properties, or particularly the services, that this starts to provide. We have our user, user A. Let's say they want to send some data across a network to user B. Let's go through some of the basic services symmetric key encryption provides. We want to send data and A has some plain text. I'll denote it as PT. It's the plain text that A wants to transmit to B. I'll use a subscript T to say this is what we want to transmit, because we'll distinguish between what's sent and what's received shortly.
Before we send the plain text, if we want to provide the service of confidentiality, we want to keep this plain text confidential, we encrypt it. We encrypt and we get some ciphertext, C, T, and we apply some algorithm, some encryption algorithm, whether it's DES, triple DES, AES or some other cipher. As an input, we use a key, a key shared between A and B, and we take the plain text as input. This is the operation performed at A. We encrypt our plain text and we get ciphertext and now we send the ciphertext to B. We can think A sends C, T. What does B receive? C. Let's denote what B receives as C, R. Normally we'd think if I send a value, C, T, across the network, then we'll receive exactly the same. We'll receive C, T. But later we'll look at what if an attack occurs where someone tries to modify it. That's why I'll distinguish between what's transmitted and what's received. It may be what's received is different from what's transmitted, but so we'll distinguish between sending and receiving. B receives. What does B do when it receives the ciphertext? Decrypts how? What key? The key shared between A and B. We're using symmetric key encryption, so to decrypt, we're going to try and decrypt this ciphertext, and to do so we must use the key that's shared between A and B. In particular, the key which was used for encrypting. For this to work, we assume that B knows that key. Maybe we'll note that, maybe I should have done it before, but some sort of assumed knowledge was that A had a key beforehand, and so did B. It had the same value. You cannot see it. Okay, I'll change that next time. It's KAB. What are we doing? Decrypting. So, we decrypt with KAB the received ciphertext, and we assume that if something's encrypted with some key, okay, PT was encrypted with some key to get the ciphertext. If we decrypt with the same key, we'll get the original plaintext now. But we'll denote it first as simply PR. So this is the plaintext received by B. 
B decrypts CR using the key from its shared with A because it knows the message came from A and gets some plaintext PR. Let's consider some cases first, the normal case. Let's consider the case CR equals CT. That means that there was nothing modified along the way. A transmitted some ciphertext, and the bits that were transmitted is what's received by B. So this is the normal case. We send a ciphertext and we receive it. What can we say about PR in that case, in particular with respect to A and PT? What's true? If CR equals CT, what does PR equal? PT, okay? Our assumption is that so CT and CR are the same. So CT, so we can almost think of it, we remove the subscripts, take PT, encrypt with KAB, we get the ciphertext, decrypt that same ciphertext with the correct key, the same key, then we must get the original plaintext. This is our assumed property of encryption. If the two ciphertexts are the same, then that means that the received plaintext equals PT, the transmitted, which is what we'd expect. So again, one of these assumptions that we're making is that encrypt with the correct key and then decrypt with the same key, you'll get the original input back, the plaintext. Let's consider some other cases. What about the case if the received ciphertext, CR, was not equal to CT, why? Even more than that. So in this case, why would I say that CR is not equal to CT? Maybe we transmit something across the network, a sequence of bits, and somewhere along the way there's an error or with respect to security, someone malicious modifies the message. We send a sequence of 64 bits and someone changes some of those bits. So what's received is different from what is transmitted. So follow through what happens. B receives CR, decrypts with KAB, and what do they get? PR, and correct, PR will not be the same as PT because similarly, if we use the wrong key, if we use the wrong ciphertext, we will not get the original plaintext back. So that's an important property of our encryption. 
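To keep the two cases straight, here is the notation so far written out compactly. The symbols E and D are shorthand introduced here for "encrypt" and "decrypt"; they are an assumption for this summary, not taken from the slides.

```latex
\begin{align*}
C_T &= E(K_{AB}, P_T) && \text{A encrypts and transmits } C_T \\
P_R &= D(K_{AB}, C_R) && \text{B decrypts whatever it receives} \\
C_R = C_T &\implies P_R = P_T && \text{normal case} \\
C_R \neq C_T &\implies P_R \neq P_T && \text{error or modification in transit}
\end{align*}
```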
We encrypted PT to get CT. If we try to decrypt a different ciphertext using the same key, we will not get that original plaintext back. So the plaintext values will not be the same in this case. And what's more, we'll say PR is incorrect and we assume that B will be able to know that. When they decrypt, they get some plaintext PR and really it's a random sequence of characters. If they convert it back to the message format that they think it was, say an English message, it doesn't make sense. So our assumption is that B will be able to detect and know that this is incorrect. So it's not just that it's not equal, but really they know it's not equal because they recognize that the received plaintext is incorrect. If we decrypt something where we've used either a different key or a different ciphertext than what we used in the encryption, then the receiver will be able to detect that. So B knows this is incorrect. In fact, this leads to one of the services that we provide in security systems. The first one, remember we were providing confidentiality. The idea is that when A sends a message to B, no one else can read the message. How does that work? Why is P confidential? Why can no one know P, P-T-P-R, why? You're the malicious user now. You've intercepted the ciphertext. What do you need to do? Let's draw you, let's go back to the case one. You are the malicious user now. You've intercepted the ciphertext. You want to learn the plaintext. What do you need to do to learn the plaintext? Or what do you need to know? You need to know the key KAB because if you know the ciphertext, similar to B knows the ciphertext, to get the plaintext we must decrypt it. So in fact, we need to know the key to decrypt. And that comes back to one of our assumptions that is the key KAB is secret. The malicious user doesn't know it. If they did know it, it wouldn't be secret. So the malicious user, since they don't know the key, what can they do? How can they get the plaintext? Try. 
What did you try? Brute force. So you could try different keys. All right, let's try. Consider that our malicious user has intercepted the ciphertext. What they try to do is decrypt the ciphertext, let's say CR; here we can just say CT, assuming they're the same. What key do we decrypt with if we're the malicious user? Well, we can try all the keys. A brute force attack is to try different keys: first K1, then K2, and so on for any value of K. And we get some plaintext as output. Let's call it P1. So what the malicious user does is take the ciphertext, decrypt with some key, and get some plaintext. What do they recognize when they get that plaintext? Assuming K1 is not the same as KAB, that is, it's not the correct key, again we recognize that P1 is incorrect. That's one of our assumptions. When we decrypt some ciphertext with the wrong key (K1 is wrong because it's not the one that was used for encryption), we'll get plaintext that we'll recognize is incorrect. The malicious user will realize, ah, K1's not the right key. Let's try a different one. They'll try to decrypt the same ciphertext with a different value, K2, and they'll get a plaintext value, P2. And then again they'll recognize that K2 is not the correct key because the plaintext P2 is incorrect. So this is one of our assumptions: we can recognize that the plaintext is incorrect. And a brute force attack is to just keep trying with all the other keys. How do we stop a brute force attack? How do we prevent our malicious user from being successful? Set a limit? No. I'm A or B, or I know them, and I don't want the malicious user's brute force attack to succeed. A longer key: make the key long, because again the approach of the malicious user is to try K1, some random key or maybe the key of all zeros. Doesn't work, try K2. Doesn't work, try K3.
They need to try all possible keys in the worst case. So to prevent a brute force attack being successful, make sure that the number of keys that the malicious user has to try is very large. And we gave some numbers yesterday of what's very large. Usually 128 bits is considered very large because it will require what millions of years to do a brute force attack. So brute force is easy to prevent by making sure the key is large enough. Such that it will take too long to try all of them. If the malicious user was lucky or if they, we had a short key, then they could try all and they would find the correct key if they keep going. But we assume again that we have a key which is large enough such that brute force won't work. So if the malicious user cannot find the key via brute force, what else can they do? How else can the malicious user find our plain text? Let's say the key is long enough that brute force will take forever. And I think they can try. Crypt analysis. So if the malicious user knows the algorithm they're using is some very weak algorithm, then they can maybe, given the ciphertext, work back and find the plain text or the key. So another assumption we'll make from now on is that the algorithms used for encryption and decryption are strong. That there are no weaknesses in them that the malicious user can take advantage of. And AES is an example which is considered a strong cipher. There are no known weaknesses which are practical to allow decryption without the key. So we'll base everything from now on on those assumptions that the malicious user can't do a brute force attack and they can't find weaknesses in the algorithm. Therefore, if we've encrypted, they will not be able to decrypt. Any questions so far? What's that service that we just provided to our users, A and B? What's the name of the service? If you go back, we went through six services and there'll be quiz questions about that for sure. What's one of the services that this provides? 
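To get a feel for why a long key defeats brute force, here is a rough back-of-the-envelope calculation. The guess rate of one trillion keys per second is a made-up figure for illustration, not a number from the lecture, and the function name is hypothetical.

```python
# Rough worst-case brute-force time for a given key length.
# The attacker's guess rate is an assumed figure, not from the lecture.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_try_all_keys(key_bits: int, keys_per_second: float) -> float:
    """Time in years to try every possible key of `key_bits` bits."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12  # hypothetical attacker: one trillion keys per second
    for bits in (56, 64, 128):
        years = years_to_try_all_keys(bits, rate)
        print(f"{bits}-bit key: about {years:.3g} years in the worst case")
```

Under that assumed rate a 56-bit key falls in well under a year, while a 128-bit key is far beyond the "millions of years" mentioned above.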
It's the normal one we think of when we think about security usually. What's the purpose of doing this encryption? Just for fun? To keep our data confidential, that's our purpose. So this is the service of confidentiality. When we encrypt our plain text, we send the ciphertext across and the malicious user cannot find the plain text. Therefore, it's kept confidential between A and B. What are some of the other services? So it's easy to provide confidentiality: encrypt our data. But there are other things that we often want to provide in security systems. Go back to your first lecture notes on the introduction. There's a list of six services; the title of the slide is Security Services. Have a look at them. Read through those six services. The third one is listed as data confidentiality. We can do that easily: encrypt. But what else would we like to do sometimes? Authentication is another service. And there are two different aspects of authentication. The one we'll consider now is that we want to make sure that the person sending us the message is who they say they are. And another important service is data integrity. I want to make sure that no one can modify the message along the way. If they do, I want to be able to detect that modification. So let's look at them and we'll see that this encryption can actually provide those services as well and see why. In fact, we just saw data integrity. Let's say the malicious user modified the ciphertext that was transmitted such that CR, the received ciphertext, was different from CT. In this case, what happens? If the ciphertext decrypted is different from what was obtained from encryption, then the plaintext will not be the same and, importantly, B will recognize that the plaintext obtained from decryption is incorrect. So in this second case, we said that the ciphertext was modified.
So we're saying if the malicious users tried to change the message that was sent across the network, what would happen is B would receive CR. They know it comes from A, so they use KAB to decrypt. And because CR is different from CT, when we decrypt with that same key, KAB, the PR we obtain will be different from PT. That's the assumption of our cipher. If we encrypt two different values with the same key, we'll get two different ciphertext values. And that concept applies here. If we encrypt PT with KAB, we get CT. If we encrypted PR with the same key and PR was different from PT, we'd get a different ciphertext and the same applies backwards. So if the malicious user modifies the ciphertext, then B will know that. Or they'll at least know that it's not the correct ciphertext because when they decrypt, they'll find that PR is incorrect. So B recognizes, ah, I received ciphertext. I decrypt it. I get plaintext which is incorrect. So don't trust this ciphertext. It may have been modified by someone along the way. It may be just an accidental error in the transmission, but we know not to trust this ciphertext because it could have been modified. And that's how we provide data integrity, or one way of providing data integrity. Someone can modify the message, but we can easily detect that it has been modified. And therefore we ignore that. So we've provided the service of confidentiality and data integrity so far. Questions? How do we know the plaintext is incorrect? Anyone? Anyone else from me? All right, that was one of our assumptions. Why? Let's find an example, a simple example to illustrate the concept. Let's start with a simple example that's maybe hard to see. Remember we did an example of the Caesar cipher where we change, we shift the letters by three positions or x positions. This would be hard to see, but I'll explain it. I've got some ciphertext. It actually goes along, you can't read it all, but it's, here's some ciphertext. 
And the cipher I used was the Caesar cipher. The very simple cipher of when we take plaintext, we shift the plaintext letters to the right by k positions. For example, if I have the letter a in plaintext and k is three, then the output ciphertext, I shift a three positions to the right, so a becomes b, c, d, a encrypts to become d, and so on. So this was ciphertext obtained using the Caesar cipher. It keeps going and I used some key. It wasn't three. So, what I could do as a brute force attack is to find the plaintext, try all possible keys. And I did that, and I show them here. It's hard to see, but you'll eventually recognise. So what I did was I took the ciphertext, I decrypted if the key was a or zero. I tried that key. Then I tried key b or one, and I got this plaintext. And this plaintext, which one's the correct key? 11, why? It's the only one that makes sense. And that's typical with any encryption algorithm. If we encrypt using a particular key, some plaintext, and we get ciphertext, if we decrypt that ciphertext with the wrong key, we will get plaintext that doesn't make sense. And it's only with the one key will we get plaintext that makes sense. That is, all of these will be recognisably wrong. You can recognise that all the others are wrong. This is a simple case. It's using English language. But in fact, any message we send, we can include, make sure it has some structure. That's the idea here. Why do you recognise 11 is correct? Because you see there's some structure there. There's some words that you know. With the others, there's no structure. It looks random. Even with non-English texts, with images, videos, and so on, they all have some structure such that when we decrypt, if we use the wrong key, we will recognise that they don't have the structure that we expect. When we use the right key, we'll recognise that it does have the structure that we expect. So that's an example of why we know the plaintext is correct or incorrect. 
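The brute force on the Caesar cipher can be sketched in a few lines. This is only an illustration under assumptions: the message text, the small word list used to decide whether output "makes sense", and all function names are hypothetical; the key 11 follows the example above.

```python
import string

ALPHABET = string.ascii_lowercase
# Hypothetical word list standing in for the structure we expect in the plaintext.
COMMON_WORDS = {"this", "is", "our", "secret", "message", "keep", "it"}


def shift_letters(text: str, shift: int) -> str:
    """Shift each lowercase letter by `shift` positions, wrapping around the alphabet."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def encrypt(plaintext: str, key: int) -> str:
    return shift_letters(plaintext, key)


def decrypt(ciphertext: str, key: int) -> str:
    return shift_letters(ciphertext, -key)


def looks_structured(text: str) -> bool:
    """Treat the text as recognisable if at least three of its words are in the list."""
    return sum(word in COMMON_WORDS for word in text.split()) >= 3


def brute_force(ciphertext: str) -> list[int]:
    """Try all 26 keys and keep the ones whose output has the expected structure."""
    return [key for key in range(26) if looks_structured(decrypt(ciphertext, key))]


if __name__ == "__main__":
    message = "hello this is our super secret message keep it secret goodbye"
    ciphertext = encrypt(message, 11)
    for key in range(26):
        print(f"{key:2d}: {decrypt(ciphertext, key)}")
    print("keys that make sense:", brute_force(ciphertext))
```

Printing all 26 candidate plaintexts mirrors the list shown in the lecture: every wrong key gives letters with no recognisable words, and only one key passes the structure check.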
Use the wrong key, we'll know it's incorrect. Use the correct key, we'll know it's correct. Well, if we... Yes, we want to know the correct plaintext. In this case, if we try all the keys, we find the one that makes sense. Same concept applies. So this was the case: we have the ciphertext, we use the wrong key, we get recognisably wrong plaintext. Okay? Ciphertext, use key zero, we get random characters. The same concept applies if we have the wrong ciphertext but the correct key: we'll get recognisably wrong plaintext. I don't have an example of that, but if you change the ciphertext and use the correct key, then you'll get wrong plaintext. And with real ciphers today, that will be recognisably wrong plaintext. So basically, if you decrypt with the wrong key or you decrypt the wrong ciphertext, you will know that. One of them is stated here: decrypting with the wrong key will not produce the original plaintext and the decryptor will be able to recognise that the key is wrong. And the other one is stated a little bit later. We'll get to that in a moment. It's captured in these assumptions that if we receive ciphertext that successfully decrypts, then we know that it has not been modified. Well, that's based upon this concept. If it doesn't successfully decrypt, then we assume that it is the wrong key or the wrong ciphertext. Let's try and write them down just to be clear. Let's come back to maybe a point that's missing, okay? With any cipher, we encrypt using some key, I'll just call it key1, some plaintext, p1, and we'll get some ciphertext. Let's call it c1. The assumption we have with the ciphers is that if we use the same cipher and encrypt with the same key but a different plaintext, p2, we'll get a different ciphertext, c2. If p1 is not the same as p2, then c1 will not be the same as c2. In other words, if we encrypt two different plaintexts, we'll get two different ciphertexts. p1 and p2 are different, therefore c1 and c2 are different.
And in the reverse, if we decrypt two different ciphertexts, we'll get two different plaintexts. Similar, if we use a different key on the same plaintext, look at the first and the third case, if we encrypt the same plaintexts with different keys, we'll get different ciphertext. That's the third case here. Different plaintext, different ciphertext, different key, different ciphertext. And similarly, if we go backwards, if we have two different ciphertext values, c1 and c2, if we decrypted them with the same key, we'd get two different plaintext values. Or if we have two different ciphertext values, c1 and c3, if we decrypt them with different keys, that's not a good one. Ignore that one. Different key, different ciphertexts. Different plaintext, different ciphertext. That's all. Correct. If we encrypt the same plaintext with a different key, we'll get different ciphertext. And if we encrypt different plaintexts with the same key, we'll get different ciphertext. Any further questions? Any questions? That's all easy? Not easy. What's not easy? Everything's not easy. Is there anything less easy than something else? Anyone not understand these concepts? It's okay. But we need to make sure we understand them because everything is built upon these assumptions that we're making as we go. Different plaintext, different ciphertext. Different key, different ciphertexts. So given that, what do we get to? Coming back, malicious user modifies the ciphertext transmitted so that CR is different to CT. So when B decrypts CR, they're going to get a different plaintext than PT. Why? It's that principle we just saw. That is, which one? We have two different ciphertext values. We have CT and CR. If they were encrypted with the same key, it must be two different plaintext values. So a PR and PT must be different. And again, we assume that we can recognize that one of them is wrong. A PR is incorrect. This provides a service of data integrity. 
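Written down in one place, the assumptions just stated look like this. E and D are shorthand introduced here for encryption and decryption with a fixed cipher; they are an assumption of this summary, and the statements are the lecture's working assumptions rather than proven properties of every cipher.

```latex
\begin{align*}
P_1 \neq P_2 &\implies E(K_1, P_1) \neq E(K_1, P_2) && \text{different plaintext, different ciphertext} \\
K_1 \neq K_2 &\implies E(K_1, P_1) \neq E(K_2, P_1) && \text{different key, different ciphertext} \\
C_1 \neq C_2 &\implies D(K_1, C_1) \neq D(K_1, C_2) && \text{different ciphertext, different plaintext}
\end{align*}
```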
Integrity says that if someone does modify a message along the way, we can detect that. And we detect it because B recognizes that the received plaintext is incorrect. The first case provided confidentiality. No one could see the plaintext because they don't have the key and they cannot decrypt. None other than A or B. Another service, authentication. How does B know the message came from A? Are they sure it came from A? Let's consider a case where a malicious user tries to pretend to be A and we'll see what happens. We'll start again. B is going to receive a message and they're going to receive not from A but what is actually from the malicious user. B receives a message and the message says it's from A. B thinks the received ciphertext is from A. What does B do? When B receives ciphertext from A, what does it do? Try to decrypt. Good. It tries to decrypt that ciphertext. What key does it use? It thinks it's from A. What key will B use to decrypt? The key it's shared with A. We denote as KAB. Same as before. So B receives a message. This is from A. Let's decrypt it. Ciphertext CR. Decrypt. We're using KAB. The key shared between A and B. We get some plaintext. It denotes PR. Now let's go back about the message that the malicious user created. The malicious user wanted to pretend to be A. So they created a message. They took some plaintext. They denote PT. And they encrypted that plaintext to get... They took a plaintext. Maybe the fake message they wanted to send to tell B to do something that they shouldn't do. And they encrypted that plaintext with what key? What key could the malicious user encrypt that plaintext with? Or maybe an easier question is which key can they not encrypt with? KAB. They can use any key. So the malicious user just... They started this. They have a plaintext. Maybe some fake message they want to send to B. And make B think it came from A. They encrypt that with a key. But make... 
It's clear that we assume that the malicious user could not use KAB because we assume that they don't know that secret key. So we'll denote their key as just KM, for the malicious user. They use any other key, but it's not the same as KAB, because they don't know what that value is. And they cannot guess it because it's too large; that is, it has too many bits. There are so many possibilities they cannot guess it. So what happens at B? What does B realize when it tries to decrypt? It gets the wrong plaintext. PR is incorrect. Why? Because, again, in this case the transmitted ciphertext and the received ciphertext are the same. That is, what the malicious user generated here is the same as what B receives here. So let's look at it. CT was created by encrypting... Actually, we'll just denote it as C, which was obtained by encrypting some plaintext PT using KM. And then what B does is try to decrypt C using a different key, KAB. And our assumption is that when we try to decrypt using a different key, we will get recognizably wrong plaintext. If we use the wrong key, we will not get the same plaintext and it will be recognizably wrong; that is, a random sequence with no recognizable structure. So when B decrypts, they get some plaintext, they recognise this is not the correct plaintext, and that implies that something's gone wrong. Don't trust this message that I received. And that's the way of authenticating who sent the message. PR is incorrect. Don't trust it. Don't trust the received ciphertext. And that was the same with data integrity. If we receive a ciphertext and we decrypt and we get recognizably wrong plaintext, don't trust it. Discard it. It's considered insecure. If it's recognizably correct plaintext, trust it. Assume it's correct. And so this really detects an attack. If the malicious user tries to send a fake message, they don't know KAB.
It will only work if the malicious user knows KAB, and our assumption is that they don't know KAB because it's a secret shared between only A and B. If the malicious user knows it, then it's no longer a secret. So this is authentication. B knows that the message comes from A in the previous case, and in this case it knows it doesn't come from A. In the first case, if it successfully decrypts with the key KAB, then it implies that it must have been encrypted with the key KAB. And if something was encrypted with the key KAB, who encrypted it? Who has the key KAB? A and B. So if I'm B and it successfully decrypts with KAB, then it must have been encrypted by either A or B. Well, I am B and I know I didn't encrypt it, so it must have been A that encrypted it. So that is authentication. We know that this message came from A if it successfully decrypts. Similarly, if it doesn't successfully decrypt, we don't trust it. It could have come from someone pretending to be A. So that's authentication. And similarly we have integrity. Again, if it doesn't successfully decrypt, don't trust that message. Maybe it's been modified along the way or it's come from the wrong user. So in either case, don't trust it. If it does successfully decrypt, trust it, and it's a proof that it does come from A and it hasn't been modified. Because if it was modified it wouldn't successfully decrypt. And if it didn't come from A it wouldn't successfully decrypt. So symmetric key encryption really provides three services at once. Confidentiality: no one can find the plain text. Integrity: if the message is modified, we will know. And authentication: if it comes from someone who's not A, we will know. So we can authenticate the source.
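B's decision rule can be sketched as a small program: decrypt with KAB, trust the result only if it is recognisable. This is a toy under stated assumptions. The Caesar cipher stands in for a real cipher, the key values and the vocabulary B expects are made up, and all names are hypothetical.

```python
import string
from typing import Optional

ALPHABET = string.ascii_lowercase
# Hypothetical vocabulary B expects in genuine messages (stands in for "structure").
KNOWN_WORDS = {"please", "transfer", "the", "money", "to", "account", "now"}


def shift_letters(text: str, shift: int) -> str:
    """Toy Caesar shift, used here only as a stand-in for a real cipher."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def encrypt(plaintext: str, key: int) -> str:
    return shift_letters(plaintext, key)


def decrypt(ciphertext: str, key: int) -> str:
    return shift_letters(ciphertext, -key)


def recognisable(plaintext: str) -> bool:
    """The plaintext has the expected structure if every word is in the vocabulary."""
    words = plaintext.split()
    return bool(words) and all(word in KNOWN_WORDS for word in words)


def receive(ciphertext: str, k_ab: int) -> Optional[str]:
    """B's rule: decrypt with K_AB; trust the plaintext only if it is recognisable."""
    p_r = decrypt(ciphertext, k_ab)
    return p_r if recognisable(p_r) else None


if __name__ == "__main__":
    K_AB = 7   # secret shared by A and B (toy value)
    K_M = 19   # key chosen by the malicious user, assumed different from K_AB
    genuine = encrypt("please transfer the money now", K_AB)
    forged = encrypt("transfer the money to account now", K_M)
    print("from A:         ", receive(genuine, K_AB))
    print("from impostor:  ", receive(forged, K_AB))
```

The message encrypted with KAB is accepted, while the one encrypted with KM decrypts to letters with no known words and is discarded, which is the authentication argument above in miniature.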
Confidentiality: no one can find P. In the case that we decrypt the message successfully, we know that the message is confidential. No one else knows P, because to know P you must know the key, and again we assume no one else knows the key. Integrity: if it successfully decrypts, then it implies that the message has not been modified. If decryption is unsuccessful, it implies that the message may have been modified or come from the wrong person. As we saw in this case, if the ciphertext was modified, the plain text would be incorrect and we would recognize that. And authentication: if it successfully decrypts, then it implies that A was the person who encrypted the message, which means the message came from A, not someone else. These are important concepts because they are used in a number of security mechanisms in IT security. Any questions before we look at an alternative approach for authentication? All of these assumptions are again listed on several of those slides, and there's one handout with two pages which captures all of them. We've jumped through slides, but the one that we really just went through was this; it's a simplified version. So we're looking now not just at confidentiality but at authentication. The receiver wants to verify that the contents of the message have not been modified (data authentication, or what we've just called data integrity: the data's not changed) and that the source is who they claim to be (source authentication). So they go together really; integrity and authentication often come together. There are different ways to do it. We've just seen using symmetric key encryption, but there are in fact other approaches to provide the same services, and we'll go through some of them. What we saw was the case where we use symmetric key encryption: we have a message, here it's M, not P, and we encrypt with a shared secret key K and send the ciphertext. If
it successfully decrypts, we're saying that it must have come from A, because only A, apart from the receiver, has the key to encrypt, and it must not have been modified, because if it was modified it wouldn't successfully decrypt. So symmetric key encryption provides confidentiality, authentication of the data (or integrity of the data), and authentication of the source, making sure we know who sent the message. And this is always based upon the assumption that if we decrypt with the wrong key or decrypt modified ciphertext, we'll produce output that will not make sense and we'll be able to recognise that. So symmetric key encryption can be used for all three. It turns out in many cases, for practical reasons, that we'd like to keep confidentiality separate from authentication. Maybe we'd like to use different software or perform those operations at different times when we're sending messages. So although symmetric key encryption does provide authentication, often we want to use other techniques for providing authentication, because we don't want to have to rely on symmetric key encryption. Sometimes the algorithms for symmetric key encryption are slower than our other techniques. So there are other techniques for doing the same thing. One of them is called a message authentication code, a MAC. For our purposes it's effectively the same as symmetric key encryption, but there are different algorithms for doing it. We're not going to look at it; just be aware if you see a message authentication code. It's not medium access control, which you may be studying in other courses. MAC in this case means a way to authenticate messages, and it uses almost the same techniques as symmetric key encryption. There are some details on those slides of how it works, but I think we will not see any further examples. We may just hear the word MAC. If you hear that a protocol uses a message authentication code, then recognise that it provides some form of authentication of the data and the source. Let me get to another example. Some names of the MACs: there are different specific
algorithms, in the same way that for encryption there's DES, AES and others. There are different MAC algorithms: OMAC, PMAC, CMAC, UMAC, VMAC, HMAC and others. Again, you don't need to remember them. The one that you may see in practice a little bit more is HMAC. It's actually based upon hash functions, which we're going to look at next. But there are different MAC algorithms used in practice. Instead of going through MAC algorithms, let's just summarise the assumptions we'll make about authentication so far. If we receive ciphertext and it successfully decrypts with a key KAB, then we know that the original message hasn't been modified, there's been no modification, and that it came from one of the owners of that key. There are two people in the world who know that key, A and B. If I'm B and I receive a message that successfully decrypts, then it must have come from A, unless I sent it to myself. So we use that as an assumption through the rest of the course. And if we hear about a MAC, we make a similar assumption. With MACs we talk about not decrypting but verifying: we verify the authentication. If the MAC verification is successful, then we know the message has not been modified. And again, the MAC has a secret key, and the message must have come from one of the owners of that key. So in fact, for a MAC it's almost the identical assumption: if we use symmetric key encryption or a MAC, we can prove the message came from a particular user and that it didn't get modified along the way. Let's look at another way for authentication. We look at this because it uses hash functions, and in fact hash functions become important for another security service, the digital signature, so we'll spend a bit of time on them. Everyone studied hash functions? In which course did you study hash functions? Hands down, so I assume yes. In this class, if you don't put your hand up when I ask a question, I'll assume you're just lazy and you mean yes. Where did you study hash functions? I'm sure you probably did somewhere. Some
lab, computer lab, maybe some data structures course. Possibly, I think, maybe a data structures course. Hash functions are used in computer science for... what do they do? A hash function takes some input and maps that input to, usually, some unique value. We take the hash of some input and we get some unique value as output. Almost unique. So we'll say a little bit about hash functions and then talk about cryptographic hash functions, which are really the same but have a few different properties, because they are important for different aspects of security. So what is a hash function? Here's a hash function H, some function that takes a variable length block of data M. So the input is some message M, and the message usually can be different lengths. It could be 100 bits, it could be 100 gigabytes. So most hash functions will take variable length inputs. It's a function that takes that input and produces a fixed size hash value, lowercase h, as output. That's called the hash value. It's a bit confusing: uppercase H is the function, lowercase h is the hash value, the output. And the output usually has some desirable properties, especially with cryptography. One thing that we expect of the hash function is that if we apply it on different inputs, different values of M, then the hash values that come out should be random looking, appear random, and should be evenly distributed about the space. So I take the hash value of one message, and the output, think of it as a random value. And I take the hash value of another message which is almost the same as the first, maybe it differs by one bit. Then the hash value that comes out should be significantly different, and again random compared to the first hash value. They shouldn't be similar. We'll see some examples of that. Basically, produce random outputs, that's what we'd expect. There are some limitations. What are some examples of hash functions, some names of hash functions? MD5. You may have heard of MD5. It's used in security, or different aspects of computer
security. Sometimes when you download a file from a file server, if you want to be sure that the file that you downloaded matches the original, then maybe there's some hash value or MD5 checksum attached, or available as a separate entity, that you can download and verify. So we'll see the role of that. MD5 is one hash function. There are many hash functions. The two that we come across a lot in security are MD5 and SHA. We'll see in the later slides that it lists the names of them: MD5 and SHA, the Secure Hash Algorithm. Let's just have a quick example. I have some plain text files, the ones from yesterday. Let's just have a look. Plain text: here's our super secret message. Here's our message M, and let's apply a hash function on it. Zoom in a bit. We have our message, and I have the MD5 hash function. The software on my computer is called md5sum. All it does is take the message as input, not the file name but the contents of the file. So this program takes "hello this is our super secret message keep it secret goodbye" (sorry, we've run out of space) and calculates the hash of that using the algorithm called MD5. You already know, but be careful: we're not necessarily talking about confidentiality. We'll see that hash functions are not used to provide confidentiality. We care about authentication at the moment, and we'll see hash functions are used for that. But let's just see it work. There it is, there's the hash value in hexadecimal. You can convert it to binary if you like. How long is it? How many bits? Can't you see? Well, on the slide it says how many bits MD5 is: it's a 128-bit hash value, that is, 32 hex characters. So this is the hash value of this file, that's all. So MD5 takes any length input and always produces a 128-bit output, and it should be random. Okay, I say random looking, but we think in general random. I've got another plain text, plain text 2, and let's take the hash of that. What are we going to get? What value are we going to get? Again, something very different. Two different hash values come out.
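The demonstration above can be reproduced as a minimal sketch using Python's standard `hashlib` module. The two messages below are hypothetical stand-ins, not the contents of the lecture's files, so the hash values printed will not match the ones shown in the lecture. The helper names `md5_hex` and `differing_bits` are also made up for this illustration.

```python
import hashlib

# Hypothetical stand-in messages (NOT the lecture's actual files).
# They are identical except that the final full stop becomes a space.
m1 = b"Hello. This is a stand-in secret message. Goodbye."
m2 = b"Hello. This is a stand-in secret message. Goodbye "

def md5_hex(data: bytes) -> str:
    """Return the MD5 hash value of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

h1, h2 = md5_hex(m1), md5_hex(m2)

print("H(m1) =", h1)
print("H(m2) =", h2)
# Each hex character carries 4 bits, so 32 hex characters is 128 bits.
print("hash length in bits:", len(h1) * 4)
print("input bits that differ:", differing_bits(m1, m2))
print("hash bits that differ: ",
      differing_bits(bytes.fromhex(h1), bytes.fromhex(h2)))
```

Running it should show two 32-character hex strings that look unrelated, even though the inputs differ in only a few bits. On a system with the `md5sum` program, running it on files with the same contents should give the same hex strings.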
This is 3FAA and so on; the first one was 9, 1, DD, D2 and so on. Why are they different? Same hash function, two different hash values come out. What can we say about the plain text? What conclusions can we make about the two plain text values, plain text 2.txt? What can we imply? If the hash values are different, then the messages are different. It's hard to see, it wraps around, but if we zoom out a bit... I'll try it again. Plain text 1, plain text 2: are they different? Where are they different? The dot. All the characters are the same, except at the end I replaced the dot, the final full stop, with a space. So in fact, in the binary form, it's just a few bits. I think four bits have changed. So the messages are almost the same, but they are different, very similar. And again, we take the MD5 sum of each of them and we get two completely different hash values. So that's the property we expect of hash functions: two different inputs always give two different outputs. That's our expectation. And the other way we can look at that: if we get two different outputs, two different hash values, that implies the inputs were different. And the property is that the hash value should be random. So even if two messages are similar, in this case they differ just by a few bits, the hash values will be significantly different. So random in this case means they will not be similar. And different size messages: in this case they were the same size, but we could apply it on much longer messages and again we would get a fixed size 128-bit hash value. If we apply the hash function on two messages which are the same, we get the same hash value for that same input message. Now, how long is our input message? Both of them were 72 bytes. 72 bytes, what's that? 576 bits. So our input messages were both 576 bits in length. The hash values were 128 bits in length. So we've said if we take two different inputs, we'll get two different hash values. Two inputs will produce two different outputs. Is that always true? Correct, and in fact in
this simple example there's more than 128 bits in the input. I said that our requirement, or our desired property of a hash function, was that two different input messages will produce two different hash values. But we have a problem. Since the hash function can take any length input, and the length of the input is normally larger than the hash value, it is possible for two different inputs to produce the same hash value. It must be, and I think that's your issue. Let's explain that, just to make it clear. Our hash function: let's say M in our case was M1, and the length in our example was 576 bits, and M2 was the same length. We could use other length messages, but these were our examples. They were our two different plain text messages. They were different, and we produced hash values. h1 is the hash of M1, using MD5, and that was 128 bits. And h2 is the hash of M2, also 128 bits. We said two different inputs produce two different outputs, but that's not technically true. That is, there could be two different inputs that produce the same output, and that's true if the length of the input is larger than the length of the output. How many possible messages are there if we limit our message size to just 576 bits, so all messages are 576 bits? How many are there in theory? No, not infinite. How many messages, if every message we take as input is fixed at 576 bits? How many are there? 2 to the power of 576. It's just a binary message, and we've got 576 bits. So the number of possible inputs, that is, possible values of M, is 2 to the power of 576 in this example, if we limit the length. How many possible outputs of our hash function? Every hash value is fixed at 128 bits, so 2 to the power of 128. Which one's bigger, the number of inputs or the number of outputs? The number of inputs. That is, we have more inputs than outputs. I cannot draw them all, but if we looked at all the inputs, if I put a dot for every possible input, and the number of
outputs, there are fewer. It means some inputs map to the same output. A hash function is just a mapping function. It maps the input to an output, and because we have more possible inputs than outputs, it must imply that some inputs return the same output. So we have a conflicting statement here. We'd like our hash function to produce a different output for every input, but when the length of the input can be larger than the length of the output, it must be that some inputs produce the same output. When two inputs return the same hash value, the same output, it's called a collision. The hashing produces a collision on the output. So in theory, yes, we will have collisions. But in practice, of these 2 to the power of 576 messages that exist in theory, how many of them are actually English sentences, where we combine words? Not many. So in practice, if we have the hash value large enough, even though it's in theory possible to produce collisions, finding collisions is very hard. It's very, very unlikely to have collisions. So theoretically possible, practically unlikely, and that's what's used in hash algorithms in security. So it's a little bit of a conflict there, but in practice it's not a problem, because it's very, very unlikely to find two different messages that produce the same hash value. It depends upon the hash algorithm as well. Coming back to our slides. So we talk about a cryptographic hash function. Even though it may be in theory possible to have collisions, we say that for a hash function used in cryptography, we'd like it to be such that it's practically impossible. I say here computationally infeasible, meaning trying to find messages that produce collisions would take too long. We have these two properties. Let's look at property 2, which we just addressed. If we have two different messages M1 and M2, in theory it's possible for the two messages to produce the same hash value, but we assume that with a cryptographic hash function it would take too long to find two messages that
produce the same hash value h. This is called the collision-free property, that is, we don't get collisions. To summarize: in theory collisions are possible. A cryptographic hash function is one such that, even though it's possible, finding two messages that produce a collision will take too long. That becomes the challenge for the attacker: find two messages that produce a collision. In theory possible; in practice, similar to brute force attacks, it takes too long to do that. Another property of hash functions is that they should be a one-way function. A one-way function means that given the message, it's easy to calculate the hash value. We did that on my computer. I calculated the hash of these messages, and it does it almost instantaneously. So it's very quick to take the input message and calculate the hash value. But if I give you the hash value, find me the message. Okay, so I give you just this value, 9, 1, D, 2 and so on. Find me what the original message was. That should be practically impossible. That's called the one-way property. That is, it's computationally infeasible for someone to find a message that maps to a known hash value. So if I give you h and ask you to find M, it will take you all the time in the universe to do so. That's what we mean by this property. And again, hash functions are designed to satisfy these properties, and we'll assume that they do as we go through the security mechanisms. So to summarize, the hash function takes a variable length input and produces a fixed size output. It's easy to calculate, software can do it quickly, but it's hard to go backwards. It's hard to do the inverse, to take the hash value and find M. That's called the one-way property. In practice a hash function will produce random-looking outputs: different messages, different outputs. And to find two messages that produce the same hash value is practically impossible. I give you a challenge, your homework for this weekend. Go find another message which is different than mine, sorry, it's different
than this message but produces the same hash value. That's the challenge of this collision-free property. So for secure hash functions, we assume that that property holds and that you can't find such a message. Nowadays MD5 is not considered secure. In fact, there are messages that do produce the same hash value, and people have found them. But SHA, in different forms, is considered secure. If we take the SHA sum of a message, you will not be able to find another message. So SHA-512 produces a 512-bit output. I give you this hash value: go and find another message that produces the same hash value. You will not. That's the collision-free property. What we'll see next week is how we use hash functions to provide security mechanisms, especially authentication. Eventually they'll become very important in digital signatures. So we'll continue that next week.
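To make the collision argument above concrete, here is a minimal sketch in Python. It is not from the lecture: `tiny_hash` is a made-up toy function that keeps only the first 16 bits of a SHA-256 hash value, and the messages tried are arbitrary strings. With only 2 to the power of 16 possible outputs, some inputs must share an output, and a simple search finds such a pair quickly. With a full 128-bit or 512-bit output the same search would take far too long, which is the point of the collision-free assumption.

```python
import hashlib
from itertools import count

def tiny_hash(message: bytes, bits: int = 16) -> int:
    """Toy hash for illustration only: the first `bits` bits of SHA-256."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash distinct messages until two of them share a toy hash value."""
    seen = {}  # toy hash value -> first message that produced it
    for i in count():
        message = f"message {i}".encode()
        h = tiny_hash(message, bits)
        if h in seen:
            return seen[h], message, h
        seen[h] = message

a, b, h = find_collision(16)
print("two different messages:", a, b)
print("same 16-bit toy hash:  ", hex(h))

# The counting argument from the lecture: 576-bit inputs, 128-bit outputs.
print("more inputs than outputs:", 2**576 > 2**128)
```

The search is guaranteed to stop after at most 2 to the power of 16, plus one, messages, because by then at least two inputs must map to the same output. The full SHA-256 values of the two messages still differ; only the truncated toy values collide.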