Just some information for those doing the lab, and I think everyone here does the lab. Remember, next week there is no lab. Monday is a holiday, and for those in section 2 on Wednesday, we've cancelled that as well because on Monday we cannot have the lab. And the following week there's no lab either, because we also have holidays: Songkran. Next Monday, what's the holiday? No. What's the holiday next Monday? No, something else. It's my birthday. That's why every year there's a holiday on the 6th of April: it's my birthday. So thank you, enjoy. Not yet, it's not my birthday yet. What's the chance that someone has the same birthday as me, the 6th of April? How would you calculate that? The probability that someone has their birthday on the same date as me. Not the year; in this class it's unlikely you were born in the same year as me, but the same day and month. What's the probability of that? It's interesting to know because it's related to hash collisions. So we'll look at the probability that someone has the same birthday, the same date, as me or as anyone, and then we'll see how that relates to hashes and hash collisions. How would you calculate it? Let's assume that birthdays are effectively random amongst a large group of people. That is, let's ignore leap years. There are 365 days in the year. Ignore twins and all those things. So if you think of someone's birthday, think that it falls on some random day in the year. Let's say there's one other person. What's the chance that one other person has the same birthday as me? How would you calculate that probability, without knowing that person? One way to find the chance that they do have the same birthday as me is to look at the chance that they don't have the same birthday as me. There's no need to write this down, but just look at the concepts. I have my birthday on one day in the year. 
If we have one other person, so let's say a group of two people, and one person, me, has a birthday on one day in the year, then what's the probability that the other person doesn't have the same birthday as me? I'll write "don't have the same as me", just for short. They can have their birthday on any of the 365 days in the year, let's say chosen randomly. Mine is one of those 365, so if theirs is on any of the 364 others, then that's not the same as me. So the probability that they don't have the same birthday as me is 364 out of 365: there are 365 days to choose from, and if their birthday lands on any of the 364 other days, then they won't have the same birthday as me. So that's quite a high probability. There's only one out of 365 days which is the same as mine. That's if there are just two people in the group, me and someone else. So the probability that they do have the same birthday as me, P for probability, just to be quick, is what? One minus this. So the probability that the other person does have the same birthday as me is one minus 364 out of 365, which is 1 out of 365, about 0.0027. What if there are three people in the group, me and two others? What's the probability that one of them has the same birthday as me? We'll do it the same way. Let's look at the probability that no one has the same birthday as me. Let's say we have a group of three people and we'll go straight to: what's the probability that neither of the other two has the same birthday as me? How do we calculate that? It requires both of them not to have a birthday on my birthday. The probability that one person doesn't have their birthday on the same day as me, we know, is 364 out of 365. And the other one also has 364 days to choose from out of 365. 
So we multiply those probabilities to get the probability that two people don't have the same birthday as me. The first person's birthday can fall on any of 364 days and it won't be the same as mine. And the second person also has 364 days and it won't be the same as mine. So that's the probability that no one will have the same birthday as me. And from that, the probability that someone has the same birthday as me is one minus that. This is that no one does, and for someone, don't care which one, one of those other two, it's one minus 364 over 365 squared. So we could calculate, given a group of people, the probability that anyone in that group has the same birthday as me. And we can do it from this perspective: you look at the chance that no one has the same birthday as me. That is, the first person's birthday falls on one of the other 364 days and the second person's falls on one of those other 364 days, the probabilities are multiplied together and you get the chance that no one does, and the probability that one of them does is one minus that. And if there were more people in the group, four, me and three others, it would be similar. It would be the probability that those three fall on the other 364 days, so it would become one minus 364 on 365 to the power of three, okay. And with another person, to the power of four, and so on. So the chance that any one of you in a group of our 30 students here has the same birth date as me we could calculate as one minus 364 out of 365 to the power of 30, okay. So we can calculate that. We'll see why it's important in a moment. This is the chance that anyone has the same birthday as me. Another question is what's the chance that any two people in this room have the same birthday? It's slightly different. 
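The calculation so far can be sketched in a few lines of Python. This is only an illustration under the same assumptions as above (365 equally likely days, no leap years, no twins); the function name `p_same_as_me` is made up for this example.

```python
def p_same_as_me(others: int, days: int = 365) -> float:
    """Chance that at least one of `others` people shares one fixed birthday."""
    # Each other person misses the fixed day with probability (days - 1) / days,
    # and the people are assumed independent, so the misses multiply.
    p_no_one = ((days - 1) / days) ** others
    return 1 - p_no_one


if __name__ == "__main__":
    for others in (1, 2, 3, 30):
        print(others, round(p_same_as_me(others), 4))
```

With 30 other people the result comes out a little under 0.08, which matches the "about 8%" figure used for this class.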
Not the same birthday as me, but any pair of people having the same birthday. I don't care which day, but there is a match or a collision in birthdays. How do you calculate that? So the first calculation could be extended to cover n. We will not write it down. I'll show you another formula shortly. But let's do a different problem: the probability that any two people have the same birthday. That's what we want to know. Again, in this case don't worry about my birthday, but in a group of students here, what's the chance that one pair of you have the same birthday? Does anyone know they have the same birthday as someone else? Maybe. Yes, you do. We'll see what that probability is. If we choose random groups, we'll see how we can calculate that probability. Does anyone have the same birthday as me? That probability, we'll see, is much lower. In a group of 30, the chance that one of you has the same as me is quite low, in fact. We would plug into 1 minus 364 over 365 to the power of n; actually, we can do it. Let's say instead of 2 here we have 30. What is that probability? 364 divided by 365 to the power of 30 is about 0.92, and 1 minus that is about 8%. So there's an 8% chance, if we have 30 people, that one of those people has the same birthday as me. Or as any particular person; it doesn't have to be me, it could be one of the others. So now we're trying something different: the probability that any two people in this group have the same birthday. And we can approach the problem in a similar manner. Look at the probability that no two people have the same birthday. That's the opposite. What's the probability that no two people have the same birthday? Let's try it with a couple of cases. Let's say that there are, again, two people in the group. With just two people, we want to look at the probability that no two people have the same birthday. So, again, we can think of it this way. I have my birthday on one day. What's the probability the other one doesn't have theirs on my birthday? Again, we've got 364 days to choose from. 
364 days would not be the same as mine. That is, let's say mine is on one day. For the other person, there are 364 days out of 365 that don't clash with mine, that don't cause a collision. So in this case, the probability that no two people in a group of two have the same birthday, let's call this P no, is simply 364 out of 365, and if there's P any, then P any is one minus this. This is the same as before. Well, let's take a different case, three people in the group. There are three people. We want to know what's the chance that any pair have the same birthday? Not the same as mine, but just any one of us has the same birthday as one of the others. Well, think, there are three people now. We would look at the probability that no two people have the same birthday. For example, if my birthday is on one day and there are two other people, what's the chance that this one doesn't have their birthday on the same day as me? There are 364 days to choose from. What's the chance that this one doesn't have the same birthday as me or the other one? How many days remain? So this one is on one day. This one is on another day. So for theirs to be on a third day, there are 363 days remaining. So we multiply these two together and we get the probability of no one as 364 on 365 times 363 on 365. Slightly different from before. In the case where we asked about the same birthday as Steve, as a particular person, the probability that no one had the same birthday as me was 364 on 365 squared. The first person could be on any of those 364 days and the second person could be on any of those 364 days. It didn't matter if those two people clashed. That's not what we were asking. We were asking that those two people did not have the same birthday as me, not as each other. But now we're asking to make sure that no two people in that set of three have the same birthday. Well, the third cannot have the same as me or the other one. And it becomes this. If there were four people in the group, how would you calculate it? 
The probability that no one has the same birthday. Similar concept. Take the probability that the second one doesn't have the same as me: there are 364 days to choose from. For the third, there are 363 days that will not collide with either of those two. And for the fourth, there are 362 days that will not collide with any of those three. And we multiply them together and that will give you the probability. We said we're ignoring leap years. I'll leave it as homework to calculate with leap years. We'll see that we're not really caring about birthdays. We're going to care about hash collisions, and there are no leap years involved there. So in this case, the probability of no one would be 364 out of 365 times 363 out of 365 times 362 out of 365. And we could extend that for n. It's what? There's some factorial involved, correct? At the top, the numerator is 364 down to 362, so that's some portion of a factorial, and the denominator is 365 raised to the power of n minus 1 in this case. So we could find a formula that would express that for any value of n. And what we wanted, though, was the probability that any two people have the same birthday; it's 1 minus all of this. The probability that any two people have the same birthday is 1 minus the probability that no two people have the same birthday. And you can find a formula for that, which I'm not going to write down because it's on one of the handouts you have. I think you have it. If not, just scroll forward a few pages, do you? Yes, titled authentication. And if you scroll down a bit, you'll see that there's some discussion of this. This is called the birthday paradox or the birthday problem. And in a short moment, we'll relate it to hash collisions. You can read through, but the first one we looked at was the probability that someone has the same birthday as Steve, as x in general. That was the first thing. 
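Written out for a group of n people and 365 days, under the same assumptions as above, the pattern of 364, 363, 362 generalizes as follows. This is the standard form of the birthday problem, written with the P no and P any names used in the lecture; the handout's notation may differ.

```latex
P_{\text{no}}(n) = \prod_{k=1}^{n-1} \frac{365 - k}{365}
                 = \frac{365!}{(365 - n)!\, 365^{n}},
\qquad
P_{\text{any}}(n) = 1 - P_{\text{no}}(n)
```

For n = 4 the product is 364/365 times 363/365 times 362/365, the same three factors worked out above.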
And we can look at it from the perspective of the probability that someone has the same birthday as me is 1 minus the probability that no one has the same birthday as me. And we looked at if there's a group of two people, then it's just 364 out of 365. If there's three people, it's 364 out of 365 squared. And with n people, we can generalize to n and 1 minus that for the probability that someone has the same birthday as me. That was one problem. Then the second one we think about is, well, what's the chance of that group of people, any two people have the same birthday which we just went through and we worked out the probability that no two people have the same birthday. And we went through and you can generalize that and this generalizes it in terms of factorials. And it comes out as this equation, the probability that any two people have the same birthday is the one at the top where we have 365 days to choose from and a group of n people. So why do we care? We'll see that there's something about collisions of birthdays. In one case, it's given someone's birthday, mine, what's the chance that someone else collides with mine? That was the first problem. The second problem was given a group of people, what's the chance that any two people collide? And we're trying to compare, well, what's the difference in terms of the probabilities? So it depends upon n, the size of the group. And this plot shows those probabilities. The blue one is, sorry, the red one is the first case. That is the probability that someone has the same birthday as me. The probability is the red line. The blue one is the probability that any two people in the group have the same birthday. And the group size on the horizontal axis, n here. For example, the way to read this, 30 people in the class, so n is 30, the probability that someone has the same birthday as me is here. It's what, about 8% close to 0.1. 
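The two curves on the plot can be reproduced numerically. The sketch below assumes 365 equally likely days, as in the lecture; the function names `p_any_pair` and `smallest_group` are made up for this illustration.

```python
def p_any_pair(n: int, days: int = 365) -> float:
    """Chance that at least two people in a group of n share a birthday."""
    p_no = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k days already taken.
        p_no *= (days - k) / days
    return 1 - p_no


def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose any-pair probability exceeds the threshold."""
    n = 1
    while p_any_pair(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(" n   same-as-me   any-pair")
    for n in (10, 23, 30, 50):
        same_as_me = 1 - (364 / 365) ** n  # the first problem, n other people
        print(f"{n:2d}   {same_as_me:10.3f}   {p_any_pair(n):8.3f}")
    print("group size for more than 50%:", smallest_group())
```

For the same group size, the any-pair column is always the larger of the two, and the gap widens as n grows.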
So we say the probability that someone in this group, in this class, has the same birthday as me is about 8%, close to 10%. But of this group of 30 people, the probability that any two of you have the same birthday is up here. Whatever it is, roughly 70%. There's a much higher chance that a pair will have a collision of birthdays. And in fact, we have at least one pair here that we know of in this group of people. If you choose other groups of people and look at the average, then you'll see that the average hits this calculated value. So, which one's more likely? Same birthday as me, or any two people have the same birthday? First or second? The second case is more likely with the same size group, the same value of n. Much more likely as n goes up. Or another perspective: what's the size of the group such that the probability is greater than 50%? How many people do we need in the group such that the chance that any two people have the same birthday is more than 50%? That is, if I had to bet yes or no, with more than 50% I would bet yes. What's the size of the group? On the blue line, 50% comes out about here. It's around 23 people, as you can work out. Given a group of 23 people, there's about a 50% chance that some two of them will have the same birthday. But if you select one of those people from that group, me for example, the chance that one of the others will have the same birthday as me is around 6%, much, much lower. So this is about collisions of birthdays. And the blue one is sometimes not so obvious. People don't think that probability is quite high. There's a very high chance that in our group we'll have two people with the same birthday. And this logic is used in analyzing the chance of hash collisions. Because with hash collisions, remember, with a hash function we take some message and we produce some hash value. And it's a random mapping. 
In the same way, assume our birthdays are on random days in a year, then we care about the probability that messages will produce the same hash value, will collide on the hash value. And in the last lecture, we talked about weak collision resistance and strong collision resistance. And weak collision resistance was that given some message, it's hard to find some other message that produces the same hash value. Strong collision resistance is the attacker is allowed to choose any pair of messages to find a collision. If you're given some message, given some birthday, finding the probability of finding a collision is quite low. The red line for the same group size. But if you have the freedom to choose any two inputs, any two people, any two messages, then the probability of getting a collision is much higher. So from the attacker's perspective, the chance of them finding a collision is much higher if they can have the freedom to choose from any two messages than if they have to take one given message and then search for another message that will produce the same hash value. From the attacker's perspective, attacking the strong collision resistance property is easier. Questions. There's a lot of logic involved there, and I'm not expecting everyone to follow everything, but this concept of collisions amongst birthdays and collisions amongst hashes is the same. And we're trying to compare weak and strong collision resistance. With birthdays, it was n out of 365. So we had the parameters n and 365 days. It's with hash values. What do we have? We have the parameters of the hash size, the number of bits in the hash. That is, if we have 128-bit hash value, there are two to the power of 128 possible hash values. So a collision is when two messages produce the same hash value. So what we want to look at is what's the chance that... Well, how many messages do we need to try until we produce the same hash value? 
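One way to put numbers on that question is the usual approximation to the birthday formula: with N equally likely values, the chance of at least one collision among n random tries is roughly 1 - exp(-n^2 / (2N)), so n is about sqrt(2 N ln(1 / (1 - p))) for a target probability p. The sketch below uses that approximation, not the exact formula; the function name `tries_for_collision` is made up for this illustration, with N = 365 for birthdays and N = 2^m for an m-bit hash.

```python
import math


def tries_for_collision(num_values: int, p: float = 0.5) -> float:
    """Approximate number of random tries for a collision with probability p."""
    # Solve 1 - exp(-n^2 / (2N)) = p for n.
    return math.sqrt(2 * num_values * math.log(1 / (1 - p)))


if __name__ == "__main__":
    print("birthdays:", round(tries_for_collision(365), 1))
    for bits in (128, 160, 256):
        n = tries_for_collision(2 ** bits)
        print(f"{bits}-bit hash: about 2^{math.log2(n):.1f} messages")
```

For birthdays this gives a value between 22 and 23, consistent with the group of 23 read off the plot; for a 128-bit hash it gives a little over 2^64 messages, that is, roughly the square root of the number of possible hash values.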
So if we want to compare to the birthday problem, instead of 365 days, it's the number of possible hash values, determined by the hash length. It's two to the power of 128. And n is how many messages we need to try until, say, we get a 50% chance of a collision. That's the way we would map it to the hash collision problem. So people asked me last week, well, is it really much different? Well, there's a significant difference. It's much easier to find, in a group, any two people with the same birthday than it is to find someone with the same birthday as me, and the same with hashes. Not everyone can answer an exam question on proving which one's easier. Don't worry too much. That's not the point. Just be aware of how strong and weak collision resistance are related: what's the difference between them, and which one's easier from the attacker's perspective? If you want to check those calculations, you can read through there. And there are many websites and books that will explain it in even more depth. So coming back to hashes, back to our requirements for hash functions. This is what we want of hash functions. There are some practical requirements and security requirements. By security requirements, I mean they should have those properties if we want to use them for a particular security purpose, for data authentication or digital signatures, for example. Depending upon how we use the hash function, some requirements may or may not be needed. We'll list some shortly. But just recapping, a hash function takes a variable size input and produces a fixed length, usually small, output. It should be easy to calculate. So applying the hash function on the input should be easy, that is, fast. And then the output should be random. Hashes of many different messages should not all produce similar hash values. If the messages are similar, the hash values shouldn't be similar. They should be random output. And then there are the three security properties here, and they've got different names; that is, they each have two different names. 
The one-way property is saying that if I give you the hash value, it's hard to find the message. It would take too long to find it. It's also called pre-image-resistant. I would tend to use the names in brackets here because I find them easier to say and easier to relate to what the property is. The second property, weak-collision-resistant. We'd like a hash function to be weak-collision-resistant. That is, it should be hard if the attacker has some message x. It should be hard for them to find some other message that produces a collision. It should be hard for them, the first case of the birthday problem that is hard to find someone who has the same birthday as me. That probability should be low. And the way that we make that probability low is make the hash size large. And the probability is very, very low. Since the probability is low compared to the second one, it's harder for the attacker to find that. And then strong-collision-resistant, simply collision-resistant, is that if the attacker is allowed to choose any pair of messages, x and y, any pair that they like, it should be hard for them to find a pair that produce the same hash value. Hard to produce a collision. That's strong-collision-resistant. And that is easier for the attacker to do compared to weak-collision-resistant attacks. Because they have more freedom to choose those messages than there's a higher probability that they can find a collision. So sometimes we will compare hash functions in terms of strength, and we'll compare based upon how hard it is of each of these properties to be achieved. And it generally relates, as long as the hash algorithm has no weakness, or no known weakness in the algorithm, it depends upon the length of the hash value. I thought I had some numbers here. Yeah, this slide. That is, pre-image-resistant or one-way property and weak-collision-resistant are about the same in performing a brute-force attack. 
That is, they take about the same amount of effort for the attacker. Whereas attacking the strong-collision-resistant property takes less effort for the attacker, given the same hash algorithm. And that's captured on this slide. For the first two properties, which are pre-image and second pre-image attacks, or the attack on the one-way property and the attack on the weak-collision-resistant property, the attack basically involves the same thing from the attacker's perspective. You need to find a message y that gives a specific hash value. And the way to do that is to try all possible values or random values of y until you get the right hash value. And if we have an m-bit hash value, say a 128-bit hash value as output, then the effort required is proportional to 2 to the power of m, the number of hash values possible. So if there's a 128-bit hash value, there are 2 to the power of 128 possible hash values, and the effort required, the number of messages that the attacker must try, is about 2 to the power of 128. That is, the attacker needs to apply the hash algorithm 2 to the power of 128 times, which, even though most hash algorithms are fast to compute, would take forever in most cases. So to defeat the first two properties, the effort is equivalent to 2 to the power of m, where m is the length of the hash. But in a brute-force attack on the strong collision-resistant property, the attacker has the freedom of searching for any two messages that produce the same hash value. And if you look at those equations for the birthday problem or the birthday paradox, you can approximate them to see that to get a collision, it's on the order of 2 to the power of m over 2, that is, the exponent is halved. So if m is 128, a 128-bit hash value, to defeat the first two properties it would take 2 to the power of 128 operations. But to defeat the strong collision-resistant property, it would take just 2 to the power of 64 operations. Much, much faster to attack. 
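The gap between 2^m and 2^(m/2) can be seen on a toy scale. The sketch below is an illustration only: it truncates SHA-256 to 16 bits so that both brute-force searches finish quickly, which is an assumption made for the demo and not something a real hash would do. The helper names `toy_hash`, `find_collision` and `find_second_preimage` are made up for this example.

```python
import hashlib
from itertools import count


def toy_hash(message: bytes, bits: int = 16) -> int:
    """Keep only the first `bits` bits of SHA-256, giving a deliberately weak hash."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 16):
    """Strong collision search: any two different messages with the same hash."""
    seen = {}
    for i in count():
        message = str(i).encode()
        value = toy_hash(message, bits)
        if value in seen:
            return seen[value], message, i + 1  # the pair and the number of tries
        seen[value] = message


def find_second_preimage(target: bytes, bits: int = 16):
    """Weak collision search: another message matching one given message's hash."""
    wanted = toy_hash(target, bits)
    for i in count():
        message = b"x" + str(i).encode()
        if message != target and toy_hash(message, bits) == wanted:
            return message, i + 1


if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "tries:", a, b)
    m, tries = find_second_preimage(b"hello")
    print("second pre-image after", tries, "tries:", m)
```

On a 16-bit hash the collision search is expected to need on the order of 2^8 tries (a few hundred), while the second pre-image search is expected to need on the order of 2^16 (tens of thousands); the exact counts depend on the messages tried.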
That is easier for the attacker, more chance for them to get the solution, to find the two messages. So that leads to the requirements on the hash lengths. If we want a hash algorithm that is not subject to any of these brute-force attacks, that is, not subject to the strong collision-resistant attack, the weak collision-resistant attack or the one-way property attack, then we need to choose a value of m, the hash length, such that 2 to the power of m over 2 will be too many operations to try in a reasonable time. Maybe m over 2 is on the order of 80 or 100 or more. That is, a hash length of 256 bits leads to 2 to the power of 128 in this case, and 2 to the power of 128 operations is considered to take too long to calculate. But in some cases, we don't need the strong collision-resistant property. It's not needed for all security operations. So in the cases where we only need the first two properties, we need just a hash length such that a brute-force attack on those first two properties is not possible. So it depends on how we use the hash algorithm as to whether it's secure for its purpose. And we'll compare in the next few slides the two main hash algorithms, SHA and MD5. But here, there are some other attacks. These are brute-force attacks. There are other attacks that take advantage of the algorithm design, and some are possible in theory, but generally very, very complex, and for many hash algorithms it's probably just as easy to do a brute-force attack as to apply the cryptanalysis. So if we want to defeat these brute-force attacks, we need a hash value that is long enough. MD5 uses 128 bits. That becomes here 2 to the power of 64. And to do 2 to the power of 64 operations is possible. So a brute-force attack on the strong collision-resistant property is possible against MD5. 
And in fact, people have come up with even better attacks that bring it down to about 2 to the power of 60, 16 times faster than brute force. So SHA, the secure hash algorithm, uses longer hash values, more bits, and that makes the brute-force collision attacks not possible. So let's talk about SHA. Just briefly, MD5 and SHA, just to give the parameters. MD5, message digest algorithm number 5, was developed by Ron Rivest, who... RSA. RSA, the R in RSA. He developed a number of other cryptographic algorithms. Besides MD5, RC4 is a symmetric stream cipher developed by the same guy. I think all three of them may be considered geniuses. It's not just him that's done other things. If you look at the history of the other two, Shamir and Adleman, they've done a lot as well. But yes, Ron Rivest has created many different algorithms used for different purposes in security. MD5 produces a 128-bit hash value. It is still widely used. You still see it used in different applications. In password files, in some cases, that is, the storage of passwords. Passwords are not stored in the clear on your computer. They're usually stored as a hash of the password. Or if you develop a website and you need to store the users' passwords, you don't store them in the database in the clear. You should apply some algorithm. Usually you take a hash of the password with some random number and store that value. And MD5 is still used by some people, but it's considered insecure for most purposes. It's no longer recommended. There are some known attacks against it that make it possible to defeat MD5. So the secure hash algorithm was developed, and it's gone through different versions. There's SHA0, SHA1, SHA2, SHA3. And generally, SHA2 and 3 are considered secure; 0 and 1 not. There are some known attacks against them. So SHA2 is commonly used. SHA1 is also still used. SHA3 is quite new, so it's not so widely used at the moment. Actually, it's no longer in development. 
The competition was run and someone won the competition, and it is actually a standard now. SHA1 in this table, the message digest size is the hash size. The other rows in the table are not so important. The parameters of the algorithm, the way that it calculates, but the hash size, SHA1, 160 bits. SHA2, actually, you can choose the hash length. 224, 256, 384, and 512. You can have different hash lengths with SHA2. So you often, I think, see SHA256 used. And that's considered strong collision resistant. With 256 bits in the hash value, that needs two to the power of 128 operations to defeat the strong collision resistance property in a brute force attack. There are other hash algorithms, but MD5 and SHA are the main ones that we'll see around. And just coming back, I think we skipped over one slide, and we're almost done. Hash algorithms are used in different security mechanisms. And depending upon the mechanism that they use for, the requirements of those three properties differ. So the three properties, pre-image resistant is the one-way property. Second pre-image resistant is the weak collision resistant property. And collision resistant is the strong collision resistant property. So if we use the hash algorithm for a digital signature, we would generally like that hash algorithm to have all those three properties. We'd like to have all those three properties. This one is under certain conditions, but we're generally like an algorithm such that an attacker cannot do a brute force attack against any of those three properties. But hash algorithms are used for other security purposes. Sometimes they're used for intrusion or virus detection. So your antivirus software uses hashes to compare files and do checks. They can be used with symmetric key encryption. We saw some examples where we combined the hash of the message, and then we encrypted it using a symmetric key cipher and a shared secret key. With that case, those three properties are not important. 
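The hash lengths quoted above for MD5, SHA1 and SHA2 can be checked with Python's standard `hashlib` module. This is a small sketch: the helper name `hash_bits` is made up for this illustration, and the effort figures it prints are just the brute-force estimates from the lecture (2^(m/2) for a collision, 2^m for a pre-image), not measurements. It also assumes the local Python build still provides MD5, which some restricted builds disable.

```python
import hashlib


def hash_bits(name: str) -> int:
    """Output length in bits of a hash algorithm known to hashlib."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
        m = hash_bits(name)
        print(f"{name:7s} m = {m:3d} bits, "
              f"collision search about 2^{m // 2}, pre-image search about 2^{m}")
```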
So the properties, the requirements, differ depending on how we use the hash algorithm. So MD5 could be used here, and it would be okay. Hashes are used to store passwords, and you should be aware of that if you develop web applications, especially. You store someone's password in a database: you take a hash of the password and store the hash value in the database. Or even better, you take a hash of the password combined with a random number, a salt, and store that in the database. So a password file or a password database stores the hash value. For that to work, the one-way property should hold. It should be hard for someone to take the hash value and find the original password. The other two properties are not so important for that application. And for message authentication codes, we can convert a hash algorithm into a message authentication code quite easily. A way to do that is called HMAC. And for that, again, the three properties are required. So we sometimes would say a hash function is weak if it satisfies the first two security properties plus the practical ones. It is a strong hash function if it satisfies all three security properties plus the other practical properties. So people compare hash functions based upon the properties they satisfy. How did we go? Did we get to the end? Any questions? We haven't looked at how the algorithms work. What does MD5 do to calculate the hash? And the same with SHA. What SHA does is... So what have we got? Message digest size is the hash output. The message size, the input, is limited. It must be less than this size. So we say any size input; well, there is an upper limit, but you see these upper limits are quite high. Your message must be less than 2 to the power of 64 bits, which is very long. The block size: they operate sort of in rounds, similar to our block ciphers. We apply some algorithm and then repeat it and repeat it and repeat it. So in each case they operate... 
If we have a long message, first that message is split into blocks. So the block size specifies how long, 512 bits, for example. Word, I think, that is again split up into words or smaller blocks. And it's repeated multiple times. And the number of steps is like the number of rounds, the same concept in DES and others. The number of times we repeat that algorithm to produce the output. Is there a simplified SHA for you to study? I think the algorithms are not so complex that you can understand them. MD5 is you can actually study it and see how it works. It won't take long, but I don't have it here and I haven't looked at it for a long, long time. So go and study MD5 and SHA and other hash algorithms as well. Any other questions to finish up on cryptographic hash functions? I'm just going backwards to see if I've missed some things. So we've looked today really summarized about the requirements. So you'll see in your quiz there are some questions about those three security properties, especially be aware of what the difference between them is. You don't need to prove why one's easier than the other, but you should be aware that attacking the strong collision resistant property is easier than attacking the weak collision resistant property. And it's nice to understand why. Any questions before we move on?
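Going back to two of the uses mentioned earlier, storing a salted hash of a password and building a message authentication code with HMAC, here is a minimal sketch using Python's standard library. It is an illustration under stated assumptions, not a recommended design: SHA-256 is chosen arbitrarily, and the function names `hash_password`, `check_password` and `make_tag` are made up for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash of salt + password); both would be stored in the database."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the salted hash and compare it with the stored value."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)


def make_tag(key: bytes, message: bytes) -> bytes:
    """Message authentication code built from a hash function using HMAC."""
    return hmac.new(key, message, hashlib.sha256).digest()


if __name__ == "__main__":
    salt, stored = hash_password("correct horse")
    print(check_password("correct horse", salt, stored))  # matching password
    print(check_password("wrong guess", salt, stored))    # non-matching password
    print(make_tag(b"shared secret key", b"a message").hex())
```

The password part relies only on the one-way property, as described in the lecture. In practice, deliberately slow functions such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` are usually preferred over a single hash for passwords, since a fast hash makes guessing cheap.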