look at the effort needed to break those properties. Sometimes we use different notation. We said one-way property and collision-free property, but to be more precise we'll introduce some new notation and talk about a pre-image. We know a hash function, uppercase H, takes some input x and produces a hash value, lowercase h. We will call x the pre-image of h. x is the input, but another name for it is the pre-image.

We know H maps many pre-images to a particular hash value h. In theory there can be collisions, especially because we have more possible inputs than outputs, so multiple pre-images map to the same hash value h. We define a collision as occurring if, for two different pre-images x and y, two different messages, the hash values are the same. That's what we understand as a collision. We don't want collisions, or rather we don't want it to be easy for the attacker to find collisions, because we know there will be collisions. We've done this before: we looked at a 1000-bit message and a 20-bit hash value, and there are on average 2 to the power of 980 messages that map to each hash value. This is just the general formula here: with b-bit inputs and an n-bit hash, each hash value has on average 2 to the power of b minus n pre-images. So we've done an example of that.

Let's get to the definitions. This lists the requirements more precisely than our original two properties of cryptographic hash functions, especially for digital signatures and other purposes. The input can be variable size: the message can be any length. The output is fixed length, as we've said before, and usually small, usually much smaller than the input. It should be easy to compute the hash value. Say my input is a 600 megabyte file and I'm applying the hash algorithm to that file. How long does it take? A couple of seconds to apply the hash function to that large input and produce a fixed-length output. So it's easy to calculate the hash function.
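To make the practical requirements concrete, here is a minimal sketch of variable-length input, fixed-length output and fast computation. It assumes SHA-256 from Python's standard `hashlib` as the example hash (the choice of algorithm and the 8 MB filler input are assumptions for illustration, not the 600 megabyte file from the example), and the helper `avg_preimages_log2` is a hypothetical name for the 2 to the power of b minus n count.

```python
import hashlib
import time


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def avg_preimages_log2(input_bits: int, hash_bits: int) -> int:
    """Exponent e such that each hash value has on average 2**e pre-images."""
    return input_bits - hash_bits


# Two inputs of very different length (the large one is filler data,
# standing in for a big file).
short_msg = b"hello"
long_msg = b"x" * (8 * 1024 * 1024)

short_digest = sha256_hex(short_msg)

start = time.perf_counter()
long_digest = sha256_hex(long_msg)
elapsed = time.perf_counter() - start

# Each hex character encodes 4 bits, so both outputs are the same length.
print("short input:", len(short_digest) * 4, "bit hash")
print("long input: ", len(long_digest) * 4, "bit hash")
print(f"hashing {len(long_msg)} bytes took {elapsed:.3f} s")

# The 1000-bit message, 20-bit hash example from the lecture.
print("average pre-images per hash value: 2 **", avg_preimages_log2(1000, 20))
```

The timing will vary from machine to machine, so no particular figure is claimed here; the point is only that the output length stays fixed while the input grows.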
If it took me 10 hours to calculate that, then it's not so convenient. So that's just for convenience purposes. Those are some practical requirements.

From a security perspective we'll define three properties here. One is called pre-image resistant, then there's a variation, second pre-image resistant, and then collision resistant. They sometimes have different names. Pre-image resistant is what we know as the one-way property: given the hash value, you can't find the original message. For any given hash value h, it's practically impossible, or computationally infeasible, to find a pre-image y such that H of y equals h. That is the pre-image resistant property: the hash algorithm is resistant to finding pre-images. It's not easy for the attacker to find them. That one we've talked about before: given the hash value, you can't find the input y. So it has two different names, the one-way property or, more formally, pre-image resistant.

The next two are similar to what we talked about for collisions, but they're slight variations. Second pre-image resistant is also called weak collision resistant, and the last one, collision resistant, is sometimes called strong collision resistant. So they have different names. We'll see why in a moment. The last two are similar in their definition, but in fact the first two are similar in terms of what the attacker needs to do to defeat them.

The second one is second pre-image resistant. Given a message x, it should be computationally infeasible to find another message y, different from x, such that their hash values are the same. That is, I give you a message x. You know the hash value of x; it's H of x. It should be hard for you to go and find another message y that has the same hash value as mine. So this says we don't want collisions. The last property here, collision resistant, is a stronger form of that.
It says it should be computationally infeasible for you to choose any pair of messages x and y, any two pre-images, such that there's a collision. So there's a slight difference between those last two properties. In the first one, I give you a message x, and your challenge is to find another message with the same hash value. In the second one, you go and find any two messages with the same hash value. So understand the difference between them. They're both about finding collisions, which is why we call them weak collision resistant and strong collision resistant. Both should be hard for you.

Which one will be easier for you to do? The first challenge I give you is: here's a message x, go find another message with the same hash value. The second challenge I give you is: go find any two messages with the same hash value. Which challenge are you going to choose because it's easiest? I'll have a vote. You've got two choices, challenge one and challenge two. Challenge one: find another message y with the same hash value as this given x. Challenge two: find any two messages x and y with the same hash value. You choose the easier one. Hands up for challenge one. Who wants to take challenge one? Don't be shy. Hands up for challenge two. Who wants to take challenge two? Why not? What's wrong with challenge two? You don't have to find all possible pairs. Find any pair. In challenge one we need to find a particular value. In challenge two we need to find any value, any match. Finding any match is easier than finding a particular one.

The second challenge is easier from the attacker's perspective. The first challenge says you must find a match for this particular message. You're restricted to finding another message that gives this particular hash value. In the second challenge you can search through any messages, and as long as you find two that produce the same hash value, you win. So the second one is easier from the attacker's perspective. That gets a bit confusing.
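The two challenges can be tried on a small scale. The following is a minimal sketch, assuming a toy 16-bit hash made by truncating SHA-256 so that brute force finishes quickly (the truncation, the `msg-<number>` candidate messages and all function names are assumptions for illustration, not something from the lecture). It counts how many candidate messages each challenge has to hash before it succeeds.

```python
import hashlib
import itertools

HASH_BITS = 16  # deliberately tiny so that brute force is feasible


def tiny_hash(message: bytes) -> int:
    """Toy hash: the first HASH_BITS bits of SHA-256. Not secure."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - HASH_BITS)


def candidate(i: int) -> bytes:
    """The i-th message the attacker tries (all candidates are distinct)."""
    return b"msg-%d" % i


def second_preimage_attack(given: bytes):
    """Challenge one: find another message with the same hash as `given`."""
    target = tiny_hash(given)
    for i in itertools.count():
        y = candidate(i)
        if y != given and tiny_hash(y) == target:
            return i + 1, y  # number of trials, matching message


def collision_attack():
    """Challenge two: find any two messages with the same hash."""
    seen = {}  # hash value -> message that produced it
    for i in itertools.count():
        y = candidate(i)
        h = tiny_hash(y)
        if h in seen:
            return i + 1, seen[h], y  # number of trials, colliding pair
        seen[h] = y


given_message = b"here is my message x"
trials_one, match = second_preimage_attack(given_message)
trials_two, first, second = collision_attack()

print("challenge one (match a given x):", trials_one, "trials")
print("challenge two (any pair):       ", trials_two, "trials")
```

With a 16-bit hash, challenge one is expected to need on the order of 2 to the power of 16 trials and challenge two on the order of 2 to the power of 8, although any single run can differ from those averages.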
The first thing is to try to understand the difference between the definitions of the properties. The next thing is which one is easier for the attacker. With collision resistance, or strong collision resistance, it's easier for the attacker to defeat the property, to find any two messages with the same hash value. We'll come back to them in a moment.

Different hash functions are designed with different properties. Some have the first two properties. Some also have the third property. They have different purposes. It depends on where you want to use that hash function. For digital signatures we normally need all three properties. There are some cases where we don't need the third one, which is the strongest requirement. For storing passwords, collision resistance is not an issue. We don't care about collisions. We care that, given the hash of the password, you can't go back and find the original password, so the first property is important. When we use hash functions for MACs, similar to digital signatures, we need all the properties. For other things, like detection of viruses and so on, we often use hash functions that only need the middle property. When we use symmetric key encryption with the hash, the properties are not so important because we combine the hash with symmetric key encryption. So some hash algorithms will be designed with different properties. It's harder to design an algorithm that is collision resistant compared to one that has the first two properties.

Just coming back to summarize, there are three security properties. We want the one-way property: it's hard to go backwards. We want the weak collision resistance property: it's hard for you to go and find another message with the same hash value as a given one. And sometimes we also want the strong collision resistance property: it's hard for you, the attacker, to go and find any pair of messages with the same hash value. Which requirements are important depends upon where we use the hash function. What about attacks?
Well, for the first two properties, the one-way property and the weak collision resistance property, the attacker uses about the same approach. They do a brute force attack, and they really need to try all possible values of a message until they get the same hash value. The amount of effort it takes depends upon the hash length. If we have an m-bit hash value, they need about 2 to the power of m operations to find a match. So to defeat a hash function with respect to the first two properties, the effort required is 2 to the power of m.

For the strong collision resistance property, or simply the collision resistance property, what the attacker has to do is easier. The attacker has to find any two messages that have the same hash value. And it turns out that if we have a hash value of length m, it takes about 2 to the power of m over 2 operations, that is, 2 to the power of half of m, to defeat that.

So, in our last picture, if we have a hash value where the length is, say, 128 bits, then to defeat the one-way property the attacker takes approximately 2 to the power of 128 operations. To defeat the weak collision resistance property, it's about the same; it takes the same number of operations. But to defeat the strong collision resistance property, the attacker would take 2 to the power of 64 operations, meaning it would be easier for the attacker to defeat that property. So given a particular hash function, whether it provides these properties depends upon the hash length. And in this case, 2 to the power of 64 operations is not many. It's possible. So we may say this hash function provides the one-way property and the weak collision resistance property, but does not provide the strong collision resistance property because the attacker can defeat it.

If we increase the hash length up to, say, 256 bits, so m is 256, for instance like SHA-256, then to defeat the one-way property you need 2 to the power of 256 operations. Not possible. Too many.
Same with weak collision resistance. Too many; the brute force attack takes too long. For strong collision resistance it's 2 to the power of 128. Still too many. So we can say such a hash algorithm would have all three properties. It's easier to defeat the strong collision resistance property, or, put the other way, it's harder to provide that property.

To understand why the strong collision resistance property is easier from the attacker's perspective, you can study the birthday paradox. What's the probability that someone in this room has the same birthday as me, not the year, but the same day and month? So that's finding another person in the room that has the same birthday as me, versus: what's the probability that any two people in the room have the same birthday? Well, for the second one there's a much higher probability. The chance that I'll find some two people in the room with the same birthday is much higher than the chance of someone having the same birthday as me. So it's a similar concept, and it's referred to as the birthday paradox.
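The two birthday questions can be put into numbers. This is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years (both simplifying assumptions); the function names are hypothetical. The last helper simply restates the effort figures quoted above for an m-bit hash.

```python
def prob_same_as_me(n_others: int, days: int = 365) -> float:
    """Probability that at least one of n_others shares my birthday."""
    return 1 - ((days - 1) / days) ** n_others


def prob_any_pair(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


def attack_effort_log2(hash_bits: int) -> dict:
    """Approximate brute force effort, as exponents of 2, for an m-bit hash."""
    return {
        "pre-image (one-way)": hash_bits,
        "second pre-image (weak collision)": hash_bits,
        "collision (strong collision)": hash_bits // 2,
    }


# A room of 23 people: me plus 22 others.
print(f"someone matches me:  {prob_same_as_me(22):.3f}")
print(f"any two people match: {prob_any_pair(23):.3f}")

for m in (128, 256):
    for attack, exponent in attack_effort_log2(m).items():
        print(f"{m}-bit hash, {attack}: about 2**{exponent} operations")
```

In a room of 23 people the chance of any matching pair is a little over one half, while the chance that someone matches one particular person is only around six percent, which mirrors the gap between the collision attack and the second pre-image attack.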