resistance. So first a word about the ideal primitive model. We use it for proving security of hash functions or compression functions that are built from a smaller primitive. We consider an information-theoretic adversary who can query the smaller primitive, and we just count the number of queries it makes in order to break the construction, that is, to find a collision. So the adversary here is information-theoretic, not computationally bounded. The only obstacle it faces is the randomness that comes out of these queries: we can't predict what the answers will be.

So let's take a look at an example. Here's a compression function built from a smaller primitive; for me, f will be a random n-bit to n-bit permutation. That's my smaller primitive. The domain is over there on the left, and the picture goes from left to right. The domain is 2n bits, so it has size 2^(2n); there are these XORs, and the output over there is n bits long. All the wires on this picture carry n bits. Now, if the adversary makes a query to f, it actually learns how to evaluate 2^n different inputs with that single query, because for every value it queries to f, that red value there, there are 2^n different possible input pairs whose wires XOR to that red value. So with one query it can learn to evaluate 2^n inputs; with two queries, 2 times 2^n inputs. But the total number of outputs is only 2^n. So with just two queries it already knows how to evaluate more inputs than there are outputs, and by the pigeonhole principle it has collisions; it has a lot of collisions already. So this compression function can be broken in just two queries.

Okay, so let's think about repairing this compression function by doing something more complicated. I'll replace the XORs by arbitrary functions g1 and g2, which maybe do some field multiplication or something a little bit weirder, and hope for the best. Now that I'm not doing an XOR, I'll move the bottom wire to the top, so the 2n bits of input are up there. Each input value is associated with some particular g1 output, and so the set of all inputs, which I like to draw as an oval, is partitioned into 2^n different families according to their g1 output, according to the value they map to here. Making a query to f is basically like selecting a slice from that oval. Since I have 2^n families and the oval has size 2^(2n), the average size of a family is 2^(2n) divided by 2^n, which is 2^n. So by choosing the biggest slice in this oval, I can learn at least 2^n inputs with one query, since the biggest slice has size at least 2^n, and by making two such greedy queries, choosing the two biggest slices in the oval, I can learn 2 times 2^n inputs. So I can also break this compression function in two queries, regardless of what g1 and g2 are. Remember, the adversary is information-theoretic, so g1 and g2 pose no mystery at all. They could be horribly complicated for common mortals, but the adversary can compute preimages and the sizes of these layers and everything, no problem. So we're never going to prove good IPM security for a compression function with these parameters.
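To make the greedy two-query attack concrete, here is a toy simulation (not from the talk's slides). The particular wiring h(x) = g2(x, f(g1(x))) and the functions g1, g2 below are my own arbitrary stand-ins; the point is that the attack works no matter which g1, g2 you plug in.

```python
import random
from collections import defaultdict

# Toy instance of the 2n-bit to n-bit design: h(x) = g2(x, f(g1(x))),
# with f a random n-bit permutation. g1 and g2 are arbitrary placeholders.
n = 8
N = 2 ** n

random.seed(1)
perm = list(range(N))
random.shuffle(perm)
f = lambda v: perm[v]
g1 = lambda x: ((x >> n) * 159 + (x & (N - 1)) * 61) % N   # defines the layers
g2 = lambda x, y: (y + (x & (N - 1))) % N                  # final n-bit output

# Partition the 2^(2n)-element domain into layers by g1-value.
layers = defaultdict(list)
for x in range(N * N):
    layers[g1(x)].append(x)

# Greedy adversary: query f at the g1-values of the two biggest layers
# (two queries total), then evaluate h on every input in those layers.
biggest = sorted(layers.values(), key=len, reverse=True)[:2]
known = [x for layer in biggest for x in layer]

buckets = defaultdict(list)
for x in known:
    buckets[g2(x, f(g1(x)))].append(x)

colliding = sum(len(v) for v in buckets.values() if len(v) > 1)
# Pigeonhole: at least |known| - 2^n = 2*2^n - 2^n inputs must collide.
print(len(known), colliding, len(known) - N)
```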
This is what we've learned here. Ideally, when we design a compression function, we'd like to reach something like birthday security, and what we learn is that sometimes we're just not going to get anywhere near it. This was the most general design you could imagine for 2n bits to n bits using a single primitive with an n-bit domain, and you're just never going to get birthday security. So we would like to understand better when we can get birthday security and when we cannot.

Before discussing the results, or the conjectures, I have to define the general model. So here it is. We consider a compression function that may call several different primitives, or the same primitive several times; it doesn't matter. There are several variables involved. First of all, here's s, the output length of the compression function. Then m + s over there is the input length; since this is a compression function, the input length should be bigger than the output length, so m should be some positive integer. Then r is the number of calls to primitives, and finally n is the input length of our primitives. We assume all the primitives have the same input length; that's basically the only restriction keeping this from being completely general. And to actually make the picture fully general, the function that decides the final output, g4 in this drawing, needs the output of f1 and the output of f2 as well. So this makes a completely general drawing, and this is what we consider.

Now there is a conjecture about the best possible collision resistance of this thing, because we know it's not always birthday. The conjecture says the security is the minimum of birthday and this other funny blue term, roughly min(2^(s/2), 2^((nr-m)/(r+1))), up to a factor of r that you shouldn't pay any attention to. The 2^(s/2) accounts for a birthday attack; we know we're never going to get better collision security than the birthday attack. But we know it's sometimes worse, and the conjecture is that it's worse by exactly this funny term: if the blue term is less than the birthday term, you're never going to get past the blue term. This was a conjecture by Stam at CRYPTO 2008, I think. And it's actually not a conjecture anymore: it's now a theorem, by these illustrious people.

So let's do an example of this conjecture and put in some parameters. Here I chose the parameters of a real-world compression function, the JH compression function. The input wires are over here on the left: there are 1.5n bits of input, every wire on the drawing has length n/2, and there are n bits of output. So s, the output length, equals n; m, the extra amount of input, equals n/2, since there are n/2 extra bits of input; and there's only one call to a primitive, so r = 1. Stam's conjecture, or now theorem, gives the following bound on collision resistance: 2^((nr-m)/(r+1)). We compute the exponent: nr with r = 1 is n, minus 0.5n, which is the m, divided by 2, which is the r + 1. You get 2^(n/4), which is less than birthday. Birthday would be 2^(n/2) here because the output length is n. So this is a case where the conjecture says you're never going to get to birthday; in fact, you're never going to get better than 2^(n/4), no matter what you do.
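As a quick sanity check on this arithmetic, here is the bound as a tiny function, with all lengths expressed as multiples of n. This is my own bookkeeping convention for checking examples, not code from the paper.

```python
# Log2 of the conjectured bound min(2^(s/2), 2^((nr-m)/(r+1))),
# with every length given as a multiple of n (so n itself is 1.0).
def stam_bound_exponent(n, m, r, s):
    birthday = s / 2
    blue_term = (n * r - m) / (r + 1)
    return min(birthday, blue_term)

# JH parameters: s = n, m = n/2, r = 1.
print(stam_bound_exponent(n=1.0, m=0.5, r=1, s=1.0))   # 0.25, i.e. 2^(n/4)
```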
And in this particular compression function there is indeed a 2^(n/4) collision attack. If you look at the compression function a little bit, you'll see that it's enough to find a collision on the top wire. Once you find a collision on the top wire by making queries to f, you're all set: I can always adjust the bottom wire, using this feed-forward XOR, to be whatever value I want, and just compensate with this value back here. So it's enough to get a collision on the top wire, and the top wire has length n/2. That's birthday on n/2 bits, and that's where the 2^(n/4) is coming from. But the conjecture says that no matter what compression function you design with these parameters, this input length, this output length, and one call to an n-bit primitive, you're never going to do better than 2^(n/4), and that's what we would like to prove.

I should say something else about this picture. Once we get a collision, we actually get 2^(n/2) collisions all at once, because once I have a collision on the top wire, I can really do whatever I want on the bottom wire: I can get a collision with the bottom wire equal to 0^(n/2), or 1^(n/2), or whatever I want, and there are 2^(n/2) values I can put there. So once I get one collision, all of a sudden I'm getting exponentially many collisions, and in our result we also prove that this phenomenon always happens: at a certain number of queries, you don't just get a collision, you actually get these exponentially many collisions.

So here's the main theorem. With q equal to some constant times Stam's bound, you're not going to get just one collision; you're going to get roughly 2^(2(s/2 - (nr-m)/(r+1))) of them. Look at the exponent, it's quite cute: up to the factor of 2, it's the difference between the birthday cost and the Stam cost. So when the Stam bound is much lower than the birthday bound, you get this big threshold phenomenon: the further the Stam bound sits beneath the birthday bound, the more dramatic the threshold, and the more suddenly you get this huge number of collisions when you do get a collision. It's sort of this non-uniformity thing.

I should say one word about how I count collisions. The standard way of counting collisions is pairwise: I just count the number of pairs that collide. But here, for a function from a domain to a range, we count the number of points in the domain that collide with something else in the domain. So four points mapping to a common value count as four collisions instead of four choose two. This accounting is stricter than the pairwise one: up to a factor of two it lower-bounds the pairwise count, and pairwise collisions can be much more numerous. So we're being harder on ourselves by counting collisions this way. An elementary observation about this accounting is that the number of collisions is always at least the size of the domain minus the size of the range, because every point in the domain that doesn't collide with anything eats up one element of the range all to itself, so there can be at most range-many such non-colliding points in the domain.
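Here is a quick illustration of this counting convention and of the elementary lower bound, on a random toy function of my own choosing (not from the talk).

```python
import random
from collections import Counter

def colliding_points(values):
    # The talk's convention: count domain points whose image is shared
    # with at least one other domain point (not the number of pairs).
    tally = Counter(values)
    return sum(k for k in tally.values() if k >= 2)

random.seed(1)
D, R = 3000, 1000                     # |domain|, |range|
images = [random.randrange(R) for _ in range(D)]

pairwise = sum(k * (k - 1) // 2 for k in Counter(images).values())
# Elementary bound: every non-colliding point eats up one range element,
# so at least |domain| - |range| points must collide.
assert colliding_points(images) >= D - R
print(colliding_points(images), pairwise)   # pairwise can be much bigger
```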
So we always have this elementary lower bound on the number of collisions in a function. Okay. Now I'll describe the key lemma, and then a proof sketch that uses it. So here's a function from some finite domain to some finite range, and I've cut up the domain into layers, like in one of the first pictures. Basically, the lemma says something like this: we're going to randomly select some number of these layers, and the flavor of the lemma is that the number of inputs that still collide after the selection, when I restrict myself to the blue layers, is close to its expectation with high probability; some kind of thing like that. But that's not so meaningful by itself, so let's actually discuss some numbers. Let C be the original number of colliding inputs in f, the number of points in this oval that collide under f. Let K be the number of layers in the oval, and Q the number of layers to be randomly selected, the blue layers. I also have a variable p, which is Q/K, the probability of a layer being selected; so that I don't have too many variables on the board, I'll take out Q and K and just keep p.

An elementary observation is that the expected number of collisions after the selection can be estimated, very roughly, as p^2 C. If I'm an element of the domain who's colliding with somebody, then in order to survive and still be a colliding element after the selection, first of all my own layer has to be selected, I have to be in a blue layer, and that happens with probability p; and the layer of the person I'm colliding with also has to be selected, again with probability p. That's where the p^2 is coming from. So the lemma basically says you will get these p^2 C collisions, but there's always a technicality: we need p^2 C to be bigger than the largest size of a layer. If there are very big layers, bad things can happen. Imagine one layer takes up almost the whole oval; then everything hinges on whether that layer is selected, and maybe you get no collisions with very high probability, and an astronomical number of colliders with very low probability. So you don't want any layers that are too big. Once the expectation p^2 C reaches the size of the largest layer, the lemma is good to go.

So now let's do a proof sketch. We'll stay simple and not do the general case: for r = 1, I'm going to prove Stam's bound for this compression function. The parameters here are those of the JH compression function I showed before: the domain length is 1.5n bits, the output length is n, and there's one call to an n-bit primitive, so this is actually a generalization of JH. We'll show it can never beat 2^(n/4) collision resistance. So here's my domain, which I like to draw as an oval, of 1.5n bits. It is again naturally divided into families, or layers, according to the g1 value: every input up here maps to some g1 value, there are 2^n possible g1 outputs, so there are 2^n layers. Selecting a query to f is basically like selecting one of these layers; it's the same thing.
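To get a feel for the lemma's estimate, here is a small simulation. The toy function below pairs up the inputs so that each colliding input has exactly one partner, which keeps C clean and avoids big multi-collisions; that construction is my own choice, not the talk's.

```python
import random
from collections import Counter

def surviving_collisions(layers, f, p):
    # Keep each layer independently with probability p (the "blue" layers),
    # then count inputs that still collide with another kept input.
    kept = [x for layer in layers if random.random() < p for x in layer]
    tally = Counter(f(x) for x in kept)
    return sum(k for k in tally.values() if k >= 2)

# Toy function where every input collides with exactly one partner,
# so C = |domain| and there are no big multi-collisions.
random.seed(1)
K, layer_size = 256, 64
D = K * layer_size
f = lambda x: x // 2
points = list(range(D))
random.shuffle(points)                     # spread partners across layers
layers = [points[i*layer_size:(i+1)*layer_size] for i in range(K)]

p, C = 0.25, D
# Here p^2*C = 1024 >= max layer size 64, so the lemma's condition holds.
avg = sum(surviving_collisions(layers, f, p) for _ in range(50)) / 50
print(p * p * C, avg)   # the rough estimate p^2*C vs. the simulated average
```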
And I note that the average layer size here is 2^(1.5n) divided by 2^n, which is 2^(n/2). Let me make the simplifying assumption, for the talk, that each layer actually has size exactly 2^(n/2). It's not a big assumption; it's actually very easy to reduce to this case, but we'll skip that. Okay, so the basic game plan is that I view this compression function through the lens of my lemma: here's the compression function, and here's the oval with the layers, down here. I'm the adversary, I want to win, I want to get a collision, and I have 2^(n/4) queries to make to this f. My basic strategy is that I have nothing smart to do: I'm just going to make the queries randomly. I'll select 2^(n/4) different random values here and hope to get collisions, or maybe even exponentially many collisions. So here are the blue layers I'm going to select; selecting 2^(n/4) different query values here is like selecting 2^(n/4) different layers in the oval.

So let's do the accounting for the lemma, with the various variables involved. First of all, the number of collisions C that we start with, before the selection: it's at least the size of the domain minus the size of the range. The domain is 2^(1.5n); the range is tiny, 2^n. The domain is so much larger than the range that basically everybody is colliding, so we start with essentially 2^(1.5n) collisions. Now, the number of layers K is 2^n and the number of queries Q is 2^(n/4), so the probability p of a layer being queried is 2^(-3n/4). And then the important quantity, p^2 C, unfortunately turns out to be 2^0: if you look at it, (2^(-3n/4))^2 times 2^(1.5n) is 2^(-1.5n + 1.5n) = 1. And this is much less than the max layer size, which is 2^(n/2). The lemma doesn't apply; it totally breaks down.

So what's going on is that this lemma gives very poor results when the domain is much, much larger than the range, as in this case. Why isn't the lemma giving us what we want? When the domain is much, much larger than the range, the average colliding input in the domain is colliding with many, many other inputs. So what determines whether that element remains a colliding input after the selection of blue layers is just whether its own layer gets selected and colored blue: if that happens, it's colliding with so many other inputs that, with very high probability, one of its buddies is also going to be selected. So this p^2 is too drastic; the survival probability is really more like p. The lemma is not working.

But we can play a trick to reduce to a case where the lemma works. Here's the trick: I'm going to pre-select, deterministically, a ground set inside the oval consisting of 2^(n/2 + 1) layers. I don't make the queries for those; I just view that pink region as my new domain, and then within that I select my Q layers. It looks like something stupid, but it's going to help us. With this new pink domain, the size of the domain turns out to be 2^(n+1): the number of layers times the average size of a layer, 2^(n/2 + 1) times 2^(n/2), is 2^(n+1).
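The failure regime is easy to see in simulation too. In this toy instance (again my own, not the talk's), the domain is far bigger than the range, so every colliding input has many partners, and its survival probability behaves like p rather than p^2.

```python
import random
from collections import Counter

# Failure regime: |domain| >> |range|, so a kept input almost surely has
# some partner in another kept layer; survival looks like p, not p^2.
random.seed(1)
K, layer_size, R = 256, 64, 32             # |domain| = 2^14, |range| = 2^5
D = K * layer_size
f = lambda x: x % R                        # each input has D/R - 1 partners
points = list(range(D))
random.shuffle(points)
layers = [points[i*layer_size:(i+1)*layer_size] for i in range(K)]
p = 0.05

def survivors():
    kept = [x for layer in layers if random.random() < p for x in layer]
    tally = Counter(f(x) for x in kept)
    return sum(k for k in tally.values() if k >= 2)

avg = sum(survivors() for _ in range(50)) / 50
print(p*p*D, p*D, avg)    # avg tracks p*C, far above the naive p^2*C
```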
I actually selected this number of layers so that the new pink domain would be just about the same size as the range: the range has size 2^n, and this pink domain has size 2^(n+1). So I still have a lot of collisions, but the domain is just a little bit bigger than the range, and I don't have these big multi-collisions anymore. And now we can do the lemma again. We get C = 2^n collisions, the size of the pink domain minus the range: 2^(n+1) minus 2^n is 2^n. That's fewer than before; we had 2^(1.5n) collisions before. And K has changed: it's smaller. Q is the same, and p = Q/K has become bigger, 2^(-n/4) instead of 2^(-3n/4), because K, the number of layers we're selecting from, is smaller. So in p^2 C, the C has become smaller by a factor of 2^(n/2), and p has become bigger by a factor of 2^(n/2); but p is squared, so the change works to our advantage. We now get p^2 C = 2^(n/2), exactly the max layer size, exactly the condition we need to get the lemma going. We can apply the lemma, we get our collisions, we're happy, we win. So that's that. I think I'm out of time, so I'll skip this last part, but I'll just put it up. And thank you.
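Putting the before-and-after accounting side by side, here is the exponent bookkeeping in units of n (my own convention, matching the numbers above).

```python
# All quantities as base-2 exponents in units of n (e.g. 1.5 means 2^(1.5n)).
def p_squared_C(C, K, Q):
    p = Q - K              # log2 of p = Q/K
    return 2 * p + C       # log2 of p^2 * C

MAX_LAYER = 0.5            # every layer has size 2^(n/2)

# Before the trick: C ~ 2^(1.5n), K = 2^n layers, Q = 2^(n/4) queries.
print(p_squared_C(C=1.5, K=1.0, Q=0.25), MAX_LAYER)   # 0.0 < 0.5: lemma fails
# After: pink ground set of 2^(n/2+1) layers, so C = 2^n and K ~ 2^(n/2).
print(p_squared_C(C=1.0, K=0.5, Q=0.25), MAX_LAYER)   # 0.5 >= 0.5: lemma applies
```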