The next and last paper of the session is the iterated random function problem, and the authors are Ritam Bhaumik, Nilanjan Datta, Avijit Dutta, Nicky Mouha, and Mridul Nandi. We noticed that the authors of the previous paper are also here, but this time they were clever enough to get more of us. And Nicky will give the talk. Yeah, thank you. So I'm the only person standing between you and the coffee break. So I'm asking you to just hold on for a little bit, and maybe not use the espresso machine during my presentation, unless of course it's absolutely necessary. So what is this presentation about? To start with the introduction, we have a random function: a function chosen uniformly at random from all functions with a particular domain and range. And we would also like to consider an iterated random function; here we have a random function that we iterate r times. Now, the question I want us to ask ourselves is: in a black-box way, so you cannot look inside, these are just objects where you can give inputs and look at the outputs, how are these two constructions different? To give a simple argument for why they are definitely not the same: if you look at the random function, then with a certain number of distinct inputs you can compute approximately what the collision probability will be, so the chance of having two different inputs that lead to the same output. For the iterated random function, the collision probability will be approximately r times higher. To see this intuitively, you could say: I take some inputs and apply the random function once. If I don't have a collision, I can still try again in the next iteration and see if maybe then I am colliding. So clearly we have a different collision probability for these two constructions, so they are not the same. Some people say, hey, a random function applied a few times is still random, but here is a clear example that it is not.
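To make the "roughly r times higher" collision probability concrete, here is a small Monte-Carlo sketch. This is my own illustration, not code from the paper; the parameter choices (a function on 1024 points, 6 queries, 4 iterations) are arbitrary.

```python
import random

def collision_probability(q, n, r, trials=2000, seed=1):
    """Monte-Carlo estimate: pick a fresh random function on n points,
    feed q distinct inputs through its r-fold iterate, and count how
    often two of the q outputs coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        f = [rng.randrange(n) for _ in range(n)]  # a fresh random function
        xs = rng.sample(range(n), q)              # q distinct inputs
        ys = []
        for x in xs:
            for _ in range(r):                    # apply f r times
                x = f[x]
            ys.append(x)
        if len(set(ys)) < q:                      # two outputs collided
            hits += 1
    return hits / trials

p1 = collision_probability(q=6, n=1024, r=1)  # plain random function
p4 = collision_probability(q=6, n=1024, r=4)  # iterated four times
```

For small q relative to the domain size, p4 comes out noticeably larger than p1, in line with the talk's intuition that each extra iteration gives the trails another chance to merge.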
Now, what I showed on the previous slide is a non-adaptive collision-finding attack. We have a certain number of inputs, in this case q inputs. We apply the function, we get the outputs, and then we check whether two of the outputs are identical, which would indicate a collision. Now, why do we restrict ourselves to this? What about adaptive collision-finding attacks? It could be that one output, like Y2 in this case, is somehow used to compute the next input, and then we again look at the outputs and see if there is a collision somewhere. And why would we restrict ourselves even to adaptive collision-finding attacks? We could look at any type of attack you could do. Maybe there is, instead of a collision, some type of multi-collision, or a short cycle, or some other property that we may find. So in this setting, distinguishing a random function from an iterated random function, the best-known attack is the non-adaptive collision-finding attack from the first slide: just choose a set of distinct inputs, apply the function, and see if there is a collision in the outputs. But the best-known bound, which follows from a simple application of the CBC-MAC bound and covers any distinguishing attack, still leaves a gap, as you can see: there is a factor r in the best-known attack, but a factor r squared in the best-known bound. That would indicate that maybe we can find a better attack, or maybe we can prove a better bound, to close this gap. That's something we will look into in this paper, and it's something that we can do, or it probably wouldn't have been accepted. I stole Gregor's joke from yesterday; it's still working. So this is what we will do: we will look at improving the bound.
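The non-adaptive attack and its distinguishing advantage can be sketched in a few lines. Again this is my own toy simulation, not the paper's: the distinguisher queries q distinct inputs up front and outputs 1 iff it sees a collision, and the advantage is estimated as the difference in success probability between the real world (the iterate) and the ideal world (a fresh random function).

```python
import random

def non_adaptive_collision_attack(oracle, q, n, rng):
    """Query q distinct inputs up front; output 1 iff two outputs collide."""
    xs = rng.sample(range(n), q)
    ys = [oracle(x) for x in xs]
    return len(set(ys)) < len(ys)

def estimate_advantage(q, n, r, trials=2000, seed=2):
    """Monte-Carlo estimate of Pr[attack -> 1 | f^r] - Pr[attack -> 1 | f]."""
    rng = random.Random(seed)
    wins_real = wins_ideal = 0
    for _ in range(trials):
        f = [rng.randrange(n) for _ in range(n)]  # real world: f, iterated
        g = [rng.randrange(n) for _ in range(n)]  # ideal world: one call of g

        def real(x):
            for _ in range(r):
                x = f[x]
            return x

        wins_real += non_adaptive_collision_attack(real, q, n, rng)
        wins_ideal += non_adaptive_collision_attack(lambda x: g[x], q, n, rng)
    return (wins_real - wins_ideal) / trials

adv = estimate_advantage(q=6, n=1024, r=4)
```

The estimated advantage is positive, reflecting the factor-r gap in collision probability that the attack exploits.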
It's too bad for people who actually like attacks and would have liked to see a better attack, because that's not something that will be possible. So let me say a bit more about the model that we will use. This is very basic, but it's really important to understand exactly what we're doing here, because when the model is different, the results are of course different as well. We're looking at an adversary that tries to distinguish between two worlds. One is usually called the real world, and the other the ideal world. For any adversary that makes at most q queries, so you are limited in how much of the function you can see, you can only evaluate it on a certain number of inputs, we want to bound the advantage. The last formula on the slide defines the advantage of distinguishing between these two worlds. Now, it is important to note that the framework we're looking at is an information-theoretic framework. These objects that you have, these random functions, are not deterministic algorithms; they are statistical objects. This means that the output of such an object is not known until you have queried it, in the same sense that you don't know the result of a dice roll until you actually throw it. This allows us to consider computationally unbounded adversaries, because there is no limit on the amount of computation the adversary can do; it is the statistical randomness that gives you a security result. The technique that we will use is, as in the last presentation, the H-coefficient technique by Patarin. In this technique, you look at transcripts. A transcript is a q-tuple of input-output pairs to the oracle. And if you want this technique to actually work, it is important that you partition your transcripts into two different sets, which are typically called, like this real world and ideal world, good transcripts and bad transcripts. It's a somewhat arbitrary definition.
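For reference, the H-coefficient lemma the talk relies on can be stated as follows; the notation here is mine, not the slides'.

```latex
% H-coefficient technique (Patarin).  Partition the attainable transcripts
% into good and bad sets.  Let X_re and X_id denote the transcript
% distributions in the real and ideal worlds.  Suppose that
\[
  \frac{\Pr[X_{\mathrm{re}} = \tau]}{\Pr[X_{\mathrm{id}} = \tau]}
    \ge 1 - \varepsilon_1
  \quad \text{for every good transcript } \tau,
  \qquad
  \Pr[X_{\mathrm{id}} \in \mathcal{T}_{\mathrm{bad}}] \le \varepsilon_2 .
\]
% Then for any adversary making at most q queries,
\[
  \mathbf{Adv}(\mathcal{A}) \;\le\; \varepsilon_1 + \varepsilon_2 .
\]
```

This is exactly the shape used twice in the talk: once for the PRP-PRF switching lemma as a warm-up, and once for the iterated random function proof.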
Somehow you have to define how you're going to separate your transcripts, and that allows you to apply the technique. Now, the real application of the technique comes at the end of the presentation, but I think it can help to get some understanding of what the technique actually does with a much simpler example. Perhaps the simplest example you could apply this H-coefficient technique to is the PRP-PRF switching lemma. There you have a random function and a random permutation, and you want to see what your advantage is, with a certain number of queries, in distinguishing between the two. Interestingly, this problem is not so trivial: for example, the first proofs that were given of this switching lemma, the standard proofs basically, contained an error, as has been pointed out in the literature. This is not to say, because I've heard that some people misunderstand this comment, that some techniques for proving security results are better than others. That's not the point. It just shows that it's not completely trivial to look into this result, and it may be useful to reevaluate this type of result with more powerful techniques, just to understand how it works. So, for PRP-PRF, we could say the random function is the real world and the random permutation is the ideal world. A natural way to define good transcripts is to say that a good transcript is one in which all of the outputs are distinct, which means you don't see any collisions. Here you see the lemma of the H-coefficient technique. The important point is that we have this epsilon-1 and epsilon-2 that we would like to compute; once we have them, the advantage is bounded by their sum. I see we've already lost one person.
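The switching-lemma computation sketched in the talk's backup slides goes roughly like this (my own rendering, with N = 2^n the range size):

```latex
% Real world: random function f; ideal world: random permutation pi.
% Good transcripts: all q outputs distinct.  For a good transcript tau:
\[
  \frac{\Pr[X_{\mathrm{re}} = \tau]}{\Pr[X_{\mathrm{id}} = \tau]}
  = \frac{N^{-q}}{\bigl(N(N-1)\cdots(N-q+1)\bigr)^{-1}}
  = \prod_{i=0}^{q-1}\frac{N-i}{N}
  \;\ge\; 1 - \frac{q(q-1)}{2N},
\]
% so epsilon_1 = q(q-1)/2N.  A permutation never collides, so the ideal
% world has no bad transcripts, epsilon_2 = 0, and
\[
  \mathbf{Adv}^{\mathrm{PRP/PRF}}(q) \;\le\; \frac{q(q-1)}{2N}.
\]
```

Note that, as the speaker says next, no collision probability is computed in this direction of the proof: epsilon-1 comes purely from the ratio of transcript probabilities.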
For the PRP-PRF switching lemma, for those interested, when the full slides are online you'll see the derivation in the backup slides. You get a term epsilon-1 that is non-zero, and a term epsilon-2 that is zero, because in the ideal world there are no bad transcripts. Also interesting: we're not actually calculating a collision probability here. We're looking at the probability of getting a certain transcript in the real world and in the ideal world, and because the ratio of those is smaller than one, this term epsilon-1 appears. But it's actually quite arbitrary that we chose the random function as the real world and the random permutation as the ideal world, because if you do it the other way around, you get an alternative way of proving the same result using the H-coefficient technique. What happens then is that the values of epsilon-1 and epsilon-2 get swapped: now epsilon-1 is the zero one, and epsilon-2 is the non-zero one. And it can happen that in the ideal world, which is now the random function, we have bad transcripts, namely transcripts where the outputs collide. Okay, so let's get back to the iterated random function problem, but forget about iteration for the slides that follow: what we're going to do now does not involve any iteration, we're just going to look at random functions. What we're interested in is: what do all the successful collision attacks on a random function with one or two trails look like? It will become clear what a trail is. So what can we do? We could look at a single-trail attack, so one starting point. From this starting point we keep applying the random function, and at some moment it may happen that we enter a cycle. That's the moment we have found a collision, because there are two distinct inputs that give the same output.
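The single-trail rho shape can be made concrete in a few lines of Python; this is my own sketch, not the paper's code. Walking from a starting point until a value repeats yields both the tail length t and the cycle length c, and the collision is found after exactly t + c evaluations.

```python
def rho_shape(f, x0):
    """Iterate f (given as a list) from x0 until a value repeats.
    Returns (t, c): the tail length t (steps before entering the cycle)
    and the cycle length c.  The repeated value witnesses a collision:
    two distinct inputs on the trail map to the same output."""
    seen = {}            # value -> step at which it was first produced
    x, step = x0, 0
    while x not in seen:
        seen[x] = step
        x = f[x]
        step += 1
    t = seen[x]          # tail length
    c = step - seen[x]   # cycle length
    return t, c

# Tiny worked example: 0 -> 3 -> 1 -> 4 -> 1, i.e. tail (0, 3) of
# length 2, then cycle (1, 4) of length 2; inputs 3 and 4 collide on 1.
f = [3, 4, 0, 1, 1]
t, c = rho_shape(f, 0)
```

This matches the condition in the talk: with fewer than t + c queries the fixed (t, c) collision cannot be observed at all.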
Now what we do is fix the length of the tail and the length of the cycle. That's not something you would usually do, but it helps in the later derivation of the proof. In that case, you can bound the probability of having a collision with these specific parameters. If your number of queries q is smaller than t plus c, the probability is zero, because you haven't made enough queries for this fixed pair of t and c to actually find the collision. But as soon as q is larger than or equal to t plus c, you get this term that can be used as a bound. We will actually use a second bound as well, which has a parameter alpha in it, some real number, and there is also a bound relating t and c that involves this alpha. Later, at some step in the proof, we will fix this alpha to a specific value that is convenient for deriving the result. This is one example. Another thing that can happen is an attack starting with two trails. In that case, as you can see here, at this moment we find a collision. This is what we call a lambda collision; t1 and t2 are the lengths of the two legs of the lambda. Again, they are fixed, and this allows us, in a similar way, to bound the collision probability with these two fixed values of t1 and t2. But we could actually continue, because it will be interesting as soon as, not now but later, we look at iterated random functions: we might not see this collision, but we might see a later one. Then we can still find a collision, a second one, which we call a lambda-rho double collision. In this picture, t1 and t2 are the leg lengths, delta-t is the intermediate length, and c is again the cycle length. All of these are parameters that we will fix to bound the probability of a successful collision attack.
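The lambda collision between two trails can also be sketched concretely; this is my own illustration of the shape, not the paper's formalization. The first trail is extended until it repeats, then the second trail is walked until it lands on the first; the meeting point gives the two leg lengths.

```python
def lambda_collision(f, x1, x2):
    """Find the first point where the trail from x2 meets the trail
    from x1.  Returns (t1, t2): the distances of the meeting point
    from x1 and x2, i.e. the two leg lengths of the lambda.  Returns
    None if the second trail closes its own cycle without meeting."""
    trail1 = {}
    x, step = x1, 0
    while x not in trail1:           # record the full first trail
        trail1[x] = step
        x = f[x]
        step += 1
    seen2 = set()
    y, step2 = x2, 0
    while y not in trail1:
        if y in seen2:
            return None              # second trail cycled on its own
        seen2.add(y)
        y = f[y]
        step2 += 1
    return trail1[y], step2

# Tiny example: 1 -> 2 -> 5 and 4 -> 2, so the trails merge at 2
# after one step each (t1 = t2 = 1): inputs 1 and 4 collide on 2.
f = [0, 2, 5, 3, 2, 5]
t1, t2 = lambda_collision(f, 1, 4)
```

Starting from a point on a disjoint component (here 0, a fixed point of f) returns None, which is the case where no lambda collision between the two trails exists.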
Again, we will derive two bounds, one without and one with the value alpha that we will fix to a convenient value. There is actually a variant of this as well, because if you look at this construction, when you continue on to your next collision, here it goes back into the joint part of the two trails, but it could also go back into one of the two legs of your lambda. A different way of drawing that is this picture. This is another way in which you could find two collisions when starting from two starting points, two trails; this is what we call a rho-prime double collision, which we also need to consider. There is another degenerate case as well, where you have a three-fold collision going back into the meeting point, so that you have three different inputs with the same output, but we treat that as a degenerate case of one of the previous ones. So, having explained the kind of things you can find with a collision attack with one or two trails, we now look at iterated random functions, and at what successful collision attacks look like there. Now you won't actually see all of the outputs, because some of them are internal. So what happens is, say r is equal to 2, so I'm iterating twice: I'm jumping by two calls. The dot that is not black is the one that you don't see. At this moment I do another call of f iterated two times, and I only see the output. There has been a collision on the underlying function, but you cannot see it in the output, because it is not a value that was output. However, if you do continue around the cycle, then after going around once, you will find the collision on the iterated function. Similarly, if you look at a two-trail attack, say we start from x2 and apply the function twice, again with r equal to 2.
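The "hidden collision" on the underlying function becoming visible only after going around the cycle can be shown with a five-point toy function; this example is my own, not from the slides.

```python
def iterate(f, x, r):
    """Apply the function f (given as a list) to x, r times."""
    for _ in range(r):
        x = f[x]
    return x

# f: 0 -> 1 -> 2 -> 3 -> 4 -> 2, so f has the collision f(1) = f(4) = 2.
f = [1, 2, 3, 4, 2]
r = 2

# Walk the trail of f^2 from 0, recording only the visible outputs.
outputs, x = [], 0
for _ in range(5):
    x = iterate(f, x, r)
    outputs.append(x)
# outputs == [2, 4, 3, 2, 4]: the value 2 repeats, so f^2 collides on
# the inputs 0 and 3 -- even though the underlying collision
# f(1) = f(4) = 2 happened at intermediate points that f^2 never shows.
```

The colliding inputs 1 and 4 are exactly the "non-black dots" of the slide: values visited by f but never output by f^2, so the collision is only observed after continuing around the cycle.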
Now I start a trail from x1, here again x2, and from x1 I continue. There has been a collision on the underlying function, but you don't see it yet. However, if we continue, now only looking at the trail that started from x2, and go around the cycle, then at this moment I will find a collision with the trail starting from the other point. These are the types of attacks that we need to calculate bounds on, to be used in the final proof. So here you see the bounds we obtained for the single-trail attack, then for the two-trail attack, and we can also generalize: when you have a collision, the two inputs that lead to the same output must either come from the same trail or from different trails. So there is a way of also taking an m-trail attack into account: what would happen if an attacker took m different starting points, and what the collision probability of that would be. Now we have done all these calculations of attacks, but as I said, this presentation is not about attacks, it's about proving a better security bound. These are, however, the attacks that we can use inside the proof to get the better security bound. So now, for the first time, we look at the advantage of an adversary in distinguishing a random function from an iterated random function, so at bounds, not at attacks. We're going to look at how to construct this bound. Again, what we do is apply the H-coefficient technique. Here we choose the real oracle to be the iterated random function, and the ideal oracle a random function, and we distinguish between these two in a black-box way. We define good transcripts as transcripts that satisfy two properties. The first property is that all outputs are distinct.
Intuitively, and we work this out in more detail in the paper, this means that no successful m-trail attack is present anywhere, because if there were, the outputs would not all be distinct: you would somehow have found a collision. The second property is that there are no permutation cycles. A permutation cycle means that when you start from a certain trail, you end up at the starting point again; you have a cycle going back to the initial point. These two properties must be satisfied for a good transcript. So let's see what that means in practice. Here we have a transcript with m starting points, and, okay, at this moment we see that we have collisions, so this is not a good transcript. Let's look at another one. Here we have a permutation cycle, so again something we consider a bad transcript. Another transcript: in this case we have the lambda situation, two trails merging, so again a collision, a bad transcript. What a good transcript looks like is, for example, this one: basically a set of lines with a certain number of starting points. Interestingly, these lines without any cycle or any collision define an isomorphism between the transcripts in the ideal world and the real world, and that is the crucial observation needed to actually prove the generality of the result. So let's apply the H-coefficient technique. Again, here is the lemma of the H-coefficient technique that we applied to the PRP-PRF case and that we now apply to the iterated random function. In the bounds for epsilon-1 and epsilon-2 that we derive, we have the m-trail attack term from earlier, but, as I said, a good transcript is also one without a permutation cycle, so that is something we also need to take into account, and you see another term appearing.
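A transcript checker for the two good-transcript conditions might look like the sketch below. The formalization is my own guess at the slide's definition (a transcript as a list of input-output pairs; a "permutation cycle" as any queried chain that returns to a visited point), not the paper's exact wording.

```python
def is_good_transcript(queries):
    """Check the two good-transcript conditions on a list of distinct
    (input, output) query pairs: (1) all outputs are distinct, and
    (2) following queried edges never closes a cycle."""
    ys = [y for _, y in queries]
    if len(set(ys)) != len(ys):
        return False                  # a collision: bad transcript
    edge = dict(queries)
    for start in edge:
        x, seen = start, set()
        while x in edge:              # follow the chain of queries
            if x in seen:
                return False          # permutation cycle: bad transcript
            seen.add(x)
            x = edge[x]
    return True

good      = is_good_transcript([(0, 1), (1, 2), (5, 6)])  # plain lines
bad_coll  = is_good_transcript([(0, 2), (1, 2)])          # colliding outputs
bad_cycle = is_good_transcript([(0, 1), (1, 0)])          # a cycle
```

Note that once outputs are distinct, any cycle a chain runs into must contain its own starting point, which is why the simple revisit check above suffices in this sketch.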
For epsilon-2, as you can see, there is no r present in the term for the number of iterations, which makes sense, because in the ideal world we have a random function, not an iterated random function. This then allows us to bound the advantage as follows. So to conclude the talk: we looked at the iterated random function problem, which means that if f is a random function, we want to distinguish between f and f iterated r times. The success probability of the best-known attack is linear in r, but the best-known bound on the advantage has a quadratic term in r. What we do is improve this to an almost linear term: we still have a logarithmic factor, which is of course an interesting thing in itself, but it grows more slowly than any polynomial, thereby showing that the best-known attack and the best-known bound are almost tight. This concludes my talk; I'd like to thank you for your attention. [Question] I have a small puzzle, because the attack is non-adaptive and your bound is for adaptive adversaries. [Answer] I'm not sure I understood your question. The best-known attack is a non-adaptive attack, but the bound that we improve takes adaptive adversaries into account. [Question] I think it is a little complicated, because your bound suggests that a non-adaptive attack may be as good as an adaptive attack. [Answer] What I should make clear is that we are not saying that a non-adaptive attack is always going to be as good as an adaptive attack. It really depends on how you choose the values of r and q, and I'd invite everybody to look at the paper for the exact bounds to see exactly which advantages we are looking at.
However, to give the basic motivation, as at the beginning of the paper: if we look at a small number of queries and at the success probability of the attack, then what we see is that the non-adaptive attack, just taking a set of inputs and looking for a collision in the outputs, is the best attack, up to a certain logarithmic factor, because if we could improve the attack, we would violate the best-known bound. The best-known bound is for adaptive adversaries, so of course it covers non-adaptive adversaries as well. I'm not really sure if I'm answering your question. [Question] Just a small part of it, because I believe your proof is right. My question is whether a non-adaptive attack may be equivalent to an adaptive attack, because what you prove is an upper bound on the advantage. [Comment] I think he's just saying that adaptivity doesn't help. [Question] Yes, that adaptivity doesn't help. [Answer] And I'm saying that adaptivity can help, but it really depends on how you choose the values of r and q. The result is actually much more than what I'm showing in this presentation. You will see the full formula in the paper, where you can see that in specific cases it does help to be adaptive, but that is for example for large values of q and r. If you look at a very small number of queries, like what I showed at the beginning of the presentation, then adaptivity does not help. So it's more subtle than this. Any other questions? [Question] Do you think being adaptive could help to get rid of the log r factor there, which would improve the attack? [Answer] That's a really good question. The factor log r comes from a specific theorem that we need, related to the distribution of prime numbers, because that somehow becomes important. It's something that we may be able to get rid of, but it's not clear.
So we wanted to make sure, because if you make small steps in the wrong direction in this security proof, if you're not really careful, you end up with the trivial CBC-MAC bound, and you don't end up with anything that improves over what is already known. We were really careful to avoid this, but we did still end up with this log r factor, and it's not really clear whether it is a necessary factor or not. [Question] But it comes from this primality? [Answer] Yeah. [Question] So how do primes come into the picture? [Answer] Well, there is a certain small lemma that we needed, for which we first found a result that is conditional on the Riemann Hypothesis. But I thought, oh no, that's never going to fly, because the reviewers will say, and I would say myself: what is this Riemann Hypothesis, I don't understand it, should I just assume that it's true? So we found another result, one that was proven, that we need somewhere in the computation and that is unconditional, but it does have this log r factor in it. It's related to the fact that when you are going around the cycle, you need to go around a certain number of times; the greatest common divisor between r and the cycle length is important, and that's where the prime numbers, and eventually the log factor, come in. [Session chair] Okay, let's take further discussions offline into the coffee break, and let's thank the speaker again.
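The role of the greatest common divisor mentioned in this answer can be illustrated with a small sketch of my own: restricted to a cycle of length c, the r-fold iterate acts like x -> x + r (mod c), so its cycle length is c / gcd(r, c), which is the number of laps needed before the trail of f^r closes.

```python
from math import gcd

def cycle_length_of_iterate(c, r):
    """On a pure cycle of length c, model f^r as x -> x + r (mod c)
    and count the steps until we return to the start."""
    x, steps = 0, 0
    while True:
        x = (x + r) % c
        steps += 1
        if x == 0:
            return steps

# e.g. a cycle of length 12 iterated with r = 8 closes after
# 12 / gcd(12, 8) = 3 steps of f^8.
laps_12_8 = cycle_length_of_iterate(12, 8)
```

When r and c share a large factor the iterate's cycle is short, and when they are coprime it takes a full c steps; controlling this quantity over all cycle lengths is where the prime-distribution theorem, and hence the log r factor, enters.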