Okay, so thank you. Hi everyone. This is joint work with Peter Gaži and Krzysztof Pietrzak; we're all from IST Austria. As you can see from the title, the talk is going to be about both keyed sponges and truncated CBC. That's what the paper covers, but the talk is going to be mostly about keyed sponges, and hopefully I can convince you along the way that the truncated CBC case is just a special case of the result. Broadly, this talk is once again about pseudorandom functions. Pseudorandom functions are a central concept in symmetric cryptography and well beyond, and they are used for numerous applications: as message authentication codes, for symmetric encryption, for key derivation, and so on. In this talk I'm going to focus concretely on those PRFs obtained by appropriately keying a hash function, by inserting a secret key somewhere into the computation of the hash function, roughly so that what we get is a good pseudorandom function. Of course we know how to do this: the most widely used construction doing this is HMAC, which is used all over the place. But HMAC carries some extra overhead that we would like to avoid, which is due to the fact that the HMAC construction needs to deal with potential extension attacks on the underlying hash function, namely the fact that for all hash functions used in practice before SHA-3, given H(M) it is easy to compute the hash of an extension of M even without knowing M. This has to be taken into account, because it's a property that a good PRF must not have. But for SHA-3 there are no extension attacks, by design.
So HMAC is not really necessary. In particular, SHA-3 relies on the sponge construction, which in its basic form relies on two parameters, n and r, as I'm going to use them in this talk; there's a third parameter c, which is just shorthand for n minus r. To hash a message M, the sponge construction relies on some underlying fixed permutation pi from n bits to n bits. It starts by splitting the message into r-bit blocks, and the computation starts with some initial n-bit state, which we can think of as having an upper r-bit part and a lower c-bit part. Then, in each round, we XOR the next message block into the upper part of the state, feed the whole state into the permutation, and get the next state; we do this over and over until we run out of message blocks. To compute the final output, in order to prevent extension attacks, we take the final state and truncate it by chopping off the lower c bits, and we are left with the upper r bits, which are our hash. To be fair, the sponge construction achieves much more, for example variable output length, but I'm not going to talk about that here. So if we want to get a PRF from this hash function, there are two natural approaches to keying sponges: one is putting the secret key into the initial state, into the IV, and the other is prepending the key to the actual message. The second one is actually more desirable in practice, because you don't have to modify the underlying hash function to get what you want. If you like HMAC, or you live in the HMAC world, you can think of these as moral analogues of NMAC and HMAC for sponges. In this paper what we do is address the question of how secure these constructions are as pseudorandom functions. We are not the first ones doing so, but our contribution is to give the first
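As a rough sketch of what I just described, here is a toy keyed-IV sponge. Everything here is illustrative: a small seeded random permutation stands in for the real n-bit permutation (e.g. Keccak-f), and the parameter sizes are toy values, not SHA-3's.

```python
import random

def toy_sponge(message_blocks, n=8, r=4, key=0):
    """Toy sketch of the keyed-IV sponge from the talk.

    n-bit state with an upper r-bit part and a lower c = n - r bit
    part. The permutation pi is a fixed, seeded random permutation on
    n-bit values (a stand-in for the real one; illustrative only).
    """
    rng = random.Random(0)
    pi = list(range(2 ** n))
    rng.shuffle(pi)          # fixed "public" permutation on 2^n points

    c = n - r
    state = key              # keyed-IV variant: the key is the initial state
    for block in message_blocks:   # each block is an r-bit value
        state ^= block << c        # XOR the block into the upper r bits
        state = pi[state]          # apply the permutation
    return state >> c              # truncate: keep only the upper r bits
```

Truncating the lower c bits at the end is exactly the step that blocks extension attacks: an attacker who sees the output is missing the c capacity bits needed to continue the computation.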
near-tight analysis of the concrete PRF security of these constructions. We do so both in the random permutation model and in the standard model, and we consider both the keyed-IV and key-prepending variants of keyed sponges. We believe these results have direct implications, of course, for SHA-3: SHA-3 uses sponges, and our results give a strong validation for deriving PRFs and MACs from sponges; they also validate other constructions based on the sponge paradigm but perhaps not on SHA-3 itself. This talk is going to mostly focus on the random permutation model analysis for the keyed-IV case, which contains most of the important ideas. When I talk about PRF security in the random permutation model, what I mean is that I think of the keyed sponge construction explicitly as an algorithm that takes inputs and produces outputs depending on some secret key K, and that also makes explicit black-box queries to some underlying permutation pi, which for the proof we model as random, chosen uniformly at random from the set of all n-bit permutations. Our attack model then considers a distinguisher that can make both construction queries to the sponge construction and primitive queries to the permutation pi and to its inverse. The distinguisher is required to output a decision bit, and we assume it to be computationally unbounded, so it can compute as much as it wants, but it is bounded in the number of construction and primitive queries
it is allowed to make. To define security, we would like this real world to behave close to an ideal world where the sponge construction is replaced by a truly random function. We measure the concrete security of the construction by the PRF advantage of such a distinguisher, which is the difference between the probability that the distinguisher outputs one in the real world and the probability that it outputs one in the ideal world. Now, random permutation model security proofs of course do not give us an actual security proof, because the real permutation underlying sponges is not random, but they nevertheless give us a strong security guarantee in terms of proving the lack of generic attacks, that is, attacks that treat the underlying permutation as a black box. And just understanding the concrete security of generic attacks against keyed sponges is already something which is really not resolved today. In fact, the best attacks we know to distinguish keyed sponges from a random function achieve advantages of the following form: either something like q squared over 2 to the c, where q is the number of construction queries, or, if you allow primitive queries, something like q times q_pi over 2 to the c. The bottom line is that all of these attacks rely on the ability to find collisions in the lower part of the state, in the lower c bits — and not just any internal state collision, like for example here: we are really interested in finding collisions in the lower c bits for states that lead to an output. For example, if you observe the outer part and find a collision in the lower part, then it's actually easy to distinguish the construction from a random function, and that's pretty much all we know how to do. But there's still a gap with what provable security guarantees us, and in particular an interesting question that you might ask starts from the
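In symbols, the PRF advantage and the two generic attack advantages I just mentioned look roughly like this (a sketch of the quantities from the talk; exact constants are in the paper):

```latex
% PRF advantage of a distinguisher D in the random permutation model:
\mathrm{Adv}^{\mathrm{prf}}(D) \;=\;
\Bigl|\, \Pr\bigl[D^{\mathsf{Sponge}_K^{\pi},\,\pi,\,\pi^{-1}} = 1\bigr]
      \;-\; \Pr\bigl[D^{\mathsf{R},\,\pi,\,\pi^{-1}} = 1\bigr] \,\Bigr|
% The best known generic attacks achieve advantage on the order of
\Omega\!\left(\frac{q^2}{2^{c}}\right)
\qquad\text{or, using primitive queries,}\qquad
\Omega\!\left(\frac{q\, q_{\pi}}{2^{c}}\right),
% where q counts construction queries and q_pi counts primitive queries.
```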
observation that all of these attacks cannot be cast as relying only on short messages, messages of just a few blocks. You might start wondering — because we wondered about this for other constructions of PRFs and MACs — whether there are attacks that exploit message length: perhaps you only make a few queries, because that's all you're allowed, but you use long messages, and that helps you distinguish. In fact, if you look at the related work I mentioned before, there have been numerous analyses of keyed sponges, starting from the first indifferentiability proof for sponges, which already implies something in this context. But all of these analyses leave this possibility open: there may be such attacks exploiting length. All of these analyses prove a bound on the best advantage achieved by a distinguisher that can make q construction queries of length at most ell, and give bounds of pretty much the form I showed, which strongly depend on the length — here the length means the number of r-bit blocks your message consists of. All bounds have this form, and in particular they include terms like ell squared times q squared over 2 to the c. So it might well be possible to find an attack which makes very few queries but uses a large ell. But we don't know any such attacks, and so one can conjecture that maybe this length dependence is not necessary. So what's our main result?
Our main result is a new bound on the PRF security of keyed sponges which is very close to what we would like to have. The first two terms are exactly what we would expect, but there's an additional part of the bound that depends on the length. The key point is that the length-dependent part only has terms with denominator 2 to the n, as opposed to 2 to the c, and this is actually not too bad in general, because remember that c equals n minus r, and in many concrete applications, like when we use SHA-3, r is pretty large, and it is very safe to assume that ell is strictly smaller than 2 to the r. In those cases this final term just goes away, the bound is matched by the attacks, and it is tight. Note that our result applies to a more general construction that doesn't restrict the message blocks to be r bits: they can be arbitrary n-bit blocks. They might have some structure, but they do not need to be r-bit blocks, and the only constraint is that at the end of the construction you truncate and keep only the first r bits. An interesting point is that if you now think of the permutation pi as being secret rather than public, then this is exactly the truncated CBC construction. So if your distinguisher makes no queries to the permutation pi, you can think of pi as coming from a block cipher, and you directly get an analysis for truncated CBC. In fact, for the remainder of this talk, to give you an intuition of the proof, I'm going to restrict myself to this truncated CBC scenario, and discuss how the proof works in the case where the distinguisher can only make q construction queries but no primitive queries, as this already captures most of the challenges of the proof. So what are these challenges?
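Schematically — and hedging here, since I am suppressing the exact higher-order terms, which are spelled out in the paper — the shape of the new bound is:

```latex
% Shape of the main bound (higher-order terms suppressed; see the paper):
\mathrm{Adv}^{\mathrm{prf}} \;\lesssim\;
\frac{q^{2}}{2^{c}} \;+\; \frac{q\, q_{\pi}}{2^{c}}
\;+\; \underbrace{\frac{\text{(length-dependent terms)}}{2^{n}}}_{%
      \text{negligible whenever } \ell \,<\, 2^{r}}
```

The first two terms match the generic attacks; the length dependence has been pushed from denominator 2^c down to 2^n.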
If we want to prove security of this construction, we have to take into account two things that make the proof difficult. The first is that we have to deal with dependencies that are not present when analyzing other constructions, say encrypted CBC or prefix-free CBC: we can make a query, learn an output, and then make a later query such that the computation of the output for this later query internally depends on the output of the previous query. This is because we just go on computing, and we are outputting part of the state as one of the outputs. The second issue is that all of the previous analyses obtain non-tight bounds, because what they essentially do is bound away the possibility of having any collision at all in the lower c bits of the state. But not all of these collisions are harmful — for some of them, we don't know how to use them in an attack — so we have to find another way to analyze the construction. Our approach is based on the idea of modeling the computation of the keyed sponge construction on a sequence of inputs as a labeled tree, which we call the message tree, and which is a bit different from the usual graph-theoretic interpretations used when analyzing iterated symmetric constructions. In particular, the message tree has a vertex set consisting of all of the prefixes of the messages
we are considering. So we are looking at the computation of the keyed sponge construction on a set of q messages, and the set of vertices consists of all of the prefixes of these messages. For example, if we have four messages consisting of different blocks — here by a boldface 0 or 1 I mean a block consisting of n copies of that bit — we get something like this, where the violet vertices correspond to the messages and the orange vertices correspond to prefixes of them that are not themselves messages we are looking at. Then we can model the computation of the sponge construction by assigning to the vertices labels that correspond to the internal state values of the construction when evaluating it on these inputs. We start by assigning to the root a label corresponding to the secret key, and then we go on assigning labels by just evaluating what the construction would do: for example, the label of 1 is obtained by applying the permutation to the label of the root XORed with the message block leading there. And so on — we can assign labels to the whole tree, and the labels of the violet vertices are exactly the outputs of the keyed sponge construction on the corresponding inputs. It's also convenient, just to give you an idea of the framework in which the proof is placed, to define the reduced message tree, which is exactly the same thing except that we hide the labels assigned to the actual inputs, those used to compute the outputs of the construction. Of course, the outputs themselves are obtained by truncating these labels. So why is this good? Because in the actual proof we are going to consider transcripts of interactions of the distinguisher with the given construction, and we are going to enhance them by appending this reduced message tree.
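Here's a minimal sketch of building this labeled tree. A toy state size and a seeded random permutation stand in for the real parameters, so everything here is illustrative only.

```python
import random

def message_tree(messages, n=8, r=4, key=0):
    """Sketch of the message tree from the talk: vertices are all
    prefixes of the q queried messages (as tuples of blocks), and each
    vertex is labeled with the internal state the keyed sponge reaches
    on that prefix. Toy permutation and sizes; illustrative only."""
    rng = random.Random(0)
    pi = list(range(2 ** n))
    rng.shuffle(pi)          # toy stand-in for the fixed permutation

    c = n - r
    labels = {(): key}       # root = empty prefix, labeled by the key
    for msg in messages:
        for i, block in enumerate(msg):
            prefix = tuple(msg[:i + 1])
            if prefix not in labels:   # shared prefixes computed once
                parent = labels[tuple(msg[:i])]
                # next state = pi(parent XOR block-in-upper-r-bits)
                labels[prefix] = pi[parent ^ (block << c)]
    return labels
```

The labels of the vertices corresponding to full messages are the pre-truncation outputs of the construction; hiding exactly those labels gives the reduced message tree.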
For example, in the real world we consider an interaction of the distinguisher with the sponge construction making queries to the underlying permutation; this yields a sequence of input messages and corresponding outputs, which defines a transcript, and we additionally append to the transcript the corresponding reduced message tree. We do the same in the ideal world: there the outputs are random, but we can still define, after the execution is over, a reduced message tree by using a fresh random and independent permutation. Of course this can only help the distinguisher, and in particular what we can show easily is that the advantage of the distinguisher D in telling these two worlds apart is no larger than the statistical distance between these two augmented transcript distributions. Now, of course, we are giving a lot more information to the distinguisher, and the question is how much it helps. The intuition is that this reduced message tree should not help too much, because we are deleting exactly those values that lead to outputs of the keyed sponge construction, so it should not help you distinguish the real world from the ideal one. The catch is, of course, that this nice intuition is not true — as always. Such reduced message trees can actually leak some information: even if I erase all of the values that define the outputs of the keyed sponge construction, you might still learn interesting things in some isolated cases. For example, if I give you such a tree and then additionally tell you that the label of 0, which is up here on the left, and the label of 11 collide — you can see that I haven't erased them from the tree — then you can infer, because I'm using a permutation, that the corresponding inputs collide, and in particular that the label of 1 equals the label of epsilon XORed with the 1 block. This is useful, because it now allows you to learn what
the output value is on this message, which allows you to distinguish the real world from the ideal world. So we have to exclude some degenerate message trees. These are exactly — this is a long definition, but it's easier said in words — those reduced message trees where the values assigned to the messages, and the values assigned to the successors of the messages, which is exactly what we exploited here, are not unique and collide with something else that you can see. If something like this happens when we reduce the tree, we just say: sorry, bad luck — the reduced message tree is set to some error symbol, which I denote here by a star. Otherwise you see everything as before. In particular, we say that an interaction transcript is good if the corresponding reduced message tree is not reduced to such a star. The expectation now, which is still not easy to see, is that given such a non-degenerate reduced message tree, it is hard to distinguish. And this is something we prove — yet again, like in many other talks at this conference — using Patarin's H-coefficient method, which is seeing many new applications. In particular, the idea here is that we have this set of possible transcripts.
We have a set of good transcripts, which are those for which the underlying message tree is not degenerate. Now, suppose we manage to show the following, which is the essence of Patarin's H-coefficient method: there exist some values epsilon and delta such that the probability that in the ideal world a transcript is not good — so it falls outside this blue set — is at most epsilon, and moreover all good transcripts appear with more or less the same probability in the real and the ideal world, related by some multiplicative factor 1 minus delta. Then the statistical distance between the real and the ideal transcripts is at most epsilon plus delta. A very useful lemma — it has nothing to do with cryptography, it's just a very nice property of the statistical distance — but it turns out to be really powerful. In fact, in our context, if we define good and bad transcripts as we just did, it is not hard to bound the probability that an ideal transcript is not good by using a very powerful lemma by Bellare, Pietrzak, and Rogaway from CRYPTO 2005, which addressed CBC-like constructions and applies to this setting. So the hard part is the second part of the proof, namely proving that the probabilities of getting a good transcript in the real world and in the ideal world are close to each other, related by a factor 1 minus delta, where delta is pretty much what we want: something like q squared over 2 to the c. I don't have time to say much more, but I just want to state the key problem that we address. Imagine you're given a real-world transcript with a reduced message tree which you know is not from the degenerate case, so the underlying message tree was not degenerate and the corresponding labels were unique. Then the question we are asking is: given this reduced message tree, for which you don't know some of the labels, the underlying problem is essentially a
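Written out, the lemma used here is the following property of the statistical distance (stated as in the talk, up to notation):

```latex
% Patarin's H-coefficient lemma, as used here. T_re and T_id denote the
% transcript distributions in the real and the ideal world.
\text{If}\quad
\Pr\bigl[T_{\mathrm{id}} \notin \mathcal{T}_{\mathrm{good}}\bigr] \le \varepsilon
\quad\text{and}\quad
\frac{\Pr[T_{\mathrm{re}} = \tau]}{\Pr[T_{\mathrm{id}} = \tau]} \;\ge\; 1 - \delta
\quad\text{for all } \tau \in \mathcal{T}_{\mathrm{good}},
\quad\text{then}\quad
\mathrm{SD}\bigl(T_{\mathrm{re}},\, T_{\mathrm{id}}\bigr) \;\le\; \varepsilon + \delta.
```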
counting problem: you want to know how many ways you have to complete this reduced message tree into a full message tree without making it degenerate, such that the new labels you complete it with are consistent with the actual outputs of the construction. So this is inherently a counting problem, which is the core of the proof. I've been cheating a little here by hiding some higher-order terms, and by the fact that you still have to assume that the other labels you see are well behaved, but everything pretty much works out, in a somewhat painful way. Just to conclude, two final things about other results. I mentioned in passing that we also have standard model bounds. Not much to say here, except that we can fit our result into a very elegant framework that came up in a workshop paper a couple of years ago by Chang et al., which gave standard model security proofs for sponge-like PRF constructions, and we can fit our improved bound into their framework. This would be our improved bound; it essentially reduces the standard model PRF security of keyed sponges to the PRP security of the underlying permutation when placed into an Even-Mansour-like cipher where, instead of whitening the whole input and output as you usually do in Even-Mansour, you only whiten the lower c bits of the input and the output. This is still expected to give you some sort of good cipher, and if you assume it is really secure and have a bound on how close it is to a random permutation, then you can plug this bound into the standard model result and obtain a standard model bound. The reason why this doesn't supersede our random permutation model result is that when you translate it to the random permutation model, which is the usual benchmark model for the analysis of sponge-like constructions, the bounds that you get are actually
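A minimal sketch of this Even-Mansour-like cipher, with the key whitening only the lower c bits of the input and output — again with a toy seeded permutation and toy sizes, purely for illustration:

```python
import random

def partial_even_mansour(x, k, n=8, r=4):
    """Sketch of the cipher from the standard-model reduction: unlike
    ordinary Even-Mansour, the key whitens only the lower c = n - r
    bits of the input and of the output. Toy parameters; illustrative
    only."""
    rng = random.Random(1)
    pi = list(range(2 ** n))
    rng.shuffle(pi)              # toy stand-in for the public permutation

    c = n - r
    low = k & ((1 << c) - 1)     # only the lower c bits of the key matter
    return pi[x ^ low] ^ low     # whiten input, permute, whiten output
```

Since both whitening steps and pi are bijections, the result is still a permutation of the n-bit strings; the standard model assumption is that, with a secret key, it is a good PRP.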
weaker than the ones we can prove directly. But it's still interesting, and there was also a recent work by Andreeva et al. at the last FSE that showed how to make this reduction tighter: instead of the factor q times ell, you can use another quantity, called the multiplicity, which in some applications is smaller and easier to bound. The other thing I haven't mentioned, and don't have time for, is the key-prepending case. In the paper we give a full analysis in the random permutation model — again, there isn't really a standard model analysis there — for the case where we prepend the key to the message, and in particular, and this is really the harder case, the key can be spread across multiple blocks. This makes the analysis significantly harder, and to handle it we, somewhat surprisingly, have to find a connection with work in the area of key-alternating ciphers and a recent analysis by Chen and Steinberger, because similar problems come up when analyzing this key-absorption phase, where the key is processed before the message. Again, the non-trivial part is dealing with the fact that you might have collisions in the lower c bits of the state; we can't simply exclude those collisions, because that would not give us a good bound, so we have to work around them. Also, in passing, I want to note that the concurrent work I just mentioned by Andreeva et al. claimed, in the first version of their paper, a better bound than what we have here, but it turned out to contain a mistake, which is now fixed using the techniques we developed here. So if you have seen a better bound than ours, that bound has since been corrected.
Okay, so this is pretty much all I can say in the time of this talk. I think the bottom line is that, given that SHA-3 is now our new reality and is going to be used, and we want to build PRFs from hash functions, understanding the concrete security of the resulting constructions is really important. It turns out to give rise to quite an interesting technical problem, which we have addressed here, if you want to give bounds that are really tight and match existing attacks. I think we really had to develop some new techniques and new ways of looking at how you analyze such iterated PRF constructions, and I hope these can be applied to other constructions. In terms of open problems: I cheated you a bit during this talk, I have to be honest about that. If you haven't paid close attention to the bound — and I haven't shown it in full — the bound was a bit simplified. There are some higher-order terms in it, if you look at the paper, which are not relevant for the statements I made but make the bound look a little uglier than it should be, and it's not clear whether they're necessary or whether they're an artifact of our proof technique. So cleaning up the bound in that sense is definitely a meaningful open problem that might lead us to learn new things. Also, the tightness of the bound assumes again that messages, while they can be very long, cannot be longer than 2 to the r, where r is the output length; investigating tightness outside this regime is also a very interesting open problem. We have a full version of the paper online, with all of the proofs. And that's all I wanted to say.