Good morning everybody. This is joint work with Bart Mennink and Gilles, so I hope I will not get any nasty questions from Bart. This is something within the Keccak activity that we've been working on for years, I think since 2010 or so, especially this part: sound hashing modes based on arbitrary functions. Then more recently, a few years ago, we started also looking into permutations, and when Bart Mennink arrived at Radboud University we looked into it more deeply and got some nice results. And then we said, why don't we put it all together in one big paper and submit it to FSE, and the result is this. So it is basically about: what are sufficient conditions, not the minimal set of conditions but sufficient conditions, for a mode to be sound, to have birthday-bound security in the chaining value, and still be simple. My ambition is to try to explain to you in the coming 20 minutes what this is. So let's start with a number of examples. We start with SHA-256, but SHA-1 would be the same. We build a hash function from a fixed-input-length compression function by applying the Merkle-Damgård construction, well known to everyone, I think; if you didn't know it, you would maybe not be at the right conference. And this compression function we build again in a hierarchical way, so we have two layers: we build it from something smaller, namely a block cipher, because that is something we know how to build. We don't know how to build a fixed-input-length compression function, but we know how to build a block cipher. This is basically the Davies-Meyer construction: we take the block cipher, we put the message block in the key input, so the message expansion corresponds to the key schedule, and we encrypt the CV to the next CV and add the feed-forward.
The underlying primitive in SHA-256 is a block cipher with a 256-bit block length and a 512-bit key length, but the key is actually the message input. Another example: MD6. Why do I go to MD6? Because it played an important role in the development of this paper. It is quite an innovative construction, a submission to the SHA-3 competition by a big team, and it is very different from Merkle-Damgård in that it is hierarchical. Each of these nodes is an application of an underlying compression function. Here are the leaves that contain the message bits; the chaining values are assembled in intermediate nodes, and you build a tree. And this is the final node, or the root node as they call it; it is special, it is encoded in a special way. And in every intermediate node, and also in the leaves, you indicate the coordinates of that node in the tree. So they put a lot of encoding on top of this to make sure it is secure, and they had a proof of security of this mode if you apply all this coding. And this compression function, as we cannot build a compression function directly, has to be built from something else, and in their case that was a permutation, which is quite innovative. They build it with the following construction: they fix part of the input, 15 words; then they have a dedicated space for these coordinates and also for a key, but again, that's not important here; and then the data. They apply the permutation and chop, so they truncate the output, and this is the chaining value, which then goes here, into its place in the parent node. So it is a dedicated construction. The underlying primitive, this permutation, was a 5,696-bit permutation, talking about big permutations. And this fixed prefix is 15 words, 15 times 64 bits, so that's...
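As a sketch, the MD6-style compression call described here looks as follows. `toy_permutation` and the constant are hypothetical stand-ins; MD6's real permutation operates on 89 words of 64 bits (5,696 bits in total), with a fixed 15-word prefix (15 * 64 = 960 input bits) and the output truncated to a 16-word chaining value.

```python
# Sketch of an MD6-style compression function built from a wide permutation.
# The permutation and the prefix constant below are illustrative stand-ins,
# not MD6's actual permutation or constants.

WORD = 64
PREFIX_WORDS = 15   # fixed prefix: 15 * 64 = 960 input bits "lost"
DATA_WORDS = 74     # key, node coordinates, and message/CV payload
CV_WORDS = 16       # output truncated to 16 words = 1024-bit CV

def toy_permutation(state: list[int]) -> list[int]:
    # Placeholder word-wise mixing over the full 89-word state.
    mask = (1 << WORD) - 1
    out = list(state)
    for r in range(8):
        for i in range(len(out)):
            out[i] = (out[i] + out[i - 1] * 0x9E3779B9 + r) & mask
    return out

def md6_style_compress(data_words: list[int]) -> list[int]:
    assert len(data_words) == DATA_WORDS
    prefix = [0x0123456789ABCDEF] * PREFIX_WORDS  # fixed constant words
    full_state = prefix + data_words              # 89 words in total
    return toy_permutation(full_state)[:CV_WORDS]  # truncate to the CV
```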
I cannot compute that out of my head, but it is a lot of bits that you lose. So our idea was that this can be done better, much more cheaply. Okay, then a more recent example. This is KangarooTwelve, something we proposed in 2016 and presented at ACNS, recently in Leuven. The goal of KangarooTwelve is to be parallelizable, and we do it with a hierarchical mode that has basically only one layer. Each of these blue blocks is part of the message: we split the message into a number of chunks, and each of these chunks we can hash independently. Then we add some padding bits coming from Sakura, our tree-hashing coding. Each of these arrows is the underlying compression function, which in this case is a XOF. And we also get a XOF, because this whole thing is a XOF by itself: the output is variable-length, extendable. So what do we use for this underlying XOF? We use the well-known sponge, which by itself builds a XOF from a permutation, and the permutation is Keccak-p with a width of 1600 bits. So these are three different examples of what hash functions look like. You typically see a number of layers of constructions, and in the end you reduce to something you can build, which is either a permutation or a block cipher. Okay, so what is the basis of security for hash functions? Well, we are in a provable-security session, but we should not forget that we cannot prove a hash function secure. We cannot prove a block cipher secure, or a permutation, or anything; we can only rely on public scrutiny and cryptanalysis. But we can do something close to proving security: we can idealize the hash function by replacing the underlying primitive by something completely random, like a random permutation, drawn uniformly from the space of all such primitives. And that we can prove.
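The one-layer parallel mode described for KangarooTwelve can be sketched like this, with Python's SHAKE128 as a stand-in for the underlying XOF. The real scheme uses a 12-round Keccak-p[1600] sponge, Sakura coding bits, and a different final-node layout, so this only shows the chunk-parallel shape.

```python
import hashlib

CHUNK = 8192  # KangarooTwelve's chunk size in bytes

def parallel_tree_hash(message: bytes, out_len: int = 32) -> bytes:
    # Split the message into chunks that can be hashed independently,
    # i.e. in parallel on different cores or SIMD lanes.
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    cvs = [hashlib.shake_128(c).digest(32) for c in chunks]
    # Final node: assemble the chaining values and hash once more.
    final_node = b"".join(cvs) + len(chunks).to_bytes(8, "big")
    return hashlib.shake_128(final_node).digest(out_len)
```

Because the output length `out_len` is a free parameter, the whole construction is again extendable-output, matching the "XOF built from a XOF" structure in the talk.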
What we mean by proving it secure is that we can show it is hard to distinguish from a random oracle. A random oracle is the ideal hash function; that is our ideal. And that proves that any attack breaking this hash function must exploit properties of the underlying primitive. So it says something about the mode, and that is what this talk is about. Now you can say: okay, but in the end, if you replace this idealized primitive by a concrete instance, the proof is no longer valid. Yes, but it is still good to have this proof, because what can happen if you don't have a good bound? Remember this construction: it suffers from length extension. I'm not going to explain it, but a consequence of length extension is that if you use this as a MAC function, where you simply prepend the key to the message and hash, that is not secure against forgery. There is a fix, HMAC, but it is quite expensive. Other things: many attacks were found with complexity lower than expected. For instance, for long messages, second preimage does not have security strength n, where n is the digest length: it does not take 2 to the power n operations, but 2 to the power n divided by the length of the message. Multi-collision attacks are much faster than for a random oracle, and so on. And all of this affects the old-style hash standards like MD5, SHA-1, and so on. Okay, so what are the modes we treat in this paper? We look at modes that you can describe as a two-phase process. You don't have to compute it in two phases, but you can visualize or conceptualize it like that. What do I mean by that? In the first phase, we look only at the length of the message: we do a process that depends only on the length of the message to be hashed.
And maybe on some parameters. For instance, in MD6 there was a parameter saying serial or parallel, and that would give two different modes. You can also have parameters saying what the length of these blocks is, or, here for instance, we have two chaining values per intermediate node; you could have three or four. Using these parameters and the length of the message, we build a kind of recipe for hashing any message of that length. In this case we have a message of 21 bits, and the template says: take the first six bits and put them here, the next six bits here, the next six bits here, and the remaining three bits here; add padding; then append two zeros to each of these blocks; then apply our compression function, our underlying function, to each of these, assemble the chaining values here, append a zero, and so on. So it is a kind of recipe. You see three different colors: the light gray are message bits, the dark gray are chaining-value bits, and the white are frame bits, bits that depend neither on the message content nor on the outputs of the intermediate calls. We only allow the input to the function to be a concatenation of these three types of bits. This model does not cover, for instance, a feed-forward; we cannot do a feed-forward in this model, but we don't need it. So this is the first phase; this is really where the mode comes in. T stands for the mode. Why T? Because we were initially thinking only of tree hashing modes, but now it also covers sequential hashing. The mode is where you convert the message length and parameters into what we call a template. The second phase is where we take the template, take a message and a function F, and just execute it.
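The two-phase view can be made concrete with a toy mode. All sizes and encodings here are illustrative, not the paper's: phase 1 turns the message length into a template, and phase 2 executes that template with the actual message bits and a concrete function F.

```python
import hashlib

LEAF_BITS = 6  # message bits per leaf, as in the 21-bit example

def make_template(msg_len: int) -> list[dict]:
    # Phase 1: depends ONLY on the message length (and parameters).
    n_leaves = max(1, -(-msg_len // LEAF_BITS))  # ceiling division
    leaves = [
        {"kind": "leaf",
         "msg_bits": (i * LEAF_BITS, min((i + 1) * LEAF_BITS, msg_len)),
         "frame": "00"}
        for i in range(n_leaves)
    ]
    return leaves + [{"kind": "final", "frame": "1"}]

def execute(template: list[dict], msg_bits: str, F) -> bytes:
    # Phase 2: follow the recipe with the concrete message and function F.
    cvs = []
    for node in template:
        if node["kind"] == "leaf":
            a, b = node["msg_bits"]
            block = msg_bits[a:b].ljust(LEAF_BITS, "0")  # padding
            cvs.append(F((block + node["frame"]).encode()))
        else:  # final node: concatenated chaining values plus frame bits
            return F(b"".join(cvs) + node["frame"].encode())

F = lambda x: hashlib.sha256(x).digest()[:16]  # stand-in for the ideal F
digest = execute(make_template(21), "101100110101011001101", F)
```

A 21-bit message yields four leaves (6 + 6 + 6 + 3 bits, the last one padded) plus the final node, just like the example in the slide.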
This first phase is completely independent of the underlying primitive. Here the function comes in, and the content of the message, not just its length, and we execute: we basically just do what the template says. We put these bits there, then apply F, and so on. And the hash is our underlying function applied to this node, which we call the final node of the tree. This whole thing we call a hash tree, or tree for short, and this is a tree template. It is the properties of these trees that matter: if a mode generates trees satisfying certain conditions, the mode is secure. This also covers sequential hashing, because a sequence is just a special form of tree. And we did it for three types of underlying functions: an arbitrary function with no special properties, like a XOF or a hash function or a fixed-input-length compression function; a truncated permutation; and a block cipher, or even a truncated block cipher, where we take as chaining value part of the data output, the ciphertext truncated to some length. Okay, so now I'm going to try to explain the conditions in the minutes that I have left. These are the conditions; I'm not going to just name them, I'm going to immediately try to explain them. So here, this is the space of all possible trees you can imagine, trees with a certain connectivity, well defined in the paper. It is an infinite space, but I just depict it as a rectangle. In this space, a mode T defines the set of trees you could possibly arrive at; that is what we call the set of trees generated by the mode T, and we write it S_T. Okay, so now let's take a look at our first condition, called message decodability. It says that if we have a tree that is generated with the mode
(so by applying our mode to a concrete message with a concrete function), then from this tree we should be able to unambiguously derive the message and the template. So from this we can actually reconstruct the message and the template. That is condition one. For condition two, here is a more abstract depiction of such a tree, without the bit strings, which I will use to define some notions. In this tree, a tree of S_T, we can define different types of subtrees. This is a final subtree: a subtree that contains the final node. This is a leaf subtree: a subtree that contains, for its root, all of its descendants, so everything down to the leaves. And this is just a subtree, neither a final nor a leaf subtree. So, again our big diagram. We can now define the set of all trees that are subtrees of trees in here: we take each tree in here, remove some nodes, and form the set of all proper subtrees. And subtree-freeness says that this set and S_T must have empty intersection. That's the condition, that's all: you cannot have a tree that is at the same time a tree of S_T and a proper subtree of a tree in S_T. Okay, third condition: radical decodability. You see here, this tree feels wrong, because there must be something missing: there is a chaining value here with no arrow going into it. We call such a chaining value a radical; it is something missing. So now let me try to define radical decodability. Here I have again the diagram with the subtrees and the trees, and a subset of the subtrees are the leaf subtrees not overlapping with the final subtrees. If they overlapped, that would be a full tree, so that cannot happen. And radical decodability simply says that in these trees you can always find a radical, because such a tree is not complete.
You can always unambiguously identify a final radical. The real radical decodability condition is a bit more subtle, and you would think it is more restrictive, but it is actually less restrictive: we allow a slightly larger set, which can be a much larger set, in which you must be able to identify a radical. So that is radical decodability. Now let's look at what we mean by sound. We work in the indifferentiability model that was introduced by Maurer et al. in 2004, and later applied to hashing by Coron et al. We applied it already for the sponge, and there we proved an advantage bound of N choose 2 times 2 to the power minus c, which is basically the birthday bound in the capacity: as soon as you get internal collisions in the capacity, you lose. And here, in our paper, for this setup we can prove the same bound where we replace c by the length of the chaining value: as soon as you have collisions in the chaining value, you lose. But I have to explain this diagram a bit. I have here the adversary, and the adversary can query our construction. No, he cannot query our construction directly: he actually queries with messages and parameters. The mode is here; it takes these parameters into account, and I'm not going to explain that in detail because I would go over time. So the adversary has access to the execution of the template, and he has access to the underlying primitive, the compression function. He can check consistency of the responses here and here himself, because this is a very simple parser: he can either send the message with the parameters, or build the template himself and execute it here, and that must be consistent. And it will be consistent. At the other side you have the random oracle: our mode should actually behave like a random oracle, and for the random oracle there is no choice here.
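Written out, the bound being compared here (my reconstruction from the talk, with N the number of primitive evaluations, c the sponge capacity, and n the chaining-value length) is:

```latex
\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{sponge}}(N) \le \binom{N}{2}\, 2^{-c}
\qquad\longrightarrow\qquad
\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{mode}}(N) \le \binom{N}{2}\, 2^{-n}.
```

In both cases this is the birthday bound: the advantage becomes significant once inner collisions, in the capacity or in the chaining value respectively, are expected.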
The adversary can also query the random oracle with the message and the parameters, and for any query it gives an independent answer, unless two queries are the same. And he can also query the simulator that we have in indifferentiability, and this must be consistent. These conditions are basically what allows the simulator to do its work: because of these conditions, the simulator knows what is going on and can stay consistent with the random oracle. That is the whole point. So we get this bound if the mode satisfies our conditions. But there is one condition I didn't mention, because it is the additional condition for block ciphers and permutations, truncated permutations, where we can do inverse queries, which we cannot do when the underlying function is a compression function, a XOF, or a hash function. Without this additional condition you cannot build a good simulator, because the simulator cannot know what is going on as soon as people start doing inverse queries. There we need an additional condition, which is leaf anchoring. Leaf anchoring means that the first n bits of the permutation input are reserved: they are either a constant IV, in leaf nodes, or a CV, in the other nodes. For block ciphers this also applies, but the anchoring must be in the data input, not in the key input. You could take other countermeasures, but this is the simplest. One countermeasure that does not work is the Davies-Meyer feed-forward. That does not work. Okay, so now let's take a look at the minimal solutions. With a compression function, you can satisfy these conditions with just this simple mode, where you add two frame bits per call, and here some padding. That's all you need to do. This is for a fixed-input-length compression function, so there is no need for an IV, and the CV always comes first.
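A sketch of this minimal sequential mode for an arbitrary fixed-input-length compression function. The semantics of the two frame bits here (a first-call flag and a last-call flag) are my illustrative reading, not necessarily the paper's exact encoding, and `F` is a stand-in.

```python
import hashlib

BLOCK = 64  # message bytes absorbed per call (illustrative)

def F(x: bytes) -> bytes:
    # Stand-in for the underlying (ideal) compression function,
    # producing a 32-byte chaining value.
    return hashlib.sha256(x).digest()

def minimal_mode(message: bytes) -> bytes:
    data = message + b"\x80"                # simple padding
    data += b"\x00" * (-len(data) % BLOCK)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    cv = b""  # no IV needed in this variant
    for i, blk in enumerate(blocks):
        first = b"\x01" if i == 0 else b"\x00"               # frame bit 1
        last = b"\x01" if i == len(blocks) - 1 else b"\x00"  # frame bit 2
        cv = F(cv + blk + first + last)     # the CV always comes first
    return cv
```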
If you work with a truncated permutation or a block cipher, like for instance in SHA-256, you have to put an IV, and then you can actually get rid of one of these two bits: you only need one frame bit, and you can put it directly after the IV. So for instance in SHA-256 you could put here a 255-bit IV or a 255-bit CV, plus one frame bit, and you would have the 512 key bits completely at your disposal, no length padding or anything. From this point of view, that would be more secure than the current SHA-256. Okay, so what are the implications of this work? Well, you can put a tree hashing mode on top of a secure XOF, and that will give you a secure XOF. KangarooTwelve is an example, and the Sakura coding actually ensures two of these conditions. For hashing based on permutations: the sponge is not covered here, that is something else. We didn't cover the sponge, but the sponge was covered in 2008, so that is sufficient. As for MD6: if you look at all the magic they put in, all this coding, they could have just put n-bit IVs in the leaves and one frame bit per call; it would be much more efficient, and they wouldn't have needed a 5,696-bit permutation. For hashing based on block ciphers, from this analysis, in this context, the Davies-Meyer feed-forward is useless. Merkle-Damgård strengthening, this length coding, is useless. And the CV can actually be shorter than the block length of the cipher: if you take SHA-512, for instance, and you just aim for 128 bits of security (and why would you aim higher?), you can reduce the chaining value to 256 bits. Okay, that's it. Thanks for your attention. Questions for Joan? So I have one. Do you know if the conditions you define ensure that the mode is collision-resistance preserving? Because that is not implied by indifferentiability: it could be that the compression function is collision resistant but the mode is not, even if you prove indifferentiability.
That's meaningful if you can build something that is collision resistant. But we, as symmetric-crypto people, cannot build something that is provably collision resistant; we build something that looks as random as possible. So collision-resistance preservation is fine, but at some point you have to rely on randomness, like the ideal-cipher model. You could say that maybe an ideal cipher with feed-forward is collision resistant; I don't know if that can be proven for an ideal cipher, but then you are not relying on collision resistance, you are relying again on ideality. But it is a property of the mode that you could prove, and it is not implied by indifferentiability. I'm not sure, but I think we do have collision resistance; I thought we had, but it is not the essence, we look at indifferentiability basically. No, that is not implied by indifferentiability: there is a paper by Ristenpart that gives examples where you have indifferentiability but the mode is not collision resistant. This is very strong indifferentiability, though; this indifferentiability is super tight. I don't know whether you can have such tight indifferentiability and still not have collision resistance, but we didn't even mention it in our paper because we think it is not the essence. Any other questions? Well, if not, let us thank Joan and all the speakers of the session. Thanks for attending, and it's time for a coffee break.