So the next presentation is called The Design of Xoodoo and Xoofff. It's a paper by Joan Daemen, Seth Hoffert, Gilles Van Assche and Ronny Van Keer, and Gilles will give the talk.

Okay, thank you for the introduction. Hello everyone. So indeed I'm going to talk about Xoodoo and Xoofff. This is the outline of my presentation. I will give some motivation for this work. Then I will go through the definition of Xoodoo, then Xoofff. And I will conclude with some words on how Xoofff can be used.

So first, motivation. The motivation starts from the very title of this conference. We want to have fast software encryption, and presumably it needs to be secure, all right? Usually we don't only need confidentiality; authentication is also important, so sometimes we need authenticated encryption. And it needs to run fast in software, but yeah, hardware, we should also look at that. More generally, we would like to have something that runs fine on a wide range of platforms, because in many applications you have small devices that interact with a high-end server on the other end. So it would be nice if the same algorithm can run fine from the low end to the high end.

One way to do this would be to have some very small primitive that can be used in a mode or in a construction where there is a lot of parallelism. So the small device can just compute this primitive serially, but then as the platform grows, this primitive can be computed several times in parallel, as much as the platform allows.

And so last year we presented something called Farfalle. It's a construction based on permutations, and it has a lot of parallelism. So let me go briefly through it. It takes as input a secret key K. There is a permutation f, and the key goes through the permutation to create a mask k. Then the input of this pseudorandom function is divided into blocks.
The mask is going to be XORed into these blocks, but the mask is going to evolve through a rolling function, depicted in blue. Then all these resulting XORs will go through permutations and the outputs will be summed. So you can see that all these calls to the permutation can be parallelized. Then the sum goes through another call to the permutation, and then the output again goes through a rolling function to be diversified. And we XOR the secret mask, and this gives us output blocks. So again, the output can also be parallelized.

So last year we proposed an instance of Farfalle using the Keccak permutation with 1600 bits, and the result was called Kravatte. Well, 1600 bits, that's a bit too much for low-end devices. One option would be to go, for instance, to Keccak-p[400], but Keccak-p[400] uses 16-bit lanes, which is not very convenient on 32-bit platforms. So instead our motivation was to have a permutation that fits nicely in this setting, and that's Xoodoo. So please meet the mascot of Xoodoo; it's a very robust animal.

So yeah, basically Xoodoo takes a lot of inspiration from the Gimli permutation. The Gimli permutation is defined on 12 words of 32 bits, so it fits nicely in the registers of typical ARM processors. We reuse that idea, but the round function itself is more inspired by Keccak components than anything else. Our main purpose is to plug this permutation into Farfalle, as I just explained, and the result is called Xoofff. Another purpose is to plug it into the duplex construction; the result is called Xoodyak. It's available in the Xoodoo cookbook, but that's not the purpose of this talk.

Okay, so I said that we have 12 words of 32 bits. Actually the state is organized as 4 times 3 times 32 bits. This picture only shows eight bits in the z direction; it should be 32, that's just for the picture. So that's the state. The state consists of three planes of 128 bits, and each plane consists of four lanes of 32 bits.
A column is something that we are going to use a lot in the round function. The round function consists of, first, theta, the linear mixing; rho west, where we move the planes independently of each other; then chi, the nonlinear layer, the S-box; and rho east, where again we move the planes independently of each other. And of course there are round constants.

So first chi. Chi is really a three-bit S-box, just like the S-box of Keccak, except that it is on three bits instead of five. To compute one output bit, you take two other bits from the input, one of them is complemented, then you take the product of these two bits, you XOR the result into the third bit, and you do that for all three bits. It's a degree-two function, but it's also an involution, and it has nice properties in terms of propagation of linear masks and differences. And that made the analysis much easier in terms of trails; I will say a few words on this.

Theta is the mixing layer. It's a column parity mixer. The idea is that we first compute the parity of all the columns here, then there is some folding, meaning that two copies of the parity are XORed together with different translations, and then the result is XORed back into the state. So if you have just one bit set to one, you can see the effect: the parity will be one bit, the folding gives two bits, and then six bits will be XORed into the state. Of course, if you have two bits set to one in the same column, the parity is zero and there is no effect; that's the kernel.

Rho east: we don't shift the plane at y = 0; we shift the plane at y = 1 by one position in the z direction; and the plane at y = 2 we shift by two positions along x and eight positions along z. And then similarly rho west: a shift by one along x on the plane y = 1, and a shift by 11 along z on the plane y = 2.
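As a rough sketch of the round function just described, the following Python models the state as three planes of four 32-bit lanes, `a[y][x]`, with a shift along z modelled as a left rotation of the lane. Several details are assumptions rather than statements from the talk: the theta translation offsets (1, 5) and (1, 14), the placement of the round constant in lane x = 0 of plane y = 0, and the mapping of z onto bit positions are filled in from the published Xoodoo specification as best recalled here and should be checked against it. The round constant is left as a parameter `rc`, and all function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def rotl32(v, n):
    """Rotate a 32-bit lane left by n positions (a shift along z)."""
    n %= 32
    return ((v << n) | (v >> (32 - n))) & MASK32

def theta(a):
    # Column parity: XOR of the three planes, one lane per x.
    p = [a[0][x] ^ a[1][x] ^ a[2][x] for x in range(4)]
    # Folding: two translated copies of the parity, XORed together.
    # Offsets (1, 5) and (1, 14) are assumed from the specification.
    e = [rotl32(p[(x - 1) % 4], 5) ^ rotl32(p[(x - 1) % 4], 14)
         for x in range(4)]
    # XOR the folded parity back into every plane.
    return [[a[y][x] ^ e[x] for x in range(4)] for y in range(3)]

def rho_west(a):
    # Plane 0 untouched; plane 1 moves by one along x; plane 2 by 11 along z.
    return [a[0][:],
            [a[1][(x - 1) % 4] for x in range(4)],
            [rotl32(v, 11) for v in a[2]]]

def iota(a, rc):
    # Round constant XORed into one lane (placement assumed).
    b = [plane[:] for plane in a]
    b[0][0] ^= rc
    return b

def chi(a):
    # Per column: each bit gets the product of the complemented next bit
    # and the bit after that, XORed in.
    return [[a[y][x] ^ (~a[(y + 1) % 3][x] & a[(y + 2) % 3][x] & MASK32)
             for x in range(4)] for y in range(3)]

def rho_east(a):
    # Plane 0 untouched; plane 1 by one along z; plane 2 by (2, 8).
    return [a[0][:],
            [rotl32(v, 1) for v in a[1]],
            [rotl32(a[2][(x - 2) % 4], 8) for x in range(4)]]

def xoodoo_round(a, rc):
    """One round in the order theta, rho west, iota, chi, rho east."""
    return rho_east(chi(iota(rho_west(theta(a)), rc)))

def popcount(a):
    """Number of bits set in the whole state."""
    return sum(bin(v).count("1") for plane in a for v in plane)
```

With this model, the properties mentioned in the talk can be checked directly: applying `chi` twice returns the original state, a state with a single bit set has seven bits set after `theta` (the original bit plus six), and a state with two bits set in the same column is left unchanged by `theta`.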
So these planes move independently of each other, so all the structures on the columns will be destroyed for the next round. So that's the pseudocode of Xoodoo: first theta, computing the parity, having the two copies and then XORing them back; rho west, shifting the two planes; iota, the round constants; chi, computing the products and then XORing them back; and then rho east, moving the two planes.

Okay, so in terms of cryptographic properties, clearly the security of anything based on Xoodoo would be limited by differential cryptanalysis, so by the maximum probability of a differential from ΔA to ΔB. But this maximum differential probability is hard to determine. Instead, we approximate it by the maximum probability of a differential trail. A differential trail is a differential where each intermediate difference is specified. And similarly to the design of Keccak, we have something called weak alignment, and that makes this approximation plausible at least.

So instead of talking about differential probabilities, we talk about weights, where the weight is the negative binary logarithm of the differential probability. And we looked at Xoodoo and tried to get bounds on these trails using the techniques presented two years ago. We started with all the trails over three rounds up to weight 50, and there we could find one with weight 36. So that gives us a bound of 36 for three rounds. Then we tried to extend these to six rounds. I mean, we could not extend them to six rounds, and by showing that we cannot extend them, we have an argument that says that any trail over six rounds has weight at least 104. And then for four rounds and five rounds we also did the exercise; that's updated compared to the paper. For four rounds, we could prove a bound of 74 and we found an instance of weight 80. So we don't know if this 80 is the best or not; we just have an example.
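To make the notion of weight concrete, here is a small sketch that enumerates all differentials of the three-bit chi S-box by brute force and computes their weights. The function names are illustrative, and the S-box is written from the description in the talk (complement one bit, multiply by another, XOR into the third). It only looks at a single column in a single round; it says nothing by itself about the multi-round trail bounds quoted in the talk.

```python
from math import log2

def chi3(v):
    """Three-bit chi applied to one column, bits packed into an int 0..7."""
    a = [(v >> i) & 1 for i in range(3)]
    b = [a[i] ^ ((a[(i + 1) % 3] ^ 1) & a[(i + 2) % 3]) for i in range(3)]
    return b[0] | (b[1] << 1) | (b[2] << 2)

def differential_weights(sbox, nbits):
    """Map (input diff, output diff) -> weight, for every possible
    differential with a nonzero input difference."""
    size = 1 << nbits
    weights = {}
    for din in range(1, size):
        counts = [0] * size
        for v in range(size):
            counts[sbox(v) ^ sbox(v ^ din)] += 1
        for dout, c in enumerate(counts):
            if c:
                # weight = -log2(probability of the differential)
                weights[(din, dout)] = -log2(c / size)
    return weights

weights = differential_weights(chi3, 3)
print(sorted(set(weights.values())))
```

For this S-box every possible differential with a nonzero input difference comes out with the same weight, which is one way to see why the weight of a trail through chi is easy to account for column by column.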
For five rounds, we didn't have an example, but we could prove that the weight is at least 90. And the same goes for linear: differential and linear trails have the same bounds, on purpose, by design. We chose the rotation constants such that this is the case. The diffusion is pretty good in terms of the strict avalanche criterion: we need three and a half rounds in the forward direction to have full diffusion. In the backward direction, the inverse of theta is heavier, and then we just need two rounds. I think that's it for Xoodoo.

So Xoofff is just taking Farfalle and plugging in Xoodoo in those f's there, all the f's. Xoodoo is a family of permutations parameterized by the number of rounds, and we use six rounds of Xoodoo in Xoofff. We also need to define the rolling functions. On the compression side, the rolling function is linear, operating on the full state. On the expansion side, it's nonlinear and also operating on the full state. We make a security claim of 128 bits of security. I think it's both data and time; I don't remember exactly. And we also make a post-quantum claim: if someone has access to a quantum computer, then the security is 2 to the 96. We don't make a claim for someone who would implement this on a quantum computer; that's a different story.

Yeah, so I said our goal was to have good performance on a wide range of platforms, so let me give you some numbers. Let's start with the ARM Cortex-M0. For long inputs, we can reach 26 cycles per byte, and for long outputs, similarly, 25 cycles per byte. As a comparison, AES-128 in counter mode is about five times slower on that platform. On the Cortex-M3, which is a bit bigger, we can reach between eight and nine cycles per byte for long inputs and long outputs, compared to something about four times slower for AES.
Then more on the high end, so that's on the Skylake processor, we use the AVX2 instruction set, which allows us to have eight instances of Xoodoo computed in parallel using 256-bit registers. There we are below one cycle per byte, slightly slower than AES, but still using a general-purpose instruction set. On Skylake-X, the more recent processor, they have the AVX-512 instruction set. There we can compute 16 instances of Xoodoo in parallel, and there we are again below one cycle per byte, and we are even faster than AES, with AES using the hardware AES instructions, of course.

So let me conclude this presentation: now assume we have Xoofff, what can we do with it? Actually, Farfalle implements something that we call a deck function. A deck function is not a new construction; it's just a way to capture the kind of functionality that Farfalle implements, and so that Xoofff implements. Deck stands for doubly extendable cryptographic keyed function. So it's this F_K here. It takes as input a sequence of strings, from X1 to Xm, and it produces output bits like a PRF, but potentially an infinite number of output bits.

Now I'm going to explain the conventions a little bit. When we XOR a string of a given length with a string of another length, the result has the length of the shorter of the two strings. So I have a potentially infinite number of output bits here, but I'm XORing that with 0 to the n, so effectively I'm just taking n bits of output from this deck function. Then this shift by q means I'm skipping the first q bits. So what I'm doing here is just taking n bits from this deck function starting from offset q, so from offset q to offset q plus n minus one. So it's doubly extendable, meaning that it has some incrementality properties.
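The output convention just described (take n bits starting at offset q, with XOR truncating to the shorter string) can be sketched with a stand-in function. This is not Xoofff: `toy_deck` is a hypothetical placeholder built on SHAKE128 from Python's `hashlib`, it works on bytes rather than bits, the length-prefixed input encoding is an arbitrary choice for the example, and it recomputes everything on each call, so it does not have the incrementality a real deck function offers.

```python
import hashlib

def toy_deck(key, strings, n, q=0):
    """Stand-in for F_K(X1, ..., Xm): return n output bytes starting at
    byte offset q. Hypothetical toy, NOT Xoofff and not incremental."""
    h = hashlib.shake_128()
    h.update(len(key).to_bytes(4, "big") + key)
    for s in strings:
        # Length-prefix each string so the sequence is encoded unambiguously.
        h.update(len(s).to_bytes(4, "big") + s)
    # Generate q + n bytes and skip the first q of them.
    return h.digest(q + n)[q:]

def xor_bytes(a, b):
    """XOR two byte strings; the result has the length of the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

key = b"example key"
history = [b"nonce-0001"]
plaintext = b"attack at dawn"

# Keystream taken from offset 16, as if the first 16 bytes served as a tag.
keystream = toy_deck(key, history, len(plaintext), q=16)
ciphertext = xor_bytes(plaintext, keystream)
print(ciphertext.hex())
```

Asking for n bytes at offset q gives the same bytes as asking for q + n bytes from the start and dropping the first q, which is the property the later modes rely on when a tag and a keystream are drawn from the same output.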
First on the input: if you compute F_K of X, then you save some state, and you then want to compute F_K of Y after X, the cost of making this new evaluation does not depend on X; it only depends on the length of Y. And clearly the Farfalle construction allows for this. Similarly, it's also extendable on the output: you can request first a number of bits, then more bits and more bits, and every time you ask for more bits, you don't need to start from the beginning. You just pay a cost that is proportional to the number of bits you ask for incrementally.

So using this idea of having a deck function, you can build some modes, and I'm just going to give one mode as an example. We call this mode Deck-SANE. It's a session-based authenticated encryption mode, session- and nonce-based. Session, because it's a stateful object: we can ask for authenticated encryption of messages, and every time, the tag authenticates not only the current message but the sequence of all the messages that have been received so far. That's the session. At initialization it takes a nonce. Then we are going to use some bit e, just one bit, that toggles between zero and one every time we switch from one message to the next. The history is going to capture the input of my deck function. It starts with just the nonce N, and we create an optional tag on just the session setup, which just authenticates the nonce. Then if a message comes in, so it contains associated data A and plaintext P, we start by creating some keystream here with our deck function, where at that point we still have only the nonce in the history. We skip the first t bits because we used them for the tag here, and we XOR that into the plaintext P, and that gives us the ciphertext.
Then we update the history by adding the associated data, a zero to recognize it, and e to recognize the current message; same for the ciphertext, with a one. And we produce a tag on the current version of the history, and we flip the bit e for the next iteration. The next iteration is going to start with this same history, just skipping the first t bits because they were produced as a tag for the previous message. And it goes on like this, and e toggles between zero and one every time. If we instantiate Deck-SANE with Xoofff, the result is Xoofff-SANE.

We also propose a mode called Deck-SANSE, which has basically the same functionality, but the nonce is constructed in an SIV kind of way, so the nonce is synthesized from the message. And then Deck-WBC is a wide block cipher, so it provides authenticated encryption with minimal expansion. All this is available in the eXtended Keccak Code Package. And that's it, thank you for your attention.

Any questions? Okay, I have one small question. You mostly introduced it in the context of Farfalle, but should I see Xoodoo as a standalone permutation, or is it really designed for Farfalle?

No, you can use it for other purposes. For instance, if you take 12 rounds, it's sufficient to be plugged into the sponge or duplex construction; it's strong enough for that.

Okay, so no other questions. Let's thank you again.