Hello, everyone. My name is Léo, and I'm going to talk about Alzette, a 64-bit ARX-box. This talk will also feature our new ciphers called CRAX and TRAX. This is very much a team effort with Christof, Alex, Luan, Johann, Aleksei, Vesselin, and Qingju. Quite a team. At the beginning of this work was our NIST lightweight crypto submission called Sparkle. It consists of two hash functions called Esch and several authenticated encryption schemes called Schwaemm. This is not the topic of the day; if you want to learn more about those, I suggest you have a look at our paper presenting them. Instead, today I'm going to talk about Alzette. So in our initial submission, the core components are permutations. These permutations themselves have a core component, Alzette. The aim of this talk is to analyze this component and thus to shed some light on the security of our candidates. Along the way, we will also show that this component can be used to construct new tweakable block ciphers and new regular block ciphers. The outline of this talk is kind of obvious: first I'm going to talk about Alzette, and then I'm going to talk about the new ciphers. So first, what is an S-box? Let's start with block ciphers. A block cipher operates on n bits and has a key of a fixed length kappa. n is usually 128, or in our case, as you will see, 64 or 256. It is a permutation of the set of n-bit strings which is parameterized by a key. In practice, block ciphers operate using a round function. This round function is iterated multiple times, each time using a subkey Ki which is XORed into the state before the round function. The round keys are derived from the master key using the key schedule, which I'm not going to talk about today. The round function itself needs to have several properties. These were the findings of Claude Shannon, the Claude Shannon: in a block cipher, you need to have confusion and diffusion.
And these two properties, you also need to have them at the level of the round function, to a lesser extent; then, by iterating this round function, you get the properties that you need for the full cipher. The idea of confusion is actually kind of difficult to formalize. It's that the relationship between the output bits, the input bits, and the key bits has to be complicated. In practice, it means that it has to be highly nonlinear. Diffusion is simpler: it's the idea that all output bits need to depend on all key bits and all input bits. A convenient way of doing that is to use what we call an S-box layer. You're going to have small functions that operate on a small part of the state, which are applied in parallel over the full state. These will ensure the confusion. For diffusion, if we use a function which is linear, then it's, I'm not going to say easier, it's less difficult to quantify the dependency between the outputs and the inputs. So these small functions S that are applied in parallel on the state, those are the S-boxes. In practice, what does an S-box look like? Well, here is the S-box of the AES block cipher. You can specify it just using its lookup table: in this case, it maps the input 0x00 to the output 0x63, etc. You can also define it with a mathematical formula, which is also the case for the AES. Or you can use a bitsliced representation, where each output bit is effectively a Boolean function of the input bits. Here we have the S-box of Keccak: it's a 5-bit permutation, and it has this nice representation. These are small S-boxes, which is the usual case: in Keccak it's 5 bits, in the AES it's 8 bits, and these are pretty typical block sizes for S-boxes. The advantages of using small S-boxes are that the cryptographic properties are easy to study, because computing the relevant properties of a permutation with a small block size is computationally trivial.
So if you want to compute the DDT, the LAT, or the ANF, when n is small, it's very easy. The resulting ciphers are easier to study too. That's one of the main selling points of the AES: it is provably secure against some specific forms of differential and linear attacks because of the properties of the S-box and of how they interact with the linear layer. Small S-boxes are also easy to implement: you can simply use a lookup table and you're done, or, in the case of Keccak, you have the bitsliced representation, and voila. And for small block sizes, components that are basically optimal are well known and easy to implement. The multiplicative inverse in the finite field has all that you need: a very high algebraic degree, very good differential properties, and very good linear properties; in fact, the best that we know how to achieve. So it's a solved problem to some extent. Not everything is perfect though. If you are implementing your S-box using a lookup table, then you're going to have problems with side-channel attacks. And in order to use a small S-box, you need a linear layer which provides strong diffusion, and that can be a bottleneck in terms of performance. You can have a linear layer which is strongly aligned, like in the hash function Streebog, where the linear layer is a dense 8x8 matrix. Or you can have a linear layer which is not aligned with the words of the S-boxes, in which case it can be more efficient, but then it's more difficult to study. For instance, when you look at Keccak, you have whole papers dedicated to the analysis of the differential properties of the permutation, which are needed because you can't use arguments as simple as in the case of the AES. There are alternatives to structures based on small S-boxes: ciphers like ChaCha or Speck, or the hash function BLAKE2, use an ARX network, so they are not S-box-based, but then they can be very difficult to study.
The question then is whether it's possible for a cipher to not use small S-boxes while still allowing strong security arguments in the style of those you can have when you use small S-boxes. To answer this question, we need to answer an obvious one first, which is: what actually is an S-box? We tend to think of S-boxes as small permutations that are implemented using their lookup tables, but now that we are defining S-boxes using Boolean functions, that's not really a good description in my opinion. And since the S-box is supposed to provide confusion, we then need to answer the very deep question of what confusion is in a cipher, which quickly leads to very deep philosophical questions that I won't even pretend to have the answer to. So what are the properties that you need for a good S-box? You need its properties to be known, that's very important, and you need them to be good, obviously, for the properties that are relevant to crypto: differential properties, linear ones, integral and algebraic properties. You want them to be good and you want to know them. That's what makes a good S-box, and that's pretty much it, actually. You don't need to know its size; it's not really a criterion for being a good S-box. The only things you care about are that the properties are known and that they are good. The size doesn't matter for defining an S-box. In fact, this is not an idea that we are introducing: we already have such things in Saturnin, where a 16-bit transformation is considered like a big S-box. You have a 128-bit transformation in Spook, which is a bit like a small block cipher that you can see as a big S-box, and you have four of those in parallel. And you also have the 96-bit SP-box in Gimli, which can be seen in a way as a big S-box. So how do we design a wide S-box? It's more complicated, because with small S-boxes of four or eight bits, you can just compute their cryptographic properties using standard tools.
But this is not really possible for wide S-boxes. So let's first write a scope statement for the wide S-box we wanted to build. First, we wanted our algorithms to be very good in software, which means that modular addition makes sense: modular addition is not so great in hardware, but in software it's extremely efficient. So we went for an ARX structure, with addition, rotation, and XOR. We wanted to be efficient on 8-, 16-, and 32-bit microcontrollers, and that imposes the use of 32-bit instructions, because if we had used, say, 8-bit instructions, it would have been great on 8-bit microcontrollers, but it would have sucked on 32-bit ones. On the other hand, 32-bit instructions can also be very efficient on 8-bit microcontrollers. Also, since we are targeting software, we have to be careful with the rotations: different rotation amounts can have different cycle counts on different platforms, so we have to choose them very carefully. We want the cryptographic properties to be both strong and well understood; as I said before, we want to use this as an S-box, so we have to know what it does. And that means we're going to use an iterated structure, because then we can leverage the literature on the analysis of block ciphers: if we build our S-box like a small block cipher, then we can analyze it like a small block cipher, and we know how to do that. We also want to minimize its width, because if the width is not too large, then we can run extensive experiments to check that our analysis is sound. So our state is going to be two 32-bit words: 32-bit words to be efficient on all microcontrollers, and only two of them to ease our analysis. And finally, we want diffusion that is as fast as possible inside the S-box, which in practice means that we want to use different rotations in each round.
So instead of having an identical round which is repeated several times, there are going to be slight modifications between the rounds. This complicates the analysis, but it significantly increases the efficiency. From this scope statement we got an outline, and from this outline, this is what we get: something very simple with just three operations, an addition of a rotated word, a XOR of a rotated word, and the XOR of a round constant. Questions remain: how many rounds do we need, that is, how many iterations of that? What rotation amounts do we need? These rotation amounts will decide the cryptographic properties as well as the implementation properties, so we have to choose them very carefully. And what sort of round constants do we want? Of course, we are going to want to pick the best constants and rotation amounts, and picking the best means we have an optimization problem, so we need criteria to optimize for. One of them is that Alzette has to help prevent differential attacks. So this is the quantity we need to optimize in this context: it's the maximum, over all possible alpha and beta where alpha is not zero, of the probability that Alzette maps a difference equal to alpha to a difference equal to beta. This is very standard in symmetric crypto. In practice, since Alzette operates on 64 bits, we cannot compute this quantity. Instead, we are going to estimate it, and we are going to do it in a way which is, again, very standard in symmetric crypto: we're going to look at the best differential trail, that is, a sequence of differences that leads from alpha to beta. So this probability is approximated by the maximum, taken over all possible delta one, delta two, and delta three, of the corresponding product of round probabilities. Assumptions are being made in the background here, so allow me to open a parenthesis to discuss those for a few seconds.
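Written out in symbols, as a plausible reconstruction of the formulas on the slides (the notation for the ARX-box A, its four rounds R_1 to R_4, and the intermediate differences is mine, since the slide content is not in the transcript), the quantity and its trail-based estimate are:

```latex
\max_{\alpha \neq 0,\ \beta}\;
\Pr_{x}\bigl[\,A(x) \oplus A(x \oplus \alpha) = \beta\,\bigr]
\;\approx\;
\max_{\delta_1,\delta_2,\delta_3}\;
\Pr\bigl[\alpha \xrightarrow{R_1} \delta_1\bigr]\cdot
\Pr\bigl[\delta_1 \xrightarrow{R_2} \delta_2\bigr]\cdot
\Pr\bigl[\delta_2 \xrightarrow{R_3} \delta_3\bigr]\cdot
\Pr\bigl[\delta_3 \xrightarrow{R_4} \beta\bigr]
```

The three intermediate differences delta one, delta two, and delta three mentioned in the talk correspond to a four-round instance of the ARX-box.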
The first assumption is that we can just multiply these probabilities, meaning that they are independent. A priori this is not the case, but it's a standard assumption to make in symmetric crypto; usually it's the Markov assumption. We rely on the addition of subkeys to have more confidence in the fact that it holds, but when we look at permutations with a fixed key, as is the case here, we do not have key additions. We can still make this assumption and then see how it works out in practice; that's how it's usually done. This also relies on the assumption that the probability is dominated by a single trail. Assumptions mean verification, so we needed to verify whether these hold experimentally, and we have, and they do. To make a long story short, the probabilities do act as if they were independent, and the probability of a differential is indeed dominated by that of a single trail. So all is well and good, I can close the parenthesis, and we have that this approximation is valid. This doesn't mean that the work is done, far from it: we then need a way to compute this maximum, and to achieve this we have used a Matsui-style search to find the best differential trail, or trails if there are several of them. We have done the same type of analysis for the linear case; there it's not based on a Matsui search, we have used a SAT solver. We have repeated this analysis for many different sets of rotations, and we have kept the sets of rotations that were the best from an implementation standpoint, so those with a very small distance to multiples of eight, think zero, one, seven, eight, nine, etc. And among those we have picked the ones with the best differential probabilities and linear correlations, so the lowest. That's what you have in this table.
Here is the set of rotations for Alzette; these are minus the logarithms of the differential probabilities, using the estimate I described before, and below it's the same for linear correlations, so the higher the number the better. For comparison we have also put these quantities for the 64-bit block cipher Speck. When a number is in blue, like this 24 here, it means that we have a bound: we know that the probability cannot be higher than 2 to the minus 24, but maybe the highest possible is 2 to the minus 25. We are not sure, but we are certain that it's at most 2 to the minus 24. And that's how we got Alzette: we have picked the best set of rotations, the one pictured on the right of the screen, so 31, 24, etc. I won't go over the specifics of how we have chosen the round constants, because that would take me very far, but we have found that using the same constant in each round is fine, and we have chosen the constants carefully to optimize the linear properties even further. So in the end, the results for Alzette are as follows. Basically, it behaves like an AES round: one iteration of Alzette has similar differential and linear properties to one round of AES, and two iterations of Alzette have similar properties to two rounds of AES, so to one super S-box in this case. We have looked at many other attacks beyond differential and linear. We have looked at integral attacks that would leverage the division property, and we have found that they cannot cover more than six rounds. We have looked at invariants, linear ones but also nonlinear ones, and we have found them not to really be an issue. The algebraic degree increases very quickly, because modular addition has a high algebraic degree. And, as I mentioned before, we could run experiments to see if there was significant clustering of the differential trails, that is, many differential trails with the same starting point and the same end point but different values in the middle.
This clustering is observable, but nothing to be worried about, and the very fact that we could run these experiments is a nice aspect of Alzette. As for the name: well, this is Luxembourg. If you don't really see where that is, these are the neighbors: France, Belgium, and Germany. The Alzette is this blue line down the middle. It's a river, and you have a picture of the Alzette taken in Luxembourg City just under my face. Esch is the name of our hash function because the University of Luxembourg is based in Esch-sur-Alzette, Alzette being the name of the river. So that's why. What more can we do with Alzette? Well, we can design new block ciphers, possibly with a tweak. First, let's look at a very simple way of building a block cipher using a 64-bit ARX-box. If I give you a 64-bit ARX-box and I ask you to build a 64-bit block cipher, how do you do it? Think of the simplest structure, and I'm going to give you some time. So, very straightforward: you just take a master key that you cut in two halves, XOR the first half, apply the ARX-box, XOR the second half, apply the ARX-box, and repeat. There is just a small issue if you do that, which is that you might have what we call slide attacks, and to prevent those you just XOR in a round counter. And that's it. That's what we call CRAX. But this is just a round structure; we still need to choose a number of steps. So this is a block cipher E_K; how would an attack against it work in practice? How do attacks against block ciphers work in general? You have a distinguisher in the middle, a property of the cipher that you would not expect from a random permutation. It covers the center rounds, and you are going to try to reach these center rounds by guessing parts of the keys from the bottom and from the top. So if you want to avoid attacks, you want to prevent the existence of a distinguisher with a complexity that you would find worrying.
And you need to prevent an attacker from reaching the distinguisher in the middle rounds through key guessing. So you compute the number of rounds that is needed to prevent all the distinguishers you know about and care about, and the number of rounds that is needed for all bits of the state to depend on all the bits of the key. The first is r_E, the second is r_D, and then the number of rounds is basically two times r_D, to prevent key guessing, plus r_E, to prevent distinguishers. To be safe, you don't quite do that: you also allow for possible improvements in the cryptanalysis of the cipher, so instead of one times r_E you use (1 + eta) times r_E, where eta is a security factor. And that's CRAX. So Crax is a cute bird with a very nice haircut, and in our case CRAX is a 64-bit block cipher with a 128-bit key and a very simple key schedule; given the parameters we have and a security factor of 20%, we find that we need 10 steps. CRAX is extremely lightweight. This is a full implementation of CRAX: you can compile it, it's valid C code, and it's the CRAX encryption in full. We have made optimized implementations on microcontrollers, and we have compared them with the best implementations we are aware of for several lightweight block ciphers. This table is in the full version of the paper, the one which is on ePrint, and as you can see CRAX is extremely good, especially in terms of RAM: the RAM usage for both encryption and decryption is four times lower than the second best, and since we don't really have a key schedule, that part is free compared to the others. It's also very small and very fast, so it's a pretty good lightweight block cipher, which you might consider using if you need such a thing. But what if you don't need a lightweight block cipher, but instead something much bigger? In other words, can you use a wide S-box in another way than the trivial one, which consists in just iterating it?
Can you go beyond this kind of structure, which in symmetric crypto we would call an iterated Even-Mansour construction? Well, yes you can. That's what we call the Long Trail Strategy, something we introduced when we designed Sparx, in 2016 if I remember correctly. The Long Trail Strategy is a design strategy which allows you to leverage wide S-boxes. If you have an S-box which needs several iterations to offer good cryptographic properties, as Alzette does, then it's better to avoid full diffusion in one round: since your S-box is like a diesel engine that needs time to get started, it's better to leave it some time to do its thing on its branch before making the branches interact. So it's a principle which allows us to use a wide S-box, and we'll see later why we would want to use wide S-boxes. This idea of letting the S-box do its thing means, in practice, that you're going to have a classical SPN structure, with a layer of S-boxes and then a linear layer for diffusion. What's specific is that the linear layer will itself be built like a Feistel network, as you can see in the pictures. Why is this nice? Let's look at two steps. When you have two steps, that's what it looks like, and you know for a fact that some of the differences will have to go through two iterations of your wide S-box, of Alzette in our case. So we know that as soon as a difference enters here, it will have to go through a double ARX-box, and that means that the probability of this trail will be at most 2 to the minus 32 in the case of Alzette, which is very low. And you won't be able to leverage some hypothetical clustering, because the value of the difference here is fixed, since you're tapping the value here to XOR it on the other side. The same holds when you have more branches, especially in this case: given the definition of L' that we have used, it is an MDS layer, so it also provides strong diffusion.
That's the Long Trail Strategy: how to use wide S-boxes to build block ciphers. And the TRAX round function is as follows. This is a 256-bit design where you have four 64-bit branches, and you add the tweak directly. There is a key addition, which is not represented here, with a complex key schedule in this case, and the tweak is just cut in two halves of 64 bits each: the first half goes here, and the second half goes here. The tweak is added every second step, not at every step, because it turns out to give us better bounds. What kind of bounds, you ask? Give me some time. So how do we justify the security of a cipher built in this way? Why is this a nice way to build a cipher at all? Let's look at truncated trails. A truncated trail is a differential trail where you only care about whether a part of the state is active or not. You don't care about the value; you just want to know, for each S-box, whether it has a nonzero difference at its input or not. So if you have L S-boxes, then you have 2 to the L possibilities per step: each S-box is active or not, that's two possibilities, and you have 2 to the L. If you have S steps to take into account, that gives you a total of 2 to the power L times S. In the case of AES-128, which is a 10-round cipher with an 8-bit S-box where you have 16 of them applied in parallel, that gives you a search space of size 2 to the 160. You're not going to enumerate all of those. You can reduce this number to some extent using the properties of the linear layer, since not all transitions are possible from one round to the next, and you can use that to shrink the search space significantly. But what you can also do is use wide S-boxes to reduce, and I mean really reduce, the search space. In the case of AES-128, if you were using Alzette as your S-box, then you wouldn't have 2 to the 16 possibilities in each step.
You would have 2 to the power 2, which is quite significant. What this means is that if you have an algorithm which uses wide S-boxes, like TRAX, then you can enumerate all truncated trails. You can just loop over all of them. It's practical, and it's actually quite fast; you don't need a cluster to do that. And what you can do, for instance, and that's the idea of the Long Trail Strategy, is to enumerate all trails, compute for each of them a bound on the probability of the corresponding differential, and then take the worst bound found over all possible trails: that's a bound for the whole cipher. That's the idea of the long trail argument. What happens when you add a tweak? A tweak is kind of like a key which is public, but not in the public-key crypto sense, if that makes sense. A tweakable block cipher has three inputs: the plaintext, the key, as before, and the tweak. The tweak acts like a key, but one which you can assume is under the control of the adversary. Not only does the adversary know it, maybe they can even choose it, which gives a new avenue for attack, because then the adversary can inject differences through the tweak. So how do we handle this from a block cipher designer's perspective? It's actually very easy when you have a cipher designed using the Long Trail Strategy, because you could already enumerate all the truncated differential trails before. In our case, when you add a tweak, that's only two new inputs that you need to consider at every step, which is really not that much. So you can use the same approach as before. You just make a small, well, I'm not going to say tweak, you just make a small modification, which is that the tweak can cancel branches or activate new branches in your truncated trail.
And what this means is that the related-tweak security is very easy to assess, both from a conceptual and from a computational standpoint, when you have a cipher with wide S-boxes built using the Long Trail Strategy. So we have gone from how you build a wide S-box to how you build a cipher using a wide S-box, and now, finally, why you would want to build a cipher with a wide S-box: one of the reasons is that it gives you very easy security arguments in the related-tweak setting. So we have done just that. TRAX-L is a tweakable block cipher. We have taken the round function that I showed you before and added a key schedule to it to handle the master key. The tweak is just XORed into the state every second step, because we have found that we got the best bounds when the tweak was added every second step. It has a 256-bit key, a 128-bit tweak, and a 256-bit block size, and using the same type of analysis as before, we have found that it needs 17 steps. We have put a bound on the query complexity of the attacks: an attacker can try as many keys as they want, but they cannot use more than 2 to the power 128 known or chosen plaintexts. That's the idea. It's just not realistic to allow the adversary to query a keyed oracle as many times as they want, and doing so would lead us to use too many steps in the primitive. So we have chosen a very, very conservative bound and worked from there. Why would you want such a wide tweakable block cipher? You can do some nice things with it, because at this stage we have quite mature modes of operation that need a tweakable block cipher and which are parallelizable. The problem you could have is that usually their security follows from the birthday bound on the block size; the bigger the block size, the higher the security. Hence the need for a wide tweakable block cipher, which TRAX provides. Time to wrap up.
So Alzette, which is the key component of the Sparkle permutations, which are themselves the key components of our NIST lightweight crypto submission, has well-understood and strong cryptographic properties, which is exactly what we want an S-box to have. This sheds more light on the analysis of this NIST candidate. Alzette can also be used to build a lightweight block cipher which is arguably at least as light as Speck. It can also be used to build a wide tweakable block cipher, which will allow you, since the modes are parallelizable, to make better use of vector instructions. Also, the wider block size is very interesting for people who work on post-quantum symmetric crypto, so that's another context in which TRAX could be useful. And maybe Alzette could be useful in your own design: it's a 64-bit ARX-box, and if you need such a thing, then just go ahead and use it. Finally, I urge you not to miss the talk on improved differential-linear attacks with applications to ARX ciphers, which deals with the cryptanalysis of ARX-based ciphers, which ciphers using Alzette very much are. It's a paper by Christof Beierle and his co-authors, Christof being a co-author of this one too. So be sure to check it out. And with that, I thank you.