Hello everyone, my name is Sébastien Duval, and I will present this work, which is joint with Begül Bilgin, Lauren De Meyer, Itamar Levi and François-Xavier Standaert. This work is about looking for S-boxes suited to low-latency masking, and what we care about in particular is AND depth.
In this talk I will first introduce the subject, and then I will describe a tool that we made. This tool optimizes the implementation of small S-boxes both for AND depth and for the number of AND gates. Then I will talk about how we looked for low-AND-depth S-boxes of larger sizes, using two approaches. The first approach is a follow-up to the previous talk: we look at power maps and we use affine equivalence to search for good ones. The second approach tries to go for even larger sizes using length-doubling structures.
In the introduction I will talk about SPNs, and in particular about S-boxes, which are the objects we are interested in here. I will talk about side-channel attacks and about masking, which is a countermeasure against them. In particular I will talk about glitches and their impact on AND depth.
An SPN is a quite standard structure for building symmetric cryptographic primitives. We have a picture here of two rounds of an SPN. One round of an SPN starts with the addition of a secret round key. Then we have a layer of small S-boxes; this is the non-linear layer. And then we have a linear layer with a big diffusion function. This is one round, and we iterate it as many times as we need until we reach security.
In this talk we will focus on the S-box. We want this S-box to be highly non-linear, in the sense that it should have a high algebraic degree, a low differential uniformity and a low linearity. In this talk we will summarize low differential uniformity and low linearity as a low worst probability, which is the worse of the two.
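To make the two non-linearity criteria concrete, here is a minimal Python sketch that computes the differential uniformity and the linearity of an S-box given as a lookup table. The function names are hypothetical, and the way the two values are combined into a single "worst probability" (differential probability DU/2^n against squared correlation (L/2^n)^2) is an assumption on my side, since the talk does not spell out the formula. The PRESENT S-box is used only as a well-known 4-bit example.

```python
from fractions import Fraction


def differential_uniformity(sbox):
    """Largest number of x with S(x) ^ S(x ^ a) == b, over all a != 0 and all b."""
    size = len(sbox)
    worst = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x] ^ sbox[x ^ a]] += 1
        worst = max(worst, max(counts))
    return worst


def linearity(sbox):
    """Largest absolute Walsh coefficient over input masks a and output masks b != 0."""
    size = len(sbox)
    worst = 0
    for b in range(1, size):
        for a in range(size):
            total = 0
            for x in range(size):
                # Parity of <a, x> xor <b, S(x)>.
                parity = bin((a & x) ^ (b & sbox[x])).count("1") & 1
                total += -1 if parity else 1
            worst = max(worst, abs(total))
    return worst


def worst_probability(sbox):
    """ASSUMPTION: max of DU/2^n and (L/2^n)^2; the talk gives no explicit formula."""
    size = len(sbox)
    diff = Fraction(differential_uniformity(sbox), size)
    lin = Fraction(linearity(sbox), size) ** 2
    return max(diff, lin)


# The 4-bit PRESENT S-box, used here only as a familiar example.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

if __name__ == "__main__":
    print(differential_uniformity(PRESENT), linearity(PRESENT), worst_probability(PRESENT))
```

Both loops are exhaustive, which is fine for the sizes discussed in this talk but grows quickly with the number of bits.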
We also want this S-box to be lightweight.
Side-channel attacks are a type of attack in which we consider that the attacker can observe more than the input and the output of a function: the attacker can also observe the implementation, and anything that can be measured while the implementation is running. This can leak some information. The attacker can look at any physically observable quantity of the implementation, and these quantities can depend on the secret data being manipulated. Such measurements could be time, energy consumption, electromagnetic emissions, power consumption, and many more. They leak information about the secret through the intermediate variables of the computation.
So we need to counter this, and the most popular countermeasure against side-channel attacks at the moment is masking, in particular Boolean masking, which is a type of secret sharing in which we split a variable x into d shares that sum (with XOR) to the value x we want to protect. The cost of masking is linear for linear operations such as XOR gates, linear in the number of shares, that is. For non-linear operations such as AND gates, the cost of masking is roughly quadratic in the number of shares. So AND gates are costlier to mask. This is why we usually try to minimize the number of AND gates in a circuit.
What is less well known is that we also need to care about the AND depth. This matters in particular for hardware implementations, and the reason is glitches. Glitches are an electronic phenomenon: they are ephemeral incorrect values that appear on the wires between clock edges. In your electronic circuit you sample on the clock edges, and at those times you are sure that the values you sample are correct. But between the clock edges there are still values on the wires. They are possibly incorrect, but they are there and they depend on the data, so they can depend on secret data.
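The cost difference between masked XOR and masked AND gates described above can be illustrated with a small Python sketch of Boolean masking. The talk does not name a specific multiplication gadget; the masked AND below follows the classic ISW (Ishai, Sahai, Wagner) construction, which is my choice for illustration. It is a software model only, so it says nothing about glitches, and all function names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, d, bits=8):
    """Split x into d shares whose XOR equals x."""
    shares = [secrets.randbits(bits) for _ in range(d - 1)]
    shares.append(reduce(xor, shares, x))
    return shares


def unshare(shares):
    """Recombine shares by XORing them together."""
    return reduce(xor, shares, 0)


def masked_xor(a, b):
    """Share-wise XOR: d XOR operations, so the cost is linear in d."""
    return [ai ^ bi for ai, bi in zip(a, b)]


def masked_and(a, b, bits=8):
    """ISW-style AND: d*d partial products, so the cost is quadratic in d."""
    d = len(a)
    c = [a[i] & b[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(bits)  # fresh randomness for each pair of shares
            c[i] ^= r
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c


if __name__ == "__main__":
    x, y = 0xA5, 0x3C
    sx, sy = share(x, 3), share(y, 3)
    print(unshare(masked_xor(sx, sy)) == x ^ y)
    print(unshare(masked_and(sx, sy)) == x & y)
```

With d shares, `masked_xor` performs d XORs while `masked_and` performs d*d AND operations, which matches the linear versus roughly quadratic cost mentioned in the talk.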
So we need to make sure that these values don't leak anything, and we do this using registers, which are synchronization layers driven by the clock signal. Even when side-channel attacks are not a concern, such synchronization is already used, for example when we want to reuse a gate, as in a loop, to make sure that the data in the gate is clean before we start a new iteration. But when side channels are involved, the attacker can observe glitches, so we use registers to counter this.
When we mask, observing the glitches of a masked XOR gate gives roughly no information about the secret data, but observing the glitches of a masked AND gate does give a lot of information about the secret data being manipulated. So we need to synchronize with a register at each AND gate. This has an impact through the AND depth: non-linear operations require a register, and since the number of clock cycles is equal to the register depth, each layer of AND gates increases the latency of the circuit. So to have a low latency, primitives must have a low AND depth. This has not been considered much in the design of primitives in the literature, and in particular it has not been considered much in the design of S-boxes. This is the topic of this work.
First I will present a tool that we developed, which optimizes jointly for AND depth and for the number of AND gates. This tool works for small sizes and for simple functions such as quadratic functions, and it uses SAT solvers. It takes as input an S-box as a lookup table and it outputs a circuit with minimal AND depth and few AND gates. We note that it does not optimize for XORs. We tested it on 4-bit S-boxes and we used it to optimize 5- and 6-bit S-boxes, mostly quadratic ones because they are simple.
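To make the AND depth metric concrete, here is a small Python sketch that computes the AND depth and the AND count of a circuit given as a list of two-input gates. The gate format and function names are hypothetical and are not the format used by the tool described in the talk. The toy example shows two circuits for the same function with the same number of AND gates but different AND depths.

```python
from itertools import product


def and_depth(gates, inputs, output):
    """Return (AND depth of `output`, number of AND gates).

    `gates` is a topologically ordered list of (out, op, in1, in2),
    with op being "AND" or "XOR". XOR gates do not add to the depth.
    """
    depth = {name: 0 for name in inputs}
    and_count = 0
    for out, op, a, b in gates:
        d = max(depth[a], depth[b])
        if op == "AND":
            d += 1
            and_count += 1
        elif op != "XOR":
            raise ValueError(f"unknown gate type: {op}")
        depth[out] = d
    return depth[output], and_count


def evaluate(gates, values, output):
    """Evaluate the circuit on a dict mapping input names to bits."""
    values = dict(values)
    for out, op, a, b in gates:
        values[out] = values[a] & values[b] if op == "AND" else values[a] ^ values[b]
    return values[output]


INPUTS = ["x0", "x1", "x2", "x3"]

# y = ((x0 & x1) & x2) & x3, computed as a chain.
CHAIN = [("t0", "AND", "x0", "x1"),
         ("t1", "AND", "t0", "x2"),
         ("y", "AND", "t1", "x3")]

# Same function, computed as a balanced tree.
BALANCED = [("t0", "AND", "x0", "x1"),
            ("t1", "AND", "x2", "x3"),
            ("y", "AND", "t0", "t1")]

if __name__ == "__main__":
    print(and_depth(CHAIN, INPUTS, "y"))
    print(and_depth(BALANCED, INPUTS, "y"))
```

In a glitch-resistant masked hardware implementation, each unit of AND depth corresponds to one register layer, so the balanced circuit would need fewer clock cycles than the chain for the same number of AND gates.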
If we look first at what existed before: here G stands for optimizing the number of AND gates and D stands for optimizing the AND depth. We can see that there were quite a few tools in the literature to optimize small S-boxes, but none of them optimized for AND depth. Ours is the first, and we do more than that: we optimize jointly for AND depth and for the number of AND gates. We do this by adapting a tool by Ko Stoffelen, which uses SAT solvers.
Our tool uses a greedy algorithm that builds a graph with layers of XOR gates and layers of AND gates, with weighted edges whose weights give the AND depth. The SAT solver is used to find which edges must be present for the circuit to match the lookup table of the S-box. This is roughly the idea.
As for the results, we tested the tool on 4-bit S-boxes, which is a size at which things are quite well understood. We looked at many S-boxes from the literature and tried to optimize their AND depth. In many cases we managed to reduce the AND depth without increasing the number of AND gates, so it is purely a gain in AND depth. Sometimes there was a small cost in the number of AND gates, but these are very special cases. We conclude from this test case that the tool works well. We also observe that most 4-bit S-boxes with a good worst probability are equivalent in terms of AND depth and number of AND gates, which is an interesting observation. And we observe that the best AND depth achievable for 4-bit S-boxes with a good worst probability is 2, which is not very good. We would like to have an AND depth of 1, which is why we will now look at other sizes of S-boxes. We will do this with two approaches.
The first approach is a follow-up to the previous talk: we look at power maps, in particular quadratic ones, using affine equivalence. We also care about the cost of the inverse implementation, because some modes of operation, and in particular some modes that are suited to side-channel resistance, make use of the inverse of the S-box. So we need to make sure that the implementation of the inverse of the S-box is lightweight too, and this is not obvious.
So our goal is to use affine equivalence to search for low-AND-depth S-boxes that have an efficient inverse implementation. We handle this by trying to find S-boxes for which the forward and the inverse S-box share a lot of their resources, in the sense that the same electronic circuit can compute more or less all of both the forward and the inverse S-box. We also try to make sure that, if possible, these S-boxes can be optimized by the tool.
We managed to look at 5- to 11-bit S-boxes, which is quite a large variety of sizes, including sizes that are not very well understood yet. There are a few limitations. For instance, the tool cannot handle large or complicated S-boxes. We only look at affine equivalence classes, which prevents us from giving a precise number of XOR gates for our implementations. And for sizes larger than 6 we can no longer look at the number of AND gates, only at the AND depth. But this still gives some information.
If we look at the results for 5-bit S-boxes, this is what we get. The first observation is that there is no optimal S-box at this size. The optimal S-box would have a good worst probability, would share a lot of resources with its inverse, and would have AND depth 1 for both the forward and the inverse implementation.
This is not possible on 5 bits, but what is possible is an S-box for which the forward direction has AND depth 1 and the inverse has AND depth 2, with a good worst probability and a low number of XOR gates. This is doable for instance with the function x^5, because the inverse of x^5 is x^5 applied twice, which of course allows us to implement x^5 and its inverse with a lot of resource sharing. We also observe something interesting: on 5 bits, if an S-box and its inverse both have AND depth 1, then the S-box and its inverse are affine equivalent.
On 6 bits the situation is rather similar. We don't have any optimal S-box either, but we do have S-boxes with a good worst probability, AND depth 1, an inverse with AND depth 2 and a low number of AND gates. We also have the same property: if an S-box and its inverse both have AND depth 1, then they are affine equivalent. Here too the power map x^5 is interesting, because it can be implemented together with its inverse with a lot of resource sharing: its inverse is, up to a linear map, x^5 multiplied by x^8, where x^8 is a linear function. This gives a bit less resource sharing than in the 5-bit case, because here we need a multiplication in the finite field, which still costs a bit in terms of AND gates and AND depth.
If we look at the other sizes, 7- to 11-bit S-boxes, then at every size apart from 8 we find very nice S-boxes that have a low worst probability and a low AND depth, with the inverse having a rather low AND depth too, and with some resource sharing between the forward and the inverse S-box.
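The two statements about the power map x^5 can be checked numerically. The following Python sketch does so with a small finite-field implementation. The reduction polynomials (x^5 + x^2 + 1 and x^6 + x + 1) are my choice for illustration, and the helper names are hypothetical. On 6 bits, the check makes the linear correction explicit: the inverse exponent is 38, and 38 = 13 * 32 mod 63, so the inverse is x^5 * x^8 = x^13 followed by the linear map x^32.

```python
def gf_mul(a, b, n, poly):
    """Multiply a and b in GF(2^n) defined by the reduction polynomial `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:
            a ^= poly
    return result


def gf_pow(x, e, n, poly):
    """Compute x^e in GF(2^n) by square-and-multiply."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, x, n, poly)
        x = gf_mul(x, x, n, poly)
        e >>= 1
    return result


def inverse_exponent(e, n):
    """Exponent d such that x -> x^d inverts x -> x^e on GF(2^n) (Python 3.8+)."""
    return pow(e, -1, 2**n - 1)


# Reduction polynomials chosen for illustration (both are irreducible).
POLY5 = 0b100101    # x^5 + x^2 + 1
POLY6 = 0b1000011   # x^6 + x + 1


def inverse_on_5_bits(y):
    """Invert x^5 on GF(2^5) by applying x^5 twice."""
    return gf_pow(gf_pow(y, 5, 5, POLY5), 5, 5, POLY5)


def inverse_on_6_bits(y):
    """Invert x^5 on GF(2^6) as (y^5 * y^8)^32, where y^8 and z^32 are linear maps."""
    y5 = gf_pow(y, 5, 6, POLY6)
    y8 = gf_pow(y, 8, 6, POLY6)
    return gf_pow(gf_mul(y5, y8, 6, POLY6), 32, 6, POLY6)


if __name__ == "__main__":
    print(inverse_exponent(5, 5), inverse_exponent(5, 6))
    print(all(inverse_on_5_bits(gf_pow(x, 5, 5, POLY5)) == x for x in range(32)))
    print(all(inverse_on_6_bits(gf_pow(x, 5, 6, POLY6)) == x for x in range(64)))
```

The 6-bit inverse needs one extra field multiplication compared with the 5-bit case, which is the additional cost in AND gates and AND depth mentioned in the talk.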
So what is interesting here is that there are some very promising power maps at large sizes that are not very well understood. I think it is very worthwhile to look a bit more into these sizes, because such S-boxes could be very useful.
Now for the second approach, we try to go for even larger sizes by using length-doubling structures such as Feistel, Misty and Bridge to build structured S-boxes. The idea is that with larger S-boxes we can hope for a better worst probability, in the sense that the worst probability can be much lower than for small S-boxes. If the worst probability is lower, then a whole primitive, such as a whole SPN, may require fewer rounds, and fewer rounds means a lower AND depth for the whole primitive. So if we can find larger S-boxes with a low AND depth and a good worst probability, they can actually imply a lower AND depth for the whole primitive.
What we do have at large sizes is S-boxes with a good worst probability and a low AND depth. We also have some structured S-boxes, only on 6 and 8 bits, that have a good worst probability and an efficient inverse in terms of resource sharing with the forward S-box. What we want is a good worst probability, a low AND depth, and some structure that allows for an efficient implementation and some resource sharing with the inverse. Our approach is to build such large S-boxes from smaller ones and to optimize the small ones using the tool.
The structures that we use are a 3-round Feistel network, a 3-round Misty network and the Bridge structure. 3-round Feistel and 3-round Misty were already used to build S-boxes. The 3-round Bridge is actually affine equivalent to something that already existed, which was used in the Littlun S-box.
What is interesting with these structures is that you split an n-bit input x into two halves of n/2 bits, and then you only do operations on half the size, that is on n/2 bits. You do XORs on n/2 bits, and you apply S1, S2 and S3, which are non-linear but operate on n/2 bits.
Why are all three of these structures interesting? For the 3-round Feistel network, the AND depth of the forward implementation is that of going through all three of S1, S2 and S3, and it is the same for the inverse, so for a Feistel network you go through three S_i in terms of AND depth. For the Misty network, from the input to the output you only go through two of the S_i. So in the forward direction the AND depth of a Misty network is only that of two S_i, while in the backward direction, for the inverse, you still need to go through three S_i. For the Bridge structure, you only go through two of the S_i both in the forward and in the inverse direction, which is interesting because it can lead to a low AND depth.
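The difference between Feistel and Misty described above can be sketched in Python. The code builds a 6-bit S-box from three 3-bit S-boxes with each structure and counts how many S_i lie on the longest path, in units of S-box layers. The three small S-boxes are arbitrary permutations picked for illustration, not the ones from the paper, and the function names are hypothetical. The Bridge structure is not included because its exact definition is not given in the talk.

```python
# Arbitrary 3-bit permutations chosen for illustration only.
S1 = [0, 1, 3, 6, 7, 4, 5, 2]
S2 = [1, 0, 2, 5, 7, 3, 6, 4]
S3 = [2, 5, 0, 7, 4, 1, 6, 3]
SBOXES = [S1, S2, S3]
N = 3  # size of each half in bits


def split(x, n):
    """Split a 2n-bit value into its (left, right) n-bit halves."""
    return x >> n, x & ((1 << n) - 1)


def feistel3(x, sboxes, n):
    """3-round Feistel: (L, R) -> (R, L ^ S_i(R)) in each round."""
    left, right = split(x, n)
    for s in sboxes:
        left, right = right, left ^ s[right]
    return (left << n) | right


def misty3(x, sboxes, n):
    """3-round Misty: (L, R) -> (R, S_i(L) ^ R) in each round."""
    left, right = split(x, n)
    for s in sboxes:
        left, right = right, s[left] ^ right
    return (left << n) | right


def misty3_inverse(y, sboxes, n):
    """Inverse of misty3; requires every S_i to be a permutation."""
    left, right = split(y, n)
    for s in reversed(sboxes):
        inv = [s.index(v) for v in range(len(s))]
        left, right = inv[left ^ right], left
    return (left << n) | right


def sbox_layers(update, rounds=3):
    """Longest path through the structure, counted in S-box layers."""
    dl, dr = 0, 0
    for _ in range(rounds):
        dl, dr = update(dl, dr)
    return max(dl, dr)


def feistel_round_depth(dl, dr):
    return dr, max(dl, dr + 1)   # S_i is applied to the right half


def misty_round_depth(dl, dr):
    return dr, max(dl + 1, dr)   # S_i is applied to the left half


def misty_inverse_round_depth(dl, dr):
    return max(dl, dr) + 1, dl   # the inverse S_i is applied to L ^ R


if __name__ == "__main__":
    print(sbox_layers(feistel_round_depth))
    print(sbox_layers(misty_round_depth))
    print(sbox_layers(misty_inverse_round_depth))
```

In Misty, S1 and S2 are applied to independent halves and can run in parallel, which is why the forward direction only has two S_i on its longest path, while the inverse has to undo the rounds one after the other.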
This is the intuition. Now, if you look at the results, I only put three kinds of grades in this table: a check mark to say good, a dash to say decent, and a cross to say bad. First we will just look at size 6, so 6 bits, because it is quite representative of some phenomena. What we can see is that Feistel, Misty and Bridge all give a good worst probability, but Misty and Bridge give a better forward implementation, mostly in terms of AND depth, and the Bridge structure also gives a better inverse implementation. This actually appears at most of the sizes. What we usually see in the whole table is that Bridge is always slightly better than Misty, because it is slightly better for the inverse implementation, and that Bridge is the best for AND depth. But we also observe that at some sizes Feistel outperforms Misty and Bridge in terms of worst probability, so it can also be an interesting structure to use, and it is also better for resource sharing with the inverse.
Globally, if we forget for once about the inverse implementation, we have some very good options with an excellent worst probability and a very good forward implementation, so these are very interesting S-boxes to use. And even if we include the inverse implementation, we still have some very good candidates at such very large sizes, which are not usually considered. So we have some low-AND-depth S-boxes with a very good worst probability at very large sizes, and this can hopefully lead to a low AND depth in some primitives.
This is the end of the talk, so let's see what we have been through. What we did was an exploration of a large space of S-boxes in search of low-AND-depth S-boxes. We explored some sizes which are not usually considered and which are not well understood, but some S-boxes at these sizes were very promising. We also observed that for 4-bit S-boxes the case is mostly settled, but it is bad for AND depth, and larger S-boxes can actually be very promising for this. We also observe, as a side note, that (2n-1)-bit S-boxes seem to be better than 2n-bit S-boxes.
There is a lot of future work needed. First, we need to look at what happens in full primitives: if we plug these S-boxes into a full SPN, for instance, do we indeed need fewer rounds, and what really happens in a full primitive? We also need more study of large S-boxes, because they are really not well understood. And of course better automatic tools can always be useful to look at larger or more complex S-boxes.
One takeaway message that we should always keep in mind is that AND depth is important for secure hardware cryptography. There is also a lot more detail in the paper, and in particular there is a large portfolio of concrete circuits for all these S-boxes at all these sizes, so if you are looking for implementations of S-boxes, there are many different choices in the paper. Thank you for your attention. If you have any questions, I will be happy to answer them during the questions session.