So, as I said, we are interested in finding new S-boxes, or S-boxes that can be efficiently implemented in hardware, and in particular in a threshold manner. Threshold implementations are really useful for resisting side-channel attacks. So let me start with what a side-channel attack is, that is, the side-channel attack model. In the black-box model, the adversary makes some requests to a black box, or oracle, and has to do some computation, and it takes a lot of time before the key is recovered. In the side-channel model, we allow the adversary to observe the device that implements the algorithm. This device requires some time and some power consumption to perform the operation. If the adversary monitors this with an oscilloscope, they obtain traces, and since the late 90s we have known that, with the help of such traces, key recovery can be really practical. To counter side-channel attacks, a common countermeasure is masking. On the left part, I represent what happens in the unprotected case. We have our device that implements the start of, for example, the AES: I have my plaintext X, which I XOR with my key K, and then I apply the S-box. Since my device uses only a small number of key bits for each S-box, I can perform an exhaustive search on the value Y and mount an attack. The idea of masking is to use secret sharing: I no longer manipulate my secret directly, but split it into several shares. Then, if my masking is correctly implemented, I have to guess several values, and it is more complex to mount an attack. In particular, there is a theorem in masking which says that the number of traces needed to mount an attack grows exponentially with the number of shares used in the implementation, with the noise as the base. As we saw yesterday, for this theorem to hold, we need enough noise on each share.
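To make the secret-sharing idea concrete, here is a minimal Python sketch of Boolean (XOR) masking of a byte. It assumes shares are recombined by XOR; the helper names `share`, `unshare` and `add_plaintext` are hypothetical, and the sketch only shows how values are split and recombined, not how leakage behaves.

```python
import secrets
from functools import reduce
from typing import List


def share(secret: int, n_shares: int, bits: int = 8) -> List[int]:
    """Split `secret` into n_shares Boolean shares that XOR back to it."""
    # The first n_shares - 1 shares are fresh uniform randomness...
    shares = [secrets.randbits(bits) for _ in range(n_shares - 1)]
    # ...and the last share is chosen so that all shares XOR to the secret.
    last = reduce(lambda acc, s: acc ^ s, shares, secret)
    return shares + [last]


def unshare(shares: List[int]) -> int:
    """Recombine Boolean shares by XOR-ing them together."""
    return reduce(lambda acc, s: acc ^ s, shares, 0)


def add_plaintext(plaintext: int, key_shares: List[int]) -> List[int]:
    """Compute a sharing of X xor K from a public X and a shared K.

    XOR is linear, so the public plaintext is XORed into one share only;
    the other shares are left untouched."""
    return [key_shares[0] ^ plaintext] + key_shares[1:]


# Usage: share a key byte into three shares and add a plaintext byte.
key_shares = share(0x2B, n_shares=3)
y_shares = add_plaintext(0x32, key_shares)
assert unshare(y_shares) == 0x32 ^ 0x2B
```

The nonlinear S-box that follows is exactly where this share-wise trick stops working, which is what threshold implementations address.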
Another condition we need is that our randomness is unpredictable and uniform. Finally, we have a physical assumption which says that our device does not combine different shares. That is the independence-of-leakage assumption, and it can be violated in software due to transitions or in hardware due to glitches. To have masking in hardware, a common countermeasure is to use threshold implementations. In a threshold implementation, I have my function F, and I also use shared (component) functions. The first property I want for a threshold implementation is correctness: if I apply the shared function to the shared input, then when I recombine the output shares I should obtain the same result as if I had applied F to the unshared input. The second condition is non-completeness, which provides resistance against first-order attacks: each component function should be independent of at least one share. If it is independent of one share, it is independent of the secret, since I need all the shares to recover the secret. Then, whatever happens inside this function, glitches included, it stays independent of the secret. Finally, we need a third condition, called uniformity. It means that the shared output (z1, z2, z3) should have the same distribution whether I compute it from the input shares through the shared F or I simply share Y directly. This can be difficult to achieve in a threshold implementation, so sometimes we just add some fresh randomness to obtain a uniform sharing. Due to non-completeness, if my function F has degree t, I need at least t + 1 shares: I have to compute monomials of degree t, and since each output share must miss at least one input share, I need at least t + 1 shares. So the idea to reduce the number of shares is to use composition: I represent my function F as the composition of two functions, and now I can share the first function and the second function separately.
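The three properties can be checked exhaustively on a small example. The sketch below assumes the classic three-share sharing of a single AND gate z = x·y, a standard textbook example rather than one of the S-boxes from this talk; the helper names are hypothetical. It confirms correctness and non-completeness, and its uniformity check shows that this particular sharing is not uniform, which is the kind of situation where fresh randomness is added.

```python
from collections import Counter
from itertools import product
from typing import Iterator, Tuple

Shares = Tuple[int, int, int]


def ti_and(x: Shares, y: Shares) -> Shares:
    """Three-share sharing of z = x AND y over GF(2).

    Output share i never reads input share i (x_i or y_i): non-completeness."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)  # no x1, y1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)  # no x2, y2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)  # no x3, y3
    return z1, z2, z3


def sharings(v: int) -> Iterator[Shares]:
    """All 3-share Boolean sharings of the bit v (the third share is fixed)."""
    for a, b in product((0, 1), repeat=2):
        yield (a, b, v ^ a ^ b)


def is_correct() -> bool:
    """Recombined output equals x AND y for every input and every sharing."""
    for x, y in product((0, 1), repeat=2):
        for xs in sharings(x):
            for ys in sharings(y):
                z1, z2, z3 = ti_and(xs, ys)
                if (z1 ^ z2 ^ z3) != (x & y):
                    return False
    return True


def is_non_complete() -> bool:
    """Changing x_i and y_i never changes output share z_i."""
    for xs in product((0, 1), repeat=3):
        for ys in product((0, 1), repeat=3):
            z = ti_and(xs, ys)
            for i in range(3):
                for a, b in product((0, 1), repeat=2):
                    xs2, ys2 = list(xs), list(ys)
                    xs2[i], ys2[i] = a, b
                    if ti_and(tuple(xs2), tuple(ys2))[i] != z[i]:
                        return False
    return True


def is_uniform() -> bool:
    """For each unshared (x, y), every sharing of z must appear equally often."""
    for x, y in product((0, 1), repeat=2):
        counts = Counter(ti_and(xs, ys) for xs in sharings(x) for ys in sharings(y))
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True
```

For x = y = 0, the 16 input sharings map to the four output sharings of 0 with counts 7, 3, 3, 3, so `is_uniform()` returns `False` even though the sharing is correct and non-complete.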
Each stage has non-completeness and uniformity, so I can reuse its output directly, but I have to stop the glitches, so I need to put registers in between. That leads to different techniques for implementing the function, and I will talk about two of them. The first one is a raw implementation, where registers hold the input of the different stages: I have the shared function, then my register to stop the glitches, and at each clock cycle the data moves forward by one stage. After a number of cycles equal to the number of functions, I have computed my S-box. That is nice and we can always do it, but we can also have a second type of implementation if the function used in the composition is always the same. Then I only have to implement my function F once and use a multiplexer: as long as I have not reached the number of compositions, I feed the output back as the input, and at some point I load the next input. This can lead to really small implementations, but we need more clock cycles to compute our S-box. So what has been done for S-boxes in threshold implementations? There is a nice paper at CHES 2012 that looks at threshold implementations of all 3-bit and 4-bit permutations. A particular result of this paper is that, among the different affine classes of 4-bit S-boxes, 35 can be easily shared in a threshold manner. By efficient in a threshold manner, I mean that we only need three shares, no extra randomness, and at most two stages, that is, a composition of two functions. Another approach to threshold implementations of S-boxes is to take S-boxes with good cryptographic properties and try to find nice threshold implementations for them. A lot of effort has been put into the AES, since it is a standard, and there are different trade-offs between the area of the S-box, the amount of randomness needed, the number of shares, et cetera.
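The trade-off between the two techniques can be sketched by counting clock cycles and stage circuits. This is a behavioural model only: it assumes one stage per clock cycle, ignores the real gate area and the sharing itself, and the stage function `g` is a hypothetical stand-in for one shared stage.

```python
from typing import Callable, List, Tuple

Stage = Callable[[int], int]


def raw_implementation(xs: List[int], stages: List[Stage]) -> Tuple[List[int], int, int]:
    """Pipelined datapath: one circuit per stage, a register after each one.

    Returns (outputs, clock cycles for all inputs, circuits instantiated).
    A new input enters every cycle, so k inputs need len(stages) + k - 1 cycles."""
    outputs = []
    for x in xs:
        for stage in stages:
            x = stage(x)
        outputs.append(x)
    cycles = len(stages) + len(xs) - 1 if xs else 0
    return outputs, cycles, len(stages)


def iterative_implementation(xs: List[int], stage: Stage, n_stages: int) -> Tuple[List[int], int, int]:
    """A single stage circuit whose output is fed back through a multiplexer.

    The multiplexer selects a fresh input only every n_stages cycles,
    so k inputs need n_stages * k cycles, but only one circuit."""
    outputs, cycles = [], 0
    for x in xs:
        state = x  # multiplexer selects the new input
        for _ in range(n_stages):
            state = stage(state)  # one clock cycle: register <- stage(register)
            cycles += 1
        outputs.append(state)
    return outputs, cycles, 1


def g(v: int) -> int:
    """Hypothetical 8-bit stage: rotate left by one, then XOR a constant."""
    return (((v << 1) | (v >> 7)) & 0xFF) ^ 0x1D


inputs = [0x00, 0x3A, 0xFF, 0x81]
raw_out, raw_cycles, raw_circuits = raw_implementation(inputs, [g, g, g])
it_out, it_cycles, it_circuits = iterative_implementation(inputs, g, 3)
assert raw_out == it_out  # same S-box, different hardware trade-off
```

Here the raw version uses three stage circuits and 6 cycles for four inputs, while the iterative one uses a single circuit and 12 cycles.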
There are also some more threshold-friendly S-boxes, as used in Fides or Keccak, which work on a smaller number of bits. And finally, the question we asked when we saw these results is: can we have 8-bit S-boxes with good cryptographic properties, maybe not as good as the AES one but still decent, that have really efficient threshold implementations? Our idea was: we have results for all 4-bit S-boxes, so can we use them to obtain threshold implementations of 8-bit S-boxes? Actually, in the literature, a lot of ciphers use smaller S-boxes to build larger ones. That is the case of CLEFIA, Crypton, Fantomas, Iceberg, et cetera. Among all these ciphers, just one S-box can be implemented in an iterative manner. So that is nice: we have some good threshold implementations, but can we obtain better results, that is, S-boxes with cryptographic properties similar to these S-boxes but that can take advantage of iterative implementations? So we launched a search, and we first looked at functions based on a substitution-permutation network (SPN). In that case, I use two 4-bit S-boxes F1 and F2, then apply an affine transformation, and I can add a constant, like a key addition. This is the structure used for Iceberg, Khazad and Whirlpool. If I want to enumerate all S-boxes of this form, I have 16! choices for F1 and 16! choices for F2, and that is too much. To reduce the cost of the search, we said that we anyway want an efficient iterative implementation, so we only need to look at the 35 classes that are particularly efficient to mask. Then we looked at different choices for the affine transformation and for the constant, and we obtained a set of 2^32 S-boxes. We ran our search, and I will show the results later, but first let's look at the second construction, the Feistel network.
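As an illustration of the SPN-style construction, the sketch below builds an 8-bit lookup table from two 4-bit S-boxes, a linear layer and a constant, and checks that it is a bijection. The use of PRESENT's 4-bit S-box for F1 and F2, the linear map, the constant 0x5A and the two rounds are all hypothetical choices, not the parameters searched in the talk; the last lines only evaluate the 16! · 16! count mentioned above.

```python
import math
from typing import List

# PRESENT's 4-bit S-box, used here only as a hypothetical choice of F1/F2.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def nibble_layer(v: int, f1: List[int], f2: List[int]) -> int:
    """Apply F1 to the high nibble and F2 to the low nibble."""
    return (f1[v >> 4] << 4) | f2[v & 0xF]


def linear_layer(v: int) -> int:
    """Hypothetical invertible GF(2)-linear map: bit i becomes v_i xor v_(i-1).

    Its matrix is unit lower-triangular, hence invertible, and it mixes
    bit 3 of the low nibble into bit 4 of the high nibble."""
    return v ^ ((v << 1) & 0xFF)


def spn_sbox(f1: List[int], f2: List[int], const: int, rounds: int) -> List[int]:
    """Lookup table of rounds x (F1||F2, linear layer, constant addition)."""
    table = []
    for x in range(256):
        v = x
        for _ in range(rounds):
            v = linear_layer(nibble_layer(v, f1, f2)) ^ const
        table.append(v)
    return table


sbox = spn_sbox(PRESENT, PRESENT, const=0x5A, rounds=2)
assert sorted(sbox) == list(range(256))  # bijective: every layer is invertible

# Naive search space over F1 and F2 alone: 16! * 16! choices.
naive = math.factorial(16) ** 2
print(f"16!^2 = {naive:.3e}, about 2^{math.log2(naive):.1f}")
```

Restricting F1 and F2 to a few classes, as described above, is what brings this space down to something that can be enumerated.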
In the Feistel network, we have just one function, and each input is 4 bits. This is the structure used in particular for Robin and Scream, and there are mathematical proofs bounding the cryptographic properties we can obtain for up to 3 rounds. So we know what we can achieve there, but with more rounds we may get better results. Two remarks about Feistel networks. The first is that, whatever the function is, even if it is not a permutation, the Feistel structure gives us a permutation. The second is that, since I XOR the right input into the output of my function F, if I can compute F in a threshold manner in one cycle, I automatically obtain a uniform sharing on the 8 bits. That gives us uniformity for free, and now the set of choices is all functions from 4 bits to 4 bits, which is 2^64. So we tried to reduce this search space. What we used is that functions can be grouped into equivalence classes: F1 is in the same class as F if there exist affine permutations A and B and a linear mapping C such that F1 = B ∘ F ∘ A + C. This is nice, but it does not by itself reduce the search space. What we can look at is round functions of the form A^{-1} ∘ F ∘ A, with A linear. If I write my round function like that, I can rewrite the round as applying A to both inputs, then doing a Feistel round with F alone, and then applying A^{-1} to both outputs. If I compose several rounds, the inner A^{-1} and A cancel out, and what I obtain is A^{-1} ∘ Feistel(F) ∘ A, which is affine equivalent to Feistel(F) and in particular has the same cryptographic properties. So we don't have to look at all functions of the form B ∘ F ∘ A: for linear B, B ∘ F ∘ A = B ∘ (F ∘ A ∘ B) ∘ B^{-1} is a conjugate of F ∘ A ∘ B, so we can just look at functions of the form F ∘ A, to which we add the linear mapping C. That reduces the cost to 2^46, and to do our search we used graphics processing units (GPUs).
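Both Feistel remarks and the conjugation argument can be checked exhaustively on 8 bits. The sketch below assumes the round convention (L, R) to (R xor F(L), L), a hypothetical non-bijective round function (squaring mod 16) and a hypothetical linear permutation A of 4-bit values. It verifies that the Feistel network is a permutation anyway, and that using A^{-1} ∘ F ∘ A in every round equals Feistel(F) conjugated by A applied to both halves.

```python
from typing import List

Table = List[int]


def feistel_round(v: int, f: Table) -> int:
    """One Feistel round on 8 bits: (L, R) -> (R xor F(L), L), 4-bit halves."""
    left, right = v >> 4, v & 0xF
    return ((right ^ f[left]) << 4) | left


def feistel(f: Table, rounds: int) -> Table:
    """Lookup table of `rounds` Feistel rounds, all using the same f."""
    table = []
    for x in range(256):
        v = x
        for _ in range(rounds):
            v = feistel_round(v, f)
        table.append(v)
    return table


def on_both_halves(m: Table, v: int) -> int:
    """Apply a 4-bit map m to both halves of an 8-bit value."""
    return (m[v >> 4] << 4) | m[v & 0xF]


# Hypothetical round function: squaring mod 16, which is NOT a permutation.
F = [(x * x) % 16 for x in range(16)]

# Hypothetical linear permutation A of GF(2)^4 and its inverse.
A = [x ^ ((x << 1) & 0xF) for x in range(16)]
A_INV = [0] * 16
for x, y in enumerate(A):
    A_INV[y] = x

# Conjugated round function G = A^{-1} o F o A.
G = [A_INV[F[A[x]]] for x in range(16)]

for r in range(1, 6):
    assert sorted(feistel(F, r)) == list(range(256))  # a permutation anyway
    fg, ff = feistel(G, r), feistel(F, r)
    # Feistel(G) = (A^{-1} x A^{-1}) o Feistel(F) o (A x A) for every round count.
    assert all(fg[x] == on_both_halves(A_INV, ff[on_both_halves(A, x)])
               for x in range(256))

assert 16 ** 16 == 2 ** 64  # number of functions from 4 bits to 4 bits
```

The identity relies on A being linear: A^{-1} distributes over the XOR in the round, which is why the inner A and A^{-1} cancel between rounds.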
So what cryptographic properties do we want for our S-box? First, we want a bijection. For the SPN, we only use permutation components, so we have a permutation in any case, and for the Feistel network we get it from the structure itself. Then we want our S-box to be nonlinear, as nonlinear as possible, which means we don't want good linear approximations of the function. We look at differential uniformity, which means that if I put an input difference into my S-box, it should be difficult to predict the difference I observe at the output. We also look at the algebraic degree. So let's go to the results. I show them in this table; in our paper we have a larger table with more S-boxes and more criteria, and if you are interested I highly encourage you to look at it, but here I just show some nice results. The first column is the S-box we look at, and then we have the different cryptographic properties. The AES S-box is the best S-box we know of to date, and, not surprisingly, we did not find any S-box better than the AES one, but we found some with values similar to S-boxes used in the literature. Then we have the threshold implementation figures, which are basically the area for an iterative or a raw implementation. Since we use the same function in every stage, our new S-boxes can be implemented in an iterative manner, and we can see that we get some really small implementations in the iterative case. We also have the number of stages, which is basically the number of clock cycles needed to compute one S-box and also the number of register stages we need, and the number of masks used to obtain uniformity, which are only needed for the AES. So there are some nice S-boxes that are not as good as the AES S-box but can be implemented efficiently in a threshold manner. Finally, we have the type of construction.
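The three criteria can be computed directly from a lookup table. The following sketch uses the standard definitions (difference distribution table, Walsh spectrum, algebraic normal form) and regenerates the AES S-box from field inversion followed by the AES affine map so the numbers can be checked. It assumes an n-bit to n-bit S-box and favours clarity over speed; the function names are hypothetical.

```python
from typing import List, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def aes_sbox() -> List[int]:
    """AES S-box: field inversion (with 0 -> 0) followed by the affine map."""
    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)

    def rotl(v: int, k: int) -> int:
        return ((v << k) | (v >> (8 - k))) & 0xFF

    return [b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63 for b in inv]


def differential_uniformity(s: List[int]) -> int:
    """max over a != 0 and b of #{x : S(x) xor S(x xor a) = b}."""
    best = 0
    for a in range(1, len(s)):
        counts = [0] * len(s)
        for x in range(len(s)):
            counts[s[x] ^ s[x ^ a]] += 1
        best = max(best, max(counts))
    return best


def walsh(signs: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform of a +/-1 vector."""
    v = list(signs)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v


def nonlinearity(s: List[int]) -> int:
    """2^(n-1) - max|W(a, b)|/2 over nonzero output masks b and all input masks a."""
    size = len(s)
    max_abs = 0
    for b in range(1, size):
        signs = [1 - 2 * (bin(b & y).count("1") & 1) for y in s]
        max_abs = max(max_abs, max(abs(c) for c in walsh(signs)))
    return size // 2 - max_abs // 2


def algebraic_degree(s: List[int]) -> int:
    """Highest monomial degree in the ANF of any output bit (Moebius transform)."""
    n_bits = len(s).bit_length() - 1
    degree = 0
    for bit in range(n_bits):
        anf = [(y >> bit) & 1 for y in s]
        h = 1
        while h < len(anf):
            for i in range(0, len(anf), 2 * h):
                for j in range(i, i + h):
                    anf[j + h] ^= anf[j]
            h *= 2
        degree = max([degree] + [bin(u).count("1") for u, c in enumerate(anf) if c])
    return degree


def properties(s: List[int]) -> Tuple[bool, int, int, int]:
    """(bijective, differential uniformity, nonlinearity, algebraic degree)."""
    return (len(set(s)) == len(s), differential_uniformity(s),
            nonlinearity(s), algebraic_degree(s))
```

For the AES S-box this gives the well-known values: bijective, differential uniformity 4, nonlinearity 112 and algebraic degree 7.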
So what we can see from this table is that we have interesting trade-offs for cryptographically good S-boxes. In particular, compared with S-boxes from the literature with similar cryptographic properties, the S-boxes we found have smaller threshold implementations. Finally, thanks to our exhaustive search over Feistel networks, we have a side result: there exists no 8-bit Feistel network using the same function in every round that, iterated up to five rounds, is better than what we showed before, that is, than the level of the Whirlpool S-box, which may be of interest for some people. So what is the take-away message of this paper, and why should you read it? Because we present a large set of S-boxes that can be really easily implemented in a threshold manner. Our S-boxes are also really efficient when implemented unprotected in hardware. Another interesting result may be that the function ASB2, which has the same cryptographic properties as Robin and Scream, also has the same number of AND gates, and that is of particular interest for bitsliced software implementations. So we have S-boxes that can be efficiently implemented both in a threshold manner and with software masking, and that is nice, I think. That concludes my talk, and if you have any questions, please ask.
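Why the AND count matters for bitsliced software can be sketched with a naive circuit. In a bitsliced implementation each machine word carries one bit of many independent S-box inputs; under Boolean masking, XOR gates can be applied share by share, while each AND gate needs a masked multiplication, so the AND count drives the cost. The sketch below derives a naive circuit from the algebraic normal form and reports its AND count. That count is only an upper bound for this naive circuit, not the optimized gate counts referred to in the talk, and the helper names are hypothetical.

```python
from typing import List, Tuple


def anf_coefficients(table: List[int], bit: int) -> List[int]:
    """ANF of output bit `bit` via the binary Moebius transform."""
    coeffs = [(y >> bit) & 1 for y in table]
    h = 1
    while h < len(coeffs):
        for i in range(0, len(coeffs), 2 * h):
            for j in range(i, i + h):
                coeffs[j + h] ^= coeffs[j]
        h *= 2
    return coeffs


def bitsliced_sbox(table: List[int], in_words: List[int],
                   word_mask: int) -> Tuple[List[int], int]:
    """Evaluate an n-bit S-box on many inputs at once with bitwise AND/XOR.

    in_words[i] holds input bit i of every instance (one instance per bit
    position of the word). Returns the output words and the number of AND
    operations used by this naive ANF circuit (monomials are reused)."""
    n_bits = len(in_words)
    monomials = {0: word_mask}  # the constant monomial 1 is the all-ones word
    and_count = 0

    def monomial(u: int) -> int:
        nonlocal and_count
        if u not in monomials:
            low = u & -u                          # lowest variable in the monomial
            var = in_words[low.bit_length() - 1]
            rest = u ^ low
            if rest == 0:
                monomials[u] = var                # single variable: no AND needed
            else:
                monomials[u] = monomial(rest) & var
                and_count += 1
        return monomials[u]

    out_words = []
    for bit in range(n_bits):
        acc = 0
        for u, c in enumerate(anf_coefficients(table, bit)):
            if c:
                acc ^= monomial(u)
        out_words.append(acc)
    return out_words, and_count
```

For example, with a 4-bit S-box and a 16-bit word, `in_words[i]` can hold bit i of all 16 possible inputs, so one call evaluates the whole table with the same sequence of word-level operations.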