Hello everyone. I'm going to present our paper entitled "Re-Consolidating First-Order Masking Schemes: Nullifying Fresh Randomness", which is joint work with Amir. I'm Aein, and I have the pleasure of presenting this paper in this video.
Masking schemes are among the most popular countermeasures against side-channel analysis. They are based on randomizing sensitive data during the execution of the cipher. In a masking scheme, we usually divide the sensitive variable into some shares, which forces the adversary to recombine the shares to recover the key. But how can we evaluate a given masked design? To this end, the probing model was proposed. The adversary can probe intermediate values, and each probe is exact and independent. It has been shown that security in this model also provides security in other models. If any combination of d intermediate values does not reveal anything about the secret, then the design is secure against d-th order side-channel attacks. Due to its simplicity and its abstraction, the probing model is the basis for many proofs in side-channel analysis.
However, it does not work properly for hardware implementations. The reason behind this is glitches. Glitches are unwanted transitions at the output of a combinational circuit, mainly due to unbalanced paths from the inputs of the circuit. So in the glitch-extended probing model, when a probe is placed on a gate, it propagates backward up to the last synchronization point. Here is a simple logic circuit. If we probe one of the outputs, the probe propagates backward and the adversary gets information about all four input bits that are involved in the calculation of the originally probed value. To mask hardware platforms, many methodologies have been proposed. One of the first methodologies that is immune against glitches is threshold implementation, or TI.
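The backward propagation of a glitch-extended probe can be sketched in a few lines of Python. This is only an illustration: the toy netlist, the wire names and the helper `extended_probe` are hypothetical and do not come from the slides. A probe on a combinational wire is replaced by the set of stable signals (register outputs or primary inputs) in its fan-in.

```python
# Hypothetical toy netlist: each combinational wire maps to the wires it is computed from.
# Wires that never appear as keys are synchronization points (register outputs or inputs).
NETLIST = {
    "t0": ["i0", "i1"],   # first gate
    "t1": ["i2", "i3"],   # second gate
    "y":  ["t0", "t1"],   # output gate combining both
}

def extended_probe(wire, netlist):
    """Return the set of stable signals that a glitch-extended probe on `wire` observes."""
    if wire not in netlist:            # reached a synchronization point
        return {wire}
    observed = set()
    for source in netlist[wire]:       # walk backward through the combinational logic
        observed |= extended_probe(source, netlist)
    return observed

print(sorted(extended_probe("y", NETLIST)))
```

In this toy circuit a single probe on the output `y` gives the adversary all four input bits, which mirrors the example circuit in the talk.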
TI is based on three essential conditions, which are called correctness, non-completeness and uniformity. Basically, in this approach we make a design which is correct and non-complete, and by adding some correction terms we try to find a uniform sharing. The number of input shares is defined based on the algebraic degree of the target function and the security order d. In d+1 sharing, however, we use only d+1 shares, so it is independent of the algebraic degree of the target function. The structure of a d+1 sharing is something like this. I brought a very simple example: a first-order secure two-input AND gate using two shares. In d+1 sharing, the masked design is usually divided into two parts. The first part is the component functions. Here we have four component functions, and I have marked one of them with a box. The result of each component function should be stored in a register. Then we have the compression layer, which is basically the combination of some component function outputs by XOR, to generate two output shares in this case.
But for some Boolean functions, like this one, ab + c, it has been shown that it is not necessary to use fresh masks. In this case c, which is a variable of the target function f, can be seen as a fresh mask, and it can be shown that this design is glitch-extended probing secure and also uniform. If we replace c with b, we want to see whether it remains secure or not. To evaluate the security, we use this table. In this table you can see each intermediate variable. Suppose the adversary probes one of the intermediate values x'0 to x'3, for example x'0, which is the output of the first component function. You can see that if I fix the unshared values a and b to anything and go over all valid input sharings, I always see three 0s and one 1. So the adversary, regardless of the unshared values, sees the same probability distribution.
So it means that it is completely independent of a and b. I can explain this in another way: every component function is non-complete. In this case it receives only a0 and b0, and a0 and b0 are independent of a and b. So probing any component function output does not reveal anything about the secret. What if the adversary probes one of the outputs, in this case x0? In the glitch-extended probing model, this expands to two probes, as you can see here. So to evaluate the security, we have to look at this part of the table. And as you can see, again, regardless of the unshared values a and b, the joint probability distribution is always the same. So we can conclude that this design is secure no matter which part of the circuit the adversary probes, and replacing c with b also leads to a secure design. This means that we can make a secure masked implementation of a two-input AND gate without any fresh mask, with only two shares, by only inverting a in this function: ā·b is ab + b. If you follow the construction that I showed on the last slide, we have a secure design. I would like to stress that this is the first time such a masked implementation is presented in the literature.
So let's generalize it a bit. If f is an arbitrary two-input function which is quadratic, then we always have the term ab, and the shared variant has four quadratic terms, which you can see here. Due to non-completeness, we should have at least four component functions. To find a secure design, we follow the steps below. Basically, we make the set F0, including all possible two-input constant-free functions for f0. In this case f0 takes a0 and b0 as its inputs, so we have four different functions, namely a0b0, a0b0 + a0, a0b0 + b0, and a0b0 + a0 + b0. So each set has four different functions; the set F0 has four different elements.
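Going back to the two-share AND gate, the table-based check can be sketched in Python as follows. The four component functions below are an assumed candidate for f(a, b) = ā·b = ab + b that fits the description (each sees one share of a and one share of b); they are not necessarily the ones shown on the slides. The script fixes the unshared a and b, goes over all valid sharings, and records what each probe sees.

```python
from collections import Counter
from itertools import product

# Assumed candidate component functions for f(a, b) = (NOT a) AND b = ab + b.
def components(a0, a1, b0, b1):
    x0 = (a0 & b0) ^ b0   # sees only a0, b0
    x1 = a0 & b1          # sees only a0, b1
    x2 = a1 & b0          # sees only a1, b0
    x3 = (a1 & b1) ^ b1   # sees only a1, b1
    return x0, x1, x2, x3

def probe_distributions(a, b):
    """Distributions seen by each probe for fixed unshared values a and b."""
    single = [Counter() for _ in range(4)]     # probes on x'0 .. x'3
    out0, out1 = Counter(), Counter()          # probes on the two output shares
    for a0, b0 in product((0, 1), repeat=2):   # all valid sharings of (a, b)
        a1, b1 = a ^ a0, b ^ b0
        x = components(a0, a1, b0, b1)
        assert x[0] ^ x[1] ^ x[2] ^ x[3] == (a ^ 1) & b   # correctness
        for i in range(4):
            single[i][x[i]] += 1
        out0[(x[0], x[1])] += 1   # probe on y0 = x'0 ^ x'1 expands to (x'0, x'1)
        out1[(x[2], x[3])] += 1   # probe on y1 = x'2 ^ x'3 expands to (x'2, x'3)
    return single, out0, out1

tables = [probe_distributions(a, b) for a, b in product((0, 1), repeat=2)]
secure = all(t == tables[0] for t in tables)
print("same distributions for every (a, b):", secure)
print("x'0 distribution:", dict(tables[0][0][0]))
```

For this candidate, every single probe sees three 0s and one 1, and the joint distribution seen through each output share is the same for all four choices of (a, b).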
We can do the same for the other sets and make F1 to F3. Then we search for tuples (f0, f1), if we suppose that the component functions f0 and f1 are compressed together. We are searching for tuples which lead to an identical joint probability distribution. As I showed on the last slide, we should make sure that if the adversary probes one of the output shares and the probe expands to some of the component function outputs, we always see the same joint probability distribution. This ensures security in the glitch-extended probing model. In addition, when the outputs of f0 and f1 are XORed together, we should see a balanced function, which means that it should yield as many 1s as 0s. This is a necessary condition to achieve uniformity. If any of those tuples fulfills both conditions, we add it to the set F01, and we do the same for the component functions f2 and f3 to make the set F23. In the last step, we search for the tuples whose XOR makes a correct sharing. So in TI, in the first step we have something which is correct and non-complete, and we try to add correction terms to make it uniform. In our algorithm, however, we have something which is non-complete, uniform, and glitch-extended probing secure, and then we try to find the correct sharing.
We can generalize it a bit more, to any three-input function. Then we are forced to use at least eight component functions, because we have a cubic function with three inputs and each input is shared with two shares, which gives 2^3 = 8 component functions. The steps are pretty similar to the last one. We make the sets F0 to F7, and each of them should include all possible three-input cubic functions. Then, if we assume that f0 to f3 are compressed together, we search for tuples that again have an identical joint probability distribution, which again ensures security in the glitch-extended probing model, and whose XOR is a balanced function.
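The search steps for the two-input quadratic case can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' tool: the target f(a, b) = ab + b is taken as the example, candidates are encoded as hypothetical pairs `(lu, lv)` meaning uv + lu·u + lv·v, and the balance condition is checked per unshared input, which coincides with balancedness once the joint distributions are identical.

```python
from collections import Counter
from itertools import product

def target(a, b):                      # assumed example target: f(a, b) = ab + b
    return (a & b) ^ b

# Candidate set for one component function with inputs (u, v):
# the quadratic term uv plus every constant-free linear part.
CANDS = [(lu, lv) for lu in (0, 1) for lv in (0, 1)]

def ev(c, u, v):
    return (u & v) ^ (c[0] & u) ^ (c[1] & v)

def sharings(a, b):
    for a0, b0 in product((0, 1), repeat=2):
        yield a0, a ^ a0, b0, b ^ b0   # (a0, a1, b0, b1)

SECRETS = list(product((0, 1), repeat=2))

def good_pairs(pick):
    """Pairs (g, h) that may be compressed into one output share.
    `pick` selects the inputs of g and h from a sharing (a0, a1, b0, b1)."""
    result = []
    for g, h in product(CANDS, repeat=2):
        dists, balanced = [], True
        for a, b in SECRETS:
            joint = Counter()
            for s in sharings(a, b):
                (u0, v0), (u1, v1) = pick(s)
                joint[(ev(g, u0, v0), ev(h, u1, v1))] += 1
            dists.append(joint)
            ones = sum(n for (x, y), n in joint.items() if x ^ y)
            balanced = balanced and ones == 2      # XOR gives as many 1s as 0s
        if balanced and all(d == dists[0] for d in dists):
            result.append((g, h))
    return result

# f0(a0, b0), f1(a0, b1) -> y0 ; f2(a1, b0), f3(a1, b1) -> y1
F01 = good_pairs(lambda s: ((s[0], s[2]), (s[0], s[3])))
F23 = good_pairs(lambda s: ((s[1], s[2]), (s[1], s[3])))

def correct(p, q):
    return all(
        ev(p[0], a0, b0) ^ ev(p[1], a0, b1) ^ ev(q[0], a1, b0) ^ ev(q[1], a1, b1)
        == target(a, b)
        for a, b in SECRETS for a0, a1, b0, b1 in sharings(a, b))

solutions = [(p, q) for p in F01 for q in F23 if correct(p, q)]
print(len(F01), len(F23), len(solutions))
```

The order of the filters follows the talk: security and balance are enforced first on each compressed pair, and correctness is only checked when the two halves are combined.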
We keep those tuples that fulfill both conditions and add them to the set F0123. We do the same to make the other set, and then we try to find the correct sharing. In this way, I found many solutions for the three-input AND gate without any fresh masks. This is also the first time such a construction is presented. Let's generalize it a bit more, to more complex Boolean functions. We have a function f, which is constant-free as always. If it is not, we can remove the constant first and then add it back in the shared version of the target function. Again, because we have four inputs and a cubic function, we can realize the shared variant with two input shares and eight component functions. I would like to highlight that this is not the only possibility; there are other ways to distribute the shared terms into component functions. The algorithm is pretty similar to the one for three-input cubic functions, so I'm not going to repeat it, but the search space is larger. It is harder, and we need more time to apply the algorithm to such functions, which have four inputs and algebraic degree three.
So we take the Midori S-box. We know that it is a 4-bit to 4-bit bijection and each coordinate function is at most cubic, so we can apply our algorithm to each of the coordinate functions. As you can see, we have many solutions for each coordinate function. For example, here for one coordinate function we have about 70 million solutions which are secure under the glitch-extended probing model, uniform, and of course correct. But that does not mean that a combination of these solutions leads to a jointly uniform solution. To find a jointly uniform solution, we take two solutions and check the uniformity; if they are uniform, we add the third one, otherwise we discard them. In this way, we discard the non-uniform solutions early.
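The incremental joint-uniformity search can be sketched as a small backtracking routine. This is a hedged illustration only: it assumes two shares, represents every candidate by a function returning its first output share, and uses a tiny two-bit toy instead of a real S-box. The names `jointly_uniform`, `search`, `COORD0` and `COORD1` are hypothetical.

```python
from collections import Counter
from itertools import product

def jointly_uniform(share0_funcs, n_bits):
    """True if, for every unshared input, the first output shares of the chosen
    coordinates are uniform over all input sharings (two-share case)."""
    k = len(share0_funcs)
    for x in product((0, 1), repeat=n_bits):
        hist = Counter()
        for m in product((0, 1), repeat=n_bits):               # first input shares
            s0, s1 = m, tuple(xi ^ mi for xi, mi in zip(x, m))  # second input shares
            hist[tuple(f(s0, s1) for f in share0_funcs)] += 1
        if len(hist) != 2 ** k or len(set(hist.values())) != 1:
            return False
    return True

def search(candidates, n_bits, chosen=()):
    """Pick one solution per coordinate, discarding a partial choice as soon
    as it stops being jointly uniform."""
    if len(chosen) == len(candidates):
        yield chosen
        return
    for cand in candidates[len(chosen)]:
        trial = chosen + (cand,)
        if jointly_uniform(trial, n_bits):     # early discard of non-uniform prefixes
            yield from search(candidates, n_bits, trial)

# Toy candidates; s0 = (a0, b0), s1 = (a1, b1).
# Coordinate 0 computes ab + b, with first output share a0b0 + b0 + a0b1.
COORD0 = [lambda s0, s1: (s0[0] & s0[1]) ^ s0[1] ^ (s0[0] & s1[1])]
# Coordinate 1 computes a, shared either as (a0 + b0, a1 + b0) or as (a0, a1).
COORD1 = [lambda s0, s1: s0[0] ^ s0[1], lambda s0, s1: s0[0]]

results = list(search([COORD0, COORD1], n_bits=2))
print(len(results))
```

In the toy, both sharings of the second coordinate are uniform on their own, but only one of them stays uniform together with the first coordinate, so the other is dropped before any further coordinate would be tried.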
And if the third one is jointly uniform, then we add the last one and check the uniformity. Keep in mind that the number of possible combinations for the Midori S-box, given these numbers, is very high. It is actually not possible to check them all, even in months. So we found a solution, actually many solutions, for Midori. Based on our findings, we realized a two-share round-based implementation of Midori which supports both encryption and decryption without any fresh masks, and I would like to stress that this is the first time such a construction is presented in the literature.
We also applied our technique to the PRESENT S-box, and we again found millions of solutions that are secure under the glitch-extended probing model and uniform. We then designed a two-share serial implementation of PRESENT, to make the comparison with the state of the art fair. Again, this is the first time such a construction, without any fresh masks and using the minimum number of input shares, is presented.
We also applied our technique to the PRINCE S-box, but this did not go as well as the former cases. Both the S-box and the S-box inverse are used in PRINCE. As in the former cases, we have many solutions for each coordinate function. However, we found no solution which is jointly uniform for the S-box or its inverse. Basically, if you take any three of the shared coordinate functions, they are jointly uniform, but when we add the fourth one, it is not uniform anymore. We have searched the entire space and did not find anything. Still, based on these constructions, we implemented both the S-box and its inverse. Here we use the component functions, the register and the compression, and we also have a state register. With this design, we did not see any leakage in practice. I mean, collecting 100 million traces from an FPGA, we did not see any leakage, and this is our observation. We think the diffusion layer here plays a role.
If you compose a couple of S-boxes in a row without any diffusion layer, there is more leakage. But if we add the diffusion layer, the amount of leakage is too small to detect in our cases. This is only our observation and not a proof.
We also applied our technique to AES. Basically, we use the tower-field approach. We have three multipliers here, an inverter here, which is a 4-bit to 4-bit function, and a square-scale function, which is linear. Then we have two XORs. To apply our technique to these functions, we integrate them into one function, which is an 8-bit to 4-bit function, and we call it square-scale-multiply. Then we have the inverter, which is a 4-bit to 4-bit function and is at most cubic. For square-scale-multiply, each coordinate function is quadratic. The multiplier is also an 8-bit to 4-bit function, and each of its coordinate functions is quadratic. So we can use our algorithm to find solutions, and we actually found a probing-secure and uniform sharing for each of them. Based on that, we introduce two designs. One takes one fresh bit here to make it secure and uniform, and then we found a solution without any fresh mask here. The two multipliers here are also uniform and probing secure. However, even though these 4 bits and these 4 bits, which are the outputs of this multiplier and this multiplier, are each uniform, these 8 bits together are not uniform. You can get rid of this one fresh mask by implementing the square-scale-multiply function twice, in such a way that it provides two different outputs. Then we have to implement the inverter twice, and then we have the multipliers. Again, the output of each multiplier is uniform, but the outputs are not jointly uniform when these 8 bits are considered together. I should highlight that this is only the inversion, so we need the input affine here. Let me show it here, which is easy: we apply the input affine to the result of the register and add it in front of the inversion.
However, we also need the output affine here, and we cannot apply it right away, because the output is not uniform. So again, we make use of the diffusion layer of AES. a', b', c' and d' are the outputs of the masked inversion. We need to apply the output affine and then MixColumns, and x, y, z and t are the outputs of MixColumns. We decompose MixColumns into two functions, MixColumns' and β. Because all of these operations are linear, we can change the order: we apply β first, store the result in the register, and then apply MixColumns'. As you can see here, β has only 0s and 1s and does not mix anything from the same S-box. For example, here you can see the addition of another inversion's output, acting as a fresh mask. So x', y', z' and t' become uniform individually. Of course, if you consider all of them together, they are not uniform anymore. Then we apply MixColumns' and the output affine, and we have the output of MixColumns.
Here you can see the general structure of the AES encryption, which is a byte-serial implementation. We have the key registers and a state register. We apply β, store the result in the register, and then apply the output affine and MixColumns'. The result of MixColumns' is also registered here. Because we did not apply the output affine right after the inversion, we need to apply the inverse of the output affine to make sure that the key expansion is correct. As I said before, we have the input affine here, whose result is stored in the register, and then we have the inverter. Again, in our FPGA analysis we did not see any leakage. We think that, again, MixColumns, that is the β function and MixColumns', plays a role and makes the leakage too small to detect in the FPGA evaluation. So here are the results.
As you can see, Midori has a bit more area overhead compared to the state of the art, and also a bit more delay. As an advantage, it uses two input shares, which means that it needs less initial masking. Our PRESENT design has lower area overhead and roughly the same delay, and it uses two shares, as does the state of the art. Our PRINCE design has no freshness; it does not need any fresh masks. Its area overhead and delay are also roughly the same as the state of the art. For AES, we have two variants. One uses one bit of freshness per S-box and per clock cycle, because we use only one S-box in the byte-serial implementation. The other is a no-freshness design, which is a bit larger because we have to instantiate some functions twice. As you can see here, we have the smallest AES implementation compared to the state of the art, and we use one fresh mask bit here and no fresh mask here.
As for the evaluation, as I said, we verified the security of our constructions using SILVER. All the designs are secure under the glitch-extended probing model. As I said before, some of them are not jointly uniform, but for PRESENT and Midori we found solutions which are jointly uniform. Because it is not possible with SILVER to analyze the full encryption, we performed practical analysis on an FPGA, collecting 100 million traces. In none of the designs, including AES and PRINCE, did we see any leakage.
So this is my last slide. In this paper, we provided a methodology to realize first-order two-share masked realizations of nonlinear functions without any fresh masks. We introduced, for the first time, secure AND gates with two inputs and three inputs with no fresh masks, and also the Midori S-box and the PRESENT S-box. We also applied our technique to PRINCE and AES and did not see any leakage in practice.
To the best of our knowledge, our designs are the only ones which use only two shares without any fresh masks and without applying Changing of the Guards. Thanks a lot for your attention and for watching this video. Please don't hesitate to contact me if you have any questions or suggestions. Thank you so much.