The next talk is on Lightweight and Side-Channel Secure 4×4 S-Boxes from Cellular Automata Rules. The paper is by Ashrujit Ghoshal, Rajat Sadhukhan, Sikhar Patranabis, Nilanjan Datta, Stjepan Picek and Debdeep Mukhopadhyay, and Rajat will give the talk.

Thanks for the introduction. In this talk we are going to discuss the design of lightweight, side-channel secure 4-bit S-boxes from cellular automata rules. The first thing that motivated this work is the NIST lightweight competition requirement: to provide a solution for resource-constrained IoT devices. For lightweight block ciphers, the common metrics from a cost perspective are area, memory and energy consumption, and from a performance perspective throughput and power consumption. Most of these metrics pull against each other: if you spend more area, your throughput gets better, but if you push for better throughput, your area degrades. For our work we focused on area, since the lightweight competition requires low cost, and side-channel resistance is necessary for the application.

Coming to some lightweight block ciphers, we have PRESENT, GIFT and Midori, which use a very lightweight linear layer: a bit permutation, or almost MDS with shuffle cells. For side-channel countermeasures the linear layer is much cheaper, but the non-linear layer, the S-boxes, becomes heavy, especially when it has to be made side-channel resistant. Side-channel countermeasures require dedicated design, and this is the issue we try to address in our work. So our goal is to design 4×4 S-boxes with side-channel resistance using cellular automata rules.
What we have done is use cellular automata rules with a focus on area. We classified those rules by the number of cubic, quadratic and linear terms in the ANF, and we have seen that these S-boxes have a smaller area footprint, 49% and 35%, and consume less power compared to the PRESENT and GIFT S-boxes. Following this, we built two design paradigms with these S-boxes, one with a bit permutation and another with almost-MDS matrices. At the end we will show those results as well.

Before going into the details, let me give you some background. First, we call an S-box optimal if it is bijective with nonlinearity 4 and differential uniformity 4. Then, for side-channel countermeasures, we randomize the intermediate values using a secret sharing scheme. For our side-channel resistant S-box design we followed the threshold implementation scheme proposed by Nikova. This is a countermeasure against differential power analysis. It is based on secret sharing and secure multiparty computation, and it has the correctness, non-completeness and uniformity properties. Let's take an example: an AND gate, which has two inputs. We share the inputs x and y into four shares each, we get the output as four shares as well, and we have four output functions, implemented in different modules, to get the final output. This particular example satisfies all three properties, correctness, non-completeness and uniformity; for correctness, the XOR sum of all the output shares gives the output of the AND gate.

Next is cellular automata. A cellular automaton consists of a grid of cells, where each cell takes one of a finite set of states, and the next state of each cell is given by a local rule, a function of some neighborhood of that cell. For our S-box design we use one rule, one function, iterated over four clock cycles to get the four output bits.
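The three threshold implementation properties described above can be checked by brute force for a small gate. The following is a minimal sketch, not the functions shown on the slide: the four share functions in `and_shares` are one four-share sharing of AND written down here as an assumption, and all function names are hypothetical. Correctness and non-completeness hold by construction for these functions; `is_uniform` simply reports what the exhaustive count finds.

```python
from collections import Counter
from itertools import product

def and_shares(x, y):
    """Four output shares of z = x AND y, given 4-share inputs (tuples of bits).

    Assumed sharing for illustration; output share i never touches input share i.
    """
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    z1 = ((x3 ^ x4) & (y2 ^ y3)) ^ y2 ^ y3 ^ y4 ^ x2 ^ x3 ^ x4
    z2 = ((x1 ^ x3) & (y1 ^ y4)) ^ y1 ^ y3 ^ y4 ^ x1 ^ x3 ^ x4
    z3 = ((x2 ^ x4) & (y1 ^ y4)) ^ y2 ^ x2
    z4 = ((x1 ^ x2) & (y2 ^ y3)) ^ y1 ^ x1
    return (z1, z2, z3, z4)

def xor_all(bits):
    """XOR of a tuple of bits, i.e. the unshared value."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

SHARINGS = list(product((0, 1), repeat=4))  # all 4-share bit vectors

def flip(shares, i):
    """Copy of `shares` with share i inverted."""
    return tuple(b ^ 1 if j == i else b for j, b in enumerate(shares))

def is_correct():
    """The XOR of the output shares must equal x AND y."""
    return all(xor_all(and_shares(x, y)) == (xor_all(x) & xor_all(y))
               for x in SHARINGS for y in SHARINGS)

def is_non_complete():
    """Output share i must not change when input share i of x or y flips."""
    for x in SHARINGS:
        for y in SHARINGS:
            z = and_shares(x, y)
            for i in range(4):
                if and_shares(flip(x, i), y)[i] != z[i]:
                    return False
                if and_shares(x, flip(y, i))[i] != z[i]:
                    return False
    return True

def is_uniform():
    """For each unshared (x, y), the 64 input sharings must hit each of the
    8 valid output sharings equally often (8 times each)."""
    for xv, yv in product((0, 1), repeat=2):
        counts = Counter(and_shares(x, y)
                         for x in SHARINGS for y in SHARINGS
                         if xor_all(x) == xv and xor_all(y) == yv)
        if len(counts) != 8 or set(counts.values()) != {8}:
            return False
    return True
```

Calling `is_correct()`, `is_non_complete()` and `is_uniform()` gives one Boolean per property. The same exhaustive style extends to a shared S-box, although there the uniformity count has to be taken over all output bits jointly.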
For example, if our function is on x1 to xn, you consider that there are n cells, and if each cell depends on, say, d neighboring cells, then x1 depends on d neighboring cells, and similarly every cell depends on its own neighboring cells. This is an example where we have taken n as 6 and d as 3. If you look at the first output, this 1 in the first output depends on this 1 0 0 in the input.

Now comes how we use this to design our S-boxes. If we take rules of four variables, we get 2 to the power 2 to the power 4, that is 65,536, rules in total, and we check optimality, as defined on the slide I discussed before. According to that, there are 512 rules whose S-boxes are optimal. Using such a rule, we permute the input bits and get the four output bits in four clock cycles.

We observed that if we classify those rules by their ANF representation, first of all, all of those rules have algebraic degree 3. And suppose f1 and f2 are two functions with the same number of cubic, quadratic and linear terms: since their algebraic structure is the same, they have the same area footprint and an identical TI structure. So these are the classes we obtained. For example, 1, 2, 2 means that 1 is the number of cubic terms, 2 is the number of quadratic terms and 2 is the number of linear terms. These are the different classes we obtained among the 512 S-boxes, and this is the TI implementation of the algebraic degree 3 function.
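The rule search and the ANF classification described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tooling: a rule is encoded as a 16-bit truth table, cell i reads cells i, i+1, i+2, i+3 cyclically, and all function names are hypothetical. Because the cell indexing in the paper may differ, the sketch does not assert the count of 512 reported in the talk.

```python
def ca_sbox(rule):
    """4-bit S-box from a 4-variable local rule (16-bit truth table).

    Output bit i is the rule applied to cells i, i+1, i+2, i+3 (indices mod 4).
    """
    sbox = []
    for x in range(16):
        y = 0
        for i in range(4):
            v = 0  # neighbourhood value seen by cell i
            for j in range(4):
                v |= ((x >> ((i + j) % 4)) & 1) << j
            y |= ((rule >> v) & 1) << i
        sbox.append(y)
    return sbox

def parity(v):
    """Parity of the set bits of v."""
    return bin(v).count("1") & 1

def differential_uniformity(s):
    """Largest DDT entry over non-zero input differences."""
    return max(sum(1 for x in range(16) if s[x] ^ s[x ^ a] == b)
               for a in range(1, 16) for b in range(16))

def nonlinearity(s):
    """8 minus half the largest absolute Walsh coefficient (non-zero output mask)."""
    wmax = max(abs(sum(1 - 2 * parity((a & x) ^ (b & s[x])) for x in range(16)))
               for a in range(16) for b in range(1, 16))
    return 8 - wmax // 2

def is_optimal(s):
    """Optimal as defined in the talk: bijective, nonlinearity 4, differential uniformity 4."""
    return (sorted(s) == list(range(16))
            and nonlinearity(s) == 4
            and differential_uniformity(s) == 4)

def anf_class(rule):
    """(cubic, quadratic, linear) term counts of the rule's ANF."""
    coeffs = [(rule >> v) & 1 for v in range(16)]
    for i in range(4):  # in-place binary Moebius transform
        for v in range(16):
            if v & (1 << i):
                coeffs[v] ^= coeffs[v ^ (1 << i)]
    by_degree = [0] * 5
    for monomial, c in enumerate(coeffs):
        if c:
            by_degree[bin(monomial).count("1")] += 1
    return (by_degree[3], by_degree[2], by_degree[1])

def optimal_rules():
    """All 16-bit rules whose CA S-box is optimal (exhaustive over 65,536 rules)."""
    return [r for r in range(1 << 16) if is_optimal(ca_sbox(r))]
```

Grouping `optimal_rules()` by `anf_class` would give classes written in the same (cubic, quadratic, linear) form as 1, 2, 2 in the talk.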
These are the four inputs of the S-box, X1, Y1, Z1, W1, and we have this threshold implementation followed by a demux circuit. We use a cyclic shift register to rotate the bits in every clock cycle, and we obtain one bit in every clock cycle. So we need four clock cycles to obtain the threshold implementation output of the S-box. We took a representative S-box from each class, synthesized our design in 180 nanometer technology, and computed the dynamic power as well. We observed that for class 1, 2, 2 and class 1, 3, 3 the area requirement and the dynamic power requirement are the lowest.

One observation, though we don't have a proof for it, is what we have seen across all these classes: the first priority is the number of cubic terms. If the number of cubic terms is smaller, the S-box class is more area and power efficient, and if the number of cubic terms is the same, then the sum of the linear and quadratic terms has to be smaller for the class to be more area and power efficient.

Then we went further and optimized the local rule, which has algebraic degree 3: we decomposed that function into two functions of algebraic degree 2 and obtained a more area-efficient TI, which we call a composite TI. In the first clock cycle we obtain the intermediate values, and in the next clock cycle we obtain the final values for the S-box. Using that, we reduced the area further; for class 1, 3, 1 we noticed a reduction of almost 80% relative to our previous implementation, going from algebraic degree 3 to algebraic degree 2 by decomposing the function.

Following that, we evaluated our design using fixed-versus-random test vector leakage assessment. We collected 1 million traces and obtained values of minus 0.42 and plus 0.3, and the permissible range is minus 4.5 to plus 4.5, so we can conclude that our design is also side-channel secure.

We have designed two applications based on this S-box. The first one focuses completely on area. What we observe is that if the diffusion layer gives less diffusion, then we need more rounds, and that degrades our throughput.
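The fixed-versus-random leakage assessment mentioned above is commonly carried out as a Welch t-test at every sample point, with plus or minus 4.5 as the pass threshold. The sketch below assumes that first-order, per-sample formulation; the trace layout (a list of equal-length sample lists) and the function names are hypothetical and say nothing about the authors' measurement setup.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Welch t-statistic between two samples of unequal variance."""
    return (fmean(a) - fmean(b)) / sqrt(variance(a) / len(a)
                                        + variance(b) / len(b))

def tvla(fixed_traces, random_traces, threshold=4.5):
    """Per-sample t-values and an overall pass flag.

    fixed_traces, random_traces: lists of traces, each a list of samples
    of the same length. The design passes if every |t| stays below threshold.
    """
    n_samples = len(fixed_traces[0])
    t_values = [welch_t([tr[i] for tr in fixed_traces],
                        [tr[i] for tr in random_traces])
                for i in range(n_samples)]
    return all(abs(t) < threshold for t in t_values), t_values
```

With values like the minus 0.42 and plus 0.3 quoted in the talk, every sample point would sit well inside the threshold, which is what the pass flag captures.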
For PRESENT we have branch number 3, and for GIFT we have branch number 2, but the GIFT S-box possesses the BOGI property, which they exploit to reduce the number of rounds to 28. If we use our S-box with the bit permutation, we need more rounds, because we do not have the BOGI property. But this paradigm has the smallest area, because a bit permutation doesn't consume area. So with this paradigm we are able to show that our design is optimized in terms of area.

Next is design paradigm 2, where we use almost MDS. Almost MDS has better diffusion, so the number of rounds is lower and we get a higher throughput: with it we could reduce the number of rounds from 40 to 16. The problem with this design paradigm is that the area is a bit larger because of the almost-MDS layer.

This is the table where we compare the other block ciphers with our CA-based ones. With 16 S-boxes and the diffusion layer, we obtained an area of 2253 with the 1, 3, 1 class of CA-based S-box; this is in GE, in 180 nanometer, and this is the throughput we obtained with almost MDS. So with almost MDS the throughput is good but the area degrades, whereas with the bit permutation the area reduces but the throughput degrades.

We have also explored the non-optimal S-boxes, that is, the other rules, which do not satisfy the optimality property. We explored all those non-optimal CA-based S-boxes as well, but we found that those CA rules are not suitable for constructing a CA-based S-box, as they do not have the BOGI property and their nonlinearity is also not good.

As future work, we will extend this to 8×8 CA-based S-boxes, and we are considering this design as a theme for a NIST submission. Thank you.

Questions?
Yes. Thank you for the talk. Can you go to slide number 18, please, where you showed the construction? I actually have two or three questions. One is: what is the latency of your design here? Given one input to the S-box, you need a couple of clock cycles, if I have understood correctly. How many clock cycles do you need?

Four clock cycles.

And then you are switching the shared inputs that are given to the same circuit?

Yes.

Then this switching between the shared inputs is already known to be leaky. You cannot simply change the shared input of the S-box on one circuit; you should have some pre-charge state in between. Otherwise the shares will overwrite each other, at least on the signals, and then you may already violate the non-completeness property.

But if we implement this in different modules, then it won't leak in that case. I mean, if we implement them with the hierarchy option on, in different modules, then it won't.

Then why do you need four clock cycles? There are four different modules.

Yeah, but four clock cycles are required because in each clock cycle we obtain one bit of the S-box output.

But then you use the same circuit, the same sub-circuit, to make the other, let's say, bit of the S-box output, or share of the S-box output.

Share of the S-box output, yeah.

Then, as I said, you have a shared input to the circuit and you just change the shared input to the S-box again; I mean, you swap some shares to the S-box. Probably we need to discuss this offline, because it will probably take longer. My second question is: how do you guarantee uniformity here?

Yeah, so uniformity: when we decompose a Boolean expression into the number of shares, we check uniformity using the uniformity table; if the values come out zero or a specific single value, then it is uniform. So for this class 1, 3, 1 and 1, 2, 2, whatever, I think we have the expressions. I don't think I have that slide, but we have it in our paper: when we decompose the expression, like class 1, 2, 2 or 1, 3, 3, using three shares, we have checked all those expressions and they meet uniformity.

But do you consider uniformity only for one single bit of the S-box output, or for all of the outputs? Because you cannot just consider one part of the S-box output, say this is uniform, and then conclude that if I concatenate the other uniform outputs the result will be uniform.

Considering all the outputs, all the outputs are uniform, because if we divide this expression into three output shares, then those three functions are all uniform, so it follows the uniformity property. And since we are using the same function, it follows the uniformity property overall.

Thank you.

And yeah, thank you for the talk. You compared your S-boxes with other S-boxes, but the cost mostly depends on the implementation, not that much on the S-box. So what implementations did you use for these existing S-boxes?

We used LUT-based optimization for the other S-boxes, and function-based for ours.

Okay, any other question for Rajat? Okay, if not, let's thank Rajat again. Thank you.