Hi, I am Annapurna Valiveti, a research scholar from IIIT Bangalore. I'll be presenting our work titled "Second-Order Masked Lookup Table Compression Scheme". It is joint work with my advisor, Srinivas Vivek. I'll be explaining the words from our title: second-order masking, lookup table, compression, and finally our scheme. In the usual setting, crypto algorithms are secure when the adversary has access only to the inputs and outputs. Kocher et al. demonstrated that when these algorithms are implemented on hardware, they are prone to side-channel attacks. In this setting, along with the inputs and outputs of the algorithm, the adversary can also observe leakage from the device during execution, such as computation time, power consumption, and electromagnetic emission, and can exploit this leakage to deduce useful information about the secret key. Practical side-channel attacks have been demonstrated on bank cards, smart cards, and embedded devices. For instance, this is the setup used for a power analysis attack, where the power leakage from the device is observed using an oscilloscope. Many countermeasures have been proposed in the literature against side-channel attacks, and masking is a widely used one. I would like to stress that for this presentation we limit our focus to software countermeasures against side-channel attacks, particularly power analysis attacks. The goal of this countermeasure is to minimize the impact of side-channel leakage. In a masking scheme, the secret is distributed among, say, d+1 values that we call shares, such that any subset of d shares is picked uniformly at random, and the (d+1)-th share is computed from the secret and the remaining d shares. Since it is difficult for the adversary to observe all the shares when the algorithm is executed on them, the higher the number of shares, the more complex the side-channel attack will be.
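The share-splitting rule just described (d uniformly random shares, with the last share computed from the secret and the rest) can be sketched as follows for Boolean (XOR) masking; this is a minimal illustration, and the function names are mine:

```python
import secrets

def share(secret, d, nbits=8):
    """Split an nbits-wide secret into d+1 Boolean (XOR) shares:
    the first d shares are drawn uniformly at random, and the last
    one is chosen so that the XOR of all d+1 shares is the secret."""
    shares = [secrets.randbelow(1 << nbits) for _ in range(d)]
    last = secret
    for s in shares:
        last ^= s
    return shares + [last]

def reconstruct(shares):
    """XOR all shares back together to recover the secret."""
    out = 0
    for s in shares:
        out ^= s
    return out
```

Any d of the d+1 shares are jointly uniform, so observing up to d of them reveals nothing about the secret.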
But lower orders strike a balance between practical efficiency and security. The security of algorithms proposed using the masking countermeasure is well studied. There are two widely used models to prove the security of this class of countermeasures: the first is the noisy leakage model proposed by Chari et al., and the second is the probing leakage model introduced by Ishai et al. For our work, we use the probing leakage model, where the adversary is allowed to observe at most d values during the execution. To protect block ciphers using the masking countermeasure, we need to protect the operations of the block cipher, and each operation should be evaluated in the presence of shares. The operations of a block cipher are broadly classified into linear operations and non-linear operations. It is straightforward to implement linear operations, since the operation can be evaluated on the individual shares to get the desired effect. It is complex, however, to evaluate a non-linear function in the presence of shares, and the non-linear component of a block cipher is the S-box. Many countermeasures have been discussed to efficiently evaluate the S-box of a block cipher, and they are broadly classified into two types: the lookup-table-based countermeasures and the circuit-based ones. Polynomial-based evaluation and bit-slicing fall into the latter category. The first provably secure masking countermeasure to protect the S-box was discussed by Chari et al. As per the scheme, the secret is distributed among two shares, say x1 and x2. According to this method, we construct a randomized lookup table, say T, in RAM, such that the S-box lookup table is shifted using the first share x1 and the entries are protected using the output mask y1. In the final step, we use the second input share x2 to look up this table T to obtain a sharing of S(x).
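The first-order table-recomputation idea just described can be sketched as follows; this is an illustrative toy, and the variable names are mine:

```python
import secrets

def masked_sbox_lookup(sbox, x1, x2, n=8):
    """First-order table recomputation.

    x1, x2 are Boolean shares of the secret input x = x1 ^ x2.
    A randomized table t is built in RAM using only x1 and a fresh
    output mask y1; the second share x2 is used only in the final
    lookup. Returns (y1, t[x2]), a sharing of sbox[x]."""
    y1 = secrets.randbelow(1 << n)                  # fresh output mask
    t = [sbox[a ^ x1] ^ y1 for a in range(1 << n)]  # shifted, masked table
    return y1, t[x2]                                # t[x2] = sbox[x] ^ y1
```

XORing the two returned values recovers S(x), but neither value alone depends on the secret.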
As observed, in the first step, to construct T, we use only x1; the second input share is used only in the final step. Since the scheme involves the construction of a randomized lookup table, it has an execution-time penalty factor of approximately 2 to 4. At the same time, consider the RAM required for this scheme: for AES-128, the input to the S-box is 8 bits, so it requires a memory of 2^8, that is, 256 bytes, for the randomized lookup table, and this is per S-box call. So the memory overhead may turn out to be expensive for resource-constrained devices, which have limited resources. On the other hand, circuit-based schemes, such as the one proposed by Rivain and Prouff, require only a constant amount of memory but have an execution-time penalty factor of approximately 30. So we can clearly see that the lookup-table-based schemes require more memory, whereas the circuit-based schemes demand more execution time. It would be a good idea to balance the memory required and the execution time, especially in favor of resource-constrained devices. This idea of balancing memory and execution time was first discussed by Rao et al. in a first-order scheme; later, Vadnala generalized this scheme to first and second orders. The basic idea behind these schemes is to reduce the size of the lookup table by grouping the S-box values. More formally, we construct two sub-tables, T1 and T2, in RAM, and the size of these sub-tables is decided by a compression parameter, say l. From now on, we will use the following notation: if a is a secret variable of n bits, a^(1) is used to represent the higher-order n−l bits of the secret, whereas a^(2) is used to represent the lower-order l bits.
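The a^(1)/a^(2) notation just introduced is simply a bit split; a minimal helper (name mine) makes it concrete:

```python
def split_bits(a, l):
    """Split an n-bit value a around the compression parameter l.

    Returns (a1, a2) where a1 holds the higher-order n-l bits and
    a2 the lower-order l bits, so that a == (a1 << l) | a2."""
    return a >> l, a & ((1 << l) - 1)
```

For example, with n = 8 and l = 3, the byte 0b10110101 splits into a^(1) = 0b10110 and a^(2) = 0b101.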
So, in the first step, we construct table T1, which groups the values of the S-box. In the second step, we construct another table, T2; the purpose of T2 is to unpack the entry corresponding to the higher-order n−l bits of the secret. To facilitate compression, we use a subset of random values that is common across T1 and T2. Vadnala extended the first-order scheme to second order as well, and the overall structure of the scheme remains the same. So I will discuss the steps of the second-order scheme of Vadnala. As explained on the previous slide, we construct T1, which maps n−l-bit inputs to m-bit outputs, where m is the size of an output row, that is, a group of 2^l values, and each row is protected using the output masks y1 and y2. In the second step, we construct another table, T2, which maps l-bit values to m-bit outputs. The purpose of T2 is to unpack the values of one of the entries of T1, so it takes the inputs of T1 and the shared randomness. In the final step, looking up T2 gives the output shares of S(x). It turns out that the first-order scheme of Vadnala is secure; again, it is proved secure in the probing leakage model, whereas the second-order scheme is prone to attack. In our work, we demonstrate an attack on the second-order scheme of Vadnala. The essence of the attack is that any two values from the second table, T2, reveal the higher-order n−l bits of the secret, because they commonly depend on x^(1). This particular attack is possible because Vadnala's scheme reuses the same output masks y1 and y2 across the table; the moment an output mask is repeated, the scheme is prone to attack. So, as part of our contribution, the first thing we tried is to fix this second-order attack on Vadnala's scheme.
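To show where the memory saving comes from, here is the packing idea with all masking stripped away: the 2^n-entry S-box becomes 2^(n−l) wide rows of 2^l packed outputs (the role T1 plays on masked values), and a lookup first selects a row by the high bits and then extracts one entry by the low bits (the role T2 plays). This is my own unprotected sketch, not the actual scheme:

```python
def compress_table(sbox, n, l, out_bits=8):
    """Pack the 2^n-entry S-box into 2^(n-l) rows; each row holds
    2^l consecutive outputs concatenated into one wide word."""
    rows = []
    for hi in range(1 << (n - l)):
        row = 0
        for lo in range(1 << l):
            row |= sbox[(hi << l) | lo] << (lo * out_bits)
        rows.append(row)
    return rows

def lookup(rows, a, l, out_bits=8):
    """Two-step lookup: pick the row by the high n-l bits of a,
    then unpack one entry using the low l bits."""
    hi, lo = a >> l, a & ((1 << l) - 1)
    return (rows[hi] >> (lo * out_bits)) & ((1 << out_bits) - 1)
```

In the real scheme the rows are randomized with output masks, and the unpacking table T2 is what must consume the remaining input share.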
So, naturally, since any two rows of T2 depend on the secret, the idea is to randomize table T2 using different output masks. But it turns out that it is not sufficient to protect only T2; we need to protect T1 as well. So, essentially, we randomize both sub-tables, T1 and T2. But this increases the randomness complexity of the scheme to something exponential in l. To reduce the randomness complexity of the scheme, we use the nice idea of a three-wise independent PRG. Ours being a second-order scheme, we use a three-wise independent PRG to generate the required output masks from only two random inputs, which is linear in l; we use the Tassa–Villar three-wise independent construction to achieve this. There is similar work by Ishai et al., who introduced robust PRG constructions to reduce the randomness complexity of masking schemes. Since it is a three-wise independent construction, this PRG does not degrade security when we replace the truly random values with pseudo-random values. But there is one more observation: we still need to store the outputs of the PRG, which goes against the whole intention of compression, since storing the PRG outputs increases the memory complexity of the scheme. So we came up with the nice trick of computing the output masks on the fly: the PRG outputs are computed on the fly instead of being stored, which helps us to manage the memory complexity of the scheme. Also, as usual, lookup table constructions support preprocessing: the construction requires only d shares to construct the randomized lookup table, whereas the (d+1)-th share is used only in the final lookup.
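To convey the three-wise independence idea, here is a textbook family (a degree-2 polynomial evaluated over GF(2^8)), not the more randomness-efficient construction used in our scheme: any three masks at distinct indices are jointly uniform (a Vandermonde argument), and each mask can be recomputed on demand from the short seed instead of being stored, which is exactly the on-the-fly trick. All names here are mine:

```python
def gf256_mul(a, b, poly=0x11B):
    """Carry-less multiplication in GF(2^8), reduced modulo the
    AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def mask_at(i, r0, r1, r2):
    """Output mask for index i: y_i = r0 ^ r1*i ^ r2*i^2 in GF(2^8).
    With uniform seeds r0, r1, r2, any three masks at distinct
    indices are jointly uniform, so masks need not be stored."""
    return r0 ^ gf256_mul(r1, i) ^ gf256_mul(r2, gf256_mul(i, i))
```

The seed is a few bytes regardless of how many masks are derived, which is what keeps the memory complexity of the compression scheme intact.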
So, we made a slight tweak to the base compression-table construction so that the compression scheme also supports preprocessing. Essentially, our scheme is divided into two phases: an offline phase and an online phase. In the offline phase, we generate the required randomness of the scheme, followed by the construction of T1. I would like to stress that the offline phase uses only two input shares: this being a second-order scheme, the secret is divided into three shares, x1, x2, and x3, and the offline phase takes only x1 and x2. Since any two input shares are picked uniformly at random, the offline phase is independent of the secret, which is exactly what makes it a preprocessing phase. In the online phase, after the secret is available, T2 is constructed using the three input shares x1, x2, x3, along with the inputs of T1 constructed in the offline phase and the shared randomness. Finally, we compute the output masks on the fly and output the shares of S(x). When it comes to the security of our scheme, we prove it using the composition model introduced by Barthe et al. The idea of composition is this: even if we prove the S-box construction secure independently, the block cipher may call the same S-box construction multiple times, and it becomes complex to prove security for the overall block cipher. The composition model guarantees that if we prove the construction secure in this model, its security carries over to the whole block cipher, that is, to the bigger circuit. Essentially, as per the standard probing leakage model, we want to prove that any pair of observed values can be simulated independently of the secret.
So, essentially, this means that the adversary will not be able to infer any information about the secret using two observations. I would like to highlight a few observations from the security proof. Since there are two constructions, T1 and T2, is it possible to prove the security of T1 and T2 independently? It turns out that it is not, since we have a set of common random values shared across T1 and T2; when there is sharing between the constructions, it is not possible to prove them independently. The crux of the security proof lies in showing the simulation of T2 along with the shared inputs. Finally, is it possible to formally verify the security of our scheme? There are recent formal verification frameworks in the side-channel community, such as the one introduced by Barthe et al. and Coron's CheckMasks. We tried formally verifying our scheme; it is not trivial to use these frameworks for lookup-table-based schemes, and we are working on it. Finally, the implementation results. This slide shows the implementation results for AES-128, as highlighted in the box, for the case of l = 3, which is where we have the optimal results. With preprocessing, the randomized tables for the whole block cipher require a memory of only 8 kB: AES-128 has 10 rounds with 16 S-box calls each, so 160 precomputed randomized tables in total, and they require only 8 kB. When we compare this memory requirement with the other lookup-table-based schemes, they require a minimum of 40 kB. On the other hand, the penalty factor associated with the online execution time is approximately 45, whereas circuit-based schemes incur a minimum penalty of 64. We implemented the lightweight block cipher PRESENT as well, which helps to further reduce the memory required.
For the optimal case, it requires a memory of only 3.2 kB with a penalty factor of approximately 12. Finally, I would like to conclude my presentation by summarizing the results of our scheme. For AES-128 in particular, we require a memory of 59 bytes instead of 256 bytes, which is roughly a factor-of-4 reduction. If you want the preprocessing advantage, our scheme requires a memory of only 8.1 kB, whereas the other lookup-table-based schemes require 40 kB. The randomness required is optimized using the three-wise independent PRG: for the optimal case of l = 3, the scheme needs only 19 bytes of true random values. Finally, for any generic S-box, our scheme strikes a balance between the memory required to store the preprocessed tables and the online execution time. As for future work: is it possible to extend the second-order scheme to higher orders? It is not a trivial extension, but it is interesting future work to extend the compression to higher orders as well, in favor of resource-constrained devices. And beyond the compression technique, are there other ways to optimize masking-based countermeasures for resource-constrained devices? This is the ePrint link where you can find the full version of our paper, and these are the references we used for this presentation. Thank you.