Hello everybody, I'm Jan-Pieter D'Anvers, and together with my colleague Michiel Van Beirendonck, we will take you through the world of table-based conversion from arithmetic to Boolean masking. Let's start with masking. Masking is a technique to protect sensitive values against side-channel attacks. The idea is that we split this sensitive value into multiple shares, so that an adversary, as long as he is not able to obtain all of these shares, does not learn any information about the secret value. There are different types of masking schemes. There is Boolean masking, where the XOR of all the shares gives you the sensitive value, or there is, for example, arithmetic masking, where the sum of all the shares, modulo some number, gives you the sensitive value. Depending on the type of computations that you want to do, you might prefer one of these types. For example, for Boolean operations we typically want Boolean masking, while for arithmetic operations we typically prefer arithmetic masking. As an example, this is an implementation of the decapsulation of the encryption scheme Saber, which we implemented in a previous work. As you can see, some of these operations are in yellow, indicating that these are arithmetic operations and that we prefer to work in the arithmetic masked domain here. On the other hand, some of the operations are Boolean, indicated in blue, for example the hash functions, indicating that we prefer to work in the Boolean masked domain there. To convert between these domains, we need conversion algorithms: for example, arithmetic to Boolean (A2B) masking, or Boolean to arithmetic (B2A) masking. In this presentation, we will focus on arithmetic to Boolean masking algorithms. There are several arithmetic to Boolean conversion algorithms; we will focus on the table-based methods. These table-based methods are very efficient, but can only be used for first-order masking security.
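The two masking types just described can be sketched in a few lines of Python; this is a minimal two-share illustration (the 16-bit modulus and variable names are chosen for the example, not taken from the talk):

```python
import secrets

M = 2**16          # working modulus: 16-bit values for illustration
x = 0x1234         # example sensitive value

# Boolean masking: the XOR of all the shares reconstructs x
r_b = secrets.randbelow(M)
bool_shares = (x ^ r_b, r_b)
assert bool_shares[0] ^ bool_shares[1] == x

# Arithmetic masking: the sum of the shares modulo M reconstructs x
r_a = secrets.randbelow(M)
arith_shares = ((x - r_a) % M, r_a)
assert (arith_shares[0] + arith_shares[1]) % M == x
```

Either share on its own is uniformly random and reveals nothing about x; only the combination of both shares does.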
Imagine that we have an arithmetic masking on the left in yellow, and we want to convert this into a Boolean masking on the right in blue. Table-based methods proceed as follows. We make a table with every possible sensitive value, masked with the small mask r, and we do that both for the arithmetic domain and the Boolean domain. Having this table, we can convert our arithmetic masking in the following way. We take our arithmetic masking and we go from our original masking under big R to the masking under small r. Now that we have this value, we can look it up in the table and go from the arithmetic domain to the Boolean domain. Once we're in the Boolean domain, we only have to remask to our original randomness R, and we have our Boolean masking of the value. While this certainly works, it is not very efficient. Imagine that we have an input of n bits. This means that our table needs to have 2^n entries, which does not scale well. As a solution, table-based methods typically divide the input into small chunks, which allows the corresponding table to become smaller: for k-bit chunks, we only need a table of 2^k entries. So how do we proceed? Well, we take the first chunk, the least significant bits; let's say these are 3 bits. We take these bits and we again remask to the small mask r. Then we can use this to look up the arithmetic to Boolean conversion in the table. After we've looked this up, we can remask again to get back the original randomness big R, and this gives us the first 3 bits of our Boolean masking. Exactly the same thing happens with the next chunk, the next 3 bits: we take these bits, we remask, we look it up in the table, we remask again, and we get the resulting chunk in the Boolean representation. Now there is a problem, and that is the carry. More specifically, there is a carry from the first chunk that we looked at into the chunk that we're looking at now.
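The single-table walkthrough above (before chunking) can be sketched as a toy in Python; this is a sketch under assumed names, and it deliberately ignores the careful operation ordering a real side-channel-hardened implementation needs:

```python
import secrets

n = 8              # small input size so the full table stays tiny
M = 2**n

# precomputation: for every masked value a = (x - r) mod 2^n,
# store the Boolean masking x ^ r of the same sensitive value x
r = secrets.randbelow(M)
T = [((a + r) % M) ^ r for a in range(M)]

def a2b_full_table(A, R):
    """Toy conversion of arithmetic shares (A, R), x = (A + R) mod 2^n,
    into a Boolean share B with x = B ^ R. Not side-channel hardened:
    a secure implementation must order these operations carefully."""
    a = ((A - r) % M + R) % M   # remask from big R to small r: a = x - r
    b = T[a]                    # table lookup: b = x ^ r
    return b ^ r ^ R            # remask back to the original randomness R

x = 0xA7
R = secrets.randbelow(M)
A = (x - R) % M                 # arithmetic masking of x
B = a2b_full_table(A, R)
assert B ^ R == x               # Boolean masking of the same x
```

The table has 2^n entries, which is exactly the scaling problem that chunking solves.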
And this carry needs to be taken into account, both for reasons of security and for reasons of correctness. The different table-based methods differ specifically in how they deal with this carry. In our work, we specifically look at a table-based method proposed by Debraize at CHES 2012. The idea of this paper is to include the carry in a masked form in the table. This allows us to correct for the carry at the cost of only doubling the number of entries in the table: we have 2^k entries for carry 0 and 2^k entries for carry 1. At the output of the table, we will not only get our Boolean value, but also a masked version of the carry in that iteration. This masked version of the carry can then be used in the next iteration to compensate for the carry during the lookup, so the lookup takes not only the arithmetic input but also the carry from the previous iteration. In our work, we show that there is a security vulnerability inherent to this method, and the reason is the variable encircled in red. In this variable, we do not yet take the carry into account, and we show that this leads to a non-uniformity of the masked value. Now my colleague Michiel is going to give you more information about the attack and about the possible solutions. Hi, I'm Michiel Van Beirendonck and I'll take over from Jan-Pieter to describe our attack on this A2B conversion method and also the two solutions that we propose to remedy the problem. As Jan-Pieter already mentioned, the problem is with the value encircled in red, so let's go to the next slide and analyze it in detail. The value that is used for the table lookup is a masking of x1 with the mask r, but there is also the omitted carry that is used separately for the table lookup. This means that the actual mask depends in two ways on the small r: there is both the normal mask r, but also the carry, which depends on r.
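A carry-augmented table of this shape can be sketched as follows. This is a simplified toy model of the idea only: the names r and rho and the exact indexing are our assumptions, not Debraize's exact construction.

```python
import secrets

k = 3
Mk = 2**k
r = secrets.randbelow(Mk)     # k-bit chunk mask
rho = secrets.randbits(1)     # 1-bit carry mask

# toy carry-augmented table: 2^k entries for carry 0 plus 2^k for carry 1.
# For each masked carry-in and masked chunk value a = (v - r) mod 2^k,
# store the Boolean masking of (v + c) mod 2^k and the masked carry-out.
# The precomputation is done with knowledge of r and rho.
T = {}
for mc in (0, 1):
    for a in range(Mk):
        c = mc ^ rho                 # true carry-in
        v = (a + r) % Mk             # true chunk value
        s = v + c
        T[(mc, a)] = ((s % Mk) ^ r, (s >> k) ^ rho)

# lookup example: chunk value 7 with carry-in 1 overflows the chunk
v, c = Mk - 1, 1
b, mc_out = T[(c ^ rho, (v - r) % Mk)]
assert b ^ r == (v + c) % Mk         # Boolean-masked chunk
assert mc_out ^ rho == 1             # masked carry-out
```

Note that the masked index (v - r) mod 2^k itself does not involve the carry; in this toy model that index loosely corresponds to the variable encircled in red.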
If you look at the distribution of this mask value, a problem is immediately obvious. Shown here on this slide is a conversion that happens in three iterations, where the conversion processes chunks of two bits. In a secure masking, you would expect the mask to take the values between 0 and 3 with equal probability. We see that this is indeed the case for iteration 0, when there is no carry yet that needs to propagate. However, in subsequent iterations, shown here on this slide as iteration 1 and iteration 2, the carry creates a dependency in the mask value. As a result, the distribution becomes skewed and the mask value 0 has a higher probability of occurring than the other mask values. This is exactly what should be avoided in a masked implementation, because it means that the sensitive value is still correlated with the shares. Since the shares are correlated with the sensitive value, we expect this to show up in a standard fixed-versus-random t-test. This is shown here on this slide for a t-test with 100,000 collected traces, where the A2B conversion happens in eight iterations. Again, in the first iteration there is no carry yet that needs to propagate, and correspondingly the mask is uniform and there is no t-test leakage. In subsequent iterations, there is a correlation between the processed values and the actual sensitive values, which are either fixed or random, and this shows up as high leakage peaks in the t-statistic. The theoretical flaw is therefore clearly detectable in practice, and in the paper we describe in more detail how it can be used to mount actual attacks on schemes that use this conversion. An important question is why the flaw was not caught earlier, particularly since the method has been around since 2012. In the original paper, no experimental leakage tests were conducted, and the security proof was not written out in detail.
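The skew can be reproduced with a small exhaustive enumeration of a simplified model (an assumption on our part, not the exact data flow of the method): a single 2-bit mask r both generates the carry out of the low chunk and masks the next chunk, so the effective mask is (r + c) mod 4 with c depending on r.

```python
from collections import Counter

Mk = 4                               # k = 2, so chunk values 0..3

# same mask r produces the carry AND masks the next chunk: skewed
skewed = Counter()
for x0 in range(Mk):                 # low chunk of the sensitive value
    for r in range(Mk):              # shared mask, uniform
        c = int(x0 + r >= Mk)        # carry out of the low chunk
        skewed[(r + c) % Mk] += 1
print(dict(skewed))                  # {0: 7, 1: 3, 2: 3, 3: 3}

# with an independent fresh mask for the next chunk, the skew disappears
uniform = Counter()
for x0 in range(Mk):
    for r0 in range(Mk):             # mask of the low chunk (carry source)
        for r1 in range(Mk):         # independent mask of the next chunk
            c = int(x0 + r0 >= Mk)
            uniform[(r1 + c) % Mk] += 1
print(dict(uniform))                 # {0: 16, 1: 16, 2: 16, 3: 16}
```

Mask value 0 occurs 7 times out of 16 in the first enumeration, matching the skew toward 0 described on the slide, while the second enumeration is exactly uniform.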
We include an excerpt from the paper on this slide, which shows that the security proof was deemed too trivial to write out in detail. We believe that this shows the importance of both leakage tests and security proofs to have full confidence in the security of masked implementations. We also propose two solutions that remedy the problem and avoid this non-uniform mask value. The first solution we call the straightforward fix of the original method, and that is to use a different value of r in each iteration of the conversion. Concretely, this looks as shown here on this slide. Instead of a single table, we now have as many tables as there are iterations in the conversion. Each table is constructed for a different value of r (r0, r1, r2, and so on, as many as there are iterations), and the lookup happens in the table that corresponds to that iteration. Because the mask value and the carry now use different values of r, the complex dependency is resolved. As a result, the mask will be uniformly distributed and the original problem is avoided. The benefit of this method is that each iteration still requires only a single lookup. This was the original benefit of the Debraize method, and therefore we keep exactly the same performance. The drawback of the method is that there is a significant extra memory cost: each iteration requires its own separate table, so there are many extra tables and this requires many extra bytes in memory. Our second solution we call the dual lookup method. In this solution, we explicitly compensate for the carry that needs to be added to the input. Concretely, this looks as follows. The table no longer contains the input carry, which is no longer necessary for the lookup, but still contains the output carry. The output carry is still shared in a Boolean masked format, which we will resolve into an arithmetic carry with a separate lookup.
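That separate carry lookup can be sketched as a two-entry table doing a single-bit B2A conversion. This is a toy model: gamma is the arithmetic carry mask mentioned in the talk, while the Boolean carry mask name rho and the exact indexing are our assumptions.

```python
import secrets

M = 2**16
rho = secrets.randbits(1)        # Boolean mask of the carry bit
gamma = secrets.randbelow(M)     # arithmetic mask of the converted carry

# two-entry table: Boolean-masked carry in, arithmetically masked carry out,
# precomputed with knowledge of rho and gamma
T2 = [((b ^ rho) - gamma) % M for b in (0, 1)]

# for a true carry bit c, the lookup T2[c ^ rho] together with the share
# gamma forms an arithmetic masking of c, ready to be added to the input
for c in (0, 1):
    mc = c ^ rho                         # Boolean-masked carry
    assert (T2[mc] + gamma) % M == c
```

Because the table converts only a single bit, it needs just two entries regardless of the chunk size, which is what keeps this solution's memory footprint small.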
The second lookup requires a second table that essentially does a B2A conversion. In this table, the Boolean masked output carry is converted back to an arithmetically masked input carry. This input carry can then be added to the input, and this way the carry can be resolved. The input carry is shared with the mask gamma, and this mask is also taken into account in the construction of the table shown at the bottom of the slide. In the dual lookup method, the mask that appears is therefore equal to the sum of r and gamma. Since both are simply randomly sampled values, the resulting mask follows the uniform distribution. The benefit of the dual lookup method is that it has a smaller memory consumption. Even though it requires two table lookups, each of the tables can now be significantly smaller. In particular, the original table can be halved in size, because it no longer needs to take the input carry into account. The table that does the B2A conversion can remain very small, because it only needs to convert a single bit and therefore only has two entries. The drawback of the method is that it needs two table lookups, and these come with some cycle count cost and therefore also reduced performance. Before, we highlighted the importance of both security proofs and leakage tests. We explicitly conducted both for our two newly proposed methods, and the details can be found in our paper. We analyzed the performance of table-based A2B conversion methods in detail. The results shown on this slide are for 16-bit A2B conversion on a 32-bit ARM Cortex-M4 processor. In our paper, we also have results for 64-bit A2B conversion and results on an 8-bit AVR processor. In the top row of the table, we show the Debraize method for reference. However, we have just shown that this method is insecure, and we stress that it should therefore not be used in a secure conversion. Then, we show three secure conversion methods.
The first one is a method due to Coron and Tchulkine, which was later improved by Debraize. Next are our two newly proposed methods, the straightforward fix and the dual lookup method. The three methods offer a trade-off between pre-computation cost, conversion cost and the size of the tables. In terms of pre-computation cost, the lowest cycle count is found for the improved method of Coron and Tchulkine. When we look at the online conversion cost, the fastest conversion is found for our straightforward fix of the Debraize method. In terms of memory footprint, the lowest cost is found for our dual lookup method. To fully exploit this low memory cost, a technique called the LSB trick must be used, which is described in the paper. We also want to stress that the low memory footprint of the dual lookup method scales well to larger conversions. For example, for 64-bit conversions, the memory footprint is a factor 4 lower than the memory footprint of the Coron and Tchulkine method. Again, all of these results are readily available in our paper, and we happily invite you to have a look. To conclude our work, we analyzed and compared table-based A2B conversion methods. Table-based methods are typically the most efficient ones for first-order A2B conversion. They have already been heavily used in masking post-quantum cryptography schemes like Saber or Kyber, and we expect that they will see even more use in the future. In our work, we found a security vulnerability in a table-based method that has been around since CHES 2012. We believe that this shows the importance of both security proofs and practical measurements when proposing new masked implementations. To remedy the problem, we proposed two new methods that offer a time-memory trade-off. There is the straightforward fix, which keeps the performance of the original method at the cost of an increased memory footprint.
We also proposed a dual-lookup method that requires an extra table lookup and therefore has reduced performance, but this dual-lookup method keeps the memory footprint significantly lower than other methods. Finally, we aim to make our implementations publicly available at the link shown on this slide, and we hope that they will see use in future masked implementations. Both Jan-Pieter and I would like to thank you for listening to our talk, and do not hesitate to reach out to us should you have any questions.