Hello, my name is Norman Lahr, and in this video I will present our work, Side Channel Information Set Decoding using Iterative Chunking. We evaluated our new approach on the Classic McEliece hardware reference implementation. The other members of our team are Ruben Niederhagen from the University of Southern Denmark in Odense, Richard Petri from the Fraunhofer Institute for Secure Information Technology in Darmstadt, Germany, and Simona Samardjiska from Radboud University in Nijmegen in the Netherlands. To understand our motivation, I start with some background and challenges of the target cryptosystem, which is Classic McEliece, or the Niederreiter scheme respectively, followed by the approach of information set decoding and reaction-based attacks. Then I will explain a basic reaction-based attack on the decryption process of the Niederreiter scheme and its impact when it is combined with information set decoding. Afterwards, I will introduce iterative chunking, how it is applied, and its impact when we use information set decoding as well. Finally, I will briefly explain how we evaluated the attack in practice and present the overall evaluation results. Classic McEliece is one of the finalists for key encapsulation mechanisms in the third round of the NIST PQC competition, and it is based on the Niederreiter cryptosystem with binary Goppa codes. Classic McEliece is considered to be a conservative choice. The parameters are the code length n, the code dimension k, the guaranteed error correction capability (or decoder capacity) t, and the field size m. The authors of the submission define five parameter sets according to the different security levels. They are identified by the prefix "mceliece", followed by the code length and the error correction capability. The operations of the Niederreiter cryptosystem basically work as follows.
The key generation starts with randomly choosing an irreducible Goppa polynomial g(x) of degree t over the extension field GF(2^m), and choosing a random list of n distinct elements in GF(2^m), which are the support of the polynomial g. Both g and the list of alphas represent the private key. From these values, the corresponding parity-check matrix H is computed and returned as the public key. The encryption operation is rather simple. First, we generate a random binary vector e of Hamming weight t and length n. The ciphertext, or syndrome, s is then computed as the matrix-vector product of H and e. The decryption begins with the decoding of the syndrome to get the error-locator polynomial sigma. Then we identify the roots by evaluating sigma at the private support list alpha_0 to alpha_(n-1). If we identify a root at alpha_i, the corresponding position in e is set to 1, or to 0 otherwise. The decryption process depends on the error-correcting decoder: the currently established approach is the Berlekamp-Massey decoder, and there is also the Patterson decoder. The security of the Niederreiter scheme relies on the decoding problem, which is defined as follows: compute the error vector e given the binary parity-check matrix H and the ciphertext s, which equals H multiplied by e, where e is a uniformly random weight-t vector of length n. The fastest attacks known to break the decoding problem use information set decoding, ISD for short. Current approaches like Prange, Stern, May-Meurer-Thomae (MMT for short), and Becker-Joux-May-Meurer (BJMM for short) solve the problem for the Classic McEliece parameters submitted to NIST only at an infeasible cost, which is of course desired from a security point of view. For example, the MMT approach requires 2^277.6 bit operations for the highest parameter set. Another security aspect is implementation attacks: reaction-based attacks are known and feasible attacks on code-based cryptosystems.
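The encryption operation just described can be sketched in a few lines of Python. This is a toy model over GF(2) with made-up, tiny dimensions, not the real Classic McEliece parameters or reference code; the helper names are illustrative.

```python
# Toy sketch of Niederreiter encryption over GF(2): the syndrome is the
# matrix-vector product of the public parity-check matrix H and a
# uniformly random error vector e of Hamming weight t.
import random

def random_error_vector(n, t, rng=random):
    """Uniformly random binary vector of length n and Hamming weight t."""
    e = [0] * n
    for pos in rng.sample(range(n), t):
        e[pos] = 1
    return e

def encrypt(H, e):
    """Syndrome s = H * e over GF(2); H is given as a list of rows."""
    return [sum(h_ij & e_j for h_ij, e_j in zip(row, e)) % 2 for row in H]

# usage with a tiny random H (illustrative sizes only)
rng = random.Random(0)
n, r, t = 16, 8, 3
H = [[rng.randint(0, 1) for _ in range(n)] for _ in range(r)]
e = random_error_vector(n, t, rng)
s = encrypt(H, e)
```

Decryption is the expensive direction: it needs the private g and support list to decode s back to e, while an attacker only sees H and s.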
The assumption for these attacks is that an attacker has access to a decryption oracle which returns success or failure for a given input. The basic idea is that the reactions caused by prepared inputs lead to a secret key or plaintext recovery. Side-channel information, like timing or power consumption, can be used as unintentional feedback from a decryption oracle. An example of a provoked reaction is to feed a decoder with manipulated input such that its decoding capacity is exceeded and it leaks exploitable information. Current plaintext recovery approaches, like the approach of Shoufan et al. from 2009, reveal bits individually and require a high number of queries, namely the code length, for example 8192. So, the first contribution of our work is the introduction of iterative chunking for plaintext recovery on Classic McEliece. In short, iterative chunking is an attack strategy which uses cumulative queries to recover multiple positions of secret information at once. Additionally, we optimized the parameters and the strategy. Therewith, we are able to significantly decrease the effort of a reaction-based attack, for example by approximately 90% for the highest parameter set of the Classic McEliece NIST parameters. The second contribution is the combination of iterative chunking and information set decoding to further decrease the number of queries for a successful plaintext recovery. In our work, we estimate the trade-off between required queries and computational power, so that we know how many queries can be saved if an attacker has large computational resources. We evaluated our attack approach with a simulation using an ideal decryption oracle, and we also developed a practical decryption oracle to mount a side-channel attack by exploiting the electromagnetic radiation leakage of the current reference FPGA design of Classic McEliece. Let's have a closer look at reaction attacks and how we can mount them on the Niederreiter cryptosystem.
A reaction attack has the following assumptions. First, we need access to a decryption oracle, which in practice is an implementation. For our work, we chose the Niederreiter FPGA implementation of Wang et al., which they presented at PQCrypto in 2018. Further, we assume that an attacker is able to query the decryption oracle an unlimited number of times. We target a plaintext recovery attack, so we assume that the attacker has no access to the output, but gets direct or indirect leakage, for instance from protocol messages or physical side channels. We were inspired by the work of Shoufan et al. from 2009, in which they describe a timing attack on the Patterson decoder of a McEliece FPGA implementation. Nevertheless, the FPGA design of our target instantiates the Berlekamp-Massey decoder with a constant-time implementation. This means it is not possible to apply the approach of Shoufan et al. However, the FPGA design shows a power leakage when exceeding the decoder's capacity t. If the syndrome contains more than t errors, the computation of the error-locator polynomial fails and the result is a random polynomial. The corresponding error vector shows a very low Hamming weight after the evaluation with the support. To examine whether we can exploit this effect on our target, we performed a power simulation using the Hamming distance of the register transfers. In the plot you see the simulated power trace. It is quite simple to identify the five different stages of the decryption process, which are specific to the Berlekamp-Massey decoder. It starts with an additive FFT, which is used to compute the double syndrome in the next stage. Then the Berlekamp-Massey decoder is applied, which outputs the error-locator polynomial. In the fourth stage the polynomial is evaluated using another additive FFT call. In the last stage the error vector is reconstructed, which is the most interesting part for our desired leakage.
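The Hamming-distance power model used for this kind of simulation can be sketched very compactly; the register states below are made up for illustration and are not taken from the actual design.

```python
# Hamming-distance power model: the modelled power sample for each
# clock cycle is the number of bits that toggle in a register between
# two consecutive states.
def hamming_distance(prev, curr):
    """Number of differing bits between two register states."""
    return bin(prev ^ curr).count("1")

def simulate_trace(register_states):
    """One simulated power sample per register transfer."""
    return [hamming_distance(a, b)
            for a, b in zip(register_states, register_states[1:])]

# hypothetical 4-bit register states over four cycles
trace = simulate_trace([0b0000, 0b1111, 0b1110, 0b0110])  # -> [4, 1, 1]
```

Summing such per-register distances over all registers of the design yields the kind of simulated trace in which the five decryption stages become visible.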
At this step it is obvious that we can identify a significant gap between a correctly decoded error vector, which has a high weight, and a faulty one, which shows just a small weight. Therewith we have the preconditions for a power attack. Now we know that there is exploitable leakage when we induce an additional error into the syndrome, but how can we do this with just the binary parity-check matrix H and the syndrome, or ciphertext, s? Let's have a look at the encryption operation of the Niederreiter cryptosystem. Here you see an exemplary matrix H and an error vector e, whose product equals s. When computing the matrix-vector product, one can simply search for the ones in the vector e and add up the column vectors of H at the corresponding positions. So the ones in e represent the errors, and we now know that there are exactly t errors included. If we want to induce an additional error, we must search for a position in e which equals zero and add the corresponding column vector of H to the original syndrome to get the manipulated syndrome s'. Therewith we have constructed a syndrome which includes an additional error. How can we use this mechanism in a reaction-based attack to recover an entire error vector? The idea is to iteratively test each error vector position for whether it is a zero or a one. To do this, we add the corresponding column vector of H to the syndrome and ask the decryption oracle whether the decryption succeeds. If the oracle returns true, the decryption was successful and the decoder was able to decode the syndrome. This means that we cancelled out an existing error in the syndrome and decreased the total number of errors, so there must be a one at that position in the original error vector. In the other case, we increased the number of errors, which now exceed the capacity of the decoder, and the oracle returns a failure. Therefore we recover this position as a zero.
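The bit-by-bit recovery just described can be simulated with an ideal oracle. The sketch below models only the weight condition the decoder reacts to (flipping position i of e by adding column h_i of H to s); the helper names are ours, not from the reference implementation.

```python
# Ideal-oracle simulation of the basic reaction attack: flipping
# position i keeps the effective error weight at most t exactly when
# e[i] was a 1, so success/failure of decryption reveals each bit.
import random

def make_oracle(e, t):
    """Stand-in decryption oracle: True iff the manipulated syndrome
    still decodes, i.e. the effective error e XOR flips has weight <= t."""
    def oracle(flips):
        ones_flipped = sum(e[i] for i in flips)
        weight = sum(e) + len(flips) - 2 * ones_flipped
        return weight <= t
    return oracle

def recover_bitwise(n, oracle):
    """Recover e one position at a time: n queries in total."""
    return [1 if oracle([i]) else 0 for i in range(n)]

rng = random.Random(1)
n, t = 32, 4
e = [0] * n
for i in rng.sample(range(n), t):
    e[i] = 1
recovered = recover_bitwise(n, make_oracle(e, t))  # recovered == e
```

This is exactly the n-query baseline the following slides improve on.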
The total procedure requires n iterations, since we need to ask for each position of e. How can we reduce the number of queries to decrease the effort of the attack? We can consider information set decoding, but since it is infeasible for the entire number of bits in e, we can first recover a specific subset of positions with the reaction-based attack and then spend computational power and perform information set decoding to solve the reduced decoding problem for the remaining positions. To identify how many queries we can save with the available computational resources, we need an estimated trade-off. There is a natural upper limit for the recovery of positions with a reaction-based attack, which is k: therewith we are able to construct a uniquely solvable linear system, which can be solved using Gaussian elimination with negligible effort. We evaluated the procedure with a mathematically derived estimation of the trade-off between the required number of queries and the computational complexity. The plot shows the resulting curves for all NIST parameter sets and the ISD approaches Stern and MMT. Unfortunately there is no compact representation of the concrete complexity of BJMM, so we left it out of our analysis. As one can see, we can save several hundred queries if we assume, for example, a feasible cost of 2^40 bit operations. Now, what is iterative chunking? We start by answering the question of what happens if we ask for two error positions at once. If we do this, we add two distinct columns of H and call this cumulative query a chunk. Then there are four different cases of an oracle response, which are listed in the table on the right-hand side. The first column of the table shows the initial state of the queried chunk for the positions i and j. The second shows the state of the pair after flipping the values. The third column shows the total number of errors in the new state, and the last column shows the oracle's answer.
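The four cases of the size-2 chunk follow from a simple weight argument, which can be checked by enumeration. The value of t below is hypothetical; only the sign of the weight change matters.

```python
# The four cases for a size-2 chunk: flipping positions (i, j) changes
# the error weight by +2, 0, 0, or -2, so only the all-zero case pushes
# the weight above the decoder capacity t (hypothetical t = 4).
t = 4

def oracle_answer(bits_in_chunk, t=t):
    """Decryption succeeds iff the new error weight stays within t."""
    flipped_weight = t + sum(1 - 2 * b for b in bits_in_chunk)
    return flipped_weight <= t

cases = {pair: oracle_answer(pair)
         for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
# (0, 0) is the only "low chunk" (oracle returns False); the other
# three cases are "high chunks" (oracle returns True).
```

So a failure answer uniquely identifies the all-zero pair, which, given the low density of ones in e, is by far the most likely case.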
So if we have a look at the first row, we assume here that the original error vector has zeros at the positions i and j. If we add the corresponding columns of the matrix H, the resulting syndrome will include two additional errors, so it is not possible to decode it and the oracle returns false. We name this case a low chunk, whereas we call the other cases high chunks. In the cases in the second and third rows, the error positions are just switched, since one error is added and the other is cancelled out. In total the error weight does not change and the decoding succeeds. In the last case both errors are cancelled out, so the decoding succeeds as well. As one can notice, the density of ones in e is less than 2.1%. In most cases e consists of zeros, and as we just saw, we can detect exactly this case. Now we introduce the term chunk size beta, which is 2 here. What happens if we iteratively increase the input chunk size beta? In the table you see the eight cases when we increase the chunk size by one, so now the chunk has size 3. There are now four cases in which the decoding capacity is exceeded and four cases for which the oracle returns true, so we are not able to distinguish one specific case to recover the original bit pattern. But if we already have a recovered position where we got a one, we are able to decrease the total number of errors in the syndrome by adding the corresponding column vector to the syndrome to cancel out that error. Then we get the unique case again and we can identify the all-zero case. Now we can iteratively continue this procedure and increase the chunk size to ask for even more positions at once. Therewith we developed the following attack strategy: iteratively increment the chunk size beta whenever an additional one position is identified, and reduce the number of errors in the syndrome with the known one positions.
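Why the chunk size can grow with each recovered one also follows from the weight argument. A small sketch under a hypothetical t: with k known ones cancelled out of the syndrome, a chunk of beta = k + 2 fresh positions containing j ones leaves t - k + beta - 2j errors, so the oracle fails only in the all-zero case.

```python
# With k already-recovered ones cancelled from the syndrome, a chunk of
# beta fresh positions with j ones yields error weight t - k + beta - 2j.
# For beta = k + 2, only j = 0 exceeds the capacity t (hypothetical t = 10).
t = 10

def decodes(k_known_ones, beta, j_ones_in_chunk, t=t):
    """True iff the manipulated syndrome still has weight <= t."""
    return t - k_known_ones + beta - 2 * j_ones_in_chunk <= t

k = 3
beta = k + 2
answers = [decodes(k, beta, j) for j in range(beta + 1)]
# only j = 0 fails, so a failure uniquely identifies an all-zero chunk
```

This is the invariant behind the strategy of incrementing beta after every additional recovered one.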
Since we have to identify the exact position of a one in a high chunk, we apply so-called sub-chunking, which means that we perform the identification with a divide-and-conquer approach. If beta gets larger, the probability of a high chunk increases. This of course leads to more sub-chunking. So there must be an optimal beta_t which minimizes the number of queries, as we will see on the next slide. If beta equals beta_t, we collect the high chunks in a so-called bucket, for example of size n minus k, and then we solve the bucket at the end using information set decoding. Now, what is the optimal chunk size to minimize the number of queries? In both plots you see the number of queries for the chunk sizes from 2 to 24. On the left-hand side you see the curves of the theoretical estimation for all proposed NIST parameters, and on the right-hand side you see the curves of the iterative chunking approach implemented in the scripting language Python. As one can see, both plots show equal behavior, and you can also see that the number of required queries quickly decreases when we increase the chunk size. Here we can also identify the different minima for the number of queries, which are at chunk size 17 for the three highest parameter sets and at 18 and 19 for the lowest ones. Now that we know the optimal chunk size, we can again apply information set decoding to further decrease the number of queries. The graph shows our estimated trade-off between computational cost and the required number of queries for Stern and MMT. As you can see, we can save additional queries; for example, we can save circa 80 queries if we spend 2^40 bit operations of computational power. Besides the theoretical and simulated evaluation, we also mounted a practical experiment. The setup is depicted in the picture on the right. We integrated the design of Wang et al. on a SAKURA side-channel evaluation board, which you see on the left side of the setup.
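The sub-chunking step can be sketched as a binary search. The oracle is abstracted here as "does this sub-chunk contain a one?", which is what the cumulative query effectively reveals once known errors are cancelled from the syndrome; the function names and sizes are illustrative.

```python
# Divide-and-conquer sub-chunking: locating a single 1 inside a high
# chunk with about log2(beta) oracle calls instead of beta calls.
def find_one(positions, contains_one):
    """Binary-search for one error position inside a high chunk."""
    lo, hi = 0, len(positions)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if contains_one(positions[lo:mid]):  # one in the left half?
            hi = mid
        else:
            lo = mid
    return positions[lo]

# toy example: a chunk of 16 positions with a single one at position 11
e = [0] * 16
e[11] = 1
contains_one = lambda sub: any(e[p] for p in sub)
found = find_one(list(range(16)), contains_one)  # -> 11
```

With more than one 1 in the chunk, the same recursion descends into both halves that answer positively; the sketch shows only the single-one case.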
The board provides a Xilinx Kintex-7 FPGA, which we run at a frequency of 24 MHz. We acquired the traces with an EM probe from Langer and measured the signal with a PicoScope running at 500 megasamples per second. Further, we developed various scripts for the acquisition, the side-channel decryption oracle, and the iterative chunking strategy, which is the same script as for the simulation before. We built the practical side-channel decryption oracle by using Welch's t-test to identify the gap in the error vector construction at the end of the decryption process. To do so, we need two reference traces: one from a high chunk, which indicates a successful decryption, and another one from a low chunk, indicating a failed decryption. Now we compare the trace of a query and decide whether it is more similar to the trace of a high or of a low chunk. To get a better alignment of the traces and a little speed-up of the computation, we compress a trace within a clock cycle to just one value. Further, we update the reference high trace, since its behavior changes due to the removal of errors from the prepared syndrome. Here are the results of our evaluation. We evaluated the iterative chunking approach for all parameter sets proposed for the NIST competition. We determined the average required number of queries for a successful recovery of an entire error vector. Here we have the numbers for the optimal chunk sizes. For comparison, we also give the number of traces when no chunking is used, so the chunk size is one for all queries. The data for the simulation and the experiments were gathered from the same sets of 10 different key pairs and 10 different plaintext-ciphertext pairs per key pair, so we have 100 in total for each parameter set. Since the practical oracle does not fail, we have the same numbers for the simulation and the practical experiment.
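A minimal sketch of such a t-test-based classifier, assuming the traces are already compressed to one value per clock cycle: a fresh trace is compared against the high-chunk and low-chunk references with Welch's t-statistic and assigned to the closer one. The trace values are made up for illustration, and this is a simplification of the real oracle, not the authors' script.

```python
# Side-channel decryption oracle sketch: classify a query trace as
# "decryption succeeded" (high chunk) or "failed" (low chunk) by
# comparing Welch's t-statistics against two reference traces.
import math

def welch_t(a, b):
    """Welch's t-statistic between two samples a and b."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def classify(trace, ref_high, ref_low):
    """True (success) if the trace is closer to the high-chunk reference."""
    return abs(welch_t(trace, ref_high)) < abs(welch_t(trace, ref_low))

# hypothetical compressed trace segments (one value per clock cycle)
ref_high = [1.00, 1.05, 0.95, 1.02]
ref_low = [0.10, 0.20, 0.15, 0.12]
query = [0.98, 1.10, 0.90, 1.05]
success = classify(query, ref_high, ref_low)  # high-weight reconstruction
```

Updating `ref_high` as errors are removed from the prepared syndrome corresponds to the reference-trace update mentioned above.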
In the last two columns we also give the theoretical prediction and the number of required queries when applying information set decoding at a cost of around 2^40 operations. As one can see, we are able to drastically reduce the number of queries with iterative chunking. For example, we save circa 90% of the queries when mounting an attack for the highest parameter set and reduce the number of queries from 6,530 to 654, which also follows the theoretical prediction. And we can reduce the number further to 577 queries when we use information set decoding. To conclude our work, we could show that iterative chunking reduces the number of queries by an order of magnitude; as we just saw, from 6,530 to 654 queries, which is a reduction of circa 90% for the highest parameter set. We saw that the additional use of information set decoding decreases the number further: for example, by spending 2^40 bit operations we could reduce the queries from 654 to 577, which is about 11.7%. We showed the application of iterative chunking for a specific information leakage, but we can of course use any feedback about a succeeded or failed decryption, so we envision a broader application to other code-based schemes. With that, thank you for your attention and your time.