Hello everyone, and thanks for attending this presentation of our article Learning Parity with Physical Noise. First, we need a bit of context. I guess most of you have heard about learning problems. They are getting more and more used in cryptography: they are computationally hard problems that can be used to build cryptographic schemes. One of their most famous applications is probably post-quantum public-key encryption. Indeed, some of the most efficient lattice-based schemes rely on learning problems such as LWE, MLWE, LWR, etc. For instance, Kyber and Saber both rely on learning problems, and both were finalists in the NIST post-quantum cryptography contest. There are also other, less famous but still useful, cryptographic applications of learning problems, such as homomorphic encryption, identity-based encryption, and many other cryptographic goodies. Among all the learning problems, we take a particular interest in the learning parity with noise (LPN) problem, which is probably the most straightforward, most minimalist learning problem. As in any learning problem, the idea is that an adversary is given public vectors and noisy inner products between these public vectors and the secret key, and it is computationally hard to retrieve the secret key. In the learning parity with noise case, the inner product is computed over F2: we have a secret key which is an n-bit vector, public n-bit vectors, and the error added to the inner product is a binary error, generated according to a Bernoulli distribution with a known error parameter. Given the public vectors and the noisy inner products, it is hard to retrieve the key. There exists a decision version of this problem, but it won't be useful to us in this article; we will only look into the search version. LPN seems minimalist and should be easy to implement. However, once trust is gained in this problem and we start to look into real implementation questions, many issues arise, most notably the error generation. Indeed, the error generation has to be cryptographically secure. Otherwise, an adversary would just be able to remove the error and solve the resulting system of equations, which would be a linear system, easy to solve. Therefore, we need a cryptographically secure PRNG, which is time-expensive, and this PRNG has also proved to be a weak link against side-channel attacks on implementations of schemes that rely on LPN. So, this is where inexact computing steps in. In order to remove this PRNG, we can add the error directly when computing the inner product. If we run a processor under given physical conditions, in our case voltage overscaling and clock manipulation, we can get it to compute its operations with a controlled error. By doing so, we end up with the same result as by computing the correct inner product and then adding a Bernoulli-generated error. The problem is that, when doing so, we rely on a physical assumption rather than a theoretical one. This is not that much of an issue, because in the end, when you implement a scheme, you always rely to some extent on the physical assumption that your implementation matches your theoretical model. But there are some other issues raised by inexact computing, and this is the first thing we'll talk about.
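Before going further, here is a minimal sketch of the LPN setting in Python with NumPy. The names (`lpn_samples`, `epsilon`) are ours, for illustration only, not from the article.

```python
import numpy as np

rng = np.random.default_rng()

def lpn_samples(key: np.ndarray, epsilon: float, m: int):
    """Return m pairs (x, <x, key> + e mod 2) with e ~ Bernoulli(epsilon)."""
    n = key.size
    X = rng.integers(0, 2, size=(m, n), dtype=np.uint8)      # public vectors
    e = (rng.random(m) < epsilon).astype(np.int64)           # Bernoulli errors
    z = (X.astype(np.int64) @ key.astype(np.int64) + e) % 2  # noisy inner products
    return X, z

n, epsilon = 512, 0.25
key = rng.integers(0, 2, size=n, dtype=np.uint8)
X, z = lpn_samples(key, epsilon, m=1024)
# With epsilon = 0 (or predictable errors) the pairs (X, z) form a plain
# linear system over F2, solvable by Gaussian elimination: this is why the
# error generation must be cryptographically secure.
```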
So, first, in order to compute a physical inner product, there are two standard architectures that come to mind: the first one is serial, the other one is parallel. Basically, they both start with a layer of AND gates, computing the ANDs between the key bits and the public vector bits. Then, you have to absorb all of the AND outputs with XOR gates. You can either do it in parallel, which means two by two, then absorbing the results, and so on; for instance, for a 32-bit physical inner product, you get 5 layers of XOR gates. Or you can absorb them serially, which means absorbing the first two, then adding the third, then the fourth, etc. So, why should we consider different ways to compute this inner product? Because, sadly, we do not manage to control the error well enough to avoid dependencies, and there are mainly output dependencies, which means that the probability of error depends on the correct value of the inner product. This is not a property we want, since the correct value of the product depends on the key, which means it creates imbalance in the results and will, to some extent, leak some information. So, we denote by epsilon 0 and epsilon 1 the probabilities of error given that the correct output of the inner product is 0 or 1, and we denote by delta the distance between these epsilons and their average. Basically, the bigger this delta is, the more output-dependent the error is, so we want to mitigate it as much as possible. If we reached a delta of 0, it would mean that there are no output dependencies, epsilon 0 equals epsilon 1, and we would be in the LPN case, which would be ideal. So, now let me introduce a few other tweaks we can apply to the architectures I just presented in order to try to mitigate this delta. First, we can use a jittery clock. Basically, it means clock manipulation so that, on top of the deterministic error, we also add jitter, which is random noise. That is a good idea, because it means the error will come both from a deterministic source, possibly carrying the dependencies, and from a purely random one. We can also try to use power gating, which means connecting our circuit to the ground in order to lower the voltage difference needed for the 0-to-1 and 1-to-0 transitions and try to balance them a little more. We can also try to use a serial architecture with bigger and more balanced gates. So, we studied all these architectures and ended up with the following simulated deltas. Basically, what we see is that the best way to mitigate delta is the jittery clock, mainly because its probabilistic source of randomness is not output-dependent, which dilutes the output dependency effect in the total probability of error. But what we sadly also see is that, even if we manage to reduce the delta compared to the standard architectures, we do not annihilate it. This means that some output dependency remains, and the security of a scheme implemented with a physical inner product cannot reduce directly to the security of LPN. This is what motivates the next section: a security reduction that proves that our physical inner product still relies on the security of LPN.
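To make these quantities concrete, here is a small Python sketch. This is entirely our own toy model, not the simulated circuits of the talk: `physical_ip` emulates an inexact inner product with output-dependent noise, and `estimate_delta` recovers epsilon 0, epsilon 1 and delta empirically from samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def physical_ip(x, key, eps0=0.20, eps1=0.30):
    """Toy stand-in for the inexact circuit: the error probability depends
    on the correct value of the inner product (the unwanted property)."""
    correct = int(x @ key) % 2
    eps = eps1 if correct else eps0
    return correct ^ int(rng.random() < eps)

def estimate_delta(key, n_samples=50_000):
    errors = {0: [], 1: []}
    for _ in range(n_samples):
        x = rng.integers(0, 2, size=key.size, dtype=np.uint8)
        correct = int(x @ key) % 2
        errors[correct].append(physical_ip(x, key) != correct)
    eps0_hat, eps1_hat = np.mean(errors[0]), np.mean(errors[1])
    delta_hat = abs(eps0_hat - eps1_hat) / 2   # distance of each epsilon to their average
    return eps0_hat, eps1_hat, delta_hat

key = rng.integers(0, 2, size=32, dtype=np.uint8)
print(estimate_delta(key))   # roughly (0.20, 0.30, 0.05)
```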
So, in order to dive into our reduction, we first need to model our physical problem. To do a mathematical reduction, we need to properly define what we call the learning parity with noise with output dependencies (LPNOD) distribution. It is still a learning problem, and it relies on the same ideas as LPN. We have an n-bit secret key k and two noise parameters, epsilon 0 and epsilon 1, which are probabilities that remain between 0 and 0.5. As in LPN, we generate uniformly random public vectors called x, compute the inner product with the secret key, and add an error. This time, though, the distribution the error follows depends on the correct value of the inner product; this is why we have two noise parameters. Note that this definition is more general than the LPN distribution: if we take epsilon 0 equal to epsilon 1, we end up with the LPN distribution. And this observation allows us to make a double reduction with just one proof, because what we did is build an algorithm that transforms LPNOD samples into noisier ones. So, what does this mean? It means that an adversary who has access to an LPNOD solver can take given LPN samples, transform them into LPNOD samples, then use its solver. So, it means that, with some assumptions on the noise parameters, LPNOD is at least as hard as LPN. This is the side of the reduction that interests us the most, because LPN is a problem whose security is well studied, so it gives confidence in our problem. But this algorithm also allows us to see the reduction the other way around, because an adversary who has an LPN solver can this time take LPNOD samples, transform them into LPN samples, and break those using its solver. So, LPN is also at least as hard as LPNOD. This tells us that both problems seem to have a similar security, at least for the search version of these problems. So, what are the ideas hidden in this algorithm? Basically, the algorithm takes a distribution as input, together with the parameters of this distribution and some parameters that will be used inside the algorithm, and it outputs a sample that follows another LPNOD distribution with noise parameters epsilon 0 prime and epsilon 1 prime. First, it checks whether the transformation it is trying to do is possible. For that, we have conditions that are not detailed here because they are quite complex and not that useful; but basically, what they say is that we cannot transform any batch of samples into any other one, which is pretty logical. I mean, if we were able to transform some LPN samples into noiseless ones, it would be a pretty powerful algorithm. Once we know the algorithm will be successful in this transformation, we first compute the last bit of the output sample. We know that this sample will follow an LPNOD distribution with parameters epsilon 0 prime and epsilon 1 prime, so we know what the distribution of its last bit will be, and we can simulate it independently of the public vector that will be put before it. Once we have it, we use rejection sampling to generate the public vector. Basically, we use our input sampler, which just creates samples from the input distribution, add some noise with the Bernoulli parameters the algorithm took as inputs, and check whether the last bit corresponds to the one we generated earlier. If it does, we can just output the sample; if it does not, we keep going until it does. When you write down the distribution that the output of this algorithm follows, you can show that it is the right one. So, we have an algorithm that takes an LPNOD distribution as input and outputs samples from another one.
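To fix ideas, here is a hedged Python sketch of this transformation. The paper's admissibility conditions and exact flip probabilities are omitted: `p_flip0` and `p_flip1` below are placeholders for the latter, and `sample_input` stands for any sampler of the input LPNOD distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

def transform(sample_input, eps0_p, eps1_p, p_flip0, p_flip1):
    """Sketch: turn samples of an input LPNOD distribution into one sample
    of a noisier LPNOD(eps0', eps1') distribution."""
    # 1. Simulate the last bit of the output sample first. For uniform x and
    #    a nonzero key the correct inner product is uniform, so its noisy
    #    version equals 1 with probability:
    pr_z1 = 0.5 * (1.0 - eps1_p) + 0.5 * eps0_p
    z_target = int(rng.random() < pr_z1)
    # 2. Rejection sampling: draw input samples, add extra Bernoulli noise,
    #    and keep the first one whose noisy bit matches the simulated one.
    while True:
        x, z = sample_input()            # (vector, bit) from input distribution
        p = p_flip1 if z else p_flip0    # placeholder flip probabilities
        z_noisy = z ^ int(rng.random() < p)
        if z_noisy == z_target:
            return x, z_noisy
```

In the direction of the reduction that interests us, an adversary would feed LPN samples (epsilon 0 equal to epsilon 1) to such a transformation before calling its LPNOD solver.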
So, this allowed us to have a concrete security estimation of LPNOD, because what this algorithm allows is to take some LPN samples with noise parameter epsilon and transform them into LPNOD samples with noise parameters epsilon prime plus delta and epsilon prime minus delta (or the other way around; the formula is the same either way), with epsilon, epsilon prime and delta related by the formula you see on the screen. So, it means that, with these parameters, LPNOD is at least as hard as LPN, and when we have a scheme that relies on LPNOD, we can have a security equivalent to LPN's. In this part of the presentation, we introduce our FPGA prototype implementing a physical inner product. We have designed a fully digital 512-bit LPPN processor targeting a Xilinx Spartan-6 LX75 FPGA mounted on a SAKURA-G board. No special-purpose macros such as PLLs or DCMs are required, and only the fabric elements of the FPGA have been used. Our LPPN processor is composed of an inner product block, a variable delay line, a voltage sensor for fault detection, and an error control mechanism that also acts as a finite state machine. The inner product block is composed of a parallel XOR tree and a serial XOR. The inexact computing is achieved by capturing metastable states and glitch events at the output of the inner product block to generate errors. This allows us to remove the need for a random number generator. We have designed a compact, fully digital delay line composed of 16 CARRY4 units. CARRY4 units are fast carry chains in SLICEM and SLICEL slices, normally used to implement arithmetic functions. Our delay line offers 64 taps, providing about 30 picoseconds per tap. A very compact 64-to-1 multiplexer has been implemented with LUTs and in-slice F7 and F8 MUXes in order to choose the proper tap to generate the given probability of error. The error control module is implemented as a successive approximation register (SAR) to calibrate the LPPN at startup. Seven batches of 1024 inner products are computed; at the end of each batch, the probability of error is computed and the control word (CNTL) of the variable delay line is updated from the most significant bit to the least significant bit. After the calibration, the error probability is around 0.25. We have collected a huge number of LPPN samples from our prototype. The delta has been found equal to 8.2%, indicating that this prototype suffers from a non-negligible output dependency. To mitigate this dependency, we complemented the basic design in order to provide data-independent glitches. More specifically, we introduced these data-independent glitches at the input and the output of the XOR tree, as shown on the slide. Adopting this solution, the normalized difference is reduced to 5.8%. The LPNOD, as defined before, is at least as hard as the LPN with parameters n and epsilon, with the same secret key, where epsilon equals (epsilon prime minus delta) divided by (1 minus 2 times delta). Concretely, for the delta of 8.2% around an error probability of 0.25 reached in the FPGA prototype, we have an LPNOD with those parameters, whose security reduces to an LPN with parameter n equal to 512 and epsilon equal to 0.20. Using the best known attacks against LPN, this gives us at least 80 bits of security for a scheme relying on our implementation.
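As a quick sanity check of these numbers, here is a short computation (our own script, using the reduction formula as stated above):

```python
# LPNOD with noise parameters eps' +/- delta is at least as hard as
# LPN(n, eps) with eps = (eps' - delta) / (1 - 2 * delta).
eps_prime, delta = 0.25, 0.082                   # calibrated error rate, measured delta
eps = (eps_prime - delta) / (1 - 2 * delta)
print(f"reduces to LPN(n=512, eps={eps:.2f})")   # eps ~ 0.20 -> at least 80-bit security
```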
We now move to other results. Masking helps against differential power analysis and structures the error, but it also mitigates delta. Indeed, an adversary does not have access to Z, which is the physical inner product computed on one of the shares of the key, but only to its leakage L. This gives a probability of observing Z equal to i, given the leakage L, equal to 1/2 plus a parameter delta lying in the range [0, 1/2]. The only imbalance the adversary can observe therefore gives us an epsilon 0 prime and an epsilon 1 prime defined as shown on the slide. If delta is 1/2, then the recovery of Z is perfect, so we have that epsilon 0 prime equals epsilon 0 and epsilon 1 prime equals epsilon 1. If delta is equal to 0, then there is no leakage, so we have that epsilon 0 prime equals epsilon 1 prime, which in turn equals epsilon. The latter shows that, if an implementation is masked and the adversary thus cannot exploit the leakage, then exploiting the output data dependency with a filtering attack is not possible. We collected 4 million power consumption traces to estimate the signal-to-noise ratio (SNR). The highest SNR values have been found on the first stage, that is, the AND layer, where the data dependency in the circuit is still quite high. Yet these SNR values are in the order of 10^-5, which means that Boolean masking can be considered an interesting candidate countermeasure, one that can efficiently leverage the key-homomorphic features of the LPPN. We ported the basic LPPN design to a more recent platform, the Xilinx Artix-7 100T, a 28 nm technology FPGA mounted on a NewAE CW305 board. For this implementation, we have found a delta of 1.5%, which is strongly reduced compared to the Spartan-6 design. Such a reduction confirms the technological dependency of this aspect. To conclude, we now summarize the pros and cons of inexact computing applied to LPN and its FPGA prototype. As pros, we can count good performance and good features for SCA protection. As cons, we have to consider that the LPPN relies on physical assumptions and induces data-dependent errors. The security reduction handles the data-dependent errors, which are themselves further reduced by smart hardware implementation. As a next step, we will investigate other learning problems, such as learning with errors, in order to build even more advanced primitives. Thanks for your attention.