Hey there, and welcome to this talk on chosen ciphertext k-trace attacks on masked CCA2-secure Kyber. I'm Silvan Streit from Fraunhofer AISEC in Munich, Germany, and today I'll present to you work which was conducted together with Mike Hamburg from Rambus Labs, Julius Hermelink from Universität der Bundeswehr München, Robert Primas from Graz University of Technology, Simona Samardjiska from Radboud University, Thomas Schamberger from TU Munich, myself and Emanuele Strieder from Fraunhofer AISEC, and Christine van Vredendaal from NXP Semiconductors.

In this talk, I'll first go through some background, then present to you our k-trace attacks, followed by our results, and close with a short conclusion.

First, lattice cryptography. A lattice is defined by a basis, given here as A, consisting of the basis vectors. In such a lattice we can define the learning with errors problem, also called LWE. Here it is given as the secret vector s, which is distorted by a small error vector e; this results in a new vector t, highlighted here in orange. The computation of t with the knowledge of s and e is straightforward; however, going back to s without the knowledge of s or e, with only the knowledge of t, is rather tedious.

This problem is used in a number of schemes; just to highlight three here: Frodo uses plain LWE over the integers modulo q, with a dimension n of over a thousand. NewHope uses ring LWE; the ring here is a polynomial ring, which allows for faster computation, with polynomials up to degree 1024, reduced with a reduction polynomial. Kyber adds another level on top of these polynomial rings: a generalized vector space, also called a module. This module is of rank k, where k is two, three, or four depending on the security level: k is two for Kyber-512, k is three for Kyber-768, and k is four for Kyber-1024. This generalized vector space still relies on the polynomial
ring underneath, with the polynomials being of degree up to 256.

On top of this, CRYSTALS-Kyber defines a key encapsulation mechanism. It starts with Alice generating a uniform basis A and two secret vectors sampled from a small binomial distribution; t is computed from these, as I showed you on the slides before, and published as the public key together with the basis A. Bob then generates a secret message m, which they will use in their future communication, and performs again an LWE computation for a newly generated r with two error vectors, and further embeds the message m into v. u and v are then sent back as the ciphertext, which Alice can use together with her secret s to recover the message m. You can see that this works down here, as the major component cancels and only m remains, with some small error terms which can be neglected.

This of course leaves out a lot of detail; I want to mention just one detail, the Fujisaki-Okamoto transform, which is important to avoid chosen ciphertext attacks. Here Alice performs the encryption again, as the encryption is deterministic given the message and the public key, to assert that the ciphertext was honestly generated.

Our attack is a side-channel attack, so we assume an attacker that is powerful enough to record a side channel. It doesn't need to be such a powerful attacker as to open the chip and use electromagnetic measurements; we simply assume an attacker who can take a simple power measurement, recording the power consumption of the device while it is computing. With this, we choose as our attack step the decryption step before the verification of the ciphertext. So we're attacking this multiplication here, and since the ciphertext is only verified after this decryption step is performed,
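To see why the major component cancels in decryption, here is a toy sketch (entirely my own, not from the paper: plain integer vectors instead of Kyber's module polynomials, toy dimension 8, and `half_q * m` as a stand-in for Kyber's message encoding):

```python
import numpy as np

rng = np.random.default_rng(1)
q, n = 3329, 8                     # toy size; Kyber uses polynomials with n = 256
half_q = (q + 1) // 2              # encoding of message bit m = 1

# Alice's key pair: t = A s + e (the LWE relation from the slides)
A = rng.integers(0, q, size=(n, n))
s = rng.integers(-2, 3, size=n)
e = rng.integers(-2, 3, size=n)
t = (A @ s + e) % q

# Bob encrypts one message bit m with fresh randomness r, e1, e2
m = 1
r = rng.integers(-2, 3, size=n)
e1 = rng.integers(-2, 3, size=n)
e2 = int(rng.integers(-2, 3))
u = (A.T @ r + e1) % q
v = (t @ r + e2 + half_q * m) % q

# Alice decrypts: v - s.u = e.r + e2 - s.e1 + half_q*m (mod q);
# the A-dependent term cancels, leaving the encoded bit plus small noise.
noisy = int(v - s @ u) % q
m_rec = 1 if q // 4 <= noisy < 3 * q // 4 else 0
assert m_rec == m
```

Rounding `noisy` back to the nearest multiple of q/2 is what removes the small error terms that the slide says can be neglected.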
we are still able to perform a chosen ciphertext attack.

Let's have a closer look at this decryption step. Here we have the multiplication of the secret vector s and the ciphertext u. Direct computation of this would be rather slow, so there is already a performance optimization built into Kyber: it uses the NTT, the number theoretic transform, to allow usage of the convolution theorem. The NTT is similar to a fast Fourier transform, so we are essentially in a Fourier domain, or NTT domain in this case, which allows us to use a pointwise product and improve the complexity from n² to n log n.

Note also that Kyber uses ciphertext compression: the ciphertext is not transmitted directly as u, but rather as c1, which is a compressed representation of the ciphertext. So we as attackers do not have control over the lowest bits of u.

Our point of attack is the inverse NTT after the pointwise product of the secret key ŝ and our ciphertext û. Let's have a closer look at this inverse NTT. This should look familiar if you've seen the fast Fourier transform before: we have the NTT-domain coefficients on the left side and the regular-domain coefficients on the right side, and in between, the NTT is performed in a butterfly structure, always a pairwise addition and subtraction of coefficients together with a multiplication by a twiddle factor. Here we have the butterfly operations between nearest neighbors, here between neighbors at a distance of two, and here between neighbors at a distance of four. The twiddle factors here are powers of the n-th primitive root of unity in the integer field modulo q. Normally, for this polynomial ring, you would use the 2n-th primitive root of unity; however, Kyber's integer ring doesn't have a 2n-th primitive root of unity, so the NTT splits into two half NTTs of 128 coefficients each. This is a detail that's important for the attack: the odd and even coefficients never mix, so we
can attack and consider them separately. However, they do mix in the pointwise multiplication.

For the side-channel analysis, we assume a similar attacker as was used in previous works. We record the leakage at different points in between the layers, so we assume leakage during the load and store operations between the layers in the PQClean implementation on the ARM Cortex-M4. In previous work, this was done by template matching of the Hamming weight leakage of the 16-bit signed integers.

To highlight the prior work, especially the two prior works that focus on the NTT, by Robert Primas, Peter Pessl, and Stefan Mangard: the first, published at CHES 2017 as a single-trace side-channel attack on masked lattice-based encryption, used a belief propagation network to exploit this leakage at different points throughout the NTT. They represented the butterfly operation within the NTT as a simple factor graph of additions and subtractions within a belief propagation, and were thus able to combine the leakage information from the different positions. In their second paper, presented at Latincrypt 2019, they merged those factors into a single butterfly factor for each butterfly and further improved performance. These papers already showed a practical attack on a Cortex-M4; however, the main limitation was that the noise tolerance with masking was rather low.

This brings me to the contributions of this paper. We present a novel sparse chosen ciphertext attack strategy with a higher noise tolerance. This means we first generate a ciphertext which is sparse in the NTT domain after decompression. Further, we are able to recover the product of ŝ and û, which I'll call ŵ in the following, from simple side-channel leakage of the inverse NTT. And further, we present an attack strategy for how to recover the
long-term secret s with one to k traces from this partial knowledge of ŵ and the sparse support of û. Our attack is further applicable to masked implementations of Kyber as well. We verified our attack via an implementation, which we publish open source: our belief propagation is written in Rust, speed-optimized and multi-threaded, and we have a simple Python simulation of the leakages using PQClean for the further analysis.

First, the sparseness. The belief propagation for the inverse NTT improves drastically by employing sparseness. Sparseness means, for example, that we set every second coefficient here to zero. This allows us, for example within these green blocks, to have all the leakage values depend on a single value only: these three leakage points here all depend on w0 and on no other unknown value. So we can combine them straightforwardly through our belief propagation, and throughout the rest of the NTT as well.

So the first challenge is to generate those sparse vectors. One straightforward approach is to use the structure of the NTT here as well: we want something sparse in the NTT domain, and we want something compressible on the left side.
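To make the earlier point about combining the leakage on w0 concrete, here is a minimal stand-in for a single belief-propagation variable node (entirely my own sketch: the three observation values and sigma are made up, and I assume independent Gaussian noise on a Hamming-weight leakage model):

```python
# Hypothetical noisy Hamming-weight observations of the SAME 16-bit
# intermediate w0 at three load/store points (values are invented):
leaks = [6.8, 7.3, 7.1]
sigma = 0.5

def log_lik(h):
    # independent Gaussian observations -> log-likelihoods simply add;
    # the more points depend only on w0, the sharper the posterior
    return sum(-(l - h) ** 2 / (2 * sigma ** 2) for l in leaks)

# pick the most likely Hamming weight class (0..16 for 16-bit values)
best = max(range(17), key=log_lik)  # -> 7
```

Real belief propagation passes such likelihood messages through the entire inverse-NTT factor graph rather than treating one node in isolation, but the benefit of sparseness is the same: more observations constrain the same unknown.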
This is a requirement we need to fulfill in order to be able to send this to our victim, as Kyber uses the decompression step and we do not have control over the lowest bits. For this we can, for example, in an intermediate layer here after layer one, set all coefficients to zero except the first. From this we already know that the sparseness is fulfilled, as only the top coefficients can be non-zero and all the bottom ones will be zero, since the top and bottom halves never mix. After the first layer of the NTT, on the left side, there are also only two coefficients which are non-zero, as they are the only ones that depend on this intermediate value. So all we have to do is try all the different intermediate values at this point after the first layer to find something that is compressible on the left side. This is possible down to a quarter of the coefficients for regular Kyber and even an eighth of the coefficients for Kyber-1024; however, the sparseness comes only in contiguous blocks.

To improve on this, our first approach was to rearrange the order of the layers in the NTT, which looked quite promising, as it shifts the sparseness and spreads it throughout our NTT representation. However, this does not change the values.
It just permutes the indices, and so doesn't give us a valid NTT output if we simply write the coefficients in this order. So this approach failed, and we had to resort to a different solution: using BKZ as a solver for a shortest vector problem.

For this, we can look again at the compression, which reduces the coefficients to d bits: for Kyber-512 and -768 to 10 bits, and for Kyber-1024 to 11 bits, with q being around 12 bits. So we have a multiplication by 2^d and a division by q in a rounding operation. Our requirement is to find something that is sparse in the NTT domain and at the same time compressible, so multiplied by 2^d it should be close to zero modulo q. We can write this again as a shortest vector problem modulo q, with a sparse support in the NTT domain and this multiplication by 2^d. We were able to find such solutions using BKZ with a block size of 70 or 80, depending on the scenario. With a block size of 70, for all Kyber variants, we were able to reduce the number of non-zeros down to 64 out of 256. For Kyber-1024 we could even use BKZ-80 to reduce the number of non-zeros down to 32, as here the compression is a little more relaxed. This takes some time, only a few minutes, or a few hours for the 32 non-zero coefficients; however, this can all be performed as a precomputation, as it is independent of the secret key.

Now that we have the sparseness sorted out, we have to recover the original s again from this sparse knowledge of our ŝ coefficients.
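For reference, the compression round trip discussed above follows the Compress/Decompress formulas from the Kyber specification; a small sketch with d = 10, the u-compression width for Kyber-512 and -768 (the loop bound is the specification's error bound, not my own result):

```python
Q = 3329  # Kyber's modulus

def compress(x: int, d: int) -> int:
    # Compress_q(x, d) = round(2^d / q * x) mod 2^d
    return ((x * (1 << d) + Q // 2) // Q) % (1 << d)

def decompress(y: int, d: int) -> int:
    # Decompress_q(y, d) = round(q / 2^d * y)
    return (y * Q + (1 << (d - 1))) >> d

# The round-trip error is at most round(q / 2^(d+1)) -- these are the
# low bits of u the attacker cannot control.  A chosen u survives the
# round trip only when 2^d * u mod q is small, which is exactly the
# "close to zero modulo q" target handed to BKZ.
d = 10
bound = 2  # round(3329 / 2048)
for x in range(Q):
    x2 = decompress(compress(x, d), d)
    assert min((x2 - x) % Q, (x - x2) % Q) <= bound
```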
We can use the structure of the NTT. For example, if we know all the coefficients in the top half and have no knowledge of the bottom half, we can still recover them by simply computing back through the last few layers of the NTT, as the top and bottom halves again never mix, and then doing a small brute force per coefficient, which depends on only two input variables. These are furthermore sampled from a small binomial distribution, here ranging between minus two and two: five different values, and to the power of two, that is only 25 different value pairs which we have to brute-force for each coefficient.

For the distributed sparse case, we cannot use the structure of the NTT again; we have to fall back to BKZ. But again, we were able to solve this with BKZ in most cases. This requires again some computational time, and we only used a Sage implementation here, but we showed that it is possible, and this is again an offline part of the attack, which can be performed after the traces were recorded.

Last but not least, how does masking affect our attack?
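The 25-pair brute force from a moment ago can be sketched like this (my own toy, not the paper's code: `omega = 17` is an arbitrary stand-in for the real twiddle factor, and I model just one isolated butterfly):

```python
Q = 3329  # Kyber's modulus

def butterfly(a, b, omega):
    # one butterfly pair: sum and twiddled difference, mod q
    return (a + omega * b) % Q, (a - omega * b) % Q

# Suppose back-computing the last layers gave us this butterfly's two
# outputs; its inputs are binomially distributed coefficients in [-2, 2]:
omega = 17                       # hypothetical twiddle factor
true_a, true_b = -1, 2
out = butterfly(true_a % Q, true_b % Q, omega)

candidates = [(a, b)
              for a in range((-2), 3) for b in range(-2, 3)
              if butterfly(a % Q, b % Q, omega) == out]
# only 5 * 5 = 25 pairs to try; here exactly one survives
```

Since q is prime and the butterfly is invertible, the matching pair is unique, so the per-coefficient search really is just 25 cheap checks.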
We could not find a masked implementation to verify this with; however, the usual masking schemes mask the secret: you have the secret split into two shares and then perform the computation on each of the two shares individually. Our ciphertext is publicly known, so there is normally no need to mask the ciphertext, and thus our sparse vector is multiplied with each of the two shares individually. We can attack both shares independently of one another and combine them within the same trace. So our attack is barely affected by masking, as we can recover the same coefficients of ŝ within the same trace and then combine the shares afterwards, once we know the unmasked coefficients of ŝ.

Now to our results. First, we have them for the contiguous sparseness case, which is easy to generate and easy to solve afterwards. On the y-axis we have the success rate; on the x-axis we have sigma, the standard deviation of the noise for the 16-bit Hamming weight leakage. On the top we have the masked case, and on the bottom the unmasked case. The different lines are for the different numbers of non-zero coefficients.
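A small sketch of this share argument (my own illustration, assuming first-order arithmetic masking mod q and a toy length of 8):

```python
import numpy as np

rng = np.random.default_rng(2)
Q, N = 3329, 8                       # toy length; Kyber uses 256

s_hat = rng.integers(0, Q, size=N)   # secret in the NTT domain
s1 = rng.integers(0, Q, size=N)      # arithmetic shares: s_hat = s1 + s2 mod q
s2 = (s_hat - s1) % Q

u_hat = np.zeros(N, dtype=np.int64)
u_hat[::2] = rng.integers(1, Q, size=N // 2)   # sparse chosen ciphertext

w1 = (u_hat * s1) % Q                # pointwise product with share 1
w2 = (u_hat * s2) % Q                # pointwise product with share 2

# The shares recombine to the unmasked product ...
assert ((w1 + w2) % Q == (u_hat * s_hat) % Q).all()
# ... and the zeros of u_hat survive in BOTH shares, so each share's
# inverse NTT can be attacked with the same sparse strategy.
assert (w1[1::2] == 0).all() and (w2[1::2] == 0).all()
```

This linearity is why masking only doubles the number of belief propagation runs instead of defeating the attack.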
First, we have the blue line for the non-sparse case, where 256 out of 256 coefficients are non-zero, and, for example, the gray line is the case of 32 non-zero coefficients, which is only applicable to Kyber-1024. The purple line is the sparsest case for the other two Kyber variants, which already has a non-zero success rate up to a sigma of 1.2. Just to note, these values are all averaged over 25 experiment runs, with the 95% confidence interval given by the shaded area.

By further distributing the sparseness, we are able to shift those curves further to the right and improve the noise tolerance, so in the masked case we can reach a very high success rate for a sigma of 2.2 for Kyber-1024. Note that there is barely any difference between masked and unmasked, as masking of the secret barely changes our attack at all: we just have to run twice the number of belief propagation graphs, where each of them needs to converge independently.

To summarize our results in a final table: in the first column we have the sparseness, so first the sparsest case, which is only applicable to Kyber-1024, and at the bottom the non-sparse case. Our main focus was the k-trace attack here, which means for Kyber-512 we need two traces, for Kyber-768 three traces, and for Kyber-1024 four traces. Why do we need this number of traces? With each trace, we have 64 non-zero coefficients.
This means we can recover 64 coefficients of ŝ per trace. Since we need a quarter of the coefficients to recover s within each vector component, we need as many traces as we have vector components; that is why this number equals k, the number of vector components in Kyber. With the k-trace attack we can recover the key with a high success rate of over 70% at a maximum noise tolerance of 1.2, and if you assume, as you normally can, that a failed measurement can simply be repeated, you can increase the noise tolerance up to 1.4. For Kyber-1024 we can go further with the even sparser case and thus increase the noise tolerance up to a sigma of 2.7 in the 16-bit Hamming weight leakage.

For a comparison with previous work: first, in the unmasked setting, Pessl and Primas in their 2019 work were able to perform a real attack with a sigma of 1.3, so this means our k-trace attack is viable in a real setting. And compared to the masked simulations before, our sigma clearly exceeds the previous work, which only reached a sigma of 0.5. A note on the numbers given in brackets for Kyber-512: BKZ solving for Kyber-512 is rather tedious due to the larger binomial distribution and takes a lot of computational time, over a few days, if you want all of your cases to be solved. It is easy to simply increase the number of traces by one and then be able to solve it with BKZ-40.

Okay, before we come to a final conclusion, let's briefly discuss further applications of the attack to different schemes. NewHope, for example, already uses an NTT, so there the application of the attack would be rather straightforward. For other implementations of different algorithms it depends on the implementation: if it uses an NTT, our attack would be applicable there as well, and for the other cases, when using Karatsuba or Toom-Cook, it
might be interesting to further investigate whether special sub-blocks exist within Karatsuba or Toom-Cook that would also allow belief propagation to be used there.

Further, we focused our attack on the PQClean implementation, as this was the implementation used in previous work, for comparison. It already includes lazy reductions like Barrett and Montgomery reduction, which represent a significant improvement in performance. However, the current PQClean implementation further includes merging of layers and butterflies, and 32-bit loads. This makes the templating and profiling phase more difficult, which we skipped in our paper, as we relied on previous works here. But once you have this sorted out, for example by templating the 16-bit multiplications, which of course would be more difficult, the belief propagation network would look similar to ours.

Possible countermeasures include, for example, extra masking of the input, as I discussed previously: normally, masking focuses on secrets and not on publicly known variables, but if you also mask the input, for example this û here, you complicate our attack, as we cannot assume the sparseness within the inverse NTT anymore. Further, of course, classical shuffling and hiding mechanisms within the inverse NTT would hamper our attack and make the templating rather difficult again.

To conclude my talk on chosen ciphertext k-trace attacks on masked CCA2-secure Kyber: I presented to you a novel sparse chosen ciphertext attack strategy with a higher noise tolerance than before. It works with a sigma of up to 1.4, or for Kyber-1024 up to a sigma of 2.7, whereas prior work in a similar setting only succeeded up to a sigma of 0.5. Our attack strategy is further applicable to recovering the secret s from one to k traces, with the noise tolerance increasing with the number of traces. Also, our attack is applicable to masked implementations of Kyber. And if you want to verify our results for
yourself, you can check out our implementation on GitHub, which we publish open source with this paper.

This brings me to the end of my talk. If you have any questions, you're more than welcome to contact me or the other authors. I want to thank you so much for watching, and I hope you have a great day.