Hello everyone, I'm Prasanna Ravi from Nanyang Technological University, Singapore. This work was jointly done with Sujoy Sinha Roy from the University of Birmingham, and Anupam Chattopadhyay and Shivam Bhasin from NTU Singapore. Caching of encryption keys has been used as a standard technique for efficiency reasons in several protocols such as TLS and IKE, mainly because it reduces the overhead of generating new keys for every encryption. The security of several public-key schemes in such a scenario, where static keys are reused, has been studied for a very long time, dating back to Bleichenbacher's attack on RSA reported back in 1998. These attacks typically work by querying the decryption device with malicious ciphertexts: the attacker makes several guesses for the decrypted message, computes the corresponding shared keys, and checks whether the shared keys actually match over several ciphertext queries. By accumulating such information over several invalid ciphertexts, he is able to recover the secret key. Lattice-based public-key encryption schemes are also known to be susceptible to such chosen-ciphertext attacks when static keys are reused. There are two ways to protect against such attacks. The first, straightforward technique is to avoid key caching entirely, but this has a considerable impact on performance since fresh ephemeral keys must be generated every time. The second, more concrete technique is to adopt chosen-ciphertext security, which concretely removes the presence of a key-mismatch oracle. The Fujisaki-Okamoto (FO) transform is one such technique that has been adopted by several lattice-based schemes to provide chosen-ciphertext security. It mainly works by performing a re-encryption procedure after decryption, which checks the validity of the received ciphertext.
If the ciphertext is invalid, the FO transform detects this and rejects it outright. So for an invalid ciphertext, the attacker gets no information about the decrypted message, as the FO transform either aborts the procedure or simply generates a pseudorandom key. This assumption ideally holds in a black-box setting, but when side channels come into the picture, one can ask whether it is possible to utilize side-channel information to learn something about the decrypted message. In fact, D'Anvers et al. demonstrated such an attack: they utilized timing information from variable-time implementations of the error-correcting codes used in two post-quantum schemes, LAC and RAMSTAKE, to recover information about the decrypted message, and were subsequently able to perform full key recovery. This work raised several questions about the existence of other side-channel information that could potentially be used to mount such attacks. In this work, we generalize such approaches and apply the attacks to six lattice-based schemes, all of which were round-two candidates in the NIST standardization process for post-quantum cryptography. We identify two types of EM side-channel vulnerabilities that can be used to efficiently instantiate an oracle, which can subsequently be used to perform full key recovery. We perform all our attacks on implementations for the ARM Cortex-M4 microcontroller taken from the pqm4 library, a publicly available library of optimized implementations for that platform. All our attacks could be performed in a matter of minutes, using only a few thousand traces. Firstly, I'll start with some background on lattice-based public-key encryption schemes.
All the targeted schemes in this work are based on the well-known Learning With Errors (LWE) or Learning With Rounding (LWR) problem; six of the 26 second-round schemes in the NIST competition were based on different variants of the LWE or LWR problem. Frodo is based on the standard LWE problem, which involves computations over matrices and vectors. NewHope, LAC and Round5 are three schemes based on the Ring-LWE or Ring-LWR problem, which involves computations over polynomials in polynomial rings. Kyber and Saber, which are also third-round candidates, are based on the Module-LWE or Module-LWR problem and involve computations over matrices and vectors of polynomials. All the aforementioned schemes are secure in the chosen-ciphertext model; however, at their core they contain a chosen-plaintext-secure encryption scheme based on the well-known LPR encryption paradigm, named after its authors Lyubashevsky, Peikert and Regev. I'll briefly explain the LPR encryption scheme using the Ring-LWE problem. Let us assume that all operations are performed in the polynomial ring R_q = Z_q[x]/(x^n + 1), where all polynomials have degree at most n-1 and all coefficients lie in the integer ring Z_q. In the key-generation procedure, we simply generate an LWE instance: we have a component A, a public constant generated from a public seed, and we sample two small polynomials S and E, where S is the secret polynomial and E is the error polynomial; both are sampled from a very narrow distribution. We then generate the component T = A*S + E, which is the LWE instance; the pair (A, T) forms the public key, while the secret polynomial S forms the secret key. For encryption, we sample three ephemeral polynomials S', E' and E'' from a very narrow distribution.
We generate a component U = A*S' + E', which is an LWE instance. The message M to be encrypted is a bit vector, and since all computations are performed over polynomials, we need to convert this bit vector into a polynomial. Commonly, this is done by directly mapping each bit to its corresponding coefficient: if a bit is one, it is mapped to q/2, the center of the integer ring Z_q, and if a bit is zero, it is encoded as zero. We then generate another LWE instance, T*S' + E'', and add the encoded message polynomial to it in order to hide the message within the LWE instance. The components U and V together form the ciphertext. In the decryption procedure, we compute the product U*S and subtract it from the ciphertext component V, which yields the message polynomial with some added noise. This noisy message polynomial is decoded back to the message bit vector using the polynomial decoding function. This scheme is secure in the chosen-plaintext model. Since these schemes are based on the LWE or LWR problem, they are associated with a certain decryption failure rate; however, a key criterion for CCA security is a negligible decryption failure rate, and there are two ways to satisfy this criterion. Some schemes design their parameters for a negligible failure rate, which is the straightforward way. Other schemes, such as Round5 and LAC, utilize error-correcting codes to artificially reduce the decryption failure rate.
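To make the LPR scheme above concrete, here is a minimal sketch in Python with toy parameters of my own choosing (a Kyber-like modulus and a tiny ring dimension, which are assumptions for illustration only, with ternary secrets and errors); it is not any scheme's actual parameter set.

```python
import random

# Toy LPR-style scheme over R_q = Z_q[x]/(x^N + 1); parameters are illustrative.
Q, N = 3329, 8

def polymul(a, b):
    # schoolbook multiplication in Z_q[x]/(x^N + 1): x^N wraps to -1
    r = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                r[i + j] = (r[i + j] + a[i] * b[j]) % Q
            else:
                r[i + j - N] = (r[i + j - N] - a[i] * b[j]) % Q
    return r

def add(a, b): return [(x + y) % Q for x, y in zip(a, b)]
def sub(a, b): return [(x - y) % Q for x, y in zip(a, b)]

def small(rng):
    # sample a polynomial from a narrow (ternary) distribution
    return [rng.choice([-1, 0, 1]) % Q for _ in range(N)]

def keygen(rng):
    A = [rng.randrange(Q) for _ in range(N)]   # public constant from a seed
    s, e = small(rng), small(rng)
    t = add(polymul(A, s), e)                  # LWE instance T = A*S + E
    return (A, t), s

def encrypt(pk, bits, rng):
    A, t = pk
    s1, e1, e2 = small(rng), small(rng), small(rng)
    u = add(polymul(A, s1), e1)                # U = A*S' + E'
    enc = [(Q // 2) * b for b in bits]         # encode: bit 1 -> q/2, bit 0 -> 0
    v = add(add(polymul(t, s1), e2), enc)      # V = T*S' + E'' + encode(M)
    return u, v

def decrypt(sk, ct):
    u, v = ct
    noisy = sub(v, polymul(u, sk))             # encode(M) plus small noise
    return [1 if Q // 4 < c < 3 * Q // 4 else 0 for c in noisy]
```

With these small parameters the noise term E*S' + E'' - E'*S is always far below q/4, so decryption never fails; real parameter sets are chosen so that failures are merely (negligibly) rare.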
What they do is take the message vector M, encode it using an error-correcting code to obtain a codeword, and use that codeword within the encryption procedure; in the decryption procedure, we decrypt to the codeword and then apply a decoding procedure to recover the message from it. Turning to chosen-ciphertext security: as stated earlier, the FO transform is the technique that has been widely used in lattice-based schemes. It converts a chosen-plaintext-secure public-key encryption scheme into a chosen-ciphertext-secure key encapsulation mechanism, forming a wrapper around the encryption and decryption procedures using several instantiations of one-way functions, and works by checking the validity of the ciphertext through re-encryption after decryption. In the encapsulation procedure, we have a message m generated uniformly at random; we hash the message with the public key to generate the randomness r, encode the message to a codeword if required, and then perform the encryption. It is important to note that the randomness r is used to deterministically generate the ephemeral secrets s', e' and e'' in the encryption procedure. The shared key k is then simply obtained by hashing r with the ciphertext. In the decapsulation procedure, we first decrypt the ciphertext to get back the codeword c' and decode it to the message m'. We then repeat the same steps performed in encapsulation: we hash the message with the public key to get r', perform the re-encryption, and compare the generated ciphertext with the received ciphertext. If they are the same, we know the ciphertext is valid and the shared key is generated; otherwise, a pseudorandom key is generated.
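The encapsulate/decapsulate flow above can be sketched as follows. To keep the sketch self-contained I substitute a deliberately insecure placeholder PKE (a hash-pad XOR that needs no real secret key, purely my own stand-in for the lattice-based CPA scheme) and SHA3-256 for the hash functions; real schemes fix their own primitives. What the sketch does show faithfully is the FO structure: coins derived from the message, re-encryption for the validity check, and a pseudorandom key on rejection.

```python
import hashlib, os

def H(*parts):
    # hash helper (SHA3-256 here; each scheme fixes its own functions)
    h = hashlib.sha3_256()
    for p in parts:
        h.update(p)
    return h.digest()

# Placeholder deterministic PKE: ciphertext = (m XOR H(pk, coins)) || coins.
# NOT secure -- it only stands in for the lattice-based CPA-secure scheme.
def pke_enc(pk, m, coins):
    pad = H(b"enc", pk, coins)
    return bytes(a ^ b for a, b in zip(m, pad)) + coins

def pke_dec(pk, ct):
    body, coins = ct[:32], ct[32:]
    pad = H(b"enc", pk, coins)
    return bytes(a ^ b for a, b in zip(body, pad))

def encaps(pk):
    m = os.urandom(32)
    r = H(b"G", m, pk)            # coins from message and public key
    ct = pke_enc(pk, m, r)        # deterministic encryption under coins r
    k = H(b"K", r, ct)            # shared key binds coins and ciphertext
    return ct, k

def decaps(pk, z, ct):
    m2 = pke_dec(pk, ct)
    r2 = H(b"G", m2, pk)
    if pke_enc(pk, m2, r2) == ct:  # re-encrypt and compare
        return H(b"K", r2, ct)
    return H(b"K", z, ct)          # implicit rejection: pseudorandom key
```

The rejection branch keyed by the secret value z is the "implicit rejection" variant; the alternative is to return an explicit failure symbol.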
One can easily see that for an invalid ciphertext we only generate a pseudorandom key, and hence an attacker cannot gain any information about the decrypted message from any invalid ciphertext. This is the overall block diagram of the CCA-secure decapsulation, and in order to attack it we need to answer two key questions. Firstly, how do we utilize side-channel information to instantiate an efficient plaintext-checking oracle? Looking at the block diagram of the CCA-secure decapsulation, we can see that the secret codeword is first supplied to the error-correcting code, if the scheme chooses to use one, and its output is then given back to the FO transform. So, to gain information about the codeword, we should obtain side-channel information from operations within the error-correcting code. In schemes that do not use an ECC, the message is given directly to the FO transform, so we should obtain side-channel information from the FO transform itself to learn about the secret decrypted message. The second question is: how do we craft chosen ciphertexts such that the output of the instantiated oracle can be used to recover the secret key? We start by explaining how side-channel information can be used to instantiate a binary oracle. First, the experimental setup: as stated earlier, we perform all our attacks on the ARM Cortex-M4 microcontroller, on implementations taken from the pqm4 library. Our implementations run at a frequency of 24 MHz, and we obtain EM side-channel information using a near-field EM probe, recording measurements with a Lecroy oscilloscope at 500 megasamples per second.
Firstly, we look at SCA over the error-correcting codes used in the Round5 scheme. Round5 uses a novel error-correction scheme called XEf, a linear parity code that can correct up to f errors per codeword. We have a message of k bits with a corresponding message polynomial, and the encoding procedure works by computing a set of 2f registers according to a particular equation; these registers are also binary polynomials and can be viewed as bit vectors. The codeword is simply the message appended with the register set r. In the decoding procedure, we again reconstruct the register set r'' from the received message m' and flip those bits in m' that satisfy a particular condition. I'm not going into too many details about this condition; however, an important point is that the decision as to whether to flip a particular bit is implemented as a majority-logic operation using bitwise operators, which we found by analyzing the implementation. We also observed that if the codeword is valid, then all the inputs to this majority-logic operation are zero; however, if the codeword is invalid, even with a single-bit error, one or more (in fact several) inputs are non-zero. This differential behavior, based on the codeword's validity, can be easily observed through side channels. In particular, we are interested in distinguishing between two particular values of the codeword: the all-zero codeword c = 0, which corresponds to the codeword for the all-zero message m = 0, and the invalid codeword c = 1, which has a single-bit error in the LSB position.
The reason we want to distinguish between these two particular values of the codeword is that, as we'll see later in the talk, one can construct chosen ciphertexts that decrypt to these particular codewords irrespective of the value of the secret key, and those chosen ciphertexts can subsequently be utilized to perform full key recovery. So we are interested in distinguishing between the two cases c = 0 and c = 1, and to do so we utilize a t-test-based reduced-template approach. This technique works as follows. We first construct ciphertexts for the codewords c = 0 and c = 1, which can easily be done by simply encrypting either value of c with the encryption function. We collect N traces corresponding to the decryption of c = 0 and of c = 1, normalize the traces, and compute the t-test between the two sets. We can see several peaks in the t-test, which shows that the two computations are easily distinguishable at those points, in fact at several points; this t-test plot was obtained for one particular parameter set of Round5. We choose the points that lie comfortably above the t-test threshold of ±4.5 and construct a reduced trace set from those points of interest. We then take the mean of the reduced traces and use them as reduced templates for the two cases. In the attack phase, one can perform simple template matching against the reduced templates to determine the class of the attack trace. Moving on to LAC: LAC is another Ring-LWE scheme, and it utilizes a BCH code for error correction.
As stated earlier, D'Anvers et al. demonstrated a timing attack on a variable-time implementation of the BCH decoding procedure, but in our case we utilize the EM side channel. The decoding procedure typically works by computing a syndrome for the received codeword, and it is well known that the syndrome is zero for a valid codeword but non-zero for an invalid one. Hence, we hypothesized that the syndrome computation can be targeted through side channels and used to efficiently distinguish between the two cases. Here again, the two cases are c = 0, a valid codeword, and c = 1, an invalid codeword with a single-bit error, and we follow the same approach to distinguish them. Once again, the t-test gives rise to several peaks, showing that there are several easily distinguishable points. Looking at Kyber, a scheme that does not use any error correction, we need to analyze the decapsulation procedure to understand where to probe for side-channel information.
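The zero-syndrome property exploited above holds for any linear code, not just BCH. As a minimal stand-in (a Hamming(7,4) code of my own choosing, not LAC's actual BCH parameters), the check looks like this:

```python
# Parity-check matrix of the Hamming(7,4) code -- a stand-in for BCH,
# since every linear code has the same zero-syndrome property.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(c):
    # H * c over GF(2): all-zero exactly when c is a valid codeword
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in H]

valid = [0] * 7              # the all-zero codeword (c = 0)
invalid = [1] + [0] * 6      # single-bit error in the first position
```

The side-channel observation is that computing an all-zero syndrome processes only zero values, while even a single-bit error makes several intermediates non-zero.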
We can see that the secret decrypted message m' is immediately fed, together with the public key, into a hash function. Since this operation directly operates on the decrypted message, it is a very good target for obtaining side-channel information about it. Here again we want to differentiate between the two cases m = 0 and m = 1, and the diffusion property of the hash function ensures that even a single-bit difference between the two cases helps us efficiently distinguish the two computations over the EM side channel. We capture EM traces close to the end of the hash computation to distinguish the two cases cleanly, and here again we utilize the t-test-based template approach; once more there are multiple points above the t-test threshold, showing that the two computations are very easily distinguishable. In the second part of this talk, we discuss how to utilize this side-channel oracle to craft chosen ciphertexts and correspondingly perform secret-key recovery. Here again we assume that all operations are performed over the polynomial ring R_q. Looking at decryption, the first operation is the multiplication U*S, and in the product R, every coefficient depends on all the coefficients of S. After this multiplication, all the other operations are coefficient-wise, and hence every bit of the decrypted message depends on all the secret coefficients.
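The diffusion argument above is easy to illustrate: hashing two messages that differ in a single bit yields digests that differ in roughly half their bits, so the two hash computations are internally very different. SHA3-256 here is my own choice for illustration; each scheme fixes its own hash functions.

```python
import hashlib

# Two candidate messages differing in exactly one bit, standing in for
# the two decryption outcomes m = 0 and m = 1.
m0 = bytes(32)
m1 = bytes([1]) + bytes(31)

h0 = hashlib.sha3_256(m0).digest()
h1 = hashlib.sha3_256(m1).digest()

# Hamming distance between the two 256-bit digests: diffusion drives it
# toward ~128, which is why the two computations leak so differently.
diff = sum(bin(a ^ b).count("1") for a, b in zip(h0, h1))
```

In the attack we never read the digest itself; we only need the EM traces of the two computations to fall into two distinguishable classes, which this avalanche effect guarantees.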
In order to break this dependency on all the secret coefficients, we choose constant values for the ciphertext components U and V. If we do that, the resulting product is nothing but a scalar multiple of the secret, and hence each coefficient r_i depends only on its corresponding coefficient s_i; subsequently, every bit of the message depends only on the corresponding secret coefficient. So we easily break the dependency by choosing constant values for U and V. Next, we need to choose the constants k_u and k_v based on certain conditions. Looking at the message bits m_i', we see that the first bit m_0' depends on both k_u and k_v, whereas the other bits depend only on k_u. We choose values of k_u and k_v such that the first bit is a function of s_0, while the other bits always decrypt to zero, no matter the value of the secret coefficient. In this way, we ensure that the decryption output can take only two possible values, zero or one, and that the output m' depends only on the single coefficient s_0. We can then run a brute-force search over the values of k_u and k_v so that different (k_u, k_v) pairs uniquely identify the value of s_0 based on the value of m'. To put it more clearly, we can construct a distinguisher table; here we have the table for one parameter set of Round5, and with just two chosen ciphertexts we can uniquely distinguish every possible candidate for the secret coefficient. Round5 utilizes ternary polynomials, and hence we require just two chosen ciphertexts to uniquely distinguish every candidate for the secret coefficient based on whether the oracle output is zero or one.
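The distinguisher table can be reproduced with toy numbers. The modulus and the constants k_u = 800 and k_v ∈ {400, 2929} below are my own assumptions chosen to satisfy the stated conditions for ternary secrets (k_u·s_i always decodes to 0, while k_v − k_u·s_0 decodes differently per candidate); Round5's actual parameters differ.

```python
Q = 3329  # toy modulus; Round5's real parameters differ

def decode(c):
    # 1 iff the coefficient is closer to q/2 than to 0
    return 1 if Q // 4 < c % Q < 3 * Q // 4 else 0

K_U = 800  # chosen so K_U * s_i always decodes to 0 for ternary s_i

def first_message_bit(s0, k_v):
    # with constant ciphertext (u, v) = (K_U, k_v), the first coefficient
    # of v - u*s equals k_v - K_U * s_0; all other message bits are 0
    return decode(k_v - K_U * s0)

# two chosen values of k_v give a unique oracle-response pattern for
# each ternary candidate s_0 in {-1, 0, 1}
table = {s0: (first_message_bit(s0, 400), first_message_bit(s0, 2929))
         for s0 in (-1, 0, 1)}
```

The resulting table is {-1: (1, 0), 0: (0, 0), 1: (0, 1)}: two oracle queries uniquely identify a ternary coefficient, exactly as in the talk.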
While the previous technique helped us recover s_0, the same technique can be adapted to perform full key recovery. We do this by exploiting a property of polynomial multiplication in the ring: multiplying the polynomial s by x^i rotates s by i positions in an anti-cyclic fashion. If we choose u and v with their non-zero coefficients in the i-th position, then the first bit of the decrypted message no longer depends on the coefficient s_0 but on s_{n-i}, the coefficient in the (n-i)-th position, while all the other bits of the message are zero. Thus, by simply changing the location of the non-zero coefficients of u and v, we can recover the full secret key one coefficient at a time. As we can see, we exploit this rotation property of polynomial multiplication in the ring to restrict the decryption output to just two values. On to some experimental results: we performed practical experiments on one parameter set each of Round5, LAC and Kyber, and the number of attack traces ranges from 2.9k to 7.7k, while the time taken ranges from 4.5 minutes to 10 minutes, so the attack can be considered very practical both in terms of time and in the number of traces. The number of pre-processing traces also remains constant, because we essentially have only two templates, each built using 50 traces.
Because the schemes differ in certain minor technical details, some of them require us to adapt the attack. Looking at Frodo, which involves computations over matrices and vectors, the rotation property of polynomial multiplication does not hold, and hence the position of the dependent bit in the decryption output changes based on which secret coefficient we want to recover. To recover the secret coefficient in the i-th position, one has to distinguish between m = 0 and m = 2^i, that is, the message with a one in the i-th position and zeros in all other positions. This requires generating many more templates; these are the numbers of templates required for the different parameter sets of Frodo. With respect to NewHope, it utilizes a redundancy approach, wherein a single bit is encoded into, and decoded from, multiple coefficients of the message polynomial. In NewHope512, one bit is encoded into two coefficients, and the two coefficients are decoded back to recover that single bit; in NewHope1024, the redundancy factor increases to four. So even if you choose the ciphertext according to our earlier technique, every message bit, in the case of NewHope1024, depends on four secret coefficients, which cannot be avoided, and in the case of NewHope512 it depends on two coefficients. Hence, instead of recovering individual coefficients, we now need to uniquely distinguish between pairs and quadruples: in NewHope512 one has to distinguish between 289 candidate pairs, and in NewHope1024 between about 85,000 cases, which would require a lot of ciphertexts. We therefore do not run a brute-force search for a unique distinguisher over all cases, but rather propose a two-stage randomized approach that heavily reduces the effort to search for the required ciphertexts. I will not go into the details of this
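The redundancy mechanism that complicates the NewHope attack can be sketched as threshold decoding. This is a simplified illustration of the idea only: I group a bit's R coefficients consecutively and use a generic closest-encoding rule, whereas NewHope itself fixes a specific interleaving and threshold, so treat the layout here as an assumption.

```python
Q, R = 12289, 2   # NewHope-style modulus; redundancy factor 2 (512 variant)

def dist(c, target):
    # cyclic distance in Z_q
    d = (c - target) % Q
    return min(d, Q - d)

def encode(bits):
    # each bit is written into R coefficients: 0 -> 0, 1 -> q/2
    # (consecutive grouping is a simplification; NewHope interleaves)
    return [(Q // 2) * b for b in bits for _ in range(R)]

def decode(coeffs):
    # threshold decoding: compare total distance of the R coefficients
    # to the encoding of 0 versus the encoding of 1
    bits = []
    for i in range(0, len(coeffs), R):
        grp = coeffs[i:i + R]
        d0 = sum(dist(c, 0) for c in grp)
        d1 = sum(dist(c, Q // 2) for c in grp)
        bits.append(1 if d1 < d0 else 0)
    return bits
```

Because each decoded bit aggregates R coefficients, a chosen ciphertext can no longer isolate a single secret coefficient; the oracle only reveals information about the pair (or quadruple) jointly, which is why the search space grows to 289 and ~85,000 cases.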
technique; it is available in the paper. However, we ran some attack simulations and could perform full key recovery assuming an ideal plaintext-checking oracle: NewHope512 required about 6,000 queries, and NewHope1024 about 26,000 queries, for full key recovery. Now, on to countermeasures. Masking serves as a very good countermeasure against the attacks we propose, both those targeting the error-correcting procedures and those targeting the FO transform. However, to the best of our knowledge, there is no masking scheme available to protect the error-correcting code, and this has not been studied yet, so it can be an interesting avenue for research. For schemes where we target the FO transform, masking can concretely protect against the attack, as there are masking schemes known to provide full security for the complete CCA-secure decapsulation procedure. Our attacks again reiterate the need to evaluate concrete masking countermeasures to protect lattice-based schemes against side-channel attacks. To conclude: we propose generic approaches to perform side-channel-assisted chosen-ciphertext attacks, and we perform attacks on six lattice-based schemes, all part of the second round of the NIST standardization process; in fact, three of the schemes have also progressed to the final round. We identify vulnerabilities in the error-correcting procedures and in the FO transform that efficiently instantiate a side-channel plaintext-checking oracle, and we subsequently perform full key recovery on implementations for the ARM Cortex-M4 microcontroller. All the demonstrated attacks could be performed in a matter of minutes, using only a few thousand traces. As stated earlier, our attacks reiterate the need to evaluate concrete masking countermeasures for lattice-based schemes to protect against attacks such as ours. Thank you; if you have any questions, please
feel free to ask.