Hi, my name is Alexander Nilsson. I'm a PhD student at Lund University in Sweden and at the security company Advenica. Today I'm going to present the paper "A Key-Recovery Timing Attack on Post-Quantum Primitives Using the Fujisaki-Okamoto Transformation and Its Application on FrodoKEM". It was authored by myself and Thomas Johansson at Lund University, and Qian Guo, also from Lund University, who at the time was also working at the University of Bergen, Norway.

So let's start with some preliminaries. As we all know, when crypto is implemented, we run the risk of introducing new weaknesses not anticipated by the theoretical models under which many schemes are proven secure. Obviously, this is due to the inability of these models to sufficiently capture the real-world behavior of silicon, whether the scheme is implemented in software or in hardware. This disconnect between the theoretical and mathematical models and on-silicon behavior brings us to our first observation: implementing crypto is hard. We know this because over the years many so-called side-channel attacks have surfaced. We have the first side-channel attack on RSA and Diffie-Hellman, dating as far back as 1996, and OpenSSL has been attacked again and again. There are of course many more attacks than I've listed here, but these are the most impactful attacks against the classical schemes that we use today. But what about post-quantum schemes?
Well, yes, we have those as well: for example against McEliece in 2010 and 2013, and against BLISS in 2016. The most recent attack is against LAC and RAMSTAKE, and it exploited the timing variations when executing the error-correcting codes that these schemes employ to reduce the decoding failure rate. The authors showed that the attack could be generalized to cover any scheme that employs error-correcting codes which are not implemented in a constant-time manner.

Here today I'm also presenting a general attack, but we are targeting the Fujisaki-Okamoto (FO) construction, which was not previously known to require a constant-time implementation. This is the core of our contribution: even though, as we'll see later, the FO transform does not actually handle any secret information, it still needs to be implemented in constant time. This was apparently not well known before, as we can see from the list of schemes that at some point in time did not implement the FO transformation in constant time.
It's not a short list. We have, for example, FrodoKEM, LAC, BIKE, HQC, ROLLO, and RQC, and we want to note that this list is by no means comprehensive. In the paper we have selected FrodoKEM to demonstrate the attack, due to its conservative design and the designers' clear statements about otherwise using constant-time and side-channel-resistant implementations. I also wish to add that partial details of a theoretical attack on LAC are included as an appendix to the paper.

Before we go any further, I think it would be a good idea to very briefly go through some background, just to establish some of the core concepts that we are talking about. I'll try to be as brief as possible.

First up is public-key encryption (PKE), which is most simply defined as a triplet of algorithms, namely key generation, encryption, and decryption. But many of the schemes in the NIST post-quantum cryptography standardization project are not actually defined as PKE schemes. They are instead defined as key encapsulation mechanisms (KEMs), which, instead of encrypting a plaintext directly, produce a shared secret that can then be used with symmetric crypto to encrypt a variable-sized payload. Instead of encryption and decryption algorithms, we talk about encapsulation and decapsulation algorithms.

Now, the security of a PKE or a KEM can be defined in a variety of models, but the common strategy is to analyze a PKE scheme in the model of indistinguishability under chosen-plaintext attack (IND-CPA). This is defined as a security game where the adversary only has access to the public key, and thus to public encryption, as well as a polynomially bounded amount of computational resources. The goal of the game is for the adversary to determine which of two known plaintexts a given challenge ciphertext corresponds to. This model is sufficient for some use cases, but not for all. For other use cases a stronger model is desired, and I've listed IND-CCA here. It differs from the CPA model in that the adversary now has access to a
decryption oracle, which answers with the corresponding plaintext for any ciphertext, except, of course, for the specific ciphertext used in the challenge. In the NIST PQC project, many submissions use the same approach to construct a CCA-secure KEM from a CPA-secure PKE. This approach makes use of the Fujisaki-Okamoto transform.

Before we talk more about the FO transform, I want to mention an important property that many PQC schemes, both in and outside the NIST project, share. The property I'm talking about is the way these schemes encrypt messages. On a high level, they do so by first encoding the message and then adding a randomized error vector. This means that when decrypting, the error vector must be removed, either by some form of lattice technique or by a decoding algorithm. Basically, the security of these schemes hinges on this property. But the thing we care about today is that if we modify the ciphertext of these schemes by a small amount, we can still decrypt to the same original plaintext. However, if we modify the ciphertext by a larger amount, we suddenly fail to decrypt, or we decrypt to another, unrelated plaintext. This property is one of the reasons why many of the schemes are only CPA-secure in their basic construction, and it's also one of the things that is fixed by the FO transformation.

So let's talk a little bit about what the FO transform looks like and how it makes the scheme more secure. To convert a CPA-secure PKE scheme into a CCA-secure KEM, we can use Algorithm 1 shown here. Basically, it takes as input a public key and outputs a ciphertext and a shared secret. First we pick a random value m. Then we use a hash function, acting as a pseudorandom function that generates values indistinguishable from true randomness. From this hash function we get the random-looking values r and k. The next step is to call the encryption function of the PKE scheme. As input we have the public key and the message m, with r as the source of randomness
in order to make the scheme deterministic. Note that the encryption function here is a call to a CPA-secure PKE scheme. I would also like to point out that there are many variations of the FO transform; the one presented here is the one used by FrodoKEM, but essentially all KEMs in the NIST PQC project use a similar construction.

Okay, so that was the encapsulation function. The next step is the decapsulation. Decapsulation is just as conceptually simple as encapsulation, although we need a few more steps to describe it. What we do is decrypt the ciphertext, re-encrypt the result, and compare the new ciphertext to the one that was received. Algorithm 2 looks like this: we have the ciphertext as well as the secret and public keys as inputs, and the output is hopefully the same shared secret as was given by the encapsulation function. Like I said, first we decrypt the received ciphertext using the secret key; we call the decrypted value m′. Then we use the same hash function as in the encapsulation function to generate r′ and k′. The public key, m′ and r′ are then used in the same manner to generate a new ciphertext, called c′ here. The next step is the important part, because here we compare the two ciphertexts with each other. If they are equal, the shared secret is computed identically to the encapsulation function. If they are not equal, however, we generate a different shared secret. This decrypt, re-encrypt, and compare procedure is what provides us with CCA security. It's also what removes the non-deterministic decoding property of the underlying PKE scheme.

So now the question is: how do we implement this in software? Well, it's common knowledge that all secret-dependent addressing and branching must be implemented in constant time. But what about this comparison here?
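To make the data flow of Algorithms 1 and 2 concrete, here is a minimal Python sketch. It is not FrodoKEM's code: the PKE is a deliberately insecure toy (the hypothetical `toy_*` functions) whose decryption ignores the ciphertext tail, which crudely mimics the "small modification still decrypts to the same plaintext" property. SHAKE-256 as the hash, hashing `pk` together with `m`, and the secret rejection value `s_reject` are assumptions for illustration. The comparison uses `hmac.compare_digest` instead of a short-circuiting compare, but Python itself gives no hard constant-time guarantees, and the final selection of `k2` or `s_reject` is still a data-dependent choice.

```python
import hashlib
import hmac
import os


def shake(data: bytes, n: int) -> bytes:
    """SHAKE-256 as the hash/PRF (an assumption for this sketch)."""
    return hashlib.shake_256(data).digest(n)


# --- Toy, INSECURE stand-in for a CPA-secure PKE (hypothetical) ---
def toy_keygen() -> tuple[bytes, bytes]:
    key = os.urandom(32)
    return key, key  # (pk, sk): identical here, which no real PKE would do


def toy_encrypt(pk: bytes, m: bytes, r: bytes) -> bytes:
    pad = shake(pk, len(m))
    # Masked message followed by the randomness r as the ciphertext "tail".
    return bytes(a ^ b for a, b in zip(m, pad)) + r


def toy_decrypt(sk: bytes, c: bytes) -> bytes:
    pad = shake(sk, 32)
    # Only the first 32 bytes matter, so tail changes do not alter m'.
    return bytes(a ^ b for a, b in zip(c[:32], pad))


# --- FO-style KEM on top of the toy PKE ---
def fo_encaps(pk: bytes) -> tuple[bytes, bytes]:
    m = os.urandom(32)                 # random value m
    rk = shake(pk + m, 64)             # random-looking r and k
    r, k = rk[:32], rk[32:]
    c = toy_encrypt(pk, m, r)          # r as the randomness: deterministic
    return c, shake(c + k, 32)         # (ciphertext, shared secret)


def fo_decaps(c: bytes, sk: bytes, pk: bytes, s_reject: bytes) -> bytes:
    m2 = toy_decrypt(sk, c)            # m'
    rk = shake(pk + m2, 64)            # r' and k'
    r2, k2 = rk[:32], rk[32:]
    c2 = toy_encrypt(pk, m2, r2)       # re-encryption c'
    # The comparison at the heart of the attack. A short-circuiting memcmp
    # here leaks how far c and c' agree; compare_digest avoids that early exit.
    ok = hmac.compare_digest(c, c2)
    # On mismatch, derive an unrelated secret from the secret value s_reject.
    return shake(c + (k2 if ok else s_reject), 32)


if __name__ == "__main__":
    pk, sk = toy_keygen()
    s_reject = os.urandom(32)
    c, ss = fo_encaps(pk)
    assert fo_decaps(c, sk, pk, s_reject) == ss      # honest ciphertext
    bad = c[:-1] + bytes([c[-1] ^ 1])                # flip one tail bit
    assert fo_decaps(bad, sk, pk, s_reject) != ss    # implicit rejection
```

Flipping a tail bit leaves m′ unchanged, so c′ equals the honest ciphertext and differs from the tampered one only in its last byte: exactly the situation in which an early-exit memcmp would run almost to the end before returning.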
After all, the original ciphertext is not secret, and neither is c′, since in an attack scenario it is also known to the sending party. So you would be forgiven for believing that this comparison does not need to be implemented in constant time. But as my co-authors and I will show, this would be a mistake.

Okay, so with the background out of the way, we can proceed to talk more about the attack. I'm first going to talk about the attack in general terms before we dive into the details of the attack on FrodoKEM. As we saw on the previous slide, in many cases the comparison step in the FO transform is implemented with the C function memcmp, and here is how we can exploit such an implementation. At the top we have the two ciphertexts: c, the ciphertext as received by the decapsulation function, and c′, the re-encryption of the decoded plaintext. The two values are compared in the FO transform with the memcmp function, which has a short-circuiting behavior; this is our first assumption. For the second assumption, we make use of the non-deterministic decoding property of the underlying CPA decryption function: a small modification to the ciphertext c before it enters the decapsulation algorithm will result in a c′ that is identical to the original, unmodified ciphertext. This means that memcmp only terminates at the position of our modification, the fourth comparison in our example here. However, if we make a large modification instead, then we assume that the ciphertext decrypts to something completely different, so that c′ is unrelated. This is assumption 3, which means that with high probability memcmp terminates at the very first comparison step. Due to the short-circuiting behavior of memcmp, we get very different timing profiles for these two cases. The strategy then is to make the modifications at the end of the ciphertext, in order to enlarge the timing difference as much as possible. By doing this we can, using for example binary search, find the exact amount of modification necessary
to flip c′ to a different ciphertext. Then we simply perform the attack multiple times while measuring the execution time of the decapsulation function, in order to distinguish between the two cases. Once we have done this, using the recorded knowledge of m, we can extract secret information from the KEM we are attacking, although this part is highly dependent on the actual scheme in question.

We can summarize the previous slide as an algorithm for finding out whether or not a ciphertext modification d would result in a decryption failure in the CPA-secure decryption function. Here we have Algorithm 3, where we input a plaintext m and the amount of modification to perform, in the variable d. The output is whether or not the value d results in a different value of c′. First we produce a ciphertext from the plaintext as in the original encapsulation function, and then we apply the modification. The main step of the attack is to send our modified ciphertext to the target and record side-channel information from the decapsulation function; in our case we simply time the execution. The last step is to use this side-channel information to determine whether or not the CPA decryption call resulted in a different ciphertext.

Unfortunately, using this information to mount an attack requires adaptations to the specific KEM, but Algorithm 4 here highlights some of the general steps. First we loop a predetermined number of times, in such a way that we find a set of d and m values where each value of d represents the exact maximum modification possible without causing a decryption failure in the CPA call. We propose to use binary search to find this value. Then we use the set of plaintexts and corresponding d values to somehow extract the secret key.

To provide more details on the attack, we must limit ourselves to a specific scheme. We use FrodoKEM here as an example, but we note that there are other possibilities. The key generation algorithm from the FrodoKEM
specification looks a little like this, but I have simplified it as much as I could so that we do not get hung up on unimportant details. I wish to draw your attention to the last equation, for the public key, B = AS + E, where both B and A are publicly known matrices, and both S and E are error matrices with small entries. Both are secret, and S is saved as part of the secret key. Make a note of this equation, because I'm going to refer to it later.

The encapsulation algorithm is too large to properly display here in full detail, so instead I'm going to once again show a simplified version. A uniformly random plaintext μ is first chosen. It is then used to generate a set of random bit strings that in turn determine the error matrices S′, E′ and E″. The ciphertext contains two parts: the first, denoted B′, is S′A + E′, and the second, denoted capital C, is S′B + E″ plus the encoding of the random plaintext μ. These matrices are converted to bit strings using the Frodo.Pack algorithm, and finally we return the shared secret and the ciphertext. Let's continue with our simplified presentation of FrodoKEM.
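The simplified key generation and encapsulation above can be sketched in pure Python with toy parameters that are far too small to be secure. The parameter values, the {−1, 0, 1} error sampler, and drawing S′, E′, E″ from a seeded `random.Random` (instead of deriving them from μ with SHAKE) are assumptions; Frodo.Pack and the FO wrapper are omitted. The decryption step, Decode(C − B′S), is included only so the round trip can be checked.

```python
import random

# Toy parameters (hypothetical; FrodoKEM-1344 uses n = 1344, nbar = 8).
N, NBAR = 8, 2          # A is N x N, S is N x NBAR
D, BITS = 12, 2         # q = 2^D, BITS plaintext bits per entry of C
Q = 1 << D


def matmul(X, Y):
    """Matrix product mod q on lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % Q
             for j in range(len(Y[0]))] for i in range(len(X))]


def matadd(X, Y):
    return [[(a + b) % Q for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def small(rows, cols, rng):
    """Stand-in error sampler: entries drawn from {-1, 0, 1}."""
    return [[rng.choice((-1, 0, 1)) for _ in range(cols)] for _ in range(rows)]


def encode(mu):
    """Place each BITS-bit value in the top bits of a mod-q entry."""
    return [[m << (D - BITS) for m in row] for row in mu]


def decode(M):
    """Round each entry to the nearest multiple of q / 2^BITS."""
    half = 1 << (D - BITS - 1)
    return [[((x + half) >> (D - BITS)) % (1 << BITS) for x in row] for row in M]


def keygen(rng):
    A = [[rng.randrange(Q) for _ in range(N)] for _ in range(N)]
    S, E = small(N, NBAR, rng), small(N, NBAR, rng)
    B = matadd(matmul(A, S), E)                        # B = AS + E
    return (A, B), S                                   # public key, secret S


def encaps_core(pk, mu, rng):
    A, B = pk
    S1, E1 = small(NBAR, N, rng), small(NBAR, N, rng)  # S', E'
    E2 = small(NBAR, NBAR, rng)                        # E''
    B1 = matadd(matmul(S1, A), E1)                     # B' = S'A + E'
    C = matadd(matadd(matmul(S1, B), E2), encode(mu))  # C = S'B + E'' + Encode(mu)
    return B1, C


def decrypt_core(S, B1, C):
    B1S = matmul(B1, S)
    diff = [[(c - v) % Q for c, v in zip(rc, rv)] for rc, rv in zip(C, B1S)]
    return decode(diff)                                # Decode(C - B'S)


if __name__ == "__main__":
    rng = random.Random(1)
    pk, S = keygen(rng)
    mu = [[rng.randrange(1 << BITS) for _ in range(NBAR)] for _ in range(NBAR)]
    B1, C = encaps_core(pk, mu, rng)
    assert decrypt_core(S, B1, C) == mu
```

With these toy parameters every entry of S′E − E′S + E″ is at most 17 in absolute value, far below q/2^(BITS+1) = 512, so decoding always succeeds.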
We now arrive at the decapsulation function. In the first step, the decapsulation removes all the noise and hopefully recovers the initial plaintext; we call this value μ′. Then we use μ′ to re-encrypt, using the same steps as in the encapsulation function. This results in the new ciphertext parts B″ and C′. Lastly, we do the comparison to get the shared secret, just like in the general FO transform. The useful observation here is that the equality check in FrodoKEM is implemented with memcmp, as previously discussed.

In the decapsulation we have the step where we compute C − B′S. By doing some substitutions, one can easily see that this actually equals the encoding of the random value μ plus S′E − E′S + E″. From the specification of FrodoKEM, we know that S, S′, E, E′ and E″ all have small entries in their respective matrices. This means that the entire tail part of this expression, S′E − E′S + E″, will also have somewhat small entries and can be regarded as noise. We call this the combined noise matrix, E‴. Now we observe that the values in S′, E′ and E″ are known to us, since as the attacker we generated the ciphertext ourselves, and we also know that E = B − AS from the equation in the key generation. From this we can see that we will have linear equations in the entries of the secret key matrix S, if only we can figure out the value of E‴.
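The substitution can be written out as follows, using only the simplified equations above (B = AS + E, B′ = S′A + E′, C = S′B + E″ + Encode(μ)) and working modulo q. This is a reconstruction from those equations rather than a quote from the paper; the last line shows why each entry of E‴ is linear in S, since the attacker chose S′ and E″ and knows B and B′.

```latex
\begin{aligned}
C - B'S   &= S'B + E'' + \mathrm{Encode}(\mu) - (S'A + E')S \\
          &= S'(AS + E) + E'' + \mathrm{Encode}(\mu) - S'AS - E'S \\
          &= \mathrm{Encode}(\mu) + \underbrace{S'E - E'S + E''}_{E'''} \\[4pt]
E'''      &= S'(B - AS) - E'S + E'' = S'B + E'' - (S'A + E')S = S'B + E'' - B'S \\
E'''_{ij} &\equiv (S'B + E'')_{ij} - \sum_{k=1}^{n} B'_{ik}\, S_{kj} \pmod{q}
\end{aligned}
```

Here n is the dimension of A, so each recovered entry E‴_ij gives one linear equation in the n unknowns of column j of S.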
We get the value of a single entry of the combined noise matrix E‴ from each run of our attack algorithm. Each such entry provides us with a single linear equation in the secret matrix S, so we need to collect roughly as many entries as there are entries in S.

All right. By using a lemma from the FrodoKEM specification, we know the range of values for each entry in the noise matrix that FrodoKEM can handle without experiencing a decoding failure. As you can see, I'm going to save some time here by skipping all the details of deriving this range. The bounds are that each entry of the error matrix must be greater than or equal to −2^(D−B−1) and, symmetrically, less than 2^(D−B−1); that is, −2^(D−B−1) ≤ E‴_ij < 2^(D−B−1). This means that we can recover each entry (i, j) of E‴ by determining the value x0 such that E‴_ij + x0 equals the upper interval limit, 2^(D−B−1). We can do this by using the generic side-channel attack described previously. Using binary search we can do this efficiently, since we can detect whether the value x is too large or too small. For each tested value of x we get two possibilities. If the value is not large enough to cause a decoding failure, the two ciphertexts only differ somewhere in the last positions, meaning we get close to normal execution time. But if, on the other hand, the tested value of x pushes the entry outside FrodoKEM's design interval, then we end up with a totally different ciphertext, which means that the memcmp function will terminate early and we get a shorter execution time. Due to this behavior of memcmp, we decided to introduce the added noise x at the tail part of the ciphertext, in order to enlarge the time difference.

Now that we know more about the attack, I think it's time to take a look at some of our results. We start by measuring only the memcmp function as employed by FrodoKEM.
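The distinguish-then-binary-search loop just described can be sketched as a noiseless simulation. Everything here is an assumption for illustration: the ciphertext length, the single-byte tail change, modeling execution time as the number of bytes an early-exit memcmp inspects, and modeling a decoding failure as a c′ that differs from the very first byte. The hidden entry `e_hidden` lives only inside the simulated target; the attacker's `recover_entry` sees nothing but the timing proxy. In a real attack each oracle decision is statistical, over many timing samples.

```python
# FrodoKEM-1344 uses D = 16 and B = 4; CT_LEN is a toy value.
D, B = 16, 4
BOUND = 1 << (D - B - 1)        # decoding succeeds iff -BOUND <= e + x < BOUND
CT_LEN = 64                     # hypothetical ciphertext length in bytes


def memcmp_steps(a: bytes, b: bytes) -> int:
    """Byte comparisons made by an early-exit memcmp: our timing proxy."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1
    return min(len(a), len(b))


def make_target(e_hidden: int):
    """Simulated decapsulation oracle holding one hidden entry of E'''."""
    honest_c = bytes(CT_LEN)

    def decaps_time(x: int) -> int:
        # The attacker adds x to an entry stored at the tail of the ciphertext.
        modified = honest_c if x == 0 else honest_c[:-1] + b"\x01"
        if -BOUND <= e_hidden + x < BOUND:
            c_prime = honest_c              # same plaintext: c' equals the honest c
        else:
            c_prime = b"\xff" * CT_LEN      # decoding failure: unrelated c'
        return memcmp_steps(modified, c_prime)

    return decaps_time


def decoding_fails(decaps_time, x: int) -> bool:
    # A short run means memcmp stopped within the first few bytes.
    return decaps_time(x) < CT_LEN // 2


def recover_entry(decaps_time) -> tuple[int, int]:
    """Binary search for the smallest x0 with e + x0 = BOUND; returns (e, queries)."""
    lo, hi = 0, 2 * BOUND           # x = 0 never fails, x = 2*BOUND always fails
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if decoding_fails(decaps_time, mid):
            hi = mid
        else:
            lo = mid
    return BOUND - hi, queries      # e + x0 = BOUND  =>  e = BOUND - x0


if __name__ == "__main__":
    e, q = recover_entry(make_target(-300))
    print(e, q)
```

Because e lies in [−2^(D−B−1), 2^(D−B−1)), failure happens exactly when e + x ≥ 2^(D−B−1), which is monotone in x and makes binary search valid. With D = 16 and B = 4 the search interval has 2^(D−B) = 4096 candidates, so each entry takes D − B = 12 oracle decisions in this noiseless model.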
Here we have run 10,000 measurements for three different cases. First we set x to zero; that is, we make no modification but simply run the decapsulation algorithm with a valid ciphertext. Then we make the smallest possible modification and set x to 1, and finally we make the largest possible modification. For each case we measure the timing behavior of only the calls to the memcmp function. As you can see, the difference is huge: we basically go from around 5,000 clock cycles to close to zero. I should note that there are no in-between cases; the timing behavior will always fall into one category or the other. As you can imagine, it would not be difficult to construct a distinguisher for these two cases.

However, if we restrict ourselves to measuring the entire decapsulation function, as a real-world attack would force us to, then, as we can see, the differences are no longer that large. This means that we must collect many more samples before we can distinguish between the two cases; the differences in runtime are as low as 0.04%. In our lab environment we got good results with 97,000 decapsulations per binary search, that is, per extracted entry of E‴. For FrodoKEM-1344 the secret matrix S has 1344 × 8 entries, so we need that many entries of E‴. Combining this (97,000 × 1344 × 8 ≈ 1.04 × 10^9 ≈ 2^30), we can make a complete key recovery using an estimated 2^30 decapsulation queries for FrodoKEM-1344-AES.

So we have just about reached the end of this talk, and I wish to reiterate that the attack we have shown is a general one, and that all PQC schemes should quickly move to constant-time implementations for the Fujisaki-Okamoto transform as well. And finally, for the conclusion, I have here an excerpt from the FrodoKEM specification where the authors claim to have an implementation that is protected against timing and cache attacks.
I have spent some time looking through the FrodoKEM code base, and so far I have found nothing that would invalidate this claim as far as the handling of secrets is concerned. But as we have shown in our paper, implementing crypto is hard, and even when you do everything right, it turns out that you have to do just this little bit more. Thank you all for your attention, and if you have any questions whatsoever, please do not hesitate to contact us. Thank you very much.