Welcome to the first session of the morning. This session is going to start with the best paper award. Okay, so when we were putting together the program, we wanted to make two tweaks to how the best paper award worked. The first is we wanted to allocate a longer slot to the best paper in recognition of the achievement, so this is sort of halfway between an invited talk and a standard talk, you could say. And the second thing is we put the best paper award talk the morning after the banquet. It's kind of a carrot to encourage people to come along, and I'm glad that worked out; there are plenty of people here. The reason for that is that we also wanted to give out the best paper awards to the authors before the presentation, rather than at the rump session, partly because that means they'll actually be here. So the best paper award for CHES 2018 was voted for by the editorial board. As you heard the other day, we had 180 or so submissions, 40 or 50 or so of which were accepted, so this is a fantastic achievement. I'd like to invite Martin and Amit (Kenny's not able to be here) onto the stage to receive the award. We also have a gift: we weren't able to locate any cold boots, but we did get some wooden shoes. That's a pretty bad joke, but I'll take it anyway. We have three of these; hopefully you can take Kenny's home for him as well. So without further ado, we'll hand over to the session chair to start the session. Thank you, Dan. So Amit is going to give the talk. Okay, so hello everyone. Today I'll be speaking about cold boot attacks on Ring and Module LWE secret keys when the NTT is used to store the key. Before I start, this is based on joint work by myself and my supervisors, Martin Albrecht and Kenny Paterson. Okay, so first of all, what are these cold boot attacks? Well, they were originally investigated in the literature, at least, by Halderman et al. in 2009.
The attack basically involves an attacker who has physical access to a victim's machine. The idea is that there will be some cryptographic secret key material stored in the victim's memory, so the attacker can eject the memory, plug it into their own machine, and then essentially take a reading of the bits. At this point, the attacker has two challenges. The first is to locate the key material in the memory. The second is to use data remanence effects to actually recover the secret key. Clearly these attacks work on any cryptographic primitive where there's a secret key stored in memory, so the attack is quite a general one. However, the attacker we're modeling here is an extremely powerful one: one that has physical access to a victim's machine. Okay, so a few more details on what Halderman et al. showed in 2009. They showed that there's this data remanence effect where, if you cut power to RAM, eventually any information in the RAM decays towards a kind of ground state, as shown by this decaying picture of the Mona Lisa. What they also showed is that you can actually slow down this decay by cooling the RAM chips to extreme temperatures. Even if you just use compressed air for the cooling, you can achieve a less than 1% bit flip rate towards the ground state after some period of time. I'll say more about what this ground state is on the next slide. You can also reduce this bit flip rate further by using more extreme methods of cooling, such as liquid nitrogen. So all of this is based on the fact that when you cut power to RAM, eventually the bits in the memory will decay to either a zero state or a one state. In fact, there will be regions that decay to a zero state and regions that decay to a one state. And once again, you cool the RAM to extreme temperatures to slow down this decay. Before I move on: when I say RAM, what I really mean is DRAM here.
Okay, so now we move on to an example. Suppose we have a 12-bit secret key in the victim's memory; this is the true value of the secret key. The idea is that the attacker comes along, they freeze and extract the RAM, and then perhaps they take this kind of noisy reading of the secret key, where some of the bits have flipped. At this point, the attacker needs to detect these bit flips and correct them in order to recover the secret key. If the attacker were to leave the RAM at, say, room temperature, eventually we reach the ground state of regions of zeros and regions of ones. Okay, so in addition to these standard bit flips that go towards the memory's ground state, there are also retrograde bit flips that actually go away from the memory's ground state. However, these occur at a much lower rate rho_1, where rho_1 is roughly 0.1% according to the experiments of Halderman et al. I'll be using rho_0 to denote the standard bit flip rate. Okay, so whenever we launch a cold boot attack, a standard assumption is that the expected number of bit flips we'll see is the number of bits in the key that we're attacking times the average of the two rates. Implicitly, in order to make this assumption, we're assuming that half the bits of the secret key are in the ground state. Before I move on: whenever I give you a bit flip rate that contains two numbers, the first one will represent rho_0 and the second will represent rho_1. Okay, so what's known about cold boot attacks on popular cryptographic primitives? Well, for DES and AES, it's been shown that there are extremely effective cold boot attacks even at fairly high bit flip rates of more than 50%.
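As a rough illustration (my own sketch, not from the talk), the asymmetric bit flip model can be simulated in a few lines of Python; the rates rho0 and rho1 and the 12-bit toy key below are made-up example values:

```python
import random

def noisy_read(bits, rho0, rho1, seed=None):
    """Simulate a cold boot reading of a key (a list of 0/1 bits),
    assuming a ground state of all zeros: 1 -> 0 flips happen at the
    standard rate rho0, retrograde 0 -> 1 flips at the lower rate rho1."""
    rng = random.Random(seed)
    return [(0 if rng.random() < rho0 else 1) if b == 1
            else (1 if rng.random() < rho1 else 0)
            for b in bits]

key = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]       # 12-bit toy key
noisy = noisy_read(key, rho0=0.01, rho1=0.001, seed=1)
expected_flips = len(key) * (0.01 + 0.001) / 2   # n * (rho0 + rho1) / 2
```

The last line is the expected-flip-count heuristic from the talk, which implicitly assumes half the key bits sit in the ground state.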
Just to give you some background on how these attacks work, or what they take advantage of: for DES and AES, these attacks assume that not only do you have the secret key stored in memory, but you also have a number of rounds of the key schedule stored in memory. So essentially, taking advantage of this redundant information between the key schedule and the actual key, you can launch very effective cold boot attacks on DES and AES. A similar story holds for RSA, where it's been shown that you can correct flip rates of roughly 40% in the standard direction in just a few seconds. Once again, this attack uses redundancy in what's stored in the memory. In particular, if RSA is implemented according to the PKCS#1 standard, you don't just get the secret key in memory, you also get some other functions of the secret key in memory. More recently, there's been some work on cold boot attacking NTRU, and it was shown that at fairly low bit flip rates, but still the realistic ones that Halderman et al. showed were feasible, you can recover NTRU keys in a matter of minutes to hours using fairly straightforward enumeration techniques. Okay, so next, to motivate our work: as you've already heard throughout this conference, NIST is running this standardization process, or competition, and many of the submissions are actually based on the LWE problem. So a natural open question in the cold boot area is to ask: are there effective cold boot attacks on some of these LWE contenders? What we look at is attacks on any scheme that uses an NTT to store its secret key in memory. Okay, so next we get on to defining our cold boot problem. In order to understand the problem, we first need to understand what an LWE key looks like, or at least what the LWE keys that we'll be attacking look like. In actual fact, we'll be focusing on ring and module LWE, which are perhaps the two main efficient variants of LWE used in NIST proposals.
In order to define the keys, we need to fix a polynomial ring, and the polynomial ring we'll be using is the power-of-two cyclotomic ring, because this allows you to use an NTT. In other words, we're going to be fixing the ring R_q, where this ring is essentially the ring of polynomials whose coefficients are integers modulo q, quotiented out by x to the n plus one. So these are degree at most n minus one polynomials whose coefficients are integers modulo q. For ring LWE, the secret key is simply one of these ring elements, or polynomials. For module LWE, the secret key is a collection of d of these polynomials. However, there's an interesting trade-off between d and n. What module LWE schemes tend to do is use a smaller ring dimension n at the expense of using a non-trivial d. So for example, Kyber, which is a module LWE-based scheme, uses n equals 256 and d equals 3, whereas the ring LWE-based New Hope uses a much larger ring dimension, but only one polynomial making up the secret key. Okay, so how are these keys stored in memory for some of these LWE proposals? Well, it's been said before that the number theoretic transform, or NTT, speeds up polynomial multiplication from roughly n squared operations to n log n operations. In order to take advantage of this speed-up, it's often the case that the polynomials in the secret key will simply be stored in the NTT domain, so that you can perform very fast multiplication without needing to apply an NTT to the secret key every time you want to do a polynomial multiplication. Okay, so next we can actually define our cold boot problem. Essentially, what we want to do is decode a noisy NTT, or recover s from s tilde, where s tilde is the NTT of s plus some error vector delta. We're going to be making an assumption throughout this talk, and that is that we have kappa bit flips where kappa is much less than n.
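As a toy illustration (my own sketch, not code from the talk or from any actual scheme), here is a naive negacyclic NTT over Z_q for the power-of-two cyclotomic ring, with made-up parameters n = 4, q = 17; real schemes use FFT-style butterflies to get the n log n cost rather than this quadratic loop:

```python
def ntt(s, q, gamma):
    """Naive negacyclic NTT for Z_q[x]/(x^n + 1); gamma is a primitive
    2n-th root of unity mod q, so s_tilde[i] = sum_j s[j] * gamma^((2i+1)j)."""
    n = len(s)
    return [sum(s[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(t, q, gamma):
    """Inverse of ntt(): s[j] = n^-1 * gamma^-j * sum_i t[i] * gamma^(-2ij)."""
    n = len(t)
    n_inv, g_inv = pow(n, -1, q), pow(gamma, -1, q)
    return [(n_inv * pow(g_inv, j, q) *
             sum(t[i] * pow(g_inv, 2 * i * j, q) for i in range(n))) % q
            for j in range(n)]

q, gamma = 17, 2            # 2 has order 8 = 2n modulo 17
s = [1, 2, 3, 4]
assert intt(ntt(s, q, gamma), q, gamma) == s   # round trip recovers s
```

Storing the key in the NTT domain means keeping `ntt(s, ...)` in memory instead of `s`, so multiplications skip the forward transform.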
Of course, you don't want to make this assumption in general; however, for the purposes of our attack, we actually require it. Okay, so next, giving more detail on what this error vector delta looks like: remember, delta corresponds to bit flips. However, delta is also represented as a vector of integers modulo q here. Essentially what this means is that the components of delta are not small when considered as integers modulo q, because they're associated to bit flips. However, if we assume a low number of bit flips, then delta's components should have a low Hamming weight when written in some binary signed-digit representation, or BSDR. So briefly, what is a BSDR? Well, it's essentially a binary representation where each bit has a sign attached to it. We really need this signed-digit representation because there are bit flips that go from 0 to 1, and bit flips that go from 1 to 0. An example of a BSDR of 7 is 1, 0, 0, minus 1, since 7 is 1 times 8, take away 1. This example highlights that BSDRs are not unique, because we could also take the plain binary representation of 7. This is something we'll be ignoring for the purposes of this talk, but if you want more details, please read the paper. Okay, so once again, if we assume kappa bit flips, we're essentially assuming that the BSDR of delta has a Hamming weight of kappa. And whenever I say the BSDR of delta, what I really mean is the concatenation of the BSDRs of its individual components. Okay, so the final thing to be said about this problem is that s has small coefficients, which is essentially a standard design choice that all of these ring and module LWE schemes use. Okay, so just to come back to these examples of Kyber and New Hope: in actual fact, a Kyber secret key consists of three relatively low-dimensional ring elements.
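To make the "large modulo q, but low BSDR weight" point concrete, here's a small sketch of my own (using q = 3329, Kyber's modulus, purely as an example): a single bit flip in a stored coefficient changes it by plus or minus a power of two modulo q, so the resulting delta component is large as an integer but has signed-digit weight one.

```python
def single_flip_deltas(q, bits):
    """Map each delta = (+-2^b) mod q that one bit flip can induce
    to its weight-one BSDR, recorded as a (sign, bit position) pair."""
    return {(sign * (1 << b)) % q: (sign, b)
            for b in range(bits) for sign in (1, -1)}

q = 3329                          # Kyber's modulus, fits in 12 bits
deltas = single_flip_deltas(q, 12)
true, noisy = 5, 5 ^ (1 << 11)    # a 0 -> 1 flip in the top bit
delta = (noisy - true) % q        # 2048: far from 0 modulo q...
assert deltas[delta] == (1, 11)   # ...but BSDR Hamming weight one
```

A 1 -> 0 flip in the lowest bit gives delta = -1 mod q = 3328, again weight one in BSDR even though it's the largest possible residue.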
So in order to recover a full Kyber secret key, we have to decode three noisy NTTs in a relatively low dimension, whereas for New Hope, we have to decode one noisy NTT in a relatively high dimension. Okay, so finally we get on to our attack, which splits into three main components. The first is a divide-and-conquer strategy to reduce the dimension of our problem, and essentially what this component uses is that we can compute an NTT using fast Fourier transform techniques. The second component I'll describe is how to work a solution up from one of these low-dimensional instances all the way up to the solution of our original problem of decoding the noisy NTT. And finally, I'll speak about how we actually solve these low-dimensional instances that we get from dividing and conquering. Okay, so first of all, how do we divide and conquer? Well, if you recall, the NTT is essentially a Fourier transform over the integers modulo q, and the explicit formula for the NTT used is given on the slide. You'll see that this isn't quite analogous to a Fourier transform, because of the extra factor of omega to the half j in the summand. Nonetheless, despite this difference, we can still use fast Fourier transform (FFT) techniques to quickly compute an NTT. Essentially, what that means is that we can write an NTT in dimension n in terms of two NTTs in dimension n over 2, and so on. Okay, so what are the formulae that allow us to do this? Well, they're given by the two equations in the box here. The main thing to take from these formulae is that taking the sum and difference of the i-th and the (i plus n over 2)-th components of an n-dimensional NTT gives you something in terms of the i-th component of an (n over 2)-dimensional NTT. Okay, so moving on: how do we use this to divide and conquer our instance?
Well, recall that our instance is given by this s tilde, where s tilde is a noisy NTT. In actual fact, taking the sum and difference of the i-th and (i plus n over 2)-th components of s tilde, and using the formulae from the previous slide, we get equations one and two. We call the first of these the positive fold and the second the negative fold. If you look more carefully at the positive fold, you'll see that it essentially has the same form as our original instance: on the right-hand side, what we have is two times a noisy NTT. The only difference is the constant factor of two in front of the NTT. This means that we can divide and conquer the positive fold once again, using the same techniques. However, for the second of these equations, the negative fold, we have this annoying factor of omega to the (i plus a half) in front of the NTT, which prevents us from dividing and conquering in any effective way. So essentially, we can repeatedly fold down, or repeatedly divide and conquer down, the positive folded instance. The question is: how many times can we actually do this, and can we reach a trivial dimension by repeatedly dividing and conquering? Well, in order to answer this question, we have to look at the new error vectors that are introduced when we divide and conquer. These are given by delta plus and delta minus. If we write delta L for the left n over 2 components of delta and delta R for the right n over 2 components of delta, then delta plus and delta minus are the sum and difference of delta L and delta R. So what this essentially means is that if the BSDR of delta has Hamming weight kappa, then we expect that the BSDRs of delta plus and delta minus should have a Hamming weight of kappa as well.
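The positive fold can be checked numerically on a toy instance (my own sketch with the same made-up parameters n = 4, q = 17 as before, not values from the talk): summing the i-th and (i + n/2)-th NTT components gives twice a half-size NTT, here of the even-index coefficients of s, with root gamma squared.

```python
def ntt(s, q, gamma):
    # naive negacyclic NTT: t[i] = sum_j s[j] * gamma^((2i+1)j) mod q
    n = len(s)
    return [sum(s[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

q, gamma = 17, 2                  # gamma has order 2n = 8 modulo 17
s = [1, 2, 3, 4]
t = ntt(s, q, gamma)
n = len(s)

# positive fold: t[i] + t[i + n/2] ...
folded = [(t[i] + t[i + n // 2]) % q for i in range(n // 2)]
# ...equals 2 * NTT_{n/2}(even coefficients of s) with root gamma^2
half = [(2 * x) % q for x in ntt(s[0::2], q, pow(gamma, 2, q))]
assert folded == half
```

The same algebra shows the negative fold picks up the awkward gamma^(2i+1) factor, which is why only the positive branch can be folded again.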
This is essentially because we're making the assumption that kappa is extremely small compared to n. So delta plus and delta minus have a much less sparse representation when written in BSDR compared to delta, because we have the same Hamming weight, but the ones and minus ones are packed into half the dimension. Essentially, what this means is that if we repeatedly fold, we are packing kappa ones and minus ones into a smaller and smaller amount of space. If we do this, eventually these noise terms will approach a uniform distribution, because even packing a small number of ones and minus ones into a smaller and smaller amount of space gives us something that looks fairly uniform. If we divide and conquer so many times that this error term approaches the uniform distribution, then what we're asking, in order to solve the low-dimensional instance, is to decode a noisy NTT where the noise is essentially uniform, which is clearly an ill-defined problem, because anything is a solution to that problem. Okay, so the final note on this slide is that the s terms stay the same size: whenever we divide and conquer, the s terms, the thing the NTT is applied to, stay the same size, so they don't cause any problems. But remember, this delta term, the error term, is what really causes us issues when dividing and conquering. Okay, so to summarize the divide-and-conquer component of our attack: we start off with this top-level instance, which is our original noisy NTT. We can divide and conquer this once, and then we can divide and conquer down the positive fold, because it has the same form as our original instance, and so on, until we reach some bottom-level pair of instances. And it's important that these bottom-level instances represent well-defined problems.
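The densification of the error under folding can be illustrated directly (my own sketch, with a made-up toy delta in dimension 8 over q = 17): each fold halves the dimension while preserving the number of signed ones, so the error occupies an ever larger fraction of the remaining space.

```python
def fold(delta, q):
    """Positive fold of the error: delta_plus = delta_L + delta_R (mod q)."""
    n = len(delta)
    return [(delta[i] + delta[i + n // 2]) % q for i in range(n // 2)]

q = 17
delta = [0, 0, 1, 0, 0, 16, 0, 0]   # kappa = 2 flips (16 = -1 mod 17), dim 8
d1 = fold(delta, q)                 # [0, 16, 1, 0]: 2 nonzeros in dim 4
d2 = fold(d1, q)                    # [1, 16]: 2 nonzeros now fill dim 2
assert sum(x != 0 for x in d2) == 2
```

After two folds the error is nonzero everywhere, which is the point at which decoding against it becomes ill-defined.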
So essentially, the number of times you can divide and conquer depends on the parameters you're attacking, but for the purposes of this talk, we're going to assume that we divide and conquer three times. The next component of the attack is how to work a solution up from the bottom level all the way up to the top level. In fact, the way our attack works is that we work the solution up one level at a time. So what I'll describe is how to work a solution up from the (n over 2)-dimensional level to the n-dimensional level, from the second-to-top level to the top level. Working the solution up the other levels is an entirely analogous process. Okay, so if you recall the formula for delta plus and delta minus: essentially, we're just adding the left half of delta and the right half of delta. Once again making the assumption that we have kappa bit flips where kappa is much less than n, if we expand delta plus in its BSDR, then the ones and minus ones in it either come from delta L or delta R. So if we're given a solution for delta plus, we can expand it in its BSDR, and then guess which ones and minus ones come from the left half of delta and which come from the right half. What this ends up meaning is that if we have kappa bit flips, we require at most two to the kappa guesses to work a solution up one level. Another thing is that whenever we make a guess for working the solution up, we can verify this guess by plugging it into the sibling instance or the parent instance. So each time we make a guess, we can verify it. What this means is that if we want to work up k levels, then we require at most k times two to the kappa guesses. Of course, there's a small complication when the bit flips in delta L and delta R collide, but for the purposes of this talk, I'm going to ignore it. We do take this into account in the paper.
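The two-to-the-kappa guessing step can be sketched as follows (my own illustration; the digit positions are hypothetical): each nonzero signed digit of delta plus is attributed to either the left or the right half of delta, and each assignment is one candidate that would then be verified against the sibling or parent instance.

```python
from itertools import product

def candidate_splits(digits):
    """Enumerate all 2^kappa ways of attributing the nonzero BSDR
    digits of delta_plus to the left or right half of delta.
    `digits` is a list of (position, sign) pairs."""
    for choice in product("LR", repeat=len(digits)):
        left = [d for d, c in zip(digits, choice) if c == "L"]
        right = [d for d, c in zip(digits, choice) if c == "R"]
        yield left, right

# kappa = 2 bit flips at (hypothetical) positions 3 and 7
splits = list(candidate_splits([(3, +1), (7, -1)]))
assert len(splits) == 4           # 2^kappa candidates to verify
```

Working up k levels repeats this, giving the at most k times 2^kappa guesses mentioned above.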
Okay, so what we have so far is that we've built this divide-and-conquer tree using the structure of the NTT. If we have an oracle that solves one of these bottom-level instances, we know, by guessing, how to work the solution up one level, and we can continue working it up one level at a time until we reach the solution to the top-level instance. The question that remains is: how do we actually solve a bottom-level instance? A good starting point is to compare our instance to an LWE instance. Remember, we're solving a bottom-level instance here, so the dimension of this NTT is going to be small. In order to compare our instance to an LWE instance, it's quite useful to take an inverse NTT of our bottom-level instance. So essentially, we've transformed the problem of decoding a noisy NTT into the problem of decoding a noisy inverse NTT, and for this section of the talk, that's what we'll be doing. Okay, so here's a table of comparisons between our instance and a standard LWE instance. The main comparison I want to highlight is the last one: the fact that delta does not have a small Euclidean norm, whereas the analogous term in LWE does. This is one of the main difficulties in trying to solve our instance using LWE techniques. However, we can begin by looking at how to solve this using LWE techniques and see how far we get, and we'll do that by looking at the bounded distance decoding problem. So briefly, what is this bounded distance decoding, or BDD, problem? Well, essentially you're given a lattice, and the input to the problem is a target vector t along with a radius r, and you're promised that the distance from this target vector to the lattice is at most r. The solution to the problem is the closest lattice point to the target vector. Okay, so how do we embed our instance into some BDD problem, the kind of LWE way?
Well, the first thing to note is that we'll be embedding our n-prime-dimensional instance, where you can think of n-prime as being 32, for example, into a 2 n-prime-dimensional BDD instance. The way this is usually done is that we define the target vector where the first n-prime coordinates are zero and the second n-prime coordinates are given by our noisy inverse NTT. Then we construct the lattice satisfying the linear relation that the inverse NTT of the first n-prime coordinates added to the second n-prime coordinates gives you zero modulo q. Finally, we use some BDD solver to find the closest vector in the lattice, and hope that the offset vector is (delta, s), which is our solution to the bottom-level instance. Okay, so why does this work? Well, we know that delta concatenated with the minus inverse NTT of delta is a lattice point, and the offset from the target to this lattice point is our solution (delta, s). So a perfect BDD solver will return this lattice point if it's the closest one to the target. Essentially, we can guarantee that this is the case if the norm of (delta, s), the norm of our target offset vector, is less than half the length of the shortest vector in the lattice lambda. So to emphasize the point, the main success condition we're aiming to satisfy is that, given a perfect BDD solver, our target offset, which happens to be (delta, s) here, has a norm less than half the length of the shortest vector in lambda. In other words, you want (delta, s) to be short. However, there's an immediate problem here: (delta, s) is not short, because if you look at delta as an integer vector, its components aren't small. The main reason for this is that delta corresponds to bit flips.
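As a sanity check on the embedding, here is a toy sketch of my own (n = 4, q = 17, made-up secret and error; the signs are chosen so the target offset comes out as (delta, s), which may differ from the sign convention on the slide): the target minus the hoped-for lattice point is exactly delta concatenated with s.

```python
def ntt(s, q, gamma):
    n = len(s)
    return [sum(s[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(t, q, gamma):
    n = len(t)
    n_inv, g_inv = pow(n, -1, q), pow(gamma, -1, q)
    return [(n_inv * pow(g_inv, j, q) *
             sum(t[i] * pow(g_inv, 2 * i * j, q) for i in range(n))) % q
            for j in range(n)]

q, gamma = 17, 2
s = [1, 0, 16, 1]                 # small secret (16 = -1 mod 17)
delta = [0, 8, 0, 0]              # one "bit flip" of +2^3 in NTT slot 1
n = len(s)
s_tilde = [(a + d) % q for a, d in zip(ntt(s, q, gamma), delta)]

# BDD target: (0, ..., 0, inverse NTT of s_tilde)
target = [0] * n + intt(s_tilde, q, gamma)
# candidate lattice point; it satisfies intt(first half) + second half = 0
v = [(-d) % q for d in delta] + intt(delta, q, gamma)
assert all((a + b) % q == 0 for a, b in zip(intt(v[:n], q, gamma), v[n:]))
# the target offset is (delta, s) mod q -- short only if delta were short
offset = [(a - b) % q for a, b in zip(target, v)]
assert offset == delta + s
```

With delta's component equal to 8 here, the offset is already not small, which is exactly the problem the next two slides address.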
So a first step in trying to shorten this target offset vector (delta, s) is to consider a base 2-to-the-L SDR of delta, instead of delta itself, in our target offset. In order to do this, we fix L equal to the log of root q. We redefine the lattice by introducing a tensor product, so I'm not going to go into the details here, and we update the target vector. If you go through the same analysis using these updated targets and lattices, the offset vector that we're aiming to make short is now the base 2-to-the-L SDR of delta, along with s. The interesting thing is that each component of the base 2-to-the-L SDR of delta has absolute value at most root q, because we fixed this particular L, whereas the components of the original delta had size up to around q. So essentially this base 2-to-the-L SDR of delta is a shorter vector than delta, even though it lives in a higher dimension. There are two main things to note about this technique. First, the lattice that we're actually trying to solve BDD on has a higher dimension than before: the increase is from 2 n-prime, the dimension of the old lattice, to 3 n-prime, the dimension of this lattice. Second, using this tensor product, we've actually introduced a new class of vectors; for example, one vector in this class is 2 to the L minus one, followed by zeros. The vectors in this class all have a norm of roughly root q. So our offset vector is now the base 2-to-the-L SDR of delta, along with s, but our lattice now contains shortish vectors of length root q. So we haven't achieved our aim yet: our offset isn't shorter than all of the vectors in our lattice.
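The digit decomposition itself can be sketched as follows (my own illustration, ignoring the signs of the SDR digits for simplicity; q = 3329 is Kyber's modulus, used purely as an example): with L set to the log of root q, each delta component splits into two digits of size roughly root q instead of one residue of size up to q.

```python
import math

q = 3329                                # Kyber's modulus (example value)
L = round(math.log2(math.isqrt(q)))     # L = log sqrt(q); here L = 6

def base2L(x):
    """Write 0 <= x < q as d0 + 2^L * d1, both digits below 2^L ~ sqrt(q)."""
    return x & ((1 << L) - 1), x >> L

delta_comp = 2048                       # a single-bit-flip delta, large mod q
d0, d1 = base2L(delta_comp)
assert (d0, d1) == (0, 32)              # both digits ~ sqrt(q) ~ 58 at most
```

Doubling the number of digits per component is what pushes the lattice dimension from 2 n-prime to 3 n-prime.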
Okay, so in order to shorten this target offset further, what we end up doing is deploying a kind of hybrid guessing approach, where we want to shorten the offset vector of base 2-to-the-L SDRs of delta. To do this, we simply guess the upper bits of each symbol in the base 2-to-the-L SDR of delta. Fixing our particular L, each component of delta, when written in the base 2-to-the-L SDR, has two integers attached to it. What we end up doing is splitting these integers into the upper bits, which are the red blocks in this diagram, and the lower bits, which are the yellow blocks. The idea is that we guess the upper bits, the red blocks, we update our target vector, and now our offset vector is in terms of the yellow bits only. So essentially, we've shortened our offset vector without changing the lattice or anything like that. And it turns out that this strategy does solve the problem, more or less. Okay, so all of this was described in terms of having a perfect BDD solver. In practice, the BDD problem is an extremely hard problem to solve, for uniform lattices at least. However, our lattices have some structure attached to them, because they use an NTT in their definition. And as our experiments showed, our lattices stray far from what the theory of uniform lattices tells us. As evidence of this: we had a 96-dimensional lattice, and we obtained a BKZ-90-reduced basis of it. Then, to investigate this non-uniformity of our lattice, we plotted the log lengths of the Gram-Schmidt vectors against the basis vector labels. What you'd expect for a uniform lattice is this blue line; however, for our lattice, we observed the red line, and the difference between the two is quite large.
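Returning to the upper-bit guessing step, here is a small sketch of my own (the digit width L and the high/low split are hypothetical values, and the signs of the SDR digits are again ignored): each base 2^L digit is cut into a high part (the red blocks) and a low part (the yellow blocks); the high parts are enumerated as guesses, and subtracting a correct guess from the target leaves an offset whose digits fit in the low bits only.

```python
from itertools import product

L = 6          # digit width: base 2^L digits (hypothetical value)
HIGH = 2       # number of upper bits guessed per digit (hypothetical)

def split_digit(d):
    """Split a base 2^L digit: d = upper * 2^(L-HIGH) + lower."""
    return d >> (L - HIGH), d & ((1 << (L - HIGH)) - 1)

def upper_bit_guesses(num_digits):
    """All (2^HIGH)^num_digits guesses for the upper-bit blocks; each
    guess, subtracted from the target, leaves (L-HIGH)-bit offset digits."""
    for g in product(range(1 << HIGH), repeat=num_digits):
        yield [x << (L - HIGH) for x in g]

digits = [37, 9]                          # two hypothetical SDR digits
assert [split_digit(d) for d in digits] == [(2, 5), (0, 9)]
guesses = list(upper_bit_guesses(len(digits)))
assert len(guesses) == (1 << HIGH) ** 2   # one BDD enumeration per guess
```

This also makes the cost trade-off visible: every extra guessed bit per digit shortens the offset but multiplies the number of BDD enumerations, which is why the enumeration phase dominates the attack.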
Essentially, what this means is that we can't really rely on standard lattice theory to analyze the performance of our attack. So instead, we actually built and ran a BDD solver in order to understand how our attack performs, and we built the BDD solver using BDD enumeration. Okay, so that concludes the description of the attack. Now, what is the overall complexity? Well, the steps of the attack divide into these natural components. First of all, divide and conquer just asks us to add a few integers together, so the complexity of this stage is fairly trivial. We also had to reduce a basis in order to perform BDD enumeration; however, the lattice basis is fixed over all cold boot instances when attacking a single scheme, so this is essentially done once and for all, because, if you notice, we solve BDD on the same lattice multiple times. The next thing is the BDD enumeration phase, and this actually ends up dominating the complexity of our attack. In particular, this is made worse because, if you remember, in order to shorten our target offset vector, we had to make some guesses of the top bits, and once we make a guess of the top bits, we have to run a BDD enumeration for that guess. So in our attack, we don't run one BDD enumeration; we run many, many BDD enumerations. This is the phase that actually dominates. And finally, working the solution up the tree, although it costs roughly 2 to the kappa operations, doesn't end up dominating our attack in terms of what we found in our experiments, at least not for the kappas that we analyzed. Okay, so here are our experimental results. We got these results by producing 200 cold boot instances for the Kyber and New Hope parameters.
We then varied the various attack parameters and ran experiments using the different configurations, and these are essentially our best figures. In addition to running experiments on our NTT attack, we also estimated how long we would expect a naive cold boot attack to take if the NTT were not used to store the secret key; this is given by the last column in this table. Okay, so the first thing you'll notice is that for Kyber, we analyzed much larger values of rho_0 than we did for New Hope. For Kyber, we went up to a rho_0 of roughly a few percent, whereas for New Hope, all of the rho_0 values we analyzed were much less than 1%. This is essentially because New Hope uses a much larger ring dimension than Kyber, and this change in parameters really does seem to affect our attack. The second interesting thing here is that we can compare the cost of our NTT attack to the non-NTT cost. For Kyber, which is the module LWE-based scheme, we see that there's quite a large gap between the cost of attacking an NTT encoding of a secret and the cost of attacking a non-NTT encoding. However, for New Hope, this comparison is less clear, because sometimes the NTT attack is cheaper and sometimes our estimate of the non-NTT attack is cheaper. So to conclude, we've shown that the structure of the NTT can actually be exploited by cold boot attackers. For Kyber parameters, the NTT, at least for the bit flip rates we looked at, seems to allow a faster attack than the case where the NTT is not used. For New Hope, this phenomenon was not really observed in our experiments, but nonetheless our recommendation for the time being would be that if cold boot attacks are a concern, it's worth not storing your secrets using an NTT. That's not to say you shouldn't use an NTT; you can still use the NTT, just don't store the secret in memory in the NTT domain.
So, yeah, the idea behind this recommendation is to guard against improvements to our attack, because the NTT really is introducing some structure that a cryptanalyst might be able to exploit. Okay, so future directions. The first would be to study how to solve these general LWE-like problems with these strange low-Hamming-weight BSDR secrets. The second would be to try to exploit the rich algebraic structure of the NTT further. As you saw, our lattice was highly non-uniform; it had a lot of structure to it. So, yeah, I think we all think that if we were able to exploit the structure of the NTT further, we could certainly speed up our attack. And that's all I have to say. I've got some references, and I'll be happy to take any questions you have. Thanks. Thank you. We have time for questions. All right, see, they're just references; I didn't want to leave them up. Any question or comment? No? Maybe I have one question. What is the cost, what is the loss, if we don't represent the secret with the NTT? You recommend that if cold boot attacks are a concern, it's worth not storing secrets using an NTT; is there an efficiency loss if we don't? Ah, yeah. So usually, whenever you multiply two polynomials, it's faster to multiply the NTT of one polynomial with the NTT of the second polynomial. So essentially, whenever you want to compute, say, a times s, it's faster if we already have the NTT of a and the NTT of s. However, if you just have a and s, you have to first compute the NTT of a and then the NTT of s, and then do the multiplication. So it does affect the efficiency of some of these schemes, but only slightly. Is it negligible? Well, I'm not sure if it's negligible; I don't really know how it affects the concrete performance of these schemes, actually. Any other question? Comment? No? Please thank the speaker again.