Hi, my name is Joe, and this talk is on our paper that has been accepted to Crypto 2021, "Lattice Reduction with Approximate Enumeration Oracles: Practical Algorithms and Concrete Performance". This is joint work with Martin Albrecht, Shi Bai and Jianwei Li.

Before we start, I'd like to give you a high-level summary of what this talk and this work are ultimately about. This is a talk about blockwise lattice reduction, and in particular we focus on the role of an algorithm called enumeration inside these blockwise reduction algorithms. Crucially, what we're really going to focus on is how changing the way these enumeration oracles behave inside a blockwise reduction algorithm can change the running time by an exponential factor.

There is some clear motivation for this work: lattices are the basis of five out of the seven remaining finalist schemes in the NIST PQC process. Understanding the difficulty of solving the problems upon which these schemes are based, such as LWE variants and NTRU, is crucial for setting cryptographic parameters. If we don't truly understand how hard these problems are to solve, then we may set parameters either too large, compromising on efficiency, or too small, compromising on security. And although there are many approaches to solving these problems, the best practical attacks against these schemes require blockwise lattice reduction.

Before we begin, we'll go over some preliminaries. If we have some set B comprised of linearly independent vectors b_0 through b_{n-1}, then we refer to the set of all integer linear combinations of the vectors in B as the lattice L(B), or simply L. You may also know B by a different name: it is a basis of L, but for the sake of this talk we shall just refer to it as B. Lattices have an associated invariant known as the volume, which is efficiently computable from B when the vectors are linearly independent. The volume also has a nice geometric interpretation: it corresponds to the density of the lattice. If the volume is small, the lattice is very dense, whereas if the volume is large, the lattice is rather sparse.

When thinking about solving cryptographic problems, typically speaking we focus on the shortest vector problem. In this work, we focus on a variant of the shortest vector problem known as Hermite SVP: to solve δ-approximate Hermite SVP on an n-dimensional lattice L, we must find a vector v of norm at most δ · vol(L)^(1/n). Typically speaking, one might consider δ = sqrt(n/(2πe)), because this is exactly the leading factor predicted by the Gaussian heuristic. Because this value depends solely on the dimension of the lattice, for the rest of this talk we'll simply refer to it as GH(n).

Just to give you a pictorial representation of what a lattice looks like: on the left-hand side here we have a regularly repeating set of points, and on the right-hand side a picture that shows the shortest vector in this lattice. Geometrically speaking, you can view the volume as the area of one of the squares that appear on the slide.
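To make the Gaussian heuristic concrete, here is a small Python sketch, offered as my own illustration rather than anything from the paper, that compares the exact ball-volume formula for GH(n) on a unit-volume lattice against the sqrt(n/(2πe)) leading-order approximation mentioned above.

```python
import math

def gh_exact(n, vol=1.0):
    # Gaussian heuristic radius: the radius of an n-ball whose volume equals
    # vol(L), i.e. GH = (vol / V_n(1))^(1/n), where V_n(1) is the unit-ball
    # volume pi^(n/2) / Gamma(n/2 + 1). Computed in log space for stability.
    log_unit_ball = (n / 2.0) * math.log(math.pi) - math.lgamma(n / 2.0 + 1.0)
    return math.exp((math.log(vol) - log_unit_ball) / n)

def gh_approx(n, vol=1.0):
    # Leading-order (Stirling) approximation: sqrt(n / (2*pi*e)) * vol^(1/n).
    return math.sqrt(n / (2.0 * math.pi * math.e)) * vol ** (1.0 / n)

for n in (50, 100, 500):
    print(f"n={n:3d}: exact GH = {gh_exact(n):.4f}, approx = {gh_approx(n):.4f}")
```

The two values converge quickly as n grows, which is why the talk treats GH(n) and sqrt(n/(2πe)) interchangeably.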
So for example, if you took the origin and the point closest to it on the right-hand side, these form one side of a square, and the area of that square is the volume of the lattice.

When attempting to solve HSVP, typically what one does is replace the exact problem with an approximation. The typical way to do this is to embed an exact HSVP solver inside some blockwise reduction algorithm. The intuition behind why one might do this is that you're replacing the basis by a set of short, nearly orthogonal, linearly independent vectors: you're trying to balance out the lengths across all of the vectors in your basis. In particular, we take a basis for some lattice of rank n and decompose it into projected sublattices of rank k. The way we form these projected sublattices is that, for a projected sublattice L_[i:j], we project the basis vectors b_i up to b_j orthogonally against the span of the previous vectors in the basis. This provides a very powerful time/quality trade-off. If the rank of our projected sublattices is close to the rank of the lattice at large, then the output basis is more likely to be short and nearly orthogonal. By contrast, if we make k rather small, then the problem will be faster to solve, because it's less challenging, but the quality of the output basis is unlikely to be as good as we would want. So there's a real trade-off here between how large we choose these blocks, to ensure that our output basis looks a certain way, and the running time we're willing to pay.

One way to gauge the quality of these algorithms is a measure called the root Hermite factor, or RHF. Put briefly, the root Hermite factor of a basis of some lattice L is (||b_0|| / vol(L)^(1/n))^(1/(n-1)): the length of the first vector in the basis, divided by the length we might expect if the norms of all of the vectors in the basis were evenly distributed, all raised to the power 1/(n-1). We raise to the power 1/(n-1) so that the quantity is independent of the rank of the lattice, which lets us compare across many different lattices. (A small measurement sketch follows below.)

One blockwise reduction algorithm that features quite a lot in the literature is called BKZ. What BKZ does is divide the basis up into blocks of size k: you first find the shortest vector in the first block of k vectors, then shift along by one, projecting, and find the shortest vector in that projected sublattice, and you continue this process many times until you end up with a basis that looks a certain way. There are some theoretical guarantees on how BKZ operates, but you can essentially think of each block as having the shortest vector of that block in its first position. It can also be useful to run BKZ on these projected sublattices in a recursive sense, so that you have some guarantees about the shape of the basis, i.e. how the norms are distributed, before you start doing the rest of the work. But again, this is another trade-off that needs to be considered, because pre-processing also comes with a cost.

Once you have a blockwise reduction algorithm such as BKZ, you do need an exact solver to reduce the blocks. In this work we consider an algorithm known as enumeration. Enumeration works by walking across all of the lattice points inside a ball of some radius r.
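Before moving on to the details of enumeration: the root Hermite factor defined above can be measured directly. The following sketch uses the fpylll Python bindings (assuming fpylll is installed; the rank, block size and matrix parameters are illustrative choices of mine, not the paper's experiments):

```python
from math import exp, log

from fpylll import BKZ, GSO, IntegerMatrix, LLL

n, k = 80, 20                                   # lattice rank and BKZ block size
A = IntegerMatrix.random(n, "qary", k=n // 2, bits=30)
LLL.reduction(A)                                # cheap pre-reduction
BKZ.reduction(A, BKZ.Param(block_size=k))       # blockwise reduction, blocks of size k

M = GSO.Mat(A)
M.update_gso()
# vol(L) is the product of the Gram-Schmidt norms; get_r(i, i) is the squared norm
log_vol = sum(log(M.get_r(i, i)) for i in range(n)) / 2.0
# RHF = (||b_0|| / vol(L)^(1/n))^(1/(n-1)), computed in log space
rhf = exp((log(A[0].norm()) - log_vol / n) / (n - 1))
print(f"root Hermite factor: {rhf:.5f}")
```

Larger block sizes k drive the printed RHF closer to 1, at the cost of a longer running time, which is exactly the trade-off described above.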
So we're looking for all of the lattice points inside a ball with a particular radius. If you center this ball at the origin, this has the same effect as making sure that all of the vectors inside the ball have length at most r. Without going into too much detail about how this works: you start with the final vector in the basis, produce all combinations of that vector that can lie inside the ball, and then gradually work your way across the rest of the basis until you have covered all of the points inside the ball. This algorithm has been very well studied, and to date the best known variants of this algorithm, run inside BKZ, take time 2^(k log k / 8) multiplied by some singly exponential factor. But crucially, they also run in space polynomial in the rank of the lattice. In this work we focus on improving this singly exponential factor without compromising on the root Hermite factor: we want to make sure that, no matter what we do here, we maintain the quality of the basis output by BKZ.

One classic approach to doing this is known as pruning. Intuitively, the previous explanation considered finding all of the vectors in the intersection of the ball and the lattice, but not all of the paths in the enumeration tree are likely to lead to solutions: the enumeration tree is very large, and we don't expect there to be that many short vectors. A typical approach, therefore, is to cut off some of the paths that we don't need, and this is referred to as pruning: we simply cut off branches of the enumeration tree that are unlikely to lead to short vectors. There are many different approaches to pruning, but the most popular is referred to as extreme pruning, and it can provide a speed-up that is exponential in the dimension of the lattice. We'll come back to how extreme pruning works later, but the idea is that one sets the pruning parameters rather aggressively, so you expect to cut many, many paths. Whilst this drops your success probability exponentially, the cost of re-randomizing your basis and then running pruned enumeration over it again is less than the speed-up that one gets, and thus you still end up ahead even though you're repeatedly randomizing the basis, re-reducing it, and running pruned enumeration over it once more.

Now, there are other approaches to this problem, and a rather natural one is to merely loosen the requirements on the solution; in particular, it may be wise to accept a longer vector from our enumeration oracle. We refer to this process as relaxation. Just to give you an example, we might accept a vector that is at most a factor α longer than is predicted by the Gaussian heuristic, where α is some constant greater than one. This was recently studied by Li and Nguyen, who showed that if one relaxes enumeration inside BKZ, then you can get a speed-up of around (4α²(1-ρ))^(k/4), where ρ denotes a pruning parameter in the range 0 to 1. As ρ approaches 0, the pruning becomes more aggressive, but the speed-up also increases. One disadvantage here, though, is that we're accepting a vector that is further away from being ideal, and this can leave our final basis less strongly reduced.
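To get a feel for the size of the relaxation speed-up just quoted, here is a short sketch that simply evaluates the (4α²(1-ρ))^(k/4) factor, taken as stated above, for a few parameter choices; it illustrates the formula's growth, not a re-derivation of the Li and Nguyen bound.

```python
from math import log2

def relaxation_speedup_bits(k, alpha, rho):
    # log2 of the (4 * alpha^2 * (1 - rho))^(k/4) speed-up factor
    return (k / 4.0) * log2(4.0 * alpha ** 2 * (1.0 - rho))

for alpha in (1.05, 1.2):
    for rho in (0.5, 0.1, 0.01):
        for k in (100, 300, 500):
            bits = relaxation_speedup_bits(k, alpha, rho)
            print(f"k={k:3d} alpha={alpha:.2f} rho={rho:.2f}: speed-up ~ 2^{bits:6.1f}")
```

Note how both growing α and shrinking ρ push the exponent up, matching the qualitative description above.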
And so the approach we take in this work is to ask: what happens if we increase the size of the blocks to accommodate the loss in quality that we get by using these approximate oracles? As before, we let α denote the approximation factor that we're willing to accept. Our approach is to replace enumeration in a block of size k by relaxed enumeration across a block of some larger size, which we denote k_α. In particular, k_α should be chosen such that the root Hermite factor does not increase. In other words, we numerically solve equation (2) at the bottom of the page, looking for a value k_α such that, when we find a vector of length α · GH(k_α), suitably normalized, we do not increase the root Hermite factor. This k_α can be uniquely determined: the way to solve this is to set the equation up as an equality and then round the value of k_α up to the nearest integer. (A small numerical sketch of this search follows below.)

Our main result for this work is that the speed-up from relaxed enumeration is enough to overcome the extra cost from increasing the block size. This speed-up holds both for fixed pruning parameters, i.e. when we choose ρ ourselves, and for the numerically optimized pruning parameters chosen by the FPLLL library. For those of you who aren't aware, FPLLL is a lattice reduction library that has been used in many different cryptographic works, and one of its crucial features is that it sports a numerically optimized pruner that, given some basis, will compute pruning parameters that optimally trade off speed and success probability.

To go into a little more detail: if we assume that enumeration in rank k costs 2^(c_0 k log k + c_1 k + c_2) many polynomial-time operations, where c_0, c_1 and c_2 are constants, then for some fixed pruning parameter ρ we can solve α·GH(k_α)-HSVP in rank k_α faster than solving GH(k)-HSVP in rank k, by a factor of at least equation (3). To guide you through this a little: it's the term on the left-hand side that really matters here, this α^((1/2 - c_0·η)k). Here η is some parameter lying inside the range shown on the slide, but its precise value isn't as important as the fact that we get a factor of α raised to a power linear in k at the front. Our proof technique works by bounding how large k_α can be compared to k: in particular, we don't expect k_α to be much larger than k, and as a result we can prove some inequalities on how large the enumeration tree in rank k_α will be relative to enumeration in rank k.

Just to show you some results quickly about how this works. In this table, T_α denotes the time for solving α·HSVP in rank k_α, and T_1 is the time that it takes us to solve GH-HSVP in rank k. We've chosen a fixed pruning parameter ρ = 0.01. In this table you can see the differing values of α on the left-hand side; in the middle, the time complexities for running enumeration in rank k_α to solve α·GH-HSVP; and in the third column, how this compares to the baseline. You can see that we get an exponential speed-up as α increases, which is exactly what our previous theorem showed. But crucially, this does not yet approach the state of the art.
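As a sketch of how k_α can be found numerically, the following is my own reconstruction of the search described above, using the sqrt(k/(2πe)) approximation of GH on a unit-volume lattice and the root Hermite factor with exponent 1/(k-1); the paper solves its equation (2) exactly, so treat the details here as assumptions.

```python
from math import e, log, pi

def log_gh(n):
    # log of GH(n) ~ sqrt(n / (2*pi*e)) for a unit-volume rank-n lattice
    return 0.5 * log(n / (2.0 * pi * e))

def log_rhf(n, alpha=1.0):
    # log of the root Hermite factor achieved by finding a vector of length
    # alpha * GH(n) in rank n: (alpha * GH(n))^(1/(n-1))
    return (log(alpha) + log_gh(n)) / (n - 1)

def k_alpha(k, alpha):
    # smallest rank k_a >= k such that relaxed (factor-alpha) enumeration in
    # rank k_a does not increase the RHF of exact enumeration in rank k
    target = log_rhf(k)
    ka = k
    while log_rhf(ka, alpha) > target:
        ka += 1
    return ka

for alpha in (1.05, 1.1, 1.2):
    print(alpha, [(k, k_alpha(k, alpha)) for k in (100, 200, 400)])
```

The output shows that k_α stays fairly close to k even for α = 1.2, which is the bound the proof technique above relies on.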
So as I previously mentioned, the leading term in the state of the art is k log k / 8, whereas here we have k log k / (2e). This second-order term is also undesirable, simply because the extreme pruning approach in practice provides a better second-order term. So the natural question now is whether we can do better, and the answer is yes, you can.

Firstly, a reasonable approach is to try to use FPLLL to compute the pruning parameters, because it may well be that for some fixed value ρ, the pruning parameters that are chosen are not optimal. Secondly, we can also try to integrate the super-exponential improvements from the work cited here at the bottom, which I'll refer to as ABF+20 for the rest of this presentation. The crucial part of ABF+20 is that they provide an algorithm that does achieve the state of the art, with a leading 1/8 term. Thankfully, these ideas appear to be somewhat orthogonal, and we found that we can achieve a speed-up over the state of the art when using numerically optimal pruning parameters together with the improvements from ABF+20, whilst achieving the same root Hermite factor.

Here we have some simulated costs for achieving the same root Hermite factor, which in particular is k^(1/(2k)). You can see from the simulated costs, with our line fit, that for α = 1.05 we end up with a cost of k log k / 8 - 0.596k + 12.09. We have a similar cost for α = 1.2, but you'll notice that the coefficient in front of the lower-order k term has decreased, and so we have achieved an exponential speed-up. Of course, we also need to be careful because, as I've previously mentioned, when we relax enumeration we can end up with a lower-quality solution. The way we deal with this is to increase the rank of our enumeration oracles inside BKZ. In particular, we define a new BKZ variant that extends the BKZ variant from ABF+20 simply by adding α as an additional parameter. As a further speed-up, we also use our relaxed oracles during the pre-processing stage, when it is advantageous to do so. The subtlety here is that when one pre-processes a block, it is typical to shrink the block size quite a bit. So if you're pre-processing a block by using BKZ and you shrink the block size too much, relaxation may no longer benefit you. As a result, when it is no longer advantageous to do so, we simply use the default setting; when it is advantageous, we use these relaxed oracles. There are lots of parameters that can be customized in this BKZ variant, and so to avoid going into detail here, we provide a full table of configurations in the paper.

This is just to show how this work and ABF+20 have moved the state of the art in the last few years. As reported in ABF+20, the running time provided by the FPLLL library satisfies k log k / (2e) - 0.995k + 16.25. ABF+20 from last year achieves a running time of k log k / 8 - 0.547k + 10.4. In this work, we do not move the leading term at all, but we do exponentially accelerate the enumeration oracle by improving the second term from -0.547k to -0.654k. This is all for achieving a root Hermite factor of k^(1/(2k)), which is exactly what we would expect from BKZ asymptotically.
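Taking the three fits just quoted at face value, one can compare where each model reaches a given cost. In the sketch below, all three expressions are read as log2 of the operation count; the constant term for the -0.654k fit is not stated on this slide, so I reuse the 10.4 from ABF+20 purely as a placeholder assumption.

```python
from math import e, log2

# log2(cost) models as quoted in the talk
MODELS = {
    "FPLLL baseline": lambda k: k * log2(k) / (2 * e) - 0.995 * k + 16.25,
    "ABF+20":         lambda k: k * log2(k) / 8 - 0.547 * k + 10.4,
    # constant term below is a placeholder, not stated on the slide
    "this work":      lambda k: k * log2(k) / 8 - 0.654 * k + 10.4,
}

def crossover(model, bits=256.0):
    # smallest rank k at which the modelled cost reaches 2^bits
    k = 50
    while model(k) < bits:
        k += 1
    return k

for name, model in MODELS.items():
    print(f"{name:15s}: 2^256 reached around rank {crossover(model)}")
```

Under these assumptions, the 2^256 crossover sits near rank 400 for the FPLLL baseline and moves toward rank 500 for the newer fits, consistent with the graph discussed next.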
You can see here in this graph that we have managed to lower the cost of lattice reduction in certain regimes. For example, if you consider 2^256 operations, you'll notice that our work moves the crossover point to around rank 500, whereas just two years ago it was at around rank 400. Indeed, our improvement over ABF+20 is larger than ABF+20's improvement over the baseline provided by FPLLL.

There is one further takeaway from this, and that is that this work does not utilize the regular re-randomization provided by extreme pruning. As I mentioned earlier, extreme pruning has the property that one sets the pruning parameters aggressively and, if a solution is not found, re-randomizes the basis and repeats the process. Our work doesn't do this, and in particular we believe this is due to the expensive pre-processing that we need to ensure our algorithm works. So we may yet find further speed-ups if it is possible to reconcile these strategies: if pre-processing can somehow be preserved whilst doing this regular re-randomization, then a further speed-up over our approach may be possible. Just to demonstrate this via a graph: you can see here that the expected number of solutions, and thus the success probability, is roughly constant as the rank of the lattice increases. It hovers around one, with some noise, and this is the same for all values of α. The expected number of solutions is essentially a proxy measurement for the success probability: with extreme pruning, we would expect the expected number of solutions to decrease as the rank increases, because the success probability goes down. That is not what we observe here, and so there may still be some speed-ups to be had by forcing this number of solutions to decrease as we increase the rank.

But despite all of this, this work does not invalidate any existing NIST security claims. Although we may have moved the crossover for enumeration at 2^256 operations from around rank 400 to around rank 500, sieving, which runs in singly exponential time, is still far faster than enumeration, and none of the speed-ups in this work are enough to lower the security claims of any of the lattice-based NIST candidates. With that, I would like to say thank you for your attention, and I look forward to answering any questions you might have on Zulip.