Hi, my name is Fernando and I will be presenting on the success probability of solving unique SVP via BKZ. In our work, we investigate the cost of solving the learning with errors problem via lattice reduction. Our contributions are the following. We extend previous work by Alkim et al. and Dachman-Soled et al. to estimate the cost of low-probability attacks. We experimentally verify the resulting model against Gaussian secrets, but also against binary and ternary secrets and errors, as sometimes found in the literature. We also explain some low-probability attacks that were reported in the previous literature by Albrecht et al. And we finally determine the effect that these low-probability attacks could potentially have on security estimates for cryptosystems.

We now present some preliminaries. We look in particular at the search variant of the learning with errors problem, where there is a fixed secret vector s of dimension n, whose coefficients have been sampled from some distribution, and an error vector e of dimension m, whose coefficients have also been sampled from some distribution. Usually the two distributions for s and e are discrete Gaussians. Then we have a matrix A with entries sampled uniformly over Z_q. We are given the pair (A, As + e mod q) and the task of recovering the secret s. One possible way of doing this is via the primal attack strategy, where we construct a lattice containing a target vector t, which is essentially the concatenation of the vector s, the vector e and the number one. By construction, our lattice contains t as the unique shortest vector, up to sign. Then we use lattice reduction to recover this unique shortest vector. Essentially, this reduces the learning with errors problem to the unique shortest vector problem. When looking at this strategy there are two immediate questions. First, how do we go about costing the attack? And second, what happens if we change some of the parameters, for example the probability distribution from which s and e are sampled?

We now define some more useful objects. Given linearly independent vectors b_1, ..., b_d of dimension d, we say that these form a basis of the full-rank lattice Lambda composed of the integer linear combinations of the vectors b_i. Given a basis, we can run the Gram-Schmidt orthogonalization algorithm to obtain vectors that we denote b_i*. With these we define the basis profile: the set of the norms, or in this case the squared norms, of our Gram-Schmidt orthogonalized vectors. Finally, given a basis we define a projection operation pi_k that maps any vector of the d-dimensional space onto the subspace orthogonal to the first k-1 vectors of the basis. Given a lattice basis and a reduction algorithm A, it is possible to predict the resulting basis profile for the reduced basis. There are two ways of doing this. The first is the geometric series assumption, which states that for every algorithm A there is some coefficient alpha between 0 and 1 such that, if we use that algorithm to reduce a basis, its profile follows a geometric series with alpha as the multiplicative coefficient.
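As a side illustration, here is a minimal sketch, assuming numpy is available, of how one could compute a basis profile from a basis and the corresponding GSA line for a hypothetical coefficient alpha; this is only meant to make the two definitions concrete and is not the code used in the paper.

```python
# Basis profile via Gram-Schmidt (through a QR decomposition) and a GSA line.
import numpy as np

def basis_profile(B):
    """Log of the squared Gram-Schmidt norms of the rows of the basis matrix B."""
    # The absolute diagonal of R in a QR decomposition of B^T gives the
    # Gram-Schmidt norms of the rows of B.
    _, R = np.linalg.qr(B.T)
    return 2 * np.log(np.abs(np.diag(R)))

def gsa_profile(first_norm, alpha, d):
    """GSA prediction: |b*_i|^2 = alpha^(i-1) * |b_1|^2, a geometric series in i."""
    return 2 * np.log(first_norm) + np.arange(d) * np.log(alpha)
```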
The other way is to use an algorithm-specific simulator, such as the BKZ simulator by Chen and Nguyen, which can predict the effect of running multiple tours of the reduction algorithm on a given basis, outputting the predicted profile of the basis after, say, tau tours. Now, I mentioned tours of BKZ, and we will see that these are the main steps in the lattice reduction algorithms we analyze in this work, so I will briefly explain what a BKZ tour is. On the left you can see pseudocode describing what a BKZ tour does, and on the right you can see the basis profile for a certain lattice. On the right we are plotting on the x-axis the index of the basis vector, and on the y-axis a log plot of the norm of its Gram-Schmidt vector. What a BKZ-beta tour does is first look at the first beta vectors of the lattice basis and at the sublattice spanned by (projections of) those vectors, and call an oracle that solves the shortest vector problem exactly in this sublattice. If it finds a vector shorter than the current first vector, it inserts that vector at the beginning of this local block of dimension beta and then runs some LLL to remove linear dependencies. Then we move our attention one index down to the right and repeat this process on the next beta coefficients, which overlap with the previous block: the shortest vector problem oracle is called, a shortest vector is found, and it is inserted at the beginning of the local block. This is repeated multiple times, and as it is repeated we can see that the lattice basis profile changes and its slope grows a little. Then we get to the end of the basis, and at that point it is not possible to keep moving to the right, so instead one reduces the dimensionality of the projected sublattice being searched for short vectors. After enough tours are run, the basis profile converges toward what the geometric series assumption predicts for our lattice and our algorithm. Here, for example, we can see in blue the basis profile of our reduced basis, and in orange, almost exactly below the blue line, the profile that the GSA had predicted BKZ would output.

The BKZ-beta tour, as we said, is a fundamental component of blockwise lattice reduction, and from it we can define two different algorithms. First we have BKZ-beta, where in our definition we run a fixed number of tours using a fixed block size beta. The other algorithm is progressive BKZ, where we run BKZ tours with an increasing block size: again we have an input parameter tau, and basically this says, for every block size starting at 3, run tau tours of BKZ-beta. If we were using one of these two algorithms as part of the primal attack, we would not necessarily wait for the algorithm to terminate naturally; rather, after every BKZ-beta tour we would check whether we have found a solution to the learning with errors instance we are trying to solve.

Going back to our question, we are trying to see what the cost of solving the learning with errors problem is. Our objective, as a reminder, is to recover the vector t inside the lattice, and in practice the recovery of t follows directly from recovering an orthogonal projection of t during a BKZ-beta tour.
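To make the two reduction strategies and the early-abort check concrete, here is a minimal sketch; bkz_tour and secret_recovered are hypothetical helpers standing in for an actual lattice reduction library and for the LWE solution check, not functions from the paper's code.

```python
# Sketch of BKZ-beta with tau tours and of progressive BKZ, both aborting as
# soon as the LWE secret is found, as done inside the primal attack.

def bkz(basis, beta, tau):
    """Run up to tau tours of BKZ-beta, aborting early if the LWE secret appears."""
    for _ in range(tau):
        basis = bkz_tour(basis, beta)       # one tour: SVP calls on sliding blocks + LLL
        if secret_recovered(basis):
            return basis, True
    return basis, False

def progressive_bkz(basis, beta_max, tau):
    """Run tau tours for every block size 3, 4, ..., beta_max."""
    for beta in range(3, beta_max + 1):
        basis, solved = bkz(basis, beta, tau)
        if solved:
            return basis, beta              # report the winning block size
    return basis, None
```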
The analysis for this was done by Alkim et al. in the New Hope paper, and the idea is essentially the following: since recovery of t follows from recovery of this projection, and recovering this projection requires all the vectors in the basis to be accounted for, this will most likely happen when we are running the SVP oracle over the last beta indices of our basis, after it has almost reached the geometric series assumption profile. So really, if we want to solve the problem, we want this projection to be the shortest vector in that subspace. Therefore, what we do is estimate the length of the projection of the target vector, estimate what the lattice basis profile is going to be after reduction, and try to choose a block size, which we here denote by beta-star to mark it as the prediction made using this model, such that the projection of our target vector is shorter than what the GSA says we will find at that index in the basis. If it is, the SVP oracle will recover this projection and place it into our basis, and this leads to recovery of the full secret vector.

This approach by Alkim et al. was originally experimentally verified by Albrecht et al. in 2017. What the authors did was set up some search LWE instances for various secret dimensions, compute the predicted optimal block size for solving the LWE problem using the Alkim et al. analysis, and then try to solve the problem using the primal attack strategy. Indeed, in this table we can see on the top row that if one chooses exactly the optimal block size, one is able to recover the LWE solution with very high probability. However, they also observed that if one chooses a smaller-than-optimal block size, which the theory would not directly predict to be successful, it is still possible to recover a solution to the LWE problem with relatively high success probability. In this table we show the results where the gap between the block size and the optimal block size beta-star was 5, 10 or 15, relatively small. However, it is not a priori clear whether this gap could grow as the secret dimension increases. If that were the case, one could imagine that for cryptographically sized parameters bigger gaps might still allow high-probability attacks. This of course could be a problem, because the shortest vector oracle used inside the BKZ-beta tour is the main source of complexity in the attack, this complexity is exponential in the block size, and so reducing the block size too much would lead to a significantly cheaper attack.

Our main contribution in this paper is to extend the work by Alkim et al. and by Dachman-Soled et al. to predict exactly these success probabilities for lower-than-expected block sizes. Essentially, we simulate the probability of solving the unique shortest vector problem instance that is part of the primal attack strategy as the lattice basis is being reduced. So, more or less, we take a description of an LWE instance and start accounting for the probability of having solved it so far. We say: okay, we are running tau tours of BKZ, so for each tour from one to tau we first simulate, using a BKZ simulator such as the one by Chen and Nguyen, what our basis profile is going to be at this point in time.
Then we look at the probability that, at this point, where we have not yet reached what the geometric series assumption takes to be the state of a reduced basis, we are able to recover the projection of the target vector t that would lead to a solution. This can be done by computing the probability that this projection is the shortest vector in the local block being reduced at that moment. We model the squared norms of these projections as a chi-squared distribution scaled to match the variance of the LWE secret distribution. We then accumulate the probabilities of having found this short projection and move back to the first step of the loop: assuming we have not yet found the short vector, we run another tour, use the BKZ simulator again to see what the state of the basis will be at the end of the second tour, compute the probability of winning at the second tour, accumulate it, and so on for all tau tours. Finally, we return what is essentially the probability of solving LWE with block size beta or smaller. For progressive BKZ the idea is exactly the same, but we increase beta as the tours go, and we simulate tours until the probability of solving LWE is essentially one. There is a gotcha here, in that we assume that the probabilities we accumulate are for independent events. This seems to be close enough to true, because the BKZ tours are not just looking for this projection of the target vector: they are also re-randomizing the basis. Although there are some exceptions that we address in the paper, overall this seems to be a valid simplification.

Next, we ran experiments to verify whether our algorithms, which we call uSVP simulators, can successfully predict the probability of solving LWE given a certain block size and a certain algorithm. We chose LWE parameters that were expected to require a block size of around 60 to be solved, where we assume LWE to be parametrized such that the secret and the error terms are sampled from a Gaussian distribution with a certain standard deviation sigma. In this table we summarize the parameters we used, and we also show the expected successful block size if we were to follow the Alkim et al. analysis. We then ran mainly two batches of experiments: in one we used a discrete Gaussian distribution to sample error and secret, and in the other we instead attacked instances of LWE with either binary error and secret or ternary error and secret, because we also wanted to see whether the analysis holds in those cases, where the distribution is much more concentrated than in the discrete Gaussian case. Indeed, we reuse the same secret dimensions n and the same q, and we observe that a ternary secret distribution has standard deviation square root of two thirds, just like the discrete Gaussian we chose in the case of secret dimension 100, and the centered binary distribution that we describe in the paper has standard deviation one, just like the Gaussian we chose for secret dimensions smaller than 100.
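As a rough illustration of the loop just described, here is a minimal sketch of the probability accumulation for plain BKZ-beta, assuming scipy is available; bkz_simulate is a hypothetical stand-in for a Chen-Nguyen style BKZ simulator returning the predicted squared Gram-Schmidt norms after one more tour. Note that the only information about the secret and error distribution it uses is the standard deviation sigma.

```python
# Sketch of the uSVP success-probability accumulation under the independence assumption.
from scipy.stats import chi2

def p_solve(profile, beta, sigma, tau):
    """Probability of solving the uSVP instance within tau tours of BKZ-beta."""
    d = len(profile)
    p_fail = 1.0
    for _ in range(tau):
        profile = bkz_simulate(profile, beta)   # hypothetical: predicted |b*_i|^2 after one tour
        bound = profile[d - beta]               # squared GS norm at index d - beta + 1
        # model |pi_{d-beta+1}(t)|^2 as sigma^2 times a chi-squared with beta degrees of freedom
        p_tour = chi2(beta).cdf(bound / sigma**2)
        p_fail *= 1.0 - p_tour                  # accumulate, treating tours as independent
    return 1.0 - p_fail
```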
This allows us to compare directly the cost of solving LWE with a discrete Gaussian secret and error against a binary or ternary secret and error. We would like to point out that the uSVP simulators cannot see the difference: both rely essentially only on the standard deviation of the secret and error distribution, and so they predict the same hardness for the two different problems. We also ran multiple other variants of these experiments. We will now look at some results, but more can be found in the paper.

First we look at Gaussian error and secret, where we reduce the basis using BKZ and progressive BKZ. These are many plots, but they are essentially the same plot repeated for different secret dimensions. What we show, for BKZ run for only five, ten or fifteen tours, and for progressive BKZ where every block size is used once, is a dashed line giving what the uSVP simulator predicts to be the success probability of solving LWE with that algorithm and with a block size smaller than or equal to beta. On the x-axis we have beta and on the y-axis we have this probability. The crosses are the experimental frequencies with which these algorithms solve LWE. From what we can see, the simulators predict relatively well that this probability does not jump instantly from 0 to 1 at exactly the expected value of beta, but rather grows gradually. They also successfully predict that it is more likely to solve LWE using 15 tours of BKZ than using only five tours; in other words, if one only ran five tours of BKZ, one would need a larger block size to reach the same success probability.

Then we look at the case of binary and ternary error and secret. For simplicity we only plot progressive BKZ here, in three different variants where every block size is used once, five times or ten times. Again we can see that the predictions from the uSVP simulator more or less match what we find experimentally: the probability of winning with a certain block size or smaller grows as the expected block size is approached, and does not simply jump from 0 to 1. Something interesting is that the dashed lines, which are the predictions, are not aware of the fact that the secret is binary or ternary; they are only aware of the standard deviation of the distribution from which the secret and error were sampled. Therefore it would appear that, ignoring possible combinatorial attacks such as the hybrid attack, if one were to take, say, a binary learning with errors instance and run the primal attack without exploiting the binary structure of the error in any combinatorial way, then the hardness is the same as if the learning with errors instance had a discrete Gaussian error and secret with a variance matching that of the binary distribution.
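As a tiny sanity check of the variance matching mentioned above, assuming numpy and assuming the ternary distribution is uniform on {-1, 0, 1}, as its stated standard deviation sqrt(2/3) suggests, one can verify the matching numerically; a simulator that only sees sigma then treats such an instance exactly like one with a Gaussian of sigma = sqrt(2/3).

```python
# Empirical variance of a uniform ternary distribution versus the matched Gaussian variance.
import numpy as np

rng = np.random.default_rng(0)
ternary = rng.integers(-1, 2, size=1_000_000)   # uniform samples on {-1, 0, 1}
print(ternary.var())                            # ~ 0.666..., i.e. 2/3
print(2 / 3)                                    # variance of the Gaussian with sigma = sqrt(2/3)
```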
Now we would like to look at one particular effect that was somewhat disturbing our predictions. The plot we are seeing here is similar: on the x-axis we have the block size and on the y-axis the probability of solving LWE with a block size smaller than or equal to beta. In this case we are looking at secret dimension 72, the smallest that we use, and we are using progressive BKZ to solve Gaussian LWE instances. We can notice that, although the prediction by the uSVP simulator is generally located around what the experiments find, many experiments seem to succeed at solving the problem with a significantly smaller block size, on the left, and some experiments seem to have a lower than expected success probability, on the top right. We believe we found the cause for this discrepancy, and the cause is essentially the effect of sample variance.

When we are given an LWE instance to solve, we are essentially building a lattice basis that contains, as we said, a target vector t sampled from a certain distribution; in our case this was the secret and error distribution, for example a Gaussian distribution with variance 1. The coefficients of t are independently and identically distributed samples from this Gaussian. Now, while the Gaussian theoretically has variance 1, in practice we are only given a finite number of samples from this distribution, namely the coefficients of the target vector. These coefficients have a certain sample mean, the mean value of the coefficients, and we can also define their sample variance, which plays the role of the variance of the distribution the coefficients were sampled from, in the sense that we expect the two to be the same. However, for a particular instance it will not necessarily be the case that the sample variance exactly matches the variance of the secret distribution. Since the norms of the projections of the target vector really depend on the sample variance, because the target vector has already been sampled by the time the learning with errors instance is given to us, our simulations will be off if the sample variance is not close to the variance of the secret distribution.

To verify this theory, namely that the sample variance being off causes our predictions to be off as well, we reran these experiments, but specifically sampled LWE instances whose sample variance was at most 2% away from the variance of the secret and error distribution. This way we know that the sample variance will not throw off our predictions for the projections of the target vector. What we can see in this plot is that indeed, for the same predictions, once the sample variance is controlled, the success probability matches our predictions significantly better. A good note about this: it might seem artificial to limit the LWE instances to "good" LWE instances where the sample variance is close to the distribution's variance, but the sample variance itself has a variance that decreases with the dimension of the problem, so for cryptographically sized instances we do not expect the sample variance to deviate significantly from the variance of the secret and error distributions.

Finally, we used the uSVP simulators to also explain the success probabilities reported in Albrecht et al., and indeed our simulator seems to explain why, for some slightly smaller block sizes, they could see non-negligible success probabilities with BKZ.
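A quick numerical illustration of this last point, assuming numpy: the spread of the sample variance of a Gaussian vector shrinks as the dimension grows, which is why the effect matters for the small experimental dimensions but should not matter for cryptographically sized ones.

```python
# Spread of the sample variance of a variance-1 Gaussian vector as the dimension grows.
import numpy as np

rng = np.random.default_rng(1)
for d in (72, 180, 500, 1000):
    coeffs = rng.normal(0.0, 1.0, size=(10_000, d))   # 10000 simulated target vectors
    sample_var = coeffs.var(axis=1, ddof=1)           # one sample variance per vector
    print(d, round(sample_var.std(), 4))              # spread shrinks roughly as 1/sqrt(d)
```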
Having found a way of predicting the success probability of smaller-than-optimal block sizes, we can now ask what the impact is on the security estimates for lattice-based cryptographic protocols. Here we are looking at a table that contains some estimates we ran for three KEM finalists from the post-quantum cryptography standardization process run by NIST. We took the parameters available at the time of writing, that is, the parameters from the second round of these schemes, and we used the LWE estimator script to find the block size predicted using the methodology by Alkim et al., that is, beta-star, reported here in the leftmost column. Then we used our uSVP simulators to find the expected successful block size if we were instead to use BKZ 2.0 with 15 tours, or progressive BKZ with either one or five tours for every block size, and these are the numbers. You can see that all these numbers are actually larger than the successful block size predicted using the Alkim et al. methodology. A similar result was already found for some other schemes by Dachman-Soled et al., whose scripts essentially recover this expected successful block size, and there too it seems to be larger than what the Alkim et al. methodology originally predicted. Since we are extrapolating the full probability distribution of the successful block size for solving LWE, we can also look at the standard deviation of the successful block size. What we can see here is that the standard deviation stays relatively small even for cryptographically sized parameters, and throughout it never seems to reach a value of, say, 4.

Summarizing, we see that the expected successful block size is larger than the Alkim et al. prediction, and that the variance of the successful block size stays relatively small. Both observations should be good news. The fact that the variance stays small means that the successful block size cannot be significantly smaller than the expected successful block size: if we were to choose something significantly smaller, we would very quickly end up with an attack that has essentially zero success probability. So it should not be possible to run low-probability attacks by simply picking a much smaller block size. On the other hand, the fact that the expected block size is larger than previously predicted means that the Alkim et al. methodology underestimates the hardness of the learning with errors problem, and this is good news because it means that previously chosen parameters should still be secure against the primal attack. It is a little counterintuitive, because these uSVP simulators are precisely accounting for the success probability of smaller-than-predicted block sizes, so how is it possible that the expected block size grows overall?

We believe we have identified why this expected block size grows, and it is not really caused by the uSVP simulation itself, but rather by the fact that internally we are not using the geometric series assumption: we are using a BKZ simulator. To recap and show this effect, here we are looking at the part of the basis profile that the GSA predicts for Kyber512. In red we have the GSA line that we saw before, while in purple we have the logarithm of the expected norm of the projections of the target vector during the attack. What the Alkim et al. methodology does is find the point of intersection between these two lines, which gives the block size beta such that a block reaches from the end of the basis to this intersection.
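In code, this intersection condition looks roughly as follows; the exact constants vary slightly across the literature, so this is an illustration of the idea rather than the estimator code used for the table above.

```python
# Rough sketch of the GSA intersection condition: find the smallest beta such that
# the expected norm of the projected target, about sqrt(beta) * sigma, drops below
# the GSA line at index d - beta + 1, roughly delta(beta)^(2*beta - d - 1) * vol^(1/d).
from math import e, exp, pi, sqrt

def delta(beta):
    """Heuristic root-Hermite factor achieved by BKZ-beta (for beta around 50 or more)."""
    return ((beta / (2 * pi * e)) * (pi * beta) ** (1 / beta)) ** (1 / (2 * (beta - 1)))

def beta_star(d, log_vol, sigma):
    """Smallest block size for which the GSA intersection condition holds."""
    for beta in range(50, d):
        lhs = sqrt(beta) * sigma                                     # expected |pi_{d-beta+1}(t)|
        rhs = delta(beta) ** (2 * beta - d - 1) * exp(log_vol / d)   # GSA norm at that index
        if lhs <= rhs:
            return beta
    return d
```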
However, since we are not using the geometric series assumption, the line for the predicted reduced basis profile is slightly different: it is the one predicted by the Chen and Nguyen BKZ simulator. With this line, if we look at the intersection between the basis profile and the purple line representing the norms of the projections of the target vector, the intersection has moved to the left, and so choosing the block size using the simulator output leads to larger block sizes. This effect then carries over to our uSVP simulator, and similarly to the code by Dachman-Soled et al., simply because the GSA is never directly used and simulations are used instead. This should explain why the expected block size reported in this work is larger than that reported by the LWE estimator, which internally uses the Alkim et al. methodology and hence the geometric series assumption.

In conclusion, in our work we captured the success probabilities of smaller-than-expected successful block sizes, and we showed a methodology that allows us to predict these probabilities. The effect seems to be consistent across secret and error distributions, so we also showed that using a binary or ternary distribution does not directly impact the cost of the primal attack without any extra combinatorial step. Finally, we showed that, even accounting for the low success probability of smaller block sizes, overall the hardness of the learning with errors problem does not seem to be significantly impacted. More details and many more experiments can be found in our eprint, and the code and data used for producing all the plots, as well as the code for the uSVP simulators, can be found on GitHub. Thank you very much for your attention.