Hello everyone, I am Maxime Bros from the University of Limoges, and I'm going to present improvements of algebraic attacks for solving the rank decoding and MinRank problems. This is joint work with Magali Bardet, Daniel Cabarcas, Philippe Gaborit, Ray Perlner, Daniel Smith-Tone, Jean-Pierre Tillich and Javier Verbel.
OK, so first of all I will explain why these problems are important, then describe both of them and give a reduction from one to the other. I will then describe what algebraic attacks are in general, and give the modelings and complexities for both problems. I will conclude with a comparison with previous attacks and a summary of our contributions.
So why are these problems important? First of all, rank decoding is at the core of rank-based cryptosystems, and MinRank is at the core of multivariate-based cryptosystems. Note also that MinRank is important in rank-based cryptography as well, because of a reduction I will mention later. Moreover, two rank-based cryptosystems, ROLLO and RQC, made it to the second round of the NIST post-quantum standardization process, and one multivariate-based cryptosystem, Rainbow, made it to the third round.
The MinRank problem is easy to describe. One is given a set of K matrices of size m × n with coefficients in a finite field F_q, together with an integer r, and wants to find a non-trivial linear combination of those matrices that has small rank, namely at most r. This problem was proven NP-complete in 1999.
Before describing the rank decoding problem, I will give a more general hard problem in coding theory, known as the decoding problem. The input is a code C and a received word y = c + e, where c is a codeword of C and e is an error of small weight; here the weight is equal to an integer r. The problem is to output c, that is, to recover the codeword by removing the error.
This problem leads to different kinds of cryptography, depending on the metric one chooses. With the Euclidean metric, for instance, it leads to lattice-based cryptography. And with the rank metric, the one we are going to focus on today, it leads to rank-based cryptography. The problem for the Hamming metric was proven NP-complete in 1978, and a randomized reduction from an NP-complete problem was given for the rank metric in 2017.
Now let's define the decoding problem for matrix codes. A matrix code is a subspace of the space of all m × n matrices with entries in F_q, and this subspace has dimension K, capital K. In this version of the problem, one receives a matrix Y = C + E, where the rank of E is small. The metric we use here is the classical rank metric for matrices, that is, the rank of the matrix over F_q. We still want to output C, or equivalently to recover the small-rank matrix E. A basis of the matrix code is a set of K matrices M_1 up to M_K, so by definition the codeword C is a linear combination of those matrices with coefficients in F_q. Thus this problem is equivalent to MinRank, since one looks for E, a matrix of small rank, which is an F_q-linear combination of Y and the matrices M_1, ..., M_K.
Now I will present the rank decoding problem, which is the decoding problem for F_{q^m}-linear codes. In this case, the code is a subspace of the space of vectors of length n with entries in F_{q^m}, the extension field of F_q of degree m, and this subspace has dimension k, small k. Here one receives a vector y, which is again c + e, where e has small rank. So we need to define what the rank of a vector in this space is; this is what we call the rank metric on F_{q^m}^n. I'm going to define this metric on a toy example. Here we have a vector v of length five over F_{2^4}.
What we are going to do is use a basis B of F_{2^4}, seen as an F_2-vector space, to unfold each coefficient of v in this basis. You can see, for instance, that 1 is written as a one followed by zeros. Doing this for all the coefficients of v, we get a matrix with entries in F_2, and we define the rank of v as the rank of this matrix.
There is a reduction from the rank decoding problem to MinRank: given an instance of the rank decoding problem, we can write every word of the code, which recall is a vector of length n with entries in F_{q^m}, as an m × n matrix, as we did before to define the rank metric on this space. This matrix has coefficients in F_q, and one gets a matrix code over F_q of size m × n and dimension mk. Thus, the rank decoding problem reduces to MinRank. Be careful, the converse is not always true, because the rank decoding problem has more structure than general MinRank instances.
Our attack is an algebraic attack. This means that one models the problem as a system of algebraic equations and tries to solve it. In cryptanalysis, the often unique solution of this system can be the private key or the plaintext. In our case, it is going to be something related to the plaintext, from which the plaintext is easy to recover. For the rank decoding problem, the solution of the system will be the small-rank error e, and for the MinRank problem, the solution is the vector (x_1, ..., x_K) that yields the small-rank matrix. To solve such a system, there are two classical approaches: generic Gröbner basis algorithms such as F4 or F5, and specific linearization techniques, which is what we are going to use in our attack.
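The unfolding used above to define the rank of a vector can be sketched in a few lines of Python. This is only an illustration under assumptions that do not come from the talk: elements of F_{2^4} are encoded as integers from 0 to 15 whose bit i is the coordinate on the i-th basis element, the function names `unfold` and `rank_weight` are hypothetical, and the sample vector is made up rather than the one shown on the slide.

```python
def unfold(vector, m):
    """Return the m x n binary matrix whose j-th column holds the
    coordinates of vector[j] in the chosen basis of F_{2^m} over F_2."""
    return [[(coeff >> i) & 1 for coeff in vector] for i in range(m)]


def rank_weight(vector):
    """Rank over F_2 of the unfolded matrix, i.e. the dimension of the
    F_2-span of the entries, computed by XOR elimination."""
    basis = []  # independent elements, each already reduced by the earlier ones
    for coeff in vector:
        for b in basis:
            # XOR with b exactly when this clears the leading bit of b
            coeff = min(coeff, coeff ^ b)
        if coeff:
            basis.append(coeff)
    return len(basis)


# Made-up vector of length 5 over F_{2^4}: (1, a, a + 1, 0, a)
v = [0b0001, 0b0010, 0b0011, 0b0000, 0b0010]
matrix = unfold(v, 4)    # 4 x 5 matrix over F_2
weight = rank_weight(v)  # rank of that matrix
```

In this made-up vector every entry lies in the F_2-span of 1 and a, so the unfolded matrix has rank 2 even though four of the five entries are non-zero.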
Linearization uses the fact that sometimes the number of equations in an algebraic system is at least as large as the number of distinct non-constant monomials that appear in it. This allows one to solve the system directly, using those monomials as new variables. Thus, one only has to solve a huge linear system and no longer needs a generic Gröbner basis algorithm. This works perfectly when there is a unique solution. That is important in our case, and it is why we require the rank to be exactly equal to r in our instances, instead of smaller than or equal to r. Moreover, if this new huge linear system is sparse, one can take advantage of the sparsity and use Wiedemann's algorithm instead of the classical Strassen one.
So here is a toy example of linearization. We have a system of four equations over the polynomial ring F_2[x, y, z], and we want to find the only point where all four polynomials vanish. In this polynomial ring with three variables, there are 20 distinct monomials of degree less than or equal to three. But if you look carefully, only five of them, including 1, appear in this system of four equations. So we can write the Macaulay matrix associated with the system and look for a vector in its right kernel of this form, with a one at the end. Doing so, we get the assignment, that is, the values of the distinct monomials that appear in the system; here it is 0, 1, 0, 1. Since the value of z is among them, it is easy to go from the assignment of the monomials to the final solution. But sometimes this step can be really hard, which is why I will describe later how we deal with it in our modelings. And recall that it is really important that we have only one solution.
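The linearization step can be sketched on a small system over F_2. The four polynomials below are a hypothetical example chosen for illustration, not the system shown on the slide, and the helper name `solve_gf2` is likewise an assumption. Each row of the Macaulay matrix lists the coefficients of the monomials xy, yz, x, z and then the constant term.

```python
def solve_gf2(augmented):
    """Gauss-Jordan elimination over F_2 on rows [a_1, ..., a_n, c],
    each meaning a_1*m_1 + ... + a_n*m_n = c.
    Returns the unique solution, or None if it is not unique or does not exist."""
    rows = [row[:] for row in augmented]
    n_vars = len(rows[0]) - 1
    pivot_row = 0
    for col in range(n_vars):
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # free monomial: no unique solution
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(row[-1] for row in rows[pivot_row:]):
        return None  # a leftover row reads 0 = 1: inconsistent system
    return [rows[i][-1] for i in range(n_vars)]


MONOMIALS = ["xy", "yz", "x", "z"]
MACAULAY = [
    [1, 0, 1, 0, 1],  # xy + x + 1
    [0, 1, 0, 1, 1],  # yz + z + 1
    [0, 0, 1, 1, 0],  # x + z
    [1, 1, 0, 1, 1],  # xy + yz + z + 1
]

# Treat each monomial as an independent unknown and solve the linear system.
values = dict(zip(MONOMIALS, solve_gf2(MACAULAY)))

# Go back from monomial values to variable values.
x, z = values["x"], values["z"]
y = values["yz"]  # valid here because z == 1, so yz = y
```

The last two lines are the step that is easy in this sketch only because x and z appear as monomials on their own; when no monomial reveals a variable directly, recovering the variables from the monomial values needs extra work.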
Both our modelings, for MinRank and for the rank decoding problem, rely on an important fact: one can write a matrix M of small rank r as a product S·C, where S is a basis of the column space of M and C is the matrix containing the coordinates of each column of M in this basis S. Of course, if the matrix M is not of small rank, this decomposition is trivial. It is also important that the roles of S and C are interchangeable: we can consider C as a basis of the row space of M, and then S holds the coordinates of every row of M in the basis C.
Recall that in the MinRank problem, one wants to find an F_q-linear combination of K matrices that has small rank. Here we have variables x_i; we call them the linear variables because they appear in a linear combination. Since this matrix has small rank, it can be written S times C. Now notice that the first row of this matrix belongs to the row space of C, because it is a linear combination of the rows of C. So if we put this row on top of the matrix C, we get a matrix that still has rank r, because C has full rank r, but that has r + 1 rows. So all the maximal minors of this matrix, of size r + 1 of course, vanish. This yields a system of algebraic equations depending only on the linear variables x_i and the entries of the matrix C. And of course we can do this for all m rows of the matrix.
An important fact is that, expanding each of those maximal minors with respect to the first row, the one in purple, whose coefficients are linear forms in the linear variables x_i, and considering all the maximal minors of C as new variables c_T, one gets a bilinear system in the variables x_i and c_T. This modeling is called the Support Minors modeling.
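The vanishing-minors property behind this modeling can be checked numerically on a small instance. The sketch below is an illustration under assumptions not taken from the talk: it works over F_7, the matrices S and C are made up, and the helper names are hypothetical. It also plugs in an actual low-rank matrix M = S·C instead of keeping the x_i symbolic, so it verifies the equations at a solution rather than building the bilinear system.

```python
from itertools import combinations

P = 7  # small prime, so integers mod P form the field F_7


def det(mat):
    """Integer determinant by cofactor expansion along the first row."""
    if not mat:
        return 1
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j, a in enumerate(mat[0]))


def maximal_minors(mat):
    """Map each column subset T to the maximal minor of mat on T, mod P."""
    size = len(mat)
    return {T: det([[row[j] for j in T] for row in mat]) % P
            for T in combinations(range(len(mat[0])), size)}


def support_minors_values(row, c):
    """Evaluate every (r+1)-minor of (row stacked on top of C), expanded along
    `row`, so that only the entries of `row` and the minors c_T of C appear."""
    r = len(next(iter(c)))
    return [sum((-1) ** i * row[t] * c[T[:i] + T[i + 1:]]
                for i, t in enumerate(T)) % P
            for T in combinations(range(len(row)), r + 1)]


S = [[1, 2], [3, 4], [5, 6]]      # m x r with m = 3, r = 2
C = [[1, 0, 2, 3], [0, 1, 4, 5]]  # r x n with n = 4
M = [[sum(s * e for s, e in zip(s_row, column)) % P for column in zip(*C)]
     for s_row in S]              # M = S * C has rank at most r

c = maximal_minors(C)             # plays the role of the variables c_T
values = [support_minors_values(row, c) for row in M]
```

Every entry of `values` is zero because each row of M lies in the row space of C, while a row outside that row space, such as (1, 0, 0, 0), gives a non-zero value. In the actual modeling the entries of `row` are linear forms in the x_i, which is what makes each equation bilinear in the x_i and the c_T.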
If one does not have enough equations to solve this system, meaning that we do not reach the linearization bound, we can get more equations by multiplying each equation by the monomials of degree b − 1 in the linear variables x_i. The new variables, whose number is given here, have this form: a monomial of degree b in the x_i multiplied by a c_T. All those variables belong to F_q, and the same holds for the equations. Since it is a linear system, the complexity of solving it is the one given here, labeled "classic" because it is the cost of solving a linear system with, for instance, Strassen's algorithm. But sometimes, especially when b is greater than or equal to two, it is more interesting to use the Wiedemann approach for sparse linear systems, because each equation has only this many terms. So it is a really sparse system, we sometimes use the Wiedemann approach, and that is why there is a min in the complexity.
Now, I described the case where we are what we call under-determined, meaning that we do not have enough equations. On the other hand, sometimes, without multiplying the original equations of degree one in the x_i and degree one in the c_T, that is, the bilinear system, we already have far too many equations. To make the attack less expensive, we can remove some of the columns of the original matrices among which we look for a small-rank combination. We get fewer variables and fewer equations, and we try to remove as many columns as we can so that the problem has the least complexity.
Now I'm going to describe our modeling for the rank decoding problem. As I presented before, there is a perfect equivalence between the decoding problem for matrix codes and the MinRank problem. For the rank decoding problem, there is a reduction to MinRank, but the two problems are not equivalent.
This means that the F_{q^m}-linearity gives a strong structure to the rank decoding problem, and this is what we use to solve it, instead of just using the MinRank instance associated with it. When we want to solve an instance of the rank decoding problem, recall that we receive a word y = c + e. The idea is to add this word to the code C, so that we get a new code C tilde which contains all the non-zero multiples lambda·e of e. We then compute a parity-check matrix of this new code. As we did before, we write e, which has small rank, as a product of two matrices S and C. But this time, since e has coordinates in the extension field F_{q^m}, we need a basis of F_{q^m}, seen as an F_q-vector space, in front of it. So we get this system of algebraic equations, and you can note that we explicitly gave a vector which belongs to the kernel of this matrix. So we know that the rank of this matrix is smaller than or equal to r − 1, and thus all its maximal minors vanish. This system of algebraic equations is the one considered in the previous attack against the rank decoding problem, which was given by Bardet et al. at Eurocrypt 2020.
The starting point of our new attack is the fact that those maximal minors can be written as linear combinations of the maximal minors of the matrix C. This is due to the Cauchy-Binet formula, which generalizes the multiplicativity of the determinant for square matrices. So we can consider those minors as new variables c_T; that is the first fact mentioned here. This yields a linear system in the c_T's.
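For reference, the Cauchy-Binet formula can be written out for this setting as follows. The notation is an assumption made for this sketch and is not taken from the slides: H denotes a parity-check matrix of the augmented code, of size (n − k − 1) × n, C is the r × n matrix of coordinates, J is a set of r columns of the product, and c_T is the maximal minor of C on the columns indexed by T.

```latex
\det\!\big( (C H^{\mathsf T})_{*,J} \big)
  \;=\; \sum_{\substack{T \subset \{1,\dots,n\} \\ |T| = r}}
        \det\!\big( C_{*,T} \big)\,
        \det\!\big( (H^{\mathsf T})_{T,J} \big)
  \;=\; \sum_{\substack{T \subset \{1,\dots,n\} \\ |T| = r}}
        c_T \,\det\!\big( (H^{\mathsf T})_{T,J} \big)
```

Under these assumptions, setting each left-hand side to zero gives one equation per choice of J, with known coefficients coming from H and with the c_T as the only unknowns, which is why the resulting system is linear in the c_T's.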
Another important fact: recall that in linearization the last step, going from the values of the monomials to the values of the individual variables, is sometimes hard. Here the system has many solutions, because the decomposition of e as a product is not unique; to make it unique, the matrix C can be written, for instance, in systematic form. So what we do is put the identity in front of the matrix C. This is really important because, as we take the c_T's, which are determinants of this matrix, as new variables, every single entry of C is, up to sign, one of those maximal minors. Indeed, we just have to take r − 1 columns in the identity part and one column in the other part to get, for instance, this coefficient or this one, depending on which r − 1 columns we choose. So the new variables c_T include exactly the coefficients of C. This new modeling is called the MaxMinors modeling.
Now the complexity of our attack against the rank decoding problem. We have a linear system in the variables c_T, with m times this binomial coefficient equations over F_q, because we unfold all the equations, which were minors over F_{q^m}, into the ground field; that is where the factor m comes from. As long as this condition is fulfilled, that is, more equations than variables, we can solve the system. And as it is not sparse, we do not use the Wiedemann algorithm.
Exactly as for MinRank, if we have far more equations than variables, we want to remove some of those equations and variables. We do this by puncturing the code, which is exactly what we did for MinRank: we remove some of the coordinates, that is, some of the columns in the associated MinRank instance, so that the problem is easier to solve.
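The claim that, with C in systematic form, every entry of C shows up among the maximal minors can be checked on a small instance. The sketch below is an illustration under assumptions not taken from the talk: it works over the integers rather than F_q, the matrix is made up, and the helper name `entry_from_minor` is hypothetical.

```python
def det(mat):
    """Integer determinant by cofactor expansion along the first row."""
    if not mat:
        return 1
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j, a in enumerate(mat[0]))


def entry_from_minor(C, r, i, j):
    """Recover entry (i, j) of C' in C = (I_r | C') from the maximal minor
    built on the identity columns other than i, plus column r + j."""
    T = [k for k in range(r) if k != i] + [r + j]
    minor = det([[row[t] for t in T] for row in C])
    return (-1) ** (i + r - 1) * minor  # sign of the cofactor at (i, r - 1)


r = 3
C_prime = [[2, 5], [3, 1], [4, 6]]  # made-up non-identity part
C = [[int(a == b) for b in range(r)] + C_prime[a] for a in range(r)]

recovered = [[entry_from_minor(C, r, i, j) for j in range(len(C_prime[0]))]
             for i in range(r)]
```

Here `recovered` equals `C_prime`, so once the linear system has been solved for the c_T's, the entries of C can be read off directly and the hard last step of linearization disappears.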
We look for the largest integer p such that this condition is fulfilled; this removes variables and equations. Once we have found this value, we can solve the system at a lower cost. On the other hand, if we want to solve a problem but do not have enough equations, we can guess some of the columns of the matrix C. This removes some variables, because we then know more maximal minors, but it can be done only at an exponential cost; that is why we have here the cost of guessing columns of C. Doing so, we remove variables without changing the number of equations. So this time we need to find the smallest integer a such that this condition is fulfilled; this is called the hybrid case. Oh, and I forgot to mention it: the case where we puncture the code is called the super-overdetermined case.
Notice also that this hybrid approach, which consists in guessing some of the columns of the matrix C, can also be used for the MinRank problem. When we do not have enough equations in our MinRank modeling, instead of multiplying the equations by the x_i variables, we could also guess, at an exponential cost, some of the columns of C.
Now you may wonder how we can solve a rank decoding instance when the hybrid approach does not work, meaning that the integer a is far too large, so the exponential cost of guessing columns of C is too high. In this case, we can combine both modelings. This relies on the fundamental fact that the variables c_T in the MaxMinors and Support Minors modelings are exactly the same: in both cases they are the maximal minors of the matrix C. And since there is a reduction from the rank decoding problem to MinRank, we can combine those modelings. Recall that in the MaxMinors modeling, the monomials have degree one in the c_T and degree zero in the x_i.
And in the Support Minors modeling, the monomials have degree one in the x_i and degree one in the c_T. So, if we do not have enough equations, we multiply the Support Minors equations by the monomials of degree b − 1 in the linear variables x_i, and the MaxMinors equations by the monomials of degree b in the x_i. Both then have degree b in the x_i and degree one in the c_T, so we get new variables that are compatible between the two systems. The case where the hybrid attack is too expensive is called the under-determined case. The complexity of solving it is that of either the classical approach or the Wiedemann approach, where the number of terms per equation is the average over both modelings.
Here is the comparison of our attack with the previous one against the rank decoding problem. The previous one was by Bardet et al. at Eurocrypt 2020; it is in the last column here, and ours is here in bold. You can see that our attack is always better. It was possible to go below the claimed security of all those cryptosystems, because they were usually in the super-overdetermined case: you see that p is large, sometimes nine or even 40, which means that we could puncture, that is, remove, 40 coordinates of the initial code. When b equals zero, it means that we are not in the under-determined case. And when a is positive, for instance here, it means that we had to guess three columns of the matrix C.
Here is the complexity of our attack against the MinRank problem. It applies to the Rainbow cryptosystem, which is a multivariate-based cryptosystem. You can see that our attack significantly improves on the previous attack on MinRank. But be careful, the MinRank attack is usually not the best one against these Rainbow parameters, because the best attack is usually either RBS or DA. Still, we can see that for two of them, the complexity of our new attack is even better than that of the best attack on the Rainbow cryptosystem.
So now I'm going to conclude with a summary of our contributions. We significantly improved the best known attack against the rank decoding problem. We also gave a new algebraic attack against MinRank. Two rank-based cryptosystems, ROLLO and RQC, did not reach the third round of the aforementioned NIST standardization process, in particular because of our attacks. Nevertheless, in its report on the second round, NIST emphasized the importance of continuing to study rank-based cryptography. To quote them: "NIST believes rank-based cryptography should continue to be researched because it offers a nice alternative to traditional Hamming metric." Thank you for your attention.