Good morning everyone. My name is Magali Bardet. I'm a researcher at the University of Rouen in France and a member of the Inria research institute. I'm going to talk about an algebraic attack on rank metric code-based cryptosystems. This is joint work with Pierre Briaud and Jean-Pierre Tillich from Inria, and Maxime Bros, Philippe Gaborit, Vincent Neiger and Olivier Ruatta from the University of Limoges. You all know the McEliece cryptosystem, which was invented in 1978 at the same time as the famous RSA cryptosystem. The system is based on the hardness of decoding a random linear code. The public key is a linear code that looks like a random linear code but has an efficient decoding algorithm. The secret key is a hidden structure that provides the efficient decoding algorithm for the public code. To encrypt a message, we just add an error to it, and decryption is just recovering the error. Seven of the 17 submissions to the NIST competition for post-quantum cryptography rely on the McEliece cryptosystem with different families of linear codes. The main drawback of this system is the size of the public key, which is the description of the linear code. The challenge for these systems is to reduce the size of the public key by using some structure while keeping the decoding of the public code hard. In the last decades, a sequence of proposals have been made in rank metric code-based cryptography, starting with the GPT cryptosystem based on Gabidulin codes, which are analogues of Reed-Solomon codes in the Hamming metric. It was totally broken by Overbeck in 2005 with a key recovery attack. Recently, proposals have been made that have led to two schemes currently in the second round of the NIST competition for post-quantum cryptography, namely the ROLLO and RQC cryptosystems. ROLLO relies on LRPC codes, which are like LDPC codes in the Hamming metric, and can be seen as an analogue of the lattice-based cryptosystem NTRU.
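As an aside, the encrypt-by-adding-an-error principle described above can be sketched with a toy Hamming(7,4) code (a simplification added for illustration: the real McEliece scheme also scrambles and permutes the generator matrix, which is omitted here, and the helper names are mine):

```python
import numpy as np

# Toy illustration of the McEliece principle: encrypt = m*G + e,
# decrypt = recover the error e, then read off the message.
# Hamming(7,4) generator matrix (systematic form) and parity check matrix.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encrypt(m, error_pos):
    """Encode the 4-bit message m and flip one bit (the 'error')."""
    c = (m @ G) % 2
    c[error_pos] ^= 1
    return c

def decrypt(y):
    """Find the error from the syndrome, remove it, return the message."""
    s = (H @ y) % 2                      # syndrome depends only on the error
    for i in range(7):                   # the matching column of H locates it
        if np.array_equal(H[:, i], s):
            y = y.copy()
            y[i] ^= 1
            break
    return y[:4]                         # G is systematic: first 4 bits

m = np.array([1, 0, 1, 1])
y = encrypt(m, error_pos=2)
print(decrypt(y))  # recovers [1 0 1 1]
```

Here the single-bit error plays the role that the rank-r error plays in the rank-metric schemes discussed next.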
Like ROLLO, RQC is a rank-metric scheme, and it has been shown that its security relies only on the decoding problem in rank metric. The advantage of these schemes is that their structure allows for smaller public keys than other proposals in the Hamming metric. Note that it is also possible to build signature schemes using the rank metric. Our contributions in this paper concern algebraic attacks to solve the decoding problem in rank metric. An algebraic modeling is a set of equations whose solution is the error we are looking for. We start from the modeling of Ourivski and Johansson, which is a set of bilinear equations depending on two blocks of variables, and we add to it a new set of minors of a particular matrix that are of rather small degree and involve only one of the two blocks of variables. We provide a new strategy to solve our algebraic system that is specific to this kind of system and is more efficient than a generic strategy. In particular, while generic strategies would first consider the equations of degree two, we first compute a Gröbner basis for the minors and then add the bilinear equations to finish the computation. The complexity analysis of our strategy gives better results than the previous combinatorial attacks, which means that the security parameters of the cryptosystems have to be increased to reach the same security level. We also provide various benchmarks to support our results. For instance, consider a family of linear codes over F_{2^n} with length n and dimension n/2. Up to now, the most efficient generic decoding algorithms were information set decoders, which are combinatorial algorithms with complexity in 2^(r·n), whereas our algebraic attack has complexity in 2^(r·log n). We will now go into detail, starting with some definitions. A linear code is just a vector subspace of a finite-dimensional vector space, usually over a finite field.
We denote by G a generator matrix whose rows form a basis of this vector subspace; every element in the code is a linear combination of the rows of G and is called a codeword. If the length is a composite integer m·n, then a word of length m·n can be viewed as a matrix with m rows and n columns. We can then define the weight of a codeword as the rank of the corresponding matrix, and the distance between two words as the rank of the difference of the two corresponding matrices. Decoding a word corresponds to finding the nearest codeword with respect to the rank metric. Given a generator matrix G and a word y, decoding y is equivalent to finding a word m such that y − mG has the smallest possible rank. This can be rephrased as a MinRank problem: given a matrix Y corresponding to the word y and matrices G_1, ..., G_k corresponding to the rows of G, find a linear combination of these matrices of rank not greater than r. This problem is known to be NP-hard, and the best known algorithms solving it have exponential complexity bounds. In this talk, we will focus on some particular codes, specified as linear codes over F_{q^m}, which is an extension of F_q. By fixing a basis of F_{q^m} over F_q, we can see any F_{q^m}-linear code of length n and dimension k as a matrix code over F_q of length n·m and dimension k·m. The benefit of choosing such families of codes is that the known families of matrix codes with an efficient decoding algorithm are F_{q^m}-linear codes, and such codes have a shorter description than general matrix codes: we gain a factor m in the size of the public key. From a security point of view, even if the general decoding problem is not known to be NP-hard for F_{q^m}-linear codes, there is a randomized reduction from the decoding problem in the Hamming metric to the F_{q^m} rank decoding problem.
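The rank weight and rank distance just defined can be sketched for q = 2 by storing each row of the m × n matrix as an integer bitmask (the helper names are mine, not from the talk):

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lsb = pivot & -pivot                 # lowest set bit = pivot column
            rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

def rank_weight(word_bits, m, n):
    """Rank weight of a length m*n binary word, viewed as an m x n matrix."""
    assert len(word_bits) == m * n
    rows = []
    for i in range(m):
        mask = 0
        for bit in word_bits[i * n:(i + 1) * n]:
            mask = (mask << 1) | bit
        rows.append(mask)
    return gf2_rank(rows)

def rank_distance(x, y, m, n):
    """Rank distance: over GF(2) the difference of two words is their XOR."""
    return rank_weight([a ^ b for a, b in zip(x, y)], m, n)

# A word whose 3 matrix rows are all equal has rank weight 1.
print(rank_weight([1, 0, 1] * 3, 3, 3))  # 1
```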
This reduction means that decoding in the Hamming metric, which is NP-hard, is no harder than decoding an F_{q^m}-linear code in the rank metric. Progress in the design of combinatorial algorithms to solve the rank decoding problem suggests that the additional structure coming from the F_{q^m}-linearity may not have a significant impact on the complexity asymptotically. We will see that algebraic attacks can take advantage of this structure. We will now address the algebraic modeling of this decoding problem. We start from the work of Ourivski and Johansson in 2002 and try to solve it using Gröbner basis techniques, following Lévy-dit-Vehel and Perret in 2006. We add the word y we want to decode to the public code. Then, thanks to the F_{q^m}-linearity, the code C_y contains q^m − 1 vectors of weight r, which are all the multiples λe for any nonzero λ in F_{q^m}. If the error is small enough, the code C_y contains no other small rank vectors. This means that we can search for a particular solution and specialize some of the unknowns. Such codewords are characterized by two conditions: the rank of e is equal to r, and e belongs to the code C_y. The second condition can easily be described algebraically using a parity check matrix of the code, which is a generator matrix of the dual of the code, and the equation e·H_y^T = 0. The condition that e has rank r can also be described algebraically. If α is a generator of F_{q^m} over F_q, then e having rank r means by definition that the coordinates of e generate a vector space of dimension r over F_q. We call S_1, ..., S_r a basis of this vector space and S the matrix of the coordinates of this basis. Then C is the matrix of the coordinates of e in this basis, and we can write e as a product S·C of two matrices of rank r. We now have a set of equations whose solutions are exactly the codewords in C_y of rank at most r. Note that this system has a lot of solutions.
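The factorization e = S·C described above can be checked on a toy example over GF(2): a rank-2 matrix E with m = 4 and n = 5 is the product of a 4 × 2 matrix S and a 2 × 5 matrix C (illustrative values of my choosing; the mod-2 rank function is my own sketch):

```python
import numpy as np

# A rank-r matrix E (m x n) factors as S (m x r) times C (r x n);
# here m = 4, n = 5, r = 2, and all arithmetic is mod 2.
S = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0]])
C = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1]])
E = (S @ C) % 2

def gf2_rank(M):
    """Row-reduce a 0/1 matrix mod 2 and count the pivots."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivots = [r for r in range(rank, M.shape[0]) if M[r, col]]
        if not pivots:
            continue
        M[[rank, pivots[0]]] = M[[pivots[0], rank]]   # bring pivot row up
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] = (M[r] + M[rank]) % 2           # clear the column
        rank += 1
    return rank

print(gf2_rank(E))  # 2: the rank weight of E equals r
```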
Indeed, for a given codeword e, any basis S gives a solution, and any multiple of e is also a solution. This allows us to specialize some variables, or in other words to search for a particular solution. Here we will assume that e has rank exactly r, and we will specialize the identity in S and fix the first column of C. With this specialization, with high probability the system has exactly one solution. The system is affine and bilinear with two blocks of variables, the C variables and the S variables, and we look for the solutions over F_q. If we choose m and k linear in n and r in the order of the square root of n, then we have an overdetermined system with many more equations than variables. Finally, the system has only one solution, hence the Gröbner basis for any monomial ordering has this very simple shape. We want to understand the behavior of the Gröbner basis computation and its complexity, and then find a more direct way to compute the solution. Complexity results are known for generic homogeneous bilinear systems. Faugère, Safey El Din and Spaenlehauer described the behavior of a generic Gröbner basis algorithm for such systems, leading to a complexity bound for the Gröbner basis computation of such homogeneous systems. They also showed that for generic affine bilinear systems, at some point in the computation there will be degree falls, which means that some polynomials will have a smaller degree after a reduction, and they gave a complexity estimate for the cost of the Gröbner basis computation in terms of the degree where those first degree falls occur. It is a widespread belief that the first degree fall is a good measure of the complexity of a Gröbner basis computation. This is not always true, although in some particular cases, such as generic affine bilinear systems or semi-regular overdetermined systems, it can be proven to be the case.
For generic affine bilinear systems, the first degree falls are related to the left kernels of the Jacobian matrices associated to the system. More recently, at PQCrypto last year, several authors showed that for the Kipnis-Shamir modeling of the MinRank problem, the Jacobian matrices have a particular shape that leads to degree falls at a much smaller degree than in the generic case. In fact, the Ourivski-Johansson modeling also has this particular shape, and we use it to produce a new algebraic modeling for the rank decoding problem. The set of maximal minors of the matrix C·H^T, which are minors of size r, belongs to the ideal generated by the initial quadratic equations. To prove that, we compute formally the Jacobian matrix of the system; the elements in its left kernel identify the first degree falls, and we compute formally the value of the polynomials of smaller degree after reduction, which are exactly the maximal minors of C·H^T. We are therefore able to compute directly the polynomials of degree r that occur during a Gröbner basis computation in degree r + 1, without doing all the computation in degree r + 1 that a generic Gröbner basis algorithm would do: we just have a formula for the result of the computation. Let's have a closer look at those equations. The matrix C·H^T has r rows and n − k − 1 columns, hence there are (n − k − 1 choose r) equations over F_{q^m}, or m times more over F_q. Moreover, each equation is a minor, that is, the determinant of a submatrix, and we can express the determinant of a product of matrices as a sum of products of determinants using the Cauchy-Binet formula. Now we consider our polynomials as polynomials in new variables c_T that are the maximal minors of C; a variable c_T hence represents a polynomial of degree r with r! monomials. Note that, as we specialized the first column of C, some of the c_T are of degree r − 1.
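The Cauchy-Binet formula invoked here says that for A of size r × n and B of size n × r, det(AB) equals the sum over all size-r subsets T of det(A_{*,T})·det(B_{T,*}). A small integer sanity check with toy matrices of my choosing:

```python
from itertools import combinations, permutations

def det(M):
    """Leibniz determinant of a small square integer matrix."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):            # count inversions to get the signature
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

r, n = 2, 4
A = [[1, 2, 0, 3],                    # r x n
     [0, 1, 4, 1]]
B = [[1, 0],                          # n x r
     [2, 1],
     [0, 3],
     [1, 1]]

AB = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
      for i in range(r)]
lhs = det(AB)
rhs = sum(det([[A[i][t] for t in T] for i in range(r)]) *      # columns T of A
          det([[B[t][j] for j in range(r)] for t in T])        # rows T of B
          for T in combinations(range(n), r))
assert lhs == rhs
print(lhs)  # 97
```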
Viewed in terms of these new variables, the polynomials P_J are linear, and we can try to solve this linear system. Solving it can be done by constructing a matrix whose columns correspond to the variables c_T and whose rows correspond to the equations; the rows are indexed by J, the subset of columns we take for the determinant, and by i, the index of the coefficient of the polynomial P_J expressed in a basis of F_{q^m} over F_q. As the linear system is homogeneous, there is a vector space of solutions. If we have more rows than columns and the matrix has full rank, then the right kernel has dimension 1, and the solutions are the values of all the variables expressed in terms of a particular one, the free variable. We have simplified the MaxMinors system into a new system with fewer monomials, by a factor of r!, and with equations of degree r − 1. Experimentally, we noticed that the matrix has maximal rank with very high probability. We distinguish three different cases. The most favorable one is the overdetermined case; we call it overdetermined because we have more equations than unknowns in the linear system. In that case, we can express all the minors of C in terms of a particular one, and we can add to the original quadratic system a lot of equations of degree r − 1. In the intermediate case, we produce some equations of degree r − 1, but we do not obtain the values of all the minors of C. The hardest case is the underdetermined case: we do not produce any equation of degree r − 1, just reduced equations of degree r. Here is the experimental strategy we propose in the paper. We compute the initial quadratic system, then we compute the maximal minor equations, which are of degree r.
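The three cases can be illustrated by comparing the raw counts quoted in the talk: m·C(n − k − 1, r) linear equations against C(n, r) unknowns c_T. This is a sketch with a hypothetical function name; the exact boundary of the intermediate case depends on the rank of the matrix, which the raw counts ignore:

```python
from math import comb

def maxminors_counts(m, n, k, r):
    """Equation/unknown counts for the linearized MaxMinors system.
    Illustrative sketch based on the counts quoted in the talk: the maximal
    minors of C*H^T give m * C(n-k-1, r) linear equations over F_q in the
    C(n, r) variables c_T (one variable per maximal minor of C)."""
    equations = m * comb(n - k - 1, r)
    unknowns = comb(n, r)
    # Rough classification by raw counts only (intermediate case ignored).
    case = "overdetermined" if equations >= unknowns - 1 else "underdetermined"
    return equations, unknowns, case

print(maxminors_counts(6, 8, 3, 2))   # (36, 28, 'overdetermined')
print(maxminors_counts(3, 12, 7, 3))  # (12, 220, 'underdetermined')
```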
We linearize the set of maximal minors with respect to the c_T variables and take the resulting equations of degree r − 1 if we have some, or the resulting equations of degree r if we have no degree falls. Then we add those equations to the quadratic system and compute the degree-d Gröbner basis, which is the truncated Gröbner basis obtained by ignoring any computation in degree greater than d. All our experiments were done using the F4 algorithm implemented in MAGMA. The experiments suggest that d = r is sufficient in the overdetermined case, and that we need d = r + 2 in the underdetermined case, with intermediate values in between. We compare our strategy with the results given by Lévy-dit-Vehel and Perret in 2006. All parameter sets but one were overdetermined cases. We give the size of the extension field, the parameters n and k of the code (the length and the dimension), and the rank r of the error; n_S, n_C and n_eq are respectively the number of variables in S, the number of variables in C, and the number of quadratic equations. One column gives the degree and the number of equations that we produce after simplifying the MaxMinors system, and another the time needed for the echelon form computation that produces those equations. For r = 2, we produce linear equations, of degree r − 1 = 1. We compare this strategy with the generic Gröbner basis computation. The first row gives the timing for the generic Gröbner basis computation over the initial equations alone, and for our strategy we have to add the timing to compute the MaxMinors system, then to reduce it and add it to the initial system. The benefit of our strategy is to decrease the maximal degree of the polynomials during the computation. This has an impact on the size of the matrices built by the F4 algorithm during the computation.
For r = 2, the benefit is not so clear, and sometimes constructing the MaxMinors equations and simplifying them may take more time than the direct computation, but for larger degrees the gain is important. For the only underdetermined case, we could not compute the Gröbner basis directly due to a lack of memory, even though we had more than 200 gigabytes of memory. The successful strategy was to first compute the Gröbner basis of the minors, and then add the resulting equations, which are of various degrees between 2 and 5, to the initial quadratic equations. The total timing is then less than 3 minutes. We give here the theoretical complexity of our approach for the ROLLO and RQC cryptosystems, depending on the heuristic that for overdetermined systems the Gröbner basis up to degree r will give the solution, whereas for underdetermined systems we need to go up to degree r + 2. Basically, the parameters for the first and second security levels in the NIST competition correspond to overdetermined cases, whereas the parameters for the third level correspond to underdetermined cases. We see that in all cases our estimated bounds are smaller than the complexity given by the combinatorial attacks. In conclusion, with a good understanding of the first steps of a generic Gröbner basis computation, we were able to perform the first step theoretically and get a formula for new equations, the minors, which depend on only one block of variables. We then inter-reduced those equations before adding the quadratic ones, using a change of variables that saves a factor of r! on the number of columns of the matrices. This leads to an algorithm more efficient than all previously known attacks. I will finish this presentation with spoilers. Since the submission of the paper to Eurocrypt, we have worked with the authors of the PQCrypto paper and improved the results by using two additional strategies.
First, as the equations in the minors only involve the C variables, it is more efficient to specialize the identity in the matrix C instead of in the matrix S. Then the minors have degrees from r down to 1, and the minors of degree 1 are the variables c_ij. In the overdetermined case, the simplification of the minors directly gives the values of the C variables, and then solving a linear system gives the values of the S variables. The second improvement consists in performing an exhaustive search on some columns of C to reduce to the overdetermined case. In fact, this seems to be the best strategy for the parameters selected in the NIST submissions ROLLO and RQC, and I give you the complexity estimates that we have for ROLLO and RQC with these new strategies, which completely break the parameters from the NIST submissions. The consequence is that the value of r has to be increased by 2 to keep the claimed security level. Thank you for your attention. This work was partially funded by the French ANR through the CBCRYPT project, and by the European Regional Development Fund and the French region of Normandy.