 of our toolkit for HFE-based multivariate schemes. And this is joint work by Jean-Charles Faugère, Ludovic Perret, and Jocelin Ryckeghem. And Jocelin is going to give the presentation.

So, hi, everyone. Today we'll talk about a software toolkit for HFE-based multivariate schemes, joint work with Jean-Charles Faugère and Ludovic Perret. So, today we have the emergence of the quantum computer, and it is a danger for classical cryptography, for example for RSA or for schemes based on the discrete logarithm problem. For this reason, NIST began a post-quantum cryptography standardization process at the end of 2017. There were 69 candidates, and one year later, we have only 26 candidates at the second round of the competition. And in particular, four of the nine signature-scheme candidates are multivariate. We have libraries for code-based and lattice-based schemes, but we don't have libraries for multivariate schemes. So, for this reason, we close the gap with MQsoft.

So, what is MQsoft? It is an efficient library implemented in C, using the SSE and AVX2 instruction sets. We have in particular implementations of Matsumoto-Imai-based schemes, for example Quartz, Gui, and GeMSS. Gui was a candidate at the first round of the competition, and GeMSS is at the second round. As for Quartz, in 2001 there was the NESSIE competition, an open competition, and Quartz was submitted to it. We also have generic operations: arithmetic in GF(2^n), in GF(2)[x], and in GF(2^n)[X], with in particular root finding. We also have operations on multivariate quadratic systems over GF(2), for example evaluation or change of variables. And finally, we have constant-time protections against timing attacks, to protect secret data.

So, what is a Matsumoto-Imai-based scheme? The idea is that the public key is a multivariate quadratic system, over GF(2) in our case.
And the verifying process consists of evaluating this system. For the signing process, we need to invert the public key, and the public key is a composition of affine transformations and a secret quadratic system with a special trapdoor. So, to invert the public key, we need to invert the affine transformations and invert the trapdoor of the secret multivariate quadratic system. In particular, for the HFE-based signature schemes, inverting the trapdoor is just finding the roots of a univariate polynomial, in GF(2^n)[X] in our case. So, for the performance: for Quartz in 2001, we had about four seconds for the key generation, ten seconds to sign, and approximately one millisecond to verify a signature. And today, with MQsoft, so with new hardware, a recent Intel processor, but also with our implementation, we have a factor 2,000 on the key generation, a factor 500 on the signing process, and a factor of approximately 400 on the verifying process. Now the performance for the NIST candidates, GeMSS and Gui. For GeMSS, we have a factor of approximately three for the key generation, whereas for Gui, we have a factor between 13 and 26 for the key generation. For the signing process, we have approximately a factor two. And for the verifying process, we have a gain between 60 and 100% on a recent Intel processor. And here, we have an example of the architecture of our library for HFE schemes. So, we have the three cryptographic operations: key generation, signing, and verifying. In red, we have the parts I discuss in this talk. For the key generation, we need to generate the secret multivariate systems, and for this, we need the multiplication in GF(2)[x]. So, this is a crucial part of the key generation. Then, for the signing process, we need the root finding of a univariate polynomial, and for this, we need arithmetic: multiplication and squaring in GF(2^n), but also arithmetic in GF(2^n)[X].
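As a toy sketch of the structure just described (hypothetical parameters, not MQsoft code): the public key is the composition P = T ∘ F ∘ S of two secret invertible affine maps and a secret quadratic map over GF(2). Verifying evaluates P; signing inverts each layer. Here the toy inverts by exhaustive search, whereas a real HFE trapdoor inverts F by univariate root finding.

```python
import random

n = 4  # toy number of variables (real schemes use hundreds)
random.seed(1)  # deterministic toy key generation

def gf2_rank(rows):
    """Rank over GF(2) of a list of row bitmasks."""
    rows, rank = [r for r in rows if r], 0
    for bit in range(n):
        piv = next((i for i, r in enumerate(rows) if r >> bit & 1), None)
        if piv is None:
            continue
        p = rows.pop(piv)
        rows = [r ^ p if r >> bit & 1 else r for r in rows]
        rank += 1
    return rank

def rand_affine():
    """Random invertible affine map x -> M*x + b over GF(2)."""
    while True:
        M = [random.getrandbits(n) for _ in range(n)]
        if gf2_rank(M) == n:
            return M, random.getrandbits(n)

def apply_affine(aff, x):
    M, b = aff
    y = b
    for i in range(n):
        y ^= (bin(M[i] & x).count("1") & 1) << i  # <row i, x> over GF(2)
    return y

# Secret quadratic map: one upper-triangular GF(2) coefficient matrix per
# output bit (a generic quadratic map here; HFE uses a structured one).
Q = [[[random.getrandbits(1) for _ in range(n)] for _ in range(n)]
     for _ in range(n)]
S, T = rand_affine(), rand_affine()

def apply_quadratic(x):
    y = 0
    for k in range(n):
        bit = 0
        for i in range(n):
            for j in range(i, n):
                bit ^= Q[k][i][j] & (x >> i) & (x >> j) & 1
        y |= bit << k
    return y

def public_eval(x):
    """Verifying = evaluating the composition T(F(S(x)))."""
    return apply_affine(T, apply_quadratic(apply_affine(S, x)))

def sign_bruteforce(y):
    """Toy inversion of the public key (real schemes use the trapdoor)."""
    return next(x for x in range(1 << n) if public_eval(x) == y)
```

The brute-force inversion only works at toy size; the point of the trapdoor is precisely to replace this search by inverting the three layers one by one.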
So, here also, the multiplication is a crucial part, and also the squaring. And finally, for the verifying process, the main part is just to evaluate the multivariate quadratic system. So, the arithmetic is an important part of the signing and key generation processes. Now, we have generic libraries: for all values of n, there are implementations of the arithmetic. We have Magma, a computer algebra software. We have NTL, a C++ implementation, really efficient. We also have Flint, a C implementation. And then we have gf2x, a specialized library just for the multiplication in GF(2)[x]. We also have implementations for specific values of n, in several contexts. There is the context of the binary elliptic curves, where five values of n are standardized, with specific implementations for them. And there are also specific values of n for the implementation of Gui. With MQsoft, we have a generic implementation up to n equal to 576. We use a C implementation and the AVX2 instruction set to improve the library. We specialized specifically for the Skylake Intel processors, but also, recently, for other processors; generally, we are efficient on all Intel processors. So, for the multiplication and squaring in GF(2^n): we create GF(2^n) as GF(2)[x] quotiented by a degree-n irreducible polynomial. And we need a constant-time implementation to be protected against timing attacks. The multiplication is the most important operation, and for this, we use the PCLMULQDQ instruction. This instruction computes the product of two degree-63 polynomials in GF(2)[x] in one instruction. So, the idea is to use the schoolbook algorithm by blocks of 64 bits: when we want to multiply two blocks, we just use the PCLMULQDQ instruction.
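As a rough model of this step, the carry-less product that PCLMULQDQ computes on 64-bit operands, and the schoolbook combination of 64-bit blocks, can be sketched in Python. Polynomials in GF(2)[x] are encoded as integers, with bit i the coefficient of x^i; this is an illustration, not the SIMD implementation.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b in GF(2)[x] (what PCLMULQDQ
    computes in hardware on two 64-bit operands)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in GF(2)[x] is XOR
        a <<= 1
        b >>= 1
    return r

def mul_blocks(a: int, b: int, w: int = 64) -> int:
    """Schoolbook multiplication by blocks of w bits, as in the talk:
    each pair of blocks is combined with one carry-less word product
    (one PCLMULQDQ call in the real library)."""
    mask = (1 << w) - 1
    r, i = 0, 0
    while a >> (w * i):
        ai = (a >> (w * i)) & mask
        j = 0
        while b >> (w * j):
            bj = (b >> (w * j)) & mask
            r ^= clmul(ai, bj) << (w * (i + j))
            j += 1
        i += 1
    return r
```

For instance, `clmul(0b11, 0b11)` is `0b101`, since (x+1)² = x²+1 over GF(2).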
Then, when the value of n is large enough, we use the Karatsuba algorithm, just to reduce the number of calls to the PCLMULQDQ instruction. The performance of this instruction depends on the processor; it is faster on the Skylake processors, and so it impacts, for each value of n, the choice between the Karatsuba and the schoolbook algorithms. Then, for the performance, we compared Magma, NTL and MQsoft for the multiplication in GF(2^n): we have approximately 600 cycles for Magma, 200 cycles for NTL, and between 40 and 90 cycles for MQsoft. Then, the squaring. It is a particular operation, because we raise to a power of two, and two is the characteristic of the field. So, we have a special property, the linearity of the Frobenius endomorphism: the square of the sum is the sum of the squares. So, we have a specific implementation for this. The first idea is to use a table lookup of squares. We have a special instruction which computes, 16 or 32 times in parallel, a lookup of eight bits in a table from a four-bit index. So, if we take four bits, and we want the eight bits corresponding to their square, we can use this instruction to obtain directly the result. This instruction is particularly fast on some processors, but for the Skylake processors, we directly use the PCLMULQDQ instruction to multiply each block by itself to compute the square. For the performance of the squaring: we have 500 cycles with Magma, 150 cycles for NTL, and for MQsoft, between 20 and 40 cycles. So, in particular, the squaring is between a factor two and a factor four faster than the multiplication; it is really efficient. Then, for the verifying process, we need an efficient evaluation of a multivariate quadratic system, and so we need an efficient representation. For this, we can use the representation equation by equation; the idea is the following.
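The table-lookup squaring just described can be sketched the same way: since squaring is linear in characteristic two, squaring a polynomial over GF(2) simply spreads its coefficient bits apart, which is exactly what a 4-bit-to-8-bit lookup table precomputes. This is an illustration of the idea, not the vectorized shuffle-based code.

```python
# 4-bit nibble -> its 8-bit square: insert a zero between consecutive bits
SQR_TABLE = []
for v in range(16):
    s = 0
    for i in range(4):
        s |= ((v >> i) & 1) << (2 * i)
    SQR_TABLE.append(s)

def square_gf2x(a: int) -> int:
    """Square of a in GF(2)[x]: zero-interleave the bits, one nibble
    (one table lookup) at a time."""
    r, shift = 0, 0
    while a:
        r |= SQR_TABLE[a & 0xF] << shift
        a >>= 4
        shift += 8
    return r
```

For instance, `square_gf2x(0b101)` is `0b10001`: (x²+1)² = x⁴+1. No cross terms survive, which is why squaring can be so much cheaper than a general multiplication.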
For each equation, we fix a monomial order and store the coefficients one by one. But it is not efficient over GF(2). The most efficient representation is the representation coefficient by coefficient, or monomial representation. The idea is to pack, for one monomial, the coefficients of each equation. So, we can represent this as just one equation over a big field. For example, here we have two equations, the red equation and the blue equation. If we create the extension GF(4), we multiply the red equation by one and the blue equation by alpha, and if we take the sum, we obtain just one equation, but over the big field. Then we fix a monomial order and store the coefficients one by one. This representation was already used in 2006, and today we continue to use it in MQsoft. Here is an example of how we use this representation, for the verifying process with the public key. Here we don't need a constant-time implementation; we just use a variable-time implementation. We set P, the public key, represented as one big equation, and we can naturally use a quadratic (matrix) representation of the public key. For example, here, the coefficient P23 corresponds to the monomial x2·x3. And in our example, we want to evaluate P at the vector (1, 0, 1, 0). To do this, we begin by initializing an accumulator to the constant term of the multivariate quadratic system. Then we look at each row of the matrix. If the row's variable is one, we know each monomial of this row is one multiplied by something. So, we look at each column. If the column's variable is one, the monomial is one times one, which equals one; in this case, the evaluation is one and we add the coefficient to our accumulator. But if the column's variable is zero, we know the monomial is one times zero, which equals zero; the evaluation gives zero and we don't add it to our accumulator. So, we repeat this process: if it is one, we add the coefficient, and if it is zero, we don't add the coefficient.
Then, what happens when the variable is zero for a row? We know all the monomials of this row are zero multiplied by something, which equals zero. So, the evaluation is zero; we don't look at the row and we go directly to the next row. And we repeat this process: if the variable is one, we look at each column; if a column is one, we add the coefficient, and if it is zero, we don't add. And if the row's variable is zero, we skip the row, and we have finished the example. Here, just 25% of the coefficients are used. It is normal, because a quadratic monomial is one only in the case one times one, a quarter of the cases on average. So, with this method, we have a natural factor-four speedup for the evaluation. In MQsoft, we further improve this evaluation with unrolled loops, and also with Euclidean division, just to have a faster access to the variables x1, x2, x3, x4. We also have a special faster evaluation in constant time, useful for encryption and also for the signing process. In particular, on Skylake, we gain 10% over the state of the art with specific AVX2 instructions. And finally, the last most important operation, for the signing process, is the root finding in GF(2^n)[X]. So, how do we find the roots? We use the classical algorithm. The principle is that the polynomial X^(2^n) − X has as roots all the elements of GF(2^n). So, if we take this polynomial and compute its GCD with F, we obtain a polynomial G, and this polynomial has minimal degree and contains all the roots of F. And once we have this polynomial, we can use an equal-degree factorization algorithm, for example the Cantor–Zassenhaus algorithm, to find all the roots. In practice, computing the GCD directly is bad because the degree is really high. So, first we compute the modular reduction of X^(2^n) − X modulo F, and then we compute the GCD, to decrease the degree. In our case, this first step is the most important one, because it is really long.
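The evaluation procedure described above — the coefficients of all equations packed per monomial into one word, a row skipped entirely when its variable is zero, a coefficient added only when both variables are one — can be sketched as follows (a toy model with a hypothetical data layout, not the MQsoft code):

```python
def eval_mq(const, coeffs, x):
    """Evaluate a packed MQ system over GF(2).
    const          : packed constant terms (one bit per equation)
    coeffs[i][j-i] : packed coefficients of the monomial x_i * x_j (j >= i)
    x              : list of variable bits
    Returns the packed values of all equations at x.
    """
    acc = const
    n = len(x)
    for i in range(n):
        if not x[i]:
            continue              # whole row is 0 * something: skip it
        row = coeffs[i]
        for j in range(i, n):
            if x[j]:              # monomial is 1 * 1: add the coefficient
                acc ^= row[j - i]
    return acc
```

Each XOR adds one coefficient word to all equations at once, which is the point of the monomial (packed) representation.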
But in our case, we have a specific polynomial: an HFE polynomial, and it has a particular structure. It is naturally a quadratic form, because all monomials have exponents of Hamming weight at most 2, and the number of coefficients is the square of the logarithm of D, with D the degree of the HFE polynomial. So, for the first step, we can compute this with a repeated squaring algorithm. It is really simple: we have X, we compute the square, and we do the modular reduction at each step, just to keep the degree small during the algorithm. And in a binary field, when we compute a square, all odd-degree terms are zero, because of the linearity of the Frobenius endomorphism. Then, for the modular reduction by a random polynomial, we have a complexity of O(D²) operations. But in our case, F is sparse, so we can replace one factor D by the number of coefficients of F, so by log²(D): that gives O(D·log²(D)) operations, in particular multiplications in GF(2^n). And we want to improve this algorithm. So, we have an idea. For a given step i, we consider the Euclidean division of the square of X_i by F, and we give F and the quotient Q a particular split: we split the polynomial in two parts. The lower part is a dense polynomial, and the higher part has a special property: all its odd-degree terms are zero, so half of the coefficients of this part are zero. By definition, we have properties one and two; but we also have a third property if D is an even integer. And we have the idea to remove from F just the largest odd-degree term, and we ask what happens if we split the result. If we split the result, the degree of the lower part decreases — it is divided by two — and the size of the higher part increases. And it is interesting, because the higher part increases and half of its coefficients are zero. And what holds for F also holds for Q.
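The pipeline described above — the Frobenius map X^(2^n) mod F by repeated squaring, where squaring creates no odd-degree terms, then a GCD with F to isolate the roots — can be sketched on a toy field GF(2^8). These are hypothetical toy parameters: the real library works in much larger fields, exploits the sparse HFE structure of F, and replaces the final exhaustive search by equal-degree factorization.

```python
N, RED = 8, 0x1B                  # GF(2^8) via x^8 + x^4 + x^3 + x + 1

def gmul(a, b):                   # multiplication in GF(2^8)
    r = 0
    for _ in range(N):
        if b & 1:
            r ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= RED
    return r

def ginv(a):                      # a^(2^8 - 2) = a^(-1)
    r, e = 1, (1 << N) - 2
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def trim(p):                      # drop trailing zero coefficients
    while p and p[-1] == 0:
        p.pop()
    return p

def polymod(a, f):                # a mod f; coefficients low-degree first,
    a, d = trim(list(a)), len(f) - 1  # f must have a nonzero leading coeff
    inv_lead = ginv(f[-1])
    while len(a) - 1 >= d:
        c, s = gmul(a[-1], inv_lead), len(a) - 1 - d
        for i, fi in enumerate(f):
            a[s + i] ^= gmul(c, fi)   # subtract c * f * X^s (char 2: XOR)
        trim(a)
    return a

def polysq_mod(a, f):
    # squaring is the Frobenius: only even-degree terms appear
    sq = [0] * (2 * len(a) - 1) if a else []
    for i, c in enumerate(a):
        sq[2 * i] = gmul(c, c)
    return polymod(sq, f)

def polyadd(a, b):                # addition = subtraction in char 2
    if len(a) < len(b):
        a, b = b, a
    return trim([x ^ (b[i] if i < len(b) else 0) for i, x in enumerate(a)])

def polygcd(a, b):
    a, b = trim(list(a)), trim(list(b))
    while b:
        a, b = b, polymod(a, b)
    return a

def roots(f):
    """Roots of f in GF(2^8): gcd(X^(2^8) - X mod f, f), then search."""
    frob = [0, 1]                 # the polynomial X
    for _ in range(N):            # X^(2^8) mod f by repeated squaring
        frob = polysq_mod(frob, f)
    g = polygcd(polyadd(frob, [0, 1]), f)   # minimal degree, same roots
    def ev(p, x):                 # Horner evaluation in GF(2^8)
        r = 0
        for c in reversed(p):
            r = gmul(r, x) ^ c
        return r
    return sorted(x for x in range(1 << N) if ev(g, x) == 0)
```

For instance, the monic cubic F = X³ + 12X² + 35X + 126 built from the factors (X+2)(X+7)(X+9) over this field has exactly the roots {2, 7, 9}, and the sketch recovers them.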
And the idea is that if we remove many odd-degree terms from F, we decrease the complexity of the modular reduction. So, with this idea, we have a theorem: we remove a number s of odd-degree terms from F, and if we remove all odd-degree terms, for example, we gain a factor two in the modular reduction. A small example here, with degree 130: we remove s terms, and each time we remove one term, the degree of the largest remaining odd-degree term is divided by two. We normalize the speedup so that removing all terms is 100%. In particular, if we remove only three terms, for example, we already have a big part of the speedup: 77% for s equal to three, for example. So, it is interesting. Then, we ask what is the impact on the security, because the HFE polynomial is a part of the secret key. So, we look at the Gröbner basis attack. The complexity of this attack against an HFE-based scheme is in O(n^(ω·dreg)), where the degree of regularity dreg is a tool to analyze the security, and ω is the exponent of the complexity of matrix multiplication. And we remark that, in our case, if we don't remove terms, the degree of regularity is five, and if we remove terms, it is still five. So, it is okay. But for other degrees, for example if we take D equal to 514 and we remove many terms, at the end, the degree of regularity decreases. So, in this case, it is dangerous for the security, and we must not remove too many terms. But if we remove only three terms, for example, we already have a big part of the speedup, and the security is not impacted. So, to conclude on the root finding, for the performance, we compare NTL, Magma and MQsoft. In megacycles, we have between 1,000 and 4,000 for NTL; for Magma, we have a similar performance. But we remark that, when we remove three terms, Magma succeeds in improving the algorithm; we think it is because Magma uses a variable-time implementation.
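The halving behaviour in this example can be checked numerically: for an HFE polynomial every exponent has Hamming weight at most two (0, 2^i, or 2^i + 2^j), so the odd-degree exponents are exactly 1 and the values 2^i + 1, and removing the largest one roughly halves the largest odd degree left. A small sketch with D = 130 as in the slide:

```python
def hfe_exponents(D):
    """All exponents of Hamming weight at most two, up to degree D."""
    exps = {0}
    i = 0
    while 2 ** i <= D:
        exps.add(2 ** i)                  # weight-one exponents 2^i
        j = 0
        while 2 ** i + 2 ** j <= D:
            exps.add(2 ** i + 2 ** j)     # weight-two exponents 2^i + 2^j
            j += 1
        i += 1
    return sorted(exps)

odd = [e for e in hfe_exponents(130) if e % 2]
# odd-degree exponents for D = 130: [1, 3, 5, 9, 17, 33, 65, 129]
# removing 129 leaves 65 as the largest odd degree, then 33, then 17:
# each removal roughly halves the largest odd-degree term left in F
```

This is why a few removals (s = 3) already capture most of the modular-reduction speedup.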
It looks at whether a coefficient is zero or not during the multiplication, whereas in MQsoft, we do not branch on whether a coefficient is zero. And even so, we have a speedup with a constant-time implementation of the Frobenius map, the repeated squaring algorithm: a big speedup, between a factor seven and a factor 13. And for the conclusion: MQsoft is a really efficient library in C. It is not just as efficient as the generic libraries — it is more efficient than the generic libraries; and against specific implementations, we are similar or faster. We improve the NIST candidates, and we have a strategy with the parameter s to improve the signing process. But we think we need to study in depth the security against other attacks, for example MinRank and others. Then, in the root finding, we need to use the GCD, at some point, in constant time; we think that with the previous talk, we can solve this problem. And finally, we think that the new instruction sets, AVX-512 and also the VPCLMULQDQ instruction in the future Intel processors, can really improve the multiplication and the cryptographic operations. So, thank you for your attention, if you have any question.

Okay, so we are a little bit behind schedule. So, if you have any questions, please let's take them offline. And now, let's all thank all the speakers of this session.