Hello everyone, thank you for the introduction. This is a work about building diffusion layers, typically for block ciphers.

We have two general criteria for the security of block ciphers: diffusion and confusion. Diffusion means that every bit of the output must depend on every bit of the plaintext and every bit of the key. This can be a very simple dependency, so linear functions are usually used. Confusion means that there should be a complex relation between the plaintext and the ciphertext, and for this we use non-linear operations, which we call S-boxes. A very typical way to design such a block cipher is the SPN structure. Here we have two rounds of an SPN structure: in one round you have first a confusion layer with small S-boxes applied in parallel, then a diffusion layer with a big diffusion matrix applied to the whole state, and then a round constant and round key addition, and you iterate this as many times as you need to get your security. Here we will be focusing on this L, the diffusion layer.

Typically we try to resist differential and linear attacks; here I will mostly focus on differential attacks. The idea of a differential attack is: you look at the input x and the output y, and you look at what happens when you have a difference a on the input, what difference b you get on the output. If for some differences (a, b) you have a very high probability that the difference a on the input gives the difference b on the output, then you can distinguish your cipher from a random permutation and possibly attack it. The probability of such differential attacks depends on the S-box, through a certain function of the S-box, and on the diffusion layer. What actually gives the security against this kind of attack are the active S-boxes: the active S-boxes are the S-boxes which receive a non-zero input difference. The role of the diffusion matrix is then to make sure that as many S-boxes as possible
are active. That's what this formula here gives. If we look a bit closer: we usually count the number of active S-boxes in two successive layers of S-boxes. You have one big linear diffusion function, which can be seen in two ways: either as a big function on the whole state, or as a matrix on words, because you can consider that you have four input words and four output words (in this case, of the size of the S-box). Since it is linear, we usually consider it to be a matrix. What we want here is to have as many active S-boxes as possible, which basically means that we want as many input words and as many output words as possible to be active, that is, to carry differences. This is exactly what the branch number captures for differential attacks; for linear attacks it is the same thing, we just use the transpose of the matrix.

So here, for instance, we can have these S-boxes active, so we have five active S-boxes in two rounds; that's how we count. The best we can do is: if we have only one active S-box on the input, we cannot have more than all the output words active, so this is the best we can get. The diffusion matrices which reach this optimum are called MDS, because they are equivalent to maximum distance separable codes.

I am talking about matrices because we usually represent this as matrices over a finite field. We define a primitive element x of F_2^m, and then the words and the coefficients can be considered as elements of this field, so polynomials in x with binary coefficients. We also usually represent them with integers, simply by setting x equal to 2, so that the binary decomposition of an integer gives the polynomial. We get this kind of matrix here, a 4x4 matrix: this is the MixColumns matrix from the AES, and this one is MDS, so it is optimal. The good thing is that we can characterize on the matrix itself whether it is MDS or not: the idea is that the matrix
will be MDS if and only if all of its minors are non-zero, the minors being the determinants of the square submatrices. In particular, the determinant of the whole matrix must be non-zero, and the coefficients of the matrix must be non-zero. So, for example, a binary matrix cannot be MDS, because it will contain zeros.

There has been a lot of previous work on this topic: not to find MDS matrices, because we know some, but to try to reduce their cost. I will not go through all of it, just a few examples. One which I think is nice is recursive matrices, where you have a trade-off between the size of the implementation and the time you need: instead of implementing the MDS matrix directly, you implement a small matrix A whose i-th power is MDS, and you iterate it i times to get an MDS diffusion layer. What has also been done a lot is optimizing the coefficients of the matrix. What do I mean by this? If you look at this matrix, we can imagine that we are going to do a matrix-vector multiplication, for which we have a fixed number of XORs, plus multiplications by each of the coefficients of the matrix. So if we have coefficients which are lightweight to multiply by in the finite field, that's better: 1, 2 and 3 are very nice in this case. But since the search space is huge, people usually look at structured types of matrices, in subspaces where we expect many MDS matrices, and then basically enumerate the coefficients and take the best ones. What is also doable is to go the other way around and get out of the finite field, so as to have a larger search space: in that case, we just say that the inputs are binary vectors and the coefficients are n-by-n binary matrices.
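As a concrete illustration of the minor criterion just stated, here is a small sketch (my own code, not from the talk). It checks the AES MixColumns matrix, with coefficients written as integers (x = 2) over GF(2^8) with the AES modulus 0x11B; in characteristic 2 the cofactor signs vanish, so a determinant is just an XOR of products.

```python
# Check the MDS criterion: every minor (determinant of a square
# submatrix) must be non-zero. Sketch code, not from the talk.
# Coefficients are integers encoding GF(2^8) elements with x = 2.

from itertools import combinations

MOD = 0x11B  # AES modulus x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two GF(2^8) elements: carry-less product, then reduce."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r

def det(M):
    """Cofactor expansion; in characteristic 2 the signs vanish."""
    if len(M) == 1:
        return M[0][0]
    r = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        r ^= gf_mul(M[0][j], det(minor))
    return r

def is_mds(M):
    n = len(M)
    return all(det([[M[r][c] for c in cols] for r in rows]) != 0
               for size in range(1, n + 1)
               for rows in combinations(range(n), size)
               for cols in combinations(range(n), size))

MIX_COLUMNS = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
print(is_mds(MIX_COLUMNS))  # True
```

A 4x4 matrix has only 69 square submatrices, so this exhaustive check is perfectly practical at diffusion-layer sizes.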
With such binary-matrix coefficients, maybe we get something less costly than finite field multiplications.

So, the big question: I am talking about cost, but how do we evaluate this cost? The real cost of a diffusion layer would be the number of bitwise XORs needed to implement it with the best possible implementation, but we do not really know how to evaluate this, so we usually use estimates. The one which has been used most in the literature is the XOR count, which is a quite naive way to count the number of bitwise XORs: you simply look at the Hamming weight of the binary matrix. The problem is that in this kind of implementation you cannot reuse intermediate values. Some other kinds of metrics have been considered, like local optimizations: as I just mentioned, you consider that the cost is the cost of the matrix-vector multiplication, so you have a fixed number of XORs plus the cost of multiplying by each of the coefficients of the matrix, and you try to reduce the cost of each coefficient independently; that is a local optimization. What has been considered more recently is global optimization: there has been another work, in parallel with ours, done by a team from Bochum, Kranz et al.
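The naive XOR count mentioned above can be computed mechanically: expand each coefficient into its m-by-m binary multiplication matrix, take the Hamming weight, and charge one XOR per output bit beyond the first input bit it taps. A sketch under my own conventions (GF(2^4) with modulus x^4 + x + 1, and a made-up 2x2 example matrix):

```python
# Naive XOR count: Hamming weight of the binary matrix, charging one
# XOR per output bit beyond the first input bit it depends on. This
# metric cannot model reuse of intermediate values. Example setting
# (my choice): GF(2^4) with modulus x^4 + x + 1, words as integers.

MOD = 0b10011  # x^4 + x + 1
M_BITS = 4

def mult_columns(c):
    """Columns of the m x m binary matrix of multiplication by c."""
    cols = []
    v = c
    for _ in range(M_BITS):
        cols.append(v)       # column j is c * x^j
        v <<= 1
        if v & (1 << M_BITS):
            v ^= MOD
    return cols

def naive_xor_count(M):
    ones = sum(bin(col).count("1")
               for row in M for c in row for col in mult_columns(c))
    return ones - len(M) * M_BITS  # fan-in minus one, per output bit

print(naive_xor_count([[1, 1], [1, 2]]))  # 9 bitwise XORs
```

As the talk points out, this estimate ignores shared intermediate values, so it only upper-bounds the real circuit cost.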
Kranz et al. used hardware synthesis tools and straight-line programs to try to find good circuits to implement binary matrices. We also use a global approach, in which the idea is that you try to find an implementation of the matrix as a circuit, with the least number of operations. In our case we only use operations on words, not on bits, because that would be too costly. If we compare these kinds of metrics: for this matrix, counted as a matrix-vector multiplication, we would need six multiplications by 2, two multiplications by 3 and six XORs; but this circuit here does exactly the same operation, and by reusing some intermediate values it needs only one multiplication by 2 and five XORs. This circuit cost is the metric that we will use.

And actually we will get out of the finite field. What we do is replace this multiplication by 2 in the finite field by an arbitrary linear mapping alpha on F_2^n, and then we optimize in two steps: first we try to find a formal matrix in alpha which is lightweight, and then we try to find the lightest instantiation of this alpha which gives an MDS matrix. So what we have in the end is not necessarily a field; in general it is a polynomial ring. Basically, the words and the coefficients are no longer elements of a finite field: they are not polynomials in a primitive element x, but polynomials in alpha with binary coefficients. What is nice is that we can characterize exactly which formal matrices in alpha are interesting, that is, which ones can give MDS instantiations. We call them formally MDS. Once again, it simply corresponds to having all the minors of the formal matrix non-zero, these minors being now polynomials in alpha. We can prove that if the minors are non-zero then there exists an MDS instantiation, and otherwise it is not possible.
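To make the reuse of intermediate values concrete, here is the well-known shared-intermediate circuit for one AES MixColumns column (a stand-in example of mine; the slide's own matrix and circuit are not reproduced here). Sharing t = a^b^c^d leaves only four multiplications by 2 instead of eight for the naive row-by-row product (counting each multiplication by 3 as one by 2 plus an XOR):

```python
# Reusing intermediate values: the well-known circuit for one AES
# MixColumns column. Sharing t = a^b^c^d leaves only four
# multiplications by 2 instead of eight for the naive product.

def xtime(a):
    """Multiply by 2 in GF(2^8), AES modulus x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a

def mix_column(col):
    a, b, c, d = col
    t = a ^ b ^ c ^ d              # shared intermediate value
    return [a ^ t ^ xtime(a ^ b),  # = 2a ^ 3b ^  c ^  d
            b ^ t ^ xtime(b ^ c),  # =  a ^ 2b ^ 3c ^  d
            c ^ t ^ xtime(c ^ d),  # =  a ^  b ^ 2c ^ 3d
            d ^ t ^ xtime(d ^ a)]  # = 3a ^  b ^  c ^ 2d

# FIPS-197 example column:
print([hex(v) for v in mix_column([0xD4, 0xBF, 0x5D, 0x30])])
# ['0x4', '0x66', '0x81', '0xe5']
```

This is exactly the kind of saving the circuit metric captures and the naive XOR count misses.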
So with this characterization we really miss nothing.

Then we proceed in two steps, and the rest of the talk follows them: first finding lightweight formal MDS matrices, then instantiating them in the lightest way possible. So our search space for the formal matrices is this: we ran a search over small circuits, basically trying to find the first one which is MDS. The operations that we use are only word-wise operations: word-wise XORs, this linear mapping alpha, which is left undefined, and copy operations on words. We have a few registers to store the values: we need at least one register per word, and we have a few more registers to allow for more complex operations. The main idea of this search is that we use a graph-based search with a Dijkstra algorithm. A node of this graph is a matrix, represented by a sequence of operations, so a circuit, and an edge is adding one operation to the node. In this setting, the lightest circuit for an MDS matrix is the shortest path from the root, which is the identity, to an MDS matrix; so it is just a Dijkstra algorithm, basically. This search is very costly, in particular in terms of memory: for 3x3 matrices it was very fast, very easy, but for 4x4 matrices it was already very memory-consuming (not so much time-consuming, but memory-consuming), and for 5x5 matrices it is just not doable at all. What we get in the end is a collection of formally MDS matrices, with a trade-off between the cost of the implementation and the depth of the circuit, the depth being related to how much time it takes to compute this diffusion layer. So if we look at what it looks like, basically it is like this: we spawn a big graph, and each edge adds an operation. We did a lot of optimization for this, because it is not doable just like that; we did optimizations at many levels. One of the big ones replaces the plain Dijkstra algorithm.
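The graph search just described can be sketched in miniature. Everything below is my own simplification: registers hold GF(2) combinations of the input words (as bitmasks), the only operation is XORing one register into another (no alpha, no extra registers), and the target is one fixed binary matrix rather than the set of all MDS matrices. Since every edge then costs 1, plain breadth-first search stands in for Dijkstra:

```python
# Toy version of the circuit search. States are register contents,
# each a bitmask saying which input words are XORed into it; the
# root is the identity, and each edge XORs one register into
# another. BFS returns the length of the shortest straight-line
# program reaching the target matrix.

from collections import deque

def shortest_xor_program(target_rows, n):
    start = tuple(1 << i for i in range(n))  # rows of the identity
    goal = tuple(target_rows)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for i in range(n):
            for j in range(n):
                if i != j:
                    nxt = list(state)
                    nxt[i] ^= state[j]      # operation: r_i ^= r_j
                    nxt = tuple(nxt)
                    if nxt not in dist:
                        dist[nxt] = dist[state] + 1
                        queue.append(nxt)
    return None  # unreachable (e.g. a singular target)

# Target rows (1,1,0), (0,1,1), (1,1,1) as bitmasks: 3 XORs suffice.
print(shortest_xor_program((0b011, 0b110, 0b111), 3))  # 3
```

Even in this toy setting the state space is of size up to 2^(k^2) for k words, which gives a feel for why the real 4x4 search already needed terabytes of memory and 5x5 is out of reach.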
We use an A* approach. A* is a guided Dijkstra in which, rather than having for each node only its weight, the weight of the path from the origin, we also add an estimate of the cost still needed to reach the objective. Here our estimate is: how far are we from an MDS matrix? What we use is simply that if we have a column with a zero in our matrix, then it cannot be a column of an MDS matrix, and if we have linearly dependent columns, they cannot be columns of an MDS matrix together. So what we use in the end as our estimate is the rank of the matrix without the columns which contain zeros: we need at least k minus m more word-wise XORs to get to an MDS matrix, where k is the number of words and m is this rank. The result is that the search runs much faster, with much less memory. Using this, we get a lot of results: around 20 very good formally MDS matrices.

Now we need to instantiate them: we have formal matrices in alpha, and we want to find the best instantiation for this alpha, the one which gives something MDS and lightweight. The good thing is that we can characterize very efficiently whether an instantiation is MDS or not. The basic way to do this would be to just take a linear mapping A, evaluate the matrix at the point A, and check that all the minors are non-singular, which is just the basic definition of an MDS matrix. But actually you can start by computing the minors directly on the formal matrix: here I denote them M_{i,j}, the formal minors, which are polynomials in alpha, and then we can evaluate these polynomials at the point A. It is an equivalence: the instantiation is MDS if and only if these minors evaluated at A are all non-singular. It is not obvious, but it works. And you can go even further, using the minimal polynomial mu_A of the linear mapping A: we have an equivalence, the instantiation will be MDS if and only if the minimal polynomial is coprime with all the formal minors. This is very efficient, because
when we want to instantiate, we just have to compute all the formal minors M_{i,j} once, for the formal matrix, and then for each instance that we want to try, we just compute its minimal polynomial and check that it is coprime with the formal minors. Actually, we can do this purely by theory, by going back to the finite field. What we want is a linear mapping A such that its minimal polynomial is coprime with all the formal minors. If we just look at multiplications in a finite field, we can do this: take d greater than the degrees of all the formal minors; then, if we choose pi, an irreducible polynomial of degree d, by construction pi is coprime with all the formal minors. So if we take for A the companion matrix of pi, this gives an MDS instantiation, and this A just corresponds to a finite field multiplication. What is nice is that we know how to do this at a low cost: if we pick for pi a polynomial with just a few coefficients, for instance a trinomial, it can be implemented with just one rotation and one one-bit XOR, so almost nothing.

So that's what we did. In the end, these are the matrices that we used for A; we just used these four matrices. It is not obvious that this works for everything, but in our case, for all of our formal matrices, just these four instances of A were enough to instantiate. At this point we need to fix the size of the words that we are working on; only now do we need to do this, since all the rest of the time we had words in F_2^n, and now we need to fix n. We chose two sizes which are typical for S-boxes: 4 bits and 8 bits. For all of them we instantiated using, for A, the companion matrix of an irreducible polynomial, so exactly as I told you, it is a finite field multiplication here.
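The instantiation machinery above fits in a few lines once polynomials over F_2 are encoded as bitmasks. The three "formal minors" below are made-up stand-ins (the real ones come out of the circuit search); x^4 + x + 1 is a representative irreducible trinomial of the kind the talk describes, and its companion matrix acts on a 4-bit word as one shift plus a one-bit feedback XOR:

```python
# Instantiation check: the instantiation is MDS iff the minimal
# polynomial of alpha is coprime with every formal minor. Polynomials
# over F_2 are bitmasks (bit i = coefficient of x^i). The "formal
# minors" below are made-up stand-ins for illustration.

def poly_mod(a, b):
    """Remainder of a divided by b in F_2[x] (b non-zero)."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a

def instantiation_is_mds(min_poly, formal_minors):
    return all(poly_gcd(min_poly, m) == 1 for m in formal_minors)

MINORS = [0b10, 0b11, 0b111]  # alpha, alpha + 1, alpha^2 + alpha + 1
MU = 0b10011                  # x^4 + x + 1: irreducible trinomial

print(instantiation_is_mds(MU, MINORS))     # True
print(instantiation_is_mds(0b110, MINORS))  # False: x^2 + x shares x

# The companion matrix of x^4 + x + 1 acts on a 4-bit word as
# multiplication by x: a shift plus a one-bit feedback XOR.
def alpha(w):
    w <<= 1
    if w & 0x10:        # the bit falling off the word...
        w ^= 0b10011    # ...feeds back into the taps x and 1
    return w

# Sanity check: alpha^4 + alpha + id = 0 on every word, so the
# minimal polynomial of alpha is x^4 + x + 1.
print(all(alpha(alpha(alpha(alpha(w)))) ^ alpha(w) ^ w == 0
          for w in range(16)))  # True
```

Because an irreducible pi of degree d greater than all the minor degrees is automatically coprime with them, the gcd check is guaranteed to succeed for such companion-matrix choices, as the talk argues.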
Or its inverse. For 8 bits we cannot have irreducible trinomials, but we just take the square of the polynomial here. Using this, we get quite good results. If we compare with the literature, we need to compare with two types of results: results from more than, say, one year ago, and results since then, because, as I said at the beginning, the team from Bochum, Kranz et al., did the same kind of global-optimization approach and got quite good results too. If we compare with what existed just before: for 4x4 matrices with 8-bit words, the best that existed before was 106 bitwise XORs; Kranz et al. found 72 bitwise XORs at depth 6, and we managed to do a bit better than that, with a variety of possible depths. It is exactly the same story for 4x4 matrices on 4-bit words. So these are very lightweight MDS matrices that can now be used in lightweight ciphers; for instance, one was used in one of the NIST submissions, Saturnin.

Thank you for your attention. If you have any questions, please.

Thank you very much. Any questions?

Q: You told us that your search algorithm became infeasible at five by five. Is there any way to use your estimates, or a way to reduce the space, to still make further progress?

A: It seems complicated for MDS matrices. We also tried for near-MDS matrices, and maybe, with a lot of work, that could work for 5x5 matrices, but I don't think we can find MDS ones using this; it seems too much. It already takes 2.5 terabytes of RAM for 4x4 matrices.
So it's a lot, and five by five is not feasible.

Q: If you look at these matrices which you found, do they have a special structure? You didn't show us how they look.

A: Yeah, maybe I should have shown a few examples. Yes, some of them, quite a few actually in the four-by-four case, have, for people who know this kind of stuff, some kind of generalized Feistel structure, more or less. So maybe there is something to find there, but we did not understand exactly why; maybe that is just the way we put them in the schemes.

Okay, if there are no more questions, let's thank Sébastien.