So first of all, sorry in advance in case I do something clumsy with my microphone, but I'm Italian, so not using my hands while talking is really difficult. In this talk I'm going to present some improvements to inner product masking schemes.

First of all, let me give some context. We are in the side-channel track, and the attacks I'm going to talk about happen when we implement some cryptographic algorithm in hardware. Indeed, such an implementation can leak through sound, electromagnetic emanation, timing and, most usefully of all, power consumption, and all these leakages can reveal sensitive information about the values being processed. The security model that formalizes this kind of attack, and the one we are going to consider, is the t-probing model: we study an adversary that can probe up to t wires in a circuit. A circuit usually looks something like this: in these blocks we have some operations, for example linear operations, or non-linear operations like the multiplication. In order to guarantee privacy in the t-probing model, we have to guarantee the existence of a simulator which can simulate the adversary's view without actually knowing the secret. A stronger property is t-strong non-interference, t-SNI. I won't give its definition here; you just have to remember that it is a property which guarantees good compositional behaviour: if the output of one gadget is the input of another gadget, the composition is always secure as long as both gadgets are t-SNI.

An important countermeasure against side-channel attacks is masking. What do I mean by masking? It means splitting every sensitive variable in the circuit into n random shares, such that an attacker who learns up to t of these shares cannot recover anything about the secret. So a masked circuit looks like this: we have an encoder which splits the secret s into n shares; then we have the circuit itself, with the operations we saw before, where especially the non-linear operations cause problems for keeping the secret hidden; and at the end we have a decoder which recombines the output.

Current research on masking schemes is about finding a good trade-off between high performance and low evidence of leakage, and different masking schemes have been proposed so far. One of the most common is Boolean masking, which is widely used and really efficient. In this talk we are going to study inner product masking, which actually shows less evidence of leakage, but which so far was really inefficient and therefore not really used in practice. In particular, I will present a new multiplication scheme with lower execution time and lower randomness requirements; then we will see an application to the S-box of the AES, some new implementation results, and a more detailed information-theoretic evaluation of the inner product encoding.

First, some more detailed context. What is Boolean masking? If we have a secret s, the Boolean masking encoding consists in taking n - 1 random values s_2, ..., s_n and assigning to s_1 the secret plus the sum of all the other shares: s_1 = s + s_2 + ... + s_n.
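To make this encoding concrete, here is a minimal Python sketch. It assumes shares are bytes and that addition means XOR, i.e. that we work in characteristic 2; the talk does not fix the field, and the helper names are mine, so treat this only as an illustration:

```python
import random

def bool_encode(secret, n, rng):
    """Split a byte into n shares whose XOR equals the secret."""
    shares = [rng.randrange(256) for _ in range(n - 1)]  # s_2 .. s_n random
    s1 = secret
    for s in shares:
        s1 ^= s                # s_1 = secret + s_2 + ... + s_n
    return [s1] + shares

def bool_decode(shares):
    """Recombine: the XOR of all shares is the secret."""
    x = 0
    for s in shares:
        x ^= s
    return x

rng = random.Random(0)
shares = bool_encode(0xA5, 4, rng)
assert bool_decode(shares) == 0xA5
# Any 3 of the 4 shares are jointly uniform and independent of the secret,
# which is exactly the t-probing guarantee for t = n - 1.
```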
It is kind of intuitive to see that in order to decode the secret we just have to add all the shares back together.

A really famous multiplication scheme for Boolean masking is the ISW scheme, published in 2003 by Ishai, Sahai and Wagner. A multiplication scheme needs to operate on vectors of shares: the inputs are now A = (a_1, ..., a_n) and B = (b_1, ..., b_n), and the output C needs to guarantee that the sum of the shares c_i equals the product of the sum of the shares a_i and the sum of the shares b_i. So how does ISW work? We have a matrix T composed of the products a_i * b_j, and on the other hand we have a matrix of random values. These random values are chosen so that every random value appears exactly twice, and one of the two occurrences is the opposite of the other; we will see in a moment why this is useful. We then sum these two matrices, and the sum of the components on each row gives one component of the output C. Now we can see why we need these particular random values: to guarantee correctness we need the sum of the output shares to equal the sum of all the entries, and with this structure the randomness cancels out, leaving exactly the product. So we have correctness.

The inner product masking is a bit more involved. We have a setup phase at the beginning of the execution which fixes a vector L, and this vector stays fixed for the whole execution of the algorithm. This vector consists of L_2, ..., L_n, random non-zero values, and the first value L_1 is set to 1. In order to encode a secret s, we now take s_2, ..., s_n random values and assign to s_1 the secret s plus the sum of the products L_i * s_i. To decode, we compute the inner product between the vector L and the vector of shares S, and again it is easy to see that this gives back the secret s.

I will now show you our new multiplication scheme and explain why it is actually an improvement over previous work. Again, the multiplication takes two encodings A and B. We took a lot of inspiration from the ISW scheme I showed you before: we again build a matrix T of products and a matrix U, this time of random values adjusted for the encoding. Indeed, we have to take into account the presence of the values L_i in order to guarantee correctness. The matrix of products will contain the entries a_i * b_j * L_j, and as for the random values, again each random value appears twice, forming an encoding of zero, and each of them is multiplied by the inverse L_i^{-1}. I will show you later why this actually guarantees correctness. For the rest it is really similar to the ISW scheme from before: we sum T and U, and the sum of the components of each row gives the components of the output C.
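Here is a correctness-level sketch of the IP encoding and of this new multiplication. It assumes the field GF(2^8) with the AES polynomial (the talk only requires some finite field), and all helper names (gf_mul, ip_encode, ip_mult, ...) are mine. Note that with L = (1, 1, ..., 1) the matrices T and U collapse exactly to the ISW construction recalled above. In the real gadget the order in which the entries of T + U are accumulated matters for probing security; this sketch only checks correctness:

```python
import random

def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def gf_inv(a):
    """Inverse in GF(2^8), computed as a^254 (valid for a != 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def ip_encode(secret, L, rng):
    """s_1 = secret + sum_{i>=2} L_i s_i, so that <L, S> = secret (L_1 = 1)."""
    s = [rng.randrange(256) for _ in range(len(L))]
    s[0] = secret
    for i in range(1, len(L)):
        s[0] ^= gf_mul(L[i], s[i])
    return s

def ip_decode(s, L):
    """<L, S> = sum_i L_i * s_i."""
    x = 0
    for li, si in zip(L, s):
        x ^= gf_mul(li, si)
    return x

def ip_mult(a, b, L, rng):
    """New multiplication: the row sums of T + U re-share <L,A> * <L,B>."""
    n = len(L)
    # T[i][j] = a_i * b_j * L_j
    T = [[gf_mul(gf_mul(a[i], b[j]), L[j]) for j in range(n)] for i in range(n)]
    # U: each random value appears twice (its own opposite in characteristic 2),
    # scaled by L_i^{-1}, so that sum_i L_i * (row i of U) = 0.
    U = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.randrange(256)
            U[i][j] = gf_mul(gf_inv(L[i]), r)
            U[j][i] = gf_mul(gf_inv(L[j]), r)
    # c_i = sum over row i of (T + U)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[i] ^= T[i][j] ^ U[i][j]
    return c

rng = random.Random(0)
L = [1, 0x53, 0xCA]   # fixed at setup: L_1 = 1, the others random non-zero
A = ip_encode(0x57, L, rng)
B = ip_encode(0x83, L, rng)
C = ip_mult(A, B, L, rng)
assert ip_decode(C, L) == gf_mul(0x57, 0x83)
```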
So, correctness. Don't worry, I'm not going to walk through this whole formula. What we have to show is that the inner product between L and C equals the product of the inner product between L and A and the one between L and B. By simply substituting the components, at some point we see why we need these factors L_i^{-1}: thanks to them, the L_i cancel out and we are left with the sum of the random values u_{i,j}, and since each of them appears twice, once as the opposite of the other, this sum is just zero, and what remains is exactly the product of the two inner products. So we have correctness. It is just algebraic calculation.

We prove this scheme to be t-SNI, so it has good composability properties, and it needs only t + 1 shares, which is really the optimal number of shares. All schemes so far achieved only plain t-probing security, which gives no guarantee about composability, and they needed 2t + 1 shares. So you can see that we are improving on both counts.

Let's see now how to apply these gadgets to the S-box of the AES. We have to compute the exponentiation to the power 254, and we take the addition-chain algorithm from Rivain and Prouff. From this chain we see that we need some multiplications, the green blocks, then some squarings, which are linear operations, and some refreshings. For completeness I will show you the refreshing and the squaring: these gadgets already existed in previous work, but we also give more precise security proofs for them.

The squaring is pretty easy, because it is just a component-wise operation, and it needs no randomness since squaring is linear: we square each component of the input and multiply component i by the corresponding L_i. That's it.

The refreshing is a bit more complicated, so let me define it. We have an input X and we have to produce an output Y which has the same decoding, so <L, Y> = <L, X>, but internally we add some randomness, and this makes the two encodings independent of each other; indeed, refreshing is usually used in order to guarantee the independence of values. We have a first step in which, n times, we generate a vector R_j, a different encoding of zero every time. Encoding zero means taking r_2, ..., r_n random and assigning to r_1 the sum of the products L_i * r_i. In the second step we add these R_j to X recursively: we start by assigning X the input vector, then we compute X + R_1, then X + R_1 + R_2, and so on and so forth, and the output will be X + R_1 + R_2 + ... + R_n.
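A sketch of the squaring and refreshing gadgets as just described, reusing gf_mul, ip_encode and ip_decode from the previous sketch, under the same GF(2^8) assumption and with hypothetical names:

```python
import random
# gf_mul, ip_encode, ip_decode as defined in the previous sketch

def ip_square(x, L):
    """Share-wise squaring: y_i = L_i * x_i^2. Then <L,Y> = sum L_i^2 x_i^2
    = (<L,X>)^2, because squaring is linear in characteristic 2 (Frobenius).
    No randomness is needed."""
    return [gf_mul(L[i], gf_mul(x[i], x[i])) for i in range(len(L))]

def ip_encode_zero(L, rng):
    """Fresh encoding of 0: r_1 = sum_{i>=2} L_i r_i, so that <L,R> = 0."""
    r = [rng.randrange(256) for _ in range(len(L))]
    r[0] = 0
    for i in range(1, len(L)):
        r[0] ^= gf_mul(L[i], r[i])
    return r

def ip_refresh(x, L, rng):
    """Add n independent encodings of zero: <L,Y> = <L,X>, but every output
    share has absorbed n fresh random values along the way."""
    y = list(x)
    for _ in range(len(L)):
        r = ip_encode_zero(L, rng)
        for i in range(len(L)):
            y[i] ^= r[i]
    return y

rng = random.Random(1)
L = [1, 0x53, 0xCA]
X = ip_encode(0x57, L, rng)
assert ip_decode(ip_square(X, L), L) == gf_mul(0x57, 0x57)
assert ip_decode(ip_refresh(X, L, rng), L) == 0x57
```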
So this means that every component of the output has been masked with n different random values, and this is actually the strength of this scheme: it is really simple, and it is again t-SNI.

Now let's look in a bit more detail at the AES S-box, which is where the refreshing is actually used. As I said before, the refreshing is needed in order to guarantee independence: some of the values entering a multiplication are dependent, the red lines that you see here, but our IP multiplication needs to receive independent inputs, so those inputs have to pass through a refreshing first in order to be independent at that point. However, as you saw before, this refreshing needs a lot of randomness, so it is really expensive. In order to optimise the amount of randomness needed, we designed a new multiplication scheme for dependent inputs. This multiplication scheme can take two dependent inputs A and g(A), where g is a linear function, and of course we want this scheme to use less randomness than the refresh-then-multiply approach. The trick is to internally refresh one of the two inputs. First of all we refresh A: we add the a_i to some random values e_{i,j} and obtain a matrix A' of refreshed components. As before, we then compute a matrix T of products, but this time one of the two factors being multiplied is no longer a_i, one of the inputs, but the refreshed components of A'. Then we build the random matrix U, defined exactly as before, and finally we add a vector V. This vector is needed only for correctness, not for security: the internal refresh introduced some new terms into the computation that would not cancel out on their own, and V compensates for them. Then, again, the sum of all these matrices and of the vector V gives, row by row, the components of the output shares.

So now we can substitute the first multiplication scheme with the new multiplication scheme and eliminate the refreshing that was there before. We show this scheme, again, to be secure and composable, t-SNI, using t + 1 shares of the input; and in general all these blocks are t-SNI, which is really important because in the AES this S-box is of course composed with the other rounds.

Now some comparison. On one side we have the new construction, where we use our optimised multiplication; on the other side what we had before, a multiplication scheme plus the refreshing. Here we have a table comparing the number of additions, multiplications and random values: in green you see the new construction, and in orange the one with multiplication plus refreshing. We are roughly the same in additions and multiplications, but in randomness we save a factor of n, because the refreshing needs on the order of n^2 random values, while refreshing A internally needs only n.
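The exact gadget, with its matrices T, U and the correction vector V and their evaluation order, cannot be reconstructed faithfully from the slides alone, so the following is only a correctness-level sketch of the underlying idea, reusing the helpers from the sketches above: instead of a full refreshing (order n^2 random values) between gadgets, spend a single encoding of zero (n random values) to mask one input internally. The real scheme folds this refresh into the matrix computation, which is what makes it t-SNI:

```python
import random
# gf_mul, ip_encode, ip_decode, ip_mult, ip_encode_zero, ip_square
# as defined in the sketches above

def ip_mult_dependent(a, b, L, rng):
    """Sketch only: mask A with ONE fresh encoding of zero (n random values)
    instead of the full n^2-randomness refreshing, then multiply as before.
    The actual gadget interleaves this refresh with T, U and V to stay t-SNI;
    here we only check correctness."""
    z = ip_encode_zero(L, rng)                   # <L, Z> = 0
    a_prime = [ai ^ zi for ai, zi in zip(a, z)]  # <L, A'> = <L, A>
    return ip_mult(a_prime, b, L, rng)

# The second input may be a linear function g of the first, e.g. a squaring
# of the very same encoding:
rng = random.Random(2)
L = [1, 0x53, 0xCA]
A = ip_encode(0x57, L, rng)
B = ip_square(A, L)                              # dependent input g(A)
C = ip_mult_dependent(A, B, L, rng)
assert ip_decode(C, L) == gf_mul(0x57, gf_mul(0x57, 0x57))   # 0x57^3
```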
Let's now look at the performance evaluation. Here you can see an implementation of the S-box compared with Boolean masking: in orange and red you see the Boolean masking in two variants, and in green and blue you see the implementations with our IP masking, with the two different multiplications I showed before. It is not surprising to see that our scheme is slower, and the reason is the presence of the L_i, which add a lot of extra computation. On the other hand, it is also true that the most practical cases are the ones with small n, so let me show you an optimisation for small n: we tabulated the multiplications by the L_i. This is only possible for small n, but large n would not be practical anyway. Here you see a comparison of timing and memory: in green we have the IP masking, again with just the first multiplication, now using the tables, and here we have the Boolean masking. You can see that we still have an overhead, of course, but it is much smaller than before. Fair enough.

We also carried out a deeper information-theoretic evaluation of the IP encoding. We consider the case n = 2, so the encoding of a value a as a_1 + L_2 * a_2. Here you see the case of linear leakage, where the leakage is modelled as the Hamming weight of the shares plus Gaussian noise, and the plot shows how the mutual information between the leakage and the value behaves as a function of the noise variance. The black line is the unmasked value a, in blue you see Boolean masking, and in red the IP masking. On purpose, we tried the IP masking with different values of L_2, because, as we saw before, every execution of a new circuit fixes a different L; interestingly, by tuning this L we obtain different curves. But more interesting is that our scheme actually leaks less than Boolean masking.

We also studied transition-based leakage a bit more. What does it mean to consider transition-based leakage? This happens when an adversary receives not only a noisy version of the shares but also their distance, and it happens in practice when the same register is used consecutively to store two different shares of the same value. For Boolean masking, where a = a_1 + a_2, the Hamming weight of the sum of the shares gives us directly the Hamming weight of a, and this actually cancels the effect of the masking. For our IP masking, on the other hand, the factor L keeps the distance between the shares uniformly distributed, so we see no evidence of leakage in transition-based leakage (the sketch after the conclusion illustrates this).

To conclude the evaluation, I will show you some empirical side-channel leakage evaluation that confirms in practice what we just saw. Here you can see the comparison of the t-test scores of the Boolean masking and of the IP masking: the red line is the threshold for the t-score, so in practice, whenever the grey lines go beyond it, it means that our variable is leaking. It is really evident that our IP masking shows less evidence of leakage.

So, to conclude, let me just remark again that we saw some new IP multiplication schemes with good composability properties and improved performance, and we can say that the IP encoding represents an interesting alternative to Boolean masking, with just a slight performance overhead for small n, compensated by less evidence of leakage in practice. Thanks.
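To make the transition-based leakage argument concrete, here is a small simulation for n = 2, again assuming GF(2^8) and Hamming-weight leakage; the value of L_2 is an arbitrary choice of mine. For Boolean masking the distance between the two shares is a constant that depends only on the secret; for IP masking the factor L_2 keeps it uniform:

```python
import random

def gf_mul(a, b):
    """GF(2^8) multiplication, AES polynomial (same helper as earlier)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def hw(x):
    """Hamming weight of a byte."""
    return bin(x).count("1")

rng = random.Random(3)
secret, L2 = 0xA5, 0x53
bool_dists, ip_dists = set(), set()
for _ in range(10000):
    # Boolean masking: a = a1 ^ a2, so hw(a1 ^ a2) = hw(a), a constant that
    # depends only on the secret: the transition undoes the masking.
    a2 = rng.randrange(256)
    a1 = secret ^ a2
    bool_dists.add(hw(a1 ^ a2))
    # IP masking: a = s1 ^ L2*s2, so s1 ^ s2 = a ^ (L2 ^ 1)*s2 is uniform
    # whenever L2 != 1: the distance between the shares carries no signal.
    s2 = rng.randrange(256)
    s1 = secret ^ gf_mul(L2, s2)
    ip_dists.add(hw(s1 ^ s2))

print(sorted(bool_dists))  # a single value, hw(secret): full transition leakage
print(sorted(ip_dists))    # essentially the whole range 0..8: no obvious leak
```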
Q: If I understand correctly, the only condition on the matrix U was that the sum of all the n^2 elements would be zero, and you achieve it by having pairs of identical values, one appearing as plus and one as minus. But actually you can achieve it without this structure, if you just choose a matrix where the sum of all the elements happens to be zero, without pairs which are the plus and minus of each other. So is there any reason why you put the extra condition that there should be pairs with plus and minus? Why not use a general matrix whose elements sum to zero? Do you gain anything? The sum of all the n^2 elements of U being zero is the only condition you ever use.

A: Yeah, but if some of these elements were zero, then I think there would be a problem, because the entries of U are used to mask values during the computation: we compute a_i * b_j plus an entry of U, and if that entry is a constant zero, you are not hiding that product anymore.

Q: No, not all the values are individually zero, only the sum of all the elements. You achieve it by having pairs which are equal; I'm asking why you put this restriction instead of using a general matrix where the sum of all the elements happens to be zero.

A: I think that would be possible; we just do it this way because it is more practical. But yeah, good point, of course.