Hi everyone, Nicky Mouha here. This is a pre-recorded talk for FSE 2020. The title of this talk is "Maximums of the Additive Differential Probability of Exclusive-Or". It's a result that came out of the 2020 Cryptography Summer School, which was organized by Novosibirsk State University. The goal of the summer school was to divide the students into groups to work on a particular problem. You may remember this format from the ECRYPT research retreats or the ASK (Asian Symmetric Key) workshops. The problem that I proposed was one that I've been giving to many students over the past 10 years, every time that I've been asked to suggest a problem to study. It's a problem that's very easy to state: come up with a proof for a theorem in a paper from FSE 2004, a proof that has been lost. Now, it's not a problem that I expected to be solved: over those 10 years, students have found it fun to explore some basic techniques and get a bit of a taste for symmetric-key cryptanalysis, but they never came up with a solution. So I was really surprised when the team reached out to me about two months after the summer school to let me know that a proof had been found. Now, I'd like to stress that this is not a groundbreaking result, in the sense that we could also just have assumed the theorem was true and left it at that, right? But still, it does reveal some new insights, and as I'm going to show you, there are quite a few other results that follow from the study of this particular problem. So I'm really, really excited to give you this talk. But before I say more, I thought it would be a good idea to start with a basic overview of differential cryptanalysis. So what's differential cryptanalysis? Well, it's based on the idea that we're going to flip some bits at the input of a cryptographic algorithm. Let's say that it's a block cipher; then you'd be flipping bits in the plaintext.
And then you want to see which outputs that's going to affect, that is, which output bits are going to be flipped. Now, typically, for a well-designed cryptographic algorithm, those output bits are only going to be flipped with a certain probability. And if a particular input difference leads to a particular output difference with a probability higher than random, then that is considered to be a weakness, because the statistics of the cipher should look random; moreover, it can often be leveraged into a more powerful attack as well, for example to recover the secret key of a block cipher. Now, typically a cipher consists of many rounds, and what we want to do is determine the number of rounds that we can break, because that gives us an idea of the security margin of the cipher. Now, shown on this slide is how it's often easier to make assumptions on the differences after each of the rounds, because this helps us to calculate probabilities. Of course, this is a potential source of errors, because there can be more than one intermediate difference at a particular level that eventually leads to the same output difference; that is what is often called clustering in the literature. And because we're going to multiply the probabilities of every round, assuming that they are independent, it can also be that this independence assumption is actually not correct, and that there might be an error in computing the probability in that way. Now, shown here on the slide, what we have is a differential, so a differential is delta x going to delta y, and also a differential trail. For a trail, we specify not only the input and the output difference, but the intermediate differences after every round as well. Now, we're going to focus on ARX, or addition-rotation-XOR algorithms. They involve modular addition, that is, addition modulo 2 to the power n, where n is the word size, typically 32 or 64 bits.
The two other operations are bitwise rotation and exclusive-or. Now, these are ciphers that are extremely fast, especially on low-end platforms that don't have any cryptographic instructions, like instructions to compute a round or part of a round of AES. So if you're looking at, let's say, a 32-bit ARM platform, then you can do an addition and an XOR operation on 32-bit words in one clock cycle, and the rotation can actually be done in zero clock cycles, because on this particular platform you can provide the rotation amount for one operand as part of an instruction; that instruction, in this case, will be an addition or an XOR instruction. So these are really fast operations, and if you want to analyze the way that differences propagate through these operations, one thing we could do is look at XOR differences. These are the bit flips that I was mentioning for differential cryptanalysis: they go through the rotation and the XOR with probability one, but going through the addition happens with a certain probability that is not necessarily one, and here I'm going to show how such a probability can be computed. This goes back to a paper of Lipmaa and Moriai at FSE 2001. However, the approach that I'm going to give here is a slightly different one, given by Lipmaa, Wallén, and Dumas at FSE 2004, which computes the same probability in a different way using a product of a series of matrices.
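To make that contrast concrete, here is a small sketch in Python (my own illustration with 8-bit words and an arbitrarily chosen difference, not anything from the slides): an XOR difference leads to exactly one output difference through a rotation, but to many possible output differences through a modular addition.

```python
N = 8                 # toy word size
MASK = (1 << N) - 1   # 0xFF

def rotl(x, r):
    """Left-rotate an N-bit word x by r positions."""
    return ((x << r) | (x >> (N - r))) & MASK

delta = 0x2B  # an arbitrary XOR input difference

# Rotation: the output difference is the same for every input value,
# namely rotl(delta, 3), so the difference propagates with probability one.
rot_diffs = {rotl(x ^ delta, 3) ^ rotl(x, 3) for x in range(1 << N)}
print(len(rot_diffs))  # 1

# Modular addition: the same input difference on one operand yields
# several different output differences, depending on the carries.
add_diffs = {(((x ^ delta) + y) & MASK) ^ ((x + y) & MASK)
             for x in range(1 << N) for y in range(1 << N)}
print(len(add_diffs) > 1)  # True
```

The same holds for XOR itself, since XOR and rotation are linear over GF(2), while the carries in modular addition are what make propagation probabilistic.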
Now, the way it works is that we have to specify the differences at the two inputs. A difference of one says that the input bits are different, and a difference of zero that they are the same at this particular bit position. In the notation here, the most significant bit is to the left and the least significant bit is to the right. Then, for every i-th bit position, you look at what the differences are in the left operand, the right operand, and the output of the modular addition at this particular position, and based on that you select one of the eight possible matrices. These matrices are multiplied together: you have one matrix for every bit of the word, so for word length 32 you'd have 32 matrices in the product. Then you multiply on the left by a row vector that contains all ones, and on the right by a column vector that contains a one in the first position and a zero in all of the other positions. Now, for the particular example that I'm showing here, if you do this computation to check, this is an input difference that will propagate with probability one-fourth, assuming that the inputs are chosen uniformly at random. So far we were looking at XOR differences; now let's look at additive differences instead. What this means is the following: with XOR differences, going through the XOR or the rotation operation happens with probability one, but not necessarily going through the addition operation. Here it's the other way around: the probability will be one for any differences going through the addition operation, because we're looking at additive differences, but the probability to go through the XOR or rotation operation is not necessarily one.
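Returning for a moment to XOR differences through addition: the probability just described can be cross-checked for small word sizes. Below is a sketch (function names are my own, and this is not the slide's specific example): one function counts input pairs exhaustively, and the other uses the closed formula of Lipmaa and Moriai (FSE 2001) rather than the matrix product from the slides.

```python
def xdp_add_bruteforce(n, alpha, beta, gamma):
    """Fraction of pairs (x, y) with ((x^alpha) + (y^beta)) ^ (x+y) = gamma mod 2^n."""
    mask = (1 << n) - 1
    hits = sum(1 for x in range(1 << n) for y in range(1 << n)
               if ((((x ^ alpha) + (y ^ beta)) ^ (x + y)) & mask) == gamma)
    return hits / 4 ** n

def xdp_add_formula(n, alpha, beta, gamma):
    """Lipmaa-Moriai closed formula for xdp+ (FSE 2001): the differential is
    valid iff eq(a<<1, b<<1, g<<1) & (a ^ b ^ g ^ (b<<1)) == 0, and then the
    probability is 2^(-hw), where hw is the Hamming weight of ~eq(a, b, g)
    restricted to the n-1 lower bit positions."""
    mask = (1 << n) - 1
    eq = lambda a, b, c: ~(a ^ b) & ~(a ^ c) & mask
    if eq(alpha << 1, beta << 1, gamma << 1) & (alpha ^ beta ^ gamma ^ (beta << 1)) & mask:
        return 0.0
    hw = bin(~eq(alpha, beta, gamma) & (mask >> 1)).count("1")
    return 2.0 ** -hw
```

For example, `xdp_add_formula(3, 1, 1, 0)` gives 0.5, matching the exhaustive count; the two functions agree on all triples of 3-bit differences.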
So this is something that can be handy to analyze certain ciphers where there are a lot of additions, maybe chained additions or additions involving the key, and you want to analyze how differences propagate in that case. Now, the approach here is similar; however, the matrices look a bit different. Previously, we had matrices of size 2 by 2; now they are of size 8 by 8, and again, the computation of the probability is similar. So what you do is multiply on the left by a row vector L of all ones, and on the right by a column vector with a one in the first position and zeros in the other positions. And the matrices in between depend on the differences in the bit positions that we are considering. Now, in this particular case of the additive differential probability of exclusive-or, there is a symmetry property of those matrices, where you can determine every matrix from every other matrix by a permutation of rows and columns. Okay, so without further ado, let's look at Theorem 3 of the FSE 2004 paper by Lipmaa, Wallén, and Dumas. What the theorem states is the following: for all gamma, where gamma is the output difference, if you look at the probability of going from input differences alpha and beta, in the left and right operands of the XOR operation, to output difference gamma, then the maximum probability that you can attain, maximizing over all alpha and beta, is the probability of going from left operand difference zero and right operand difference gamma to output difference gamma. Now, if you look up this FSE 2004 paper, you will see that it says the proof is omitted from the conference version. However, it turns out, as I learned from speaking with Helger Lipmaa, that the proof has been lost. So this is a proof that we cannot seem to recover, and one that we also tried to come up with using other techniques.
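As a quick illustration, the statement of Theorem 3 can be checked exhaustively for a tiny word size. This is a sketch of my own (the function names are assumptions, not from the paper), and it only gives evidence for small n, not a proof for general n.

```python
def adp_xor(n, alpha, beta, gamma):
    """Additive differential probability of XOR, by exhaustive counting:
    fraction of pairs (x, y) with
    ((x + alpha) ^ (y + beta)) - (x ^ y) = gamma (mod 2^n)."""
    mask = (1 << n) - 1
    hits = sum(1 for x in range(1 << n) for y in range(1 << n)
               if ((((x + alpha) & mask) ^ ((y + beta) & mask)) - (x ^ y)) & mask == gamma)
    return hits / 4 ** n

def check_theorem3(n):
    """Does max over (alpha, beta) of adp_xor(alpha, beta -> gamma)
    equal adp_xor(0, gamma -> gamma), for every gamma?"""
    words = range(1 << n)
    return all(max(adp_xor(n, a, b, g) for a in words for b in words)
               == adp_xor(n, 0, g, g)
               for g in words)

print(check_theorem3(3))  # True
```

Here `check_theorem3(3)` confirms the theorem for 3-bit words by comparing all 64 input-difference pairs against the pair (0, gamma) for each of the 8 output differences.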
When I was looking into this more than 10 years ago in the beginning of my PhD thesis, together with Christophe de Canierre and Veselin Velitschkov, I don't want to say we spent too much time on this, but still it doesn't seem to be something that we could do with any of the existing techniques that we know in literature that we were rediscovering. So let me maybe discuss a few strategies and see how you might want to approach this type of problem. Of course, if the word size is small, you can do an exhaustive search. And I guess maybe for some people that might be enough that you would just say, let's look at all small word sizes, do an exhaustive search there. And if it works for small word sizes, it's also going to work for large word sizes. However, I want to be careful saying that because it does happen in mathematics that there are conjectures that have very large counter examples. So we cannot necessarily say because it works for small word size, it works for any word size. Another would be to have a look at this S-function toolkit that Veselin and Christophe and I worked on. This toolkit does provide a way of sorting output differences by probability. So if you can construct these matrices and the matrices can be bigger for different types of larger components, if somehow you get these matrices you saw in the previous slide, then this S-function toolkit, what it can do is it can give you for specific given input differences, a list of output differences ranked by probability. However, this is not something that gives you an insight into theorem 3 because it doesn't seem that the mechanics of the way that this A-Start search is done gives you something that shows that this theorem is obvious. That seemed to be a way that we were getting stuck. Another thing is you might say, well, what about using some automated techniques? Perhaps we can use that to search for certain types of differential trails. 
There are MILP solvers, SAT solvers, or other types of automated solvers that you can use. However, in that case, it's definitely clear that you cannot prove properties for any word size: you fix the word size, and for that word size you obtain a result. Let me start with a first observation: if you want to compute ADP-XOR of 0, gamma going to gamma, then only two matrices are needed, because the matrices, as I explained, depend on the difference bits at a particular bit position. In this case, the differences that we have at any position i are going to be either (0, 0, 0) or (0, 1, 1). So two matrices are enough to compute the probability ADP-XOR of 0, gamma going to gamma. Now, a second observation is that there is a property, following from this permutation symmetry of the ADP-XOR matrices, that will turn out to be really important for the case analysis in the proof that we have in the paper. I've been told that for every good talk on a mathematical subject, there should be at least one proof in the slides. So I'm going to give you a proof on this particular slide, using these permutation matrices T_k, which swap indices i and i XOR k. These matrices are involutions, so T_k times T_k is the identity matrix. With them, you can compute a way of going from the formula on the left-hand side to the formula on the right-hand side, which, as I explained, will be really useful to analyze the cases that pop up in the proof. And then here I can sketch the proof: it's going to be a proof by induction on n.
So to prove that the maximum over all alpha and beta of ADP-XOR of alpha, beta going to gamma is equal to ADP-XOR of 0, gamma going to gamma, you can rephrase this as saying that whatever probability you compute for going from alpha, beta to gamma will never be more than the probability of going from 0, gamma to gamma. In the proof by induction, we assume that this holds for a certain value of n, and then, given that, we prove that the property also holds when you move from n to n plus 1. So the idea is to take the one-bit-larger cases and reduce them to the previous case. If you combine that with the base case of the induction, n equal to 1, where you can just straightforwardly check that the statement is correct, then this allows you to prove the statement for any value of n. Now, we don't only want to look at the proof itself; we also want to see which differences are going to maximize the probability. To find those, we need to recall some symmetries. These are symmetries that are quite well known in the literature, but it still makes sense to recall them here. One is related to adding or XORing a value that has a one in the highest bit position. Another is related to rearranging the differences. And the third one is related to negating the differences, so going from alpha to minus alpha. We then want to look at the number of maximums, that is, the number of distinct pairs alpha, beta such that ADP-XOR of alpha, beta going to gamma attains the maximum value. We find that if gamma is equal to either zero or two to the power n minus one (the value with only the most significant bit set), then we have exactly two distinct pairs; if not, then there are exactly eight distinct pairs. You can see that all of those pairs actually follow from the symmetries that I showed on the previous slide.
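This "two or eight" count can likewise be observed by brute force for a small word size. The sketch below is my own (it redefines an exhaustive adp_xor helper so the snippet is self-contained); it simply tallies how many input pairs reach the maximum probability for each gamma.

```python
def adp_xor(n, alpha, beta, gamma):
    """Additive differential probability of XOR, by exhaustive counting."""
    mask = (1 << n) - 1
    hits = sum(1 for x in range(1 << n) for y in range(1 << n)
               if ((((x + alpha) & mask) ^ ((y + beta) & mask)) - (x ^ y)) & mask == gamma)
    return hits / 4 ** n

def count_maximums(n, gamma):
    """Number of distinct pairs (alpha, beta) attaining the maximum probability."""
    words = range(1 << n)
    probs = {(a, b): adp_xor(n, a, b, gamma) for a in words for b in words}
    best = max(probs.values())
    return sum(1 for p in probs.values() if p == best)

# For 2-bit words, gamma in {0, 2} gives two maximizing pairs,
# and gamma in {1, 3} gives eight, as the theorem predicts.
print([count_maximums(2, g) for g in range(4)])  # [2, 8, 2, 8]
```

The same pattern shows up for 3-bit words: gamma equal to 0 or 4 (that is, 2 to the power n minus 1) gives two pairs, and every other gamma gives eight.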
And you can also see that these pairs are distinct when zero and two to the power n minus one are not the values of gamma; but if gamma takes those particular values, then some of these pairs collapse into the same pair, and that's why you have not eight but two in that case. Now, how do we compute the maximum probability? Well, that's not so difficult, because we already had this approach of computing the probability for any given differences, so you can also use it to compute the maximum probability. But we want to see if there might be a more efficient way of doing this. And there is. We'll first show how the S-function toolkit can do a more efficient calculation, and then we're going to look into some recurrence formulas. What the S-function approach does is the following: after you've obtained those matrices, you can remove states that are non-accessible and merge states that are indistinguishable. Now, if you're working with all eight matrices, this doesn't give you a better result; you still get 8-by-8 matrices. But here we're only considering A_0 and A_3, just two of the eight matrices. So there are some states that we can remove in this procedure, and this gives us matrices of size 3 by 3 instead of 8 by 8. We can actually modify this approach further, as a bit of a cheat, to exclude the most significant bit of the difference gamma. If we multiply not n matrices but n minus 1 matrices, plus one additional factor on the left and one on the right, then we can use matrices of size 2 by 2 instead of the 3-by-3 matrices of the previous slide. This gives an additional insight, because if you look at the structure of these matrices, they allow you to describe the probability of ADP-XOR in this maximum case.
So, for 0, gamma leading to gamma, you can describe the probability in terms of a recurrence formula. The recurrence tells you, depending on whether there is a zero or a one in the first bit position, so the least significant bit position, which of two different formulas to use to compute this probability. As I said, the complexity is the same as the previous method, but it can be useful for deriving some theoretical properties. I want to show a few other results as well. You might be interested in the minimum non-zero value of ADP-XOR of 0, gamma going to gamma. One of the reviewers said that this was a really interesting mathematical result, so I definitely wanted to show this formula on one of the slides. Another thing is that duplicate values of ADP-XOR of 0, gamma going to gamma are very rare: there are at most 30 duplicated values up to n equal to 32, following from computational experiments that we've done. It seems that the number of duplicates grows linearly with n, so there are approximately 2 to the power n minus n distinct values of ADP-XOR of 0, gamma going to gamma. And the minimum non-zero value of ADP-XOR of alpha, beta going to gamma is 8 times 4 to the power minus n, for any n greater than or equal to 2. So here I'd like to conclude my presentation. We give the missing proof for Theorem 3 of the FSE 2004 paper by Lipmaa, Wallén, and Dumas. This proof is about finding the maximum over alpha and beta of ADP-XOR of alpha, beta going to gamma, which is equal to ADP-XOR of 0, gamma going to gamma. We show that there are either two or eight distinct pairs alpha, beta for which ADP-XOR attains this maximum value. If you want to compute the maximum, you can do that with matrices of size 8 by 8, but they can actually be reduced to size 3 by 3 and even 2 by 2, leading to recurrence formulas that can be used to compute ADP-XOR of 0, gamma going to gamma for arbitrary gamma.
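The last claim, a minimum non-zero probability of 8 times 4 to the power minus n, can also be checked exhaustively for small n. Again, this is my own brute-force sketch and only provides evidence for the word sizes actually tested.

```python
def adp_xor(n, alpha, beta, gamma):
    """Additive differential probability of XOR, by exhaustive counting."""
    mask = (1 << n) - 1
    hits = sum(1 for x in range(1 << n) for y in range(1 << n)
               if ((((x + alpha) & mask) ^ ((y + beta) & mask)) - (x ^ y)) & mask == gamma)
    return hits / 4 ** n

def min_nonzero_adp(n):
    """Smallest non-zero adp_xor value over all triples (alpha, beta, gamma)."""
    words = range(1 << n)
    probs = (adp_xor(n, a, b, g) for a in words for b in words for g in words)
    return min(p for p in probs if p > 0)

print(min_nonzero_adp(2) == 8 * 4.0 ** -2)  # True
print(min_nonzero_adp(3) == 8 * 4.0 ** -3)  # True
```

For n = 2 the minimum non-zero value is 1/2, and for n = 3 it is 1/8, matching the 8 times 4 to the power minus n formula stated in the talk.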
We also compute the minimum non-zero probability of ADP-XOR of 0, gamma going to gamma, and we find the value of gamma that gives this minimum probability. And here I'd like to conclude my talk. If there are any questions, I'd be happy to answer them by email or during the conference. I hope you enjoyed my talk, and I'm looking forward to meeting you at the conference.