Hello, I hope everyone is safe, and thank you for listening to this presentation. My name is Antonio Flórez-Gutiérrez, and I will talk to you about my joint work with María Naya-Plasencia on the subject of key recovery algorithms for linear cryptanalysis. Linear cryptanalysis is one of the fundamental cryptanalysis tools for symmetric primitives, and any new proposal is expected to claim security against this type of attack. Over the years, many extensions and improvements on the original idea have been proposed. One of them is the work of Collard et al. in 2008, who found a way to use the FFT to reduce the time complexity of some linear key recovery attacks. After that paper, several attacks were proposed which use this technique, but so far nobody had tried to give a general description. In this work, we introduce a generalized version of the accelerated key recovery algorithm which covers any number of rounds and takes the key schedule of the cipher into account. We then use the new algorithm to propose the first attacks on 28-round PRESENT.

The presentation will be organized as follows. First, I will briefly describe the contribution of Collard et al. Then I will explain the extended version of the algorithm, including the key schedule techniques. Finally, I will give an overview of the new attacks on PRESENT.

We will begin by revisiting the contribution of Collard et al. In a simple last-round linear key recovery attack, we consider a linear approximation of the cipher after removing the last round. The aim of a key recovery algorithm is, given a large enough collection of plaintext-ciphertext pairs, to guess all possible values for a part of the last round subkey, decrypt the last round for each ciphertext and key, and compute the correlation of the linear approximation for each possible guess of the key. We expect that the correct guess for the last round subkey will show the largest correlation. If we implement this algorithm directly, its time complexity is the number of available pairs N times the number of guesses for the part of the last round subkey, which is usually 2 to its number of bits. Can we improve this time complexity?

This type of attack needs to compute the correlation of the linear approximation for multiple guesses of the last round subkey. This correlation can be rewritten as a sum of signs over the space of all the available plaintext-ciphertext pairs. The correlation can in turn be separated into two parts: one doesn't depend on the data and can be precomputed, and the other is independent of the key guess and can be obtained very quickly from the data. In other words, the vector of correlations for all the key guesses is the product of a matrix C, which only depends on the structure of the cipher, and a vector A, which is obtained from the data. Additionally, the matrix C has a specific structure: each of its entries is minus one to the power of a Boolean function of the exclusive or of the row and column number. This type of matrix diagonalizes in such a way that the change-of-basis matrices are Sylvester-Hadamard matrices. Furthermore, the eigenvalue vector can be obtained by multiplying the first column of C by this matrix. The product of a vector by the Sylvester-Hadamard matrix is what is usually called the Walsh transform, for which there is a fast algorithm analogous to the FFT, which has time complexity big O of m times 2 to the m.
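To make this concrete, here is a minimal Python sketch, not from the talk itself, of the fast Walsh transform and of the diagonalization just described: for a matrix with entries C[i][j] = (-1)^f(i XOR j), the eigenvalues are the Walsh transform of the first column, and the product C·a reduces to three fast transforms.

```python
import numpy as np

def fwht(v):
    """Fast Walsh transform of a length-2^m vector, in O(m * 2^m)."""
    v = np.asarray(v, dtype=np.int64).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h], v[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return v

# Check the decomposition on a random Boolean function f of m bits.
m = 4
f = np.random.randint(0, 2, 2 ** m)
C = np.array([[(-1) ** f[i ^ j] for j in range(2 ** m)] for i in range(2 ** m)])
a = np.random.randint(0, 100, 2 ** m)

lam = fwht((-1) ** f)  # eigenvalues: Walsh transform of the first column of C
assert np.array_equal(C @ a, fwht(lam * fwht(a)) // 2 ** m)
```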
The result from the previous slide provides a decomposition of the matrix C, which in turn gives a way to compute the vector of correlations efficiently. The attack would proceed as follows. First, the adversary looks at each plaintext-ciphertext pair and updates the appropriate counter in a vector until all the pairs have been processed. This is called the distillation phase. Then the attacker proceeds to the so-called analysis phase. First, the fast Walsh transform is applied to the vector from the previous step. Then the eigenvalues of C are computed, also with a fast Walsh transform; they can be obtained in advance. The vector is then multiplied element-wise by the eigenvalues of C. Finally, another fast Walsh transform is applied to the vector in order to obtain a multiple of the correlation vector. The attack returns the largest components of this vector, which provide good guesses for the value of the last round subkey. The time complexity thus becomes big O of N for the distillation phase and three fast Walsh transforms for the analysis phase, that is, big O of the number of bits of the key guess times 2 to that number of bits.

We now proceed to describe our generalization of this algorithm. We considered three possible ways of extending the accelerated attack. For example, we might desire to do key recovery over multiple rounds, and not just the last. It would also be interesting to adapt the algorithm to multiple and multidimensional attacks. And finally, we should also consider the dependence relationships which are induced by the key schedule, and how to use them to reduce the time complexity of the algorithm.

For now, we will focus on the first avenue. We will consider a cipher like the one in the picture, where the plaintext is XORed with the first round subkey and then goes through a small keyed outer cipher, corresponding to the first few rounds. After this, the main inner part of the cipher is applied. At the output, another small keyed outer cipher is applied, and the last round subkey is added to obtain the ciphertext. Many ciphers, and especially key-alternating ciphers, fit this structure. As in the previous attack, we will consider a linear approximation of the inner cipher. We also consider some parts of the subkeys which need to be guessed: K0 and K3 for the external round subkeys, and K1 and K2 for the outer ciphers. As in the previous key recovery attacks, the adversary wants to compute the correlation of the linear approximation for all the possible guesses of the subkeys.

We will now rewrite the correlations as we did in the previous attack. For a fixed value of K1 and K2, we can rewrite the correlations for all the guesses of K0 and K3 as the product of three matrices. The matrix A can easily be obtained from the data. The matrices B^K1 and C^K2 only depend on the respective subkeys indicated in the superscript and can be precomputed. Furthermore, the matrices B^K1 and C^K2 have the same structure as the matrix C in the previous attack and exhibit the same Walsh transform-based decomposition. From this formula, we construct the following algorithm. In the distillation phase, we look at the part of the plaintext which interacts with K0 and the part of the ciphertext which interacts with K3, and count the number of appearances of each possibility for those. These counters are stored in the matrix A. We then proceed to the analysis phase. First, the fast Walsh transform is applied to all the rows and columns of A; this is the same as applying the fast Walsh transform to a vectorized form of A. Separately, we compute the eigenvalues of B^K1 for all possible values of K1, and we do the same for the matrices C^K2 for all values of K2. We now repeat the following steps for all possible values of K1 and K2. We multiply the entries of the transformed version of the matrix A by the appropriate eigenvalues from the previous step. Finally, we apply another set of fast Walsh transforms to the resulting matrix, which yields the correlations for all guesses of K0 and K3 for the fixed value of K1 and K2 that we have chosen. After repeating these steps for all possible values of K1 and K2, we have obtained the correlations for all the possible guesses of the subkeys. The only thing left to do is to search for the largest values and proceed with the attack.
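As a minimal sketch of this analysis phase (my own illustration, reusing the fwht routine above; it assumes A is a 2^|K0| by 2^|K3| counter matrix, and that first_col_B and first_col_C hold the first columns of B^K1 and C^K2 for each subkey guess):

```python
def fwht2d(A):
    """Fast Walsh transform of every row and then every column of A.
    By the Kronecker structure of Sylvester-Hadamard matrices, this equals
    the full transform of the vectorized (flattened) matrix."""
    A = np.apply_along_axis(fwht, 1, A)      # rows
    return np.apply_along_axis(fwht, 0, A)   # columns

def generalized_analysis(A, first_col_B, first_col_C):
    """Matrices proportional to B^K1 @ A @ C^K2, i.e. the correlations over
    all (K0, K3) guesses, for each fixed guess (K1, K2)."""
    A_hat = fwht2d(A)                        # transformed once, for all (K1, K2)
    lamB = {k1: fwht(col) for k1, col in first_col_B.items()}  # eigenvalues of B^K1
    lamC = {k2: fwht(col) for k2, col in first_col_C.items()}  # eigenvalues of C^K2
    corr = {}
    for k1 in lamB:
        for k2 in lamC:
            # element-wise product with the eigenvalues, then transform back
            corr[(k1, k2)] = fwht2d(np.outer(lamB[k1], lamC[k2]) * A_hat)
    return corr

# Sanity check of the row/column versus vectorized-form equivalence:
A = np.random.randint(0, 10, (8, 16))
assert np.array_equal(fwht2d(A).ravel(), fwht(A.ravel()))
```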
The time complexity of this new algorithm is big O of N for the distillation phase. For the analysis phase, it is the number of bits of K0 and K3 times 2 to the total number of bits involved in the key recovery.

Apart from the possibility of covering multiple rounds, the new algorithm has the following advantages. First, we note that the distillation phase can be done just once in multiple and multidimensional attacks, thus saving a lot of time in that phase. We also note that the matrix description of the algorithm allows us to separate K1 from K2, which can be useful to improve the efficiency of multiple attacks where approximations share input or output masks. Finally, the Walsh transforms can still be seen as purely vectorial operations, so all the usual optimizations can be performed, as we will see later.

Most real ciphers use some sort of key schedule, which means that the subkeys are not completely independent. So far, our algorithm assumes that all these key bits are independent. Is it possible to use these dependencies in order to reduce the time complexity of the computations? For example, if K1 and K2 depend on each other, it is easy to omit any impossible pairs of values from the computation. However, exploiting dependencies involving K0 and K3 seems harder, as they are involved in the computation of the Walsh transforms, and we need to compute the correlations for all the guesses at the same time. Nevertheless, we have been able to introduce two different approaches. While the complexity gains are not dramatic, sometimes they can make an attack faster than exhaustive search, so they are still interesting.

The first approach we consider is pruning the Walsh transforms. Pruning is a problem that has been studied in the case of the FFT, and can be thought of as the construction of efficient algorithms which compute only a subset of the outputs of the transform. We consider a simple visual example to illustrate how this type of algorithm could be used. In the picture, we have bits of the round subkeys which are repeated, bits which have fixed values, and bits of K0 and K3 whose values can be deduced from those of K1 and K2. Even though there are 24 bits of subkey involved, they can all be deduced from just 14 bits of information. In order to make use of this type of dependency, we have described a new pruning algorithm which computes the outputs of the Walsh transform which lie in an affine subspace. More specifically, if we think of the indices of the outputs as elements of the binary vector space of dimension m and we choose an affine subspace of dimension d, we can compute the outputs in that subspace in 2 to the m plus d times 2 to the d operations, instead of the m times 2 to the m of the full fast Walsh transform. We would like to point out that in our paper we only describe a more specific version of this algorithm, for the case in which some coordinates of the output index are fixed, rather than for general affine subspaces.
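A minimal sketch of that coordinate-fixed case (again my own illustration, reusing fwht from above): to obtain only the outputs whose top m - d index bits equal a constant t, one first folds the input into a length-2^d vector with about 2^m signed additions, and then applies an ordinary d-bit fast Walsh transform, for 2^m + d times 2^d operations in total.

```python
def pruned_fwht(a, m, d, t):
    """Outputs W(a)[u] for the 2^d indices u whose top m-d bits equal t.
    W(a)[u] = sum over x of (-1)^popcount(u & x) * a[x]; writing x = (y, z)
    and u = (t, v), the sum over the top bits y can be folded first."""
    b = np.zeros(2 ** d, dtype=np.int64)
    for x in range(2 ** m):
        y, z = x >> d, x & (2 ** d - 1)
        b[z] += a[x] if bin(t & y).count("1") % 2 == 0 else -a[x]
    return fwht(b)  # entry v is W(a)[(t << d) | v]

# Consistency check against the full transform:
m, d, t = 6, 2, 0b1011
a = np.random.randint(0, 10, 2 ** m)
full = fwht(a)
assert np.array_equal(pruned_fwht(a, m, d, t),
                      full[[(t << d) | v for v in range(2 ** d)]])
```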
The time complexity gains of the new algorithm are particularly significant when d is much smaller than m, as we essentially achieve a reduction by a factor of m in the time complexity. If we use this pruned algorithm on the last set of Walsh transforms of the key recovery attack, which are indeed the bottleneck of the algorithm, we can obtain moderate but noticeable reductions in time complexity, and great reductions in memory complexity.

We have also considered a different approach for multiple linear attacks. It is possible that different approximations require different parts of the subkeys, as shown in the picture: we have a different set of bits for each approximation, and they can all be deduced from some bits of information about the master key, which we will denote by kT. In this case, we can use the previous algorithm separately on each approximation, considering only the bits of key which are strictly essential, and then combine the information from all the approximations into the multiple linear cryptanalysis statistic. The resulting algorithm works like this. The distillation phase is performed separately for each approximation, or more precisely, for each mask on the plaintext and the ciphertext. The analysis phase begins by using the generalized algorithm separately for each approximation, considering only the essential bits of subkey in each case. Then, for each guess of the master key, we compute the chi-squared multiple linear cryptanalysis statistic from the tables obtained in the previous step. The time complexity of this version of the algorithm is shown on screen.

We have now reached the last part of the presentation, where we will give a very short overview of our attacks on the block cipher PRESENT. As a reminder, PRESENT is a lightweight block cipher, made an ISO standard in 2012, which has a 64-bit block and 80- or 128-bit keys. It consists of 31 rounds with the following steps: first, the exclusive-or addition of the round subkey; then the application of 16 identical 4-bit S-boxes; and finally, a fixed bit permutation of the state. This is a drawing of a PRESENT round.
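For concreteness, here is a toy Python sketch of one such round (the S-box and permutation tables below are those of the PRESENT specification; this is purely an illustration, not the attack code):

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def present_round(state, round_key):
    """One PRESENT round on a 64-bit integer: addRoundKey, sBoxLayer, pLayer."""
    state ^= round_key                                      # XOR the round subkey
    state = sum(SBOX[(state >> (4 * i)) & 0xF] << (4 * i)   # 16 identical 4-bit S-boxes
                for i in range(16))
    out = 0
    for i in range(64):                                     # fixed bit permutation:
        j = 63 if i == 63 else (16 * i) % 63                # bit i moves to 16*i mod 63
        out |= ((state >> i) & 1) << j
    return out
```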
The PRESENT S-box has 8 linear approximations with masks of Hamming weight 1. Because the linear layer is a bit permutation, this means there are linear approximations whose linear hulls contain many linear trails comprised of these one-bit S-box approximations. There have been many linear attacks which make use of these strong linear hulls; so far they had been effective on up to 27 rounds, and they are the best known attacks of any kind on PRESENT. For our attacks, we have looked at all linear trails with up to 2 active S-boxes in each round, and have selected some approximations which have a strong ELP with respect to this subset of trails. More specifically, we have designed 3 different multiple linear distinguishers with different properties.

For all these distinguishers, we consider a key recovery attack with 2 rounds at the beginning of the cipher and 2 rounds at the end. Distinguisher number 1 is the most lightweight, and has 128 linear approximations with masks of Hamming weight 1. It has the smallest capacity of the three, but also the smallest cost for the key recovery. This is why we used it in our 26- and 27-round attacks. Distinguisher 2 has 296 approximations, some of which have input masks of Hamming weight 2. As a result, its capacity is slightly larger, and the cost of the key recovery is not too large. We used this set in our attack on 28-round PRESENT-80. Finally, distinguisher 3 has the largest capacity of the three, with 448 approximations with masks of Hamming weight 1 or 2 in both the input and the output. This also means that it has the most expensive key recovery, so it can only be used against the 128-bit key variant. We used this distinguisher for an attack on 28-round PRESENT-128.

Our theoretical predictions of the properties of these distinguishers were evaluated experimentally, by simulating the key recovery attacks on 10- and 12-round PRESENT. That means we tested the properties of 6- and 8-round versions of the distinguishers. We were happy to find that the experimental results closely follow the predictions of the statistical models of linear cryptanalysis, so we are fairly confident that our predictions for the larger numbers of rounds still hold.

As an example, this picture illustrates the key recovery for the attack on 28-round PRESENT-80 using distinguisher 2. The crosses mark all the subkey bits which are needed in the attack. All these bits can be deduced from the 73 bits which have been highlighted in red. Because of the Hamming weight of the masks of the approximations, for each one we only need up to 32 bits of the first round subkey, up to 8 of the second, 4 of the penultimate one, and 16 of the last. By using both of our key schedule techniques, we can bring the time complexity below that of exhaustive search, as we will see in the next slide.

Considering all this, the attack can be summarized as follows. In the distillation phase, we use the whole codebook to construct 8 tables corresponding to different parts of the plaintext and ciphertext. Then we compute the correlation of each approximation for all the guesses of the corresponding part of the key using the pruned fast Walsh transform algorithm. We obtain 228 lists, which can actually be merged into 32 corresponding to different parts of the key. For each possibility of the 73 red bits of key, we compute the rest of the subkeys with the key schedule and compute the multiple linear cryptanalysis statistic from the tables of the previous step. We keep the key candidates with the largest values. Finally, we search exhaustively over the remaining 7 bits of key for all the key candidates from the previous step, until the correct full key is found. There is a probability of 95% that fewer than 2 to the 77.4 encryptions are performed. We should note that with a naive implementation of the key recovery, this attack would have a much larger time complexity than exhaustive search.

This table compares our new attacks with the best previous attacks on PRESENT. In short, we have provided new attacks on 26 and 27 rounds with smaller data and time complexities, as well as the very first attacks on the 28-round version of the cipher.
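As a last illustration, a minimal sketch of the statistic computation step in the attack above; derive_subkeys and the corr_tables layout are hypothetical stand-ins for the key schedule deduction and the merged correlation lists, and the chi-squared form, N times a sum of squared correlations, is assumed from the standard multiple linear model.

```python
def multiple_linear_statistic(master_guess, derive_subkeys, corr_tables, N):
    """Chi-squared-style statistic for one guess of the red master-key bits:
    deduce each approximation's key bits through the key schedule, look up
    its empirical correlation, and sum the squares."""
    stat = 0.0
    for i, table in enumerate(corr_tables):
        k_i = derive_subkeys(master_guess, i)  # hypothetical key schedule helper
        stat += table[k_i] ** 2
    return N * stat  # guesses with the largest statistic are kept as candidates
```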
With this, we reach the conclusion of this talk. In our opinion, the main contribution of this work is the new generalized algorithm for accelerated linear key recovery attacks which covers multiple rounds. We have also introduced two techniques to make this algorithm more compatible with the sort of dependence relationships introduced by the key schedule. One of them features a novel pruned Walsh transform algorithm for affine subspaces of the output. With this renewed toolbox, we are able to construct the best known attacks on the block cipher PRESENT. We think that these new algorithms will not just speed up existing attacks, but also permit attacks with better linear distinguishers which would be impossible without acceleration, and therefore reduce the data complexity and sometimes even increase the number of rounds.

However, there are still many avenues for improvement. First, we would like to apply the new algorithms to other ciphers; for example, we are currently working on attacks on reduced-round Noekeon. More theoretically, we would like to study the case in which the matrix A is very sparse, which for example happens when our key guesses cover more than half of the plaintext and the ciphertext. Can this situation be used to speed up the Walsh transforms? We would also like to develop automatic tools which compute the cost of an efficient key recovery attack given a linear distinguisher of a block cipher. Another, much more ambitious task would be to create software which finds optimal linear characteristics, not just in terms of their linear properties, but also in terms of the cost of the key recovery. Finally, because the focus of our team is quantum cryptanalysis, we would like to explore the possibility of efficient implementations of linear attacks in the quantum setting.

So, this concludes the presentation. Thank you very much for listening.