Welcome. This talk presents truncated differential distinguishers on round-reduced AES. My name is Eik List, and this has been joint work with Zhenzhen Bao and Jian Guo from NTU Singapore. The sum of permutations is a simple construction to turn permutations into a pseudorandom function: k independent permutations are sampled uniformly at random, and their outputs are simply summed. The goal of a distinguisher here is to tell the outputs of this construction apart from those of a random function. In the context of provable security, there have been a lot of results, for instance by Bellare and Impagliazzo in 1999 on the sum of two permutations, which was generalized by Lucks in 2000. However, a lot more analysis considered such sums, with the conclusion that this construction is close to optimally indistinguishable from a PRF for up to 2^n queries, which has been the result of many recent works, for instance by Mennink and Neves in 2017 with mirror theory, by Dai et al. with the chi-squared method, and by Bhattacharya and Nandi. Even more, it is even indifferentiable from a PRF for close to 2^n queries, so close to the full codebook. From a provable-security perspective, the interest quite ends here: one knows that this construction can no longer be secure once a distinguisher has collected all but the last of the 2^n queries. Clearly, once it has all but the last query, a distinguisher can simply build the sum and predict what the last value must have been. Noteworthy work by Patarin went beyond this point. He asked himself: what happens if the responses are random? In that case, the simple sum distinguisher no longer works and other distinguishing approaches are needed, which motivated his study, and he went on to find other attacks. To exceed this limit of 2^n queries, Patarin modeled a setting in which a distinguisher has access to multiple such sum-of-permutation instances.
That distinguisher has access to g such constructions and can make up to 2^n queries to each. The approach he used in his distinguishers was then to count the number of collisions that occur among pairs of distinct inputs. He found that between the sum of k permutations and a truly random function, the mean and the standard deviation of this number of collisions differ slightly. So he obtained new distinguishers given sufficiently many queries. Here, this is illustrated just as an example for q = 2^8 queries per experiment and over 8-bit permutations. What you can see from the results is that for the sum of two PRPs, the expected number of collisions is close to 2^n/2, which is 128 in this example, while for a random function it is close to (2^n - 1)/2, which is 127.5 in this tiny example. Given enough queries, one can tell these constructions apart. Patarin considered k permutations in general and found a formula that yields the probability of a collision as 1/2^n plus or minus some term that depends on the number of permutations.

The question you could ask here: what does this have to do with differentials, or with the AES? In fact, it has implications for integral attacks. We had the following observation independently, but during our work we found that a 2015 paper had made a similar observation earlier in a slightly different context, so its authors deserve the credit for that. Their core observation was that an integral distinguisher usually ends with some linear operation. It starts with the propagation of some "all" subsets, which means components that take all values of a certain subset; their context was substitution-permutation networks. So in these components, a set of texts iterates over all values. At the end of an integral distinguisher, a linear layer adds these components and destroys the "all" properties, but still transforms them into some other balanced or zero-sum property.
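To make the counting statistic concrete, here is a minimal Python sketch (function names and structure are ours, not from the talk) that samples the sum of two 8-bit permutations and counts output collisions; the expected counts follow the per-pair collision probabilities 1/(2^n - 1) and 1/2^n quoted above.

```python
import random
from collections import Counter
from math import comb

def sum_of_two_perms(n_bits=8, seed=1):
    """Sample two independent random permutations of {0, ..., 2^n - 1}
    and return the table of x -> P1(x) XOR P2(x)."""
    rng = random.Random(seed)
    size = 1 << n_bits
    p1 = list(range(size)); rng.shuffle(p1)
    p2 = list(range(size)); rng.shuffle(p2)
    return [p1[x] ^ p2[x] for x in range(size)]

def count_collisions(outputs):
    """Number of unordered pairs of distinct inputs with equal output."""
    return sum(comb(c, 2) for c in Counter(outputs).values())

# Expected collision counts over q = 2^8 queries with n = 8:
q = 256
pairs = comb(q, 2)               # 32640 pairs of distinct inputs
e_sum2 = pairs / (2**8 - 1)      # sum of two PRPs: 1/(2^n - 1) per pair -> 128.0
e_rand = pairs / 2**8            # random function: 1/2^n per pair    -> 127.5
```

Repeating `count_collisions(sum_of_two_perms(seed=...))` over many independent samples and comparing the empirical mean against 128 versus 127.5 reproduces the distinguishing experiment described above.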
This means if all values in a subset or subspace occur, they will sum to zero. After a linear layer, the zero sum is preserved. However, the next non-linear layer destroys this property, and the integral distinguisher ends. What they observed is that the linear layer in many primitives behaves similarly to a sum of permutations. For instance, an affine layer consists of a matrix multiplication, say with some circulant matrix, and some addition. This matrix multiplication is de facto a sum of the individual components of the input vector. This means if the components that are summed in the linear layer iterate over all values, one could potentially approximate this by a sum of independent permutations. The distribution of the number of collisions is then preserved by the subsequent non-linear layer, simply because equal inputs to the non-linear layer produce equal outputs.

Their context was a little different from ours. While that earlier work focused on type-2 and Nyberg-type Feistel networks with forward S-boxes, the target we had in mind was the well-known integral distinguisher on three-round AES. As most of you know, a dataset of plaintexts that is constant in all but one cell and iterates over all values in that cell yields a three-round integral distinguisher, where after two and a half rounds, before the last MixColumns operation, all cells of the state iterate over all values. The MixColumns operation in round 3 is then a matrix that sums four cells each, which can perhaps be approximated by a sum of permutations. As a consequence, the distribution of collisions is maintained through the S-box in the subsequent round. We follow the standard convention and note that since MixColumns is a linear operation, we can invert it in the last round and simply neglect this final operation. Of course, our work is not the only one that studied distinguishers on five and more rounds of the AES recently.
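As a sanity check of the "MixColumns is a sum of permutations" view, the following sketch (our illustration; helper names are ours) computes one output byte of AES MixColumns and verifies that each of its four per-byte component maps is a bijection of GF(2^8), so the output byte is literally a XOR-sum of four permutations of the input bytes.

```python
def xtime(a):
    """Multiply by x (i.e., by 2) in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gmul(a, b):
    """Multiply a by the MixColumns constant b in {1, 2, 3} in GF(2^8)."""
    if b == 1: return a
    if b == 2: return xtime(a)
    if b == 3: return xtime(a) ^ a
    raise ValueError(b)

def mc_first_byte(col):
    """First output byte of AES MixColumns on a column (a0, a1, a2, a3):
    2*a0 + 3*a1 + 1*a2 + 1*a3 over GF(2^8), a XOR-sum of four per-byte maps."""
    a0, a1, a2, a3 = col
    return gmul(a0, 2) ^ gmul(a1, 3) ^ gmul(a2, 1) ^ gmul(a3, 1)

# Each component map x -> c*x is a bijection, so if every input byte
# independently iterates over all values, this output byte behaves like
# a sum of four permutations of those inputs.
for c in (1, 2, 3):
    assert len({gmul(x, c) for x in range(256)}) == 256
```

A well-known corollary that doubles as a test: since the row coefficients 2, 3, 1, 1 sum to 1 in GF(2^8), a column of four equal bytes is mapped to the same byte.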
The hope in such round-reduced distinguishers was to escape the previous local maximum of attacks, which brought computational improvements but got somewhat stuck at about seven rounds of AES-128. The hope was to find better attacks in the long run by discovering different strategies. The first such results were key-dependent, like the key-dependent integral by Sun et al. However, this was the start of a series of research. Some cornerstone works were made by Grassi et al., who introduced for instance multiple-of-n distinguishers and mixture differentials. Those led not only to the best round-reduced attacks on five rounds, but also motivated the current best distinguishers on up to six rounds, the mixture attacks by Bardeh and Rønjom. There is also work similar to ours by Grassi and Rechberger, who also considered truncated differential distinguishers and their variants for the cryptanalysis of round-reduced AES, both with small biases. So in sum, this all is an interesting topic, but many things are still in the dark and require more research.

In our work, we adopt the statistical framework by Grassi and Rechberger. We have two collision probabilities: p_rand for a random function, and for instance p_AES (sometimes we use a similar one) for the collision probability of the AES, where sigma_AES and sigma_rand are the standard deviations for the AES and the random function accordingly. Given this statistical framework, we can say that for a given success probability of distinguishing, the number of samples, so the number of pairs we need to collect, must satisfy a formula involving the inverse error function. Let's say we can approximate MixColumns at the end of round 3 as the sum of four independent permutations. Then Patarin's formula gives us a collision probability of about 2^-8, plus an additional term of about 2^-32.
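The talk only points at the formula on the slide, so the following is a hedged reconstruction of one common normal-approximation form of such sample-size estimates (conventions differ by small constant factors between the erfinv and quantile formulations, so this lands near, but not exactly on, the 2^58.4 pairs quoted next):

```python
from math import log2, sqrt
from statistics import NormalDist

def pairs_needed(p0, p1, success=0.95):
    """Rough sample-size estimate for distinguishing two Bernoulli collision
    probabilities p0 (ideal) and p1 (cipher) with the given success probability.
    One common normal-approximation form; exact constants differ slightly
    between conventions, so treat the exponent as approximate."""
    z = NormalDist().inv_cdf(success)   # standard-normal quantile, ~1.645 for 0.95
    s0 = sqrt(p0 * (1 - p0))            # per-pair standard deviations
    s1 = sqrt(p1 * (1 - p1))
    return (z * (s0 + s1) / (p1 - p0)) ** 2

# Four-round AES setting from the talk: p_rand ~ 2^-8, p_AES ~ 2^-8 + 2^-32.
n_pairs = pairs_needed(2.0**-8, 2.0**-8 + 2.0**-32)
# Lands in the neighborhood of the quoted 2^58.4 pairs (within a small
# constant factor that depends on the convention).
assert 57 < log2(n_pairs) < 61
```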
Whereas for a truncated random permutation, the probability of a byte collision is roughly 2^-8. If we use enough pairs, the small addend of 2^-32 becomes distinguishable. Given the formula from before, for a distinguishing success probability of 95% or higher, we need about 2^58.4 pairs. If we take datasets of 2^8 texts, each can be combined into about 2^15 pairs, which means we need roughly 2^51.4 chosen plaintexts. This can be optimized, of course, but we focus on the concept here. While a computational complexity of 2^51 is already somewhat feasible, it is still prohibitive for carrying out experiments. We therefore considered the minified variant of the AES, small-AES, by Cid et al. from their 2005 paper, which has 4-bit cells instead of the 8-bit cells of the real AES. For this reduced construction, we used the same formula to compute the collision probability after four rounds of the real small-AES, and the collision probability for a truncated random permutation, where we also compare on 4-bit cells at the end. Again, for a success probability of 0.95, we now need about 2^30 pairs, so 2^27 chosen plaintexts, and this could be implemented. We implemented our distinguisher on four rounds of small-AES and used full SPECK-64/96 as the pseudorandom permutation. We used 100 random keys and random datasets and see that for 2^23 datasets, both distributions can be distinguished well. The distinguisher works better than expected, while the truncated random permutation behaves close to the expectation, with about 62.95 million versus 62.94 million collisions. Small-AES yields many more collisions in a single cell after four rounds than we projected; we come back to that later. From that point, we could extend the four-round distinguisher to five rounds.
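Before the five-round extension, the dataset accounting for the four-round small-AES experiment can be replayed in a few lines (our arithmetic, mirroring the numbers above: 2^30 pairs collected from datasets of 2^4 texts each):

```python
from math import comb, log2

DATASET = 16                                  # small-AES cell is 4 bits -> 2^4 texts
pairs_per_dataset = comb(DATASET, 2)          # 120 pairs per dataset
pairs_needed = 2.0 ** 30                      # estimate quoted in the talk
datasets = pairs_needed / pairs_per_dataset   # ~2^23 datasets
chosen_plaintexts = datasets * DATASET        # ~2^27 chosen plaintexts
```

The same accounting with 2^8-text datasets (about 2^15 pairs each) turns the 2^58.4 pairs for the real AES into the roughly 2^51.4 chosen plaintexts quoted above.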
Instead of considering a single cell after four rounds, we can consider four collisions in a diagonal at a time, which leads to an inactive anti-diagonal after five rounds without the final MixColumns operation. For the real AES, the probability becomes 2^-32 plus an additional term in the order of 2^-54, whereas for a truncated random permutation it is roughly 2^-32. If we consider any anti-diagonal to be inactive, we get an addend of about 2^-52 for the real construction, whereas it is about -2^-61.4 for a truncated random permutation. Using the same formulas, we obtain, for a success probability again of about 0.95, about 2^76.4 pairs. We could somewhat optimize this if we use structures. For instance, structures that contain pairs from a column or a diagonal would have 2^32 texts each; we could then form 4 * 2^24 * binom(2^8, 2), so about 2^41, pairs from such a structure. We would need 2^36 structures of that kind and would get about 2^77 pairs. The memory of such a distinguisher would be dominated by storing 2^32 states in some array or hash table, plus four lists at the end for the inverse-MixColumns of the ciphertexts. The time would then be dominated by 2^73 memory accesses and 2^68 encryptions. This distinguisher is not feasible. However, to verify our results, we conducted a similar experiment with the small-AES variant. There, the probability of at least one inactive anti-diagonal after five rounds is 2^-14 plus an addend in the order of 2^-24, whereas for a truncated random permutation with 4-bit cells it is marginally lower than 2^-14. Again, to have a success probability of 0.95 or higher, we need in the order of 2^36 pairs. So this was feasible, and we could implement it similarly as before.
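The structure counting for the real AES above can be checked mechanically; this snippet (our arithmetic, using only the counts stated in the talk) reproduces the roughly 2^41 pairs per structure and 2^77 pairs in total:

```python
from math import comb, log2

# A diagonal structure for the real AES contains 2^32 texts. The talk counts
# 4 * 2^24 * binom(2^8, 2) usable pairs per structure.
pairs_per_structure = 4 * 2**24 * comb(2**8, 2)   # ~2^41
structures = 2**36
total_pairs = pairs_per_structure * structures     # ~2^77
```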
We took again small-AES, here shown in red, and SPECK-64, here shown in blue, as the pseudorandom permutation. Both distributions are well distinguishable with enough datasets. The random world again matches the expectation quite well; however, the results for small-AES again slightly exceeded the predicted theory. We conducted a similar experiment with a variant of small-AES where we only replaced the S-box by that of PRESENT and found that the results differed greatly. In this small setting, we found that there are considerable dependencies on the S-box. This is investigated a little more in our paper, but shall just be illustrated here for the moment. While our distinguishers are somewhat inferior to those by Grassi and Rechberger, they have one significant advantage: since they start from a single active byte, one can easily prepend a round. So we could derive a six-round key recovery from our five-round distinguisher by guessing four bytes of a diagonal of the initial round key. More details can be found in the paper. Here we considered the statistical framework by Selçuk, as well as the updated framework by Samajder and Sarkar in their paper. Optimal configurations of the advantage in bits a and the computational complexity then lie at around 2^70 chosen plaintexts and 2^77.5 encryptions in Selçuk's model, and at about 2^71.3 chosen plaintexts and about 2^78.7 encryptions in Samajder and Sarkar's model.
As a proof of concept, we implemented our key-recovery attack again for the small version of the AES with its 4-bit S-boxes and tried to recover the value of the first diagonal of the initial round key. We used 2^15, 2^16, and 2^17 structures and recorded how many times the correct key was among the top 20 up to the top 100 key candidates. For 2^15 structures, the correct key was among the top 100 keys half of the time; for 2^16 structures, about 92 times out of 100; and for 2^17 structures, it was always in the top 100, actually in the top 30.

The interesting point now was a simple idea. For our five-round distinguisher, we started from a single active cell; for a key recovery on six rounds, we guessed the four key cells of a diagonal in the first round. The idea was now to ask what happens if we consider all pairs in such a diagonal. For a fraction of correct pairs, we have a single active cell after the first round; for those, the collision probability after six rounds has the same bias as in our five-round distinguisher. The remaining pairs should hopefully behave effectively close to the probability of a truncated random permutation; if they did not, we would have a second distinguisher from those. However, if their collision probability is sufficiently close to that of a truncated random permutation, the bias from the good pairs would still be detectable; this bias is much smaller, but maybe still exploitable. Again, we consider any inactive diagonal after almost six rounds. For a truncated random permutation, this holds with probability roughly 2^-30, whereas for six rounds of the AES we get an additional addend in the order of 2^-74; here we of course neglect much smaller terms. This means the difference is tiny, about 2^-74. If we use the same formula as before, for a success probability around 0.95 we need roughly 2^120.5 pairs for this distinguisher. Using diagonal structures of 2^32 texts with roughly 2^63 pairs each, we would need about 2^57 structures with 2^89.5 chosen plaintexts.

It would be great to have some kind of verification for that. If we scale this approach down to small-AES again, we get a probability of about 2^-14 for a truncated random permutation, whereas for six rounds of small-AES we get an additional term in the order of 2^-34. Here we need about 2^56.2 pairs, which means, starting from such structures, about 2^41.2 chosen plaintexts per experiment, and this was practically feasible. We implemented this again with small-AES and with full-round SPECK-64 as the pseudorandom permutation. We used 100 experiments with random keys and structures and found that, with sufficiently much data, both distributions can be distinguished quite well, similar to theory. The pseudorandom outcome is again very close to the expected value, while the real small-AES overshoots our expected value. Note that these standard-deviation values are for one structure, whereas each experiment consists of about 2^25 structures, so the difference is significant.

Besides Patarin's sum of permutations, we employed two further approaches to verify our theoretical probabilities. First, we used a proof method following the footsteps of Grassi and Rechberger, which rests on two assumptions: that the S-box is ideal, and that any combination of input and output positions is equally successful in our distinguishers. Second, we employed Rønjom's truncated-differential propagation matrices, which yielded the same probabilities as our approach of using the sum of permutations. In sum, we observed the same probabilities with all three approaches. Nevertheless, they did not completely model the real-world setting. We tried to refine our analysis in our paper and considered two kinds of parameters: first, index dependencies, meaning at which position the active input cell sits in the plaintext and which output cells are considered.
Second, we considered the effects of using different S-boxes in small-AES. Those results will be presented at the end and can be found in our paper.

To sum up: we started from Patarin's sum of permutations and approximated the MixColumns operation at the end of the three-round integral distinguisher as a sum of four independent permutations. We derived a four-round truncated differential distinguisher, extended it to five rounds by considering multiple output cells, and showed a key recovery on six rounds. Moreover, we exemplified that we could also consider all pairs in a diagonal structure and mount a distinguisher on six rounds with some tiny advantage, which nevertheless seemed to work. We cross-verified our theoretical distinguishers with two further approaches: first, the proof by Grassi and Rechberger under some assumptions, and second, the differential propagation matrices by Rønjom. All our experiments were implemented with a small version of the AES, and the code is publicly available. Moreover, for the six-round distinguisher, we used the NIST randomness beacons as pseudorandom keys. These values are public, and you should hopefully get exactly our results when you use the same keys. We also proposed a natural extension of our five-round distinguisher to a six-round key recovery and implemented a proof of concept, again with small-AES. We see that small-bias distinguishers can become useful, and we would like to explicitly recommend the equally rigorous paper by Grassi and Rechberger to you; note that we built upon their statistics and proof methods. We found interesting dependencies and deviations from our theoretical probabilities by varying the input and output indices of the cells that were of interest for our distinguisher, and by varying the S-boxes in small-AES. We could also somewhat substantiate the claim that Grassi and Rechberger already put up: the more uniform an S-box is, the better, that is, the lower the deviations from theory. The precise reasons behind that are still unclear; we have only indications. However, we also have indications that the deviations you saw, for instance for the PRESENT S-box, are mostly due to the small size of small-AES: we found much smaller deviations when we considered the equation systems of our four-round distinguisher for the real AES. So far, so much. Thank you for your attention.

There is a significant deviation between the theory and our analysis. We tried to consider two kinds of parameters, as said before. One is the influence of the active input cells and the concerned collisions in the output cells, so the positions of the relevant cells; the second parameter is what happens if we replace the S-box in small-AES, or in the real AES, with a different one. First, let us consider how different combinations of input and output indices affect our distinguishers. Here, i_in denotes the position of the active input cell and i_out the position of the output cell, and as a comparison metric, we plot the results in multiples of the deviation of the real experiments from the truncated random permutation. What do our indices mean? For instance, i_in = 1 means that this is our active input cell, and i_out = 0 means that we consider collisions in the 0th output cell. We found that we could construct an equation system whose factors differ depending on the positions. For four rounds, this is also feasible to compute, and we get four terms per output cell, for example here for the first input cell and the first output cell; of course, the equations naturally differ for different input and output positions. This was feasible for four rounds, as said, and we could plot it for all combinations. This is illustrated in multiples of the difference between the number of collisions for the truncated random permutation and the experiments. A value of zero means that there is no distinguisher, whereas a value of one means the distinguisher works as expected; for any value beyond plus or minus one, we get an even better distinguisher. What you can see here is that, in theory, given our equation system, there are huge deviations, ranging from about zero to plus seven times the difference we expected. This means that most input/output cell combinations are actually better than expected, so they give stronger distinguishers, however not in the case of using the 0th input cell together with the 0th output cell. We also considered the theory for the 8-bit AES. Here we observed that the deviations are considerably lower than for small-AES: they range from about 1 to 1.35 times the expected difference, which means that any combination of input and output cell works well. We could interpret this as the small size of the 4-bit S-box and the low number of only four rounds producing side effects that cannot be covered by the simple sum-of-permutations theory.

A different parameter is the S-box. We followed the approach by Grassi and Rechberger, who also considered alternative 4-bit S-boxes: three from the real-world ciphers PRESENT, PRINCE, and PRIDE, and three worse S-boxes with a differential uniformity of six, eight, or ten. We left all other aspects of small-AES as before, just changed the S-box, then computed the index-dependent equation system, and found huge deviations in our results: less for PRINCE, somewhat more for PRESENT, and much more, for instance, for PRIDE or the toy S-box with uniformity ten. Interestingly, the deviations go in positive as well as negative directions. The arising question is which S-box properties cause the deviations. Grassi and Rechberger suggested that the variance of the S-box has a strong effect, and we tried to evaluate this a little closer. We also considered random S-boxes, with naturally high variance, since random S-boxes are rarely good S-boxes so to say, as well as known golden and platinum S-boxes from other papers, which are rather good 4-bit S-boxes and have low variance. We then computed for each S-box, from our equation system, the distance to the expected number of collisions for an input cell. So we had on one hand the quality of the S-box and on the other hand the results from our experiments. We computed the Pearson correlation between the variance and our metric d_S and found quite a high correlation of 0.81, with a low error probability that this correlation does not exist. The plot illustrates that there seems to be a correlation, and we could somewhat substantiate what Grassi and Rechberger already tried to address. However, we know that a full story would need many more studies, and the precise reason for when and how well an experiment follows the theory is still in the dark and remains interesting open work.
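To illustrate the S-box statistics discussed here, the following sketch (our helper functions; the exact variance statistic used in the papers may be normalized differently, so treat `ddt_variance` as one plausible definition) computes the difference distribution table, the differential uniformity, a DDT variance, and the Pearson correlation underlying the 0.81 figure, demonstrated on the PRESENT S-box:

```python
from math import sqrt

def ddt(sbox):
    """Difference distribution table of a 4-bit S-box."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for dx in range(n):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table

def diff_uniformity(sbox):
    """Largest DDT entry over nonzero input differences."""
    return max(max(row) for row in ddt(sbox)[1:])

def ddt_variance(sbox):
    """Variance of the DDT entries over nonzero input differences --
    one possible reading of the 'variance of the S-box'."""
    vals = [v for row in ddt(sbox)[1:] for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# PRESENT's S-box, one of the alternatives plugged into small-AES above.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
```

Feeding per-S-box variances and the corresponding experimental distances into `pearson` is how a correlation figure like the 0.81 above would be computed; the data points themselves come from the experiments and are not reproduced here.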