Welcome to this online Eurocrypt talk. My name is Nathan Keller from Bar-Ilan University. I'm going to present the paper "The Retracing Boomerang Attack", which is joint work with Orr Dunkelman, Eyal Ronen, and Adi Shamir. So let's start with a brief summary of the talk. We speak about the boomerang attack, one of the best known cryptanalytic techniques, and we present a new variant of it, which we call the retracing boomerang. The idea is to force dependence between the differentials used in the boomerang, so that their overall probability becomes higher. As an application, we attack five-round AES with an overall complexity of 2^16.5. Now, we emphasize that this is surprising: for about 20 years nobody could attack this variant faster than 2^32, and only two years ago the complexity was reduced to 2^24. Our technique allows us to reduce the complexity all the way to 2^16.5, at the price of using an adaptively chosen plaintext and ciphertext attack model. There are other variants and applications in our paper which are not presented in this talk.

Okay, so let's start with some background. A block cipher, which is the most widely used secret-key primitive, is just a keyed function that takes a b-bit plaintext and a key and outputs a b-bit ciphertext. Most of the block ciphers today are iterative, that is, they repeat a small sequence of operations, called a round, many times. Each round usually consists of some non-linear operation applied on small parts of the state in parallel, like S-boxes, some linear mixing layer, and some key mixing, usually via a simple XOR. Probably the best-known cryptanalytic attack is the differential attack, introduced by Biham and Shamir 30 years ago. In this attack we consider pairs of plaintexts with a chosen difference, and we trace the development of differences through the encryption process.
So the main notion here is a differential, which is a probabilistic relation saying that if we take two plaintexts P1 and P2 with difference alpha, then the encrypted values E(P1) and E(P2) differ by beta with some probability p, where E denotes several encryption rounds, and we hope that the probability p is larger than for a random permutation. Now, if we can find such a differential which covers almost all rounds of the cipher, then usually the cipher can be broken with a chosen plaintext attack that takes complexity of about 1/p. So since this attack appeared, all cipher designs try to avoid long differentials with non-negligible probability, so as to be immune to this attack.

This seemed to be sufficient, but 10 years later, Wagner in 1999 showed that if instead of one long differential we have two short differentials, one for the first half of the cipher and one for the second half, then we can also combine them into an attack on the entire cipher. So formally, assume that the cipher E can be decomposed as E0 and then E1, such that we have two differentials: a difference alpha goes to a difference beta through E0 with probability p, and a difference gamma goes to a difference delta through E1 with probability q. Then we claim that E can be broken with an adaptively chosen plaintext and ciphertext attack with complexity of about 1/(p^2 q^2). How does this work? Here is the attack algorithm; it is very simple. We take two plaintexts P1 and P2 with difference alpha, and we denote the ciphertexts by C1 and C2. Then we shift both ciphertexts by delta, that is, we define C3 = C1 XOR delta and C4 = C2 XOR delta. We decrypt them, call the resulting plaintexts P3 and P4, and check whether their difference is alpha. We claim that this holds with probability p^2 q^2. So why does this attack work? Let us denote the intermediate value after E0 by x_i. If the differential in E0 holds for the pair P1, P2, then we know that the difference x1 XOR x2 is beta.
This happens with probability p. If the differential for E1 inverse holds for both pairs (C1, C3) and (C2, C4), then the intermediate values satisfy x1 XOR x3 = x2 XOR x4 = gamma. This happens with probability q^2, since we asked for two differentials to hold. If all the above holds, then we can write the difference between x3 and x4 as the sum of three differences: (x3 XOR x1) XOR (x1 XOR x2) XOR (x2 XOR x4). So we get gamma XOR beta XOR gamma, which is beta. So if all of this holds, then the difference between x3 and x4 is beta. You can see it in the figure: here in the middle we have the rectangle of beta, gamma, gamma, beta, which gives us the difference between x3 and x4. Now, if the above holds, and also the differential for E0 holds in the inverse direction for the pair (x3, x4), then P3 XOR P4 = alpha, and this happens with another probability p. So in total, the probability that P3 XOR P4 = alpha is at least p^2 q^2. Why at least? Because we can reach this difference alpha also in some other ways. For a random cipher, this probability is 2^-n. This allows us to distinguish the cipher from a random permutation.

Okay, now let's present our new attack. Our basic idea is that we want to force dependence between the two pairs (C1, C3) and (C2, C4), so that the probability of the differentials in E1 in the inverse direction will be higher than q^2. How can we do so? We assume that we can further decompose the second sub-cipher E1 as E11 and then E12, and also that E12 consists of two sub-ciphers applied in parallel, a left sub-cipher E12L and a right sub-cipher E12R. Now, if we make sure that the output pairs in the left part, that is (C1L, C3L) and (C2L, C4L), are the same, maybe in the inverse order, then we save the probability of one differential transition through E12L. Because if one differential holds, then the other must hold as well, since this is actually the same pair.
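Before formalizing the retracing idea, the classical boomerang procedure described above can be sketched in code. This is a toy illustration on a deliberately linear 16-bit cipher (the keys, rotations, and names are invented for the sketch, not taken from any real cipher): since every sub-cipher here is linear, every differential holds with probability 1, so the boomerang returns with probability 1; in a real cipher the return probability would be p^2 q^2.

```python
# Toy boomerang quartet on a 16-bit cipher E = E1 ∘ E0.
# Both sub-ciphers are linear (rotation + key XOR), so the quartet
# always closes; the point is only to show the construction.

MASK = 0xFFFF

def rotl(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def rotr(x, r):
    return rotl(x, 16 - r)

K0, K1 = 0x3A5C, 0xC0DE  # toy round keys

def e0(p):               # first sub-cipher
    return rotl(p, 3) ^ K0

def e1(x):               # second sub-cipher
    return rotl(x, 5) ^ K1

def encrypt(p):
    return e1(e0(p))

def decrypt(c):
    return rotr(rotr(c ^ K1, 5) ^ K0, 3)

def boomerang(p1, alpha, delta):
    """Return (P3, P4) of the boomerang quartet built from (P1, P1^alpha)."""
    p2 = p1 ^ alpha
    c1, c2 = encrypt(p1), encrypt(p2)
    c3, c4 = c1 ^ delta, c2 ^ delta      # shift both ciphertexts by delta
    return decrypt(c3), decrypt(c4)

p3, p4 = boomerang(0x1234, alpha=0x0F0F, delta=0x8001)
assert p3 ^ p4 == 0x0F0F   # the boomerang comes back with difference alpha
```

Here gamma is implicitly rotr(delta, 5), the difference delta propagated back through E1; because everything is linear, the middle rectangle beta, gamma, gamma, beta always closes.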
So formally, let us denote the intermediate value after E11 by y_i, and its left and right parts by y_iL and y_iR. Assume that we have three differentials: for E11, gamma goes to mu with probability q1; for E12L, mu_L goes to delta_L with probability q2L; and for E12R, mu_R goes to delta_R with probability q2R. You can see it in the figure: instead of E1 we now have E11 and then E12, and on the right side E12 is also divided into two parts, the left part and the right part. Okay, so assume that we have all these differentials. In the classical boomerang distinguisher the probability is p^2 q1^2 q2L^2 q2R^2. Now, if we make sure that C1L XOR C2L is either 0 or delta_L, then after the delta shift we will get that the pair (C1, C3) in the left side is equal to the pair (C2, C4) in the left side, maybe in the inverse order. So the probability of the differential transition in E12L becomes q2L instead of q2L^2, and we gain a factor of 1/q2L. We call this technique the retracing boomerang, since what we did is force the boomerang to return at least part of the way along the same path it went forward. In a standard boomerang it doesn't return exactly the same way; here we force it to return partially the same way, hence the name.

Okay, now how can we guarantee that this condition on C1L and C2L holds? We propose two different ways to do it. The first way, which we call the shifting retracing boomerang: we just discard all pairs (C1, C2) for which the condition doesn't hold, so we construct (C3, C4) only for pairs (C1, C2) that satisfy the condition. This is the first way. The second way, which we call the mixing retracing boomerang: here we preserve all the pairs (C1, C2), but instead of XORing the same delta to all pairs, we XOR to each pair a specific delta.
To be precise, to the left part we XOR C1L XOR C2L, and to the right part we XOR zero. As a result, C3 = (C2L, C1R) and C4 = (C1L, C2R), so that in the right part we have just zero difference, and in the left part we have two pairs which are actually the same pair, but in the inverse order.

Okay, so let's consider each of these two separately. In the shifting retracing attack, the problem is that we discard lots of data. If E12L acts on b_L bits, then we discard all but a 2^(-b_L + 1) fraction of the ciphertext pairs. And how much do we gain? We gain a factor of 1/q2L, but q2L is at least 2^(-b_L + 1), so it seems that we eventually only lose. So why is it good? Here are the advantages of this idea. First, it improves the signal-to-noise ratio by a factor of 1/q2L. Indeed, for a random permutation the probability of the event P3 XOR P4 = alpha is anyway 2^-n. In our setting, after we discarded part of the data, we gained 1/q2L in the probability, so now the signal-to-noise ratio is better, and we can use it to replace a differential in E0 inverse by a truncated differential, so that our overall probability becomes better. Also, we can reduce the data complexity, since on the decryption side we decrypt a much smaller amount of values; and sometimes this also reduces the overall time complexity, if the decryption side of the attack or the key-recovery part is the heaviest one.

If we compare this to previous work, we can mention that discarding part of the data was already used, but in other types of attacks: in linear cryptanalysis by Biham and Perle, and in time-memory trade-off attacks. Exploiting dependency between differentials in the boomerang was done by Biryukov and Khovratovich in the S-box switch, and the partition into sub-ciphers applied in parallel was also used by the same authors in the ladder switch.
However, this was done only in the transition between the two sub-ciphers. What we do is to look at each sub-cipher separately (here we showed it for E1) and show that we can also enhance the probability inside each sub-cipher. So in a sense this complements the attack of Biryukov and Khovratovich from 10 years ago. We also have an application of this technique: there is a boomerang attack of Biryukov on reduced-round AES, on five rounds and six rounds, and we can reduce its complexity on five rounds from 2^32 to 2^31, and for six rounds reduce the data complexity from 2^71 to 2^55. The details of this attack are given in the full version of the paper.

For the mixing retracing, recall that we proceed with all pairs, and we define C3 to be (C2L, C1R) and C4 to be (C1L, C2R). The advantages are that there is no loss of data, and also that the gain in probability is even larger than before: we gain not only 1/q2L but also 1/q2R^2, since the difference in the right part is 0. On the other hand, we cannot use structures, since to each pair we XOR some different value, and also the combination with E11 is more problematic, since now the difference mu_L is not meaningful anymore, because it comes from delta_L, which is different for each pair. So this variant is much more closely related to previous work: actually, we can view it as a natural generalization of the yoyo tricks attacks on reduced-round AES of Rønjom, Bardeh, and Helleseth from Asiacrypt 2017, which do this trick in the specific case of AES. And also here we have some nice applications: on five-round AES we can reduce the complexity to 2^16.5, as we will show soon, and for five-round AES with a secret S-box we reduce the complexity from 2^72 to 2^26. Just a small remark on names: we call the second type mixing retracing since it is based on some mixing of the ciphertexts, and shifting retracing is called so since it retains the delta shift from the classical boomerang.
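The two ways of building (C3, C4) described above can be summarized in a small sketch (a toy with 16-bit ciphertexts split into 8-bit halves, so b_L = 8 here; all concrete values are illustrative):

```python
# Toy sketch of the shifting and mixing retracing constructions.

def halves(c):
    return c >> 8, c & 0xFF          # (left, right) 8-bit halves

# Shifting retracing: keep the pair only if C1L ^ C2L is 0 or delta_L,
# then shift both ciphertexts by delta as in the classical boomerang.
def shifting(c1, c2, delta):
    delta_l = delta >> 8
    if (c1 >> 8) ^ (c2 >> 8) not in (0, delta_l):
        return None                  # pair discarded (most pairs are)
    return c1 ^ delta, c2 ^ delta

# Mixing retracing: keep every pair, swap the left halves instead.
def mixing(c1, c2):
    c1l, c1r = halves(c1)
    c2l, c2r = halves(c2)
    c3 = (c2l << 8) | c1r            # C3 = (C2L, C1R)
    c4 = (c1l << 8) | c2r            # C4 = (C1L, C2R)
    return c3, c4

c1, c2 = 0xAB12, 0x5534
c3, c4 = mixing(c1, c2)
# Left halves: the pair (C1, C3) equals the pair (C2, C4) in reverse
# order, so one differential transition through E12L is saved ...
assert {c1 >> 8, c3 >> 8} == {c2 >> 8, c4 >> 8}
# ... and the right halves have zero difference in both new pairs.
assert (c1 ^ c3) & 0xFF == 0 and (c2 ^ c4) & 0xFF == 0
```

The two assertions are exactly the two sources of the probability gain: the repeated left pair saves a factor q2L, and the zero right difference saves a factor q2R^2.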
There are some other variants of the retracing boomerang described in the paper. First, we can have other subdivisions; for example, we can subdivide E0 as well, so that we have a subdivision into four sub-ciphers. Also, we have a chosen plaintext variant: just like the classical boomerang technique was transformed into the rectangle technique, here we can also adapt the attack to the chosen plaintext setting, and we get the shifting and mixing rectangle attacks. The mixing one is actually a natural generalization of the mixture differentials of Grassi, which do the same trick in the specific case of five-round AES. And the attack can be mounted in the related-key setting as well.

Okay, now let's move to an application. Our application is to reduced-round AES, so let's recall the AES, which is the most widely used block cipher today. Its block size is 128 bits, which are arranged in a 4x4 matrix of bytes, as you can see in the figure. The key size is between 128 and 256 bits, and there are between 10 and 14 rounds. Each round has four operations: SubBytes, which transforms each byte separately by a non-linear operation; ShiftRows, which shifts the bytes of each row; MixColumns, a linear mixing applied on each column separately; and AddRoundKey, which just XORs a subkey into the state.

Okay, we start with describing the yoyo distinguisher on four-round AES of Rønjom et al. The basic observation here is that one and a half rounds of AES is actually four 32-bit ciphers applied in parallel: each starts with a diagonal and ends with an inverse diagonal. Let's see: for example, if we start with the diagonal 0, 5, 10, 15, then after SubBytes it's still 0, 5, 10, 15. After ShiftRows it moves to the first column 0, 1, 2, 3. After MixColumns, AddRoundKey, and SubBytes it remains the first column, and then ShiftRows moves it into an inverse diagonal, which in our case is 0, 7, 10, 13.
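The diagonal-to-inverse-diagonal propagation just described is pure byte bookkeeping, and can be checked in a few lines, using the standard column-major AES byte numbering (byte i sits at row i mod 4, column i div 4):

```python
# Tracking the set of active byte positions through 1.5 AES rounds.
# SubBytes acts byte-wise and MixColumns stays within a column, so
# only ShiftRows moves bytes between positions.

def shift_rows(pos):
    """New position of the byte at position `pos` after ShiftRows
    (row r is rotated left by r)."""
    r, c = pos % 4, pos // 4
    return 4 * ((c - r) % 4) + r

diagonal = {0, 5, 10, 15}

after_sr1 = {shift_rows(p) for p in diagonal}
assert after_sr1 == {0, 1, 2, 3}          # the first column

# MixColumns / AddRoundKey / SubBytes keep the set in the first column.
after_sr2 = {shift_rows(p) for p in after_sr1}
assert after_sr2 == {0, 7, 10, 13}        # the inverse diagonal
```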
So this happens for each diagonal; hence, if the input of one and a half rounds of AES has zero difference in some diagonal, then the output has, for sure, zero difference in some inverse diagonal. How do we exploit this? We decompose four-round AES into three sub-ciphers E0, E11, E12, where E0 is the first one and a half rounds, E11 is the MixColumns of the second round, and E12 is another one and a half rounds; as usual, the last MixColumns can be neglected. Now, we further decompose E12 as E12L and E12R, where E12L corresponds to the first diagonal. We know that E11 is linear, so any differential through it holds with probability 1.

Here is the attack algorithm. We take a plaintext pair with zero difference in the first diagonal. From the corresponding ciphertexts we construct C3 and C4 in the way of the mixing retracing boomerang. Then we check whether the corresponding plaintexts are equal in the first diagonal. We claim that the answer is positive with probability 1, which means that we get a distinguisher for four-round AES with only four plaintexts and ciphertexts, which is quite amazing. Why does it work? Since E12L and E12R are applied in parallel, and due to the construction of C3 and C4, we have that y3 and y4 are equal to (y2L, y1R) and (y1L, y2R) with probability 1, which means that the difference between y3 and y4 is the same as between y1 and y2. Hence, since E11 is linear, the difference between x3 and x4 is also the same as between x1 and x2. For this difference we know that it is zero in the first inverse diagonal. Hence, with probability 1, P3 XOR P4 has zero difference in the first diagonal, as we claimed.

Okay, so this is for four-round AES. Now, how can we apply it to five-round AES? The basic idea is to just add one round before the distinguisher.
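The probability-1 step above rests only on E12 being two permutations applied in parallel and E11 being linear. It can be checked on a toy scale with arbitrary stand-ins (the random 8-bit permutations and the rotation below are not AES components, just placeholders with the same structure):

```python
# Toy check: mixing the left halves of the ciphertexts preserves the
# XOR difference of the intermediate values before a parallel layer.
import random

rng = random.Random(42)
PERM_L = rng.sample(range(256), 256)   # stand-in for E12L
PERM_R = rng.sample(range(256), 256)   # stand-in for E12R

def e12(y):
    return (PERM_L[y >> 8] << 8) | PERM_R[y & 0xFF]

INV_L, INV_R = [0] * 256, [0] * 256
for i in range(256):
    INV_L[PERM_L[i]] = i
    INV_R[PERM_R[i]] = i

def e12_inv(c):
    return (INV_L[c >> 8] << 8) | INV_R[c & 0xFF]

def e11(x):   # any GF(2)-linear map works; a rotation as a stand-in
    return ((x << 7) | (x >> 9)) & 0xFFFF

x1, x2 = 0x0123, 0xBEEF
y1, y2 = e11(x1), e11(x2)
c1, c2 = e12(y1), e12(y2)

# Mixing step: swap the left halves of the two ciphertexts.
c3 = (c2 & 0xFF00) | (c1 & 0x00FF)
c4 = (c1 & 0xFF00) | (c2 & 0x00FF)

# Because E12 acts on the halves independently, (y3, y4) is just
# (y1, y2) with the left halves swapped, so the difference survives;
# linearity of E11 then carries the same difference back to the x's.
y3, y4 = e12_inv(c3), e12_inv(c4)
assert y3 ^ y4 == y1 ^ y2
```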
Rønjom et al. observed that if two plaintexts P1 and P2 have a nonzero difference only in the first diagonal, then with probability about 2^-6 the corresponding values after one round have zero difference in some diagonal: the difference after one round lies in a single column of four bytes, and it suffices that one of these four bytes has zero difference, which happens with probability about 4 * 2^-8 = 2^-6. Now, if W1 and W2 have zero difference in some diagonal, then we can apply to them the four-round distinguisher, which means that for the corresponding plaintexts P3, P4, the values W3, W4 after one round also have zero difference in that diagonal. This can be used to mount the following attack. We take 2^6 pairs with nonzero difference only in the first diagonal. For each of them, we assume the distinguisher indeed applies to (W1, W2), and then we have some information on the difference between W3 and W4, which we use to attack the first round. If the pair (P1, P2) was wrong, we just get a contradiction; for the right pair we find the first-round subkey, and then we are done. Naively this takes 2^40 time, because each byte in the difference between W3 and W4 depends on four subkey bytes in the first round, but using some fine properties of AES, Rønjom et al. achieved complexity 2^31.

Now let us present our improved attacks. First, we observe that in the setting of this yoyo attack, for a right pair (P1, P2) we know that the difference between W3 and W4 is zero in one byte. As we said, the difference in each byte of W depends on the plaintext and on four subkey bytes; for example, byte 0 of W depends on bytes 0, 5, 10, and 15 of the subkey. Now the observation is that, due to the structure of MixColumns, this difference can be written as the sum of four functions, such that each of them depends only on the plaintext and a single subkey byte.
This allows us to apply a meet-in-the-middle attack: we put two such functions on each side of the meet-in-the-middle, so each side depends on two subkey bytes, and the meet-in-the-middle takes about 2^16 operations for each candidate right pair. Moreover, since we actually have four independent functions, we can use a dissection technique to reduce the memory complexity even further, to 2^8. Since we have to repeat the attack for 2^6 pairs and for four possible values of the byte with zero difference, the overall complexity is about 2^24, which is already better than 2^31, but not better than the best previous attack.

So let's improve it even further. We can reduce the overall complexity by choosing specific plaintexts and by refining the meet-in-the-middle procedure. How can we do it? First, we choose plaintext pairs that have a difference only in bytes 0 and 5. As a result, the fact that some plaintext pair is a right pair, i.e., has zero difference in a diagonal after the first round, gives us a condition on only the two key bytes 0 and 5, and thus we can eliminate one of them from the meet-in-the-middle. Also, instead of 2^6 pairs we take 2^14 plaintext pairs, so that at random, for about 2^6 of them, the resulting P3 and P4 have zero difference in byte 10, so that we can remove this byte as well from the meet-in-the-middle procedure. So now in the meet-in-the-middle we have one byte on each side, and the complexity of the meet-in-the-middle procedure becomes only about 2^8.5 operations. We repeat the attack 2^8 times, so the overall complexity now is 2^16.5. And let me repeat that for 20 years nobody could attack this variant better than 2^32, and only recently the attack was improved to 2^24; here we get all the way down to 2^16.5, and this attack was fully verified experimentally, so we know it indeed holds.
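The meet-in-the-middle step above can be sketched generically. The functions and key sizes below are toy stand-ins (6-bit keys and an invented byte function, not the actual AES per-byte functions); the point is only the structure: a four-way sum condition f0(k0) ^ f1(k1) ^ f2(k2) ^ f3(k3) = target is solved by tabulating one half over (k0, k1) and matching the other half over (k2, k3), for roughly 2 * K^2 work instead of K^4.

```python
# Generic meet-in-the-middle on a four-way XOR condition (toy sizes).
from collections import defaultdict

KEYSPACE = range(64)     # 6-bit toy subkeys to keep the demo fast

def f(i, k):
    # Hypothetical stand-in for the i-th single-key-byte function.
    x = (k * 167 + i * 91) & 0xFF
    return ((x * x) >> 3) & 0xFF

secret = (3, 14, 15, 9)  # the "right" subkey quadruple, for the demo
target = f(0, secret[0]) ^ f(1, secret[1]) ^ f(2, secret[2]) ^ f(3, secret[3])

# Tabulate the left half over (k0, k1) ...
table = defaultdict(list)
for k0 in KEYSPACE:
    for k1 in KEYSPACE:
        table[f(0, k0) ^ f(1, k1)].append((k0, k1))

# ... then match the right half over (k2, k3).
candidates = [(k0, k1, k2, k3)
              for k2 in KEYSPACE for k3 in KEYSPACE
              for k0, k1 in table[target ^ f(2, k2) ^ f(3, k3)]]

assert secret in candidates   # the right key survives the filter
```

Each candidate list entry satisfies the one-byte condition; repeating over several right pairs then whittles the candidates down to the true subkey, which is the role the 2^6 (or 2^8) repetitions play in the attack.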
Let me conclude with maybe two open questions. One: further improve this attack, either by reducing the complexity or, more importantly, by achieving the same complexity of about 2^16 in a more realistic attack model, say chosen plaintext or even known plaintext. The other: find further variants and maybe further applications of the new retracing boomerang attack. Okay, so thanks for your attention.