Hello, my name is Ohad Amon, and I will be presenting the paper "Three Third Generation Attacks on the Format Preserving Encryption FF3." This paper is joint work by myself, Orr Dunkelman, Nathan Keller, Eyal Ronen, and Adi Shamir. Let's begin by defining FF3. FF3 is a format-preserving encryption scheme specified by NIST in 2016. Format-preserving encryption can encrypt any domain into itself, unlike most block ciphers, which can only handle a specific predetermined domain. For example, we can encrypt credit card numbers such that each ciphertext is also a well-formed credit card number. This is useful mainly for adding encryption to existing databases or communication packets that require a specific format, where the ciphertext cannot have a different format than the plaintext. After FF3 was shown to be insecure by Durak and Vaudenay in 2017, a new cipher, FF3-1, was published that fixes the vulnerabilities they uncovered. A recent paper by Beyne presents an attack on FF3-1. However, the attacks that we show in this paper apply to the original FF3 and not to the newer FF3-1. FF3 accepts a plaintext from the domain Z_M × Z_N, along with a key K and a tweak T, and encrypts it to a ciphertext in the same domain. The tweak acts like an IV, allowing independent encryptions of the same value. For simplicity, we will assume throughout this presentation, and also throughout our paper, that M is equal to N, and we use N to express complexities. However, these attacks can be generalized relatively easily to also address cases where M is different from N. The complexity of each attack will be given in terms of its data, time, and memory requirements. Note that FF3 is generally used on relatively small domains, meaning that heavy usage of time and memory is practical. Therefore, we will prioritize minimizing the data requirement, meaning the number of oracle queries, at the cost of additional time and memory.
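To make the simplification M = N concrete, here is a minimal sketch, assuming a decimal string of even length 2h is split into two halves that each range over Z_N with N = 10^h. The helpers `split_digits` and `join_digits` are hypothetical and only illustrate the domain; they are not part of the FF3 specification, which also handles odd lengths and other radices.

```python
def split_digits(digits: str) -> tuple[int, int, int]:
    """Split an even-length decimal string into (left, right, N) with left, right in Z_N."""
    if not digits.isdigit() or len(digits) % 2:
        raise ValueError("expected an even-length string of decimal digits")
    half = len(digits) // 2
    n = 10 ** half  # each half ranges over Z_N, so the whole domain has N^2 elements
    return int(digits[:half]), int(digits[half:]), n


def join_digits(left: int, right: int, n: int) -> str:
    """Inverse of split_digits: re-pad both halves so the length (the format) is preserved."""
    width = len(str(n)) - 1  # N = 10^width
    return f"{left:0{width}d}{right:0{width}d}"


left, right, n = split_digits("4111111111111111")
print(left, right, n)               # 41111111 11111111 100000000
print(join_digits(left, right, n))  # 4111111111111111
```

Any permutation of Z_N × Z_N, composed with these two helpers, maps 16-digit strings to 16-digit strings, which is the format-preserving property in its simplest form.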
So let's begin by presenting the previously existing attacks against FF3. The goal of all of these attacks is to enable encryption and decryption of any plaintext or ciphertext under a certain tweak. The most trivial way to do this is simply to query all possible plaintexts, meaning the entire domain of size N^2, and keep a table of size N^2 holding the ciphertexts of all of these plaintexts. That allows us to then look up the table to encrypt or decrypt any value we wish. The first attack that improved over this was by Durak and Vaudenay in 2017. It improved the data requirement, meaning that we no longer need to query the entire domain, at the cost of a significantly increased time of N^5. As we said earlier, such times are practical, since FF3 works over small domains. The second-generation attack by Hoang, Miller, and Trieu in 2019 preserves these data and memory requirements, while significantly reducing the time to below N^3. We give three attacks that go beyond these existing attacks. The first attack is a symmetric slide attack. It is based on the second-generation attack by Hoang et al., and it significantly reduces the data complexity, down to N^1.5, while also improving the time and memory complexities. Furthermore, it allows a trade-off between data and time, if required. The second attack is an asymmetric slide attack. It is based on the symmetric slide and is a strict improvement over it, while also giving a better time-data trade-off. The third attack is a bit different. It is an application of an attack previously shown by Biham, Dunkelman, and Keller in 2007. However, that attack was then considered impractical due to large domain sizes, since it requires walking over nearly the entire domain.
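The trivial codebook attack can be sketched as follows, assuming a hypothetical encryption oracle `encrypt_oracle` for the fixed tweak that maps a pair (left, right) in Z_N × Z_N to its ciphertext. It costs N^2 queries and N^2 memory, which is the baseline the other attacks improve on. The toy oracle in the usage lines is an arbitrary bijection chosen only for illustration; it is not FF3.

```python
from typing import Callable

State = tuple[int, int]


def build_codebook(encrypt_oracle: Callable[[State], State], n: int):
    """Query every plaintext in Z_N x Z_N once and store both directions."""
    enc: dict[State, State] = {}
    dec: dict[State, State] = {}
    for left in range(n):
        for right in range(n):
            c = encrypt_oracle((left, right))
            enc[(left, right)] = c
            dec[c] = (left, right)
    return enc, dec


# Usage with a toy bijection on Z_N x Z_N (one Feistel-like round, NOT FF3).
N = 16


def toy_oracle(s: State) -> State:
    return s[1], (s[0] + s[1] * s[1]) % N


enc, dec = build_codebook(toy_oracle, N)
print(len(enc))          # 256, i.e. N^2 queries
print(dec[enc[(3, 4)]])  # (3, 4)
```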
FF3, however, makes this attack practical, since FF3, and format-preserving encryption in general, has a small domain size, meaning that an attack that was previously thought to be theoretical is now practical. Note that this attack may require more data than the domain size N^2, since we need to query several different tweaks. However, we don't need to query the specific plaintext or ciphertext that we're trying to recover, meaning this attack is still better than the trivial attack. We experimentally verified all of our contributions, simply by simulating them many times over random keys and tweaks. We also obtained success rates that are strictly better than those of the second-generation attack, as can be seen in this table. On the left side, you can see our asymmetric slide attack, and on the right side, you can see Hoang et al.'s second-generation attack. As you can see, the number of queries and the time complexity are significantly lower, while the success rate is higher. Now that we've summarized the contributions, we can move on to the construction itself. So let's begin with the cipher construction. FF3 is an eight-round Feistel construction from Z_N × Z_N to Z_N × Z_N. The thing to note here is that while most Feistel constructions use exclusive-or to merge the left and right halves, FF3 uses addition modulo N, which is what that square notation means. This is because the domain we're working on is not necessarily a power of two, making XOR infeasible. The round function uses the tweak in order to make each round distinct. As I said earlier, FF3 accepts two parameters, a secret key K and the tweak T. The tweak is divided into a left half and a right half, similarly to the state.
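The role of addition modulo N can be seen in a minimal Feistel sketch with pluggable round functions. This is an illustration under stated assumptions: the round functions below are arbitrary toy maps, not FF3's AES-based ones, and FF3's digit encoding is omitted. It shows that each round (L, R) -> (R, L + F_i(R) mod N) is undone by subtraction modulo N, so the network is a permutation of Z_N × Z_N even when the F_i are not.

```python
from typing import Callable, Sequence

RoundFn = Callable[[int], int]


def feistel_encrypt(rounds: Sequence[RoundFn], left: int, right: int, n: int):
    """(L, R) -> (R, L + F_i(R) mod N) for each round; modular addition replaces XOR."""
    for f in rounds:
        left, right = right, (left + f(right)) % n
    return left, right


def feistel_decrypt(rounds: Sequence[RoundFn], left: int, right: int, n: int):
    """Undo the rounds in reverse order using subtraction modulo N."""
    for f in reversed(rounds):
        left, right = (right - f(left)) % n, left
    return left, right


# Toy round functions (NOT FF3): any map Z_N -> Z_N works here.
N = 10
toy_rounds = [lambda x, i=i: (x * x + 3 * i) % N for i in range(8)]
c = feistel_encrypt(toy_rounds, 4, 7, N)
print(feistel_decrypt(toy_rounds, *c, N))  # (4, 7)
```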
To calculate the round function of round i, we take the corresponding half of the tweak, meaning the right half if the round is even and the left half if the round is odd, and XOR it with the number of the round. The result of that XOR is concatenated with the input of the function, and all of that together is encrypted with AES under the key K. The output of AES is then truncated as necessary to fit the state. The important thing to note here is that the tweak is what makes the round functions distinct: if two rounds use the same tweak-derived value, then their round functions are also identical. The attack has three parts. First, we create a reduction from eight-round FF3 to four-round FF3. Then we break four-round FF3 by reconstructing the codebooks of each individual round function, using a subroutine that we call PRF reconstruction. Lastly, we simply combine the above two steps in order to reconstruct the codebooks of all eight rounds of FF3. We only have time to focus on the first step, meaning the reduction to four rounds. Therefore, for the second step, we will simply assume that we have the algorithm, which we will call PRF reconstruction for the rest of the presentation. What it does is accept a number of plaintext-ciphertext pairs for four-round FF3 and return the round functions. To create the reduction from eight rounds to four rounds, we need a slide characteristic, which was presented by Durak and Vaudenay in 2017. For that slide characteristic, we assume that we can encrypt under any tweak we wish, and we abuse this in order to mount a slide attack. For that we need two tweaks: the tweak T that we are trying to attack, and a second tweak T', which is equal to the first tweak with each half XORed with 4. Let's see what encryption under each of these tweaks looks like.
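The effect of the related tweak T' can be checked directly on the values that feed each round function. This Python sketch computes, for each round i, the tweak half selected by the rule above (right half for even i, left half for odd i) XORed with i, and confirms that T' reproduces the rounds of T rotated by four positions. The sample tweak values are arbitrary, and the check only concerns these tweak-derived values, not the full FF3 round function.

```python
def round_tweaks(t_left: int, t_right: int, num_rounds: int = 8) -> list[int]:
    """Tweak-derived value of round i: (T_R if i is even else T_L) XOR i."""
    return [(t_right if i % 2 == 0 else t_left) ^ i for i in range(num_rounds)]


def related_tweak(t_left: int, t_right: int) -> tuple[int, int]:
    """T' from the talk: each half of T XORed with 4."""
    return t_left ^ 4, t_right ^ 4


T = (0x1234ABCD, 0x0BADF00D)  # arbitrary sample tweak halves
orig = round_tweaks(*T)
rel = round_tweaks(*related_tweak(*T))
# Rounds 0-3 under T' equal rounds 4-7 under T, and vice versa.
print(rel == orig[4:] + orig[:4])  # True
```

In this simplified model the rotation holds for every choice of T, because rounds i and i + 4 select the same tweak half and i XOR 4 equals i + 4 for i in 0..3.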
If we encrypt under the original tweak, then the tweak half of each round is simply XORed with the number of the round, as we saw in the definition of the round functions. However, if we look at an encryption under the related tweak, then in rounds 0 through 3 the tweak halves are XORed with 4 through 7, and in rounds 4 through 7 they are XORed with 0 through 3. This means that the first half of the encryption under T is equal to the second half of the encryption under T', and vice versa. Looking at it a bit differently, we can define these halves as functions F and G, such that encryption under T is equal to applying F and then G, and encryption under T' is equal to applying G and then F, where both F and G are four-round FF3. Using this, we can mount a slide attack on FF3, where the goal of the slide attack is to find input-output pairs for F and for G. To do so, we need something called slid chains, which were presented by Furuya in 2002. Chains are iterated encryptions, meaning we choose a random starting point x_0 and encrypt it iteratively to create a chain. Here we can see a chain under T. We can also do the same thing for the related tweak, starting from a random point y_0. Note that both chains are formed by alternating applications of F and G; however, we do not know the intermediate states between F and G. Now, note what happens if there exists some offset t for which F(x_0) is equal to y_t, as shown here. In that case, we can align the chains with a shift of four rounds relative to each other, and we can see that the functions F and G now line up in both chains. That means that from this point forward, and also backwards, the values in the two chains coincide, meaning that the intermediate values of each chain are the values held in the other chain.
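The alignment argument can be checked on a toy model. In this Python sketch, F and G are random permutations of a small set standing in for the two four-round halves (an assumption; they are not FF3), E_T is G after F, and E_T' is F after G. We build a chain under T' from a random y_0, pick x_0 so that F(x_0) = y_t for a chosen offset t, and confirm that every later value lines up as described.

```python
import random


def random_permutation(size: int, rng: random.Random) -> list[int]:
    perm = list(range(size))
    rng.shuffle(perm)
    return perm


def chain(step, start: int, length: int) -> list[int]:
    """Iterated encryption: start, step(start), step(step(start)), ..."""
    values = [start]
    for _ in range(length - 1):
        values.append(step(values[-1]))
    return values


rng = random.Random(2021)
size = 1 << 12                     # toy domain size standing in for N^2
F = random_permutation(size, rng)  # hypothetical stand-in for rounds 0-3
G = random_permutation(size, rng)  # hypothetical stand-in for rounds 4-7
F_inv = [0] * size
for a, b in enumerate(F):
    F_inv[b] = a


def enc_t(v: int) -> int:  # E_T = G o F
    return G[F[v]]


def enc_t_rel(v: int) -> int:  # E_T' = F o G
    return F[G[v]]


t = 5
ys = chain(enc_t_rel, rng.randrange(size), 40)
xs = chain(enc_t, F_inv[ys[t]], 30)  # x_0 chosen so that F(x_0) = y_t

print(all(F[xs[i]] == ys[t + i] for i in range(len(xs))))          # True
print(all(G[ys[t + i]] == xs[i + 1] for i in range(len(xs) - 1)))  # True
```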
On one hand, that gives us many input-output values for F, since F(x_0) is y_t, F(x_1) is y_{t+1}, and so forth. On the other hand, it also gives us values for G, since G(y_t) is x_1, G(y_{t+1}) is x_2, and so forth. This means that if we can locate two chains and an offset where a single intermediate value of one chain is equal to a value of the other chain, we get plaintext-ciphertext pairs for F and for G, which then allow us to run the four-round PRF reconstruction. So our goal is now to find such chains, which we call slid chains, together with the correct offset. The problem is that slid chains are difficult to identify without knowing the intermediate values, since we do not know the state after four rounds of FF3. The naive solution is to not try to figure out which chains are slid at all. That means that for every possible offset between two chains, we simply try PRF reconstruction on that offset. If the reconstruction works, we have found a correct slide and we're finished. If it doesn't work, we move on to the next offset. The problem is that this is very expensive in time: PRF reconstruction takes about N^1.5 time per attempt, so if we run it for every offset, we repeat the full PRF reconstruction many times, when ideally we would only run it for the correct offset. Therefore, we can use a distinguisher, which is what Hoang et al. do. If we have a distinguisher for four-round FF3, we can test each offset with it and only call PRF reconstruction on the correct offsets. This works because if the chains are slid with offset t, then the intermediate values form plaintext-ciphertext pairs for four-round FF3, and therefore the distinguisher will return true for those values.
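The naive approach can be sketched as a loop over offsets that extracts the candidate pairs for F and G and hands them to a reconstruction attempt. In this Python sketch, `try_reconstruct` is a hypothetical placeholder for PRF reconstruction, which the talk treats as a black box; the usage lines replace it with an exact check against known toy maps F and G (affine permutations of Z_101, not FF3), purely to show which pairs each offset produces. In a real attack every call would cost about N^1.5 time, which is why running it at every offset is expensive.

```python
from typing import Callable, Optional, Sequence

Pair = tuple[int, int]


def slid_pairs(xs: Sequence[int], ys: Sequence[int], t: int):
    """Pairs implied by assuming F(x_0) = y_t: F(x_i) = y_{t+i} and G(y_{t+i}) = x_{i+1}."""
    m = min(len(xs), len(ys) - t)
    f_pairs = [(xs[i], ys[t + i]) for i in range(m)]
    g_pairs = [(ys[t + i], xs[i + 1]) for i in range(min(m, len(xs) - 1))]
    return f_pairs, g_pairs


def naive_slide_search(
    xs: Sequence[int],
    ys: Sequence[int],
    try_reconstruct: Callable[[list[Pair], list[Pair]], bool],
) -> Optional[int]:
    """Run the (expensive) reconstruction at every offset until one succeeds."""
    for t in range(len(ys)):
        f_pairs, g_pairs = slid_pairs(xs, ys, t)
        if try_reconstruct(f_pairs, g_pairs):
            return t
    return None


P = 101


def F(v: int) -> int:  # toy stand-in for rounds 0-3
    return (3 * v + 1) % P


def G(v: int) -> int:  # toy stand-in for rounds 4-7
    return (7 * v + 5) % P


def oracle_check(f_pairs: list[Pair], g_pairs: list[Pair]) -> bool:
    return all(F(a) == b for a, b in f_pairs) and all(G(a) == b for a, b in g_pairs)


ys = [10]
for _ in range(19):
    ys.append(F(G(ys[-1])))               # chain under E_T' = F o G
xs = [((ys[4] - 1) * pow(3, -1, P)) % P]  # x_0 = F^{-1}(y_4)
for _ in range(9):
    xs.append(G(F(xs[-1])))               # chain under E_T = G o F

print(naive_slide_search(xs, ys, oracle_check))  # 4
```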
However, if the chains are not slid under a certain offset, then the values of the two chains are simply uncorrelated, and therefore the distinguisher will return false. That means that if we can find a distinguisher that is cheaper than PRF reconstruction, we can improve the attack. Hoang et al. use a distinguisher that requires N^1.5 pairs of plaintexts, where each pair has a common right half. We use the same distinguisher; however, we improved its analysis and showed that it only requires Õ(N) pairs. We can obtain Õ(N) such pairs from only about √N plaintexts, as long as all of those plaintexts have a common right half: taking all pairs among them gives Õ(N) pairs. Since the distinguisher runs in time linear in the data it accepts, we now have a distinguisher that works with √N data and √N time. The exact workings of this distinguisher are outside the scope of this presentation, but if you're interested, they are presented in the paper. For this presentation, we will use the distinguisher as a black box. Just note that it requires √N plaintexts that all have a common right half. Using this distinguisher and slid chains, we can now set up our attacks. All three of our attacks rely on the same premise of creating chains and trying to find slides between them. However, we only have time to present one of them, so we've chosen to present the cycle structure attack, the third attack, because we believe it is the most interesting. So, how does the cycle structure attack work? FF3 is a permutation, meaning that its graph is formed of cycles, and the idea is to use these cycles to find slid chains more easily. This is, as I said earlier, a theoretical attack presented by Biham, Dunkelman, and Keller in 2007.
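The data saving comes from forming pairs among plaintexts that share a right half: k plaintexts give k(k-1)/2 pairs, so about √N plaintexts already give about N/2 pairs. Here is a minimal Python sketch of that bookkeeping, assuming states are (left, right) tuples in Z_N × Z_N; the function name is hypothetical and the distinguisher itself is not shown.

```python
from collections import defaultdict
from itertools import combinations
from math import isqrt

State = tuple[int, int]


def common_right_half_pairs(plaintexts: list[State]) -> tuple[int, list[tuple[State, State]]]:
    """Group plaintexts by right half and return all pairs within the largest group."""
    groups: dict[int, list[State]] = defaultdict(list)
    for left, right in plaintexts:
        groups[right].append((left, right))
    right, members = max(groups.items(), key=lambda item: len(item[1]))
    return right, list(combinations(members, 2))


N = 10_000
k = isqrt(N)  # about sqrt(N) = 100 plaintexts sharing a right half
plaintexts = [(left, 42) for left in range(k)] + [(7, 1), (8, 2)]
right, pairs = common_right_half_pairs(plaintexts)
print(right, len(pairs), N // 2)  # 42 4950 5000
```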
However, as I said at the beginning of the presentation, until now this attack was purely theoretical: since it requires walking over most of the domain of the cipher, it was impractical for any cipher with a large domain. Because format-preserving encryption has small domains, this attack is now practical for FF3, which gives it additional academic value. So let's see how this attack works. Consider a cycle of length L in the permutation graph defined by the tweak T, and consider its intermediate values: if we look at the cycle x_0, x_1, x_2, and so forth, then we also have the intermediate values y_0, y_1, y_2, and so forth. Now consider the cycle defined by those intermediate values. That cycle has two properties that we want. The first property is that its length is exactly equal to the length of the original cycle. The second is that, because it is formed of the intermediate values, and because of the slide characteristic of alternating applications of F and G, it is a cycle in the permutation graph of the related tweak, meaning the tweak where each half is XORed with 4. Each of the two cycles holds the intermediate values of the other. That means that if we can find those two cycles, the first in the graph of the original tweak and the second in the graph of the related tweak, then under a certain offset those two cycles form slid chains, and we can use them to recover F and G. For that, we need to find a cycle of sufficient length under T, and then find a cycle of the exact same length under T'. So let's see how long our cycles need to be. As we said earlier, the distinguisher needs √N plaintexts with a common right half in order to work. Therefore, the minimum length of the cycles is N^1.5.
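The two properties of the intermediate cycle can be checked on the same kind of toy model, with random permutations F and G standing in for the two four-round halves (an assumption, not FF3). Since F ∘ G = F ∘ (G ∘ F) ∘ F^{-1}, the two encryption permutations are conjugate, so the cycle through x_0 under E_T and the cycle through F(x_0) under E_T' have the same length, the second consists exactly of the intermediate values of the first, and the two graphs have identical cycle-length profiles.

```python
import random


def random_permutation(size: int, rng: random.Random) -> list[int]:
    perm = list(range(size))
    rng.shuffle(perm)
    return perm


def compose(outer: list[int], inner: list[int]) -> list[int]:
    """Permutation v -> outer[inner[v]] as a lookup table."""
    return [outer[inner[v]] for v in range(len(inner))]


def cycle_from(perm: list[int], start: int) -> list[int]:
    """Walk start, perm[start], ... until returning to start."""
    cycle = [start]
    v = perm[start]
    while v != start:
        cycle.append(v)
        v = perm[v]
    return cycle


def cycle_lengths(perm: list[int]) -> list[int]:
    """Sorted lengths of all cycles of a permutation."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            cyc = cycle_from(perm, start)
            for v in cyc:
                seen[v] = True
            lengths.append(len(cyc))
    return sorted(lengths)


rng = random.Random(7)
size = 1 << 12
F = random_permutation(size, rng)  # hypothetical stand-in for rounds 0-3
G = random_permutation(size, rng)  # hypothetical stand-in for rounds 4-7
enc_t = compose(G, F)              # E_T  = G o F
enc_t_rel = compose(F, G)          # E_T' = F o G

x0 = 0
cx = cycle_from(enc_t, x0)
cy = cycle_from(enc_t_rel, F[x0])
print(len(cx) == len(cy), cy == [F[v] for v in cx])       # True True
print(cycle_lengths(enc_t) == cycle_lengths(enc_t_rel))  # True
```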
This length guarantees a right half that appears √N times, since a cycle of length N^1.5 contains N^1.5 values spread over only N possible right halves. According to Shepp and Lloyd from 1966, there is a high probability that a cycle of sufficient length exists. Not only that, there is a high probability that a cycle of that exact length is unique in the permutation graph of the encryption. That means that if we find two cycles with exactly the same length, one under T and one under T', then with very high probability they aren't just random cycles that happen to have the same length; they are exactly the offset cycles of intermediate values that we need. So it is simply sufficient to find a cycle of the exact same length in the related tweak's graph. The problem is that finding a cycle of a specific length is costly in data. To do that, you need to walk over most of the encryption graph of the related tweak, and therefore it takes N^2 data, and also N^2 time and memory. Once we have found those two cycles, we can test all offsets between them, meaning that we look at all the plaintext pairs defined by each offset: for example, x_0 together with y_0, then with y_1, then with y_2, and so forth. If a specific offset is accepted by the distinguisher, we can use those values to recover F and G, and thus recover the full eight rounds of FF3. So, to go over the full algorithm from start to finish: first, we find an input cycle of sufficient length in the original graph. Then we find an output cycle, meaning a cycle of the exact same length under the related tweak. Then we find all of the values in the input cycle that we need for the distinguisher, meaning we find the most common right-half value and keep √N indices where it appears. Those will be the plaintexts that our distinguisher requires.
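A quick simulation illustrates why a long cycle is very likely to exist. For a uniformly random permutation, the longest cycle covers on average about 62.4% of the domain (the Golomb-Dickman constant, which comes out of Shepp and Lloyd's analysis), and a cycle longer than half the domain is automatically the only cycle of its length. This Python sketch uses random permutations as a stand-in for FF3 under a fixed key and tweak, which is an assumption; the domain size and trial count are arbitrary, so the printed values will only be near those limits.

```python
import random


def longest_cycle_fraction(size: int, rng: random.Random) -> float:
    """Fraction of the domain covered by the longest cycle of a random permutation."""
    perm = list(range(size))
    rng.shuffle(perm)
    seen = bytearray(size)
    longest = 0
    for start in range(size):
        if seen[start]:
            continue
        length, v = 0, start
        while not seen[v]:  # walk the whole cycle containing start
            seen[v] = 1
            v = perm[v]
            length += 1
        longest = max(longest, length)
    return longest / size


rng = random.Random(1966)
fractions = [longest_cycle_fraction(1 << 12, rng) for _ in range(200)]
mean = sum(fractions) / len(fractions)
over_half = sum(f > 0.5 for f in fractions) / len(fractions)
print(f"mean longest-cycle fraction: {mean:.3f}")
print(f"share of trials with a cycle longer than half the domain: {over_half:.2f}")
```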
Then we go over all possible offsets; there will be N^1.5 of them, and we test each one using the distinguisher. Because of the structure of the cycles, there is a very high probability that the distinguisher will accept one of the offsets. When it does, we assume that this slide is the correct one, and then we can call PRF reconstruction on F and on G. Let's do some complexity analysis. There are N^1.5 offsets, and the distinguisher requires √N time to run. Therefore, the time of checking all of the offsets is Õ(N^2), which is also the time needed to find the two cycles. And as we saw earlier, the heaviest data requirement is the queries needed to find the cycles, which is N^2. And that concludes our attack. Beyond these three attacks, we have some further, more minor contributions, which I will list here. The first is that we managed to improve the time complexity of PRF reconstruction from N^(5/3), which is what Hoang et al. used, to N^(3/2). We also added two related-domain attacks, where a related-domain attack means that we query under separate domains in order to reconstruct the encryption. The first is a generic attack on all cycle-walking format-preserving encryption schemes. The second is a distinguishing attack, which is relevant for FF3 and also for FF3-1, whereas the three attacks that we presented here don't work on FF3-1. We also fixed the slide characteristic, which was dependent on the tweaks. We also have several additional minor results, including a reduction of memory and some alternative attack models. To conclude what we've done here:
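The steps of the cycle structure attack can be put together in a minimal Python sketch on a toy model. Assumptions, all for illustration only: states are integers v = left·N + right, so the right half is v mod N; `enc_t` and `enc_t_rel` are full lookup tables for the two tweaks, standing in for the N^2 oracle queries; and `distinguisher` is a hypothetical callable playing the role of the black-box four-round distinguisher. The usage lines substitute an exact check against the toy F, so they show the control flow rather than the statistics of the real distinguisher.

```python
import random
from collections import Counter
from math import isqrt
from typing import Callable, Optional, Sequence

Pair = tuple[int, int]


def cycles_of(perm: Sequence[int]) -> list[list[int]]:
    """All cycles of a permutation given as a lookup table (walks the whole domain)."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cyc, v = [], start
        while not seen[v]:
            seen[v] = True
            cyc.append(v)
            v = perm[v]
        cycles.append(cyc)
    return cycles


def cycle_structure_attack(
    enc_t: Sequence[int],
    enc_t_rel: Sequence[int],
    n: int,
    distinguisher: Callable[[list[Pair]], bool],
) -> Optional[tuple[int, list[int], list[int]]]:
    # Step 1: an input cycle of length at least N^1.5 under T.
    cx = max(cycles_of(enc_t), key=len)
    if len(cx) < n ** 1.5:
        return None
    # Step 2: an output cycle of exactly the same length under T'.
    same = [c for c in cycles_of(enc_t_rel) if len(c) == len(cx)]
    if len(same) != 1:
        return None  # missing or ambiguous; the attack relies on uniqueness
    cy = same[0]
    # Step 3: sqrt(N) positions in cx that share the most common right half.
    right, _ = Counter(v % n for v in cx).most_common(1)[0]
    idx = [i for i, v in enumerate(cx) if v % n == right][: isqrt(n)]
    # Step 4: try every offset; the accepted one gives F(cx[i]) = cy[i + off].
    length = len(cx)
    for off in range(length):
        if distinguisher([(cx[i], cy[(i + off) % length]) for i in idx]):
            return off, cx, cy
    return None


# Toy usage: E_T is a single cycle of length N^2, and G is built so that G o F = E_T.
n = 16
size = n * n
rng = random.Random(3)
F = list(range(size))
rng.shuffle(F)
F_inv = [0] * size
for a, b in enumerate(F):
    F_inv[b] = a
E = [(v + 1) % size for v in range(size)]
G = [E[F_inv[v]] for v in range(size)]
enc_t = [G[F[v]] for v in range(size)]
enc_t_rel = [F[G[v]] for v in range(size)]


def oracle_distinguisher(pairs: list[Pair]) -> bool:
    return all(F[a] == b for a, b in pairs)


result = cycle_structure_attack(enc_t, enc_t_rel, n, oracle_distinguisher)
off, cx, cy = result
print(cy[off] == F[cx[0]])  # True
```

In the real attack, step 2 is what costs N^2 queries, and step 4 runs the √N-time distinguisher on about N^1.5 offsets, matching the Õ(N^2) time in the complexity analysis.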
We have three new attacks on FF3. The symmetric and asymmetric slide attacks significantly reduce the data complexity of previous attacks, while also improving the time and memory complexities and allowing a time-data trade-off as needed. We also show a new attack, which is a practical application of a previously theoretical attack. These findings show the general potency of slide attacks and why it's important to make round functions different from one another, and also show how interesting theoretical results may become practical in the future. Thank you for watching.