Hello, everyone. Thank you, Kenny, for the introduction. My name is Betul Durak, from EPFL, and this talk is about format-preserving encryption. It is joint work with my co-author, Serge Vaudenay, also from EPFL. More specifically, I'll be talking about an attack on the FF3 standard over small message domains. I'm going to start with the concept of a block cipher. A block cipher gives a method to encrypt a message of a certain fixed length into a ciphertext of the same length under a hard-wired key. Here is an example: AES, with a 128-bit block size. However, block ciphers are somewhat rigid because of this fixed message length. To relax this, a notion called format-preserving encryption (FPE) was introduced, and it has recently found its place in applied cryptography. An FPE scheme is a cipher that encrypts a message from a general domain D into the same domain D. So it is, again, a keyed permutation, but defined on more general domains. In practice, these domains are typically postal codes, social security numbers, or credit card numbers, and they are not necessarily bit strings or of a fixed length. More importantly, FPE is designed to work on small message domains: a social security number, for instance, is only about 30 bits of message. Let me show you an example of FPE in practice, with encrypted databases. Here is a table of patients from a hospital record. It includes sensitive information, such as the identities of the patients. To upgrade the security of this database, we can use FPE to encrypt individual columns under independent secret keys. This gives a transparent way of encrypting: we don't need any significant schema changes, and there are no significant changes to the applications running on top of the database. Ideally, we would like to use well-known conventional block ciphers like AES to construct FPE. One way to do this is to pad the input to 128 bits and truncate the output.
But this disables decryption. We still want to use AES, because its security has stood for a long time and it is fast enough in practice, but it may not be straightforward to use it. With that in mind, I can classify FPE constructions into two classes. In the first class, we have provably secure constructions, but they are not practical enough for practitioners. In the second class, NIST published a standard for format-preserving encryption in 2016. All of its constructions are practical, but their security is supported only by cryptanalysis, which is exactly the topic of this talk. The standard has two constructions, called FF1 and FF3, and both are based on Feistel networks. A Feistel network is a widely known iterated cipher that defines a pseudorandom permutation on essentially any domain. Here, I denote the domain as D² because of the two branches of the Feistel network. We take the input message and divide it into two parts, again because of the two branches, and then we iterate through the network. Each iteration is called a round, and a round consists of a round function, typically a secure PRF, and a group operation defined on the domain D. Now I'm going to talk about another feature of format-preserving encryption. FPE is deterministic by design, because we want the ciphertext to preserve the size of the message. And as I said earlier, FPE is designed for small messages, so it is very likely that two identical plaintexts end up being encrypted under the same secret key, and then the two ciphertexts are equal. To escape from this, FPE introduces a notion called a tweak: as long as the tweaks are different, two identical messages encrypted under the same key give two different ciphertexts.
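To make the Feistel structure described above concrete, here is a minimal Python sketch of a balanced Feistel network on Z_n × Z_n, using addition modulo n as the group operation. The round functions are random lookup tables, an assumption standing in for a secure PRF, and the helper names are illustrative only. Decryption never inverts a round function; it only recomputes it and subtracts.

```python
import random
from typing import Callable, List, Tuple

RoundFn = Callable[[int], int]


def random_round_functions(n: int, r: int, rng: random.Random) -> List[RoundFn]:
    """Hypothetical stand-in for PRF round functions: r random tables Z_n -> Z_n."""
    tables = [[rng.randrange(n) for _ in range(n)] for _ in range(r)]
    return [table.__getitem__ for table in tables]


def feistel_encrypt(x: int, y: int, rounds: List[RoundFn], n: int) -> Tuple[int, int]:
    """Each round adds F_i(right half) to the left half mod n, then swaps the halves."""
    for f in rounds:
        x, y = y, (x + f(y)) % n
    return x, y


def feistel_decrypt(x: int, y: int, rounds: List[RoundFn], n: int) -> Tuple[int, int]:
    """Undo the rounds in reverse order; subtraction mod n inverts the addition."""
    for f in reversed(rounds):
        x, y = (y - f(x)) % n, x
    return x, y


if __name__ == "__main__":
    rng = random.Random(2016)
    n = 10  # halves in Z_10, so the domain Z_10 x Z_10 has 100 elements
    fs = random_round_functions(n, 8, rng)
    ct = feistel_encrypt(3, 7, fs, n)
    print(ct, feistel_decrypt(*ct, fs, n))
```

Because the same structure works for any n, the round functions can be arbitrary (even non-injective) maps and the whole network is still a permutation of the domain.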
So tweaks are essential for format-preserving encryption. More importantly, these tweaks are not part of the secret key: they are not secret, they are publicly available, and they may be under the control of the adversary. Coming back to the FF3 construction: to simplify the message domains we work on, I will focus on domains of the form Z_n × Z_n, so the domain size is n². The round function is constructed as follows. We take the right half of the message, encode it on 96 bits, and concatenate it with a 32-bit half of the tweak. This is the input to AES under the secret key K, and the output of AES is reduced modulo n. Because this output is only added inside the Feistel network, the modular reduction truncates it without disabling decryption. What is important here is that the round functions are supposed to be distinct from each other: we use AES with the same secret key, but by changing the tweak input we obtain different round functions. This is the basic idea of the FF3 construction. From now on, I will drop the secret key and the tweaks from the notation, so hopefully that won't be confusing. As standardized, FF3 has eight rounds, and the domain size must be at least 100. In terms of security, the standard says that the targeted security is 128 bits; I don't know exactly what that means. FF3 also inherits the security of Feistel networks. Feistel networks have a long history, dating back to the early 70s, and the cryptanalysis of them still applies to FF3. In addition, FF3 is claimed to offer chosen-plaintext security, and even PRP security under chosen-plaintext and chosen-ciphertext attacks. We will show that this is not the case. I will divide the rest of the talk into two parts. In the first part, I will talk about generic attacks on Feistel networks.
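The FF3 round function just described can be sketched as follows. This is not the standardized algorithm: HMAC-SHA256 truncated to 128 bits stands in for AES under K (an assumption, since AES is not in the Python standard library), the digit and byte reversals of the real FF3 are omitted, and the round index is XORed into the 32-bit tweak half, anticipating how FF3 separates its rounds. The sketch shows why reducing modulo n is harmless: the round is inverted by recomputing F_i and subtracting.

```python
import hashlib
import hmac


def round_function(key: bytes, tweak_half: int, i: int, b: int, n: int) -> int:
    """FF3-style F_i(b) = PRF_K((tweak_half XOR i) || b encoded on 96 bits) mod n.

    Assumption: HMAC-SHA256 truncated to 16 bytes replaces AES_K here.
    """
    block_in = (tweak_half ^ i).to_bytes(4, "big") + b.to_bytes(12, "big")  # 32 + 96 bits
    block_out = hmac.new(key, block_in, hashlib.sha256).digest()[:16]       # 128-bit "block"
    return int.from_bytes(block_out, "big") % n                             # reduce modulo n


def feistel_round(key: bytes, tweak_half: int, i: int, a: int, b: int, n: int):
    """(a, b) -> (b, a + F_i(b) mod n): the reduced PRF output is only ever added."""
    return b, (a + round_function(key, tweak_half, i, b, n)) % n


def inverse_round(key: bytes, tweak_half: int, i: int, a2: int, b2: int, n: int):
    """Recompute F_i on the half that was passed through unchanged, then subtract."""
    return (b2 - round_function(key, tweak_half, i, a2, n)) % n, a2
```

For example, `inverse_round(key, w, 3, *feistel_round(key, w, 3, a, b, n), n)` returns `(a, b)` for every `a, b` in Z_n, while different round indices `i` give different functions under the same key and tweak half.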
In that part, I will focus specifically on three-round and four-round Feistel networks. I hope that doesn't seem like too small a number of rounds, because you will see that I use them to break the FF3 construction over small message domains, and for small message domains the attack is practical. As far as I know, this is the best known query and time complexity. There are other attacks on FF3 as well, but our results are practical for small messages. The good news is that we can patch the FF3 construction to protect it against the present attack. First, a little intuition. The question we would like to answer is about the round functions. Here is a three-round Feistel network with three different round functions, and the question is whether these round functions are uniquely determined by the way messages are encrypted. Encryption works as follows. We take the right half y of the message, input it to the first round function, and add its output to the left half x of the message to generate the intermediate value c. Then we change direction: we take c as input to the second round function and add its output to the right half of the message to generate the right half t of the ciphertext. Continuing like this, we generate the left half z of the ciphertext. Now, we could introduce an additional value δ into the output of the first round function, and this δ is transferred to the intermediate value c. Since I want the same ciphertext, I subtract δ from the intermediate value before inputting it to the second round function, which gives the same right half t of the ciphertext. The intermediate value c still carries the additional δ, so δ is subtracted from the output of the last round function to generate the same left half z of the ciphertext.
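Written out, with all operations in Z_n, the three-round encryption and the δ-shifted tuple of round functions (f₀′, f₁′, f₂′) give the same ciphertext (z, t) for every plaintext (x, y). Choosing δ = a − f₀(y₀) for an arbitrary value a forces f₀′(y₀) = a, which is why the output of f₀ on one arbitrary input can be fixed freely.

```latex
\begin{aligned}
c &= x + f_0(y), & t &= y + f_1(c), & z &= c + f_2(t),\\
f_0'(y) &= f_0(y) + \delta, & f_1'(c) &= f_1(c - \delta), & f_2'(t) &= f_2(t) - \delta,\\
c' &= x + f_0'(y) = c + \delta, & t' &= y + f_1'(c') = y + f_1(c) = t, & z' &= c' + f_2'(t') = c + \delta + f_2(t) - \delta = z.
\end{aligned}
```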
All I'm trying to say here is that these two tuples of round functions have the same input-output behavior, not just for one specific plaintext-ciphertext pair but over the entire domain. If it helps, you can think of it by analogy with cosets in algebraic structures. The outcome of this observation is even nicer, because we will use it: the output of the first round function f0 on one arbitrary input y can be set arbitrarily, and there is still a tuple of round functions with the same input-output behavior as the original tuple f0, f1, f2. Why is this important? Because the goal of the adversary in our work is round-function recovery. The adversary cannot recover the true round functions used in the Feistel network, but it can reconstruct an equivalent tuple of round functions, which is still valid because it gives the same input-output behavior as the original tuple. Another way to look at it is as codebook recovery: without caring how the round functions work, we just want a way to encrypt messages and obtain the input-output pairs. Both goals are equivalent to recovering the secret key, because in the end we don't need the secret key anymore. As I said, I will talk about the attack on three-round Feistel networks. As far as I know, there has been no previous work on this in terms of round-function recovery, and our attack only gives the adversary known-plaintext power. For four rounds, I will use this three-round attack to build a four-round Feistel network attack. There is another work that recovers the round functions, by Biryukov et al., with even better time complexity, but in their case the adversary is more powerful than a known-plaintext adversary.
To attack the FF3 construction, we need exactly a known-plaintext attack; we cannot assume a stronger adversarial model. Our work can also be generalized to more rounds (five, six, maybe more), as opposed to the results of Biryukov et al., which cannot be applied to more than five rounds. The sketch of our three-round Feistel network attack is as follows. The input to the adversary is a set of known plaintext-ciphertext pairs, which we call S. The adversary tries to recover the round functions, fully or partially: it may end up with partial tables f0, f1, and f2, or with all of them fully recovered, and we analyze theoretically when the recovery is full and when it is partial. Here is the three-round Feistel network again. In the first step, the adversary arbitrarily picks a pair (x0, y0, z0, t0), and I show you the equations it can use. The adversary has no information about the intermediate value, but using the outcome of our intuition, we can set the output of f0 on one arbitrary input: I map f0(y0) to 0. We are free to do that for one single input. From there, the equations fill the tables of f1 and f2 at one point each, namely c0 and t0. In the second step, since the adversary now knows how to evaluate f2 at t0, it picks a plaintext-ciphertext pair from S whose ciphertext has the same right half, t1 = t0. The adversary then decrypts backward using the same equations, starting from the last one, the equation involving f2(t). This fills the tables of f0 and f1 at the points given by this pair (x1, y1, z1, t1). Similarly, the adversary then picks another plaintext-ciphertext pair whose plaintext has the matching right half y1.
Since it knows how to evaluate f0 at this point, it can recover the remaining values for this pair. So the idea is to go back and forth between f0 and f2, filling in the intermediate values c of the second round function f1. We continue like this until no more table values are revealed, ending up with partial or fully recovered tables. Why this works can be modeled with a bipartite graph, so let me justify the algorithm from a theoretical point of view. The bipartite graph has two sets of vertices: all possible values of y, the right half of the plaintext, and all possible values of t, the right half of the ciphertext. The edges are defined by the input set S: we put an edge between y and t whenever they appear together in a pair in S. The algorithm starts with one arbitrary input y0 and follows the edge from y0 to t0; remember that in the tables we filled the points y0, c0, and t0 for the first, second, and third round functions. From t0, we move to another plaintext right half, which was y1 in the previous slides, and we continue back and forth between y and t values. All we are doing is exploring a big connected component of this bipartite graph. What do we know about this graph? Since the Feistel network is supposed to be secure, with PRP security, it should be indistinguishable from a random permutation, so this graph should behave like a random graph. From random graph theory, we know that the graph is connected with high probability when the number of edges, which is the size of S, is about n log n. Even when it is not connected, there can be a giant connected component.
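The back-and-forth procedure above amounts to a breadth-first search over the bipartite graph, starting from y0 with the free choice f0(y0) = 0. Below is a minimal Python sketch of it, under the talk's conventions c = x + f0(y), t = y + f1(c), z = c + f2(t) mod n. The names `encrypt3` and `recover3` are hypothetical, the round functions are random tables standing in for a real cipher, and only the connected component of y0 is explored. The recovered tables are an equivalent tuple: they differ from the true ones by the δ-shift with δ = −f0(y0).

```python
import math
import random
from collections import defaultdict, deque


def encrypt3(x, y, f, n):
    """Three-round Feistel as in the talk; returns the ciphertext (z, t)."""
    c = (x + f[0][y]) % n
    t = (y + f[1][c]) % n
    z = (c + f[2][t]) % n
    return z, t


def recover3(pairs, n):
    """Known-plaintext recovery of an equivalent (partial) tuple of round functions.

    pairs: list of ((x, y), (z, t)). Returns dicts g0, g1, g2 filled on the
    connected component (in the y-t bipartite graph) of the first pair's y.
    """
    by_y, by_t = defaultdict(list), defaultdict(list)
    for (x, y), (z, t) in pairs:
        by_y[y].append((x, z, t))
        by_t[t].append((x, y, z))
    g0, g1, g2 = {}, {}, {}
    y0 = pairs[0][0][1]
    g0[y0] = 0  # the one free choice, which fixes delta = -f0(y0)
    queue = deque([("y", y0)])
    while queue:
        side, v = queue.popleft()
        if side == "y":
            # f0(v) known: go forward through every pair whose plaintext right half is v
            for x, z, t in by_y[v]:
                c = (x + g0[v]) % n
                g1[c] = (t - v) % n
                if t not in g2:
                    g2[t] = (z - c) % n
                    queue.append(("t", t))
        else:
            # f2(v) known: go backward through every pair whose ciphertext right half is v
            for x, y, z in by_t[v]:
                c = (z - g2[v]) % n
                g1[c] = (v - y) % n
                if y not in g0:
                    g0[y] = (c - x) % n
                    queue.append(("y", y))
    return g0, g1, g2


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    f = [[rng.randrange(n) for _ in range(n)] for _ in range(3)]
    size = int(2 * n * math.log(n))  # on the order of n log n known pairs
    pairs = []
    for idx in rng.sample(range(n * n), size):
        x, y = divmod(idx, n)
        pairs.append(((x, y), encrypt3(x, y, f, n)))
    g0, g1, g2 = recover3(pairs, n)
    print(f"recovered {len(g0)}/{n} entries of f0 and {len(g2)}/{n} entries of f2")
```

Running the demo with fewer pairs (say, around n) is a quick way to observe partial recovery through a giant component rather than full recovery.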
Random graph theory also tells us that such a giant component is highly likely already when the size of S is about n. So we have a nice gap to play with: even if we don't fully recover the tables, partial recovery will probably still help to attack the FF3 construction. Here are the experimental results. The size of S is parameterized by a value θ, shown on the x-axis, and there are two classes of curves. The thin lines show the fraction of f0 recovered depending on θ, and this fraction does not really depend on n. The thick lines show the fraction of experiments, out of 10,000, that fully recover all the round functions: no partial recovery, but full recovery. This matches what the theory in our work predicts. I'm now going to switch to the four-round Feistel network attack, but I won't spend much time on it, because it is a lot less intuitive and more involved, so it's not very simple to explain. The question we were trying to answer was: can we characterize f0, the first round function of the four-round Feistel network, in a way that lets us recover some intermediate values c? As soon as we recover all these intermediate values, we can apply our known-plaintext attack on three-round Feistel networks. That was the whole idea. It took quite a long time for us to discover this attack, but once we found a way to characterize f0, we were good to go. Here are some experimental results. n is the domain size of the round function (not of the Feistel network), m is the data complexity, that is, the query complexity to the Feistel network, and we ran a number of trials. The success probability is the probability that we fully, not partially, recovered the round functions of the four-round Feistel network.
As you can see, when n gets larger, the success probability gets higher. Of course, the time complexity gets higher as well, so we couldn't run it for Feistel network domain sizes larger than 2^18. But again, this matches our theoretical justification for the four-round Feistel network attack. With that, we can go back to the FF3 construction. Here we need to be careful: I told you how tweaks are used in general, but FF3 has a more specific way of using them. FF3 takes an input tweak T and divides it into a left half and a right half. For rounds with even indices it uses the right half of the tweak, and for rounds with odd indices it uses the left half. It still needs to separate all the round functions, since we don't want the same tweak input for, say, the first and the third round functions. To prevent this, the round index is XORed into the right or left half of the tweak. In the literature, the goal here is called pairwise different round functions. The construction tries to separate the domains this way, but it was not known how secure this is. In this work, we give a chosen-plaintext, chosen-tweak attack on the FF3 construction with the query and time complexity shown here, and it is again a round-function recovery. There is another work by Bellare et al., but I don't think it is a fair comparison, because the two papers use different techniques. We take a more traditional approach: round-function recovery is more traditional than the security notion Bellare et al. defined. Also, we were really interested in a query complexity less than n², where n² is the size of the domain.
If you make n² queries, you basically get the entire codebook, so our aim is to do better than n² query complexity. The number of tweaks we use here is two. The idea of the FF3 attack can be called a slide attack. Here is FF3 with tweak T, and here is another FF3 with tweak T', obtained by taking the old tweak T and XORing it with 4. By XORing a constant into the tweak, we can permute the round functions, because that constant ends up XORed with the round indices. We can play around with this, and it gives many other permutations than this specific one. With this specific one, the two networks share the same halves. Let G be the four-round Feistel network with tweak inputs T_R ⊕ 0, T_L ⊕ 1, and so on, and let H be the remaining four rounds. The upper part of the left network matches the lower part of the right one, and similarly, the upper part of the right network matches the lower part of the left one. This suggests something like a meet-in-the-middle attack, but it is really a slide attack on the Feistel network, obtained just by XORing the tweak with 4. What we can do is the following. We start with an arbitrary input (x, y) and apply chain encryption to it under the secret key K and the tweak T, where each encryption is H ∘ G. We do this starting from many other arbitrary plaintexts as well; as you can see, this is now a chosen-plaintext attack, not a known-plaintext one. We do the same for the other tweak, T ⊕ 4, and what happens there is that G and H are swapped.
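The slide relation can be checked on a toy model. In this sketch, HMAC-SHA256 again stands in for AES (an assumption), digit reversals are omitted, and the value 4 is assumed to be XORed into both 32-bit tweak halves, so that round i under T' computes the same function as round i ⊕ 4 under T. Then Enc_T = H ∘ G and Enc_T' = G ∘ H, hence G(Enc_T(m)) = Enc_T'(G(m)) for every m; by induction, if u' = G(u), then the j-th element of the T'-chain from u' is G of the j-th element of the T-chain from u, which is why the two chains slide against each other. The `patched` flag models the fix mentioned at the end of the talk (concatenating the round index instead of XORing it); its byte layout is hypothetical. All function names are illustrative.

```python
import hashlib
import hmac

MASK32 = 0xFFFFFFFF


def round_function(key, w, i, b, n, patched=False):
    """Toy FF3-style round function; HMAC-SHA256 stands in for AES_K."""
    if patched:
        # hypothetical patched layout: tweak half || round index || right half
        p = w.to_bytes(4, "big") + bytes([i]) + b.to_bytes(11, "big")
    else:
        # FF3-style layout: (tweak half XOR round index) || right half on 96 bits
        p = (w ^ i).to_bytes(4, "big") + b.to_bytes(12, "big")
    return int.from_bytes(hmac.new(key, p, hashlib.sha256).digest()[:16], "big") % n


def run_rounds(key, tweak, x, y, n, indices, patched=False):
    """Apply the Feistel rounds with the given indices under a 64-bit tweak."""
    t_left, t_right = tweak >> 32, tweak & MASK32
    for i in indices:
        w = t_right if i % 2 == 0 else t_left  # even rounds: right half, odd: left half
        x, y = y, (x + round_function(key, w, i, y, n, patched)) % n
    return x, y


def encrypt(key, tweak, x, y, n, patched=False):
    """Eight rounds, i.e. H o G with G = rounds 0..3 and H = rounds 4..7."""
    return run_rounds(key, tweak, x, y, n, range(8), patched)


def G(key, tweak, x, y, n, patched=False):
    """The first four rounds under tweak T."""
    return run_rounds(key, tweak, x, y, n, range(4), patched)


def slide_holds(key, tweak, n, patched=False):
    """Check G(Enc_T(m)) == Enc_T'(G(m)) for all m, where T' = T XOR (4 || 4)."""
    t2 = tweak ^ ((4 << 32) | 4)
    for x in range(n):
        for y in range(n):
            lhs = G(key, tweak, *encrypt(key, tweak, x, y, n, patched), n, patched)
            rhs = encrypt(key, t2, *G(key, tweak, x, y, n, patched), n, patched)
            if lhs != rhs:
                return False
    return True
```

On this toy model, `slide_holds` is expected to return `True` for the XOR-based tweak schedule and `False` once the round index is concatenated rather than XORed.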
So now we take two segments, one from the chain encrypted under tweak T and one from the chain encrypted under tweak T ⊕ 4. The key point is that the adversary wants to find a pair such that (x_{i,j}, y_{i,j}) is mapped to (x̄_{i',0}, ȳ_{i',0}) by the function G. As soon as the adversary discovers this, the rest follows pretty straightforwardly, because we applied chain encryptions with G and H swapped, and the rest is recovered. What happens is that the adversary collects known plaintext-ciphertext pairs that become the input to our four-round Feistel network attack. So we reduce the eight-round attack to the four-round known-plaintext attack. You may ask how the adversary gets this very first mapping. We used a very trivial approach: the adversary tries each pair, assumes that it gives the mapping under G, and continues with the four-round Feistel network attack. If that fails, it starts over with another pair. That is the idea, and it is probably why the complexity is a little higher; there may be neater ways for the adversary to come up with the first mapping of (x_{i,j}, y_{i,j}) to (x̄_{i',0}, ȳ_{i',0}). From a theoretical perspective, we want two segments of length B, where B is the chain length, taken from these two chains, to overlap on m points once their initial points are aligned. Why m? Because m is the data complexity of our four-round Feistel network attack: to apply it, we need exactly m points. The probability is given here, and we tuned the parameters to make it very high; we picked B and A based on this. The experimental results are as follows. Again, n is the domain size of the round function.
m is the query complexity of the four-round Feistel network attack with parameter ℓ. ℓ is set to 3 in the experiments, but in theory a larger ℓ gives a much better probability of success. A is the number of arbitrary plaintexts to which we apply chain encryption in the algorithm, and B is the length of the chain. So if you want to use FF3 in a very secure manner, you can go with encrypting maybe two or three bits; that's fine, because our attack couldn't recover anything for n = 2. But as n gets larger, the probability of success, meaning full recovery of all eight round functions, gets higher, getting close to 1 with a larger parameter ℓ. In the experiments, it was around 78%: 78% of the time, we were able to fully recover all the round functions. To conclude, maybe we need to work a little more on Feistel networks over such small domains; other techniques may work as well, we don't know. The good news is that even though FF3 suffers from bad domain separation, there is a very quick, simple patch: concatenate the tweak with the round index instead of XORing it in. Our attack doesn't work in that case. Thank you so much for your attention.

Thank you. OK, we have a little bit of time for questions, so I invite people from the audience to come to the mic if they have a question. While you're thinking of one, let me ask one briefly. You broke a NIST standard, or at least found new attacks against it. Can you explain a little the process you went through with NIST and how they reacted to your work? So we submitted our paper first, and then we emailed NIST about the results. For the first couple of months, they didn't reply.
That was probably because of my bad write-up: it was a little tough to figure out what was going on, because it was all theory, with no slides like these to explain briefly what was happening. Then they got back to us and said they needed to look at the standard again and prepare an announcement about it. They invited me to talk about it, and I presented the attack in more detail for around two and a half hours. Then they made an announcement. So is NIST planning a new version of the standard to introduce new techniques? Yes. OK, great. Any other questions? By the way, this is not only in the NIST standard; it appears in other standards too, and those are also being considered for revision. I have a question about what security notion we want for these very small domains. If we're encrypting two bits, we can't prevent someone from learning the whole table, so don't we want a tweakable notion where, across different tweaks, say different rows of the database, we can't tell what the functions are? That's a great point. The very first time, I wasn't even able to understand what security should mean, because with such a small domain, even with tweaks, it doesn't make much sense. But I really don't know the answer to that question; maybe we should talk to the people who badly want FPE. OK, thank you. Let's thank Betul again for a great talk. Thank you very much.