So today we'll talk about attacks on Format-Preserving Encryption, or FPE. FPE is what you use in practice to encrypt credit card numbers or fields in legacy databases. In those legacy systems, the schema mandates a certain format on the data. So if you use an ordinary encryption scheme to encrypt your data, you would destroy the format and disrupt the legacy system. In contrast, under FPE, the ciphertext has the same format as the plaintext. For example, if you encrypt a credit card number, the ciphertext will also look like a credit card number. Thus, by retaining the format of the data, FPE avoids disrupting legacy systems.
So definitionally, FPE is just a tweakable block cipher with a general message space Dom. In particular, an FPE scheme takes as input a key and a tweak to map a message in the domain Dom to a ciphertext within the same domain. This mapping is deterministic, meaning that under the same key and tweak, if you keep encrypting the same message, you end up with the same ciphertext.
So let me elaborate on why there's a need for tweaks, via an application. Suppose that in a database, one table stores the customer names together with their credit card numbers. Another table stores the transaction numbers together with the corresponding credit card numbers. If you combine these two tables, you can realize that John Doe made transactions 1 and 2. And even if you FPE-encrypt the credit card numbers in these two tables under the same key and tweak, the linking can still be done, because FPE is deterministic. In contrast, if you FPE-encrypt them with different tweaks, then you can still realize that transactions 1 and 2 were made with the same credit card number, but there's no way to link that back to John Doe anymore. So by using tweaks, you increase the security of FPE.
Now, a particularly annoying challenge in designing FPE is that, unlike a traditional block cipher, the domain here can be very small. In fact, there are applications in which the domain size is just 100.
And this curse repeatedly leads to attacks that have exponential running time but are still practical on small domains. We will see those attacks in a couple of minutes.
But before I get into that, let me briefly recall some of the real-world FPE schemes. The most important ones are the NIST standards, FF1 and FF3, both based on Feistel networks. FF3 is currently suspended due to a recent attack by Durak and Vaudenay. But because the fix is pretty cheap, it seems likely to get reinstated. There are several companies offering FPE products, such as Voltage or Verifone, and most of them use either the NIST standards or some close Feistel variants. However, there are still non-Feistel FPE solutions from industry. For example, Protegrity uses a scheme that they call DTP and claims better security than the NIST standards. DTP largely follows an ad hoc construction from the 90s. Cisco also proposed a scheme called FNR, but to the best of our knowledge, it is not in use. FNR is based on the Naor-Reingold variant of Feistel.
There have been so far two complementary attacks on FPE schemes. Both of them focus on the NIST standards. The most recent one is by Durak and Vaudenay. By exploiting a bug in the round functions of FF3, they managed to recover the entire codebook. This prompted NIST to temporarily suspend FF3. However, this attack is not applicable to a generic Feistel structure, and it is easy to fix without hurting performance, by restricting the tweak space. In a different direction, in my prior paper with Mihir Bellare and Stefano Tessaro, we show a message-recovery attack on generic Feistel, which applies to both FF1 and FF3. This attack can recover just a single target message. But it shows an inherent weakness in the NIST standards, meaning that on small domains, you need more rounds to be secure. The two attacks I just showed are so expensive that they are only practical on small domains.
Moreover, it's still quite questionable whether it's practical to deploy them. For example, in the DV attack on FF3, the adversary needs to make several adaptive queries to an encryption oracle. But in practice, it's kind of hard to mount adaptive attacks. Moreover, they need several chosen messages per tweak. But many companies prefer to encrypt not so many messages per tweak, to increase the perceived security. And by doing so, they unwittingly defeat the DV attack. On the other hand, while the BHT attack is non-adaptive and needs only three messages per tweak, it requires a very strong correlation between the known messages and the target. In particular, a known message must have the same right half as the target. But it's unclear how to enforce that in practice. The ideas of these attacks don't seem applicable to FNR, although it's just another generalized Feistel structure, which makes FNR an interesting alternative to the NIST standards.
However, today I will show an attack on the Feistel structure that applies to both FF1 and FF3. And our idea actually can be recast to break FNR as well. While our attack is somewhat similar to the BHT attack, we need no correlation between the known messages and the targets. And by reusing the known messages, we can recover multiple targets instead of just a single one. In addition, we need only a moderate number of known messages per tweak. So it seems that our attack is quite deployable. As with the prior attacks, however, our attack is still quite expensive, so it is only practical on small domains. Here is a graphical illustration of the cost of our attack on FF1 and FF3, where cost is measured by the number of ciphertexts per target.
Our attack also highlights an interesting difference between the designs of FF1 and FF3. Both of these schemes prefer to use balanced Feistel whenever possible. So ideally, we want the left half and the right half to have the same size.
But if the message length is odd, one half has to be bigger than the other. In that case, for FF1, the right half is bigger than the left, and FF3 chooses to go in the opposite direction. At first glance, this kind of design choice appears innocuous. It shouldn't harm security. But under our attack, it appears that FF3's design choice is inferior. In particular, on all domains for FF1, our attack has roughly the same cost as the BHT attack. But for FF3, our attack is considerably better than BHT.
So far, as you've seen in the previous slides, we have known-plaintext attacks on FF1 and FF3 that are practical on small domains. For the DTP scheme of Protegrity, we can do even much better. We can even launch a ciphertext-only attack that is practical on any domain. In particular, if you want to recover an encrypted credit card number, you need roughly 600,000 ciphertexts to do that. In reality, it's even better, because Protegrity prefers to interpret a credit card number as a sequence of alphanumeric characters. By doing so, the goal is to enlarge the domain to make attacks more expensive. That would be true if you used FF1 or FF3. But for DTP, it only makes our attack 10 times better. In particular, you now need only 53,000 ciphertexts to recover a credit card number.
Now, I don't have time to talk about all the attacks, so I will just discuss the FF1 and FF3 attacks. But I will be happy to take offline questions about the other ones.
So recall that our attack is a known-plaintext attack, meaning that we are given some random known messages, X1 to Xt, together with their ciphertexts under q tweaks, T1 to Tq. Our goal here is to recover all the unknown targets, Z1 to Zp, given just their ciphertexts under the same tweaks as before. But because FPE is deterministic, to rule out trivial attacks, I will assume that the known messages and the targets are distinct.
So before I get into the details of the attack, let me briefly recall a couple of things about Feistel networks. In this picture, it's a four-round Feistel. But we will consider a general r-round Feistel, where r is 10 for FF1 and 8 for FF3. The domain here is a product Z_M times Z_N, where M and N can be pretty small. Remember that there are applications where the product MN is just 100. And because the domain is non-binary, instead of a regular XOR, we consider a general group operator plus.
Now, the key idea in our attack is that when you encrypt using Feistel, it exhibits a certain bias. This dates back to a paper by Patarin in 1991, and was also exploited in the BHT attack. In particular, suppose that we encrypt two distinct messages with the same right half. Let's call them (L0, R0) and (L'0, R0). Now, let's take the left halves of the ciphertexts and study their difference. It turns out that this distribution peaks at the point L0 minus L'0. The gap between the peak column and the other columns is too small, just a very small number delta, to exploit directly. But if you have enough plaintext-ciphertext pairs, you can amplify that.
Let me now show you how to use the bias to recover a target Z. Suppose that by magic, I can select a known message X such that it has the same right half as the target Z. Of course, this is just wishful thinking, but I will show you how to realize this missing step later. And because it is a known-plaintext attack, we are given the ciphertexts as usual. So if we are in this situation, we can recover the right half of the target trivially by looking at the right half of the known message X. To recover the left half, we do some frequency analysis: across the tweaks, we plot a frequency histogram of the difference between the left halves of the two ciphertexts.
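The Feistel structure and the bias just described can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the round function `round_fn` is a hypothetical SHA-256-based stand-in (not the AES-based round functions of FF1 or FF3), the key and tweaks are made-up values, and the network uses 4 rounds on Z_10 times Z_10 so that the peak is visible with a few thousand tweaks. With the 8 or 10 rounds of the real schemes the gap is far smaller.

```python
import hashlib
from collections import Counter

def round_fn(key, tweak, i, x, modulus):
    """Hypothetical round function: hash (key, tweak, round, input) into Z_modulus."""
    digest = hashlib.sha256(f"{key}|{tweak}|{i}|{x}".encode()).digest()
    return int.from_bytes(digest, "big") % modulus

def feistel_encrypt(key, tweak, left, right, M, N, rounds=4):
    """Toy alternating Feistel on Z_M x Z_N; the group operation is modular addition."""
    for i in range(1, rounds + 1):
        if i % 2 == 1:   # odd rounds update the left half
            left = (left + round_fn(key, tweak, i, right, M)) % M
        else:            # even rounds update the right half
            right = (right + round_fn(key, tweak, i, left, N)) % N
    return left, right

def feistel_decrypt(key, tweak, left, right, M, N, rounds=4):
    """Undo the rounds in reverse order, subtracting instead of adding."""
    for i in range(rounds, 0, -1):
        if i % 2 == 1:
            left = (left - round_fn(key, tweak, i, right, M)) % M
        else:
            right = (right - round_fn(key, tweak, i, left, N)) % N
    return left, right

def left_difference_histogram(key, msg_a, msg_b, tweaks, M, N, rounds=4):
    """Histogram of (left half of Enc(msg_a)) - (left half of Enc(msg_b)) over the tweaks."""
    hist = Counter()
    for t in tweaks:
        ca, _ = feistel_encrypt(key, t, *msg_a, M, N, rounds)
        cb, _ = feistel_encrypt(key, t, *msg_b, M, N, rounds)
        hist[(ca - cb) % M] += 1
    return hist

M = N = 10
tweaks = range(3000)
x = (3, 7)   # known message
z = (8, 7)   # target with the same right half as x
hist = left_difference_histogram("demo-key", z, x, tweaks, M, N)
peak = max(hist, key=hist.get)
recovered_left = (peak + x[0]) % M   # peak sits at left(z) - left(x)
print(sorted(hist.items()), "recovered left half:", recovered_left)
```

In this 4-round toy, the peak column is hit with probability about 1/N + (1 - 1/N)/M, roughly 0.19 here, against roughly 0.09 for each other column, which is why 3000 tweaks are more than enough.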
And thanks to the distribution that I showed you a couple of minutes earlier, if you use enough tweaks, it's likely that there is only one peak column, at the point given by the left half of Z minus the left half of X. In particular, there's a certain threshold, based on the bias delta and q, such that it is the only column exceeding that threshold. So by looking at the peak column of the histogram, you can recover the left half of the target.
Now this is nice, but we still have to fill in the missing step. To fill in the gap, we narrow down the set of known messages by selecting messages Y1 to YN such that their right halves cover the entire Z_N. Because they cover everything, some Yi must have the same right half as the target, but we don't know which one. But before we pinpoint the correct one, let's take a step back and look at what we just did. The selection is not always possible, because the known messages X1 to Xt are random. So on the one hand, we need t to be small for efficiency. On the other hand, t needs to be big enough so that the selection is possible with high probability. So you need to pick a sweet spot here.
It turns out that this is just the well-known coupon collector problem with N types of coupons. You go buy t coupons at random and you hope to collect all N types with high probability. In the classical setting, the coupons have truly random types, so you are recommended to buy about N log N coupons. But in our setting, the coupons are the right halves of the known messages. Because these known messages are distinct, each time you buy a coupon, it is slightly biased toward the new types that you don't have yet. So the number of needed coupons is slightly smaller.
So now remember that we are given N messages, Y1 to YN, and exactly one of them has the same right half as the target Z. But now we need to pinpoint that particular known message.
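To get a feel for the coupon collector remark, here is a small Python sketch comparing the classical setting with the setting of distinct known messages. It computes expected numbers of draws, which is only an indicative proxy for the "with high probability" requirement in the talk. The closed form in `expected_draws_distinct` is a standard inclusion-exclusion computation and is not taken from the talk; the function names, the seed, and the choice M = N = 10 are assumptions for illustration.

```python
import random
from fractions import Fraction
from math import comb

def expected_draws_iid(N):
    """Classical coupon collector: N * H_N draws on average."""
    return N * sum(Fraction(1, k) for k in range(1, N + 1))

def expected_draws_distinct(M, N):
    """Expected draws to see all N right halves when distinct messages are drawn
    uniformly without replacement from Z_M x Z_N (inclusion-exclusion over the
    first time any message from a set of j right halves shows up)."""
    total = M * N
    return (total + 1) * sum(
        Fraction((-1) ** (j + 1) * comb(N, j), j * M + 1) for j in range(1, N + 1)
    )

def simulate_distinct(M, N, trials, rng):
    """Monte Carlo check: shuffle all messages, count draws until every right half appears."""
    total_draws = 0
    for _ in range(trials):
        msgs = list(range(M * N))          # message m encodes (left, right) = divmod(m, N)
        rng.shuffle(msgs)
        seen = set()
        for t, m in enumerate(msgs, start=1):
            seen.add(m % N)                # right half of the message
            if len(seen) == N:
                total_draws += t
                break
    return total_draws / trials

M = N = 10
iid = float(expected_draws_iid(N))
distinct = float(expected_draws_distinct(M, N))
simulated = simulate_distinct(M, N, trials=2000, rng=random.Random(1))
print(f"classical: {iid:.2f}  distinct (exact): {distinct:.2f}  distinct (simulated): {simulated:.2f}")
```

On this tiny domain the distinct-message setting needs noticeably fewer draws on average than the classical one; as M grows with N fixed, the two expectations converge.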
In order to do that, for each Yk, we plot the frequency histogram as before. If Yk happens to have the same right half as Z, then as before, there's only one column beyond the threshold. In contrast, if Yk doesn't have the same right half as Z, then it's likely that no column surpasses the threshold. So by looking at the histograms, you can pinpoint the correct known message with the same right half as the target.
Now this attack is only possible if the number of tweaks q is big enough. But how big is big enough? Here we have a lower bound for the recovery rate if you use q tweaks and want to recover p targets. And here's an illustration of that bound for FF1 and FF3, where you want to recover the entire codebook. We actually ran some experiments on FF3, and the empirical results are even better than the theoretical analysis. Even with values of q quite a bit smaller than suggested, we get a 100% recovery rate.
Now let's look at what we've just done. We are given some random known messages, and then we show how to recover all targets. But in reality, known messages might not be uniformly distributed, and we still want our attack to work in that case. Of course, if the distribution is not nice, it might not be possible to recover all targets. But we want to recover as many targets as possible. In fact, we can recover every target Zi as long as that target has a right half covered by some known message. In order to do that, we first trim the set of known messages down to Y1 to Ys with distinct right halves, such that the set of right halves remains the same as before. Then if you want to recover a target Zi, for every Yk, you use a frequency histogram to check if Yk has the same right half as Zi. If yes, you recover Zi as before. If you cannot find such a Yk, you simply announce that you failed to recover it.
So in summary, today I've shown you some practical attacks on several FPE schemes.
To deal with them, if you happen to use FF1 or FF3 on tiny domains, you should use double encryption as suggested by ANSI. And for the DTP scheme, it is completely broken, so you should avoid it at all costs. Thank you.
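As a recap of the recovery procedure described in the talk (trim the known messages to one per right half, build a histogram per candidate, accept the candidate whose peak exceeds a threshold), here is a minimal Python sketch on a toy cipher. Everything concrete in it is an assumption for illustration: the SHA-256-based `round_fn`, the 4-round Feistel on Z_10 times Z_10, the key and tweaks, and in particular the heuristic threshold of 1.5 q / M, which stands in for the threshold based on delta and q mentioned in the talk and is not the bound from the analysis.

```python
import hashlib
from collections import Counter

def round_fn(key, tweak, i, x, modulus):
    """Hypothetical round function, not the one used by FF1 or FF3."""
    digest = hashlib.sha256(f"{key}|{tweak}|{i}|{x}".encode()).digest()
    return int.from_bytes(digest, "big") % modulus

def encrypt(key, tweak, msg, M, N, rounds=4):
    """Toy alternating Feistel on Z_M x Z_N with modular addition."""
    left, right = msg
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            left = (left + round_fn(key, tweak, i, right, M)) % M
        else:
            right = (right + round_fn(key, tweak, i, left, N)) % N
    return left, right

def recover_target(target_cts, known, M, threshold):
    """target_cts: the target's ciphertexts, one per tweak.
    known: dict mapping each known message to its ciphertexts under the same tweaks.
    Returns the recovered target, or None if no known message covers its right half."""
    reps = {}
    for msg in known:                      # keep one known message per right half
        reps.setdefault(msg[1], msg)
    for y in reps.values():
        hist = Counter((cz[0] - cy[0]) % M for cz, cy in zip(target_cts, known[y]))
        peak, count = hist.most_common(1)[0]
        if count > threshold:              # only a matching right half should get here
            return ((peak + y[0]) % M, y[1])
    return None

M = N = 10
key, tweaks = "demo-key", range(4000)
target = (8, 7)
known_msgs = [((3 * r + 1) % M, r) for r in range(N)]   # covers every right half
known = {m: [encrypt(key, t, m, M, N) for t in tweaks] for m in known_msgs}
target_cts = [encrypt(key, t, target, M, N) for t in tweaks]
threshold = 1.5 * len(tweaks) / M                       # heuristic, toy-specific
recovered = recover_target(target_cts, known, M, threshold)
uncovered = {m: c for m, c in known.items() if m[1] != target[1]}
print("recovered:", recovered, "without coverage:", recover_target(target_cts, uncovered, M, threshold))
```

The second call drops the one known message that shares the target's right half, which illustrates the failure case: no histogram crosses the threshold and the procedure reports that it could not recover the target.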