Welcome to this presentation. My name is Tim, and in this video I'll be talking about new attacks on format-preserving encryption standards using linear cryptanalysis.

Format-preserving encryption allows you to encrypt a plaintext with a given format, for example a six-digit integer, to a ciphertext which shares the same format. An example might be the encryption of credit card numbers, applied for instance to the six middle digits of the number. The reason for using a format-preserving encryption scheme in such cases is often that there's a legacy system that cannot handle, say, integers larger than six digits. Of course, the problem with this is that there aren't that many six-digit integers, so you're essentially doing encryption on a very small domain, and that would normally lead to simple codebook attacks. Because of this we need a tweak: an additional public input parameter that essentially allows you to diversify a block cipher. Every time you take a different tweak you get a different block cipher, and ideally these should look independent.

There are standards for format-preserving encryption: in the United States there are FF1 and FF3, which form a NIST standard, and in South Korea there's the FEA family of block ciphers. The attacks in this paper apply to FF3, FEA-1 and FEA-2. Before describing these attacks, I need to go a little bit into the details of how these block ciphers are constructed. Both are Feistel ciphers. In the case of FEA-1 it's a very classical Feistel cipher where both branches are m bits and the branches are combined using exclusive-or. In the case of FF3 this addition is instead an addition modulo an integer n, where n is such that n^2 is the domain size. The most important thing about both of these ciphers is that the tweak is split into two parts, which I'll call the left and right parts T_L and T_R, and these are used in an alternating way: for example, all the odd rounds use the left half of the tweak and all the even rounds the right half. This is the property that enables these attacks (a toy sketch of this structure follows below). One more thing to note is that the round functions F1, F2 and so on will, for the purposes of this talk, be considered to be uniform random functions. These are key-dependent functions; in practice they have a specific construction.

Just for context: there have of course already been a few attacks on format-preserving encryption schemes. The first class are generic attacks on r-round Feistel ciphers with a small domain of size n^2. Even at the time of the design of FF3, it was already known that you can distinguish such a cipher using about n^(r-4) data, and this was later extended to a message-recovery attack, which is of course a more practical threat than just a distinguisher. There has also been a lot of follow-up work to improve these attacks in various settings, such as multiple targets. A second class are dedicated attacks on FF3. These start with the work of Durak and Vaudenay from CRYPTO 2017, who showed that there's a flaw in the tweak schedule of FF3 which allows a kind of slide attack. This attack, too, has seen a lot of improvement, and the latest iteration of it will be presented at Eurocrypt this year.
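To make the structure concrete, here is a minimal toy sketch, not the real FF3 or FEA specification: an r-round Feistel cipher over a domain of size n^2 with additions modulo n and a tweak split into halves that alternate between rounds. The round function here is a stand-in keyed hash, purely for illustration.

```python
import hashlib

def round_function(key, tweak_half, x, n):
    """Stand-in for a key-dependent round function F_i (modeled as a
    uniform random function in the analysis); maps Z_n to Z_n."""
    data = f"{key}|{tweak_half}|{x}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def feistel_encrypt(key, tweak, plaintext, n, rounds=8):
    """Toy Feistel cipher over Z_n x Z_n with additions modulo n and a
    tweak whose halves alternate between rounds (FF3-like structure)."""
    t_left, t_right = tweak
    left, right = plaintext
    for i in range(rounds):
        # Odd-numbered rounds (i = 0, 2, ... here) use T_L, the others
        # use T_R: this alternation is what the attacks exploit.
        t = t_left if i % 2 == 0 else t_right
        left, right = right, (left + round_function((key, i), t, right, n)) % n
    return left, right

# Example: "encrypting" the middle six digits of a card number, n^2 = 10^6.
print(feistel_encrypt(key=42, tweak=(7, 13), plaintext=(123, 456), n=1000))
```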
NIST of course responded to these attacks: they modified the tweak schedule of FF3, which is why they revised the standard, now called FF3-1, and this modification essentially eliminates the flaw in the tweak schedule completely, so these dedicated attacks are no longer a concern. They also require that the domain size is at least one million. This doesn't prevent the generic attacks, but it does make them less practically relevant. The attacks in this work are generic in the sense that they apply to Feistel ciphers in general, but I will use the specific property that the tweak alternates, and the data complexity of these attacks will be about the square root of the data complexity of previous attacks. That's quite a big improvement, and in fact it will show that this minimum domain-size requirement of one million is no longer sufficient.

I will first give a high-level overview of the attack, which doesn't require any knowledge of linear cryptanalysis, and after that I'll go into more detail. The basic property, or one of the basic properties, that the attack relies on is a property of small uniform random functions. Suppose you sample uniformly at random a function from four bits to four bits, so from the set of all such functions, where each function has the same probability of being sampled, and then you look at the output distribution of that function: for each output, how many inputs map to that particular output. I went ahead and did that and created a histogram, and one thing you should note is that this output distribution doesn't look very uniform at all (a small experiment reproducing this is sketched below). If the function had been a large function, working on a large number of input bits, then its output distribution would have been pretty close to uniform, but for small functions it typically isn't, because there is a lot of variance. This is a very important observation that will be exploited in the attacks.

Another way of thinking about this, which is not so important for the high-level overview but is important for the linear cryptanalysis, is to think about the correlations of linear combinations of output bits of this function f. By a linear combination I mean the exclusive-or of several bits of f, as indicated by a mask u. If you then look at the correlation, which is twice the probability that this linear combination is zero, minus one, and so measures how biased the combination is, you see that several of these correlations are actually quite large, and that is what will be used in the linear cryptanalysis.

So the idea behind the attacks is that we can fix half of the tweak, say the left half. Then F1 is a pretty small function, because it only operates on half the domain, which is typically small for format-preserving encryption. Because of what I've just explained, the output of F1 will then be pretty non-uniform. If the plaintext is uniform random and the right tweak half is uniform random, then after two rounds one branch of the state equals the left half of the plaintext plus the output of F1, and so it must be non-uniform as well. Of course, for the actual attack we need to be able to iterate this over multiple rounds, and we also need to somehow quantify what we mean by non-uniform, and that's what we're going to use linear cryptanalysis for.
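For concreteness, here is a small experiment along those lines, a minimal sketch using nothing beyond the Python standard library: it samples a uniform random 4-bit-to-4-bit function, prints its output histogram, and computes the correlation of every output mask.

```python
import random
from collections import Counter

random.seed(1)
BITS = 4
SIZE = 1 << BITS  # 16 possible inputs/outputs

# Sample a uniform random function f: {0,1}^4 -> {0,1}^4.
f = [random.randrange(SIZE) for _ in range(SIZE)]

# Output distribution: for each output, how many inputs map to it.
histogram = Counter(f)
print([histogram.get(y, 0) for y in range(SIZE)])  # far from flat

def correlation(f, u):
    """Correlation of the linear combination of output bits selected by
    mask u: c = 2 * Pr[<u, f(x)> = 0] - 1, with x uniform."""
    zeros = sum(1 for x in range(SIZE) if bin(u & f[x]).count("1") % 2 == 0)
    return 2 * zeros / SIZE - 1

# Several masks have correlations far from zero for such a small function.
print({u: correlation(f, u) for u in range(1, SIZE)})
```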
From this point on I will assume that you have some basic knowledge of linear cryptanalysis, because it makes the detailed analysis easier to explain. I will focus on FEA-1; FF3 works essentially in the same way, but it uses additions in another group, and this requires a slight generalization of linear cryptanalysis.

The attack on FEA-1 is based on a simple two-round iterative trail, which mirrors the intuition I showed a few slides back; here I show it for a mask u. Now, the correlation of this trail: in the second round it's 1, because the second round is essentially inactive, and in the first round we don't know what the correlation is, because that depends on the key used in F1. But because T_L is fixed, we know that F1 is a small function that looks like a uniform random one, so we know the distribution of this correlation, and we know in fact that it is roughly normal with variance 1/n. We could compute the distribution of the correlation and conclude that the correlation is quite likely to be large. For simplicity, here I'll just work with the expected squared correlation, i.e. the variance of the correlation, which is 1/n for the first round and of course 1 for the second round. Over r rounds we then get an expected squared correlation of 1/n^(r/2). There is actually a simple improvement we can make here: if we fix the right half of the plaintext, then in the first round we essentially just add a constant to the left branch, so we can skip the first round, and this increases the expected squared correlation by a factor of n.

As for the data complexity: it is well known in linear cryptanalysis that it should be about one over the squared correlation of the approximation being used, but we cannot use that directly here, because the correlation isn't known; we only know the average squared correlation. As a heuristic we might just plug in the average, and then we get n^(r/2 - 1) as the data complexity (the sketch below turns these heuristics into a small calculator). You can of course do a better, more accurate calculation of the data complexity. Here I show a plot of the maximum advantage, that is, the maximum success probability minus false-positive rate, that can be achieved for a given amount of data. In the case of FEA-1, the zero on this graph corresponds to n^(r/2 - 1) = n^(6-1), so n^5 data. What you do see, however, is that even for a large amount of data we don't actually achieve an advantage of one, and this is because we might just have bad luck in choosing the mask: we just pick some mask, and it might happen that for some keys this mask doesn't have a large correlation. The only thing we know is that the correlation has quite a large variance, so there must be some keys for which it is large, but for some keys it might not be. Of course, we can use multiple masks to resolve this, and this will also improve the data complexity a bit. That's the idea behind multidimensional linear distinguishers: in a multidimensional linear distinguisher we simply use all the linear approximations we have here, that is, one for every mask u. I talk about multidimensional rather than multiple linear cryptanalysis here because the set of all these masks is a vector space, and this has a nice consequence for how the distinguisher can actually be implemented.
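To make the bookkeeping explicit, here is a minimal calculator for these heuristics; this is my own sketch, and the parameters in the example (12 rounds for FEA-1 and an 8-bit branch, so n = 2^8) are illustrative assumptions rather than values from the talk.

```python
def expected_squared_correlation(n, rounds, fix_right_plaintext_half=True):
    """Expected squared correlation of the two-round iterative trail:
    one active round per two rounds, each contributing a factor 1/n;
    fixing the right plaintext half skips one active round (a gain of n)."""
    active_rounds = rounds // 2 - (1 if fix_right_plaintext_half else 0)
    return n ** -active_rounds

def heuristic_data_complexity(n, rounds):
    """Heuristic: data ~ 1 / E[c^2], plugging in the average squared
    correlation for the unknown fixed-key squared correlation."""
    return 1 / expected_squared_correlation(n, rounds)

# FEA-1 has 12 rounds, so the heuristic gives n^(12/2 - 1) = n^5 data;
# with an illustrative branch size of 8 bits (n = 2^8) this is about 2^40.
print(heuristic_data_complexity(n=2 ** 8, rounds=12))
```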
So instead of estimating the absolute correlation, as in the simple linear attack, we can now estimate, for example, the sum of the squared correlations and compare this to some threshold for the distinguisher. The data complexity of this approach is about the square root of n divided by the sum of those squared correlations. Again, we don't know what the sum of the squared correlations is; we only know its average. Heuristically we might plug in the average, and then we get a data complexity of about n^(r/2 - 1.5), which is a factor of the square root of n better than the simple linear distinguisher. That square-root-of-n factor is there because we don't know the actual fixed-key correlations; if we knew them, we would be able to do better.

As I've mentioned, because the masks form a vector space, there's a nice equivalent way of describing this, and that's basically the chi-squared distinguisher. What we do here is, just like before, query the cipher with a fixed right plaintext half and a fixed left tweak half, and then create a histogram of the left half of the plaintext XORed with (or, in general, added to) the left half of the ciphertext. While we don't know what this distribution will look like, because that depends on the key and so on, we do know, because of the multidimensional linear distinguisher, that it must be non-uniform. In fact, the property we can use here is that the Euclidean distance between this distribution and the uniform distribution is equal to the square root of 1/n times the sum of the squared correlations. A typical method in statistics to test whether a distribution is uniform is based on Pearson's chi-squared statistic, which essentially computes this squared Euclidean distance. We know more or less the expected value of that statistic, so we can easily do a hypothesis test using the chi-squared statistic and the data; this is essentially the same distinguisher as the multidimensional linear distinguisher I described on the previous slide (a code sketch follows below).

Again, you can make a more accurate calculation of the data complexity and of the advantage reached for a given data complexity. This is an example for FF3-1 with the full number of rounds and the minimum domain size of one million, and the red dots here are experimental verifications. As you can see, we can now get to an advantage of one quite easily, because we're now using all the masks; this would also be true for smaller n.

Now that we have these distinguishers, it would be nice to turn them into a message-recovery attack, because that has a much larger practical impact. The goal of a message-recovery attack is of course to recover a secret message, and we'll do this given the ciphertexts of this message, or of related messages, under several tweaks; for the attacks here to apply, we need the left halves of those tweaks to be the same. In fact, we're not just going to get a single output saying "this is the secret message"; instead, we're going to get a ranking of candidate plaintexts.
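Here is a minimal sketch of that chi-squared distinguisher, assuming a hypothetical oracle `encrypt(pl, pr, tl, tr)` returning `(cl, cr)` over Z_n x Z_n; the threshold would come from the chi-squared tail calculation described above.

```python
import random

def chi_squared_distinguisher(encrypt, n, samples, threshold):
    """Distinguisher from the talk: with the right plaintext half and the
    left tweak half fixed, build a histogram of C_L - P_L (mod n) and
    apply Pearson's chi-squared test for uniformity.

    `encrypt(pl, pr, tl, tr)` is a hypothetical oracle returning (cl, cr).
    """
    PR_FIXED, TL_FIXED = 0, 0
    counts = [0] * n
    for _ in range(samples):
        pl = random.randrange(n)          # uniform left plaintext half
        tr = random.randrange(n)          # variable right tweak half
        cl, _ = encrypt(pl, PR_FIXED, TL_FIXED, tr)
        counts[(cl - pl) % n] += 1
    expected = samples / n
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    # Under the uniform hypothesis the statistic is roughly chi-squared
    # with n - 1 degrees of freedom; a large value flags the real cipher.
    return statistic > threshold
```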
For example, if you have a one-byte message that you're trying to recover, the attack works something like this: for every value of that one byte, a test statistic is computed, very similar to the chi-squared statistic, and the list of candidate plaintexts is sorted by this statistic; the top portion of that list then contains the most likely candidates for the plaintext. In practice you can discard most of this list, keeping only a fraction p_f, and with some probability of success the real message is in that fraction. We call the advantage of the message-recovery attack the difference between that success probability and p_f.

The attacks I will discuss on the next slides are left-half and right-half message-recovery attacks: they don't recover the whole message in one go, but recover half of it, and of course you can combine the two attacks to get the whole message. The left-half recovery attack works as follows. First, for a known plaintext with left half P_L, again with a fixed right plaintext half, a fixed left tweak half and a variable right tweak half, we estimate the distribution of the left half of the ciphertext. Then we do the same thing for the target message, that is, the secret message we're trying to recover, which can always be written as this known plaintext plus a certain unknown difference delta; again we estimate the distribution of the left half of its ciphertext. The crucial observation is that this distribution is approximately the same as the distribution of the left ciphertext half for the known message, except that it's translated by the difference delta. This isn't exactly true, but it is approximately true under the piling-up assumption, or equivalently under the assumption that the trail I've given is dominant. One way of thinking about it is that for every mask u, the correlation for the target message is the same as the correlation for the known message up to a sign, and that sign is (-1)^(u^T delta); this is what gives the shift by the difference delta. A second important observation that makes this attack work comes from the multidimensional linear distinguisher: because of it, we know that the ciphertext isn't uniformly distributed, and that means that from those estimated distributions we can actually recover the shift delta (a small sketch of this step follows below). If the distributions were uniform, you would of course not be able to uniquely identify delta.

This attack requires about n^(r/2 - 1.5) data, the same as the multidimensional linear distinguisher. Again, you can do a more accurate calculation of the data complexity, which more or less agrees with this, except, as you can see in this example for FF3-1, when the amount of data used approaches the dashed line. There seems to be a discrepancy there between the theoretical model and the experimental results, and this is because the vertical line is essentially the maximum amount of data that can be used, due to the short tweak length of FF3-1; the tweak was also shortened to prevent the earlier attack on FF3.
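The translation-recovery step can be illustrated with a small sketch; `known_hist` and `target_hist` stand for the two estimated distributions (hypothetical inputs), and the translation is taken modulo n as in the FF3 setting.

```python
def recover_shift(known_hist, target_hist):
    """Find delta such that target_hist is (approximately) known_hist
    translated by delta, by maximizing their cross-correlation."""
    n = len(known_hist)
    best_delta, best_score = 0, float("-inf")
    for delta in range(n):
        score = sum(known_hist[(x - delta) % n] * target_hist[x]
                    for x in range(n))
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta

# Toy check: a translated copy of a non-uniform histogram is recovered.
hist = [5, 1, 0, 2, 9, 3, 1, 3]
shifted = [hist[(x - 3) % 8] for x in range(8)]
assert recover_shift(hist, shifted) == 3
```

Note that this only works because the underlying distribution is non-uniform, exactly as argued above; for the XOR-based FEA ciphers the translation would be an XOR of the index rather than a modular shift.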
Because of that, you cannot use this attack for large values of n, say n larger than 2^12 or so, although this of course depends on the advantage you want to achieve, and it only applies when P_L must be fixed, which is not necessarily the case, actually.

For right-half recovery we don't have this issue, but right-half recovery requires a bit more data. The principle is the same as for left-half recovery, but the difference is that, rather than a difference of plaintexts directly, we recover a difference of outputs of F1 evaluated at a known plaintext and at the target plaintext. This doesn't give us the target directly, but if those outputs are equal, that is, if the difference we recover is zero, then very likely the known plaintext was equal to the target plaintext, and so we have recovered the target. You can do this for every possible known plaintext; the tweaks here are different, so it is a meaningful message recovery. It requires more data than left-half recovery, of course, but it has far fewer limitations on n, because now the left half of the plaintext is also variable, which gives a lot more data that can be used for the attack. Then, once again, you can compute more accurately the maximum advantage in terms of the data complexity, here again shown for FF3-1.

To conclude: I have presented new attacks on FF3-1 and FEA-1. These attacks require about the square root of the data complexity of previous attacks. In the paper I also give attacks on FEA-2, which require about the cube root of the data complexity of previous attacks, and I show how the message-recovery attack on FEA-1 can be turned into a key-recovery attack, which is of course an even larger concern in practice. The practical relevance of these attacks depends, of course, on exactly how you're using these ciphers. I do want to mention, though, that in many cases these attacks will be practical, or at least can be: for example, getting data with half of the tweak fixed is actually not that difficult, since a very common way of using a tweak is to put a counter in it, which automatically gives you this. I also want to mention that for the graphs I've shown, you don't necessarily want an advantage of one; very often a high false-positive rate is perfectly acceptable. For example, thinking back to the credit-card case: there is a checksum digit at the end, which essentially reduces the effective domain size, so you don't need a very low false-positive rate for the message recovery.

In general, I would say: be very careful when you're using these standards. You need to check every application, and I would actually avoid these ciphers until they are fixed, so prefer FF1 over FF3-1 and FEA. In particular, do not rely on the ciphertext looking indistinguishable from uniform random, and avoid the smallest domain sizes, because that's where the attacks are most practical. Of course, NIST already had this requirement of a minimum domain size of one million; with these attacks it's clear that this is no longer sufficient, and you'd essentially have to square that requirement if you don't fix the tweak schedule (a rough calculation follows below), so that's a real problem. As future work, I think it's worth looking at alternatives to Feistel-based format-preserving encryption schemes, because due to these, but also previous, attacks they become quite inefficient, and there should be easier ways to achieve the same functionality.
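As a rough sanity check of that last remark, here is a small back-of-the-envelope computation; this is my own sketch, assuming the FF3-1 round count r = 8 and the heuristic data complexity n^(r/2 - 1.5) quoted earlier in the talk.

```python
import math

def heuristic_attack_data(n, rounds):
    """Heuristic data complexity n^(r/2 - 1.5) of the multidimensional
    linear distinguisher and the left-half message-recovery attack."""
    return n ** (rounds / 2 - 1.5)

# FF3-1 at the minimum domain size n^2 = 10^6, i.e. n = 10^3:
print(f"n = 10^3: ~2^{math.log2(heuristic_attack_data(10 ** 3, 8)):.1f} data")
# Squaring the minimum domain size (n^2 = 10^12, i.e. n = 10^6):
print(f"n = 10^6: ~2^{math.log2(heuristic_attack_data(10 ** 6, 8)):.1f} data")
```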
If you want to test the attacks yourself, or compute exactly how much data you would need for your application, you can find the source code for everything, all the experiments and all the attacks in this paper, at the link shown here. And if you have any questions about that, or anything else related to the paper, please send me an email.