Hello, everyone. My name is Betül Durak from Microsoft Research, and today I'm going to talk about a construction of format-preserving encryption that we named FAST. It is joint work with Henning Horst and Michael Horst from comforte in Germany and Serge Vaudenay from EPFL. Format-preserving encryption, or FPE for short, is essentially a deterministic block cipher. It encrypts a message from a domain D to the same domain, and the domain is defined by the format. On this slide I'm showing you an example with four digits. The format is defined by the length of the message, in this case four, and by an alphabet, in this case the set of digits, so it is Z10. But there are other formats that we may observe in real databases, such as social security numbers or credit card numbers, or even smaller or somewhat larger domains than this. The general idea is that when we would like to lift the security of a database by encrypting it, we would like to keep the format of the encrypted data for transparency reasons. When the encryption keeps the format, applications do not need to be rewritten and no schema changes are required when the security is lifted. One way to think about format-preserving encryption is to use a traditional block cipher like AES, because AES is strong. AES has been analyzed for a long time, it has been made very fast, and it has everything we want. But the problem is keeping the format in the ciphertext: if we want to keep the format, we need to truncate the ciphertext, and then we cannot decrypt it. Another idea is to use counter mode, because in counter mode we never need to invert the block cipher, so we can truncate as we want. But then encryption and decryption have to stay in sync and maintain a set of counters, which is not so pleasing.
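To make the truncation problem concrete, here is a minimal sketch. It is not AES: a keyed SHA-256 hash stands in for "encrypt with a block cipher, then truncate to four digits", and the function name and key are made up for illustration. The only point is that a truncated output is no longer a permutation of the four-digit domain, so it cannot be decrypted.

```python
import hashlib


def truncated_cipher_stand_in(key: bytes, digits: str) -> str:
    """Map a 4-digit string to 4 digits by truncating a keyed hash.

    Assumption: SHA-256 is only a stand-in for a block cipher followed by
    truncation; the real cipher is irrelevant to the collision argument.
    """
    digest = hashlib.sha256(key + digits.encode("ascii")).digest()
    return f"{int.from_bytes(digest[:8], 'big') % 10_000:04d}"


key = b"demo key"
outputs = {truncated_cipher_stand_in(key, f"{i:04d}") for i in range(10_000)}

# The format is preserved (every output is four digits), but distinct inputs
# collide, so the mapping is not invertible.
print(len(outputs), "distinct outputs for 10000 inputs")
```

An FPE scheme has to be a permutation of the domain itself, which is what the constructions discussed next provide.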
Instead, a long line of research has proposed many different FPE constructions. The most notable ones are from NIST and are now standards; they are called FF1 and FF3, and they are based on Feistel networks. As you may remember, a Feistel network consists of several rounds, and in the specific case of FF1 and FF3 the round functions are defined with AES. Since the number of rounds is quite small for both standards, they are practical, and their security has already been analyzed by many cryptographers. In our work we designed a new format-preserving encryption scheme with a different taste. It still consists of rounds, but the rounds are not defined with an AES-based round function; instead they are defined with several layers of permutation boxes, or S-boxes, as we will detail in a short while. The secret key is used in the secure setup phase, where we generate a pool of S-boxes that will be used in those layers, and these S-boxes are generated with AES. Our other contribution is that we analyze the security of such a construction. We formally prove that security in a strong model reduces to security in a weak model, which allowed us to cryptanalyze the scheme only in the weak model, with several different attacks that we instantiated. Finally, we adjusted the parameters of the construction, such as the number of layers and the number of rounds, with a security margin to provide good security. Before getting into the details of our construction, let me say a few general things about FPE.
FPEs are generally tweakable. Because our domains are so small, there is a high probability that two different records in the database, corresponding to two different users, have the same plaintext, and they would then encrypt to the same ciphertext, which leaks information. Therefore we introduce a tweak, and this tweak is very important for domain separation. On top of that, although I don't show it here, the FPE takes the format as part of the input as well, and it is desirable to have it in the input. Here I would like to show you two different styles of format-preserving encryption. These are only toy examples; I didn't draw the entire FF1 or FF3, nor our construction. They are just for illustrating the differences. On the left we have a Feistel-based FPE. The message is divided into two halves, these two halves enter the two branches, and there is a round function; each round takes as input a key and a tweak, and the tweak changes for each round. On the other hand we have a substitution network, where we have a plaintext of length l, each character of the message goes into a register, and the registers enter the substitution boxes one by one. It's essentially a substitution network. In the Feistel-based constructions the rounds are defined by AES with the secret key and the tweak. In our case there are no secret keys and no AES computations in the rounds, only substitution boxes in several layers, and AES is used in the secure setup for key derivation. The parameters that I will need for the rest of this talk are as follows.
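The Feistel style on the left can be sketched as a toy. This is neither FF1 nor FF3: the round function uses HMAC-SHA256 as a stand-in for AES, the round count of 8 is arbitrary, and all names are hypothetical. It only illustrates the shape: two halves, a keyed and tweaked round function, modular addition, and a ciphertext that stays in the same digit format.

```python
import hashlib
import hmac


def round_value(key: bytes, tweak: bytes, rnd: int, value: int, modulus: int) -> int:
    """Toy round function: keyed, tweaked, round-dependent (HMAC stands in for AES)."""
    msg = tweak + bytes([rnd]) + value.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def feistel_encrypt(key: bytes, tweak: bytes, digits: str, rounds: int = 8) -> str:
    k = len(digits) // 2          # assumes an even number of digits
    modulus = 10 ** k
    left, right = int(digits[:k]), int(digits[k:])
    for rnd in range(rounds):
        # (L, R) -> (R, L + F(R) mod 10^k)
        left, right = right, (left + round_value(key, tweak, rnd, right, modulus)) % modulus
    return f"{left:0{k}d}{right:0{k}d}"


def feistel_decrypt(key: bytes, tweak: bytes, digits: str, rounds: int = 8) -> str:
    k = len(digits) // 2
    modulus = 10 ** k
    left, right = int(digits[:k]), int(digits[k:])
    for rnd in reversed(range(rounds)):
        # Undo one round: (L', R') -> (R' - F(L') mod 10^k, L')
        left, right = (right - round_value(key, tweak, rnd, left, modulus)) % modulus, left
    return f"{left:0{k}d}{right:0{k}d}"


key = b"toy master key"
ct_user1 = feistel_encrypt(key, b"user-1", "1234")
ct_user2 = feistel_encrypt(key, b"user-2", "1234")
print(ct_user1, ct_user2)  # same plaintext, separate tweaks
print(feistel_decrypt(key, b"user-1", ct_user1))
```

Note the modular addition modulo 10 to the power of half the message length; this is the operation that forces big-integer arithmetic for long messages in Feistel-based designs.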
The format, as I already mentioned, is defined by the size of the alphabet, a, and the length of the messages, l. We can imagine that the alphabet is Z_a, the set of integers from 0 to a - 1. We also have another parameter called m, which is the total number of S-boxes to choose from. The S-box pool contains m S-boxes, and in practice we can typically think of m as 256, because we need the index of each of these S-boxes when we use it in the layers, and it is nice if every index fits in one byte. So this is what m is. r is the number of rounds and n is the total number of layers. Each round consists of the same number of layers, and that number is the message length l, so n is equal to l times r. What is important here is how to pick r for security, which we analyze in the paper with cryptanalysis results. One instance that I can show you here has l equal to 4, meaning that the message has length 4, with three rounds and 12 layers. One example of a round is this: it consists of four layers because the length of the message is 4, and here is one layer, which is essentially only the round operation and the substitution box that we show here. This probably explains better what we mean by layer and round. Let me get back to the setup phase, where we use the secret key. The master secret key is used for key derivation, and this key derivation works with two different sets of parameters. We separated these parameters because we wanted some flexibility, as I will describe shortly. In the first set of parameters we have only the size of the alphabet, a, and m, the number of S-boxes in the pool that we generate. In the second set of parameters we have l, n, w1 and w2. I will not explain w1 and w2 just yet.
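The bookkeeping between these parameters is simple enough to write down. The sketch below is only an illustration; the class and method names are hypothetical and not taken from the FAST code. It records a, m, l and r, derives n = l * r, and checks the one-byte index property for m = 256.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastParams:
    """Hypothetical container for the parameters named in the talk."""
    a: int  # alphabet size, messages are over Z_a = {0, ..., a - 1}
    m: int  # number of S-boxes in the pool
    l: int  # message length
    r: int  # number of rounds

    @property
    def n(self) -> int:
        # Each round has l layers, so the total number of layers is l * r.
        return self.l * self.r

    @property
    def index_bytes(self) -> int:
        # Bytes needed to store one S-box index in the range 0 .. m - 1.
        return max(1, ((self.m - 1).bit_length() + 7) // 8)

    def round_of_layer(self, layer: int) -> int:
        # Layers are numbered from 0; consecutive blocks of l layers form a round.
        return layer // self.l


toy = FastParams(a=10, m=256, l=4, r=3)
print(toy.n, toy.index_bytes)
print([toy.round_of_layer(j) for j in range(toy.n)])
```

With the instance from the slide (l = 4, r = 3) this gives the 12 layers mentioned above, grouped four per round.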
I will describe them with a picture in the following slides. For now, just assume that those are two different numbers, two parameters. What happens is that we derive a key K_S from the master secret key K and instance 1, and only instance 1, together with a constant. This means that K_S is derived differently whenever instance 1 changes. Whenever the alphabet size changes it has to be recomputed, but as long as there is no change in the size of the pool and the size of the alphabet, it does not need to be computed again. K_SEQ is generated from both instance 1 and instance 2 as well as the tweak. Let me explain a little more where they are used. The derived key K_S is used to generate the pool of permutations, and K_SEQ is used to generate the index sequence, that is, which S-box indices are used for each layer. That means that when the tweak changes, or the length of the message changes in instance 2, only the K_SEQ part, only the index sequence, needs to be regenerated. Other than that, we don't need to keep creating the pool of S-boxes from scratch as long as there is no change in a and m. This was the property that we wanted, therefore I put the thumbs up here, and this is essentially our way of doing the domain separation.
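The split between the two derived keys can be sketched as follows. Everything here is an assumption made for illustration: HMAC-SHA256 stands in for the AES-based PRF, a SHA-256 counter stream stands in for the AES counter-mode PRNG, and the labels, encodings and function names are invented. The pool is filled with a Fisher-Yates shuffle, the method the talk names next. The sketch is not the FAST specification; it only shows which inputs each derived value depends on.

```python
import hashlib
import hmac


def prf(key: bytes, *parts: bytes) -> bytes:
    """Stand-in PRF: HMAC-SHA256 over length-prefixed parts."""
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def ints(*values: int) -> bytes:
    return b"".join(v.to_bytes(4, "big") for v in values)


def derive_k_s(master: bytes, a: int, m: int) -> bytes:
    # Depends on instance 1 only: (a, m).
    return prf(master, b"pool", ints(a, m))


def derive_k_seq(master: bytes, a: int, m: int, l: int, n: int,
                 w1: int, w2: int, tweak: bytes) -> bytes:
    # Depends on instance 1, instance 2 and the tweak.
    return prf(master, b"seq", ints(a, m), ints(l, n, w1, w2), tweak)


class ByteStream:
    """Deterministic byte stream (SHA-256 in counter mode as a PRNG stand-in)."""

    def __init__(self, key: bytes) -> None:
        self._key, self._counter, self._buf = key, 0, b""

    def byte(self) -> int:
        if not self._buf:
            self._buf = hashlib.sha256(self._key + self._counter.to_bytes(8, "big")).digest()
            self._counter += 1
        value, self._buf = self._buf[0], self._buf[1:]
        return value

    def below(self, bound: int) -> int:
        # Uniform integer in [0, bound) for 1 <= bound <= 256, by rejection sampling.
        limit = 256 - (256 % bound)
        while True:
            value = self.byte()
            if value < limit:
                return value % bound


def make_pool(k_s: bytes, a: int, m: int) -> list:
    stream = ByteStream(k_s)
    pool = []
    for _ in range(m):
        perm = list(range(a))
        for i in range(a - 1, 0, -1):      # Fisher-Yates shuffle
            j = stream.below(i + 1)
            perm[i], perm[j] = perm[j], perm[i]
        pool.append(perm)
    return pool


def make_index_sequence(k_seq: bytes, m: int, n: int) -> list:
    stream = ByteStream(k_seq)
    return [stream.below(m) for _ in range(n)]


master = bytes(16)
pool = make_pool(derive_k_s(master, 10, 256), 10, 256)   # reusable while (a, m) are fixed
seq_t1 = make_index_sequence(derive_k_seq(master, 10, 256, 4, 12, 1, 2, b"tweak-1"), 256, 12)
seq_t2 = make_index_sequence(derive_k_seq(master, 10, 256, 4, 12, 1, 2, b"tweak-2"), 256, 12)
```

In this sketch a new tweak only costs a new 12-entry index sequence; the pool of 256 permutations is computed once and reused.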
In a bit more detail, in the S-box generation we have the input secret key K_S, and K_S is used as the AES key for a PRNG. This PRNG is instantiated with AES in counter mode; it's parallelizable, which is nice. The permutations are generated with Fisher-Yates shuffling, which essentially uses some sort of rejection sampling and is simple enough. The S-box index sequence generation again takes an input secret key, K_SEQ, and uses a PRNG as well as some domain parameters, and in the implementation we again instantiate this PRNG with AES in counter mode. But these two PRNGs, in the S-box generation and in the index sequence generation, could be completely different; it's up to the implementer. We chose AES in counter mode because it is parallelizable. As for the core design, I will first show you one round with four layers. There are four layers because we have four substitution boxes here. I have been saying this since the beginning, but let me correct myself and be more precise: it is not one S-box that I'm using in each layer, it is the same S-box used twice. There are two arrows, as you can see here. One enters what looks like the round operation, but it is essentially the input to the first S-box, S1, and the second one enters the S-box again, the second application of S1. These blue and orange arrows need to come from some registers, and we want to define the distance from these registers to the S-boxes. In the first case, the distance for the input to the first S-box is called w1, and in this example it is one. The distance for the second S-box is called w2; it is shown with an orange arrow here, and it is two. These parameters w1 and w2 are the same for all layers, as you can see here, except
that sometimes they rotate, because there is nothing on the left side of the X1 register, so it wraps around. It also keeps shifting: it starts from X4, but then it comes to X1, X2, X3 and so on. So this is the design for one round at a high level. If you wonder why we define it this way, it would be difficult for me to explain the reasoning in a few sentences. What I can say is that we did not start our design with two S-boxes like this one. Instead we defined something more like this, without w1 and w2, and we wanted the diffusion to be sufficient for both encryption and decryption. Unfortunately, when we designed it with only one S-box and in a very simple manner, it required a large number of rounds for diffusion to happen. For encryption everything was fine, but for decryption it was difficult. Therefore we tried several different versions of this construction, and we are convinced that this one resists our attacks and makes the number of rounds smaller and more practical, yet secure enough. This is the high-level intuition I can give you. For three rounds with four layers, we again have the same round repeated three times, except that the S-boxes used in the first round will not be the same in the second round; they will be different, and they are picked with the AES computation in the index sequence generation. As for security, let me start with the assumption. We assume that the adversary has access to the S-box pool: all the S-boxes are known to the adversary. It's a reasonable assumption because the pool may leak through side channels. We did not want to rely on there being no such leakage; instead we wanted to be safe and say that the pool can be leaked. Then we define the strong security model with
multi-target, chosen-format, tweakable PRP security. The idea here is that there is an adversary playing with encryption and decryption machines, and each of these machines has a different secret key, which is why it is called multi-target. The adversary can also change the format or the tweak when querying these oracles. In the weak security notion the adversary again plays with an encryption and decryption oracle, but only for one key, and the adversary must fix the format and the tweak once for all queries made to this oracle. These are the notions that we define, and we formally prove in the paper that there is a reduction from the strong security notion to the weak one. A bit more formally, there is a tau here, which is the number of different keys in the multi-target game, and q is the number of queries that are made. What we say is that there are four adversaries, A, B, C and D. If the PRF, PRNG1 and PRNG2 are secure, and if the advantages of B, C and D are negligible in the games that they play, then the strong security reduces to the weak security. It means that if someone can give an attack in the strong model, then we can construct an attack in the weak model. This is the high-level idea of our proof, and, as I said earlier, it led us to focus only on the weak model. We cryptanalyzed the construction in this weak model, meaning with the key, the tweak and the format fixed, and then, according to our cryptanalysis results, we doubled the number of rounds we found, as a security margin. A little more detail, not too much: we designed many different types of attacks, like distinguishers, differential or linear attacks and so on. All the details
are in the paper, and you're welcome to try even more or to improve those attacks if you want; that would be nice for us. Essentially the outcome of all those attacks is that we were able to define what w1 and w2 should be. Intuitively, when we first started discussing the design, we would have guessed that w1 and w2 would naturally be something like l over 2, half of the message length, or l over 3, but we would not have guessed that it would be the square root of l. w1 and w2 can still be set to l over 2, but then the construction asymptotically requires on the order of l squared layers to be secure. If we pick w1 and w2 to be around the square root of l, then the total number of layers can asymptotically be picked as l to the 3/2, and that is enough. Again this is asymptotic: there is a constant term that we multiply by, and when the parameters are small it does not show well, but when the parameters get larger it becomes more visible that we need n on the order of l to the 3/2. Here I show you one selection of parameters for the number of rounds. We have different alphabet sizes and different message lengths: l goes from 2 to 100 and a from 4 to 16, but you can find more parameters in the paper. To give an example, when we encrypt the full 16 digits of a credit card number, l is 16 because the message length is 16, and the alphabet size is 10 because these are digits, and we need 37 rounds of our construction. We also ran experiments with our construction and compared it with the existing FF1 and FF3 for both a constant tweak and a changing tweak. The performance of FF1 and FF3 does not change if the tweak changes, but, as I explained in the setup phase, if the tweak changes we need to
generate the sequence of indices again, and that affects the performance. Therefore we tested both, for an alphabet size equal to 10, and we observed that in all cases our construction FAST is faster than both FF1 and FF3. Maybe what is important here is something that is probably known to many people who implement these designs: beyond a certain message length, say 32 characters, there is a big jump in cost for FF1 and FF3. In our case no such blow-up happens. What happens with FF1 and FF3 is due to the modular reduction in the round functions. If you remember, the round operations are modular operations and the message is divided in half, so the reduction is modulo a to the l over 2. As l becomes larger and larger, there is a significant performance problem with the big-integer libraries that are used to implement them, and that creates a problem for both FF1 and FF3. FF3 essentially does not recommend encrypting messages longer than, I believe, 54 or 56 characters; we just abused it and encrypted up to 64, but FF3 does not actually support larger messages. This concludes my talk. Thank you very much for your attention and for listening. We would like to welcome every cryptographer, everyone who is working on format-preserving encryption and is open to different designs, to cryptanalyze our construction. The code is provided at this GitHub link, which we also linked in our paper. Please join us to cryptanalyze it further.