Hi, my name is Benoit, and I'm going to talk to you about CTET+, a beyond-birthday-bound secure tweakable enciphering scheme using a single pseudorandom permutation. This is joint work with Jordan, Virginie, Byeonghak, Jooyoung, and Marine. In this work, we're interested in the problem of full-disk encryption. In this context, we want to encrypt data that is divided into chunks, or sectors, whose size ranges from a few hundred bytes to a few kilobytes. The problem is that there is no room to store additional data, like a nonce, a random IV, or even an authentication tag. So we have to work around this limitation, and what we want to do is encrypt each sector independently. Of course, having a different key for each sector is very impractical, so a good solution is to use a tweakable block cipher. Tweakable block ciphers are a keyed primitive that generalizes standard block ciphers by taking an additional input called a tweak, T, whose goal is to bring variability to the block cipher. We can assume that T is public, or even controlled by the adversary, without decreasing the security of the tweakable block cipher, and we want a secure tweakable block cipher to give us a new independent permutation every time we pick a new tweak. We do have several examples of natively tweakable block ciphers, like the Deoxys block cipher. However, for our problem these are not sufficient, because existing ones use a small block size, typically 128 bits, which is not enough to encrypt a whole sector as a single block.
The solution is to use a tweakable mode of operation on top of a smaller primitive, like a block cipher or a tweakable block cipher with a small block size. In that case, the simplest construction uses a tweakable block cipher: to encrypt each block of a sector, M1, M2, M3, you just use as a tweak the concatenation of the sector number, i, and the position of the block in the sector, so 1, 2, 3. This simple construction does have some advantages. First, it actually provides independent encryption between sectors. It is also very fast: every block in a sector is encrypted independently from the others, so any block can be decrypted with a single tweakable block cipher call, which is very efficient. However, it gives the adversary very fine granularity. For example, assume that an adversary is able to monitor the changes in the ciphertexts. If only a single block is updated, for example M1, then only a single ciphertext block will change, which allows the adversary to monitor very closely what kind of changes were made to the plaintexts, and this can be bad from a privacy point of view. The current standard for disk encryption, AES-XTS, uses this construction combined with the XEX transformation applied to AES. Note that XEX is secure up to the birthday bound, which means that the construction is secure as long as the number of encrypted blocks is small compared to 2^64. AES-XTS uses two AES keys, K1 and K2. If you want to encrypt the j-th block in the i-th sector, you first encrypt i using the key K2, and you get a mask that you multiply by alpha to the power j, where alpha is a primitive element of GF(2^n). This new mask is XORed to the plaintext block Mj, then you give the result as input to AES with the key K1, you XOR the mask a second time, and you get the ciphertext block Cj.
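To make the XEX-style masking concrete, here is a minimal Python sketch. It rests on several assumptions that are not from the talk: AES is replaced by a toy 4-round Feistel permutation built from SHA-256, only so that the example runs with the standard library; blocks are plain Python integers; alpha is taken to be the element x modulo x^128 + x^7 + x^2 + x + 1; and the names toy_encrypt, toy_decrypt, xex_encrypt_sector and xex_decrypt_sector are hypothetical. It is not interoperable with real AES-XTS and must not be used to protect data.

```python
import hashlib

N = 128
POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1 (assumed reduction polynomial)
HALF_MASK = (1 << 64) - 1


def gf_double(x):
    """Multiply x by alpha (the element 'x') in GF(2^128)."""
    x <<= 1
    if x >> N:
        x ^= POLY
    return x


def _round(key, r, half):
    """Feistel round function: 64 bits derived from SHA-256 (toy stand-in)."""
    data = key + bytes([r]) + half.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def toy_encrypt(key, block):
    """Toy 128-bit permutation standing in for AES (NOT secure)."""
    left, right = block >> 64, block & HALF_MASK
    for r in range(4):
        left, right = right, left ^ _round(key, r, right)
    return (left << 64) | right


def toy_decrypt(key, block):
    """Inverse of toy_encrypt: undo the Feistel rounds in reverse order."""
    left, right = block >> 64, block & HALF_MASK
    for r in reversed(range(4)):
        left, right = right ^ _round(key, r, left), left
    return (left << 64) | right


def xex_encrypt_sector(k1, k2, sector_index, blocks):
    """C_j = E_K1(M_j xor mask_j) xor mask_j, with mask_j = alpha^j * E_K2(i)."""
    mask = toy_encrypt(k2, sector_index)
    out = []
    for m in blocks:
        mask = gf_double(mask)  # next power of alpha times E_K2(i)
        out.append(toy_encrypt(k1, m ^ mask) ^ mask)
    return out


def xex_decrypt_sector(k1, k2, sector_index, blocks):
    """Same masks as encryption, with the block cipher inverted."""
    mask = toy_encrypt(k2, sector_index)
    out = []
    for c in blocks:
        mask = gf_double(mask)
        out.append(toy_decrypt(k1, c ^ mask) ^ mask)
    return out


if __name__ == "__main__":
    k1, k2 = b"key-one", b"key-two"
    before = xex_encrypt_sector(k1, k2, 7, [1, 2, 3, 4])
    after = xex_encrypt_sector(k1, k2, 7, [99, 2, 3, 4])
    # Shows which ciphertext blocks changed after updating only M1.
    print([a != b for a, b in zip(before, after)])
```

Running the demo at the bottom shows the granularity problem directly: updating only the first plaintext block changes only the first ciphertext block, so an observer learns exactly which block was modified.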
This construction actually has two issues. The first one is the fine granularity issue that I mentioned on the previous slide, and the second one is that big data centers most likely hold more than 2^50 bytes, which is very close to the birthday bound of 2^64. The solution to the granularity issue is to use wide tweakable block ciphers, which take whole sectors as input blocks, even if they are based on a tweakable block cipher or a block cipher with a small block size. With such an algorithm, if you have, for example, a single-bit change in M1, then all the ciphertext blocks C1, C2, C3 and so on will be affected, unlike before, where only C1 would have changed. Several such constructions already exist, and they can be divided into three families: Encrypt-Mix-Encrypt, Hash-Encrypt-Hash, and Hash-Counter-Hash. Roughly, for Hash-Encrypt-Hash and Hash-Counter-Hash, you first give your input sector to a keyed hash function that also takes the tweak as input. You get a new sector that is encrypted either with the ECB mode of operation or with the counter mode of operation, and this new sector is given to a second hash layer that also takes the tweak as input, and you get the corresponding encrypted sector. The Encrypt-Mix-Encrypt family works in a similar way, but you first encrypt the sector with the ECB mode of operation. Then you have a linear mixing layer M that also takes the tweak as input, followed by another ECB encryption layer. All those constructions require either two AES calls, or one AES call and two field multiplications, per block. They also have in common the fact that they are secure up to 2^64 queries, that is, up to the birthday bound. These constructions all solve the granularity issue, but they do not solve the birthday bound issue.
If we want to get security beyond the birthday bound, we actually have to increase the number of layers, and the two-round SPN has already been used as a tweakable domain extender for block ciphers. It can be seen as an example of the Hash-Encrypt-Hash-Encrypt-Hash paradigm, and it has already been proven secure up to 2^(2n/3) queries, as long as T and T^-1 satisfy some properties: namely, we want them to be almost super blockwise universal and uniform, which we shorten to SBU. In more detail, this means that the probability, over the random choice of the key K, that any given output block of T equals some predetermined value u should be close to 2^-n. Similarly, the probability, over the random choice of the key, of a collision between any two output blocks of T should be close to 2^-n. We already know an example of an efficient linear tweakable SBU layer, which is the following one. In this example, the layer T uses two 128-bit keys K and K', one 128-bit tweak T, and one sector X of w times 128 bits that is divided into w blocks, X1, X2, up to Xw. In order to evaluate the layer on the tweak T and the sector X, you take the scalar product of the vector of growing powers of the key, K, K^2, up to K^w, with the vector of the values Xi XOR T. This scalar product is XORed to all the blocks X1, X2, up to Xw. Then we also XOR in another value, which depends on K' and is multiplied by a power of alpha, a primitive element of GF(2^n), and this gives us the output value of T. Note that this construction is invertible as long as the XOR of the powers of K, XORed with 1, is different from 0. As long as this is the case, the probability of collision is smaller than (w + 1) / (2^n - w), so very close to 2^-n. This construction is linear, and it requires roughly one finite field multiplication per block, which is actually quite efficient.
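The layer just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation: the output is taken to be Y_j = X_j XOR S XOR alpha^j * K', with S the scalar product of (K, K^2, ..., K^w) and (X_1 XOR T, ..., X_w XOR T); the field is GF(2^128) modulo x^128 + x^7 + x^2 + x + 1 with alpha taken to be the element x; blocks are Python integers; and the function names are hypothetical. The exact indexing and encoding in the paper may differ.

```python
N = 128
POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1 (assumed reduction polynomial)


def gf_mul(a, b):
    """Carry-less multiplication of a and b, reduced modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^128 - 2)."""
    result, e = 1, (1 << N) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def key_powers(k, w):
    """Return [K, K^2, ..., K^w]."""
    out, p = [], 1
    for _ in range(w):
        p = gf_mul(p, k)
        out.append(p)
    return out


def t_layer(k, k_prime, tweak, blocks):
    """Y_j = X_j xor S xor alpha^j * K', with S = sum_i K^i * (X_i xor T)."""
    s = 0
    for p, x in zip(key_powers(k, len(blocks)), blocks):
        s ^= gf_mul(p, x ^ tweak)
    out, mask = [], k_prime
    for x in blocks:
        mask = gf_mul(mask, 2)  # next power of alpha times K'
        out.append(x ^ s ^ mask)
    return out


def t_layer_inv(k, k_prime, tweak, blocks):
    """Invert t_layer; requires 1 xor K xor K^2 xor ... xor K^w != 0."""
    powers = key_powers(k, len(blocks))
    d = 1
    for p in powers:
        d ^= p
    if d == 0:
        raise ValueError("layer is not invertible for this key")
    unmasked, mask = [], k_prime
    for y in blocks:
        mask = gf_mul(mask, 2)
        unmasked.append(y ^ mask)  # equals X_j xor S
    acc = 0
    for p, u in zip(powers, unmasked):
        acc ^= gf_mul(p, u ^ tweak)  # the sum equals S * d
    s = gf_mul(acc, gf_inv(d))
    return [u ^ s for u in unmasked]
```

The inverse makes the invertibility condition visible: after removing the K' masks, the same scalar product applied to the masked blocks yields S multiplied by 1 XOR K XOR ... XOR K^w, so S can be recovered exactly when that value is nonzero.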
Our first contribution in this work is to introduce CTET+, which is an optimization of the two-round SPN that I presented earlier. We reuse the same permutation in both non-linear layers of the construction. We also remark that it's possible to use a much more efficient linear middle layer instead of using T for the three linear layers. Namely, we show that in the middle linear layer, T can be replaced by a much simpler linear layer L, which doesn't rely on a keyed matrix multiplication, and that the resulting construction is still secure up to 2^(2n/3) queries as long as both T and T^-1 are SBU. I won't go into the details of the security proof, but as usual, it relies on the H-coefficients technique, and basically it boils down to comparing the interpolation probabilities between the construction and a uniformly random tweakable permutation. Assuming independent S-boxes, the strategy works in two steps. First we show that, with high probability, no collision occurs, both in the inputs of the first non-linear layer S1 and in the outputs of the second non-linear layer S2. Under this condition, we simply lower bound the number of possible intermediate values for the outputs of S1 and the inputs of S2, and show that this number is sufficiently high to make the interpolation probabilities close enough. Since we are in the single-permutation case, where S1 is equal to S2, it's actually possible that an input to the first S-box layer is equal to an input to the second S-box layer. So we have to take care of these internal collisions when computing our interpolation probability, which makes the computation slightly more complicated than in the independent S-box case. If you want more details on the proof, you can have a look at the full paper. Our second contribution is the design and implementation of a concrete instance of CTET+.
With this in mind, we can identify three different strategies to get our instance. The first one is to simply use the full AES-128 with all 10 rounds. In that case, we can easily claim 85 bits of security by using a simple security reduction and the generic security bound. A more aggressive option would be to use a round-reduced version of AES that has roughly the same security as the final construction, so that would be roughly seven rounds of AES. The third option would be to directly prove the security of the full construction using a round-reduced version of AES with strictly fewer than seven rounds. We actually chose to follow the third strategy and introduce our actual instance, AES6-CTET+, which combines the CTET+ construction with a round-reduced AES with only six rounds. Overall, our construction takes as input a 128-bit tweak and a wide block of w 128-bit words, and relies on six 128-bit keys: two for each T linear layer, one for the middle linear layer L, and one for the AES S-box. Our security analysis allows us to claim 127 bits of security for AES6-CTET+. We rely on two complementary arguments to justify our claims. First, we have the generic security proof, which justifies the fact that the generic structure of the instance is sound, and it proves that the construction will resist generic attacks with very high probability. Second, we provide dedicated cryptanalysis to justify why our instance is secure even when the S-box uses only six rounds of AES. As far as cryptanalysis is concerned, we identify two main possible attack directions. The first one is to use a weakness of the S-box and to extend it to the full construction, and the second one is to directly attack the structure of our instance. The first strategy is difficult to apply for two distinct reasons. First, the AES has very strong arguments against basic attack vectors.
For example, even four rounds of AES give significant security against differential, linear and algebraic attacks, so extending these basic properties cannot lead to a break of our instance. Moreover, even if there exist attacks that are able to break six rounds of AES, those attacks actually require the ability for the adversary to set either the input or the output of the AES block cipher, which is made difficult by our first and last linear layers, as we are going to argue. Indeed, given that the key K'0 is simply XORed to each block of a sector, it is easy to see that setting a specific value for s bits of the input of an S-box requires guessing the corresponding bits of K'0. Moreover, thanks to the matrix multiplication, the only way to set a specific difference at the input of an S-box with probability 1 is to use the same input difference for all the S-boxes. This means that if an adversary wants to exploit different input patterns that have a high probability of going through to the next round, it will first have to guess the key K0. These two facts either make attacks against the AES S-boxes much more expensive by requiring a key guess, or they greatly restrict the different strategies that an adversary can use in choosing state differences. All in all, this prevents the extension of attacks against the AES S-boxes to the whole construction. Let us now take a look at two structural attacks, namely yoyo attacks and truncated differential attacks. Let us start our discussion with yoyo attacks. Indeed, when we remove the outer linear layers, our construction boils down to a two-round SPN, which can be distinguished in four queries with the yoyo attack. So, how does this attack work? We start by choosing two different plaintexts, P0 and P1, with a specific zero-difference pattern. For example, only the first blocks of P0 and P1 will differ, and all the remaining blocks will be equal. Then we ask for the corresponding ciphertexts.
Then we swap some words between the two ciphertexts C0 and C1 to get two new ciphertexts, C~0 and C~1. Finally, we ask for the corresponding plaintexts P~0 and P~1. P~0 and P~1 will automatically have the same zero-difference pattern as P0 and P1. This can be seen as follows. The only difference between C0, C1 and C~0, C~1 is the swap of some full words, which means that after applying S^-1 to both C0, C1 and C~0, C~1, we will still have the same relationship as before applying the S-box layer. Similarly, thanks to the linearity of L, applying L^-1 will not break the zero-difference pattern shared by C~0, C~1 and C0, C1. This same relationship will still be preserved after an application of the S-box layer. This means that P0, P1 and P~0, P~1 will still have the same zero-difference pattern. This argument can easily be extended to a version with one more linear layer L. Let us see what happens when we try to apply this technique to our construction. First, we have to build a pair of messages with a specific zero difference at the input of the first AES S-box layer. As we have seen earlier, this requires a guess on the key of the first linear layer. Besides, when we want to swap some ciphertext words, this also requires a guess on the key of the final linear layer. All in all, the fact that the two outer linear layers use secret keys actually saves us from yoyo attacks. Let us now quickly discuss what happens with truncated differential attacks. First, let us remove the first linear layer of our construction. Let us pick two plaintexts, the first one starting with X1, X1, and the second one starting with X1 XOR delta, X1 XOR delta, for some delta different from zero, and such that the remaining words are the same in both messages.
When applying the first AES S-box layer, only the first two S-boxes will be active, and the difference propagates to a new one, delta prime, that will be common to the first two words. After applying the linear layer L, we will still have the exact same difference pattern, albeit with a new difference, capital Delta, which means that once again, only the first two AES S-boxes will be active. After the final application of T, we can see that we'll have the same difference on the last w - 2 128-bit words. This allows us to distinguish our construction from a random permutation with high probability, because this pattern occurs with probability one in the case of our construction and with negligible probability for a random permutation. However, this attack requires the ability to set a very specific difference pattern at the input of the first AES S-box layer, which, as we have seen, is prevented by the first linear layer because it requires a key guess. So once again, the linear layer protects us against truncated differential distinguishers. As a conclusion, I would like to quickly compare our construction with existing ones and say a bit about our implementation. As far as generic security bounds are concerned, only two constructions are beyond-birthday-bound secure: our construction CTET+, and the tweakable two-round SPN construction that I mentioned during the talk. We have implemented both constructions using six rounds of AES as the S-box, and we can see that CTET+ is roughly 0.4 cycles per byte faster than the two-round SPN. This improvement mainly comes from the fact that we replaced the middle linear layer by a much simpler one. If we instantiate CTET+ with the full AES, we can see that the resulting construction, although beyond-birthday-bound secure, is much slower than birthday-bound secure schemes such as EME, XCB or TET.
However, replacing the full AES by six rounds of AES in CTET+ allows us to be roughly as fast as EME, XCB and TET. Finally, if we compare AES6-CTET+ with XTS, we can see that XTS is roughly twice as fast as our construction. However, XTS is only secure up to the birthday bound, and it is also insecure as a wide tweakable block cipher that uses whole sectors as blocks. Finally, let me highlight a few remaining open problems. First, we currently do not handle variable input lengths and partial blocks. While this is not an issue for disk encryption, doing so could open up new applications for our scheme. Second, it may be possible to improve the performance or to reduce the amount of key material of CTET+ by switching to a more efficient or simpler SBU layer. Thank you very much for your attention. If you have any questions or comments, feel free to contact us by email.