Okay, hi everyone, I'm Huck. I work at Google on storage encryption, and I'm going to be talking about wide-block ciphers and Linux. In storage encryption there are two primary challenges. The first is that the encryption modes we use are required to be length-preserving: the ciphertext has to be the same length as the plaintext. This is important because we want an encrypted disk to store the same amount of data as an unencrypted disk. Unfortunately, a length-preserving cipher cannot also provide authenticity, because there is no room left over for an authentication tag. So if we use a length-preserving cipher, we can't automatically detect when our disk has been tampered with.
The second challenge in storage encryption is that we can't store randomized initialization vectors on the disk, so instead we have to reuse other known data, such as the sector number or the logical block number, as the initialization vector. This causes problems because every time we write to a specific sector, it is encrypted with the same initialization vector. So if an attacker acquires snapshots of our encrypted disk at multiple different times, they get ciphertexts of the same sectors that were encrypted under the same initialization vector, and with many encryption modes this reduces the confidentiality that we have. For example, with AES-XTS, which is the common mode for disk encryption, if I flip a single bit in my disk sector it only corresponds to a 16-byte change in the ciphertext. So if I have two plaintexts that share 16-byte blocks at the same positions and I encrypt them with the same initialization vector, their ciphertexts are going to share those blocks as well.
And so this is a problem if we're encrypting disks and an attacker recovers two ciphertexts from the same sector whose plaintexts were related, say somebody changed a few bytes in that sector. The attacker can see that only a few bytes in that sector have been changed, and they can analyze access patterns and write patterns to the disk. For example, let's say I download this top image here to my disk, and then at some later time I edit the image to add this "hello" text. If an attacker takes a snapshot of my disk at both times, independently the snapshots look completely random, but if they XOR them together they can see this "hello" text. This would allow an attacker to, say, guess that this sector on my disk has a bitmap image in it and that I've edited the bitmap image to say hello. This attack only works because the attacker can see edits at a 16-byte granularity, so they can see fairly precisely where I've written to the disk.
A similar attack works in reverse, where an attacker can corrupt 16 bytes at a time on my disk: they can flip a bit in my encrypted disk, and when I decrypt it only 16 bytes of my disk are going to be corrupted. And if they acquire two snapshots of my disk, they can cut and paste 16-byte blocks between the copies to create a kind of Frankenstein file between the two versions of the encrypted disk, and when I decrypt it I get a mixed-up file. This is again a problem because we don't have authenticated encryption. We can't have authenticated encryption with length preservation, so we can't automatically detect that this has happened. We can only really detect it because our data now looks wrong.
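The 16-byte granularity problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use AES-XTS. Instead, `toy_block_encrypt` is a hypothetical 4-round Feistel cipher built from HMAC-SHA256, and `narrow_encrypt_sector` encrypts each 16-byte block independently with a (sector number, block index) tweak, which mimics the one property of XTS that matters here.

```python
import hashlib
import hmac

BLOCK = 16  # narrow block size in bytes


def _round(key, rnd, tweak, half):
    # Keyed round function (illustrative stand-in, not part of any real mode).
    return hmac.new(key, bytes([rnd]) + tweak + half, hashlib.sha256).digest()[:8]


def toy_block_encrypt(key, tweak, block):
    # 4-round Feistel network over one 16-byte block.
    left, right = block[:8], block[8:]
    for rnd in range(4):
        mask = _round(key, rnd, tweak, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, mask))
    return left + right


def narrow_encrypt_sector(key, sector_no, data):
    # Each 16-byte block is encrypted on its own; the tweak depends only on position.
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        tweak = sector_no.to_bytes(8, "little") + (i // BLOCK).to_bytes(4, "little")
        out += toy_block_encrypt(key, tweak, data[i:i + BLOCK])
    return bytes(out)


def changed_blocks(c1, c2):
    # What an attacker holding two snapshots of the same sector can compute.
    return [i // BLOCK for i in range(0, len(c1), BLOCK)
            if c1[i:i + BLOCK] != c2[i:i + BLOCK]]


key = b"k" * 32
before = bytes(4096)
after = bytearray(before)
after[100] ^= 1  # flip one bit inside block 6 (bytes 96..111)

snap1 = narrow_encrypt_sector(key, 7, before)
snap2 = narrow_encrypt_sector(key, 7, bytes(after))
print(changed_blocks(snap1, snap2))  # [6]
```

Only the block containing the edit differs between the two snapshots, which is exactly the information that lets the attacker trace the shape of the edit.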
So another similar attack on disk encryption: say we're running an SSH server, and we have an attacker that has physical access to our disk. Let's say they know where on our disk the SSH config file is. On this server we have password authentication disabled, and they want to turn password authentication on. They know which line corresponds to password authentication on our disk, and let's say they just flip a bit in that line. When we decrypt, that block is now randomized plaintext, and with a 1 in 256 chance its first character will be a pound sign. So with a 1 in 256 chance they will have corrupted that line and then commented out the corruption. So when we restart the service, the setting falls back to its default and password authentication will be allowed.
So how do we fix these problems? I already said that these problems are kind of impossible to fix without authenticated encryption, and we can't use authenticated encryption because we want length preservation. So what we can do instead is increase the granularity at which an attacker must work. Instead of allowing the attacker to work at a 16-byte granularity, we make them work at a 4096-byte granularity: we change the granularity from 16 bytes to the entire sector. To do this, what we want is called a wide-block cipher. Narrow-block modes like AES-XTS work on 16 bytes at a time, whereas with a wide-block cipher we want a cipher that works on 4096 bytes at a time. So if I flip a bit in the plaintext, the entire ciphertext is going to be randomized in a completely unpredictable way. And the reverse should also be true: if I flip a bit in the ciphertext, the entire plaintext should be unpredictably randomized. And if we do this, we have the property that IV reuse is now safe.
If I have two messages with the same IV, even if the messages match in all except one bit, the ciphertexts are going to look completely random, so an attacker can't correlate these ciphertexts at all. And we also get a kind of authentication guarantee, though not quite. We get that if an attacker modifies some bit on our disk, the entire sector is going to change, and the idea is that we are much more likely to detect that 4096 bytes of our disk are corrupted than that 16 bytes are corrupted. So if we try this attack again with a wide-block cipher, the attacker can really only see that those sectors have been modified. Since the granularity is so much larger, they can't tell that I've added the "hello" text there; they just see that the sectors have changed.
Now, I said that we wanted to use wide-block ciphers, but to be more specific, a cryptographer would call this a tweakable PRP, or tweakable pseudorandom permutation. Essentially it's a block cipher where the block size is the sector size. It accepts a key and a value that's similar to an initialization vector but can be variable-length, and that's called a tweak. You can see that this is kind of an ideal cipher for disk encryption: I just put in the block and the sector number, and I get out an encrypted block.
A similar problem exists for filename encryption in file system encryption. In modern file systems we use hash trees to do directory entry lookups, so essentially we hash the filename in order to look up the directory entry. If we're using an encrypted directory, we can't store a hash of the plaintext filename, so we need to use the encrypted filename to do this lookup instead. Ideally we would use a unique initialization vector for every filename, but unfortunately that would require us to first look up that initialization vector, then encrypt the filename, hash it, and look it up in the hash tree.
So we can't look up the initialization vector, because we haven't looked up the directory entry yet. What we do instead is use a per-directory initialization vector, where every filename in the directory uses the same initialization vector, and then we encrypt with that, hash it, and look it up in the hash tree. So let's say I had these two filenames in the same directory, with AES-CTS-CBC mode, which is what's commonly used for filename encryption. Since these filenames share a 16-byte prefix, their ciphertexts are also going to share a prefix, so an attacker can correlate that these files are related in some way, which they shouldn't be able to do.
If you notice, this is kind of the same problem that we had before with disk encryption, and we use a very similar construction to fix it. The main difference is that for filename encryption we have variable-length messages, whereas with disk encryption we had a fixed-length 4096-byte message. So for this we use what's called a tweakable SPRP, or tweakable super-pseudorandom permutation, and you can think of this as an infinite set of wide-block ciphers, or an infinite set of tweakable PRPs, one for each possible input length, and they're all mutually secure. So if I have an input of length 16 and I encrypt that, and then I append one byte and encrypt that 17-byte message, the 16-byte and 17-byte ciphertexts should look completely different, so they can't be correlated.
What you might notice is that if we have a tweakable SPRP, we can easily implement a tweakable PRP, so we easily get the disk encryption stuff if we have the filename encryption stuff. We just do this by fixing the input size: say I have a tweakable SPRP, I just fix the input size to 4096 bytes, and now I have a tweakable PRP on 4096 bytes.
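To make the "fix the input size" step concrete, here is a minimal sketch. It assumes a toy variable-length tweakable cipher, `toy_wide_encrypt`, built as a 4-round Feistel network over the whole message with SHAKE-256 as the round function. That construction, and every name in the block, is an illustrative assumption and is neither Adiantum nor HCTR2, and it is not meant to be secure for real use.

```python
import hashlib


def _prf(key, rnd, tweak, total_len, data, out_len):
    # Keyed, variable-output round function (SHAKE-256 is an illustrative choice).
    h = hashlib.shake_256()
    h.update(key + bytes([rnd]))
    h.update(len(tweak).to_bytes(4, "little") + tweak)
    h.update(total_len.to_bytes(4, "little") + data)
    return h.digest(out_len)


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_wide_encrypt(key, tweak, msg):
    # Length-preserving: works on any message of at least 2 bytes.
    n = len(msg)
    assert n >= 2
    left, right = msg[:n // 2], msg[n // 2:]
    for rnd in range(4):
        if rnd % 2 == 0:
            left = _xor(left, _prf(key, rnd, tweak, n, right, len(left)))
        else:
            right = _xor(right, _prf(key, rnd, tweak, n, left, len(right)))
    return left + right


def toy_sector_encrypt(key, sector_no, sector):
    # Fixing the length to 4096 turns the variable-length cipher into a sector cipher.
    assert len(sector) == 4096
    return toy_wide_encrypt(key, sector_no.to_bytes(8, "little"), sector)


key = b"k" * 32

# Disk case: a one-bit edit changes every 16-byte chunk of the ciphertext.
before = bytes(4096)
after = bytearray(before)
after[100] ^= 1
c1 = toy_sector_encrypt(key, 7, before)
c2 = toy_sector_encrypt(key, 7, bytes(after))
same_chunks = sum(c1[i:i + 16] == c2[i:i + 16] for i in range(0, 4096, 16))

# Filename case: same directory tweak, names sharing a long common prefix.
dir_tweak = b"dir-nonce-000001"
n1 = toy_wide_encrypt(key, dir_tweak, b"holiday_photos_2023_a.jpg")
n2 = toy_wide_encrypt(key, dir_tweak, b"holiday_photos_2023_b.jpg")
print(same_chunks, n1[:16] == n2[:16])
```

With a narrow-block mode the two snapshots would agree on all but one 16-byte chunk; here they are expected to agree on none, and the two encrypted filenames no longer share a prefix.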
Interestingly, we can also use a tweakable SPRP to make authenticated encryption. What we do is pad the input with some fixed number of zeros and encrypt normally, and then when we decrypt we just check whether those zeros are still all intact. If an attacker were to try to modify our ciphertext, it's going to randomize those zeros, so when we decrypt, those zeros will no longer be intact and we know that an attacker has modified our message in transit. We can also use this to make authenticated encryption with associated data, just by passing the associated data in with the tweak.
So the advantages that a tweakable SPRP has over something like AES-XTS mode: first of all, tweak reuse or IV reuse is safe, and they're less malleable, so an attacker has to work at a 4096-byte granularity rather than a 16-byte granularity. There's also the advantage that tweakable SPRPs are useful outside of disk encryption as well. For example, Tor is looking at using tweakable SPRPs, whereas XTS mode is strictly limited to disk encryption; if you're not familiar with disk encryption, you've probably never heard of it. And then tweakable SPRPs are also cryptographically cleaner. If you're a cryptographer, AES-XTS and AES-CTS-CBC mode aren't great, because they're encryption modes that were kind of modified at the last minute to work for disk encryption.
The disadvantages of using a tweakable SPRP: there's some amount of performance loss, since the extra diffusion from 16 bytes to 4096 bytes requires more work in our cipher, and in a lot of tweakable SPRPs this is somewhere between two and four times as much work. In the specific cipher that I'm going to be talking about later, it's around 1.7 times, which as far as I know is the best currently. The other disadvantage is that there are few fully specified algorithms, and even fewer implementations of actual algorithms.
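The zero-padding trick for authenticated encryption mentioned above can be sketched as follows. As before, this assumes a toy 4-round Feistel cipher over the whole message (`toy_wide_encrypt` / `toy_wide_decrypt`, built on SHAKE-256) as a stand-in for a real tweakable SPRP; the names `seal` and `unseal` and the choice of 16 zero bytes are hypothetical.

```python
import hashlib
import hmac

ZEROS = bytes(16)  # number of zero bytes appended; an illustrative choice


def _prf(key, rnd, tweak, total_len, data, out_len):
    # Keyed, variable-output round function (illustrative stand-in).
    h = hashlib.shake_256()
    h.update(key + bytes([rnd]))
    h.update(len(tweak).to_bytes(4, "little") + tweak)
    h.update(total_len.to_bytes(4, "little") + data)
    return h.digest(out_len)


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _feistel(key, tweak, data, rounds):
    n = len(data)
    left, right = data[:n // 2], data[n // 2:]
    for rnd in rounds:
        if rnd % 2 == 0:
            left = _xor(left, _prf(key, rnd, tweak, n, right, len(left)))
        else:
            right = _xor(right, _prf(key, rnd, tweak, n, left, len(right)))
    return left + right


def toy_wide_encrypt(key, tweak, msg):
    return _feistel(key, tweak, msg, range(4))


def toy_wide_decrypt(key, tweak, ct):
    # Each round is its own inverse, so decryption runs the rounds backwards.
    return _feistel(key, tweak, ct, reversed(range(4)))


def seal(key, associated_data, msg):
    # Append zeros, then encrypt with the associated data as the tweak.
    return toy_wide_encrypt(key, associated_data, msg + ZEROS)


def unseal(key, associated_data, ct):
    # Decrypt and accept only if the zero padding survived.
    pt = toy_wide_decrypt(key, associated_data, ct)
    if not hmac.compare_digest(pt[-len(ZEROS):], ZEROS):
        return None
    return pt[:-len(ZEROS)]


key = b"k" * 32
ct = seal(key, b"header", b"attack at dawn")
tampered = bytearray(ct)
tampered[0] ^= 1

print(unseal(key, b"header", ct))               # b'attack at dawn'
print(unseal(key, b"header", bytes(tampered)))  # None
print(unseal(key, b"other", ct))                # None
```

The ciphertext is 16 bytes longer than the message, so this gives up length preservation in exchange for detection; it fits messages in transit, not in-place sector encryption.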
As far as tweakable SPRPs in the Linux kernel go, there's already one in there. It's called Adiantum, and it is used for CPUs that do not have accelerated cryptography instructions. If you have accelerated cryptography instructions, AES-XTS would be much faster than Adiantum. There's another mode called AES-HCTR2, which is what I've been working on. It has recently been accepted, and it is intended for use on CPUs that do have accelerated cryptography instructions.
So here is AES-HCTR2. It uses two rounds of hashing and one round of encryption, and that is why it is somewhat slower than AES-XTS, because AES-XTS only has one round of encryption. The two rounds of hashing are what slow us down. The hash here is called POLYVAL; it's reused from AES-GCM-SIV. If you're familiar with AES-GCM, POLYVAL is basically GHASH but rewritten to be optimized for little-endian CPUs. And then the layer of encryption is a mode called XCTR. XCTR mode is very similar to AES counter mode, if you know what that is: it's a stream cipher, but it combines the nonce and the counter with XOR rather than addition, unlike standard counter mode, and it's also optimized for little-endian CPUs.
Something I should note about HCTR2 is that it's a construction, so it's not a cipher in the same way that AES is. You base the construction on an underlying block cipher. So let's say I used AES as my underlying block cipher: there's a mathematical proof that says if AES is secure, then HCTR2 with AES is also secure. And that proof uses an argument based on polynomial root counting.
In terms of performance, this is a graph with HCTR2 in blue and XTS in orange. The x-axis is the input length in bytes, and the y-axis is essentially the cipher speed, in cycles per byte. You can see that HCTR2 is slower than XTS mode, which is what we expect. This graph isn't super helpful; I think this one is much better.
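The hash, encrypt, hash shape described above can be sketched structurally in Python. This is not AES-HCTR2: the block cipher is a hypothetical HMAC-based Feistel standing in for AES, `toy_hash` is truncated HMAC-SHA256 standing in for POLYVAL, and the key derivation and the way the pieces are wired together are assumptions that only loosely follow the mode. What it does show is one block-cipher call on the first 16 bytes, an XCTR-style keystream (counter XORed into the nonce, little-endian, starting at 1) over the rest, and a hash before and after.

```python
import hashlib
import hmac

BLOCK = 16


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, rnd, half):
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]


def toy_block_encrypt(key, block):  # stand-in for AES encryption
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def toy_block_decrypt(key, block):  # inverse of toy_block_encrypt
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def toy_hash(hash_key, tweak, data):  # stand-in for POLYVAL
    msg = len(tweak).to_bytes(4, "little") + tweak + data
    return hmac.new(hash_key, msg, hashlib.sha256).digest()[:BLOCK]


def xctr_keystream(key, nonce, length):
    # XCTR-style: the counter is XORed into the nonce instead of added to it.
    out = bytearray()
    counter = 1
    while len(out) < length:
        out += toy_block_encrypt(key, _xor(nonce, counter.to_bytes(BLOCK, "little")))
        counter += 1
    return bytes(out[:length])


def _subkeys(key):
    # Derive a hash key and a mask from the block cipher (assumed derivation).
    zero = (0).to_bytes(BLOCK, "little")
    one = (1).to_bytes(BLOCK, "little")
    return toy_block_encrypt(key, zero), toy_block_encrypt(key, one)


def hctr2_like_encrypt(key, tweak, msg):
    assert len(msg) >= BLOCK
    hash_key, mask = _subkeys(key)
    first, rest = msg[:BLOCK], msg[BLOCK:]
    mm = _xor(first, toy_hash(hash_key, tweak, rest))      # first hash
    uu = toy_block_encrypt(key, mm)                        # one block-cipher call
    nonce = _xor(_xor(mm, uu), mask)
    v = _xor(rest, xctr_keystream(key, nonce, len(rest)))  # stream encryption
    u = _xor(uu, toy_hash(hash_key, tweak, v))             # second hash
    return u + v


def hctr2_like_decrypt(key, tweak, ct):
    hash_key, mask = _subkeys(key)
    u, v = ct[:BLOCK], ct[BLOCK:]
    uu = _xor(u, toy_hash(hash_key, tweak, v))
    mm = toy_block_decrypt(key, uu)
    nonce = _xor(_xor(mm, uu), mask)
    rest = _xor(v, xctr_keystream(key, nonce, len(v)))
    return _xor(mm, toy_hash(hash_key, tweak, rest)) + rest


key = b"k" * 32
tweak = (7).to_bytes(8, "little")
sector = bytes(range(256)) * 16
ct = hctr2_like_encrypt(key, tweak, sector)
print(len(ct) == len(sector), hctr2_like_decrypt(key, tweak, ct) == sector)  # True True
```

Every byte of the message passes through the hash twice and the keystream once, which matches the intuition given in the talk for why the mode costs more than a single pass of AES-XTS.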
This is the ratio of speed between HCTR2 and AES-XTS mode, and you can see that as the input length gets larger, the slowdown converges to around 1.7; that red line is 1.7 times as slow. What we want to use this for in the kernel is, first of all, filename encryption, where performance doesn't matter quite as much, and then also file contents or disk encryption, depending on what trade-offs you have between security and performance. If you're a very security-focused organization, maybe you want to use a tweakable SPRP. And that is it. Thank you for listening.
Yeah. What do you mean by the authenticated mode? Yeah. Yeah, so the performance would essentially just be as if you added some number of bytes onto the input length here. So if you wanted to do authenticated encryption of 4096 bytes, the speed of that would just be the speed of encrypting 4096 plus 16 bytes. You would just extend this graph out by 16 bytes. So it should be just as performant for the AEAD, right? So yeah. With AES-GCM the tag is 16 bytes, right? So here the tag would just be the 16 bytes off the end of the ciphertext, basically. Compared to AES-GCM mode, I don't know. I assume it's probably similar in speed. Actually, GCM is probably not similar in speed to XTS mode. HCTR2 is probably something like 1.3 times as slow as GCM, because GCM also does a GHASH. But yeah, the tag is essentially just, I don't know, you take the last few bytes off and use those as a tag. And you can choose how many zeros you want to add depending on how authenticated you want your encryption to be. The more zeros you have, the lower the probability that an adversary will be able to modify your message and have it still be accepted. So it's not fully authenticated. We want to use this for filename encryption currently.
Because filename encryption currently leaks some amount of data about your filenames if your filenames are of sufficient length, and we'd prefer that not to happen. I don't think we plan to actually use it for file contents or disk encryption on Android, but if other people want to use it for file contents or disk encryption, they could, I guess. Yeah, the performance slowdown is fairly significant, because the encryption overhead is something like 1.7 times as slow for your disk. Yeah, yep, oh nice, yep, yep, right, that's why we want to use it for filename encryption and not file contents encryption or full disk encryption. Because as far as I know, inline encryption does not work for the filenames. Sure. I mean, if there are no more questions then maybe I'll sit down.
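As a worked version of the arithmetic in the answer about authenticated encryption: the symbol $z$ (the number of zero bytes appended) is introduced here for illustration, and the estimate assumes that any tampered ciphertext decrypts to an essentially random plaintext, so each forgery attempt passes the zero check with probability about $2^{-8z}$.

```latex
\[
  \Pr[\text{tampered ciphertext accepted}] \approx 2^{-8z},
  \qquad
  z = 16 \;\Rightarrow\; 2^{-128},
  \qquad
  \frac{4096 + 16}{4096} \approx 1.004 .
\]
```

So with 16 zero bytes on a 4096-byte message, the extra input to encrypt is about 0.4 percent, while halving $z$ to 8 bytes would raise the per-attempt forgery probability to about $2^{-64}$.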