Good morning everyone. I thank the organizers for the invitation, and I thank Nigel for putting so much hope in the speakers and for placing me first. I'm going to present some work on the key management service of AWS, and this is joint work with Matt Campagna, who is somewhere here, happy that he is not on stage being grilled. So here is the outline: I'm going to describe KMS, the service, then describe the limitation of a naive use of AES-GCM at cloud scale, then what mode of operation is being used in order to address this problem, and we're going to go through some security bounds of the system.

So what is KMS? Amazon's KMS is a web-based service. First of all, it provides you, the user, the customer, a simple interface to generate a key, to rotate it, manage it, and to send encryption and decryption queries, so that you can encrypt files and decrypt files and even allow others to decrypt files per your permission. There is a notion of a customer master key, which is the root of trust of everything from the viewpoint of the customer, and it is managed and protected in hardware, whereas the user is using this key only implicitly, through an API, with a request to encrypt or rotate or generate the key, but actually never needs to see the key or to own the key, at least in the default configuration.

So here is an outline of how things happen. You have some access control policy. As a customer of this service, you can dispatch a create-key request, and this will invoke a sequence: first of all, everything needs to be authenticated, to know that you have the right to make this request, and then a new customer master key, a CMK, is generated and stored. Everything is done inside an HSM, so the key never leaves it. What goes out is the key ID. After you have the key ID, next time you can invoke requests with this key ID. So you have a file here. You want to encrypt it.
You need to log into the system, then send an encryption request with your key ID and the plaintext, and eventually what happens in the system is: first, do you have the right permissions and credentials; then the system will retrieve the customer master key, the CMK, through the key ID, do the encryption, and send back the ciphertext. Everything here happens within the boundaries of the secure facility and the HSMs.

Here is just a small demonstration of how a sequence of calls would go. User A: generate a key, and generate a data key for this session with this key ID; then you can encrypt the file, delete it; a user B can say, retrieve the data key that was used to encrypt this file, if I have the permissions, then request the decryption, and so forth. So basically the underlying premise is that the customer master key never leaves the premises of the secure environment, but through the access control, the owner of this customer master key can use it.

All right, a few items here. CMKs are stored encrypted and only decrypted on the HSMs. They cannot leave the HSM. There is an exception. That is the default; if you wish, you can deposit your own key and enjoy the same security promise, of course with the exception that you cannot make the statement that the CMK never left the security boundary. It has left; it was created outside. But once keys are there, they never go out. Now, access is restricted to only a limited set of audited APIs. Plaintext and ciphertext are not stored or logged by the system. Actually, the system is geared to provide you a data key for some usage, which is a short value. And the encryption uses AES-256-GCM with a random 96-bit IV. Why random? Because the system is distributed, so of course you cannot do anything else. With the random IV, the maximum plaintext size is four kilobytes and the maximum AAD is eight kilobytes. The AADs are logged.
This is part of how you retrieve your keys eventually. And the keys can be configured to rotate yearly. When you rotate the key, the next encryption requests you make are going to use the new customer master key. Of course, for backward compatibility, all old keys are retained and used only for decryption.

So now we know how the system works. How about a naive usage of AES-GCM? We have the durable storage that holds the encrypted customer master keys. Let's say we have users; in the distributed HSMs, they pull out a customer master key and can now encrypt many files. So let's think of a setup where we have U users, each one of them able to encrypt Q files. Is this a scalable mode of operation for this type of system at the volume that we intend? We need to remember that AES-GCM with a random IV is limited in use. The specification is actually telling you that you should never invoke encryption when the probability of repeating an IV is larger than 2 to the minus 32. This is effectively telling you that you cannot use the same key to encrypt more than 2 to the 32 files. That is 4 billion, but at cloud scale, 4 billion is not a large number. And of course, as the cloud provider, we want to make sure that collisions of customer master key and random IV are prevented across all users. A collision on the CMK has negligible probability: if you are really selecting a random 256-bit key and you have U users, the probability is U squared over 2 to the 257, so we're good to go here. This is not the problem. The problem is what happens with the customer master key being used by each of the users for many encryptions.

So I'm going to describe here a general concept, which is joint work that I did with Yehuda Lindell from Bar-Ilan University; we presented it at CCS recently, and we call it the derived key mode. This is a way to extend the lifetime of a key. Basically, it is very easy.
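The 2 to the 32 figure above comes straight from birthday-bound arithmetic: with Q random 96-bit IVs under one key, some pair collides with probability at most Q squared over 2 to the 97, and the NIST specification caps that at 2 to the minus 32. A quick sanity check of this arithmetic (illustrative only, nothing here is KMS code):

```python
import math

def log2_iv_collision_bound(q: int, iv_bits: int = 96) -> float:
    # Birthday bound: Pr[some pair of q random IVs collides] is at
    # most q^2 / 2^(iv_bits + 1); return log2 of that bound.
    return 2 * math.log2(q) - (iv_bits + 1)

# NIST requires the IV-collision probability to stay below 2^-32:
assert log2_iv_collision_bound(2**32) <= -32   # 2^32 messages: still OK
assert log2_iv_collision_bound(2**36) > -32    # 2^36 messages: out of spec
```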
You have an encryption scheme that accepts a nonce, an AAD, and a message, and you have a key. Instead of using the key naively, every nonce is passed through a KDF keyed with that key. The KDF spits out a nonce-derived key for this session, and now you invoke the same scheme with the nonce coming from the other side, the same AAD, the same input, and with this derived key you get the ciphertext.

All right, so let me just review the security bounds of this type of usage. What is the adversary's advantage when n different nonces can be used? It is the sum of three factors: the advantage for n key derivations, with whatever KDF is being used; plus, at most, the multi-instance experiment of the cipher, where you have the cipher used with n keys, hopefully n distinct keys, and we'll see what happens if the keys collide; and then whatever the property is of the original scheme, in our case AES-GCM, when the block cipher is replaced with a random function. Now, number one depends on the actual KDF. Number two depends on what we're willing to assume about the block cipher, and for AES, what we're willing to assume is that AES with a random key is indistinguishable from a random permutation beyond the birthday bound, way beyond the birthday bound; this is a standard assumption that we all hope is true. And number three depends on the scheme: how the scheme behaves if everything were completely random.

So let me give you one example with the simplest mode, counter mode, and how we use this setup. In AES counter mode with a unique 96-bit nonce, if B is the total number of blocks encrypted under a key, and B_max is the maximum number of blocks in a message, then the counter mode advantage is B squared over 2 to the 129.
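The derive-a-key-per-nonce idea just described can be sketched in a few lines. This is an illustrative sketch, not the KMS implementation: HMAC-SHA-256 stands in for the KDF, and the AEAD itself is left as a parameter, since the point is only that the scheme is invoked unchanged but under a fresh per-nonce key.

```python
import hmac, hashlib, os

def derive_key(master_key: bytes, nonce: bytes) -> bytes:
    # One PRF call per nonce yields a fresh 256-bit derived key.
    return hmac.new(master_key, nonce, hashlib.sha256).digest()

def dkm_encrypt(master_key, nonce, aad, msg, aead_encrypt):
    # Derived key mode: same scheme, same nonce and AAD, but the
    # key handed to the scheme is the nonce-derived key.
    return aead_encrypt(derive_key(master_key, nonce), nonce, aad, msg)

master = os.urandom(32)
k1 = derive_key(master, b"\x00" * 16)
k2 = derive_key(master, b"\x01" * 16)
assert len(k1) == 32 and k1 != k2   # distinct nonces, distinct keys
```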
If you prepend the key derivation, you can actually prove that the bound is n times B_max squared over 2 to the 128, and there is a big difference, because if we take 2 to the 48 encryptions, each one of them 2 to the 16 blocks in length, then counter mode is broken; this is the birthday bound. But the derived key mode over counter mode has roughly 2 to the minus 46 advantage. And basically what this says is that you can even do 2 to the 64 plaintext encryptions of length 2 to the 16 blocks and still remain within this magic bound of 2 to the minus 32. So this is the sense in which we can say we extend the lifetime of a key.

Now let's go to the actual encryption mode of KMS. It is using AES-GCM, and we have an AAD and the message; the AAD is at most 512 blocks, the message itself is at most 256 blocks, and the KDF is the NIST KDF in counter mode with HMAC-SHA-256 as the PRF. Now, what are the steps for the encryption? If you have a master key, you select a uniform random nonce, a 16-byte nonce, and a 12-byte IV. Then you derive a 32-byte wrapping key from the KDF, and this is going to be the encryption key for that file. Then use AES-256-GCM with this wrapping key, the random IV, A, and M. It's a small variation on what I said before: if you remember, the derived key mode is using the same nonce as input, but here the nonce and the IV are both randomized and input to the scheme; basically it's the same idea. So pictorially, there are two random derivations, a 128-bit nonce and a 96-bit IV, and this is the scheme: nonce, AAD, message go into AES-GCM, but the key is whatever was derived from the KDF and a fresh nonce. Now we're going to see if this solved the problem at large scale. But if we were just to use AES-GCM directly with the random IV, then each customer would be limited to 2 to the 32 encryptions at most, if we want to enforce the NIST bounds, which are part of the specification of AES-GCM.
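The wrapping-key derivation step above can be sketched with the counter-mode KDF from NIST SP 800-108, using HMAC-SHA-256 as the PRF. The fixed-input encoding here (counter, label, a zero byte, context, output length) follows the SP 800-108 example; the exact field layout KMS uses is an assumption, as is the `b"kms-wrap"` label.

```python
import hmac, hashlib, struct

def kdf_ctr_hmac_sha256(key: bytes, label: bytes, context: bytes,
                        out_len: int = 32) -> bytes:
    # SP 800-108 KDF in counter mode; one HMAC-SHA-256 block per
    # 32 bytes of output, so a 256-bit wrapping key needs one call.
    blocks = -(-out_len // 32)          # ceil(out_len / 32)
    out = b""
    for i in range(1, blocks + 1):
        fixed = struct.pack(">I", i) + label + b"\x00" + context \
                + struct.pack(">I", out_len * 8)
        out += hmac.new(key, fixed, hashlib.sha256).digest()
    return out[:out_len]

cmk = bytes(32)                          # stand-in customer master key
nonce = bytes(16)                        # the random 128-bit nonce
wrapping_key = kdf_ctr_hmac_sha256(cmk, b"kms-wrap", nonce)
assert len(wrapping_key) == 32           # 256-bit per-message wrapping key
```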
All right, so we have two perspectives to consider, the customer's and the cloud provider's, and they are different. The customer is in a multi-key scenario, because the keys are derived, so each encryption presumably derives a new key. And the cloud provider is concerned with a multi-user over multi-key scenario, which is one level of hierarchy higher.

So let's take the customer's perspective. First, what is the probability that a key and IV will be reused? This is a bad event, right? Second, what advantage does somebody who views all the ciphertexts have in distinguishing them from random? Then, what protection do we have against forgery on the authentication of AES-GCM, and what is the probability of recovering one of the wrapping keys, out of all of those that you've used as a customer to encrypt files? All of these considerations come from this theorem, and we're going to review them and see if this is good enough, or rather, how many encryptions each customer can do, where there are overall U customers, and still keep the cumulative advantage below 2 to the minus 20 or 2 to the minus 30. That's the question.

So let's look at the derived key and IV collision. Of course, you can do the math on the fly and check that I didn't make a mistake here, but you'll have to put some trust in the computations. What is the probability that two nonces collide? That's easy: if you have Q nonces that have been derived for Q messages, then it's Q squared over 2 to the 128. What is the probability that three or more will collide? This requires some more thought, and you can bound it by Q cubed over six times 2 to the 256, which is negligible as long as Q is less than 2 to the 64, and since we're only interested in reaching the birthday bound, this is enough. All right, I'll give you another lemma: what is the probability that 10 keys or more are repeated?
It is less than 2 to the minus 32. You need some work, but okay. All right, so let's say that at most 10 keys were repeated; this means that Q minus 20 files were encrypted with unique keys, and 10 pairs of files were encrypted with the same key. But this is not really a disaster; it just says that most of the keys, Q minus 20 of them, were used to encrypt a single message, and 10 of them were used to encrypt two messages. If you combine all of this together and do some computation, the probability of a bad event happening is 1 over 2 to the 91. Okay, we are happy with this. So as long as Q, the number of files that a customer has encrypted, is not more than 2 to the 64, we are good with the NIST probability of collisions. One thing is done.

Next thing: what is the PRP-PRF advantage? The longest message is 256 blocks, which means that 257 blocks have been encrypted, because there is one more block encrypted in AES-GCM to mask the GHASH, the universal hash. So the PRP-PRF advantage is 257 squared over 2 to the 129, and we are good here. Now, when two keys collide, the number of blocks is twice the amount above, so this is 514 blocks, and we have an expression. If, again, the customer has encrypted at most 2 to the 64 files, then we do this computation and the indistinguishability advantage is less than Q over 2 to the 110, and we are good to go here.

All right, next: what is the forgery protection? The forgery success probability in AES-GCM depends on the longest message, so the number of blocks in the AAD plus the number of blocks in the message plus one; if you want to be really precise, 769 blocks is the most you can see, so the forgery probability is 769 over 2 to the 128. I think we are good to go here.
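The per-customer numbers above are easy to reproduce. Here is the log2 arithmetic as a sketch, taking the worst case Q of 2 to the 64 discussed in the talk:

```python
import math

Q = 2**64                       # encryptions by a single customer

# Three or more of Q random 128-bit nonces colliding: Q^3 / (6 * 2^256)
log2_triple = 3 * math.log2(Q) - math.log2(6) - 256     # about -66.6
assert log2_triple < -64

# PRP-PRF distance for one maximal message (257 blocks): 257^2 / 2^129
log2_prp_prf = 2 * math.log2(257) - 129                 # about -113
assert log2_prp_prf < -110

# Forgery for a maximal query (512 AAD + 256 message + 1 = 769 blocks)
log2_forge = math.log2(769) - 128                       # about -118.4
assert log2_forge < -118
```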
Remember that decryptions happen in this system at most 1,200 times per second, so I think we can all agree that forgery is not a threat in this system, for whatever number of attempts you can possibly make. Okay, we are happy here.

What about key recovery? If you think about it, there is a customer who is using a single customer master key and encrypting many messages. If you look at the encryption scheme, at AES encrypting a fixed block X times, then you can recover the key with probability X over 2 to the key length. Fortunately, the key length here is 256, so we are happy. Now, for the same block to be encrypted, the minimal assumption is that the IV repeated. So what is the probability that an IV was repeated five times or more? We can compute this: it is Q to the fifth over 5 factorial times 2 to the 384, which for Q equal to 2 to the 64 is 2 to the 320 over 5 factorial times 2 to the 384. You do some calculations, and believe me, this is a very negligible number, even if Q is 2 to the 64. Of course, 2 to the 64 is like a magic number; I hope nobody thinks there are going to be 2 to the 64 encryptions so fast, but we want to have this up to the birthday bound. So this is negligible.
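The five-fold IV repetition bound works out as follows (again just the arithmetic from the talk, in log2):

```python
import math

Q = 2**64
# Some 96-bit IV repeating 5 or more times among Q random draws is
# bounded by Q^5 / (5! * 2^(4*96)) = 2^320 / (120 * 2^384).
log2_p = 5 * math.log2(Q) - math.log2(math.factorial(5)) - 4 * 96
assert log2_p < -70     # about -70.9: no traction for key recovery
```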
Now we go to the cloud provider's perspective. What is the probability of having this catastrophic derived key and IV collision across all the users? What is the advantage of an adversary who can observe all of the encrypted files, not only for a single customer, but for all of the users of the system? Of course, the forgery probability doesn't change here with the number of samples that you see, because decryption is limited by some rate, and each attempt is bounded as we saw before. And what is the probability of recovering one of the wrapping keys? So now, if you think about it, we have U users, each one of them encrypting Q files, and this is a huge quantity.

So let's start again with the same sweep across all these problems. What is the probability of a collision over U users who requested U customer master keys? U squared over 2 to the 257. The pairwise IV collision probability is Q squared over 2 to the 97, and with the lemma that things don't repeat more than 10 times except with negligible probability, the probability that we'll have a collision ends up linear in the number of users: it is U over 2 to the 91. And for whatever reasonable number of users you are going to have here, say below 2 to the 50 or 2 to the 40, we are happy.

All right, now what about derived key collisions? Remember, there are two possibilities: we have 16-byte random nonces and a 256-bit customer master key. So the collision probability of the pair, you can compute, is U times Q squared over 2 to the 385, and a collision of the KDF output is also expressible, U times Q squared over 2 to the 257. So even if we have 2 to the 48 users, each one of them doing 2 to the 64 encryptions, what is the probability of a bad event? It is 2 to the 64 times 2 to the 48, squared, divided by 2 to the 257, okay, this
is 2 to the minus 33. And this is our magic number: if you remember, the NIST specification says don't use AES-GCM if the probability of repeating an IV is more than 2 to the minus 32. So okay, we are happy with all these numbers of users.

And the last one, I think the last: the PRP-PRF advantage. We do the same exercise as before, because the messages are limited in length, and the advantage is actually linear in U times Q. So with 2 to the 40 users, let's say each one of them crunching 2 to the 50 encryptions, all in all the advantage is less than 1 over 2 to the 20. We are always taking the worst of the worst cases to check the bounds of this system in this multi-user, multi-key scenario.

Now, what about key recovery? There are so many encryptions; just imagine that everyone was encrypting the same block. This is the worst case scenario, because we would have the same block encrypted under many, many keys, and we have this key recovery property. In a multi-key scenario, we have X over 2 to the 256, in our case, to recover a key. This is the probability to recover a key, or if you wish, this is the amount of work you need to do in order to recover a key. Now we ask ourselves: how large can X be? We have U users, each one of them doing Q encryptions. Now let's be, you know, rude: what is the probability that a 96-bit IV would repeat more than 16 times? Because that is how we bound X. The probability for this to happen miraculously, if you work it out, ends up as 1 over 16 factorial. Now if you do the math, taking 2 to the 40 users, each one of them doing 2 to the 50 encryptions, and we want to see how many times the same block can possibly be encrypted under different keys, then the probability that this count reaches 16 is less than 2 to the minus 44. All right, so we are happy here; we can say that we are safe, even with this number of 2 to the 90
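The two provider-scale numbers, the 2 to the minus 33 derived-key collision bound and the 2 to the minus 44 sixteen-fold IV repetition, check out as follows (a sketch of the arithmetic, with the talk's parameters):

```python
import math

# Derived-key collision across all users: (U*Q)^2 / 2^257
U, Q = 2**48, 2**64
log2_keycoll = 2 * (math.log2(U) + math.log2(Q)) - 257
assert round(log2_keycoll) == -33       # the talk's 2^-33

# A 96-bit IV repeating 16+ times among U*Q = 2^90 total encryptions:
# (2^90)^16 / (16! * 2^(15*96)) = 2^1440 / (16! * 2^1440) = 1 / 16!
U2, Q2 = 2**40, 2**50
n = math.log2(U2) + math.log2(Q2)       # 90 bits of encryptions overall
log2_16rep = 16 * n - math.log2(math.factorial(16)) - 15 * 96
assert log2_16rep < -44                 # about -44.25
```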
encryptions overall. And you can realize now that if we were not doing all this derivation, we would be clobbered well before that, because with AES-GCM, or as far as I know with any mode of operation over AES, which is a 128-bit block cipher, you cannot cross the birthday bound, and if you want to leave a margin of 2 to the minus 32, you have to stop way, way before that. So this is exactly the consequence of the key derivation.

So in summary, we have a secure cloud-scale implementation of an encryption scheme that can support up to 2 to the 40 users, or master keys (a user can request more than a single master key, so it's not necessarily a one-to-one relation), and each master key, each user, can perform 2 to the 50 encryptions. And I checked the earth population in 2017: we have about 7.2 billion people, which is something like 2 to the 32.7 users. So even if everyone on earth were to request a customer master key and do 2 to the 50 encryptions, we are still safe, and we have this wonderful security margin with AES-GCM, with the twist of the key derivation mode. And I will conclude with that, thank you very much.

Great, I guess we have time for one question. Are there any questions? An easy question? Why don't I ask an easy question, since we also talked about the mode of encryption: how much support do you give customers, for example, in the key management service, if they want to rotate keys or ... All right, so there are some, I would say, defaults, so the key would be rotated after a year, but you can request a key rotation. And then, how does that work? Well, you start fresh, so that's it. Our worry was actually to show that you can have a customer who can safely encrypt as many files as possibly conceivable, with many customers doing the same thing without key rotation, and we are still safe, right? If you rotate the key every time, then things are easy.
So if you want to look at this, it's like taking AES-GCM and scaling it up two levels. The user is doing many, many encryptions, more than 2 to the 32, which is the bound, and there are also 2 to the 40 users doing the same. So this is the scaled problem. Yeah, any other questions? If not, then let's thank Shay. Okay, thank you very much.