don't have them yet. OK, and so with that, I think we can start this new session on hardware. The next talk, the slides are going to come up in a second, is called Everybody Be Cool, This Is a Robbery, given by Jean-Baptiste Bédrune. Please be calm, enjoy the session, and talk to you in a bit. Hi, everyone. So my name is Jean-Baptiste Bédrune. Today I will talk about HSM security. This is joint work with Gabriel Campana. We are both security researchers at Ledger. I cannot pass the slides. And we got the opportunity to evaluate an HSM. We found vulnerabilities inside. So in this presentation, I will explain the bugs we found, how we exploited them, and more importantly, the methods we employed for that. So first, I will describe what an HSM is and the specificities of the HSM we evaluated. As most of our work focused on the security of PKCS#11, I will introduce the standard. I will explain then why we had to develop a few tools to improve the security analysis of the device. And then we'll talk about the bugs we found and how we exploited them to fully compromise the device and actually extract all of its secrets. So what is an HSM? An HSM is a hardware device that protects your keys. It is able to generate, store, and use cryptographic keys. It is a physical device. It can be either an internal device, such as a PCI or PCI Express card, or an external device, such as a network appliance, for example. It contains one or more cryptoprocessors to speed up or to secure cryptographic operations. And also, almost all of them have anti-tamper countermeasures to prevent physical attacks on the device. So why are they used? They are used in many applications. The most prominent one is PKI, public key infrastructures: the certificate authority is stored in the HSM, at least its private key, and all the certificates that are delivered by the CA are actually signed by the HSM.
They can also be found in bank environments, for example, to verify your CVV when you do a transaction, or to personalize your credit card. They can also be seen in telco to authenticate your device on the operator network. And actually, in many places where a long-term key must be really protected. So you can find them in many places, but there's not much public security research on the subject. Why? I'd say, first, that they're expensive. So actually, no security lab can just buy an HSM and perform an assessment on it just for testing. Second, they protect secrets, and they're often disconnected from the corporate network, and most often out of scope during penetration tests, as companies do not want to lose all of their secrets during a pen test, because the device can wipe itself if it detects an attack. And last, there are few vendors that sell HSMs, and none of them provide detailed internal information about the way they work. So it's actually a big black box, and we have no idea of their internals. So we got the opportunity to evaluate an HSM. You can see it on the picture on the right, and it's actually in red. So this is an internal HSM, a PCI Express card, and what you can see is that it's all black. You see nothing. All the components are covered by epoxy resin, so you do not have direct access to the components. And if you try to remove the epoxy resin, it will be detected, and the device will wipe all of its secrets, right? This HSM is certified FIPS 140-2 Level 3. We'll see that later. And it has an internal controller with a connector. Actually, this HSM exists in two form factors, a network appliance and a PCI Express card, and we think it is actually the same device, except that, in our case, the connector is not soldered. The other one, we think, is the same card enclosed in a server. So we put it in a standard Linux server, installed the kernel modules and the utilities provided by the vendor, and started our research on it.
So briefly, I explain how you communicate with the HSM. You must write a client that will use the library provided by the vendor. This library will issue calls to the kernel. They will be transferred to the memory of the HSM using shared DRAM mapped on the HSM memory. The HSM will process this transaction, execute the cryptographic operation, and will send back the result to the host system. So all the transactions are performed on the initiative of the client. So this can be seen as a classic client-server architecture. As I said, the HSM is certified, and what I can say about that is that the certification is only about hardware attacks. So as I said, if you try a hardware attack, it will wipe itself and you will lose all of its secrets. But this is not a certification about software attacks. So that means that if you try to attack this device using software methods, you have no assurance that it will be safer than an uncertified device. So I introduced our device. I will now talk about PKCS#11, as most of our research is about the subject. So what is PKCS#11? It is a standard invented by RSA Security that aims to provide a generic interface to communicate with a cryptographic device. This device can be a smart card, an HSM, anything able to perform cryptographic operations. It defines an API which is named Cryptoki, and this is a portable API. That means that the code to communicate with a cryptographic device that supports PKCS#11 will be the same for every device. However, the implementation can be very different. You will not have the same stack if you communicate with a smart card or with an HSM, and various HSMs have very different stacks. It defines APIs to perform cryptographic operations. These operations can be any kind of cryptographic operations: encryption, decryption, signatures, key generation, and so on. So the API is not that big, only a few dozen functions in the standard.
So the attack surface on the device doesn't look very big. But actually, most of these functions can be parameterized using mechanisms. And there are many, many mechanisms defined in the standard, more than 300, and vendors also add their own mechanisms to get very rich functionalities. So actually, the attack surface is much bigger than what we expected at the beginning. What are mechanisms? Mechanisms define a way to perform a cryptographic operation. So there are mechanisms for encryption, decryption, wrapping keys, and so on. And these mechanisms really depend on the device that we consider. For example, a smart card will have very few mechanisms. Maybe one mechanism to generate an RSA key, one mechanism to encrypt using 3DES, and one mechanism to sign data, and that's all. However, an HSM has many more functionalities. It will support many more ciphers, it will support authenticated encryption, for example, and higher-level constructions. And it will also possess custom mechanisms that will be used in telco or in bank environments. All these mechanisms take objects as parameters. What are objects? There are three types of objects defined in the standard: keys, which can be secret, public, or private keys; certificates; and data, which can actually be arbitrary data, but most of the time is domain parameters used by DSA and ECDSA. So what interests us as attackers is obviously the keys, the secret and private keys. And Cryptoki manipulates objects through their handles. If you want to perform a cryptographic operation, for example, your client wants to encrypt data, first you generate a key on the HSM. You request the HSM to generate a key. The HSM will store the key in its memory and send back a handle to the client. The client then will send the data to be encrypted with the handle of the previously generated key, and the HSM will send back the encrypted result. So at no time is the key sent back to the client.
It always stays in the memory of the HSM, right? This is important for the rest of the presentation. Objects are composed of attributes. Each object has a class attribute that defines its role, and the other attributes depend on that class. For example, here is a private ECDSA key. You can see it has a value attribute. The value of the key is actually an attribute, and other attributes give its role. It can be used for signing, in that case, not for encryption. And these attributes are critical for security. For example, for an object that has a sensitive attribute, the value of this object cannot be extracted in plain text. It must be wrapped with another key, usually a key that stays in the HSM. You have an extractable attribute. If the object is not extractable, its value can never be extracted, even if it's wrapped with another key. Another interesting attribute is the private attribute. It means that the user has to be logged in to access the object. Each object belongs to a token. What are tokens? Actually, when a user wants to log in to the HSM, it logs in to a slot. What is a slot and a token? Originally, the token was the object that contained the cryptographic objects, like a smart card. And the slot was the physical device that is used to access the token. In that case, it was a smart card reader. In the case of an HSM, the difference is a bit blurry. Actually, let's say that you have several slots on the HSM. Each slot has one token, and the objects are bound to a token. For example, if a user connects to the token on the left, they will only be able to access the objects that belong to that token, not the objects that belong to the token on the right. From there, we can define a few threats. For example, an attacker that logs into a token must not be able to get the keys from another token. An attacker will try to read the value of an unextractable key. This is an attack. But we will consider a much simpler model.
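The handle-and-attribute scheme described above can be sketched as a toy in-memory model. This is not the real Cryptoki C API, just an illustration of the idea that clients only ever see handles while keys and their `CKA_*` attributes stay inside the device (the XOR "cipher" is purely illustrative):

```python
import secrets

class ToyHSM:
    """Toy model of a PKCS#11 object store: keys never leave the device."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 1

    def generate_key(self, sensitive=True, extractable=False):
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = {
            "CKA_CLASS": "SECRET_KEY",
            "CKA_VALUE": secrets.token_bytes(32),  # stays inside the HSM
            "CKA_SENSITIVE": sensitive,
            "CKA_EXTRACTABLE": extractable,
        }
        return handle  # the client only receives a handle, not the key

    def get_attribute(self, handle, attr):
        obj = self._objects[handle]
        # the value of a sensitive object must never leave in plain text
        if attr == "CKA_VALUE" and obj["CKA_SENSITIVE"]:
            raise PermissionError("CKR_ATTRIBUTE_SENSITIVE")
        return obj[attr]

    def encrypt(self, handle, data):
        # the key is used internally; toy XOR stream stands in for a cipher
        key = self._objects[handle]["CKA_VALUE"]
        stream = (key * (len(data) // len(key) + 1))[: len(data)]
        return bytes(a ^ b for a, b in zip(data, stream))

hsm = ToyHSM()
h = hsm.generate_key()
ct = hsm.encrypt(h, b"hello")
```

The client holds only `h`; asking for `CKA_VALUE` of the sensitive key fails, which is exactly the property the attacks later in the talk try to break.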
In our case, we are in the position of an attacker. We have no credential information, only unauthenticated access, and we want to access all the cryptographic data from the other slots, to dump all the keys. So PKCS#11 has been studied quite a lot in previous years. Our work is a bit different. We do not look at the security of the standard, but at the security of its implementation on a single HSM. Moreover, we give an insight into the internals of a given HSM. So let's start now with the vulnerability research. We consider that we are in the position where we have compromised a host that is able to talk with the HSM. But we have no login information. We started using the software provided by the vendor. It was a CD-ROM that contained an SDK and a few code samples. What was very interesting for us is that it also contained a firmware update. That firmware update was just signed and not encrypted. And it was quite easy to extract. And we saw that it was actually running an old Linux kernel that was released 10 years ago. So we were sure that everything was not really up to date on that device. It contains a big library that actually embedded all the Cryptoki implementation. And we spent a few weeks doing reverse engineering on that library. This was not enough to get precise insights into the internals of the device. And looking at the features, we saw an unexpected option on this device: we are able to load code on it using custom modules. The usage for this is to implement new mechanisms, for example, or to filter a few messages, things like that. This is not a vulnerability in itself. You must be admin to be able to upload a custom module. This is really a feature. And we developed two modules to gain information about the device. The first one was a shell. We modified BusyBox to be able to launch various commands on the device. And from a black box, we got something that looked like a user-friendly device.
We then developed a debugger, a modified version of a GDB server. This was a bit tricky because... Actually, the main process is responsible for the communication with the outside world, with the host. And if we debug it, we have no way to communicate with the outside world. So we had to create an auxiliary channel to be able to talk with the host even when debugging the main process. That worked, finally. And what we saw is that everything ran as root on the device, and there are no hardening or mitigation options, so no ASLR, no stack cookies, and so on. So that means that every exploitable vulnerability can be enough to extract all of its secrets. We'll see that later. We wanted then to understand how sensitive data and cryptographic objects were stored on this device. Actually, all the persistent data is stored in a flash memory on the PCI card. What interests us is the PKCS#11 objects. They are stored in a single dedicated partition that uses a proprietary file system. And actually, all the objects are stored in plain text on that partition, but the sensitive attributes are encrypted using a key which is external to the flash. It is located in another component, on the right. So when you try to attack the HSM, and it is detected, that key is immediately destroyed. And then if you manage to dump the flash physically, you will actually only get access to public information. All the sensitive data will be encrypted, and you won't have the key to read it. But that also means that there's a single key for all the objects, all the sensitive attributes. So there's actually no logical separation across the HSM slots. So actually, if we manage to get code execution on the device, we should be able to extract and to decrypt all the secrets on the device. So let's go. The first thing we did was to find a simple vulnerability on the device. So we looked at basic memory corruption bugs, stack-based buffer overflows.
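The storage layout the speaker describes can be sketched as follows. This is a toy model under stated assumptions: object labels sit in flash in plain text, sensitive values are encrypted with a single master key kept in a separate component (the one destroyed on tamper detection), and a SHA-256 counter-mode keystream stands in for whatever cipher the real device uses:

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # toy stream cipher (SHA-256 in counter mode); illustrative only
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def store_object(flash: list, master_key: bytes, label: str, secret: bytes):
    # public metadata stays readable; only the sensitive value is encrypted
    enc = bytes(a ^ b for a, b in zip(secret, keystream(master_key, len(secret))))
    flash.append({"label": label, "value": enc})

master_key = secrets.token_bytes(32)   # lives outside the flash chip
flash = []
store_object(flash, master_key, "slot1-rsa-key", b"top secret key material")
# After a tamper event, master_key is destroyed: dumping the flash then
# yields only labels and ciphertext, as described in the talk. But a single
# master key also means no cryptographic separation between slots.
```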
We did manual analysis on it, and we grepped for calls to memcpy to find a vulnerable call to memcpy. And actually, there were several hundred calls to memcpy, but this can be done quite easily, manually. After a couple of hours, we managed to find that there was a single call that was actually vulnerable to a stack overflow, in the Milenage derivation mechanism. This is an algorithm that is used by telco operators. This is a UMTS authentication algorithm. In that case, in the pseudocode at the bottom of the screen, you can see that the value of a key is retrieved, the size of this attribute is never checked, and the value of the attribute is directly copied to a variable aes_key on the stack. That means if you generate a big key on the device and you use it to derive a new key with Milenage, it will trigger a stack overflow. From there, code execution is actually very easy because there's no stack cookie on the device and no ASLR, but if you try to exploit it, you will corrupt a lot of data on the stack, so resuming execution will be a bit difficult, and you have to be logged in to be able to call this mechanism with a specific key. Moreover, this algorithm is only present in the latest version of the HSM, so we thought it would be a better idea to look for another bug. And for that, we were a bit fed up with manual analysis, so we started to write a fuzzer. We mutated the messages sent by the client to the HSM, and we actually wrote a dumb fuzzer that does random byte mutations. The fuzzer is simple, but there were two main challenges. The first one is that the kernel module on the client side is not very robust, and actually we crashed the kernel of our own host before the mutated data even reached the HSM. So this was the first problem.
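The bug pattern in the Milenage derivation can be rendered as a small Python sketch. Everything here is hypothetical (names, the 32-byte buffer size, the error code); Python cannot literally smash a stack, so the buffer growing past its fixed size stands in for the overflow of the C `memcpy` into `aes_key`:

```python
AES_KEY_SIZE = 32  # stands in for the fixed-size stack buffer in the firmware

class Attribute:
    def __init__(self, value: bytes):
        self.value = value

def derive_milenage_unsafe(key_attr: Attribute) -> bytes:
    # vulnerable pattern: memcpy(aes_key, attr->value, attr->len)
    # with no check that attr->len <= sizeof(aes_key)
    aes_key = bytearray(AES_KEY_SIZE)
    aes_key[: len(key_attr.value)] = key_attr.value  # grows past 32 bytes
    return bytes(aes_key)

def derive_milenage_safe(key_attr: Attribute) -> bytes:
    if len(key_attr.value) > AES_KEY_SIZE:  # the missing length check
        raise ValueError("CKR_KEY_SIZE_RANGE")
    aes_key = bytearray(AES_KEY_SIZE)
    aes_key[: len(key_attr.value)] = key_attr.value
    return bytes(aes_key)
```

Generating an oversized key object and feeding it to the unsafe variant "overflows" the buffer, which is exactly the trigger the speaker describes.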
The second one is that by modifying some bytes in the data that was sent to the HSM, we triggered big memory allocations, and the HSM ran out of memory, and we got a lot of out-of-memory errors, and that was not interesting from a security point of view, so we had to filter all of these messages. But finally, we got a decent fuzzer that found 14 vulnerabilities, 14 different vulnerabilities on the device. All of them were more or less exploitable. All of them were memory corruption bugs. I will present two of them now. The first one is quite similar to Heartbleed, which was a vulnerability in OpenSSL that allowed you, for example, to dump the contents of the memory of a server and to retrieve its private key. Here we have quite the same bug, and we are able to dump the heap memory of the HSM. You can see here "bla" on the top of the screen is actually the password of the admin slot we put on the device. So we can get really sensitive information like passwords and maybe encryption keys. But this bug is not very interesting for us because you need to be authenticated to trigger it, and it does not give you code execution. So we found another bug. We looked at an interesting mechanism, which is serialization on the HSM. Actually, you are able to start an operation, then to pause it and to resume it later. So you start, for example, a hashing operation on the device. You say that you want to restore it later, and the HSM will send its state to the client. Then you send back the state to the HSM to restore the operation. And actually, if you mutate this state, you get a strange crash. At first, we thought it was just a null dereference vulnerability, which is not interesting from our security point of view, because it's usually not exploitable. But the stack trace was a bit unusual, and we dug a bit and saw that it was not a null dereference, but actually a type confusion bug.
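The dumb fuzzer described above, including the out-of-memory filtering, fits in a few lines. This is a minimal sketch, not the authors' tool; the `send` callback and the `"CRASH"` status are hypothetical stand-ins for the real transport and crash detection:

```python
import random

CKR_DEVICE_MEMORY = 0x31  # PKCS#11 "device out of memory" return value

def mutate(message: bytes, rng: random.Random) -> bytes:
    # dumb fuzzing: flip one random byte of a captured client->HSM message
    buf = bytearray(message)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(send, seed_message: bytes, iterations: int, seed: int = 0):
    rng = random.Random(seed)
    interesting = []
    for _ in range(iterations):
        msg = mutate(seed_message, rng)
        status = send(msg)
        if status == CKR_DEVICE_MEMORY:
            continue  # huge-allocation noise, not a security issue
        if status == "CRASH":
            interesting.append(msg)  # candidate memory corruption bug
    return interesting
```

In practice the hard parts were exactly the two challenges the speaker mentions: keeping the host kernel module alive, and discarding the out-of-memory responses so only real crashes remain.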
So actually, the byte which is highlighted in red is the type of the digest you are restoring. And if you change it, you actually restore a different hash object than the one that was previously saved. And that allows you to call different methods on that object and to trigger some bugs. From this bug, we managed to get two primitives. One was a read primitive that allowed you to dump the heap of the HSM. The other one was a relative write primitive that allowed us to write data located after the state that had been restored. So I won't go into details, but by allocating several objects on the HSM, and by triggering the bug, we were able to corrupt the memory of other objects and then get code execution from that. So once we had code execution without any authentication, we had to develop a payload, because this is just the beginning of the adventure. And this payload had to be specific, because the usual payloads, like running a shell or triggering a network connection, don't work: there's no shell on this device and no network connection. So we thought a bit, and we came to the conclusion that the easiest thing to do was just to disable PIN verification on the device, so that the attacker can just log in without any password. And that worked. And from there, we have admin rights, and we can use the custom module feature to upload a bigger library that is able to do what we want. For example, dump the whole content of the flash and dump the encryption key that is needed to decrypt the flash. And this exploit is a single binary executed from the host, and it allows us to get all the cryptographic keys from the HSM in plain text. We wanted to continue our analysis by looking at the way the firmware was loaded onto the HSM. Actually, I will start first with the signature verification of the custom modules you can install on the device, because this is quite the same mechanism.
So when you want to install a module, you first generate a certificate on the client and you send it to the HSM. The HSM stores the certificate and sends back a certificate handle to the client. Then the client sends the module it wants to load, which must be signed with the certificate, along with the handle of the certificate. If the signature is correct, the HSM stores the module, loads it into memory and executes it. It is almost the same for firmware updates, except that firmware updates can only be issued by the vendor. You cannot load your own firmware. Here is the process. Instead of supplying your own certificate, the client looks for an object which has a custom attribute on the HSM. This is a specific attribute that cannot be set by any user. There is a single object that contains this attribute. This is the certificate that is used to verify firmware. The client then sends the handle to that certificate with the firmware data, and the HSM verifies the signature again. If it is valid, it writes the firmware to the flash, loads the firmware and reboots. But actually, there is a logical bug here. When the client sends the firmware data with the certificate, the HSM never checks that the attribute is present on the certificate object. You can install any dummy certificate and provide the handle to that certificate along with the firmware data to bypass the firmware signature verification on the HSM. You can actually load any firmware on the device. To be able to do that, you have to be admin. But with the previous vulnerabilities I explained, you can become admin. From there, you can install a persistent backdoor in the firmware. This is actually a critical bug, because this attack cannot be patched easily: if the vendor patches the signature verification mechanism, you can still upload an older firmware to the HSM that will be accepted, trigger the signature verification bypass on it, and load your backdoored firmware.
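The logic flaw is easy to see in a sketch. Everything here is hypothetical (the attribute name, the object layout, and the hash-based `toy_verify` standing in for real signature verification); the point is only that the buggy path verifies against whatever certificate handle the client supplies, without the attribute check:

```python
import hashlib

def toy_verify(public_key: bytes, data: bytes, signature: bytes) -> bool:
    # stand-in for real RSA/ECDSA signature verification
    return signature == hashlib.sha256(public_key + data).digest()

def load_firmware(objects, cert_handle, firmware, signature, check_attribute):
    cert = objects[cert_handle]
    if check_attribute and not cert.get("CKA_VENDOR_FIRMWARE_ROOT", False):
        return "rejected"  # fixed behavior: only the vendor-only cert works
    if not toy_verify(cert["public_key"], firmware, signature):
        return "bad signature"
    return "flashed"       # buggy path: any installed certificate is accepted

objects = {
    1: {"public_key": b"vendor-key", "CKA_VENDOR_FIRMWARE_ROOT": True},
    2: {"public_key": b"attacker-key"},  # dummy certificate we installed
}
evil_fw = b"backdoored firmware"
evil_sig = hashlib.sha256(b"attacker-key" + evil_fw).digest()
```

With `check_attribute=False` (the shipped behavior), the attacker signs with their own dummy certificate and the firmware is accepted; with the check in place, only the vendor certificate passes.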
As a conclusion, I explained how we were able to trigger memory corruption bugs that lead to a complete compromise of the HSM and to dump all of its secrets. We are not sure the bugs we found also work on the network version of the HSM. We think it will be the case, but we cannot be sure yet. We were able to dump the whole storage of the device and to decrypt all the secrets that it contained. We showed a way to bypass the signature verification of the firmware and a way to break the integrity of the device. And we think this is hard to fix. All the bugs have been disclosed to the vendor. The vendor patched all of the bugs in a few months, and the software should be safe enough now. So I want to point out this is not an exhaustive HSM study. We only studied one model from one vendor, but we hope that our methodology can be used by other researchers to perform the same assessment on other models. Finally, I would say that most HSMs are protected against hardware attacks, but this is, in our opinion, not enough. You also have to be protected against software attacks. Thank you very much. Okay, we have time for a few questions. So we don't have to outlaw crypto. We can just send your talk to the FBI, right? Sorry, I didn't understand. And please, let's thank the speaker again. One last question. Yes, one last one. I understand you were two guys working on it. How long did it take you to do all this? I don't know. I'd say one summer. One summer, including holidays, for two months. Do you have a next target already? We'll be glad to continue our study on other devices, but we have no other devices and actually no time to spend on that. Thank you. So the next one is TPM-FAIL: TPM meets timing and lattice attacks, by our speaker Daniel Moghimi, with Berk Sunar, Thomas Eisenbarth and Nadia Heninger. And let's see if we can do it now remotely, maybe. Hello. Hi, everyone. Thanks for attending my talk.
Today I'm going to talk about TPM-FAIL, which shows some critical vulnerabilities in one of the most popular hardware security products. This is joint work with Berk Sunar, Thomas Eisenbarth and Nadia Heninger. Well, TPMs can be found on almost any kind of computing device, from your laptop to desktop systems, the auto industry, industrial systems. Why is it so important to have these Trusted Platform Modules everywhere? Well, we have known for a long time that software is vulnerable, we have seen the Heartbleed attack, and many different types of attacks, rootkits, so we cannot rely on the OS for security, and CPUs are just as bad: we have seen lots of CPU attacks, so we cannot really rely on the host and the CPU to provide any root of trust for us. There is a need for a hardware-based root of trust. This need has been around for a long time now, and some people came up with a solution, security chips called Trusted Platform Modules, or TPMs. These TPMs are embedded into your computer, into your laptop, and these TPMs are supposed to be tamper-resistant, side-channel-resistant, and they provide some cryptographic functionality, so you can basically narrow the root of trust, put your cryptographic keys inside this box, and then keep everybody outside of this trusted computing base, this root of trust. So with that, we expect lots of cryptography functionality here. There is an organization called the Trusted Computing Group that provides a standard on what these devices should support, what type of cryptographic functionality and what kind of security guarantees and qualities. There are random number generators, hash functions, encryption modules, and digital signatures in this type of device. Today we focus mostly on the digital signatures. Well, we only focus on the digital signatures; that's what the TPM-FAIL attack is about.
And one thing we mentioned: okay, these devices are supposed to be tamper-resistant, side-channel-resistant, but how do we know? How can we even rely on that? Well, TCG also provides some certification that is done through organizations like FIPS and Common Criteria, and these certifications are supposed to guarantee that these devices meet certain qualities, certain standards, to be protected against attacks. And on the TCG website, you can find a list of devices that are considered to be certified. But again, as we talked about security, we don't know anything about how these devices are implemented. They are just black-box security chips. We mentioned digital signatures, and why do we care about using TPM digital signatures? Well, you can just use the TPM as a trusted execution environment for digital signatures. There are lots of different applications, like SSH or VPN or your email client or server, that can just keep the security key, the digital signature key, inside this TPM module, and they can ask the TPM module to perform the signature generation. And now the latest version of OpenSSL and the Linux environment all support TPM by default. There is also the FIDO alliance that basically pushes to use TPMs for things like YubiKey, for things like two-factor authentication, and these devices all use digital signatures. And the other important thing about digital signatures for TPMs is that they need to support remote attestation. For instance, if I want to make sure a party on the other side of the network, on the other side of the world, has a legitimate computer, has legitimate firmware running, I can actually use a digital signature, PKI type of scheme to verify that the other party is a valid device with a valid TPM firmware and version. The new version of TPM, TPM 2.0, supports elliptic curve digital signatures, and this has been popular for a while now; almost all new laptops and computers since Windows 10 support TPM 2.0 instead of TPM 1.2.
So the question is, with these certifications, with all this black-box obscurity, are these devices really secure? Can we really rely on these devices for secure transactions, secure encryption, et cetera, et cetera? So we did the simplest and most common-sense test to actually see if these devices can handle side-channel attacks. We wanted to do a timing test, and the first thing we need for a timing test is a good timer. One way to measure the time of a TPM device, for instance, is to use the power signal, using an oscilloscope, and get a very high-resolution timer, but that's not very scalable, that's not very easy to use, so we just tried to make a timer using the CPU frequency. We know that the CPU runs much faster than a TPM device, because a TPM device is generally based on a small microcontroller. It runs with maybe less than 100 MHz frequency, but our CPU is generally more than 2 GHz, and the TPM is directly attached to our system, so we can just use the CPU cycle counter as a timer. So when we started this work, like last January, we said, okay, let's look at the most common TPM product on our computers, which is a product called Intel Platform Trust Technology, or Intel PTT, which is essentially a firmware TPM that is integrated into your CPU. It has its own processor inside the CPU package. It runs as part of the Management Engine, and the Management Engine hasn't been a very trustworthy security system in the past, but we also rely on this Management Engine to run the firmware TPM, basically. But what is important is that even if the host CPU is compromised, it doesn't have any direct access to the Management Engine. And this has been supported on almost all new operating systems, including Linux systems. So we did a simple test. I ran the ECDSA that is supported by this firmware TPM, and it gave me a distribution, a timing histogram.
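The measurement idea is simple: the host clock ticks far faster than the TPM's microcontroller, so timing each call from the host already resolves the leak. A minimal sketch, with `time.perf_counter_ns` standing in for the `rdtsc` cycle counter and `op` standing in for the TPM signing call:

```python
import time

def time_operation(op, samples: int):
    """Time `op` repeatedly with the host's high-resolution clock."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        op()
        timings.append(time.perf_counter_ns() - start)
    return timings

def histogram(timings, bucket_ns: int):
    """Bucket timings to reproduce the kind of histogram shown on the slide."""
    buckets = {}
    for t in timings:
        buckets[t // bucket_ns] = buckets.get(t // bucket_ns, 0) + 1
    return buckets
```

Because the TPM runs at tens of MHz and the host clock resolves nanoseconds, even coarse per-call timing from user space exposes differences of a few TPM cycles, which is why the histograms in the talk are so clean.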
I was like, okay, we did a simple test, and this is clearly not constant time. When I measured the timing from the CPU using the CPU cycle count, some of the signatures run faster, some of the signatures run slower. So there is a suspicion that here we are leaking something about the nonce, which is the only thing that changes when we sign different messages using the same key. We created a tool. We modified the Linux kernel stack for these TPM devices, and we tried to measure the timing of the TPM operation as close as possible to the interface of these devices. When we did this measurement, we realized that the same timing measurement gets to a very clean-cut distribution with different peaks, like three or four different peaks, depending on how many measurements we do. And each peak has a frequency like 16 times more than the next one. So this kind of gave us the idea that, okay, this ECDSA implementation is probably based on some fixed-window implementation, and it probably leaks something about the nonce. That's the idea that we got from this, and this also matched our previous observations of some of Intel's cryptographic libraries and products. So with this, we got some confidence. We were like, okay, we have this tool. We're going to collect some devices in the lab, like desktop computers, some new Intel computers, laptops, and run our tool and see how these devices behave when we do a timing test on public key schemes like RSA and ECDSA. Among these devices, most of them used the Intel fTPM.
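The 16x frequency ratio between peaks matches simple probability: if a 4-bit fixed-window scalar multiplication skips leading all-zero windows, each additional zero nibble at the top of a random nonce is 16 times rarer, so each faster peak holds about 1/16 of the signatures of the previous one. A quick simulation of random 256-bit nonces confirms this:

```python
import random

def leading_zero_bits(k: int, bits: int = 256) -> int:
    return bits - k.bit_length() if k else bits

rng = random.Random(42)
N = 200_000
# counts[t] = how many nonces have exactly t all-zero 4-bit windows on top
counts = {0: 0, 1: 0, 2: 0}
for _ in range(N):
    k = rng.getrandbits(256)
    windows = leading_zero_bits(k) // 4
    if windows in counts:
        counts[windows] += 1
# counts[0]/counts[1] and counts[1]/counts[2] both come out near 16,
# reproducing the spacing between the histogram peaks on the slide.
```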
We even updated these devices to the latest version of their firmware, to the latest version of their Intel Management Engine firmware, and we noticed that the timing behaviors are still there, and even other devices have some non-constant-time behavior for RSA computation, for ECDSA computation. And most importantly, and this is what we're going to talk about for the rest of the time, the Intel fTPM and ST Microelectronics devices actually have vulnerable ECDSA implementations. There is a more in-depth analysis of the rest, like RSA et cetera, in the paper, but to focus on ECDSA: this time we programmed the device with a known key instead of using a key generated inside the device, and we unblinded the nonces used for the signatures, and we noticed that there is a direct correlation between every additional 4 bits of leading zero bits in the nonce and the timing of this ECDSA computation. And the Intel fTPM supports ECDSA, EC-Schnorr, and the BN-256 curve, and they were all vulnerable in a similar fashion, just with different timing distributions. And then we got to this one. This is the timing distribution we got from running ECDSA on the ST Microelectronics TPM device. And at first sight, when I ran this, I didn't pay enough attention, and I thought it was constant time. It looks like a normal Gaussian distribution, but then later, the next day, I looked at it more closely. I'm like, okay, it seems like the left side of the plot is less steep, so there is actually a timing behavior here. This is not actually a balanced Gaussian distribution. So similarly, I programmed the device with a known key and then unblinded the nonces, and I noticed that there is actually a direct correlation between the time and every additional leading zero bit in the nonce. For every short nonce, we get a faster time.
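The filtering step that follows from this leak can be sketched with a toy timing model. All the numbers here are invented for illustration (a linear time decrease per leading zero bit, plus uniform noise); the point is that a timing threshold selects exactly the signatures whose nonces are guaranteed to be short:

```python
import random

rng = random.Random(1)

def toy_sign_time(nonce: int, bits: int = 256) -> float:
    # hypothetical model: fewer nonce bits -> fewer iterations -> faster
    lzb = bits - nonce.bit_length()
    return 1000.0 - 5.0 * lzb + rng.uniform(0.0, 2.0)  # arbitrary units

samples = []
for _ in range(50_000):
    k = rng.getrandbits(256)
    samples.append((k, toy_sign_time(k)))

# Template: anything below this ran fast enough that the nonce must have
# had at least 8 leading zero bits (with this model's noise bound of 2.0).
THRESHOLD = 1000.0 - 5.0 * 8 + 2.0
filtered = [k for k, t in samples if t < THRESHOLD]
```

In the real attack the nonces are of course unknown; the template is built once from a device with a known key, and then only the timing threshold is applied to signatures from the victim key.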
So if I compute an ECDSA signature, and the nonce used for that signature is short, the computation is going to be much faster. Based on this, now that we have two vulnerabilities, we created an attack. Basically, we have a template: we know at what threshold the signatures are going to be biased, how many leading zero bits they're going to have. And once we have the template and signatures from the device, using the template and the timing samples we have collected, we can filter the signatures that are biased, the ones that were generated with biased nonces. Then we applied a standard lattice-based attack to recover the key from these devices. In the final attack we actually just asked the TPM device to generate the key, instead of programming it with a known key. As for how the lattice attack works, I'm not going to go into the details of what lattices are, but the general overview is: we rewrite the ECDSA equation so that the public parameters are on one side and the unknown parameters are on the other side. Then we get this nice, small linear system of equations, and we can construct a hidden number problem instance from it, because we know that all the nonces k_i are smaller than some size, since we can tell when a signature was generated with a small nonce. This is the standard hidden number problem, which was introduced many years ago and has been studied by many people in the side-channel community, and in the theory community as well. And yeah, we constructed a lattice like this using the public parameters, and we applied the common LLL and BKZ algorithms to actually extract the keys from these devices. We did some analysis of how efficient these attacks are on these devices. For Intel fTPM, we did the analysis with three different threat models.
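In the usual notation the rewriting goes like this (h_i is the message hash, d the private key, n the group order; this is the standard formulation, not a slide from the talk):

```latex
s_i = k_i^{-1}\left(h_i + r_i\, d\right) \bmod n
\quad\Longrightarrow\quad
k_i \equiv s_i^{-1} h_i + s_i^{-1} r_i\, d \pmod{n}.
```

Setting $t_i = s_i^{-1} r_i$ and $a_i = s_i^{-1} h_i$, each filtered signature gives $k_i - t_i\,d - a_i \equiv 0 \pmod{n}$ with $0 < k_i < 2^{\ell-b}$ when the nonce is known to have $b$ leading zero bits. That is precisely a hidden number problem instance, which LLL/BKZ can solve for $d$ once enough equations have been collected.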
Well, for a TPM, even a system adversary, even a physical adversary, is part of the threat model, but we wanted to see what the impact of such a timing behavior is from user space, and from the network. Here on the plot you can see some of the results for the system adversary: for instance, we can recover the key with a lattice dimension of 80, or in that range of lattice dimensions. So on Intel fTPM we could basically recover the private key in three or four minutes, and that time is just the time to collect the signatures; the lattice computation takes only a few seconds because it's a small lattice. For STMicroelectronics, which is a Common Criteria certified device and is supposed to be protected against timing attacks and side-channel attacks, we could recover the key in 80 minutes, which, again, is the time to collect that amount of signatures, 40,000 signatures. With that, we moved on to what else can be done with these devices. Remote timing attacks were demonstrated more than 15 years ago; there are various works on remote timing attacks on RSA and ECDSA. And TPMs are very close to some of the smart cards that used to have timing issues. But the difference here is that if I connect a TPM device to a network, which is a very common scenario, and it has a timing vulnerability, it's going to be highly exploitable when exposed to the network, because this device runs at a very low frequency. So even a timing distribution with lots of network noise can still be observed by a remote adversary. We came up with a case study for this. We looked at some applications, and one application that actually has a default configuration to use the TPM is the strongSwan VPN solution. So we used the configuration that is provided by the software, and we configured the server to use its built-in Intel fTPM to do the authentication with a client.
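The template filtering that feeds the lattice attack can be sketched like this: collect (or here, simulate) timing samples, then keep only the fastest fraction, which should be enriched in short nonces. The timing model below is invented for illustration; the real thresholds come from profiling the device:

```python
import random

random.seed(1)

def sample() -> tuple[int, float]:
    """Assumed model: each leading zero bit of the nonce saves some time."""
    k = random.getrandbits(256)
    lz = 256 - k.bit_length()              # leading zero bits of the nonce
    t = 1000.0 - 5.0 * lz + random.gauss(0.0, 1.0)
    return lz, t

samples = [sample() for _ in range(50_000)]
cutoff = sorted(t for _, t in samples)[len(samples) // 100]  # fastest 1%
kept = [lz for lz, t in samples if t <= cutoff]
# 'kept' is now dominated by signatures with many leading zero bits,
# which is exactly what the lattice attack needs.
```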
So a malicious client that connects to this VPN server is going to, by default, do a Diffie-Hellman key exchange, get a shared secret, and after that it needs to authenticate the server. In this authentication, when the client sends its parameters to be signed by the server, the server does not have the private key in its own memory; the private key only exists in the secure storage of the TPM device. So the VPN server asks the TPM device, okay, sign this message and give me back the response, which is the signature. Then the server provides the signature to the client, and the client can verify whether the server is actually a legitimate server or not. But at the same time, the client can keep doing that until it collects some number of signatures with timing samples. Each time, the client can just drop the connection and initiate another handshake. With that, the client can gather enough timing information to learn a good amount about the private key of the server, which was only ever stored in the TPM device. We did an analysis of this attack, and we actually managed to recover the private key from the server after about 44,000 handshakes; it took us five hours to do a remote timing attack in this very practical scenario. And five hours for a remote attack on a TPM device is something I would say is not a good result in 2020, really. In this case we could basically recover the key with about 60 to 70 percent success probability, for instance with a lattice dimension of 90. With that, I would also like to show a comparison of the different timing distributions we obtained in our work. We built up our attack gradually: we started with the system adversary, where there is no noise and we can just use this to recover the key in a very predictable way.
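The collection loop the client runs is conceptually tiny. Here is a self-contained stand-in: the "server" signs with an invented nonce-dependent delay, and the "client" only ever records wall-clock time plus the signature, exactly the information a remote attacker has (the names, delays, and dummy signature format are all made up; the real attack speaks IKE to strongSwan over the network):

```python
import random
import time

random.seed(2)

def server_sign(msg: bytes) -> tuple[int, int]:
    """Stand-in for TPM-backed signing; the delay depends on the nonce size."""
    k = random.getrandbits(256)
    lz = 256 - k.bit_length()
    time.sleep(0.0002 * max(0, 8 - lz))           # invented delay model
    return (k % 1000, (k * 7 + len(msg)) % 1000)  # dummy "signature"

trace = []
for _ in range(200):                   # the real attack needed ~44,000 handshakes
    start = time.perf_counter()
    sig = server_sign(b"authenticate me")
    trace.append((time.perf_counter() - start, sig))
    # drop the connection, start a fresh handshake, repeat

fastest = sorted(trace, key=lambda x: x[0])[:10]  # candidates for biased nonces
```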
Then, as the measurement gets harder, as we make the scenario stronger, we get some noise. For instance, for the user adversary, you can see in the blue one that there is more noise. We also implemented our own small network application, which doesn't have any handshake, and finally we saw that even with a real application the timing distribution is good enough that we can still recover the key. So we went through responsible disclosure. We first reported our finding to Intel. They acknowledged receipt, and they actually told us the reason this was vulnerable is that they were using an outdated version of their Intel IPP cryptography library. We had actually reported a similar issue about this library to Intel maybe one or two years ago, but this shows that even the vendor sometimes forgets to update their own products that use their own cryptography libraries. A firmware update came out in November, when we released this issue, and if anybody is using ECDSA on an Intel product they really need to update their firmware to avoid this vulnerability. We also reported the STMicroelectronics vulnerability to ST. They were quite surprised that this is even possible; they acknowledged receipt, they acknowledged, okay, fine, you're right, and we had lots of calls and lots of emails to clarify the disclosure process. Surprisingly, they had never had any experience of anybody reporting a vulnerability to them, or of what this disclosure process is for, which maybe shows how good they are in terms of, I don't know, producing products.
So we verified the version in which they patched this issue: they provided us with a new device, and we verified last September that the patched version is now secure against our attack. Since then, two OEM vendors have issued firmware patches for this issue, and ST has also released the list of devices that were vulnerable to this problem. If anybody, I don't know, in industry, in car manufacturing, or any place where there's no update for this firmware, they really need to be cautious and actually go and fix this problem and use an updated version of this device. There is also a challenge here, if anybody is interested: notice that the Infineon TPM device, which is an even more popular dedicated TPM solution, also has a timing variation in its ECDSA computation, but with the tests we did we couldn't find any proper correlation between the nonce and the timing behavior. It could be something that we didn't pay enough attention to. So with that, I would like to take questions. There is the link to the website and the source code, and this paper is also going to be presented at a conference in Boston this upcoming summer. Thank you. Okay, questions. Kenny. Hello, thank you for the great talk. Could you say a little bit about how remote the server or the client was in the remote attack setting? How many hops? Yeah, so we configured a switch and two computers that talk to each other over this normal gigabit switch. By a switch? A single switch? Yeah. Okay, good luck doing it over a WAN. You can try. Yeah. So, I think in cloud environments most of the time the host chip and the TPM chip are separate; what kind of mechanism do we have to make sure that that communication is secure?
So, for that communication, I've seen people do some work on tapping the LPC bus and, for instance, manipulating things there. But in this particular case, where you're attacking the public signatures, we don't care even if that communication is not secure, because the secrets, which are the nonce and the private key, will never even be communicated between the TPM device and the CPU. But in other scenarios, for example if someone tries to send the private key to the TPM device, or uses the TPM device in a way where they program the device, then the communication can be tapped, and that's an attack that has actually been known for a long time. Thank you. So, for your last question: you said there is no correlation between the time and the nonce; did you check if there is any correlation with the inverse of the nonce? We did some checks. For instance, we did a bit-frequency correlation test and a leading-zero-bit test, and we couldn't find anything with those. But the one you suggest could work, that maybe there is an inverse correlation. Another thing we thought is that there could be an encoding before using the nonce: they may encode the nonce with some Booth recoding or a w-NAF representation and then do the computation. But we really just ran out of time and energy to do more analysis. Question to Serge: why would you suggest the inverse of the nonce? That's a good question. Okay, other questions? Maybe I have one. In your discussions with the vendors, did you suggest fixes to them, or did they come up with them and you checked whether they worked? I think in 2018, just a couple of years ago, we shared a tool, MicroWalk, and we analyzed the cryptography library, and we already found a similar issue then, and they already fixed it there. So the only thing they had to do was to update the firmware, which was a surprise for us: it took them 9 months, and they asked us for such a long embargo date.
So we had to do that, and it was a new TPM device with a new laptop, and then we verified it: okay, this is now fixed. Thank you very much then, thanks again. Can't turn it to the side. Okay, our next and last speaker for today is Roberto Avanzi. He's going to talk about memory protection. What I'm talking about today is for general purpose computing, but since I work for an employer that shares my initials, I will have to specify that what we have been doing is on the ARM architecture. So how does the remote work? Is this supposed to work? The other one. This one. So this work has been done by a team at ARM, the team for analysis, research and development in security, together with a host of other people from various institutions and also other companies in our partnership, building on and standing on the shoulders of a few giants that have worked on this area before us. So the topic of my talk is confidentiality of memory contents and, now this is a mouthful, memory integrity violation detection. And why? Why am I about to stand between you and the reception? Well, the reason is that memory contents are important. We have stakeholders that put some of their assets in the memory of our computers, and as soon as you have this, well, you have a security problem, and so the usual cat and mouse game begins. So: RAM can be read relatively easily in software, then use access control. You have cold boot or platform reset attacks, then encrypt things; use ephemeral keys so that things cannot be correlated across boots. You have attacks that can adaptively modify the contents of your memory, then use some freshness, so that when you encrypt the same thing twice you get different things. We have seen these things this morning when Iwata-san gave his fantastic talk. And of course, add integrity to prevent these modifications. And indeed, this is not a new problem at all.
There's a great abundance, a great affluence of commercial and academic solutions, starting from the seminal work on XOM, AEGIS and AEGIS version 2, Bastion, and going to commercial solutions like SGX, SecureBlue++ and many others. So, to be clear: I'm a technical person at ARM and I'm not announcing any product or any feature today. It's not my job, and my employer pays other people handsomely to do this. But of course we would just be a giant bunch of idiots if we were not studying this problem. So I will limit myself to telling you what we have studied to protect memory contents cryptographically. Okay, it's a security problem, so let's talk about threat models, and there is a big misconception at times. People say, well, we have client devices and infra devices, which means everything is the cloud, the edge, the fog. This is the wrong assumption, because memory protection is always needed when the owner of some software or some data does not want the internal state of their stuff to leak or to be tampered with while it is running, and beyond. This might be some software module running on your device handling some DRM scheme, but it could also be your application or virtual machine running in the cloud. And there is something in common between these two scenarios: in both of them, the software runs on somebody else's computer. And in both scenarios, again, an attacker might use software running on the same platform, or some hardware manipulation, to get at your stuff. And even the hardware's owner can be an adversary. Never trust somebody that says: trust us, it is running on my platform and I'm keeping it safe. So what is in the security perimeter and what is not? Well, okay, the CPU is there, fine. The boot ROM, of course it must be there, otherwise it can't configure anything for security. And what else? Well, there are two scenarios, where one splits into two on its own.
The first one is that the memory might be internal, which means it is on the same die as the CPU, or it is package-in-package, in the same package as the CPU masters that need to process the stuff, maybe with some anti-tamper measures. In this case you cannot interpose the memory; you can't easily put something between the CPU and the memory. So you can assume that the memory device is essentially trusted, because it is inside your security perimeter. So the threats are mostly cold boot attacks and platform reset attacks. In a variant of this scenario you might also want to consider things like Rowhammer or fault injection as attacks that can be mounted on your device. Hello. The second one is when the memory is external, which means it could be socketed, or it could be soldered on the motherboard. The Nucleus guys, and of course I have no idea who they are, designed these things where you could place a small board on the motherboard: they have the two chips, soldered on the motherboard, close to the chip. They actually did this on an iPad. Or you can have package-on-package, so you put something on a ball grid array on your motherboard, and it has a socket of its own, and you put something on top. In all these cases you can actually easily interpose the memory, and don't trust people that say this attack is difficult. So the threats here are augmented by actual bus reading, and you can add transaction tampering. But we are lucky, there are existing solutions, and I already mentioned them a couple of slides ago. So: you encrypt all the memory. Okay, you are considering some easy modifications of the memory, some fault injection, Rowhammer? Well, then hash all the memory. And if you want to defend yourself against those attackers that can actively modify things and replay stuff into your memory, then use a hash tree. By the way, it was a very nice thing to see Ralph Merkle here today, so we are celebrating. I did not know.
I did not know he was going to be here, so that's really a coincidence; I didn't know it this morning. So, we are done and you can go to the reception, right? Are we? Are we done? No, actually. And why is that? Well, you know, stuff costs. You develop things, you put things into silicon, you add some hashes and things, and this is going to make the performance slow. And there is a very, very sad truth in our business, because there are two things that nobody expects. Which is the first one? Nobody expects? What? No, the Spanish Inquisition. The second thing that nobody expects is that any piece of technology that kills performance and wastes memory for whatever purpose will ever be widely deployed, unless it is of course some spyware, hardware trojan, or bloatware, including blockchains. I had a much harsher joke here, but the company's legal team told me, no, no, no, don't call them what you were saying, use this word, okay? So we have some nice solutions already on the market for this. For instance, SGX has a very sound cryptographic protection, so, even though we are competitors, I'm actually paying them a compliment here, but there is a 26.7% memory overhead and a 25% performance penalty in our testing. On the other end of the spectrum, there are the Secure Encrypted Virtualization extensions, or something like that, by AMD, which on paper offer you much better performance and no memory overhead, but the encryption is not authenticated and there is no integrity, so it is almost as if you had nothing there. So I decided to try to have something, let's say, ideally with the first one's level of security and the second one's performance impact. And so I started a project at ARM which is called MOPED; it stands for Memory Opacification, this word exists in the Merriam-Webster, Performance Evaluation and Design.
And so, as any project of this kind, it contains a survey of the state of the art, new ideas, because we are an IP company, benchmarking, and selecting the winners. So let's start. The first thing was to collect requirements. We had to consider confidentiality, and for this we want a top-secret security level against various types of adversaries. Why? People say, well, you have live things, you have things in memory and things at rest, and the second one needs better protection. Which is wrong, because an attacker will always try to take the lowest hanging fruit: if you are using a weaker cryptographic protection for some important data while it is in memory, then you grab it from the memory and not from the disk. We want 20 years of classical security at the minimum, and adequate post-quantum resistance if we can do that without spending a penny, which means we want at least 128 bits of classical security and at least 80 bits of post-quantum security, which means 256 bits worth of keys because of the post-quantum part. For integrity we take the Wikipedia definition: it shall be computationally infeasible to corrupt memory by forging the integrity structure, with some caveats on that definition that we can talk about offline. We concluded that we needed 60 bits for the MACs, not 56 as some of our competitors, because that could soon be borderline, and we include 64-bit counters, at least, in their computation. Okay. So our state-of-the-art review led us to select some potential primitives, which are: the AES, of course, which is a standard. I claim it isn't really suitable, because when this great cipher was designed, memory encryption and these types of applications were never in scope.
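The MAC- and counter-sizing argument can be made concrete: a t-bit MAC admits forgery with probability about q * 2^-t after q attempts, and a per-line write counter must never wrap within the platform's lifetime. A back-of-the-envelope check under the talk's numbers (my arithmetic, not a slide from the talk):

```latex
\Pr[\text{forgery after } q \text{ attempts}] \approx q \cdot 2^{-t}:
\qquad t = 56 \Rightarrow \Pr \approx 2^{-26} \text{ for } q = 2^{30},
\qquad t = 60 \Rightarrow \Pr \approx 2^{-30}.
```

For the counters: incremented once per nanosecond, i.e. one write-back per cycle at 1 GHz, a 64-bit counter wraps only after $2^{64}\,\text{ns} \approx 585$ years, so re-keying on counter overflow essentially never happens, whereas a 56-bit counter at the same rate wraps in roughly $2.3$ years; that is the sense in which smaller sizes are "soon borderline" over a 20-year horizon.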
Deoxys, which is a version of AES turned into a tweakable block cipher; I don't have to define this because it was mentioned this morning as well, I'm quite lucky today. And then there is, of course, QARMA, because I designed it, so I wanted to use my own stuff; it was developed for the right sizes, it is public domain, so if you want to use it you don't have to pay anybody, it has a solid theory in my very, very not-so-humble opinion, and it has been well analyzed since its design. Then there are the modes of operation, which basically are: direct encryption, meaning that the obfuscation path goes through the cipher; or what we call one-time-pad encryption, OTP, it's very common in the industrial world to abuse this word, which means that you use the cipher to generate a keystream, a mask, which is XORed onto your secret; and then the various types of hashing and MAC mechanisms. And now, I promised you headaches: a few examples will follow, and there will be a lot of diagrams. I tell you one thing, it is to make you empathize with us, because we implemented all of them and many more, so our headaches were even bigger. So of course we start with GCM, which is basically counter mode followed by a polynomial authenticator in a Galois field; that's why it is Galois/Counter Mode. So we all know how it works: we have a key, and we use it to encrypt an initialization vector, which in this case can be the memory address and the counter put together; then for each block inside the same cache line you concatenate a different number, and you use it to generate, by encryption, different masks, which are then used to get the ciphertext from the plaintext. And then you use a Horner scheme to evaluate a polynomial in a magic value H, using the blocks of the ciphertext as the coefficients of your polynomial: you start with H, you multiply c0 by H, you add c1, you multiply by H, and at the end you encrypt, with the same technique, your value there to get your tag. We all know
this thing, or counter mode. But you can do something finer. This is one of the beautiful ideas of Intel's Memory Encryption Engine, the second time I say something good about them, which is to chop the ciphertext into blocks, multiply each one by a different secret magic value, which is generated randomly at boot, and then add all of these together, which can be done in parallel or pipelined; and then of course you encrypt this, and you get a multilinear universal hash function, encrypted, as your authenticator. Fine. And then we have an even more complicated thing, which I don't have to describe either, thank you Tetsu, because it is OCB. We know it is direct encryption: you see the data obfuscation path, there is some mask before and after, and this is the XEX construction. We know XEX is a very, very fragile construction, but if we limit ourselves to this use, it still stands on its feet; it's inefficient, but it's there. And this is just a limited set of examples of the encryption techniques we can use. Of course, we can also use a tweakable block cipher. We know we can get the ciphertext from the plaintext by using the key, which must be kept secret, but there is another parameter, called the tweak, which can even be manipulated and chosen by an attacker in the security model, and they should not be able to violate the confidentiality of your stuff by manipulating it. And so we get the tweakable codebook mode, and here, as you have seen today, we have only one key; the data obfuscation path goes through the encryption function, but the key is always the same, there is no masking before and after; you take your nonce, concatenate it with different values, and a magic value here for the tag, into the tweak of the encryption. And it is actually much simpler, conceptually at least, to design a good tweakable block cipher; I hope I was able to, otherwise I'm in big trouble. But we can use this in a very nifty way also to
simplify some security proofs: counter-in-tweak encryption. So we use the tweakable block cipher to generate a keystream to XOR the plaintext and get the ciphertext. The nice thing is that we use the same counter here, so when you write to the same cache line you have to change the counter, but that's how you get the freshness. And the cool thing is that you use the address here, so each location in memory has its own function to generate this keystream, and it can be seen as a pseudo-random function, not a pseudo-random permutation, of the nonce. That simplifies the proofs a bit, because you don't have to resort to the switching lemma; you have tighter bounds and things like that. Okay, and of course you can do the same trick: you add a multilinear universal hash function, encrypted, as an authenticator, and you get this. And there are many variants of these too, but these were just a few of them. But these were only the easy half of the story, because integrity is actually much worse. So we have the Merkle tree; we have seen these: you chop the memory into blocks, you hash them, you put the hashes into bigger blocks, you hash these, until you get something that stays at the top, and in order to make sure that all of this can't be manipulated, you store, securely, inside the security perimeter, the tip of this Christmas tree. And then of course we have counter trees. Who knows counter trees? Well, it's a nice structure, because if you look at the Merkle tree there's a problem: you can verify things in parallel, you just have to verify all the hashes, but to update it you have first to hash these, and then you hash again to get these, and then you hash all of these at the top to get the top value, so updates cannot be done in parallel. So then people devised techniques, which we now call counter trees, to be able to also update these structures in parallel. The idea is that you have a region for the live data, your data blocks, and you have a special region for the MACs. So you take your
data, you have a counter associated with this data, each block of data has a counter, and so you put both as inputs to a keyed hash function, and then, bang, you get your tag, and you do that for each block. Then you have a third region of memory, which is the counter tree region, where you actually put the counters. These counters are put in a cache line, and you have a block together with its tag, so instead of being stored somewhere else, you store them all together: you hash the counter of that line together with the counters, which are the data inside the block, in order to get the little tag, and you continue. The nice thing about this structure is that it can not only be verified in a parallel way, like a Merkle tree, but it can also be updated in parallel, and the top counter stays securely on chip. And then you have this monster, which is split counters, which I only put here to show off my prowess with TikZ, and it's the same thing: you have data, you have a keyed hash, and you store the tag here. But what is all this web of things above? Instead of putting 8 counters of 64 bits in a 512-bit line, you put 64 small 6-bit minor counters and one 64-bit major counter. So whenever you write, say, to block 0, the little c0 is updated, but of course c0 is only 6 bits, so after 64 writes, cache evictions, it will overflow, which means that when it overflows you reset all of these to 0, you update the major counter, and then you have to recompute all the tags here. So you say, well, it's a lot of work. Yes, but since the arity of the tree has increased a lot, the tree gets much smaller, verifications get faster, the information about the counters is much more compact, because you store 64 values where usually you store only 8, and so you reduce cache pressure. And so with this nifty trick, which is from a conference in 2006, I don't remember which one, MICRO I think, you get much better performance. And there are other techniques, there are TEC, PAT, and whatever you want to call them, and I even started
talking about variations on the theme of split counters, so don't get me into this. I still have 10 minutes, right? Oh my god. So, too many variants, you say: we have no encryption or encryption only; we have direct or OTP encryption; we have monolithic and split counters; we also have the ability to save space in the MAC region by using, instead of just one MAC per cache line, two MACs combined and hashed together somehow to get one value for 2 cache lines, or for 4 of them; and then we can verify the MACs synchronously with the encryption, or not. You can see there are easily about 240 different combinations; it's an odd count because not everything has every variation. So we selected some, and we had them fight each other on benchmarks. The benchmarks are SPEC 2006 and 2017, on full-system emulators for the A57 and the A75. The A75 emulator is a bit buggy, so some tests just don't run at all, but it's a little closer, let's say, to the Graviton 2 CPUs that Amazon uses; they use the A76, it's a bit faster. And we emulated the crypto hardware by inserting latencies in the data paths of the emulator, and these latencies are real-world latencies, because we have synthesized the cryptographic hardware we are going to use. So here is what you get, for instance, at the lowest protection level, and I'm going to be very quick: if you have just XEX with AES, so similar to what AMD is doing, you get a 23-24% impact here, and around one-point-something percent for the other versions. So this is actually the cheap version, just encrypting things. So let's skip to SPEC 2017 and let's make these things fight. If we are using just simple encryption in XEX mode, we have about 5%. This thing doesn't work anymore. 5%, yes, 5%. If we use QARMA instead of AES-128, we are really at a negligible level of 1.6%. We can use this with freshness and split counters, and we are still there. We add integrity, and we increase the penalty significantly. We start using monolithic counters with direct encryption and no freshness, and then we use actually the full
Santa Barbara that we have in our arsenal, which is also what Intel is using, but tested here on a lot of suites, and we get about a 25% performance penalty. If we use split counters, we go down to about 8.7%. We make the granules bigger to save some memory on the MACs, and we can get to about 10% with double granules, and then it gets a bit uncomfortable if we combine 4 different cache lines into one single MAC. And you see, the nice takeaway here is that if we go from split counters to the same thing, but doing the verification of the MACs asynchronously with the decryption, which might open a very little window for side channels, so one has to be very careful there, we go down to about 7%. And we have more tricks to bring it down to about 5%, actually, but I'm not allowed to talk about this yet; we are filing IP today. So that's actually the takeaway of this slide: with the current state of the art on the market for good protection, so what Intel is providing, we get a 26.7% performance penalty; using split counters instead, we get down to 7.8%. And what about the memory overhead? So there we have the 25% slowdown and a 26.7% memory overhead, and here we get a 7.8% memory overhead. So, the takeaways; this can be my last slide, or not, depending on the questions. We have done the first comprehensive comparison of the field, and a paper is upcoming. We tested, we selected a few; I could have shown you 25 different diagrams and you would have a horrible headache. We have significant improvements with respect to the state of the art, which means, for instance, with something like this, an 8.7% performance penalty and a 7.8% memory overhead, both below the psychological threshold of 10%, so that might make this type of technology acceptable in a wider sense. Sacrificing some of the features does not give a big performance or memory overhead improvement, so it's really better to use the whole banana. What you do get, somehow, is that the memory bandwidth, the actual memory traffic, increases by 83%; we are going to use almost twice
as much memory traffic as before. This is the only price which looks bad, but if you look at it, it translates to less than about a 10% performance penalty, so you can accept this.

So I'm almost ready; I want to tell you one last thing. I want to thank NIST — are there people from NIST here? Yeah. So they rejected Chameleon, and not for the wrong reason: Chameleon was my system for encrypting memory that was proposed to the lightweight standard, and it was rejected because of what I put in the spec, really. But with respect to one year ago, we have a much, much better understanding of memory protection requirements, and so better techniques — so every cloud has a silver lining. Thank you. Actually, I'm sincere: we will submit these techniques to your open call for modes, so it's not easy to get rid of us. But I'm really sincere — your rejection actually prevented all of us from having potentially suboptimal things in standards.

So this is all I wanted to say. Can you do better? We are in New Amsterdam — can you do better? Well then, if you think so, slap your tushies down at your work desks and show us what you can do next year in the old Amsterdam. Thank you very much.

Okay, we have time for a couple of questions, since everybody is hungry and thirsty.

Q: So, was the typo in the spec the Q in Chameleon? — Sorry, what? — The typo in the spec: was it the Q in Chameleon?
A: No, no, no — that was actually intentional; I have much worse things like that, my humor sometimes gets a bit overboard. It was really that I made a mistake in writing down how you compute the tweak separation for the authentication tag, so there was such an easy forgery that I wanted to take a shovel, dig a big hole, and hide myself in there forever. But it is not the first time I have made a mistake in crypto — it happens.

Q: On the slides, you compared the 25% with the 7% from your cryptosystem, but the 25% was on the 512 bits and the 7% was on the 1024 bits, so the comparison was not on the same level of bits.

A: Because — so you can do this, of course, but it is not what is now on the market. So, comparing what we could do — we are actually implementing for silicon; I am not announcing any product feature — but this is what is on the market now: this exists, these all do not, yet this is what we could be aiming for. That is the reason, and that also needs additional techniques to make us capable of combining the MACs without having to do too many re-computations, which are also original ideas. So if Intel wanted to compete, they could offer 13%; we could say they go from 26% memory overhead, nearly halved, to 14%, and they would actually have better performance. But I would prefer to take a little hit in performance and have the nearly halved memory overhead.

Q: I am not sure I understand the logic behind needing such a large MAC. After all, the authentication attack can only really make sense while the machine is powered on, and the lifetime of any machine is usually not decades.

A: Yes, you could use a 32-bit one, but my partner is asking why we can't do a 48-bit one; this is still up for debate. To be honest, I would feel a bit uneasy having 32-bit MACs, but you could have them at the bottom. This is something that is still open, but since I am still not completely sure about the extra security — sorry for this redundant redundancy — at the moment we still test 64 bits. If we agree that having 32-bit MACs at the bottom is okay, then we go
for them.

If there are no further questions, let's thank all the speakers of this session. Tom has a quick announcement.

Just two quick announcements: I literally just found a coat check number, 230, so if you want your coat and didn't take a picture of it, you should come up and get it. And there is now a reception going on the stairs in the smaller breakout room, so please enjoy. Thank you.
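Two back-of-the-envelope calculations behind the Q&A above — the metadata overhead of per-line counters plus shared MACs, and the blind-forgery time for different MAC tag sizes — can be sketched as follows. All parameter values (64-byte cache lines, an 8-bit minor counter for the split-counter case, 10^9 forgery attempts per second) are illustrative assumptions, not the actual design from the talk:

```python
def metadata_overhead(line_bytes: int, counter_bits: int, mac_bits: int,
                      lines_per_mac: int) -> float:
    """Fraction of extra memory needed for a per-line counter plus a MAC
    shared by `lines_per_mac` cache lines."""
    meta_bytes_per_line = counter_bits / 8 + (mac_bits / 8) / lines_per_mac
    return meta_bytes_per_line / line_bytes

# A full 64-bit counter and a 64-bit MAC per 64-byte line: 25% overhead.
naive = metadata_overhead(line_bytes=64, counter_bits=64, mac_bits=64,
                          lines_per_mac=1)

# Split counters (small per-line minor counter, here 8 bits) and one
# 64-bit MAC covering four lines: about 4.7% overhead.
split = metadata_overhead(line_bytes=64, counter_bits=8, mac_bits=64,
                          lines_per_mac=4)


def expected_forgery_seconds(tag_bits: int, attempts_per_second: float) -> float:
    """Expected time for a blind forgery: about 2**tag_bits random attempts
    are needed before a guessed tag verifies."""
    return 2.0 ** tag_bits / attempts_per_second

# At an assumed 1e9 verification attempts per second:
t32 = expected_forgery_seconds(32, 1e9)  # ~4.3 seconds
t48 = expected_forgery_seconds(48, 1e9)  # ~3.3 days
t64 = expected_forgery_seconds(64, 1e9)  # ~585 years
```

The forgery arithmetic illustrates the tradeoff discussed in the last answer: 32-bit tags fall to online guessing within seconds at high attempt rates, 48-bit tags hold for days, and 64-bit tags for centuries — which is why small tags are only comfortable at the bottom of an integrity tree, where a larger tag higher up limits the damage.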