Hi, thanks. So I'm Laurent, I'm from the University of Cambridge, and I want to talk to you about how we can erase secrets from RAM.

In the rest of the talk, I'll be considering a non-malicious program P which uses some kind of secrets, say passwords or crypto keys, and once the program is done using those secrets, it tries to erase them from RAM. Ideally, what we'd like to guarantee is that if we then give an attacker access to the entire system's volatile memory and CPU state, this attacker is still unable to recover the secrets. In practice this is really challenging, because there's an enormous amount of code involved: it's not just the program P that we're looking at, it's also kernel code, libraries, OS libraries, code running in peripherals, and so on and so forth. So in the rest of this talk, I'm going to relax the threat model a little bit and give the attacker access only to the program's user-space memory and CPU state.

In practice, the main concern people have when it comes to securely deleting data from RAM is compiler optimization, and there's a simple example on this slide. We have a sensitive function which declares a sensitive buffer on the stack, and before the function returns, the programmer calls a zero_mem function, which aims at zeroing that sensitive buffer. The concern is that the compiler might actually remove the call to zero_mem, because as far as the compiler is concerned, the sensitive buffer is going out of scope when the function returns, so it's kind of wasteful to erase this variable, since it's never going to be accessed again in the context of the function. Later in the talk, I'll give you some more examples of problems that occur in practice which are actually not related to compiler optimization.

What we realized early on in this project is that there was no tool to actually help developers assess their programs, so we just
decided to create one. The first approach is static code analysis, which is based on analyzing the source code of the program, and because of this it cannot account for compiler optimizations; neither can it account for memory accesses due to the application binary interface (ABI) or register spills. For those of you who haven't heard the term register spill before, it just refers to the fact that when the compiler runs out of registers, it will spill registers onto the stack, that is, copy them over to the stack temporarily so it can reuse those registers.

So because of the problems of static code analysis, we opted for dynamic code analysis, which runs the actual binary generated by the compiler. In practice that gives us virtually no false positives, at the expense of needing a comprehensive set of unit tests to get good code coverage.

We use a simple, well-known technique called taint tracking. Think of the taint as being a zero or a one: when a memory location is marked with a taint of one, it just means that this memory location contains sensitive data. Then we declare taint sources from which data becomes tainted; typically that would be a file containing sensitive data, say your private key. Every time the program P reads from this file, we taint the memory locations the data is being copied into. Then, as the program continues its execution, we need to be able to propagate the taint accordingly.
So we do this in two ways. First, we do it on assignment: typically, if a tainted variable is copied over to a new variable, this new variable also becomes tainted. We also propagate taint through pointer-arithmetic operations: think of a table lookup where the table itself is not tainted, but the index you use for the lookup is tainted. That's used a lot every time you do format conversions, say between binary format and base64 format. Because we're dealing with cryptography, we also need a way to untaint memory locations that contain the result of functions considered one-way, typically an encryption function or a hash function, so long as the input to the function has high entropy.

We implemented all those ideas in a new tool that we call Secretgrind, which we make available on GitHub, and we've started evaluating three crypto libraries with it: GPG, OpenSSL, and mbedTLS. Now, surprisingly, we haven't found any problems caused by compiler optimizations. To be fair, all those libraries have a hardened version of this zero_mem function, which is implemented precisely to avoid compiler optimizations. If you ask the compiler people, they'll tell you this hardening is kind of a hack and it's actually not bulletproof, but it's reassuring to see that in practice the hack seems to be working at the moment. In fact, most of the problems of data being left in RAM in practice actually boil down to programmer mistakes, for example forgetting to erase a buffer on the stack or on the heap. More interestingly, we found that the I/O APIs tend to do caching, and that leads to certain problems in practice.
So let me give you an example with the GPG program. When it's trying to detect whether a file contains a private key or a public key, it tries to read the first line of the file, which, if you're using the PEM format, will typically contain an ASCII string saying public key or private key. So as a programmer, you'd open a file handle, and then you might call the fread function to read the first line of the file. Alternatively, you might call the mmap function to map the first line of the file into memory. Once you're done looking at the data, you take care of zeroing the buffer.

It turns out that this really simple piece of code actually doesn't erase memory properly. The mmap function works at the page level, so even though you've asked for only the first line of the file to be mapped, you actually get an entire page's worth of potentially sensitive data into your process. And the fread function also does some caching, in the hope that if you call fread again, the data is already available and you don't have to do a syscall.

Another thing we found is a set of functions that are really prone to leaving residual data on the stack. Typically, formatting functions such as the printf and scanf families fall into this category, but more generally, functions that are recursive tend to aggressively spill registers onto the stack. Beyond these functions, most of the data you'll find on the stack is actually there because of the ABI calling conventions and register spills. So here's a challenge, because as a programmer you don't have control over these. However, the compiler knows about the stack layout.
So here's a sweet spot where we could actually get compiler support to help the developer erase the stack, and I'm going to elaborate on this idea in the rest of the talk.

As I suggested, the idea is simple: we'd like the compiler to automatically erase the stack for us on functions that the programmer annotates as being sensitive. We think that this annotation-based mechanism is a simple way for programmers to add this feature to their code, and it's available today. So what we did is implement a compiler plugin in the Clang/LLVM framework, which is a widely used compiler framework.

But before I move on to describing what we've actually implemented, I'd like to give you a feel for the number of problems that arise in practice when you try to implement a solution like this. First of all, there's a large amount of code that is provided by the platform where the code runs, and we can't instrument it at the time we compile our user-space program. Typically, the libc and the loader/linker fall into this category, but you also have code provided by the kernel: a small piece of code called the vDSO, which is mapped into the user-space program as the program starts. Signal handlers can also be problematic: before the kernel jumps into your signal handler, it pushes the current CPU state of your program onto the user-space stack, so if you're in the middle of a decryption routine, what is being pushed onto the user-space stack is probably going to contain a lot of sensitive data. We also need to be careful about registers: for example, the RBP register was originally used to store a frame pointer, so basically an address, but on 64-bit x86 machines it can also be used to store data, so it might contain sensitive data, and we have to be careful about this.
As for the compiler, I've already talked about it a little bit: most of the problems occur because of compiler optimizations. Because of time constraints, I'm not going to go into more detail, but I've put up the slide anyway, so if you're interested you can look it up later. And of course, the programmer, the developer, might also get in the way of proper deletion. Variable-size objects stored on the stack are something we can't really support, because they don't allow the compiler to determine the size of the object at compilation time. There are also some functions that can be problematic to support; for example, the sigaltstack function allows the programmer to change the location of the stack used for signal handlers.

So we've tried to take care of all the problems that can arise in practice in order to implement our solution. The first solution we implemented is a naive one, which works at the function level. The idea is very simple: we instrument every function in the program, even those not marked as sensitive by the programmer, and we erase the stack and the registers used at the time each function returns. This turns out to perform really poorly in practice. On the left side here is the case where you want to support signals, and as you can see, it's almost four times as slow as the original program; if you don't care about signals at all, you're still about twice as slow as the original program, which is fairly poor.

So we looked at another approach. Here the idea is, again, to instrument every function in the program, but this time in order to keep track of how much stack memory is being used at runtime, and we keep this maximum memory usage in a global variable. Then, in functions annotated by the programmer, we erase the stack using the value in the global variable, and we also erase all the platform registers at once in the
annotated function. This turns out to give you a significant boost: you only get about a one percent performance overhead in practice, with some outliers, but I'll ignore those for now.

So can we actually do better? It turns out that we sort of can, with some caveats; that's the third solution. Here the idea is that we leverage the call graph, which we know at compilation time, to compute the maximum stack usage that might ever be reached below a function. Once we know the maximum stack usage, we can just erase it in the function annotated as being sensitive, and for the registers, we erase all the registers that are written to anywhere in the call graph, to be conservative. Arguably this approach is the best in terms of performance, because we are not instrumenting any function besides the sensitive ones, but it comes with two major drawbacks. The first is that it kind of kills the concept of a shared library, because you need to know at compilation time which version of a function, and actually the code of the function, is being called; so this is better suited for statically linked programs or embedded systems. Second, there's a bunch of features we cannot support: typically recursive functions and, more generally, functions that create cycles in the call graph. Even more generally, every feature that leads to a non-deterministic call graph is hard to support, because we can't determine at compilation time what the call graph is going to look like.

So, to conclude: I've presented a new tool, which we hope people will use to check your code.
I've also presented the implementation of a Clang/LLVM plugin to automatically erase the stack and the registers of sensitive functions. What I'd like to point out here is that this plugin is, unfortunately, kind of a hack, in the sense that it's really fragile because of the complexity of platforms and because of the number of components we need to consider for the solution to work reliably. So that kind of raises the question of what the best way forward is. Do we actually need a specific ABI for cryptography? What kind of support do we need from the kernel, from the compiler, from programming languages? These are the sorts of questions we ought to think about if we want to solve this problem. Thank you.

[Session chair] We have time for Dimitri and then one more.

[Audience] Hi, so I'll try to answer the very last question. You mentioned that in the C11 language there is already built-in support for a version of memset that is guaranteed not to be optimized away. The only problem is that it's implemented in possibly only one compiler today, the Mac version of Clang, I think. So what we could do is just press the compiler developers to finally implement the Annex K functions from C11, and then we'll have a secure memset which is not optimized away.

[Laurent] Yeah, I agree, but memset is not the only problem. It doesn't solve the stack problem; it doesn't solve the problem of programmers forgetting to erase a buffer in memory. So that's only one part of the problem, really.

[Audience] Yeah, so I worry that having support from the compiler and the kernel for this is not enough, that there are probably a lot of caches that are not documented, that we don't know about, in the path from the CPU to the memory, through the chipset and so forth. Are you aware of any concrete concerns in that area?
[Laurent] So as I said earlier, this is a challenging research topic, and I've really just taken the first step to lay down some kind of foundation. I'm only looking at user-space programs, and what you're mentioning is something I kind of alluded to in, I think, the second slide, where I said there's an enormous amount of code to be considered; basically what you're describing falls into that category. You need a lot more research to figure out what's happening in the kernel, what's happening maybe on the bus; there's also some caching being done in the RAM itself. So there's a huge amount of code you need to consider, and no, I don't have an answer for this.

[Session chair] Right, let's thank Laurent again.