Memsad, that's the title of the next talk: why clearing memory is hard. Ilja van Sprundel is a security researcher who loves to find out new things, and he found out that it's quite hard to get rid of sensitive content in memory. So today he's going to give us an overview. Please give a warm round of applause to Ilja van Sprundel. Okay, perfect. Yeah, so as the Herald just explained, my presentation is called Memsad: why clearing memory is hard. Before I dive into that, once upon a time that was me, a lot more hair, a lot less fat. This is my 17th Congress in a row. I spoke here a number of times before; I haven't added 35 in there yet, but obviously that should be in there as well. I work for a company called IOActive. I am the director of penetration testing, but that really just means that I lead teams of pentesters, no double entendre there. Obviously we're always looking for good security people, so if you're interested come talk to me afterwards. I like looking at low-level stuff, kernels, drivers, hypervisors, that type of stuff, and I enjoy reading code. Okay, enough about me. What's the kind of audience that I think would enjoy this? Pretty much all-around: security people, crypto people, if you like code review, if you like compiler stuff, and if you're just generally curious about technology, I think you might enjoy this. In terms of the knowledge required, the first half of these slides is relatively basic, and so if you have a basic technology understanding, you should be able to understand most of the first half. If you have some C background that would be nice, and then as I move forward past the first half, things become a bit more advanced. But if you only grasp the first half, I think that will still be useful, right? So what is this talk about? 
Basically it's one very simple, easy to explain crypto implementation problem, and the reason I've dedicated an entire talk to it is because while the problem is easy, the solution is not. There's a lot of moving parts, a lot of subtlety, a lot of nuance, and I'll get into that in a little bit. Now I can hear some of you thinking, well, Ilja, WTF, in 2018, why the hell are you talking about this? This is very, very, very well known. To them I say, well, because this stuff is still everywhere, and I will show that in the slides, I will show this with data, I will show this with bugs. But the driver of this talk, the reason why I started making this presentation, is that this year alone I did engagements for three different customers on three entirely different software projects where they all had this exact same type of bug. And, you know, you tell them about the bug, and the customer comes back and says, okay, well, yeah, that's great, we understand. Now tell us how to fix this in a portable way, and that is not very easy. The other thing is that even though this problem is sort of known conceptually, like, you know, people are kind of blasé about it, practically not many people understand how pervasive this problem is, how realistic it is. It isn't just like, oh, well, the compiler might do this. No, the compiler will do this, and it does it everywhere, and these bugs do show up everywhere, because it's hard to tell from the code that it is there. If you look at the binary, if you look at what the compiler emits, you see that it's there. And then the third is, given that one of the themes of the Congress this year is Foundations, to talk about, you know, things that aren't necessarily new but sort of try and help bring the subject to the next generation, I think this fits in perfectly with the concept of Foundations. Right, so before I dive in, there's a couple of people, actually a long list of people, that sort of helped me out. As I said, the problem is well known. 
It's been well known for at least 20 or 30 years, and so many, many, many people have published papers and presentations about this, and I don't know all of them personally, and I wish I could include them all, but the people I've included in here are sort of the ones one or two steps away from me that have had some kind of impact on these slides. Some of you are sitting in this audience, and your help has been appreciated. Okay, so let's actually start. Now let's say you're going to write some piece of code, and it's going to be doing something, and it's going to be handling sensitive data, be that keys or decrypted plaintext or session tokens or passwords or password hashes or anything that could be considered sensitive. Now, if you're a smart, security-conscious person, the moment you are done with that sensitive data, you want to dispose of it, you want to purge it from memory. Now why do you want to do this? Well, because otherwise, if there's some kind of info leak that is discovered later on, then whatever secrets are lingering in memory could be exposed through that info leak, and all of a sudden your tokens or your keys are leaked out. That may sound like a stretch, but things like Heartbleed happen. This is very practical, this can really happen, and if you think and make the step where you say, okay, I need to dispose of sensitive material once I'm done with it, that's really big. Most software that deals with sensitive material does not do this. So if you make the step of thinking I need to purge this, you're ahead of the curve. So now concretely, it would look something like this. 
This is some sample code I have that basically generates a little key. It's a function and you give it a key pointer, and this thing declares a local variable that's 32 bytes, goes and reads a bunch of random bits, puts them in k, copies k to key, and then before it returns, because k is about to go out of scope and it contains sensitive key material, you go and say memset, and then you clear the thing, and then you return. Perfect. You compile it, you add a main, you run it, and it does exactly what it's supposed to do. You look at the assembly and it's all perfect. The problem there is that's not release-build code, right? When you sort of make code ready to be released, you tell the compiler that it should enable the optimizer, right? You'll give it -Os or -O2, those are the most common ones. Sometimes you see -O3 when people want to live on the edge, but usually -O2 or -Os. Now if you look at the assembly again, you get a whole different picture of what's going on, and I want to illustrate this. There's a website called Compiler Explorer which is beautiful and integrates a whole bunch of compilers, and on the left it shows you the C code and on the right it shows you the assembly, and it's color-based, so it's easy to make connections. So let's take our little example: on the left we see the generate key function and on the right we see the assembly, and sure enough, if you follow the colors left and right, you can see which C code translates to which assembly, right? They make it a little bit easier to see: that memset clearly gets translated to assembly when you do -O0, which is the default, which is what you would do if you're developing code and you want to debug this stuff, right? Now once you're done developing and you're about to ship this thing, you do -O1 for example, and the optimizer kicks in, and all of a sudden your assembly looks a whole lot different. 
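The pattern being described looks roughly like this. This is a minimal sketch, not the exact code from the slides; the function and variable names, and the stubbed-out random source, are my own:

```c
#include <assert.h>
#include <string.h>

#define KEYLEN 32

/* Stand-in for reading from /dev/urandom or similar, so the
 * sketch is self-contained and deterministic. */
static void get_random_bytes(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i * 77u + 13u);
}

void generate_key(unsigned char *key) {
    unsigned char k[KEYLEN];          /* local, sensitive */
    get_random_bytes(k, sizeof k);
    memcpy(key, k, sizeof k);
    /* k is about to go out of scope and is never read again, so at
     * -O1 and up the compiler may treat this as a dead store and
     * eliminate it entirely. */
    memset(k, 0, sizeof k);
}
```

Compiled at -O0 the final memset shows up in the assembly; with the optimizer on, it is a candidate for dead store elimination, which is exactly the problem the talk is about.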
You'll notice it's shorter, you'll notice that all of a sudden the color of your memset changed: whereas at -O0 it was sort of reddish, all of a sudden it became white, and it's nowhere to be found in your assembly, right? It just does not show up. That's a problem. Okay, well, what happened, right? Let's, yeah, I stole that. So what happened is a thing called dead store optimization, or dead store elimination. Basically with that memset at the end, what you're doing is you are writing into a buffer that is never, ever going to get used again, and an optimizing compiler looks at that and says, hey, you know, I can just take that memset out, and I just saved you a couple of cycles and you have a smaller binary. Huge win. And because it doesn't really change what the program does, it's fully compliant with all the relevant language standards, right? So that, in a nutshell, is our problem. And so one of the things I wanted to do is look at all the common compilers and see for which of these I can get it to effectively, practically optimize out a memset like this. With some of them it was easy, some of them were harder, I had to fiddle around with it. For some a straight-up memset works, for others I had to kind of twiddle and make a for loop or, you know, kind of jump around a bit. But essentially this is a list of ten compilers I tested. I tried to get my hands on the IBM compiler, but I don't have $20,000, so I couldn't do that. But these are the ones that I did test. The first four or five you will know: GCC and Clang and the Intel compiler and the Microsoft compiler. They're all also on Compiler Explorer, so it was easy to test those, and it was very easy to get those to optimize out memsets. 
And then I moved on and downloaded a bunch of others, you know, the Sun Studio compiler and the Embarcadero C++ Builder and the ARM compiler and a bunch of others, and out of these ten I was able to get eight to optimize it out, right? 80% of the most common compilers do this in a practical sense, so it isn't just a theoretical thing, this really happens. A funny note: I tried really hard to get the PGI compiler to do it. In fact it has a switch for dead store elimination, and of course I played with it and I tried it, and I spent over an hour trying to get it to optimize out my memset. Goddamn thing wouldn't move. I don't know what it's doing there, I couldn't get it to do anything. But basically most compilers, if you ask them to do optimization, will gladly optimize out a lot of memsets. Right, so the next question is how common is it to actually see projects use optimization, right? And this sort of stems from a conversation I had earlier this year with a couple of colleagues, where a bunch of people said, well, you know, I don't see -O2 or -O1 or -Os all that often, I don't think optimization is all that common. And so I started looking around and I said, okay, well, where can I get some data? The first thing was, I said okay, well, I can go to opensource.apple.com, and that lists about 200 projects, and I'll just go through all their makefiles and look for -O2 or -Os and so on. And that got me to about 100 out of 200, and then I realized that they actually don't use makefiles, they have a really bizarre build system, and that build system by default uses -Os. So even though it says 100 out of 200, it's probably closer to 200 out of 200. And then I had a whole list of programs I wanted to test in, like, FreeBSD and a bunch of Linux distros, but that's pretty boring and I ran out of time, so I kind of stopped there. But these numbers should be good enough. 
In addition, if you look at the common IDEs, in particular Visual Studio and Xcode, when you tell them to build a project in release mode, Visual Studio by default does -O2 and Xcode by default does -Os, right? So the fact that these tools by default give you optimization should make you confident enough in knowing that, yes, in fact, optimization is incredibly common in release builds. It isn't everywhere, but it is almost everywhere, right? So now that we know the problem, and now that we know it isn't just theoretical, that it's practical, that in fact it does occur very, very often and with most compilers, basically that it's a real problem, how do we fix this, right? And this is sort of where things get difficult. There are many sort of solutions, and nothing is portable, right? It's sort of the, okay, well, this solution works if you use this compiler, and this solution works if you use this libc, and this solution works if you use this OS, and this solution works with this version of the language spec, and this solution works if you have this particular executable file format, right? So, before I dive into any of those, let's first talk about the elephant in the room: don't just roll your own. I've seen people do this where they go like, well, I'll fight with the compiler, I know what I'm doing, and they'll just kind of Leeroy Jenkins this and they'll totally screw it up and come up with some really stupid idea. One of the ones I heard was like, well, I'll just do I/O with the buffer and then it's cool. Yeah, you could do that, but then you're doing I/O, right? For no reason. So don't just roll your own. You're going to come up with a solution that's probably stupid, you're going to look really stupid, and it'll be one of these things where your bad solution might work for this particular version of the compiler. 
But if you don't understand the concepts behind it, then chances are the next version of the compiler, one that is slightly smarter, will just bypass whatever you implemented. Or if you want to roll your own, at least listen to the advice that I'll be giving out in the next couple of slides and base your solution on at least some of it, right? So with that, let's move on to actual solutions. The first one is a libc function called explicit_bzero. This is not part of any standard, as far as I can tell, at the present time, but it was sort of concocted in May 2014 by the OpenBSD guys. If you'll note the date, it's pretty close to when Heartbleed happened, it's a few months later. I think that may have some relation. Anyway, this function basically does a bzero but explicitly guarantees that it does not get optimized out. And what that means is that how it does that is no longer your problem, it's the libc's problem, because they've made the guarantee, so now it's on them, right? So that's really nice. OpenBSD did it first, and then NetBSD said, you know, that's a great idea and we're going to steal it, but we're going to rename it though. So they changed the name to explicit_memset, but it's essentially the same thing. And then about two years ago FreeBSD came up with sort of the same thing, and then almost two years ago the glibc guys came up with this too, and dietlibc supports this too. OS X, however, does not support it. So if you're limited to those platforms, explicit_bzero is a perfect solution. Similarly, if you are developing for the Windows world, there is an API called SecureZeroMemory, which basically is Microsoft saying, we guarantee that this thing doesn't get optimized out, and if you want to securely clear sensitive material, just use this API. MSDN says it will ensure that your data is overwritten promptly. And this is one of the cases where Microsoft was ahead of the curve by like 15 years. 
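Using explicit_bzero looks like this. A minimal sketch assuming a libc that provides it (OpenBSD, FreeBSD, or glibc 2.25 and later, where `_DEFAULT_SOURCE` exposes it); on NetBSD you'd call explicit_memset, and on Windows SecureZeroMemory, instead:

```c
#define _DEFAULT_SOURCE   /* glibc: expose explicit_bzero() */
#include <assert.h>
#include <string.h>

/* Wipe a sensitive buffer. The libc guarantees this call is not
 * optimized out, so dead store elimination is the libc's problem,
 * not ours. */
void wipe_password(char *pw, size_t len) {
    explicit_bzero(pw, len);
}
```

The nice property is exactly the one from the talk: the guarantee lives in the libc, so you don't have to reason about what your compiler version does.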
They've had this thing since the early 2000s. It was in XP and it was in Windows 2003. Both operating systems are no longer supported, but the API is. Okay, so next there's a function called memset_s, and it guarantees it doesn't get optimized out. And it's guaranteed by spec, it's guaranteed by the language spec. It is standardized, it is in C11, it is wonderful, it is great. Except it's not great. Because even though it's in the standard and it's there, it's in what's called the optional Annex K. And if you read the spec, and it's a lot, it's like pages and pages of boring crap, but if you end up reading Annex K, it says optional extension. What does optional extension mean? It means you can be entirely C11 compliant and not offer memset_s. So it's kind of this, as Reverend Lovejoy would say, yes with an if, no with a but. So if it's there, great. If it's not, it has the potential of being this great portable solution and then it isn't. Right. And then of course the sort of obvious choice, but somehow a lot of people seem to miss this: if you end up doing something with sensitive material, chances are you're using a crypto library, and if you're using a crypto library, chances are the crypto library offers you an API to do secure memory clearing. And so I listed the common ones. If you're using OpenSSL, there's OPENSSL_cleanse, and OpenSSL guarantees it doesn't get optimized out. If you use GnuTLS, there's gnutls_memset. Same thing: GnuTLS guarantees it does not get optimized out. And if you're using libsodium, which is one of the newer ones, they have sodium_memzero. Same thing, they guarantee it doesn't get optimized out. And I'll get to that in a minute. So basically up until here I've sort of given you a list of, okay, well, here are specific API functions you can call; if you're using this library or this OS, use this. The next solutions are sort of the, okay, well, what if you can't rely on the APIs? 
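The Annex K situation can be handled with a feature-test guard. A hedged sketch: `__STDC_LIB_EXT1__` is how a libc advertises Annex K, and since most libcs (glibc included) don't ship it, the sketch falls back to a volatile byte-wise wipe of the kind discussed later in the talk:

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* ask for Annex K, if available */
#include <assert.h>
#include <string.h>

/* memset_s() is guaranteed by C11 Annex K not to be optimized out,
 * but Annex K is optional: a libc can be fully C11-conformant and
 * not provide it at all. */
void wipe(void *buf, size_t len) {
#ifdef __STDC_LIB_EXT1__
    memset_s(buf, len, 0, len);
#else
    /* Fallback when Annex K is absent: best-effort volatile stores. */
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
#endif
}
```

This is the "yes with an if" in code form: you only get the spec-level guarantee on the `#ifdef` branch.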
Maybe we can get something out of the compiler, right? The first solution is, and this isn't portable, but most compilers have this or something like it, where you can go to the compiler and say, hey, don't use the built-in memset, just use the one from libc. And what that means is you tell the compiler that it shouldn't assume that it knows what memset does. And if you do that, then sure enough, memset won't get optimized out. This is originally GCC-specific, and the Intel compiler supports it too, and then Clang supports it too. And this was true up until Clang 3.7, which is maybe two years old, it's not that old. Clang basically supported -fno-builtin-memset, and then what they did is they kind of dropped it on the floor and the memset got optimized out anyway. So it's kind of annoying. It kind of ruins the whole "use this because it works", since with some versions of Clang it doesn't. And also it's not overly portable, but it's a solution that works. Other things might still get optimized out: if you have some kind of for loop that clears memory, that might still get optimized out. But at least if you use memset and you're using -fno-builtin-memset, then you have a pretty strong guarantee that it shouldn't get optimized out. Another sort of solution is to just not use optimization. That works. You're guaranteed not to get optimized out if you don't use the optimizer. Obviously that isn't perfect. For one, FORTIFY_SOURCE doesn't work if you don't use optimization, so if you want to use FORTIFY_SOURCE, you have to use optimization. The other one, of course, is that, yeah, you've got to change your build environment. Okay. I mean, I guess it's not portable, but then again, most compilers will have some way to tell them to not optimize anything. But obviously the reason you don't want to use this particular solution is because, you know, you don't get the optimizer, so your product will probably be slower. 
Sort of a spin-off of this is that some compilers, in particular the Microsoft one, and then GCC also kind of supports it, let you localize optimizations based on scopes and functions. And so you can say, oh, you know, for this function, do -O0. It's not a commonly used feature. It seems like it might have some side effects, and it doesn't support all the switches that the compiler generally does. I've seen this recommended by a few people. I played around with it, it seems to work, but it doesn't seem to be a commonly adopted way of doing things. The other thing, of course, is that these are pragmas, this is very, very compiler-specific stuff. Another solution is using what's called weak symbols. Anybody familiar with weak symbols? Yeah, a little bit. I'll try to very briefly get into this. So the ELF file format basically is a format that specifies how to have an executable that can run on OSes that support this file format. So if something is compiled, for example, for Linux, it'll be in this particular format. And obviously, as part of the format, you can store symbols for things like functions and variables and so on. Generally a symbol is what's called a strong symbol, but you can mark one as weak, and what weak means is that the symbol may change at runtime. And what that means is that if you declare the symbol of a function as weak, then at compile time the compiler has a very hard time reasoning about what that thing does, because of the sheer fact that you've declared it as weak. And in fact, this particular solution is what OpenBSD uses in their implementation of explicit_bzero. And what I really like about this is the commit message from the OpenBSD guys. They're very pragmatic about this. They say, well, you know, we think our solution is whatever, but it's not foolproof. There are still ways to defeat this, and they list a bunch of ways to do this. 
In particular, well, the compiler could emit runtime code that checks what this thing is at runtime right before it's called, and then you could still optimize it out if the thing matches or doesn't match. But then they go on and say, well, in the foreseeable future we don't think that's going to happen, but it's possible that at some point down the road this might happen. And so I like that way of reasoning about this, where the solution's pretty clever but it's not foolproof. It may at some point in the future break, but at the present time it's a fairly good solution, I think. Right, so another solution is to use memory barriers. How many people know what a memory barrier is or what it does? About the same number of hands. Let me try to very briefly explain what a memory barrier is, and bear with me here, I'm going to oversimplify it because it's not a particularly simple concept if you've never heard of it. Let's say you have a piece of code with two global variables A and B, and you assign values: you say A equals something and B equals something. And there's no relation between A and B. What that means is both the compiler and the hardware, because there is no relation, are allowed to reorder it, so B can be assigned first and A can be assigned later. Because there's no correlation, that's perfectly valid. Now let's say you have a second thread somewhere, and your second thread says, okay, you know, while not B, spin, and then once B is set, you use A, right? This is sort of where you're basically waiting for something to be set, and the idea is that you wrote your code so that B is set after A is set. And that seems logical, and that would work, except the compiler and hardware don't know there's a relation between your loop on one end and the assignments on the other, and if either the hardware or the compiler reorders it and sets B before it sets A, really, really nasty things happen. 
And this has been the source of numerous security bugs, very subtle stuff, the KASAN and TSAN kind of stuff we've seen in the Linux kernel the last couple of years; a bunch of that is related to these kinds of bugs. And so the way that you fix this is you introduce what's called a memory barrier. That is basically, when you write your code and you say A equals something, then before you say B equals something, you say, in the middle, memory barrier, and then you say B equals something. What that means is it gives a signal to the hardware and to the compiler, and it says: whatever happens before this and after this, you are not allowed to reorder it. There is a correlation here that I know about and that you're not aware of, so don't reorder it. And I hope I explained that well. This usually takes a lot longer to explain, but I hope I got the message across well enough to give you an idea of what a memory barrier is. Now the cool thing about memory barriers is that it's a way for a programmer to tell the hardware or the compiler: I know something about this memory you don't, stay away, don't touch. And, I mean, it works for reordering, but it also works really well to not get something optimized out, right? The idea is you basically just do your memset, and then on the thing you memset, you do a memory barrier, and that tells the compiler not to optimize it out. And I know the concept sounds complicated, and I've oversimplified this because it's a relatively complicated subject, but it's pretty clever and it works really well. This is used by dietlibc, and it's used by glibc, and nginx recently had a fix where they have their own explicit memzero, and it also uses a memory barrier. So this is a tried and tested concept, and it works. So those are kind of the solutions that are known, that work, and that have been tried and tested by various fairly well-known pieces of software. 
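The memset-plus-barrier idea can be sketched like this. This is a hedged sketch using GCC/Clang inline-assembly syntax (MSVC would need a different construct), in the same spirit as the glibc and nginx fixes mentioned above; the function name is mine:

```c
#include <assert.h>
#include <string.h>

/* memset followed by a compiler barrier: the empty asm statement
 * claims to read the buffer and clobber memory, so the compiler
 * cannot prove the preceding stores are dead and must keep them. */
void barrier_memzero(void *buf, size_t len) {
    memset(buf, 0, len);
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}
```

The barrier costs essentially nothing at runtime (it emits no instructions); it only constrains what the optimizer is allowed to assume about the buffer.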
If none of those things, somehow, you're in an environment where none of this is available to you, or it's not portable enough, and you're looking for a solution that works everywhere, the best you can do is fall back on constructs that are known in the C language, and this is basically the use of the volatile keyword. I call this a fallback. People often go, well, just use volatile and that solves the problem. It turns out optimizers can be very clever and very tricky, and even when you use volatile, there are cases that can be made where, if the optimizer is clever enough, your wipe may still get optimized out. So the volatile solutions are sort of best-effort fallback solutions, and there are two variants of this. One is a volatile pointer write, and that's the fallback solution in libsodium, and the other one is a volatile memset function pointer, which is what OpenSSL uses. And here's what that looks like. This is the libsodium fallback, and volatile is sort of the, you know, you tell the compiler, hey, I know something about this that you don't. Except, and this is where it gets very language-lawyer-y, if you look at the spec, it says something along the lines of "an access to an object", something, something, and it's describing the actual memory as volatile, not just the pointer l-value. And so if the compiler looks at this code and it can trace and prove wherever pnt came from, and if that isn't actually volatile, then this volatile doesn't really mean all that much, and it can still optimize it out. That sounds very theoretical, and I don't know if that actually happens, but a number of people smarter than me, or that know more about these nitty-gritty little C-language things, have told me that, yes, in fact, you can do that if you're a very smart optimizing compiler. 
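The volatile-pointer-write fallback being described looks roughly like this. A hedged sketch in the style of libsodium's fallback, not its exact code; the function name is mine:

```c
#include <assert.h>
#include <string.h>

/* Best-effort fallback: write through a volatile-qualified pointer,
 * byte by byte. Note the caveat from the talk: only the l-value
 * used for the access is volatile, not the object itself, so a
 * sufficiently clever compiler could in theory still prove the
 * stores dead. In practice this works. */
void volatile_memzero(void *buf, size_t len) {
    volatile unsigned char *volatile p =
        (volatile unsigned char *volatile)buf;
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
}
```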
The fact that it's the fallback solution for libsodium and a few others leads me to believe that it probably doesn't, but it could. Right, and so this is the solution that OpenSSL uses, which also uses volatile, but doesn't do a pointer write; instead it creates a volatile function pointer that points to memset. And it gives you more or less the same concept as with the weak symbols: the idea being that your volatile function pointer can change at any time without the compiler knowing about it. And that seems like a pretty good solution, except one way of, in theory, getting around this is if the compiler emits runtime code so that right before the function pointer gets called, it captures it and goes: is it memset or is it something else? If it's something else, then we call it; if it's memset, we just return, and then you've optimized it out at runtime and saved a few cycles. In theory the compiler is allowed to do that and emit code like that. I don't know if that actually happens anywhere, but it's a possibility. So think of these last two solutions as fallbacks: they may not work in theory, but in reality they probably do. Right, so that is sort of the first half of my presentation, and I'm perfectly on time, which is great. So, there isn't one portable solution, and this is why clearing memory is hard. The problem is well understood, but if you're looking for an all-around solution that works everywhere, regardless of compilers and operating systems and so on, it's very hard to have a good solution. And this is what customers have come back to me with and said: give me a portable solution, I need something better than this-or-that. And so the best solution I have for this is sort of, you know, apply all of the above as best as possible. And my initial idea was I'll just write a little function that does this and put it on GitHub and people can use it. 
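For the record, the OpenSSL-style function-pointer variant from a moment ago can be sketched like this. A hedged sketch of the technique, not OpenSSL's exact code; the identifier names are mine:

```c
#include <assert.h>
#include <string.h>

/* A volatile function pointer to memset. Because the pointer's
 * value could change at any time behind the compiler's back, the
 * call through it can't (in practice) be proven dead and removed,
 * much like the weak-symbol trick. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

void fnptr_memzero(void *buf, size_t len) {
    memset_ptr(buf, 0, len);
}
```

As the talk notes, a compiler is in theory allowed to emit a runtime check on the pointer's value and skip the call when it still equals memset; nobody seems to have observed that happening.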
But if you look at libsodium, then, yeah, you see, I mean, I'm not going to click on it now, but that's the link. If you download the slides later, you'll see it points to GitHub and it shows you the actual implementation. libsodium's sodium_memzero is really well written, and it's beautiful, and it has this fairly elegant structure: if this is defined, then do this solution; if you have that particular setup, then do that solution; and it has this for six or seven of the cases I've covered. It's really nice, it's really elegant. If you're looking for inspiration, I point people to libsodium. I think it's a good portable-ish way of solving this problem. Right. Okay. So now we've talked about the problem, we've talked about some solutions. Okay, well, I want detection. When does this really happen? I want to see this, right? And I want compilers to tell me this. Like, why doesn't GCC tell me when it's optimizing something out? If it has security consequences, it should tell me. Why are they not doing this? I don't understand. So I set out and I modified GCC. I looked at the dead store elimination and I came up with this patch. There's a tree-SSA dead store elimination pass, and where it calls delete dead call, I sort of take that out and say: if it's a built-in memset, before you call delete dead call, emit this warning, tell me the file and tell me the line number, and then still optimize it out. And what this means is, every time a memset gets optimized out, GCC now tells me. And this is very interesting, because I not only get, you know, detection for my own code, this is a great way to get really cheap, fast zero-day. And in fact, that's what I did. I downloaded a whole bunch of very well-known open source projects and I ran them through a modified version of GCC, and I came up with a list of things. Oh, awesome, thank you. 
So, I know of this particular problem practically affecting OpenSSL, MIT Kerberos, Heimdal Kerberos, MatrixSSL, PHP, DHCP, BIND, the Squid cache, and the list goes on. I have rsync as well, and there's more. So we know this problem is very widespread. If the stuff we all rely on, the stuff everything is built on, has these problems, that means your code probably has this as well. And of course, I'm just giving names out here, but let's give you guys some zero-day. That's MIT Kerberos: that memset's optimized out. That's PHP: that memset's optimized out. This, I think, is MatrixSSL: that decrypted plaintext gets memset, that gets optimized out, so it lingers around in memory. This is OpenSSL: that memset of crypto extended data gets optimized out. This is nginx: that password, that memzero, gets optimized out. This is BIND and DHCP: those memsets of private key data get optimized out. This is Squid: it goes to LDAP and gets creds, and basically it tries to clear the creds, and that gets optimized out. Yeah, well, I had to play around with PowerPoint a little bit. Same thing, this is a key that gets optimized out. This is rsync: these are stored credentials in a file, they get read into memory, and those memsets get optimized out. That's nine bugs right there. All it took was five lines of code change in GCC. GCC just gave me all these bugs. Thank you. The other thing that was really nice about getting the data back from GCC isn't just that it gave me bugs; it also showed me things that got optimized out that I thought wouldn't. Obviously what I was expecting is a variable that's about to go out of scope and that you memset: that would get optimized out, obviously. But what I also noticed was that there's a common code pattern where you just malloc something, or you declare something on the stack, and the first thing you do is memset to clear the whole thing and then you move on. 
It turns out that in a number of cases that also gets optimized out. The idea is that it only gets optimized out if the compiler can prove that every field in the struct gets filled in. I'm not sure that's entirely true — we were talking about this earlier — but what about things like structure padding, or maybe enums, or unions? I'm not quite sure how that works. I suspect — well, I haven't dug into it, because I wrote this patch yesterday, right? These bugs are fresh. So I don't know exactly how much potential there is here, but it smells like there's room for bugs. I was surprised to see this, and some more research is needed; if anybody wants to, feel free. The other thing I noticed is that the common case, what I was looking for in terms of bugs, was sensitive material — and then I saw a whole bunch of things where non-sensitive material was being memset and then freed, and that memset would get optimized out as well. That struck me as odd at first. But then there's a common coding pattern I've seen where somebody does a malloc and memsets to zero before using it, or memsets to zero right before they free — not because they want to clear sensitive material, but because they want a guarantee that they always start from a clean slate, and so they end up building code that relies on always having a clean slate. And if that memset gets optimized out, those guarantees no longer hold, and code built around "we always have a clean slate when we get a fresh piece of memory" is no longer correct. So I think that coding pattern doesn't jive well with compiler optimization.
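The structure-padding concern can be sketched like this (hypothetical `struct msg` and `init_msg`; the padding layout assumes a typical 64-bit ABI, where about 7 padding bytes sit between `c` and `ll`). Even if every named field is assigned afterwards, only the memset guarantees the padding bytes are cleared:

```c
#include <string.h>

/* Hypothetical struct: on a typical 64-bit ABI, padding bytes sit
 * between 'c' and 'll' to align the long long. */
struct msg {
    char      c;
    long long ll;
};

void init_msg(struct msg *m)
{
    memset(m, 0, sizeof *m);   /* clears named fields AND padding */
    m->c  = 'x';
    m->ll = 42;                /* every named field is assigned — but if
                                * the memset were elided on that basis,
                                * the padding would keep stale memory */
}
```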
Again, this is the sort of realization I made yesterday; I don't have all the facts on this yet, but it seemed interesting and I think there's some room for research here. The other thing I noticed is that close to the memsets that were optimized out, I found other bugs — and it made me think: bad code attracts other bad code. One of them was NULL derefs, and the other was use-after-free, where instead of doing memset-then-free the code was doing free-then-memset, which is obviously a use-after-free. The NULL deref case is where a malloc happens, then the memset to zero happens, and only then there's a check to see whether the allocation is NULL. Obviously, if the value is NULL, that memset would cause a NULL deref — except the memset gets optimized out, so it never triggers the NULL deref. It's a bit of a catch-22, but that construct is clearly broken. So far I've mostly spoken about memset, but really there are a thousand variations that clear memory, and it all comes down to the same thing: you can do a for loop, or use other APIs, or roll your own and do something very exotic. And then there's C++, where there are a gazillion ways of doing it — weird classes with a constructor, inheritance, multiple objects, virtual — it gets really crazy once C++ comes into the mix. It all looks different, but it all does the same thing; it has the same root cause, the same problem. And the thing is, when you look at this from just a code perspective, sometimes the optimizer is smart enough to see it and sometimes it's not — but it could become smart enough in the future. So it's one of these things where, if you're doing a security assessment and you see this in a piece of code, should you report the bug or not?
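Both orderings can be sketched in a few lines (`alloc_broken` and `alloc_fixed` are hypothetical names; only the fixed one should ever be used):

```c
#include <stdlib.h>
#include <string.h>

/* Broken construct: memset before the NULL check. If malloc fails,
 * this dereferences NULL — unless the optimizer deletes the dead
 * store, which hides the crash until a different compiler keeps it. */
unsigned char *alloc_broken(size_t n)
{
    unsigned char *p = malloc(n);
    memset(p, 0, n);          /* possible NULL deref */
    if (p == NULL)
        return NULL;
    return p;
}

/* Fixed ordering: check first, then clear. (The free-then-memset
 * variant mentioned above is the same mistake on the way out:
 * the memset must come BEFORE free, never after.) */
unsigned char *alloc_fixed(size_t n)
{
    unsigned char *p = malloc(n);
    if (p == NULL)
        return NULL;
    memset(p, 0, n);
    return p;
}
```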
I think you should, because even if the compiler doesn't optimize it out today, it may very well optimize it out tomorrow. Right, okay. So far this whole talk has been about C. What if you're not writing C code? What if you're using other, non-native languages — Go, Rust, Objective-C, C#, Java, flavor of the month? I really wanted to spend more time on this, but then my slides would have gotten too long, so I only have one slide on non-C. But I did spend a little bit of time on it. In C# there's something called SecureString, which is supposed to hold a string in a safe way, and the problem I have with SecureString isn't the implementation — it's: how do I get something securely into SecureString, and how do I get it securely out again? In terms of Java, there's a Java crypto guide which recommends not using Strings to hold sensitive material but using a byte array instead — there was some reason behind it that I don't remember. But the idea is basically this: most managed languages don't really offer any decent way to clear memory, or to hold sensitive material in memory without it leaking. And it will leak, and it will happen behind your back, because you won't know it leaked — especially when you're dealing with garbage collection, where something can get reallocated without you ever knowing, and all of a sudden, before you know it, there are five different copies of your key sprayed all over memory. Most of these languages, as far as I can tell, don't have the infrastructure in place to deal with sensitive material. It seems to be entirely missing in a lot of places; in other places it's kind of shoehorned on, or bolted on, with varying degrees of success. I remember there was something for Go, but it was on revision three or four because there was always something wrong with it. So again, I wish I had more time to elaborate on this, but what I
saw is that it's a pretty sad state of affairs in non-C: most people haven't tried, and those who have haven't tried hard enough. So now that I've run through the memset problems and how to clear memory, I want to talk about some related issues. First of all, as I said initially, when people make the step and go "oh, I should clear this memory because it's sensitive", that's huge — because most code doesn't even try. There's an unbelievable amount of code that just keeps keys and sensitive material in memory; it goes out of scope, it never gets cleared, and it ends up lingering on the stack or the heap. Often it gets overwritten fast, but sometimes it can sit there for a very, very long time. The problem is that this is hard to find in any kind of automated fashion, because you're looking for the absence of something, right? So the only way to really find these kinds of bugs is to manually look at the code and go: this is sensitive material, and no effort was made to clear it. A second related issue — and this is really cute, actually — is calling memset with the arguments transposed: the length and the byte you want to memset with are swapped. The zero should be the second argument and the length should be the third. What the transposed call really does is a no-op: it uses len as the fill pattern and the pattern as the length — and in this case the pattern is zero, so it's a memset of zero bytes, which is a no-op. And what's really cool is that you can tell GCC to emit a warning for this. One of the things I wanted to do was run all the same code I tested before with that warning enabled and show you another list of bugs, but I kind of ran out of time. I strongly suspect that if you enable this warning on a whole bunch of well-known open source
code and you run it through, you'll end up with a very similar list of bugs. So, another set of related bugs — related not in the sense of clearing secrets, but in the sense that optimization was involved. That is to say: if the optimizer weren't turned on, the security bug wouldn't have occurred, or would have been less severe. There are three cases of bugs I've run into that I think are somewhat relevant. The first one is what's called pointer overflow. Let's say you have this code: there's a ptr and a len, and the len is untrusted. The idea is that you want to validate that ptr plus len isn't beyond the end of your buffer, but before you do that, you also want to make sure that ptr plus len doesn't overflow, right? So you write code like: if ptr plus len is smaller than ptr, then the pointer overflowed and you bail out. Problem is, according to the C standard, pointer overflow can't happen — it's undefined behavior — and the optimizer sees that and goes: undefined behavior, optimize it out, gone. So your bounds check just got optimized out. That's a relatively common bug to see; if you don't know it's undefined behavior, if you don't know the compiler can optimize it out, you'd just read right over it. But once you know, you start reading code and you see it everywhere. The way to fix this is basically to cast your pointer to an integer type big enough to hold a pointer — then the optimizer can no longer remove the check, and it will be in your binary. So that's the first one. The second one is a lot more subtle. This has to do with a switch-case optimization. When you have a switch in C and translate it to assembly, you could do a one-to-one translation, but if you use the optimizer, one of two things will happen: it'll generate a binary tree — that's generally what you see from the Microsoft compiler — while GCC and Clang, what
they do is they'll create a jump table. What that means is: they look at the value, compare it against a bound, and if it's in range they fetch the value again and use it as an offset into a jump table. That usually isn't a problem — it's an abstraction most people don't care about in most situations — except if you're dealing with a shared-memory trust boundary, because all of a sudden there's a subtle double fetch emitted by the compiler behind your back that doesn't show up in the actual C code. These are situations like a hypervisor trust boundary — very, very strong trust boundaries. That thing is actually a link; once I publish the slides, if you click on it, it goes to a blog post that shows exactly this: a VirtualBox guest-to-host privilege escalation caused by a switch-case jump table optimization. The bug there basically is: the first fetch, when you do the compare, is fine; then two instructions later you fetch again for the jump table. Between the first and second fetch, the guy on the other end of the shared memory can change the value, and all of a sudden you can jump outside of your jump table and cause an arbitrary jump, which obviously is bad. Yeah, I'm not going to cover this — okay, I've got to wrap up; I've got three more slides and then we can get to questions. This one is actually very important. So, after everything I've covered, we're good: we know what the problem is, we know what the solutions are, we know there are real-world bugs, we have a good grasp of it now, right? Okay, turns out I kind of lied to you. This is not the whole problem. If it were the whole problem, that'd be great — we know how to fix that, more or less. It turns out that compiler optimization is really, really clever. It does many, many clever things, they're all very subtle, and a lot of them are architecture-specific. Here's the kind of scenario that can occur: you'll have the optimizer
will do things like: well, you're handling this string of a certain kind — you know what, I'll just shove it into a bunch of registers, it'll be faster. And then it parses something, and all of a sudden the optimizer goes: oh, you don't have enough registers — it's okay, I'll take whatever's in the registers, dump it on the stack, and we'll go from there. And what just happened is: you leaked key material into registers, and then you leaked it onto the stack. This stuff happens throughout, and secrets leak out. It just happens, and it's because of optimization — even if you try really, really hard to do the right thing, the optimizer can still screw you. This problem is echoed in a blog post by Colin Percival — who used to be the FreeBSD security officer and now has, I think, a cloud storage company; a very, very smart security guy — and I'd recommend reading that blog post. The problem is also echoed in the Linux man page for explicit_bzero, which basically says: yes, this is a fundamental problem; we still recommend using explicit_bzero; our hope is that in the future we'll have a way to get compilers not to do this, and we can move on — but at the present time there's no good fix. And so this is the first statement I want to make: at the present time, optimizing compilers and cryptography are mutually exclusive. You can have one or the other; you cannot have both. It does not work. Before I get to my conclusion, I want to rant a bit about optimization — as if I haven't already. I get that optimization is great; it gives you all these things, things get faster. But I have a real problem with optimization, because if you're a developer and you write code and you're pretty smart, you can reason about your code — you wrote it, you know what it does. But once it has gone through an optimization pass, you can no longer
reason about it, because you don't know what the optimizer did — and that, I think, is a fundamental problem. What I want people to think about, whenever they type -O — and I'm not saying you shouldn't use the optimizer, it has many pros — is: don't be blase about it. Before you type -O, really think about what it means, because it will introduce all sorts of things you weren't sure about, or that suddenly change the meaning of something. And what I really want the compiler people, and maybe the language people, to implement is strong accountability and control of the optimizer. That is: if I compile something, I want to be able to go to the compiler and say, hey, before you do anything, give me a detailed list of everything you're about to do in terms of optimization, so I can take that list and look at my code — and with that list and my code, I can once again reason about what the binary is. Without that kind of accountability, you can't reason about your binary. The other thing is control: I want fine-grained control over optimization. What I mean by that is the localized stuff I mentioned before — which the Microsoft compiler, for example, has — where I can say: in this particular scope, don't optimize, or in this particular scope, don't do this particular optimization. I'd like to see something like that. I'll just skip this. Here's my conclusion. We know what the problem is — the original problem — and I have some solutions, which in retrospect are only partial solutions, but they're still kind of solutions. And I also have a call to action, things I think should happen and hopefully will happen at some point in the future. The problem, as I've illustrated, is rampant. Basically, I'd like people to use that GCC patch I've shown, or create a better one, and go find some bugs — or better yet, go fix some bugs. In
terms of compilers, as I just mentioned, what I want is optimization accountability and control. It's kind of the wild wild west in terms of optimization: the compilers just go and do all these things, and we have some flags, but there's not enough control, not enough accountability, not enough transparency — you just don't know what's going on. Compilers do have dump functionality, but it's like finding a needle in a haystack. You want something that's easy to work with, easy to read or easy to parse, and that tells you exactly which optimization steps are being done. Ideally I'd like the language people to get involved and standardize this, because if they do, we can demand it of all the compilers. And then lastly, coming back to my non-C slide: what about the non-C languages — Ruby, Python, Perl, Go, Rust, and so on? It smells bad, it looks bad, especially when there are runtimes involved. This is probably worthy of a presentation of its own, or multiple presentations, and I wish I had done more there. That is essentially it. I hope you enjoyed it.