All right, good evening. Thank you for joining me. I realize there are all these fun parties here at DEF CON that start in the evening, so I'm glad you could make it. I'm going to be talking about memory corruption vulnerabilities: sort of a history of the strategies you can take from a vulnerability to an actual code execution exploit, basically extending the line of research that's been done over the last 20 years, and where we can go from here to make defense a little more straightforward and more robust. So who am I? For my day job, I work at a startup in the San Francisco Bay Area called Skyport Systems. We're doing some pretty cool stuff and we're always looking for talented people, so if you're interested, get in touch. By night, I do research on infosec problems. The usual disclaimer: although they're very supportive, the opinions in this talk are my own and are not necessarily representative of any official position of my employer. So, a brief show of hands before we get started so I can get a feel for the audience. How many of you have written programs in C or C++? Okay, yeah, pretty much everyone. How many of you have written a stack smashing exploit? Okay, about two thirds or so. Return to libc or return oriented programming? Okay, about a third. Advanced return to libc with information disclosure? One person's still got his hand up, without even hesitating. Well done. So I'm going to be assuming roughly intermediate C/C++ knowledge, which it seems like most of you have, so cool. Now, some motivation. One of the things that really bothers me is that software today looks a lot like this: it's a fricking Swiss cheese of vulnerabilities, a lot of them are memory corruptions, and very many of those vulnerabilities are useful for remote code execution exploits. We've been trying to solve this problem for 20 years and it doesn't seem like we're any closer. So I don't know.
The end doesn't seem to be in sight. The second thing is that there are a lot of increasingly better tools now; I believe there was a talk just now about constraint searches over program spaces to find exploitable vulnerabilities, and there are also industrial bug hunting operations, like running American Fuzzy Lop against your program. This feeds into an industrialized exploit development process, where eventually an analyst looks at the crash logs and develops a weaponized exploit for a vulnerability, one that tries to be reliable and to work around whatever anti-exploitation mitigations are in the software. And there's not really much of an economic incentive to let vendors know that you found vulnerabilities, because you can sell these exploits for a lot of money. That gets to my third point of motivation. I don't think bug bounties are a particularly effective process for dealing with this issue, because there are people who will pay a lot more for weaponized exploits than vendors will. State agencies have effectively unlimited budgets to pursue these sorts of things. And I don't really think we want to be in the reactive business of finding an exploit and then effectively blacklisting it. That didn't work out very well for antivirus, and I don't think it's going to work out very well for the exploit market either. So I think fundamentally we should be targeting the supply of exploits rather than the demand; we just can't solve the demand side. And what's the path we should take for that? There's been plenty of research over the years on trying to prevent memory corruption, but almost all of it imposes a rather substantial CPU or memory overhead that hasn't proven to be market acceptable. But we can maybe look at other strategies to make it more difficult for an attacker to achieve remote code execution even in the presence of a memory corruption vulnerability.
And I think it's important to ask why we keep hitting these types of vulnerabilities, this class of attacks against our programs and systems. Is there a fundamental blind spot that programmers are hitting that basically encourages this type of issue manifesting in our software? Ultimately, defenses need to be born in light of attack strategies, so we need to understand where we came from in order to know where we can go from here. So I'm going to take us on a bit of a 20-year journey, from 1995 to 2015, through the major points of attack and defense: code injection exploitation like classic stack smashing; defenses against it, such as non-executable stacks; code reuse exploits like return to libc or, more generally, return oriented programming; and the weaker mitigations we have against those, like the standard address space layout randomization deployed on systems today. Then we'll look over the research of the last five years on advanced attack and defense, and I'll be showing off a fine-grained layout randomization and execute-only memory system I put together, which I think makes things a little more interesting from the defense standpoint. So I think the name of the game, fundamentally, is that we have a lack of memory safety. And when I say memory safety, I mean that we have programming languages that provide abstractions to us as software developers, so that we can focus on the problems we're trying to solve, as opposed to the underlying system implementation. We have this notion of variables that are separate from each other; in the function foo here, A and B are conceptually separate from the perspective of the programmer.
And thus we don't really think about interactions between the two, because we have this implied assumption that the system will not allow reads or writes to one to interfere with the other. But that turns out to be an idealized case, as we know. If we step through this code and look at a simplified version of the stack, then as the program runs we have things like return addresses for called functions, so we can keep track of where we're executing. We allocate local variables on the stack, we call more functions, we copy data in, and we follow this process where the stack is manipulated as these functions run. Eventually we hit return and everything's happy, assuming nobody writes more than 23 bytes into standard input in this program. So let's talk about code injection attacks in general; for those of you who know stack smashing, this is going to be pretty old hat. Go back to the point in the program's execution where we have an unbounded read into this stack buffer. If we write zero to 22 characters plus the null terminator, up to 23 bytes, we're fine; nothing bad happens. This is within the design expectations of the program. If we write a little more than that, say 24 through 28 bytes, we're still kind of okay: we've overwritten part of A, but we haven't catastrophically broken the program in some monumental way. But if we as an attacker choose to write more than that, we can eventually overwrite the return address, and at this point we have seized control of the instruction pointer. So something we could do with this is the classical stack smashing attack of 1996, which was very well documented in "Smashing the Stack for Fun and Profit".
We can load an arbitrary machine code payload onto the stack, and manipulate the return address we've placed in this location, where we control the instruction pointer, to point into that shell code. Now we have seized control of the program, and it's running code that we as an attacker supplied. Game over, bad news. So before I talk about defense, I'm going to briefly diverge into virtual memory and paging on the Intel architecture, just so we have some context on what tools we have in play from a defensive standpoint. It would be really nice if, for example, we couldn't execute code on the stack, but let's look at what our capabilities are so far. So let's say we have a virtual address. As I'm sure many of you know, every program has what it believes to be an exclusive view of the memory address space of the computer. The reality, of course, is that these views are virtualized by the operating system, so when you dereference an address like 0xDEADBEEF in the context of a process, it really goes through an address translation process to an actual physical address, which is controlled by the operating system. If we look at the binary format of this address, in the classical 32-bit Intel architecture it decomposes into three components: two 10-bit fragments and a 12-bit fragment at the end. We have a page directory somewhere in memory; on the Intel architecture, the base pointer of this thing is control register 3 (CR3). We use the first component as an offset into this table to look up a page directory entry, or PDE. This page directory entry contains the physical base address of a page table. We then use the second component of the virtual address to look up an offset in that table, which gives us a page table entry, and the page table entry points to the base of the actual physical page in memory that we care about.
Then we can use the final offset to access whatever byte the user program is interested in. This happens transparently; the memory management unit on the Intel architecture just does it all for you, assuming the operating system has set up the page table entries appropriately. The first 20 bits here are the virtual frame number, for convenience, and that's useful for other things. As for page table entries: since pages are page-aligned, you know that the bottom 12 bits of a base address in a 32-bit architecture are all going to be zero, so you don't actually need to store them. These lower bits are instead used for other capabilities, like permission bits, tracking accessed pages for performance, or, if swapping is enabled, determining whether the page is actually present in memory or has been paged out to disk due to memory pressure. Now, you don't actually do this virtual-to-physical translation walk every time you do a memory access, because it's about three memory accesses for every one your program requests, which would be a lot of overhead. So the translation is cached in a data structure called the translation lookaside buffer, or TLB, which is part of the central processing unit. It takes the virtual frame number, does the translation process to determine where the page is located in memory, and determines the effective permissions: is it supervisor, is it writable, which are the two permission bits I highlighted on the previous slide. It stores those three things together as a TLB entry: the virtual address, the physical address, and the aggregate permissions of that page.
So there's a really awesome pseudonymous team called PaX who looked at this and realized, hey, on the Intel architecture there are actually separate translation lookaside buffers for instructions and data. And as it turns out, these are only filled based on the type of access you make as a program: if you're doing a data access, the data TLB will be filled, but the instruction TLB might not have a mapping for that virtual address. They realized that if they were clever about it and could fault in a controlled way as the operating system, they would be able to emulate the notion of non-executable pages, which was not a capability previously available on the architecture. Their strategy goes like this. They set the supervisor bit on the page table entry, so when the processor's memory management unit tries to access the page, it fails, because it's running in user mode and isn't allowed to do that. This invokes an interrupt, and the operating system's page fault handler takes a look to see what's going on. So say, for example, we have a green instruction pointer on some green page, and we're trying to access this orange page; this is a data access. We have the pseudocode strategy on the right there, which says: if it's a supervisor page and the instruction pointer is on the faulting page, terminate. In this case the instruction pointer is not on the faulting page, so we don't terminate. Instead, what they did was flip the blue bit there to zero, setting it to be a user page, and allow one instruction to proceed in the user program. That creates a TLB entry saying this orange virtual address corresponds to some physical address, and the permission is user.
Plus whatever other flags we don't really care about. Then they would immediately trap again and reset that blue bit back to one, so the page is supervisor again. Now the page table and the TLB have different statuses, and the processor really only cares about the TLB: on subsequent accesses it sees it already has a mapping for this orange page in the data TLB, so it just uses that rather than doing the expensive lookup through the three-level page table hierarchy. If later on an orange instruction pointer tries to access an orange page, there's no ITLB entry, so the processor again goes to the operating system's page fault handler, which can now see that it's a supervisor page and the instruction pointer is on the faulting page, so it just terminates the process: this is a bad memory access violation that we don't want to allow. The whole point is to make sure that no instruction TLB entry permitting instruction fetches is ever created for that virtual address. This was implemented originally as a Linux kernel module, and shortly thereafter, around 2003-2004, Intel and AMD, the major x86 processor manufacturers, extended the memory model to support an explicit NX bit. AMD calls it the NX bit, Intel calls it the XD bit, but it's effectively the exact same thing, implemented the exact same way. So this is in hardware now, and has been for the last 10 years; your system probably supports it. We moved from a situation where user pages were basically wide open, where we had the ability to read and execute pages, or to read, write, and execute them, to something better.
By adding this supervisor-bit-based emulation of non-executable pages, or the full hardware implementation, we gain an additional dimension of control over page permissions and a little more expressive power. We now have a notion of things that are code, or probably code, and things that are almost certainly data that we never want to treat as code. That's important. But obviously the story doesn't end there; we didn't end memory corruption exploitation with PaX or the NX bit. Attackers evolved toward code reuse strategies, the most fundamental of which is return to libc. So here's another view of a stack smash, where the old approach no longer works: when the instruction pointer actually goes into the stack, the memory management unit no longer permits the access. We can't do that anymore as an attacker. But here's an alternative. We can still corrupt a stack buffer, still overflow it, still put arbitrary attacker-controlled values into these critical system paths. So instead of putting in the address of code we've injected on the stack, what we can do is put in the address of, say, libc's system() function, which takes a single parameter that serves as a command line for the system shell. In this case we're asking system to run a bash shell. This view is a little difficult to read because it's a damaged memory space, but from the perspective of system() it's a perfectly reasonable function call: there's some saved return address that we don't care about while we're executing system, and there's a parameter, sitting just past the return address, which is a pointer to the string "/bin/bash", plus whatever local variables get allocated as part of the call. It doesn't really matter from the attacker's standpoint.
You can't actually execute more than one function with this exact approach, but it gives you an idea of how you can begin to do code reuse attacks. The technique was generalized about 10 years later into return oriented programming. The idea here is that instead of calling a full libc function, you look for machine code fragments that are followed by a return instruction; these are called return oriented programming gadgets, and you can compose them however you want to achieve whatever stack manipulations you need. If you look at the bottom one here, we can use it to rebalance the stack past these extra arguments, invoke more than one gadget, and still have a semblance of a correct-looking stack. In 2003, in the realm of trying to deal with return to libc, even though ROP hadn't been fully generalized at the time, the PaX team again started looking at defensive approaches for preventing these code reuse attacks. What they came up with builds on position independent code: we had developed position independent code for libraries and executables, so we no longer need to load them at a fixed address in memory; we can load them arbitrarily. They realized that if we shift the stack around a little, shift the start of our mmap allocations around a little, shift the location of the heap around a little, and load the program's code at arbitrary offsets and in arbitrary order, then we limit the ability of an attacker to know ahead of time where the interesting addresses are. There was a caveat in the original implementation, though: they said they didn't think this was a fully capable defense in the same vein as the paging-based stack execution prevention, but it's still something to work with.
So let's take a look at a couple of ways you can work around ASLR. One is that if you have the ability to disclose memory of a particular library you're interested in, you can recover the offsets of everything you might want as an attacker. If you know that, for example, system lives at offset 23 and printf lives at offset 46 relative to the beginning of libc, and at some point you discover that the randomized virtual address of printf is this, then you as an attacker, knowing the layout of libc, have now also discovered the location of the system function. You can use this to fix up your gadget chain addresses and still achieve the same type of exploitation strategy. A couple of research papers came out in the intervening five years that looked at more sophisticated means of further permuting the address space. One was called address space layout permutation, which basically said: a little randomness is good, so let's add even more randomness. Instead of just working at the level of a library, they worked at the function level or the basic block level. A more recent paper called "Smashing the Gadgets" does register recoloring: two fragments of assembly are effectively equivalent, and you're just doing alpha-equivalent swapping of your register allocations. This is absolutely equivalent code from the standpoint of the processor, which maps them to internal architectural registers anyway. And then there was a really excellent paper from two years ago called "Just-In-Time Code Reuse". They observed that all the existing fine-grained address space layout randomization techniques were actually of no significant value.
The whole point is that you want to reduce the value of a single pointer from the attacker's perspective, because if they get one pointer, they know everything in your library, and that's a bad thing. The promise of fine-grained randomization was: you may learn one address, but it doesn't necessarily teach you anything useful about other addresses in your library or elsewhere in the program space. But what these guys did was really clever. Say they find a single code pointer at address 0xDEADBEEF. They observed a very interesting fact: if they zero out the bottom 12 bits, they know they have a four kilobyte page of code, four kilobytes of code address space. What they then did at runtime was disassemble this page and look for assembly instructions that include absolute jump or call offsets. In this case we have this call to 646166E. They would find some handful of these absolute addresses, each giving them another 4-kilobyte-aligned page, and they could repeat this process recursively, over and over, until they had exhausted all the absolute addresses they could find in the memory address space. In their paper they reported that, per input pointer, they would typically find 200 to 300 distinct 4-kilobyte pages of code through this recursive backtracking process. That's one to two megabytes of machine code. Then they would do a gadget search in real time and compile a high-level payload objective down to the gadgets they discovered, at runtime. So it was game over. They actually implemented a version of this that ran in JavaScript, leveraging an information disclosure vulnerability in a web browser.
This has the ability to wipe out any fine-grained randomization you might be interested in doing. The value of that one pointer is still a lot: one pointer to rule them all. Fine-grained ASLR didn't seem to help this problem at all. But it's still interesting, because when you introduce fine-grained randomization you change the attacker's posture a little: they can no longer do most of their work ahead of time. They can no longer do a gadget search ahead of time and chain together the gadgets they find in some really complex way, because those gadgets are just not going to be at any predictable location or have any predictable value. And that's important, because it gives us as defenders an interesting advantage: if we can maintain that information asymmetry, so the attacker never learns enough, then they can never readjust their gadget chains or dynamically construct new ones in a way that would let them achieve arbitrary code execution. So now a bit of a digression into C++. Imagine we have a very, very simple class hierarchy with a parent class Animal, which has a few virtual functions: feeding it, petting it, and making a sound. And imagine we have two subclasses, Dog and Cat, which might have slightly different implementations of virtual functions like making a sound. Fundamentally, every single one of these virtual function pointers is a single pointer sufficient to kick off a just-in-time return-oriented programming exploration phase.
We want to avoid that, and here's an interesting idea I came up with. I don't know how practical it really is, but it's worth considering: rather than having a fixed virtual function table for these instances of Animal, we expand the size of the table by a security parameter and raise the uncertainty for an attacker. We might have the real functions at some particular offsets. The program doesn't care; the vtable is just an index lookup for it, and if it's three times bigger or ten times bigger, it doesn't really matter. But all the rest of the slots can lead to unmapped memory, where you can't read, write, or execute, and trying any of those will result in an access violation and the program will crash. So this is an idea for a probabilistic approach to dealing with just-in-time return-oriented programming. I didn't take it much beyond prototyping because it has other issues, which I'll discuss later, but it's something to think about. Another alternative is to make the pointer more opaque, not actually useful to the attacker. For example, if the attacker does know that a page contains code, maybe we can stop them at the stage where they disassemble it: to disassemble a page, they need to access it as data, which is fundamentally a different operation than accessing it as code via instruction fetches. This is kind of like what PaX did with PAGEEXEC, just a little bit sideways. So can we do TLB splitting like PaX did to achieve something like this? Well, maybe. There's actually a rootkit from 2005 by Sherri Sparks and Jamie Butler, published in Phrack, which used a very similar trick to hide the contents of the rootkit's code pages from operating system memory scanners. So there's precedent for this sort of obfuscation of code pages.
Unfortunately, you can't do TLB splitting in the style of PaX anymore on any Intel processor produced in the last seven years. They made some fundamental changes to the way they implement their TLBs: there's now a second level that is not agnostic to data versus instructions, so you can't do the same sort of software emulation trick PaX did. But what one hand at Intel taketh away, another giveth back, and we have the option of maybe using extended page tables. These are designed for hypervisors, to accelerate guest-physical to machine address translation. It's almost exactly the same process the operating system does for virtual-to-physical translation, just another layer of it. And as it turns out, for some bizarre, unknown reason, maybe someone from Intel can explain it, they added explicit control over all three permissions, read, write, and execute, in extended page tables. There was a talk last year at Black Hat by Jacob Torrey showing that you can basically use EPT to achieve very much the same things Shadow Walker did. So that's actually really cool. Let me circle back on the necessary-versus-sufficient point. We know that with no ASLR, an attacker can know everything they need a priori to exploit a memory corruption vulnerability; they don't need to do any runtime discovery. With standard ASLR, as deployed in its 12-year-old form on pretty much every system we run these days, an adversary needs to do runtime discovery of a dynamic offset. And if we do fine-grained randomization, they actually need to do a lot of work.
And if we can kill that runtime discovery by making pages executable but non-readable, or by otherwise obfuscating the pointers, though I think execute-only memory is the most compelling method, then we can prevent an attacker from gaining the information they need to dynamically construct one of these exploit chains. But there are ultimately, I think, two reasons why fine-grained address space layout randomization hasn't been widely deployed. If you read the academic research papers on this topic, it's actually almost amusing: nearly everyone talks about CPU overhead and how there's almost none of it in any of these schemes. What they're not saying is that every time you do fine-grained randomization, you give up the ability to share code pages. Losing the advantage of shared libraries is actually a pretty big deal. Look at just libc, which is, let's say, roughly two megabytes of code: across 200 processes, if you're able to share that, it's about 400 megabytes of memory savings, and that's just one library. When you give up sharing wholesale, the memory cost of running a typical system grows astronomically, by two or three orders of magnitude. I think this, beyond the question of what security advantage it actually gives us, is really the main reason we haven't deployed fine-grained ASLR schemes in the real world. But there's another interesting paper from last year called Oxymoron, and they start from the observation that we're able to share libraries at the level of the executable object because we have a notion of position independent code, implemented through layers of indirection: the procedure linkage table and the global offset table.
We can probably do the exact same thing at a smaller level of granularity. Pages on x86 are four kilobytes, so they break up libraries into four-kilobyte chunks and make them position independent not just as a group, but relative to each other. The trick they use is the vestigial remnants of segmentation still available on the 64-bit Intel architecture: the segment selector register in a far call. So you can have this piece of assembly code on the left here, which makes a call through a segment selector register to some fixed offset in the FS segment, and the FS segment can be located at some totally random location in the 64-bit address space. Within it you have the real addresses you're jumping to. This structure on the right, which the Oxymoron authors call the rattle, is process specific, but it's comparatively quite small, maybe a couple tens of pages, while the four-kilobyte chunks of your actual library code can sit at totally random positions in each virtual address space and still be shared across processes in physical memory. So, finally, circling around to the work I've been doing for the last year. Since extended page tables provide a method to extend the memory permission capabilities of the Intel architecture, I grabbed an off-the-shelf hypervisor, Xen, which is very commonly used; 4.4 is what I started with. That version introduced PVH, a sort of paravirtualized-plus-hardware-virtualized mode for memory access. Prior to this there was PV mode, just paravirtualized, which emulated the translations from the physical frames the operating system saw to machine frames. PVH leverages EPT for this task, which is the whole point of EPT, and since EPT exposes these permissions directly, we can modify Linux running under a specially modified Xen to issue a hypercall saying: mark this page as execute-only
in extended page tables for me when we receive an mProtect call and system call in Linux with those permissions requested and then Zen can then catch this this occurrence in its extended page table fault handler and reinject it as an ordinary page fault handler to the Linux operating system and so this basically from that point onward it it looks like just an ordinary access violation that's natively supported to the platform and when it can just be like okay this this program has done some sort of really weird memory access I got this kind of funky looking page fault back from the processor or the operating system or the MMU or whatever whatever it happens to be the operating system doesn't care that it's actually from software and then I can just terminate the program as if it was any other type of page related violation there's a couple of caveats that are mostly related to the use of Zen for this which I go into a lot more detail in my white paper not going to go over here the other component is a very very simple fine grains address base layout randomization pass that I added to LVM so LVM is a fairly long-standing project for doing a modular compiler framework so they have modular front ends for cc++ subjective whatever esoteric language you might be interested in which ultimately gets compiled to an intermediate form and then that intermediate form is then compiled into whatever native machine architecture you might be interested in and so all three of these these zones there's the front end any sort of optimization passes and the code generation pass are all separate and you can plug in plug in whatever you want and so I've added a very very simple code gen pass to 64 bits 32 and 64 bit but realistically only 64 bit Intel architecture and basically all it does is it adds zero to three knob instructions the beginning of every function and every basic block that is call proceeded so if you leak it either a pointer through a V table or if you're able to 
examine the stack in detail, you're still not going to know the exact subdivisions. I chose two bits of entropy because I think it's enough; you could probably get away with one bit, a NOP inserted or not, but I figured, why not two?

So I'm going to try to demo that, and hopefully it will not blow up my system. I have two demos here: one showing off the execute-only pages, and a second showing off this NOP-insertion process for fine-grained address space layout randomization. Let's do the fine-grained ASLR one first, because it's less likely to blow up. I have a very simple C program here which computes factorials; it fits on a single slide with a giant font, very, very straightforward. I run my pass on it and have it spit out the disassembled version of the machine code. If we look at it, this is just the factorial function; there are other functions which I stripped out of the output. You can actually see that a single NOP instruction has been added at the beginning of factorial. This function is very simple, and it doesn't have any function-call return edges, so this is the only place where inserting a NOP is possible. That property lets you minimize the overhead introduced around, say, hot loops, because you know those particular addresses are not going to leak onto the stack where an attacker might be able to examine them. Main looks a little more complicated because it has a bunch of these callq function-call instructions: at the beginning it's got two NOPs that have been inserted, it's got one NOP after the first call to factorial, two NOPs after this call to factorial, one after this one, and three after this one. And the idea is that, if you have execute-only pages, this is actually a sufficient level of fine-grained address space layout randomization to be effective, so you don't really need to go super crazy with things like register re-coloring; you really can use quite simple, rudimentary strategies for this. And if I run it again, we're going to get slightly different output; let's look at fact, since that one's simple. Yeah, so now fact has two NOP instructions at the beginning of it.

All right, the other demo I want to run, at a slight risk of crashing the system. Again, a relatively straightforward program: I have a function to pretty-print a blob of hex, which is not very interesting, and I have a very simple, stupid foo function which calls printf three times and adds two numbers together; it just does something so it doesn't get optimized out. In the main function I call foo, I retrieve the address of that function, and I dump out 76 bytes, which I just happen to know ahead of time is the length of this particular compiled function. Then I mask the address, knocking out the low 12 bits to page-align it, and mprotect a page-sized region with execute-only permission. I try executing foo again to make sure it still works, and then I attempt to hex-dump it again; that should fail, and we should not reach the last print statement. Nice, my system is still responding. So we can see a couple of print statements coming out of foo, we have the machine-code blob coming out (you can see c3, a return instruction, at the very end), we mark it execute-only, we can still run it, which is good, but when we attempt to read it we get a segmentation fault. And the exit status should actually be 139, which is 128 plus signal 11, segmentation fault. Okay, so that's the demo; they didn't blow up, yay, the demo gods are pleased.

All right, a couple of closing thoughts before we finish up; I've got a couple more minutes. The main takeaway is that we should take full advantage of the memory permission model. We
shouldn't arbitrarily restrict ourselves to the combinations Intel has provided as the defaults in the architecture. I think we should begin moving to a mode where constant data is read-only, which we already have; where we maintain our stack, heap, and mapped regions as read-write, treating them as data, not as executable code (unless we're loading a library, in which case you do want it executable, but that involves the loader anyway); and where we shift our library and program code from read-execute to execute-only. There are some issues with doing this, especially around switch statements and a couple of other constructs, because there hasn't been pressure to support non-readable code pages before, but I think we should deprecate the read-execute model, and we should definitely not be using the read-write-execute model. I don't really know what the other combinations might be useful for; whether we keep them or not doesn't much matter.

The other thing is that I think we might want to change a little bit about how we package software distributions. If we can transmit our operating system distributions as bitcode rather than as final machine code, it gives us the opportunity to run, say, a boot-time service which imposes a high-quality fine-grained randomization, so we can generate the final versions of the libraries and executables we're using at boot time. Then we can further take advantage of a trick like Oxymoron's to break them apart into four-kilobyte pages and impose an additional, not quite as strong but still valuable, fine-grained randomization without the gigantic memory costs that ordinary fine-grained ASLR imposes. This is broken up into stages because we ultimately have three different objectives for these three representations. We fundamentally want a deterministic, repeatable, digitally signed (cross-signed, multi-signed) operating system distribution, so we can look at binaries and be confident they came from a particular version of the source code and haven't been subverted. We want a high-quality, unpredictable randomization, but we want to be able to do it in a way that's seeded and reproducible, because if you get a crash dump as a developer, you want some way of making sense of the data you're getting from a user; you want a process you can repeat in a forward direction. And again, there are security implications there, because you want to make sure this randomization service isn't backdoored. If we can add more entropy and more randomness at low cost to us as defenders, while imposing a nice additional barrier for an attacker, then we should do it.

And that's all I've got. I'll be putting code out in about two weeks, because I'm still trying to track down a 10% or so crash bug, but my white paper should be online now, and it has significantly more detail than I can go into in a 45-minute presentation. You can email me here, here's my PGP key, and you can tweet at me. That's all I got.