I'm Xeno Kovah, this is Corey Kallenberg, and we're going to be talking about detecting code integrity attacks today. We work at the MITRE Corporation, which is a non-profit company that runs five federally funded research and development centers for the US government. The main place you've probably heard of MITRE is CVE; we run CVE. Most of our work is not public-facing, so we're happy to be able to talk about this publicly. Like I said, the problem we're going to talk about today is code integrity attacks: knowing when someone is compromising your code and modifying it. The way the security game plays out today is that you've got malicious software and you've got security software, and the security software is supposed to tell you about the presence of the malicious software and hopefully get rid of it. Well, at such time as the security software is actually good enough to get rid of malicious software (I don't think we're really there today), in a future world where we have security software which actually impedes the attacker in some meaningful way, the security software will itself become the target of the malicious software, because the malware needs to keep doing its job. So instead of just generically bypassing signatures, or going after the operating system to compromise the stuff below the security software, attackers can go after the security software directly, and they can use code integrity attacks. That means things like putting an inline jump in some code which jumps out to other code and filters results, as you see in rootkits. It can be things like NOP-ing out checks within code in order to force execution down some path, as you see with software cracks where you want to bypass a key check or key input. And so the point is, the security software will now lie.
The security software, when it's targeted specifically, is going to lie, because it's not really built to handle an attacker at the same privilege level being able to modify it. So what can we do? We can add some additional software to try to deal with this problem. Adding more software doesn't necessarily solve the problem, but we can check the security software. Well, if we do this, then the attacker just plays the same game. Adding more software that's not fundamentally different isn't going to change anything; they'll do the exact same thing against our checking software as well. Our program is called Checkmate, and it checks the security software. But the important thing is that we're adding a self-checking mechanism, so that we can tell you: our software may be forced to lie, and it may say the security software is okay, but then when it checks itself, it's going to say, don't believe me, I'm not actually okay. Now, some security software does incorporate some element of self-checking. But we would argue that the current self-checking mechanisms are pretty naive. When you're doing a self-checking mechanism and you're evaluating it on the system that's being checked, the system that's expected to be compromised, that leads to problems where ultimately it's bypassable. So if we add this sort of self-checking mechanism, as I said, the attacker can again play the exact same game: go down to the self-check software and manipulate that. The point is you can keep playing this game ad infinitum and it's not going to make a difference.
But the key difference with Checkmate, what differentiates it from other security software, is that we build the self-checking mechanism in a very specific way: we construct the self-check explicitly to build a timing side channel into its execution. Here we're using a timing side channel to help us in defense, rather than in offense to extract keys and things like that, which is where you typically see side channels used. We make it so that if you try a code integrity attack against our self-check, it will lie just like the attacker wants, it will calculate that the checksum is good, but it will lie in such a way that the timing is manipulated. This is analogous to asking someone if they're okay, and they say "I am okay", but you don't believe the content of that message, because something about the timing of that message indicates that the person's not actually okay. That's the key point. There's a lot of work that has to be done in order to build the software in this specific way, and we'll talk through the design parameters that are necessary. As I said, Checkmate is two components, but we're only going to be talking about one today. There's measuring other security software and measuring the operating system itself; this is analogous to other memory integrity verification work. PatchGuard in Windows does some Windows OS self-checking. Companies like Mandiant and HBGary have products which do memory integrity verification, because obviously they're concerned with that; having good experience with rootkits, they know that's a necessary component. We're focusing on the self-checking mechanism. This is what the other security software doesn't have. You can take PatchGuard, MIR, Active Defense, any of those: if you're targeting them specifically, you can make them lie.
And the point is, they don't care about that, because they say, well, in some future world where the attacker cares enough to target us, then we're winning; but until we see that, we don't care. But of course you're not going to see it, because they're making you lie, so why would you see it? We call this timing-based attestation. Attestation is just a fancy way of saying that you're providing some evidence about the state of something; you're attesting to the integrity of your code, for instance. This is based on academic work called Pioneer by Seshadri et al., if there are any academics in the crowd; this is out of Adrian Perrig's group at CMU. We kind of consider our specialization area of research to be taking other people's research and making it more practical. Pioneer was good. It was for Linux, and it had some assumptions that we wanted to loosen so that we could get this to work in our corporate environment. So we independently recreated it, just to confirm that this academic work actually does what it says it does, because you can't always believe that. And importantly, the source code is available. So as we're going through here and you're thinking to yourself, what if I do this, what if I do that? Well, you can go actually try it, and you can see if you can write the assembly more optimized than us and prove that you actually can compromise it. With that, I want Corey to start talking about the design. Okay, thanks everyone for coming. Can you all hear me okay? Good? Okay. So you may be asking yourself, why do we even want to do this? Keep in mind the goal here is that we want to build tamper-resistant software. If we're building some type of malware detection capability, like Xeno pointed out, the malware will eventually come and try to tamper with us, so we want to be able to detect that. A very primitive case would be something like this; you might see this in some current commercial software.
It'll be doing a basic checksum of its own code, and then if the checksum is good, it will continue, and if it's bad, it will go to some type of fail case. Often with malware you'll also see something like an RDTSC instruction, and if the time between point A and point B in the code isn't within the expected constraints, it will go to a fail case because it knows it's being debugged. This is naive, because obviously anyone who knows anything about x86 assembly can just come in and do something like this: NOP out the call to the self-check and then hard-code the expected correct self-checksum, the correct hash, and the code would have no idea that it's been tampered with, right? Unfortunately, at this point most commercial software out there kind of gives up: they either just rely on something simple like this, or they don't do it at all, and maybe to improve it they'll just try to obfuscate the hell out of it. But they never really try to improve upon the method itself and make it harder to attack. And that's what we want to do. We want to make this style of self-attestation better and more resilient to attack. In order to do that, we have to fundamentally change our approach from what I just showed you. First of all, we have to make our self-check, the hash that we're doing over our own code, a function of a nonce, a random value sent from a verifier. That way the attacker can't just hard-code the correct value, right? Because it's going to be different depending on the random value coming from the verifier. Another key point is that with things like PatchGuard, which are trying to measure themselves and attest to their own integrity, the verifier also exists on that system.
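To make the naive case concrete, here is a hypothetical C sketch of the kind of self-check just described. The function names and the toy accumulator are ours for illustration, not anything from real commercial products or from Checkmate:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy checksum over a code region -- stands in for the CRC/hash a
   naive product might compute over its own .text section. */
static uint32_t naive_checksum(const uint8_t *code, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (sum + code[i]) * 31;   /* illustrative accumulator */
    return sum;
}

/* The naive pattern: compare against a baked-in constant.  The bypass
   described above is to NOP out the call to naive_checksum and compare
   the constant against itself, so tampering is never noticed. */
static int self_check_ok(const uint8_t *code, size_t len, uint32_t expected) {
    return naive_checksum(code, len) == expected;
}
```

The check does detect a changed byte, but only if the attacker lets it run unmodified, which is exactly the assumption that fails once the checker itself is a target.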
So with PatchGuard, it's trying to verify itself and determine whether or not it's been compromised, and all of that is happening on the compromised system. If the attacker can compromise the system, they can just compromise the verification portion of PatchGuard as well, and PatchGuard would have no idea that it's being tampered with. So we move the verifier off of the client. It's going to get the results of that checksum, based on the nonce, sent back to it from the client, and then, in a secure environment which we assume is not compromised, it takes the results of that self-checksum to make a determination about whether or not it thinks the software has been tampered with on the client side. So our framework, as opposed to what I showed you previously, looks a little bit more like this. Obviously this isn't the real code, just pseudocode. We have our checksum, and it's going to be waiting for a request from the verifier, some type of server. It gets the nonce from that server, at which point it knows, okay, the server wants me to attest to my integrity, so it knows whether or not it should believe any measurements I send in, whether or not I've been tampered with. It does a self-checksum based on that nonce and then sends the results back to the server. So for each nonce there's one mapping to the correct self-checksum. Now, if we have a nonce that's 32 bits, we know that there are 2^32 possible inputs, and when we have a larger self-checksum that's not just a single 32-bit value, when it's actually six 32-bit words, that means an attacker would have to pre-calculate 96 gigabytes worth of data and keep that on the machine he's compromising. If he does that, then he can definitely respond and just say, oh, let me look that up in my table, this input nonce equals that output checksum, let me just send that along.
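The arithmetic behind that precomputation table can be checked directly. Assuming a 32-bit nonce and a six-dword (24-byte) checksum, which is what makes the 96-gigabyte figure work out:

```c
#include <stdint.h>

/* Size of the attacker's nonce -> checksum lookup table: one entry per
   possible nonce, each entry holding the full checksum value. */
static uint64_t lookup_table_bytes(unsigned nonce_bits, uint64_t checksum_bytes) {
    return (1ULL << nonce_bits) * checksum_bytes;
}
```

With `lookup_table_bytes(32, 24)` you get 2^32 x 24 bytes, which is exactly 96 GiB.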
But our laptops and things like that don't have 96 gigs of memory on them, so we consider this good enough as an initial design for 32-bit systems. Maybe for a big, huge 32-bit server it wouldn't be, but then you go to 64-bit systems, and on 64-bit systems it's just completely infeasible to do a pre-calculation of inputs to outputs. So that's the first thing you get by adding a nonce: every self-checksum is fresh, a function of the nonce. Now we want to talk about what actual data we need to attest to in order to say that our code is untampered. The first and most obvious thing is that we have to read our code itself. We have to measure our code and somehow mix that into the self-check; you can think of the naive case where you do a CRC or a hash over your own code in order to say "my code is good". But we're doing a special checksum that has some additional criteria that we'll talk about next. So the first thing is you need to read your own code. The second and third things are that you want to say something about the code that's reading your own code, so that everything matches up. First, you have the data pointer that's pointing at yourself: you include that pointer because it says something about where in memory your code is reading itself, where in memory your code is stored. Then you include the instruction pointer, because that says where in memory your code is executing, and those two things should match up. To expand on this a little more, we dive down into the self-check and ask, okay, what do we need to incorporate?
So now the pseudocode for our self-check would look like: get the start of my code, get the end of my code, and then, inside of a while loop (and this while loop is a very critical component), loop through and somehow incorporate the nonce, and incorporate *dp, the dereferenced data pointer, which is my actual code. I'm going to calculate the EIP using the typical assembly mechanism of just calling zero bytes ahead and then popping the return address off the stack, which gives me my current EIP, and mix that in as well. And then finally there's this generic mix, which is basically a rotate. That's because some of the previous academic work had found that if you don't have something like a stream cipher, you want to add confusion and diffusion, and this mixing is the diffusion: you rotate the result so that if there's a change in just one bit, it will eventually rotate through and affect all the bits. But there are some additional problems with even this pseudocode. Remember, we're trying to get code where, when it lies to you, it lies in a suspicious way: the timing is different. So the problem is, we want there to be some fixed amount of time that this thing should take, and you should never be able to go faster than that. Well, the problem is the code is potentially parallelizable, which means an attacker could break it up. If I'm just writing this code and running it on my CPU, an attacker who knows more than me can get around it. That's one of the first things a lot of people think of: I'll go faster by parallelizing.
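As a rough C rendering of that loop pseudocode (the real thing is hand-tuned x86 assembly; here the call/pop EIP trick is stood in for by an `eip` parameter, and the mix is a simple add/XOR/rotate of our own choosing):

```c
#include <stdint.h>
#include <stddef.h>

static inline uint32_t rotl32(uint32_t v, unsigned r) {
    return (v << r) | (v >> (32 - r));
}

/* Sketch of the self-check loop: mix the nonce, the code bytes (*dp),
   the data pointer itself, and the instruction pointer, rotating each
   round so a single flipped bit diffuses through the whole sum. */
uint32_t self_checksum(uint32_t nonce, const uint32_t *start,
                       const uint32_t *end, uint32_t eip) {
    uint32_t sum = nonce;                     /* seed with the nonce     */
    for (const uint32_t *dp = start; dp < end; dp++) {
        sum += *dp;                           /* the code being measured */
        sum ^= (uint32_t)(uintptr_t)dp;       /* where we read ourselves */
        sum += eip;                           /* where we execute        */
        sum = rotl32(sum, 1);                 /* diffusion               */
    }
    return sum;
}
```

As written, though, nothing stops an attacker from splitting this loop across cores and combining the partial sums.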
You can get around that by computing the self-check as what the original Pioneer paper called a strongly ordered function. I haven't been able to find any reference to this in other works, so I don't think it's a standard term, but a strongly ordered function just means the order of operations matters for the output: you can't rearrange the order of operations. So you take A plus B, then you take that quantity and XOR it with C, then you take that quantity and add it to D, and so forth. If you're accumulating, building up this checksum, and you do it in that order, as opposed to taking A and B and doing them on one core while you do C and D on another core and then trying to XOR those results together, with high probability the results are going to be different. So just by making sure that when we incorporate things we alternate adds and XORs, and that we're always maintaining that ordering, we're successfully making it non-parallelizable, which takes that attack off the table. Then the other problem is one of optimization. We said we want our implementation to be the fastest possible, running as quickly as possible, so that if the attacker adds overhead it takes longer and it's lying in a detectable way. And this is a problem which you can't really get around formally. The academics sort of got sidetracked here, trying to formally prove that a given routine is the optimal Intel assembly, and that's a very hard thing to model, right?
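Going back to the strongly ordered accumulation for a moment, a toy version in C (our own illustration, not Pioneer's or Checkmate's actual mixing function) makes the non-parallelizability concrete:

```c
#include <stdint.h>
#include <stddef.h>

/* "Strongly ordered" accumulation: alternating + and ^ so the
   operations cannot be regrouped.  Splitting the input across two
   cores and combining partial results gives a different answer. */
uint32_t ordered_mix(const uint32_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (i % 2 == 0) ? acc + v[i]   /* even steps: add */
                           : acc ^ v[i];  /* odd steps: XOR  */
    return acc;
}
```

For input {1, 2, 3, 4} the ordered result is (((0+1)^2)+3)^4 = 2, while computing the two halves independently gives 3 and 7, and no cheap recombination (XOR gives 4, add gives 10) reproduces the ordered answer.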
So basically we did the same thing the previous work did, which is guess and test: you say, all right, let me change this assembly instruction, let me reorder these two instructions, and you keep trying until you think what you have is the fastest possible way to checksum over the data pointer, the data pointed to by the data pointer, and the instruction pointer. So, with this new self-checksum in mind, here's what one attack against it would look like. We have the same sort of loop setup, iterating over our own code and incorporating the key components like the data pointer and *dp, the data at the data pointer. If the attacker wanted to forge that checksum while still tampering with the code, he'd basically have to keep track of where the data pointer currently is, and if it's pointing at some of his tampering, like an inline hook or a software breakpoint, then instead of incorporating the data at *dp, he'd want to incorporate the clean copy of those bytes: instead of incorporating his inline hook, incorporate what was there before he tampered with the code. And notice that while this isn't a lot, this if-branching would only amount to a few extra x86 assembly instructions, if we're running this loop millions of times, say two and a half million times, that would cause the attacker to execute on the order of 7.5 million extra instructions in his calculation of the checksum, and that induces a noticeable time difference. And that's actually all that we detect. So under normal circumstances, this is what the self-checksum flow typically looks like.
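The forgery loop just described can be sketched in C as follows. This is our illustration of the attack shape, with made-up helper names; the point is the extra compare-and-branch per iteration, which is exactly what the timing side channel amplifies:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy add-then-rotate mixing step shared by both versions. */
static uint32_t mix(uint32_t sum, uint8_t b) {
    sum += b;
    return (sum << 1) | (sum >> 31);
}

/* Honest check: hash exactly what is in memory. */
uint32_t honest_sum(const uint8_t *code, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s = mix(s, code[i]);
    return s;
}

/* Attacker's version: when dp crosses the hooked range, substitute the
   saved original bytes.  The per-iteration range check is the overhead
   that, over millions of iterations, shows up in the timing. */
uint32_t forged_sum(const uint8_t *code, size_t n,
                    size_t hook_off, size_t hook_len,
                    const uint8_t *saved_orig) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = (i >= hook_off && i < hook_off + hook_len)
                  ? saved_orig[i - hook_off]   /* lie: the clean bytes */
                  : code[i];
        s = mix(s, b);
    }
    return s;
}
```

The forged sum over tampered code matches the honest sum over clean code, so the checksum value alone can't expose the attack; only the time can.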
On the left we have the server, and what happens is it sends a nonce to the client, and the client uses that as the input, the seed, to its self-check. That self-check result goes back to the server, and notice the server is incorporating the total network round-trip time it took for the client to self-checksum. So it's going to say: okay, your checksum is correct, because I calculated this on my own side and I know what your code should look like when it's been hashed correctly, and also, the time it took you to respond to my request was within our acceptable limits, so I actually believe you. Based on that determination, it would trust or not trust the results of the next measurement, which would be a measurement of the NT OS kernel (ntoskrnl), or your security software, or something like that. Now, this is what the diagram looks like if someone is actually attacking and tampering with the software. It starts off the same: we send a nonce to the client, and the client uses the nonce as the input for its self-check, and the self-check result goes back. In this case the self-checksum is correct, because the attacker has forged the value; he's done that by keeping track of where DP is, and where he's tampered with data, he replaces it with good data, so the checksum checks out. But then the server notices: okay, look, your network round-trip time has increased somewhat. You can see on the left side that the delta is larger than expected. So yes, your checksum is correct, but I know something is up, because it took too long for you to respond to me, and so I don't believe this measurement of ntoskrnl that you also sent. Okay. In general this was an iterative process, where we would work on the checksum, notice some problems with it, and then change the design based on what we discovered. And so another problem we noticed with this
approach was that when we're incorporating the instruction pointer, it's the same at every iteration of the self-checksum. So instead of doing this goofy trick where you have to call ahead and then pop the return address off the stack to get your current EIP in x86, the attacker could just hard-code that constant value. Notice that in this case he's actually gained a few x86 instructions that he does not have to execute: he's made the self-checksum smaller, so he's gained some instructions by optimizing away this instruction-pointer insertion into the checksum. That would allow him to go over here and forge the data pointer, and so he'd be able to tamper with our code with a net result of zero additional time in the calculation of the self-checksum. With this in mind, we had to go and redesign how we actually calculate this checksum. What we need to do is make it so that it's not a fixed instruction pointer which is always incorporated into the checksum. The way this can be done, and the way it was actually done in some related work, is to break up your code into multiple blocks. Each independent block is still doing the same sort of thing you just saw in the previous slide, incorporating the data pointer and instruction pointer, but the key thing is that you pseudorandomly choose which block you're going to go to next. So what this diagram is trying to show (you can see my mouse) is that when you start out at block one and go down to the end, the first loop through block one may choose to go to block three; then when block three is done, it'll go somewhere else; but the second time you get through block one, it'll pseudorandomly choose to go to block four. The key point here is that the attacker can't hard-code the location where you're going next, because it's chosen differently each time. To do that, we need to add a pseudorandom number generator; we use the same one
Pioneer did, and it's nice and simple in assembly language. You seed it with your nonce, which is easy, since you've already got a fresh seed for every single measurement, and then you basically take your current value plus the current value squared ORed with five. This works as a pseudorandom number generator where it'll cycle through all the possible values before looping back to the one where it started. Then we can also do things like making the data pointer a function of this pseudorandom value that we now have available. So instead of doing a strict linear sweep over our entire memory, we can bounce around and go to different locations each time, and that makes it so the attacker can't say: I know that when you get to loop 4086 you're going to hit the data that I compromised and start reading it, so I'm just going to pop in just in time at 4086, maybe using some other processor to change the bytes just in time. He can no longer have an expectation, for any given nonce, of when we're going to be reading any change that he potentially made to our own code. So this is the closest thing we're going to show to the real code. This is still really just pseudocode, but it's almost real pseudocode. How it works is we've got eight macros, block zero through block seven, and each of the macros does the same thing. Now we're not writing that while loop as straight-up regular C code like you're used to; we're saying there's going to be some loop counter that we keep track of as we go. As mentioned, we do two and a half million iterations through this loop, so we'll hit one of these blocks two and a half million times. And so if the attacker wants to compromise this, he has to modify things within a block and make sure that he fixes up all the values, and that adds instructions which
when multiplied by two and a half million equals many instructions. So basically: look at the loop counter; if we're done with looping, just jump to the end. Then, in the real assembly, we already have two forms of the instruction pointer: we have one just sitting in the ECX register, because that's what we use at the end of the block, and we have one that got pushed onto the stack. So we've got a target destination EIP, which is the start of a block, and we've got a source EIP, which is what gets pushed on the stack by the call instruction at the end of each block. We start mixing these things together, and all we're doing is an add, XOR, add, XOR with each of the values we said we care about: starting with the EIP, next add in the pseudorandom number, which changes each loop, add in the data pointed to by the data pointer, go ahead and update your pseudorandom number with that x plus x squared ORed with five, and then update the data pointer so that we choose a new pseudorandom location within our own code that's going to be the next place we read ourselves. Then mix it all together with a rotate, and finally calculate the next location to jump to, pseudorandomly. You can think of these blocks as code in an array in our own kernel module, an array of eight of these blocks of assembly language. We start at the base of the array, block zero's base, and then we need to pick the new index pseudorandomly: we take the pseudorandom number and AND it with seven to get the bottom three bits, which gives us the eight possible values we can index to in this array of code. So that basically says take the size of each block times this index, zero to seven, and that's the next place we're going to go; we just calculate that and put
it in ECX, and that's where we're going next. You can read the real paper; there's an academic paper behind this, 16 pages of goodness and plenty of assembly, and basically all of it looks exactly like the pseudocode I just showed. The only difference is we show the actual rotates there, and then there's some additional state that we want to keep about the system that we don't have time to go into, but there's additional stuff you want to do, otherwise there are additional attacks the attacker can perform. Okay, so defense is boring; let's talk about offense a little bit. What would the attacker have to do if he actually wanted to try to break this system? The original academic paper we mentioned, out of CMU, laid out essentially three attacks that you can do against this style of self-checksum. Keep in mind that we're incorporating things like the data pointer, *DP, and the program counter into the self-checksum, so if the attacker wants to forge the checksum while manipulating the code, he'll have to forge some subset of those components we're mixing in: *DP, DP, or the instruction pointer. Notice that if he wasn't forging any of those values, if DP was the same and *DP was the same, he would just be running our untampered code and we would already be winning. So he's going to have to forge some of those values. In the naive case, the worst case for the attacker, he's forging most of those components. That worst case, which you can see on the bottom right, is when he's forging the data pointer and the program counter. How this attack typically works is that the attacker creates a clean copy of our self-checksum code, the code we're measuring over, points DP at that copy, and runs his own custom version of the self-checksum. Notice that if DP is pointing at that clean copy, *DP will already be good, since the copy is untampered, but DP has to be forged, because it's not pointing at the real copy of our code, where the code actually exists and is executing; it's just pointing at a clean copy. He'll also have to forge the program counter, the instruction pointer, because he's not executing the real, authentic self-checksum; he's executing his own custom version that forges these values. So that's the worst case for the attacker: forging all those components. The best attack we could come up with is on the bottom left, where the attacker does not have to forge *DP, because he's using a clean copy like I outlined before, and he does not have to forge DP, the data pointer, either. The reason is that we're running this in a Windows environment as a Windows kernel module, and even on Windows XP, which doesn't have ASLR, we have something to deal with that we call faux ASLR: the Windows kernel module loader doesn't respect the base address that you put into the kernel driver. You could tell the kernel, hey, I want this driver loaded at address 0xdeadbeef, but the kernel isn't going to do that. So we actually rely on the self-checksum code to report back where it's located in memory, and this introduces an attack into our system, where the attacker can also lie and say, hey, guess what, you're actually located where your clean copy is. This allows him to not have to forge DP, but he's still having to forge the program counter, because he's running his own custom version of the self-checksum code. Hopefully this will be a little clearer when I show you the diagram next. Once again, this is what the self-checksum looks like: on the left we have the server, the verifier, and on the right is the client. The measurement request comes into the client, including the nonce for the self-checksum. We have this main switch case that's receiving all these packets, and it says, okay, I've got a self-checksum
request, so let's go ahead and do that, here's my nonce. The self-checksum occurs and goes back to the server, and the server says, okay, everything's all good with you. And then, obviously, the whole reason we want to do this is so that we can measure other important parts of the system, like the kernel, and also believe those measurements. So the system also does a measurement of ntoskrnl and sends it back to the server, and the server says, okay, ntoskrnl looks good, and I believe what you're telling me about it, because your self-checksum was good as well. All right, now let's switch back to attacker mode; this is what would happen. Notice that the main thing the attacker is concerned with, again, is corrupting this general system measurement. He wants to install hooks into ntoskrnl, things like keystroke loggers and that sort of thing, and have them be undetected by our measurement system. That is what he's interested in protecting and not having show up in the measurement values. Okay, so to pull this off, first the attacker installs an inline hook at the beginning of our self-checksum code, so that he can gain control of EIP. He has this custom version of the self-checksum that is going to forge the self-checksum value: even though the original code has been modified, he still wants to produce the correct self-checksum. The measurement request comes in from the server, we naturally call the self-checksum code, but look, here we have the inline hook, so the attacker gains control of EIP and is now running his custom self-checksum code. Over on the right we have the clean copy of the kernel module, the unadulterated copy, not tampered with. The attacker points DP
at this clean copy, and he also lies: notice on the response back to the server he is saying, "hey, guess what, my base address is actually 0x2000, not 0x1000." This allows him not to have to forge the data pointer, because we don't know on the server side where our code should actually be located in kernel memory, thanks to that faux ASLR. Then the attacker executes his corrupted system measurement call, so that he can lie about what ntoskrnl actually contains. That goes back to the server, and the server says, "ntoskrnl looks good, and the self-checksum matches up with what I calculated on my own side, but you took too long to do this, so I don't really believe you." We would flag that as something that needs investigation.

That was the design and justification. So we ran experiments on our actual corporate intranet, and we're going to show you data for two types of experiments. The first one: we wanted to see whether, over the maximum hop count we can find on our campus, this attack is still detectable. What about network jitter and things like that; is there too much noise in the network to really see this attack? So that's the first question: does it work over the maximum hop count? The second question: is there a single bound we can set to say "this is the good self-checksum time," and can we use that same bound at different locations on the network? So we're going to take two hosts, put them ten links away, then physically move them to eight links away and measure them again, then physically move them to three links away and measure them again. These were desktop systems, so that was not fun.

Here's some raw data. We had 31 different hosts over in a lab that's used for our training, and the first thing you can see is that all 31 hosts basically behave the same way at the same time. When there is no attack present, all 31 hosts take about 111 milliseconds to do this round trip: we send a request from our server, they're ten links away, and they calculate the self-checksum in 111 milliseconds. When the attacker is there (we have the ability to toggle our reference attacker on with a request), the attack clearly takes a much different time; it's clearly outside of this baseline, and we can set a fairly narrow gauge on what's acceptable for the round-trip time. So this is the first sort of validation: although it's not a large magnitude of increase, not a large absolute time, it is still a decent percentage of the time. And we can always just kick up the number of iterations and we'll still have what ends up being about a 1.7% overhead for the attacker; he's always going to take about 1.7% longer to do this. So if instead of doing 2.5 million iterations we do 25 million iterations, the attacker's extra absolute time grows tenfold as well. You can always increase that, but at the cost of the measurement taking more time.

But you see a graph like this, and the first thing your eyes are drawn to are the outliers, and you say, "what is that? What does that say about your results?" First of all, when there's no attack present, it sometimes actually goes faster than the quote-unquote baseline. We think that's just that we haven't sufficiently optimized this, or that we're not taking some caching effects and things like that into consideration. But the important thing is, first, that the amount by which these best-case runs pull a good guy down is not enough to pull an attacker down into the good area, even if an attacker could somehow take advantage of this. And second, like I said, this is 31 hosts, and these are just one-off measurements: this one host, this one time, somehow got lower. So we basically think, like I said, that we're not necessarily taking all potential caching
effects and stuff like that into account, and maybe in some cases it's just getting lucky: the caches and everything are lined up just right so that the thing runs a little bit faster. And I should say, we validated these network round-trip times against RDTSC self-reported timings, which we wouldn't use in a real system, but for the experiments we confirmed that these are not network abnormalities; on the host itself it definitely took less time. The next thing you're going to ask is, what about the cases where the attack time gets pulled up? There's no attacker data where he gets pulled down accidentally, but there is attacker data where it gets pulled up. We think that's the same thing again: we optimized the thing as best we could through guess-and-test, and we know the attacker is adding in additional instructions, which clearly adds overhead, but he maybe also has more opportunities to get unlucky on the caching and side-effect type things.

The next thing was to ask whether there is a single value we can set as a baseline, the correct time it should take for, say, a Core 2 Duo running at 3 gigahertz on this particular revision of the hardware. The answer is yes, there is some upper and lower bound we can set, but of course, as you would expect, the more network hardware you go through, the more potential there is to pull up the amount of time that counts as the quote-unquote good time. All this is showing is that when you're one link away, connected through a crossover cable, your measurements are down in the 110.25 millisecond range, and when you're over ten links, you're up closer to the 110.75 range. So we do have a wider range in this case, but the key point is that we can use a single baseline and it's still sufficiently distinguishable from the attack traffic, as you can see at the top of the graph.

So that worked for us for network round-trip timing within our intranet, but obviously when you have jitter and things like that, we don't expect this to work over the internet, and we don't expect this to work over wireless. So Corey is going to talk about what we can do in those cases.

So, like Xeno pointed out, this scheme obviously wouldn't work over, say, a VPN over the internet, because of the large amount of network jitter. Lucky for us, some academics proposed a similar protocol that used the Trusted Platform Module as sort of a trusted stopwatch; that's the best way to think of it. That way you can push the timing to the client side and actually trust that timing, because the TPM is signing the time values, unless you're Chris Tarnovsky and have a FIB or something, in which case you can forge your own time values. The way the system works is very much the same: the nonce comes into the client; before the client calculates the self-checksum, it does a TickStamp, which asks the TPM, "okay, give me a signed current time value"; then it incorporates bits from that signature into its nonce, just to avoid some possible attacks there, and it calculates the self-checksum with that as the nonce; and when it's done, it does one more TickStamp. That way we have a signed time for the beginning of the checksum and the end, and the server can verify the signatures on its side to make sure the times haven't been tampered with, then take the delta, the difference between the two times, and say, "okay, this is how long your checksum actually took." This is great in theory, because all the timing is on the client side and we don't have to worry about network jitter at all. Unfortunately, things with the TPM are kind of crazy and don't work very well, so let's talk about some of those results.

All right, so here's that TPM implementation on a single host, and you can see it's much like the diagram I
showed you before. The attack is installed in the middle, obviously, and we have a very distinguishable difference between attack-installed and attack-not-installed, so we can detect that. But notice down here on the right: the median goes down after the attack is installed, and we really have no idea why; there are mysterious things happening with the TPM that we don't understand. Let's look at this next diagram. This is 32 hosts running that self-checksum protocol with the TPM as well, and notice they're all exhibiting the same behavior, where the time goes up when the attack is installed, but also notice that all of their baselines are completely different. These were all identical systems: identical CPUs, identical Trusted Platform Modules, and everything. What we discovered was that each TPM exhibits different timing characteristics, which is kind of bizarre, and even weirder for those of you who do side-channel attacks, each TPM key will exhibit different timing characteristics. So anyone who's into side-channel attacks, that's something you should investigate. You can also see there's this one host up here, the crazy blue line towards the top, and we really wondered what the hell was going on with that TPM; it was just totally broken or something. We think we actually broke the TPM from overuse: we had been using that specific system for previous experiments where we were just hammering it with a bunch of TPM tests, so we somehow broke it.

All right, so at this point, this is where most academics would claim success and move on. You'd say, "I have proven that I can detect when you're tampering with my software; I can detect it on an intranet, and I can detect it using this TPM mechanism." Yes, with the TPMs I have to baseline each system individually, but you already have to go around to every TPM anyway to install keys or tell it to generate keys, so when you're generating the keys you can do a timing baseline anyway. But when we were doing this, we were concerned with how we could actually get around our own system. What we've been doing here is trying to channel the attacker into this very narrow place. We want to force him: if you want to hook the kernel, you've got to hook the security software, and then you've got to hook us, and then you've got to hook our self-check, but aha, it's a trap, because the self-check will actually behave differently once you hook it. Well, it turns out the attacker doesn't have to hook any of that, or he can change the way he's hooking, in order to get around us. He can do this with a TOCTOU (time-of-check to time-of-use) attack, and at the time of the check, the system won't look any different from a system running normally. There are three conditions that have to be met for a TOCTOU attack, and if we can break any of them as defenders, we can break the whole TOCTOU attack. Different systems have different assumptions which break some of these, but in our case, where we're at the exact same privilege level as the attacker and we're in the Windows kernel, where there's a lot of flexibility, this can be pretty difficult. The conditions are: first, the attacker has to know when you're about to start a measurement. At the same privilege level, that's pretty easy, right? They can always know by just putting a hook into your code; they're always going to bounce out to their code before you ever even have a chance to measure your own code, way up in the prologue, way up in the code that listens for network packets. Second, they have to be able to move themselves to some unmeasured location, or remove their code; they basically have to clean things up so that at the time you check, it looks clean, and they have to be somewhere that's not measured. And third, they need to reinstall as fast as possible after the check is done, because if they don't reinstall right
when you're done measuring, then the system is running for some period of time where they're not in control, and that's kind of a problem that leaves them vulnerable to other, just regular sorts of detection. So the picture we left off with was the attacker falling into our trap: he's gone after the self-checking software. But if he's being clever about this, he actually doesn't have to do that. What he's going to do is say, "you're about to do a self-check, you're about to do a measurement of ntoskrnl, or you're about to do a measurement of, say, McAfee; I see you're about to do a measurement, so I'm just going to go ahead and erase all of my hooks in the thing you're about to measure." He can still have hooks elsewhere in the system; he just has to target the specific measurement you're doing right now. In that case, you do a measurement and no changes are detected, and once you're done, he reinstalls himself immediately at the end.

So we really found that there's actually a bunch of academic work that involves TOCTOU attacks but doesn't claim it explicitly, and that's why we thought it was important to pull this out. If you're thinking in a TOCTOU sense, you can look at academic and non-academic work alike. For example, look at where Joanna Rutkowska left off with virtualization-based rootkits: she left off with, "okay, yeah, you guys can time me in order to detect that I've got a malicious hypervisor, but I'm going to watch for you timing me, and if you do that, I'm going to uninstall myself, let you do the timing check, and then once you're done, I'm going to reinstall." That's a TOCTOU attack. So this is a problem for any sort of measurement system: the potential is there, and you have to design with it explicitly in mind, and we don't think most things are doing that, and we certainly didn't do that. So now we're going back and trying to figure out how we can deal with TOCTOU attacks.

To break TOCTOU attacks, most academic work will just say something like, "I'm in the kernel and you're in user space, so you can't see when I'm about to do a measurement," or "I'm in SMM and you're not," or "I'm in a hypervisor and you're not"; you can break the attack by violating the assumption that they know when you're measuring. Or you can break it by saying there's no unmeasured location: "I'm just going to measure everything." That will work on some limited systems, embedded systems, things like that, but for desktop systems we're not going to measure all four gigs of memory, or however much memory there is on a 64-bit system; that's not reasonable. So we're focusing more on preventing them from deterministically reinstalling themselves: there's some window where they've let go of the system, and we want to take advantage of that. The other minor point is that we've said this talk is about code integrity detection, not control flow integrity, meaning there are other ways besides hacking the code that attackers can start running their stuff; that's the most common enabler of TOCTOU attacks, and it's why they can see that you're about to start doing things.

All right, so in conclusion, what we've basically shown in this work is that we have a more practical system: it runs on Windows XP, and it runs on Windows 7. We have a system where we can detect if you're modifying our self-checking code; we can detect it on the intranet based on the round-trip time, and we can detect it more or less arbitrarily based on the TPM trusted stopwatch, as long as you baseline the time for your specific system. And we know the attacker can get around it with TOCTOU attacks, but the key point is that when he's doing a TOCTOU attack, our code is unmodified at the time of the check, so in the academic sense we've won. But we're still concerned about this, so we still want to find ways to
mitigate TOCTOU attacks. So thank you for coming. You can go check out the code on the Google Code page, and we've actually got videos that show you exactly what to do to install our software and validate our results: run it on your own machines, tinker around with the assembly code, and see if you can make it go faster than we can. And speaking of videos, the last thing we're going to pimp here is OpenSecurityTraining.info. That's a site we contribute to in our free time, where we put up the class material that we use, along with videos of the content. I've got eight days' worth of material on things like x86 assembly, architecture, virtual-memory type things, the PE binary format, and rootkits. Corey currently has two days up on intro exploits, but he's got three more days of videos that we're going to be editing after this. We've also got a TPM class in the works, which will sort of make the TPM not a black-magic voodoo DRM box; it's an interesting chip that you should really try to play around with. And we've got things like virtualization, reverse engineering, and so forth. So check that site out, and come visit us in the Q&A session. Thanks for your time.
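The self-checksum loop discussed throughout the talk can be sketched in C as follows. This is a toy illustration only, not the actual Checkmate code (which is hand-tuned x86 assembly), and all names here are invented for the sketch: the running sum mixes in the nonce, the data pointer DP, the word it reads (*DP), and a program-counter value, so an attacker executing a displaced copy of the routine over a clean image must forge DP, *DP, and PC on every iteration, and those extra forging instructions are what the timing side of the protocol detects.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy self-checksum: mix the nonce, the data pointer (DP), the word it
 * points at (*DP), and a caller-supplied program-counter value into a
 * running sum.  An attacker running a relocated copy over a clean image
 * has to forge DP, *DP, and PC each iteration, which costs extra
 * instructions and therefore extra, measurable time. */
uint32_t self_checksum(const uint32_t *code_base, size_t words,
                       uint32_t nonce, uint32_t pc)
{
    uint32_t sum = nonce;
    for (size_t i = 0; i < words; i++) {
        const uint32_t *dp = code_base + i;   /* DP: current read address   */
        sum += (uint32_t)(uintptr_t)dp;       /* mix in DP itself           */
        sum ^= *dp;                           /* mix in *DP (the code word) */
        sum += pc;                            /* mix in the reported PC     */
        sum = (sum << 7) | (sum >> 25);       /* rotate so changes diffuse  */
    }
    return sum;
}
```

In the real system the PC value would be obtained in assembly (for example with a call/pop sequence) rather than passed in as a parameter, the buffer would be the module's own code, and the loop would run millions of iterations so the attacker's per-iteration forging cost becomes a measurable percentage of the total time.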