 So, good morning, everyone. Welcome to the first rainy day of this conference in Vienna. It's my pleasure to introduce the third invited speaker of this conference, which is Christian Collberg from the University of Arizona. So, in case you have been asleep for the past three years, you may have missed that obfuscation has become quite a hot topic at crypto conferences, but it has actually been a hot topic from a more practical engineering point of view for way longer. And I think it's fair to say that Christian is one of the key players in this area. He did his PhD at Lund University in Sweden, then held positions in New Zealand and China, before moving to the University of Arizona. And he's the author of a pretty well-known book in this area called Surreptitious Software, about obfuscation, watermarking, and so on. He's a co-developer of a tool for obfuscating Java bytecode called SandMark. And, for example, I know his work about taxonomies of obfuscating transformations, which is definitely worth a read. We're very happy that today he accepted our invitation, and he'll talk about code obfuscation from an engineering point of view. Thank you very much for that introduction, and good morning, everybody. Today I'm going to try to cover five topics. We'll start by looking at some common applications that, from a security point of view, fall in what we call the MATE, or man-at-the-end, scenario. So this is a scenario in which an adversary is in complete control of a device and can violate it in some way. And next I'll talk about tools that protect against such MATE attacks, particularly code obfuscators. At the heart of such tools are obfuscating code transformations. And I'll talk about those next. And at the same time I'll talk about ways to attack these transformations. I'll give some examples. Having access to good obfuscating code transformations isn't enough, however. 
So next I'll talk about how we deploy these in practice, and particularly talk about various kinds of diversity. And finally I'll talk about the major challenge in this area, which is how we evaluate obfuscating code transformations. And then I will conclude with some discussion. So at this point I should issue a fair warning. The title of this talk is engineering code obfuscation, and I'm not a cryptographer, and there will be no crypto in this talk. There will, however, be quite a lot of programming language stuff. So let's start by looking at some scenarios where code obfuscation could be part of a security solution. So this is Alice. Alice is your teenage daughter. She has a boyfriend named Bob, and today she's sending him a message over Snapchat with a picture of her thumb. And she sets a timeout, so after eight seconds, that picture of her thumb gets deleted from his phone. Of course, in my scenario, unlike yours, Alice is a good girl, and Bob is a bad boy. So Bob, of course, wants to save this picture, and so he takes a screenshot on his phone. When that happens, a message pops up on Alice's phone saying, Bob took a screenshot, and then she can unfriend him, or whatever kids do these days. Now, Bob doesn't want that to happen, of course, so he downloads a hack from the Internet that allows him to take a screenshot without anyone finding out. Snapchat doesn't want this to happen, of course, so they add some software protection code to their application. It checks that the app hasn't been tampered with, that the environment in which the app is running is healthy, and that Bob is not getting curious and looking inside this application and extracting keys and pictures and whatnot. So Snapchat recently hired Moti Yung as their security and privacy scientist, and Moti actually is the one who invented the term that we now use for this type of scenario, namely MATE, or man-at-the-end. So what's a MATE attack? 
MATE attacks occur in any setting where an adversary has physical access to a device and compromises it by inspecting, reverse engineering, or tampering with the software or hardware. Okay, it turns out that this type of scenario pops up all over the place. Now here's Alice again; this time she's a distributor of television programs. She runs a television network. Bob is a customer who subscribes so he can watch sports. And so he has a set-top box attached to his TV that allows him to watch a program as long as he's paid the bill. Of course, Bob is a bad boy and he'd rather not pay his bill and still watch sports. So he can tamper with this set-top box, he can clone it, he can extract keys from it, he can extract media or content and serve it up to others on the internet for profit. Now Alice of course doesn't want this to happen. So she adds some software protection code to this device. It checks that the hardware hasn't been tampered with, that the software hasn't been tampered with, and again, that Bob is not getting curious and looking inside and extracting information. And if he does, then she can punish him by cutting off access. Now in this scenario, Alice runs a public utility, such as a power company. Again, Bob is a customer, and outside his house is a smart meter. And this smart meter sends back his consumption in real time to the power company. Of course, he'd rather not pay his bill, so he tampers with it so that it sends back the wrong information. But this smart meter in fact does more than that. It allows the power company to turn off power to your house in case, for example, there's an acute power shortage. Now Bob, the evil terrorist, hacks this meter outside his house and figures out how he can send out 5 million disconnect notices to all the customers in Manhattan, causing a global financial crisis. So let's now look at the kind of tools that Alice and Bob might use to protect an application or to attack an application. Alice uses code transformation tools. 
And those take a program as input and produce a more protected program as output. And inside that program are some assets. This could be the source code itself. It could be algorithms. It could be keys. It could be IP addresses of servers we want to connect to. It could be media we've decrypted. And this tool also takes as input a set of code transformations and also some description of the overhead that we can accept and the amount of protection that we would like to achieve. Now there are all kinds of protective code transformations, and you typically combine them depending on your scenario. So obfuscating transformations, which is what we'll talk about today, make programs hard to understand. Tamper-proofing transformations protect the integrity of an application. Remote attestation transformations check the health of an untrusted node in a distributed system. White-box cryptography protects media in a digital rights management situation. Environment checking transformations check that an application is running in a healthy environment, such as not running under a virtual machine or emulator and so on. And watermarking transformations embed unique identifiers into programs so that we can track them later. And there's a wide variety of free and commercial tools in this area. Commercial tools may run you $100,000. There's currently an EU FP7 project called ASPIRE, which is building a tool chain. And I have a tool called Tigress that can be freely downloaded from our website. And if you're interested in this kind of stuff, you can download it and play with it. And that's what I'll talk about today. Bob, on the other hand, uses code analysis tools to try to extract Alice's assets from her protected program. And there are lots of different types of code analyses. 
They fall into the general categories of static and dynamic analysis. Static code analysis takes a program as input and just looks at the program itself, the code, the binary executable, for example, to extract some information. Dynamic analysis, on the other hand, runs the application with some particular input and extracts information that way. And Bob's goal is to extract the asset that Alice has embedded. Bob, too, is concerned about performance. He wants his analysis to be as quick as possible and as precise as possible. And here, too, there are a wide variety of tools available, both free and commercial. At this point, it might be interesting to think about what makes for a good obfuscating transformation. What matters in this scenario? Well, one thing that matters, it turns out, is performance. One reason it matters is because often we are running on very constrained devices, like phones or smart meters and so on. Obviously, security matters as well, or as we say, time to crack. And we'd like to protect our applications in a variety of attack scenarios: a completely manual attack, when Bob just buckles down and gets in the code and tries to reverse engineer it; a completely automatic attack, when he uses one of these advanced code analysis tools; or a combination, where you have a smart human combined with the latest and greatest in code analysis tools to attack the application. In some situations, you're also interested in the stealth of the code, how well it fits in with other kinds of code. And this is true, for example, for obfuscated malware, which just wants to hide itself from malware scanners. So what does it mean when I say that performance matters? To find out, I queried some colleagues in industry and asked, what does this mean to you? And one guy who works in, let's say, the media distribution industry, told me that for them, user experience cannot be harmed. In other words, every operation has to run in under a second. 
Think about flipping channels on your TV. Every time you do that, you have to check whether you should be allowed to watch that channel, and that check has to run in under a second, or you'll be very annoyed. Another company writes that for a particular customer, they expected no more than a 150% slowdown and increase in size. Another person says that they recommend for their customers a slowdown of 100 to 1,000 times, but that's for a security kernel, not for the application as a whole. We also timed some commercial code obfuscation tools, and they also ran in the order of a 100 to 1,200 times slowdown. Now, of course, as you know, and as we've seen this week, there have been some exciting developments in indistinguishability obfuscation. And I'm aware of two implementations at this point. The most well-known, I think, is the one that was presented at CRYPTO two years ago, where they put out a challenge. It was an obfuscated 16-bit point function; it took seven hours to generate, was 25 gigabytes large, and took four hours to execute. And it was broken pretty quickly, essentially, by speeding it up. So given that this is a pretty trivial piece of code, and the overhead was so large, we're still a little ways off, I think, from being practical. But these are certainly very exciting developments. Obviously, time to crack matters as well. And what does that mean? So again, I asked some colleagues in industry, and one guy said, well, our protection has been in the field for years and not been cracked. But that one, I believe, uses a combination of software and hardware. Another one said that he expected a highly skilled and motivated reverse engineer to be able to crack a well-protected application in four to six weeks. Another one said that for something protected by VMProtect, to extract the complete semantics of the original program might take something like 12 months. 
On the other hand, mass market malware, which uses obfuscation, is easy to crack: minutes to hours. All right. So with all this in mind, let's look at some actual obfuscating transformations. And at the same time, I'll look at ways that we can attack these transformations. Alice is going to start with virtualization, and I'll tell you in a minute what that means. And Bob is going to counter with a completely manual analysis. And then Alice will counter with randomization. And then Bob will counter with a complete static analysis. And then Alice will counter with dynamic obfuscation. And you can see where this is going. All right. So let's look at virtualization, which is a very common transformation, both in free and commercial tools. So you take a function P0. And from that function, you construct a unique and random virtual instruction set. And then you construct an interpreter that will execute programs in this instruction set. This is similar to an interpreter for Java or Python, except it's specific to this particular random and unique virtual instruction set. And then you translate your original program P0 into this instruction set. And that's what we call the virtual program array. Now, inside this virtual machine go two things, essentially. The dispatch unit, which is the yellow thing here, which fetches the next instruction and then jumps to one of the blue pieces here, which is what we call instruction handlers. Each of those implements the semantics of one of these virtual instructions. And there's a virtual program counter, and there are stacks and registers and stuff like that. The virtualizer also takes as input a seed, which allows us to generate multiple variants from the same program, each one with a unique instruction set, a unique interpreter, a unique virtual program array. So what does it look like at runtime when we execute one of these virtual machines? 
Well, the dispatch unit fetches the next instruction, jumps to the appropriate instruction handler, executes it, we increment the virtual program counter and go back up to the top of the loop of this interpreter. And then we pick the next instruction and do the same thing. And we continue until we're done with the program. Okay, so how would Bob go about attacking this? Well, first of all, he'll just start with a brute-force manual analysis. He digs into the binary code of Alice's protected application and sees what he can find. So he locates the instruction handlers, the blue pieces of code here that implement the semantics of the instructions. And he manually reverse engineers the instruction set. And then from that, he manually constructs a disassembler which can take the virtual program array and turn it back into code. And then he optimizes that code and magically turns it back into C source. Now, clearly, there's more going on here than is on this slide. And it can be very tedious and hard work, but we expect that given enough time and motivation, Bob can certainly do this. So what does Alice do to counter this? Well, the first thing she can do is to create arbitrarily complex semantics for this virtual instruction set. So here's an example of an instruction that Tigress's virtualizer generated. It's 53 bytes long. Those of you who remember the old VAX will recall that the longest VAX instruction could be 52 bytes, and so I beat that by one byte. It takes 14 arguments, and its semantics is just very, very crazy. And you can see from the instruction handler that that also gets very complicated. You can also randomize the interpreter itself, for example, by picking a random dispatch method. And in general, every decision that the obfuscator, the virtualizer, makes is randomized, which allows it to generate a large number of unique interpreters and unique instruction sets. 
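Since there is no code on the slides, here is a miniature sketch, in Python, of what a virtualizer produces: a seed-dependent opcode assignment, a virtual program array, and a dispatch loop with one handler per instruction. The three-instruction set and the encoding are invented for illustration; Tigress's real instruction sets are far larger and nastier.

```python
# A toy virtualized interpreter: a randomized opcode assignment, a virtual
# program array, and a dispatch loop with per-opcode handlers. (Hypothetical
# PUSH/ADD/HALT instruction set, not Tigress's actual one.)
import random

def virtualize(seed):
    """Build a unique opcode assignment for PUSH/ADD/HALT from a seed."""
    rng = random.Random(seed)
    return tuple(rng.sample(range(256), 3))  # unique, seed-dependent opcodes

def run(vpa, opset):
    PUSH, ADD, HALT = opset
    stack, vpc = [], 0           # virtual stack and virtual program counter
    while True:
        op = vpa[vpc]            # dispatch: fetch the next opcode...
        if op == PUSH:           # ...and jump to its handler
            stack.append(vpa[vpc + 1])
            vpc += 2             # opcode plus one argument byte
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif op == HALT:
            return stack.pop()

# Two seeds give two semantically identical programs with different encodings.
for seed in (1, 2):
    PUSH, ADD, HALT = opset = virtualize(seed)
    vpa = [PUSH, 2, PUSH, 3, ADD, HALT]  # translate "2 + 3" into this ISA
    print(run(vpa, opset))               # both variants print 5
```

The point of the seed is visible in the last four lines: the same source program yields a different byte-level encoding per seed, while the observable behavior stays identical.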
Another way to frustrate manual static analysis is to compose transformations. So after virtualization, we can apply any number of other transformations to make the code unusual. And there's nothing stopping us from virtualizing multiple times, each time using a different dispatch method and a different virtual instruction set. All right. So what's Bob going to do now? Let's assume that we've frustrated him so much that he will no longer attempt a manual analysis. So instead, he'll now try an automatic static analysis. And this means he's going to just look at the code automatically with a tool, without executing it. The problem with static analysis is that any result it gets out is going to be an over-approximation of the program semantics. So the question is, when you virtualize code like this, when you take a program and you translate it into a virtual program array, does static analysis actually get you anything? Can you extract any interesting information? So let me go through a very simple example. For those of you who took an undergraduate compiler class and saw data flow analysis, this is exactly what I'm doing. So here's a very trivial virtual machine. It has two instructions, move and increment. And the blue boxes are the instruction handlers that implement the semantics of the instructions. And the yellow box is the dispatch unit that fetches the next instruction. At the bottom of the slide, you see the program array, which has three instructions, a move, an increment, and an increment; the green boxes are the opcodes for the move and increments, and the purple boxes are the arguments to these instructions. So what we're going to do here is work on what are called abstract domains. And for us, the abstract domain is going to be integer intervals. And our goal is going to be to compute the range of indices that the virtual program counter can take on. 
So let's get started with this static analysis, this data flow analysis. The virtual program counter starts at zero, and so our initial domain is the range from zero to zero. And the first instruction is 52, which is a move. And so we're going to push this domain, this abstract state, down into the right branch of the interpreter. And we'll pass through the instruction handler, and we increment the VPC by three. And so now we have the range from three to three. And at this point, our analysis is precise. And then we're going to push that state back up to the top of the interpreter loop. At this point, we've reached what's called a join point. Control can come to this point from two different directions, either through the initialization or through the back edge of the loop. And so we're going to have to join these two abstract states. And now we get the range from zero to three. Now notice that we already have some imprecision here. It looks like the bytes at index one and two can actually be opcodes. It turns out they can't; they're arguments. But let's ignore that for the moment and just press on. The next instruction is an increment. So we push our abstract state down to the left, increment the VPC, go back up to the top of the loop, pick the next instruction, which is also an increment, go down, back up the loop. And in general, we will continue doing this until we reach a fixed point. Now you see that what we've achieved here is a pretty gross over-approximation of what the actual result should be. It looks like all the bytes from zero to five could be opcodes, which is not true. And this is because we're computing these summaries over the interpreter, not over the actual program, which is in this virtual program array. And this is not to say that we couldn't do better. This is pretty bad static analysis. We can certainly improve it. But in general, we're always going to have some imprecision. Okay. 
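The fixed-point computation I just walked through can be condensed into a few lines. This is a deliberately coarse version: from any position in the current interval, it assumes that either instruction (the 3-byte move or the 1-byte increment) could execute next, which is exactly the kind of imprecision that summarizing over the interpreter instead of the program forces on you. The opcode values and the transfer function are simplified for illustration.

```python
# A sketch of the interval analysis from the example: the virtual program
# counter is abstracted as an interval (lo, hi), and abstract states are
# joined at the top of the interpreter loop until a fixed point is reached.
# Instruction lengths follow the slide: MOVE is 3 bytes, INC is 1 byte.
MOVE, INC = 52, 53
vpa = [MOVE, 0, 1, INC, INC]          # program array: one MOVE, two INCs

def join(a, b):
    """Least upper bound of two intervals in the interval domain."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def analyze(vpa):
    vpc = (0, 0)                      # initial abstract state: VPC in [0, 0]
    while True:
        # Coarse transfer function: from any index in the interval, either
        # instruction could run next, so advance by both possible lengths.
        lo, hi = vpc
        after = join((lo + 1, hi + 1), (lo + 3, hi + 3))
        new = join(vpc, after)        # join point at the top of the loop
        new = (new[0], min(new[1], len(vpa)))   # clamp to the array bounds
        if new == vpc:                # fixed point reached
            return vpc
        vpc = new

print(analyze(vpa))  # (0, 5): every byte looks like a possible opcode
```

The analysis converges on the interval covering the whole array, mirroring the slide's conclusion that all bytes from zero to five appear to be possible opcode positions.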
Nevertheless, Alice wants to make sure that Bob can't learn anything from a static analysis. So what can she do? Well, the best trick you can play on static analysis is to make the code never available to it. And so that's what we can do: we make sure that the code does not exist until runtime. So we virtualize the program, and then we take the interpreter and we turn that into instructions for a compiler which is embedded into the executable and which runs at runtime. And only at runtime does it construct the actual instructions for the interpreter, and we jump to them. Now this completely frustrates static analysis, because the code is not available to it. But it turns out that Bob can easily bypass this type of transformation, because the code will be in cleartext at some point during the execution. So all he has to do is find that point, attach a debugger, dump that code, and now he has the code in cleartext and can perform a static analysis as before. So what's Alice going to do now? Well, she can improve the situation a little bit with what's called dynamic obfuscation. And the idea here is that we're going to keep the code in constant flux at runtime; it's always going to be changing. And also, at no point during the run is this code all going to exist in cleartext at the same time. And there are lots of implementations of this idea. And many of them follow the following scenario, in that they have encoders and decoders. And before you jump to a piece of code, you decode it. And when you're done with it, you re-encode it. So here, before jumping to the blue code, the pink code will decode the blue code. Then the blue code will decode the yellow code and jump to it. The yellow code will re-encode the blue code and then decode the green code and jump to it. The first implementation of this idea came out many, many years ago from Intel. And the idea here is that we're going to encode blocks by XORing them with each other. 
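In toy form, with byte strings standing in for machine-code blocks, the XOR scheme looks like this (the block contents are made up):

```python
# A sketch of the XOR self-modifying scheme: the hidden block is stored as
# the XOR of itself with another block, so its cleartext never exists in the
# binary and only appears briefly at runtime, just before it is executed.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

pink   = b"\x55\x89\xe5\x90"          # a block already in cleartext
yellow = b"\x31\xc0\xc3\x90"          # the block we want to hide
stored = xor(yellow, pink)            # the striped block in the executable

# At runtime, just before jumping to the hidden block, decode it:
decoded = xor(stored, pink)
assert decoded == yellow              # the yellow block reappears
# ...execute it, then re-encode so it stays in cleartext only briefly:
stored = xor(decoded, pink)
```

Note that XOR is its own inverse, which is why the same operation serves as both the encoder and the decoder here.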
So the striped block here is actually the XOR of the yellow block and the pink block. And before the blue code jumps to the yellow block, it decodes it by XORing it with the pink block, and the yellow block appears. Here's another idea where we're going to just encrypt the blocks. But we're not going to encrypt them with a key that we store in the executable, because that's generally a bad idea. Instead, we're going to use other blocks as keys. So here, before jumping to the yellow block, the blue block will decrypt it using the pink block as a key. And finally, in this case, we're going to overlay two blocks in memory. We have two blocks of code that occupy the same memory location. Now, that seems weird. How can we do that? Well, we're going to have two patch functions which will either get the yellow block out or the gray block out, in this case. So the blue block says, I want to jump to the yellow block. It patches that memory location and gets the yellow block out. And then the brown block says, well, I want the gray block, so I'll patch that same memory location and get the gray block out. Okay, what's Bob going to do now? The code keeps changing at runtime; it's never in complete cleartext. Well, he can now turn to dynamic analysis, meaning he will run the program with a particular input. And what I'm going to show you next is a very clever idea for doing generic de-obfuscation. This is a technique that works on many, many different types of obfuscation, including the ones you've seen already today. So if we think about programs as pure functions, they take some input, they trace some path through the program, through the code, and they produce some output. Now, when we obfuscate it, well, it does the same thing, right? It takes an input, it traces a path, it produces some output, except now it takes a more convoluted path through the code. So what we're going to do is we're going to take a complete trace of the program. 
Now some of those instructions, the pink ones here, originated with the obfuscation code, right? They could be the decryption code that we just saw. They could be the back edge in the interpreter loop that we saw. Okay, let's just get rid of those. And then through some advanced magic, we'll flow that trace back into source code. Now, you might imagine that dynamic analysis has some problems, and it does. First of all, traces can be huge. And you can imagine that as an obfuscator, I may have tricks to play to make the trace even bigger. And you can imagine that the trace does not always cover all the parts of the program; it depends on the input. And so we will only be able to de-obfuscate the parts of the code that the trace actually touches. And you may also imagine that there may be ways for me to stop you from collecting a trace in the first place. But we're going to ignore these issues today. All right. So how do we do this? How do we get rid of these obfuscation instructions? Well, we're going to do something called dynamic taint analysis. So take the obfuscated program, get a complete trace, and then we do a forward taint analysis, which means we'll start with the input and we'll find everything that depends on those inputs transitively. We mark every instruction that depends on the input. And then we do a backwards taint analysis, where we mark everything that contributed to the output. And then we get rid of the unmarked instructions. We do some compiler optimizations on this trace and reflow it back into something that's a reasonable approximation of the unobfuscated code. So this is brilliant, right? Now we have a technique that gets rid of the obfuscation instructions for a variety of obfuscations. So let's look at what happens for virtualization, for example. Here we have the interpreter; it has the virtual program counter, the stack, the virtual program array, and so on. And if you think about it for a minute, you realize that none of those are input dependent. 
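To make the marking concrete, here is a toy version of the forward and backward taint passes over a five-entry trace. The trace format, with explicit read and write sets per entry, is invented for illustration; real tools work at the bit level on machine instructions.

```python
# A sketch of trace simplification by dynamic taint analysis: forward taint
# marks entries that transitively depend on the input, backward taint marks
# entries that contribute to the output, and only entries marked by both
# survive. The VPC bookkeeping depends on neither, so it drops out.
trace = [
    ("vpc = 0",        [],           ["vpc"]),   # interpreter bookkeeping
    ("t = input",      ["input"],    ["t"]),
    ("vpc = vpc + 3",  ["vpc"],      ["vpc"]),   # interpreter bookkeeping
    ("t = t * 2",      ["t"],        ["t"]),
    ("output = t",     ["t"],        ["output"]),
]

def forward_taint(trace):
    tainted, marks = {"input"}, []
    for _, reads, writes in trace:
        hit = any(r in tainted for r in reads)
        if hit:
            tainted.update(writes)   # taint propagates through writes
        marks.append(hit)
    return marks

def backward_taint(trace):
    needed, marks = {"output"}, []
    for _, reads, writes in reversed(trace):
        hit = any(w in needed for w in writes)
        if hit:
            needed.update(reads)     # the reads fed this needed write
        marks.insert(0, hit)
    return marks

fwd, bwd = forward_taint(trace), backward_taint(trace)
kept = [ins for (ins, _, _), f, b in zip(trace, fwd, bwd) if f and b]
print(kept)  # the interpreter's VPC updates are gone; the payload remains
```

On this trace, the two `vpc` entries are marked by neither pass, so what remains is exactly the original computation: read the input, double it, write the output.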
So when we take a trace, none of the instructions that belong to the interpreter loop, the dispatch unit, the instruction handlers, and so on will be marked by the taint analysis. And so we'll get the original code back. All right? Alice will not be deterred, however. To frustrate this analysis, we somehow need to trick the taint analysis into thinking that all the instructions are actually tainted. And all we have to do is to somehow trick it into thinking that the VPC and the stack and the virtual program array are in fact input dependent. And it turns out this is reasonably straightforward to do. And in that case, the taint analysis will mark every instruction as being input dependent, and the de-obfuscation no longer works. Now, so far, what we've talked about is de-obfuscation trying to get all the instructions back, all the original source code back. Now in many cases, what the adversary wants is something much less than that. For example, if you're a virus scanner, you don't care what's inside the virtual machine. You just want to detect if the program is doing something suspicious, such as having a virtual machine that might hide some malicious code. In this case, all you want to do is detection. So now, as a virus scanner, you can just take a trace of the program and notice that you have lots of forward jumps from a dispatch unit to the instruction handlers and lots of backwards jumps from the instruction handlers to the top of the interpreter loop. And that's a pretty telltale signature of an interpreter. So again, what can Alice do? Well, she can try to make the trace stealthier. And it turns out that with these interpreters, it's easy to construct an interpreter (and Tigress can do this) that you can stop and restart. So now you add calls to the interpreter all over your program and run a few iterations in every place. 
And so now the instructions from the interpreter, which maybe contains malicious code, are interspersed with the instructions from the original program. So now you've increased the dynamic stealth of the malicious code. So at this point, I think it's important to understand that performance matters not just to Alice but to Bob too. To give you some examples: static analysis of a Fibonacci program, a tiny program, virtualized with Tigress, took 40 seconds and a trace of 71 megabytes of memory. A taint analysis, a bit-level taint analysis, which is what we saw for the generic de-obfuscation, of a Huffman coding program, which is still a small program, protected by VMProtect, which is a commercial tool, took 500 seconds. And concolic analysis, which we haven't talked about today, of a 14-line trivial program protected by VMProtect took four hours. So performance, in both size and speed, matters for these analyses. So it should be clear at this point that what we have is a classic cat-and-mouse game. Alice makes better transformations, Bob makes better analyses, and Alice counters with better transformations again. And we say that obfuscation provides time-limited protection. It will always take Bob some amount of time, some non-zero time, to extract an asset from an obfuscated program. So the question is, how do we get useful levels of protection from individual transformations that only provide time-limited protection? So let's now talk about how we might deploy obfuscated programs in practice. What does industry do in this space? Well, they essentially have three strategies. First of all, industry monitors online hacker forums to find out which of their transformations are about to reach the end of their life. And then, in their back pocket, they have new transformations that they can roll out when the old ones no longer work. And third, you have to deploy your current set of transformations in such a way that your adversaries get an ever-changing and diverse set of targets to attack. 
And I'm going to talk next about the third point here. We talk about three kinds of diversity: spatial diversity, temporal diversity, and semantic diversity. With spatial diversity, you give every client a different obfuscated program to attack. And this is easy for us to accomplish, because if you remember, the obfuscator takes a seed as input, so from the same input program we can generate multiple different obfuscated versions of that program. And the hope is that these obfuscated variants are so different that collusion between the customers, or adversaries, is not going to be possible. With temporal diversity, your client sees a sequence of code variants over time. And the idea is to overwhelm his analytical abilities and to give him only a small time window to execute his attack. This goes back to a paper from 1932, on the Depression, which talks about planned obsolescence. Semantic diversity is similar to temporal diversity. In this case, too, the adversary sees a sequence of obfuscated code over time. But the difference here is that the variants are semantically incompatible. So that means that Bob has to crack and re-crack every new variant of code that we send him, because the old ones have no value to him. So early on in this talk, we talked about a particular scenario where we have a trusted server talking to untrusted hosts. For example, the Snapchat app on your phone is untrusted, but it talks to a Snapchat server which is trusted. This turns out to be a good scenario in which to implement a diverse system. And so the idea is that the secure server, the trusted server, is going to force new code updates onto the untrusted client. Again, the idea is that we're going to overwhelm his analytical abilities. And this is a system we built a couple of years ago around the Tigress obfuscator. And it does continuous replacement of code at the function level. And so you have the server to the left and the client to the right. 
And in the normal case, the client and the server communicate the way they might do over remote procedure calls. But once in a while, we'll pause execution and we'll start sending over newly obfuscated function blocks. So that's temporal diversity. And we can turn up the rate of replacement depending on the threat scenario. We're also giving every client a different set of blocks. So that's spatial diversity. So let's look at how we do semantic diversity. Here's an example. The server starts by sending over a function 2, and the client immediately hacks it. Then the server again sends over a new version of function 2, which is semantically different from the previous one. And the client says, ah, you know what? I already have function 2. I don't have time for this. I have other things to do. I will just ignore this update. And then at some later point in time, the client sends a remote procedure call, but from the old, outdated code. And the server will detect this and can either shut down communication or punish him in other ways, for example, by sending over some function of death. So where are we so far? Well, we have seen some scenarios where obfuscation is useful. We've seen obfuscating transformations that give us time-limited protection. We've seen updatable security for longer-term protection. But how do we know that we're doing anything good? How do we know that these obfuscating transformations help us at all? So let's talk about how we evaluate obfuscating transformations. And the short story is that I don't think we have a clue. So as an academic, I would like to be able to write a paper that says, hey, I've got this wonderful new code transformation, and it's got better performance and better security than previous ones, so please, Mr. Reviewer, accept my paper. And without being able to do this, it's very difficult for this field to make progress. So what does industry do? 
Well, when industry has a new protective obfuscating transformation, they give it to a professional red team, either internal to the company or an external one that they hire. The second thing they do is monitor hacker communities over long periods of time, so they build up a wealth of information on which transformations work, for how long they worked, and so on and so forth. Now, unfortunately, and this may come as a shock to you, they do not share this information with us. That leaves me, as an academic, in a bind, because I can't afford to hire a professional red team whenever I want to publish a new paper.

So in academia we instead invented programmatic ways to do evaluation, and it goes something like this. You take a program P, you obfuscate it with two transformations T1 and T2, you get two new programs P1 and P2, you push those through some sort of metric function, out come two numbers, you compare those numbers, and you say: well, it looks like transformation T2 is better than T1. So this is a stand-in for red-team evaluation, which is what we would really like to do.

So which metrics should we use? Well, people have tried a number of things. The first thing that comes to mind is: let's just use students. Students are almost like professional red teams, except infinitely cheaper. Unfortunately, students are nothing like professional red teams; they're inexperienced. And it doesn't scale: if every time I want to write a paper I have to get a new group of capable students together to evaluate my code, it's not going to work. How would you do this? You would obfuscate your program, give it to your students with a task like "extract this asset," time how long it takes them, and that would be your metric.
And if you use the same students over and over again, they get better over time, so your measurements will be skewed.

The second idea people have come up with is to use so-called software complexity metrics. These were developed in the software engineering community many, many years ago to evaluate how well-structured a program is. Here are two common ones, the knot count and the cyclomatic number; both evaluate the structuredness of the control flow inside a function. The problem is that these were invented for a completely different task, and it's entirely unknown whether they actually measure something useful about an obfuscating transformation, whether they say anything about how it would perform in the field. Also, there are literally hundreds of these metrics in the published literature, so which ones should we be using?

The final and most common approach is to use analysis tools, and this is what people tend to do: you take your obfuscated code, you run it through one of these tools, like KLEE, and you measure the precision and the running time, and that's your metric. To give you an example: a while back we were interested in evaluating one of our transformations, so we wrote a little point function, ran it through Tigress, and then through KLEE, and KLEE finished in half a second. That's not good, right? It means our obfuscation is not very powerful. So we said, okay, let's tweak it, let's add some other transformations, and we did the same thing again. Now, if KLEE had come back and said it still takes half a second, we would have thought, well, that's probably not a good improvement. But that's not what happened: it now took ten minutes, and of course we were very excited. Clearly we had made an improvement. But had we?
Well, the fact that KLEE now failed, or took longer, could be due to many different things. It could be a correctness bug that we tickled, it could be a performance bug that we tickled, or it could be that our transformation is actually fundamentally better than before. It's really impossible to know.

So what's missing here is some sort of validation step. We need to know which of these metrics, or which combination of them, in this programmatic way of evaluating obfuscations, actually relates to how well a transformation would perform in the field, how well a red team, for example, would be able to attack it. How would we do this? This is what I'm working on right now. We're going to start by building a model of real attackers: how they actually attack obfuscating transformations, how long it takes, which tools they use, how long they use them for, and so on. I'll tell you in a second how I propose to do that. Then we're going to correlate this adversarial model of what attackers do with these potential metrics, or combinations of them.

So how are we going to build this adversarial model? Well, soon you'll be able to download a virtual machine from our website. It has inside it all the state-of-the-art code analysis tools that we've seen, and it also has a bunch of challenges. You're going to download it, and you're going to crack these challenges. "And why, Christian, should I be cracking your challenges?" Because if you successfully crack a challenge, I will give you a boatload of cash. And in exchange for this boatload of cash, you will allow me to run monitoring tools in this virtual machine that send back to me information about the steps you took to crack my challenges. If we can run this for a long period of time, hopefully we can collect enough data to build these models and use them in our evaluation.
Generating these challenges, by the way, is an acute problem in itself. We modified Tigress to generate random programs, which is in itself a hard problem that I don't have time to talk about today. Essentially, we generate a random program, which can have random assets in it, and then we pipe it through Tigress again, this time obfuscating it with different sequences of transformations; that's how we generate many challenges of different complexity. The VM isn't up there yet, but some challenges are, with cash prizes and book prizes, and you can give these to your students as exercises. The first challenge was broken pretty quickly by a Google engineer, but the other ones are still up there.

All right. There are four points I would like you to take away from this talk today.

The first is that meeting security criteria without meeting performance criteria is not a solution in a MATE scenario. With obfuscation you can always produce arbitrary levels of protection and arbitrarily complicated, arbitrarily large and slow programs; that's not the problem. The problem is providing a useful level of protection while keeping performance within bounds, and this is complicated by the fact that we're often working on constrained devices and under severe usability constraints.

The second point is similar, but it applies to Bob instead of Alice: meeting precision criteria without meeting performance criteria is not a solution for anti-malware analysis. Real programs are really, really large, and these analyses need to scale. If you tell me, "hey, Christian, my analysis broke your new obfuscating transformation," that's really meaningless without telling me what its performance is.

The third point is that obfuscating transformations are primitives that provide time-limited protection.
So we expect all language-based obfuscating transformations to be breakable, but we expect them to impose some performance cost on the attacker, and sometimes we can use updatable security to extend that period of protection. I showed you how this is particularly applicable in a client-server scenario, where it's easy to push new updates to the clients.

The fourth point is that to make progress in this field, we as a community must get together and settle on rigorous evaluation procedures. Evaluation is really a mess; we need to fix this, and maybe someone in this community has ideas about what to do. I would love to hear them. And I showed you how we're going to try to learn from public challenges.

I'd like to leave you with this table. It's not a very interesting table, because it's empty; I would like you to fill it out yourselves. There are exciting developments in the hardware community: Intel is now pushing secure enclaves into their chips. There are exciting developments in this community with indistinguishability obfuscation. How is this going to play out? What's going to happen long-term? Which techniques will prevail? Maybe they will coexist, but in different scenarios. The question most interesting to me is: are there scenarios where we actually want to combine these three techniques in some way, and how would we do that? I think that's very interesting, and it will be very interesting to see how this plays out long-term. So thank you very much for your attention, and I'll leave the floor open to questions.

Thank you. Any questions or comments? Aren't you worried about the same problem with challenge problems as with students: that it's a measurement at a point in time, and two years later people won't be using the same tools and the same techniques, so your metrics get outdated very quickly?

Well, I think what happens is that the tools get better over time.
I don't think humans get better over time; we are limited there. I'm more worried that there's one guy out there who really, really wants the money and breaks all the challenges, so we learn only from him or her. I'm not sure what to do about that. Thanks.

You said that indistinguishability obfuscation is prohibitive performance-wise right now. I wanted to understand more about your security criteria if performance weren't an issue. Say that tomorrow it's no longer an issue, it's very fast: would indistinguishability obfuscation provide a good enough guarantee for your needs?

Well, I think the point of the talk was that performance actually does matter, right? I absolutely agree; I just don't understand the security criteria. I have no real answer to that, only one data point, which will not help you: one person in industry, a cryptographer who knows what he's doing, said no, it doesn't help us, it's not what we want to do. But I can't help you any more than that; he didn't expand. And, as I've said many times in this talk, lots of interesting stuff is actually going on in industry, and they don't talk to us. That's a huge problem. I know all these guys, and sometimes they throw me a morsel of information, but it's often not enough to, in this case, answer your question.

You talked about solutions that are purely software-based. I just wonder if there's any development along these lines that employs some secure hardware?

Oh, there's lots of stuff, including work on minimal secure hardware. And like I said, the most interesting thing is probably the secure enclaves coming out from Intel: essentially, you can put encrypted code on your CPU. I'm not an expert in this, and I'm not sure exactly how it will play out, but one of the issues is who gets to put encrypted code on your CPU, right?
Because the two organizations most likely to be first in line to put encrypted code on your CPU would be the NSA and malware writers, so probably not everybody will be able to get the keys to do this. There are also side-channel attack issues with these kinds of devices. And there are combined software/hardware approaches: it used to be that people would put dongles on their software, so you could only run the software with the dongle attached. That's still done a little bit for very high-value applications, and then the link between the dongle and the application software has to be heavily, heavily obfuscated, because that's where the attacker will strike: he won't attack the hardware, that's painful; he'll attack somewhere in the middle, between the application and the hardware.

So in your models the bad guy is the end user, but obfuscation is also used in the malware industry. Is there any difference between the techniques used in the malware industry and in the software industry, and who is ahead? Is there any big difference between the two?

Well, VMProtect, for example, is one of the tools I was telling you about. It's a commercial tool targeted at normal software developers who want to protect their products, but of course it has customers in the malware community as well, so the same tools are used both by malware writers and by software developers. The latest high-quality malware coming out from state actors, like Stuxnet and such, is pretty well protected; it took quite a while to reverse engineer it. For the mass market, maybe not so much.

You talked about evaluation only from the point of view of running attacks and red teams. Are you aware of anybody who has come up with obfuscation solutions and tried to show formally some hardness, some formal argument about why their transformations are non-reversible?

Oh, yeah, we do this all the time.
Just like you do, we'll say: here's a hard problem, in our case a hard problem from a static-analysis or dynamic-analysis point of view, so let's build an obfuscation that leverages it.

Can you give us an example of what you consider a hard problem?

Yes: alias analysis, a problem in static analysis. It's essentially pointer analysis; the question is whether two pointers in your program could ever point to the same thing, and that problem turns out to be undecidable in general. Wonderful: let's build some obfuscations based on that. Great idea, but there are at least two problems. One is that the attacker doesn't have to do static analysis. He can just turn to dynamic analysis, and then this problem simply goes away, because alias analysis is hard for static analysis, not for dynamic analysis. The second is that even if the problem is undecidable in general, how do we create a random instance of it that is actually hard? It's like using knapsack for crypto: it turns out to be a bad idea, right? I don't think anyone knows how to do that. So yes, we try to build our obfuscations on known hard problems, but practically getting to an instance that's hard is difficult, and it may always be possible for the attacker to go around it and try something completely different.

Any further questions? If not, let's thank Christian again. Thank you. I think we have the usual coffee break now, and we continue at 10:40 with the last parallel sessions.