My name is Janice. I'm here to talk about how to break obfuscators in a bytecode language, Java. Most of you don't have a lot of respect for that language, and that includes me. But at the same time it's being used more and more. You still have to cue the slide. Better? Yeah?
Okay, a brief outline. A bit of background on what we'll be doing. A basic hello world. What is bytecode? I don't want to spend too much time on this stuff.
Okay, so how does the whole thing work? You've got an architecture, something like PowerPC. You've got the host OS below. And then below that you've got the VM. That has a class loader, an execution engine, and Sun's infamous sandbox, and the bytecode sits pretty much between the class loader and the execution engine. And that's how you run your JAR files, class files, et cetera. The language is full of security mechanisms. The LSD group did some work back in 1999. Not a lot of focus is going to be spent on the VM. Sun is producing magnificent new releases of the virtual machine, locking it down more and more as they go along.
And then you have the basic scenario. So what are we talking about? You start with a program and you compile that program. This is really basic, which is why I'm skimming through all this. You obtain the class file, and if you run strings on that, the string is in the output. If you disassemble it again, it's there. Okay, so the question is how do you go... Is that okay? So the question is how do you go from bytecode to source code? There are two ways; you can either go through a decompiler. There are a few popular ones. And that's where obfuscation becomes part of the process of making your life hard when you start reverse engineering.
So the motivation behind this is trying to make code less readable from a human perspective, more than from a machine perspective. What do obfuscators offer? Basic operations: remove debug information, change control flow, encrypt constant values, inject unnecessary code to confuse you.
And also there's the usability element. So where do we see obfuscators? Generally it's Java Standard Edition applets, client-side applications that have some level of trust. We don't see it in the J2EE sphere a lot, even though SAP will disagree. And it's applications that are delivered to the user. In terms of basic obfuscation techniques, you have renaming, extending objects, classes and superclasses, removing basic debug information, encoding string values, and also splitting loops.
Now, in terms of reverse engineering particulars, you obtain a version of the application code, you extract the class files, and you try to establish quickly what level of obfuscation is in place. This is even being done in terms of scoping for penetration tests. It's a process that helps you identify how much time you need to reverse engineer a particular component or application.
Who's using obfuscators? I wanted to bring up a browser for this. So, if we do a basic search for JAR files that involve trade applications, for example, you start getting interesting hits. You would if I typed it correctly. So, you start getting... "This application contains trade secret information," et cetera, et cetera. There is some motivation for the work of de-obfuscating code out there. As we're in Vegas, virtually every online gambling casino has a poker client. Some of them are written in Flash, some in Java, and they also have a no-download client, which typically runs as an applet, or it could be Flash. So that's a big variety of people pretty much relying on Java client-side security.
Okay, so how do we go about attacking this? What are we trying to do? If we look at what people have done in terms of obfuscation and try to see what they're using as algorithms and methods, we can establish a methodology and build on that. Now, in terms of obfuscating transforms, how do you change the code that's already there? You can change the data. You can split variables: int x = 5 becomes 2 + 3.
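To make the split-variable transform just mentioned concrete, here is a minimal sketch in Java. The class and method names are hypothetical and chosen only for illustration; a real obfuscator does this at the bytecode level and usually in a less obvious way.

```java
// Hypothetical illustration of the "split variable" data transform.
class SplitVariableDemo {
    // Original form: one scalar holds the value.
    static int original() {
        int x = 5;
        return x * 10;
    }

    // Obfuscated form: the value is carried in two parts and
    // only recombined at the point of use.
    static int obfuscated() {
        int a = 2;
        int b = 3;
        return (a + b) * 10;
    }

    public static void main(String[] args) {
        // Both forms compute the same result.
        System.out.println(original() == obfuscated());
    }
}
```

The behaviour is unchanged; only the way a human reads the data flow changes, which is the point of the transform.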
You can promote scalars to objects, change inheritance, split and merge arrays, and reorder instance variables. You've got layout obfuscation, so scrambled identifiers, formatting, et cetera. You have control flow obfuscation, if you like: clone methods, reducible loop and extended loop conditions, outline statements. Finally, you've got transforms which are in place for people like you and me, to make the process of deobfuscation a lot more difficult.
To start off this methodology, you have to have an entry point. Establishing one is a process of putting together some code, a program, an application that can serve as an identifier, so that when you pass it through the obfuscator, you know what the original state is. This is something we've seen in cryptography with plaintext attacks, ciphertext attacks, et cetera, and it's not that new. The basic elements that are working to our advantage are that obfuscators these days are not in a very beta testing phase. Obfuscation is uniform, so if you've got a GUI component, you know what GUI code is going to look like. The same with IO operations. Design patterns will have a particular structure, so if they're using LIFOs, FIFOs, et cetera. And generally obfuscators don't have a lot of algorithms embedded in them as a choice of operation.
The basic attack: obfuscation is trying to make code harder to interpret. We transform the notion of a plaintext attack to that of having a code base where we know what it looked like. We pass that through the obfuscator, and then, from a black box perspective, we try to see what methodologies and algorithms are embedded in the application in terms of the transforms that I've spoken about.
Let's talk about encryption in obfuscation. This is really basic. It's mainly XOR operations. It's kind of rude to the crypto folks to talk about encryption at that level.
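The known-input idea combined with XOR can be sketched as follows. This is an illustration under stated assumptions, not how any particular obfuscator works: the repeating-key scheme, the key bytes, the key length of 5 and the marker string are all made up. The point is only that if you know a string before obfuscation and can see it afterwards, XOR hands you the key.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: recovering a repeating XOR key from a known string.
class XorKnownPlaintext {
    // XOR data with a repeating key; applying it twice restores the input.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    // Recover the first keyLength key bytes from a plaintext/ciphertext pair.
    // Assumes both arrays hold at least keyLength bytes.
    static byte[] recoverKey(byte[] plain, byte[] cipher, int keyLength) {
        byte[] key = new byte[keyLength];
        for (int i = 0; i < keyLength; i++) {
            key[i] = (byte) (plain[i] ^ cipher[i]);
        }
        return key;
    }

    public static void main(String[] args) {
        byte[] secretKey = {0x13, 0x37, 0x42, 0x5A, 0x7E}; // made-up key
        // A marker string we planted in our own test class before obfuscation.
        byte[] marker = "KNOWN-MARKER-STRING".getBytes(StandardCharsets.US_ASCII);
        byte[] obfuscatedMarker = xor(marker, secretKey);

        // The attacker sees marker and obfuscatedMarker, not secretKey.
        byte[] key = recoverKey(marker, obfuscatedMarker, 5);

        // The recovered key now decodes other strings protected the same way.
        byte[] other = xor("Enter password".getBytes(StandardCharsets.US_ASCII), secretKey);
        System.out.println(new String(xor(other, key), StandardCharsets.US_ASCII));
    }
}
```

In practice the key length is not known up front, so you would try a few lengths and keep the one that turns the other literals into printable text.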
There's little benefit, or at this stage the word out there in terms of obfuscation is that there's little benefit, in using strong crypto. There's no public-private crypto offered that I've come across, and generally the only main place where you have obfuscation cryptography supposedly being used is when you're trying to hide string constants and values that the user sees, let's say, in the interface.
What I've done is put together a fingerprinting tool: you give it your obfuscated code and it tells you who wrote the obfuscation software. The idea is to build a list of particular obfuscators and then have some key operations that act as fingerprints. Through a calibration check against those fingerprints, you can produce results which tell you what tool is being used. The basic idea is that you feed a very generic class to an obfuscator. This is something we were discussing just now. Most of these tools are commercial and have a very high license fee, but you can feed one file through and send me the results. I can embed that in Elucidate.
What is Elucidate? Elucidate is a Perl script at this stage. It's not really advanced in any way in terms of programming, but it has all these fingerprinting signatures inside it. You run it against a particular JAR that you might have and you check whether or not the obfuscator used can be fingerprinted through what's already there. There's a big overlap, and we'll come on to that during the demo.
What does this tool offer? I think it's time to bring up a command prompt. Can you guys see that okay? No? Maybe? Bigger. This is going to get a bit messy. You've got a few obfuscators that are supported within this application. The basic idea of Elucidate is that it takes a file as input, in a particular directory that you might have, and it will tell you whether it's found a match against the list of obfuscators it supports. If we do this in a bit more verbose manner, I'll show you exactly what it's doing.
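A signature check of the kind described above could be sketched like this. Elucidate itself is a Perl script and its signatures are not given in the talk, so everything here is an assumption for illustration: the class name, the obfuscator labels and the byte patterns are placeholders, not real fingerprints.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of signature-based obfuscator fingerprinting.
class FingerprintScan {
    // Placeholder signatures; real ones would come from calibration runs.
    static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("obfuscator-A", new byte[] {(byte) 0xB8, 0x00, 0x10, (byte) 0x82});
        SIGNATURES.put("obfuscator-B", new byte[] {0x10, 0x2A, (byte) 0x82, (byte) 0x92});
    }

    // Naive byte-pattern search; fine for class files of ordinary size.
    static boolean contains(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    // Return the label of every signature found in the class file bytes.
    static List<String> identify(byte[] classBytes) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (contains(classBytes, e.getValue())) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FingerprintScan a.class b.class ...
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            System.out.println(path + " -> " + identify(bytes));
        }
    }
}
```

A match is only a probability, not proof: the same byte sequence can come from a developer writing repetitive code by hand, so a tool like this would want more than one signature per obfuscator before reporting a hit.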
It's picking up particular bytecode operations in a sequence and interpreting potential key values, and on that basis it's saying: either you have a very silly developer doing repetitive operations of this sort inside a class file, or the code has been obfuscated using this technique, this tool. Now, the target deliverables of this little tool are, as I said: you provide a JAR file, it identifies what obfuscators have been used, and it recovers strings within that file which have supposedly been encrypted. This is really basic version 0.1 stuff. The main development behind it came from trying to understand: here's an applet, how much time do you need to spend on it in order to reverse it? So you can get an estimate of the complexity, that's the long-term goal, and also map out particular sections of the code. It's very easy to generate a graph around this idea.
Now, if we look at the particular fingerprints that are within particular tools, Zelix KlassMaster is just an example, and they're using string encryption. So if you look at the string values here in the bytecode, they definitely don't look like normal ASCII printable text. They're using special characters, and again, it's very rare that you see those within code written for a particular GUI or IO operation. And you have a particular signature that looks like this. Now, if you do a match on this against the file, you have a very high probability of catching an obfuscator. This is what I did before. The interesting thing is that you can take this to the next level, which is to actually get a map of where code is within the obfuscated application, but we'll come on to that towards the end. It has various modes of operation, and again you can build on each one.
The fundamental concept behind catching obfuscation is that it has to run. It has to run, so it has to be interpreted in terms of machine code. If we look at another product, JShrink, it's doing a similar thing with string encryption.
So it's calling an I.I method, which obviously has been renamed, and then it's passing an integer argument, and it's doing it in a static way as well. So straight away you have certain contents within the app that, without even looking at the bytecode, can tell you what tool is being used. What these guys have done is create an invalid GIF file. They've dumped all the string literals in there as non-printable text, and then they're calling a reference point, which is the integer index into the array, feeding it through an XOR operation and recovering the text that they want to use. Again, this is what their code looks like if you put the effort into reverse engineering it. Basic operations. The interesting thing, if you look at the last line at the bottom, is that they've broken up the I.GIF call, so if you run strings and grep for GIF anywhere in the code, you'll have very little luck finding it. At the same time, it's not that hard to put together that the call to the file is being made somewhere from a static location.
Again, that's what I just described. And then you have people who don't really use string encryption, and that's good, because it makes my task a lot harder. I've got to go and look at how they obfuscate for loops, while loops, et cetera, and at the same time there's no deliverable in terms of recovering key operations that the obfuscator does. If you learn how to crack string encryption, you can take that exact same process and see how they've decided to swap around while loops, for example, because, as I've said, the obfuscator application is using a very limited and fixed set of algorithms to obfuscate.
Before we come to the conclusions, it's worth showing you what Elucidate can do. Again, let me bring the magnifier up. Hold on, can we do that? If you take a typical application like a JAR file and unzip that, or pass it through Elucidate, then you have a number of a.class, b.class, c.class files.
If you open those up in a decompiler to follow some basic reverse engineering calls, you see that they're calling an a.a method, an a.b method, et cetera. It's very frustrating. The basic idea is that you supply a directory. It works with files and directories; that's the easy part. Let's just see what we're doing here. We're recursing through the directory, we're finding an a.class file, and the fingerprint is matching that of Zelix. Well, Zelix has a set algorithm in terms of string encryption, so we can provide more information back to the user. The keys change dynamically, so they're not the same every time, even though I don't think the pseudo-algorithm they're using is very solid. Then, if you take the string literals which have been identified within the application, obviously they've been encrypted, and if we do the XOR operation with the five keys on top of the string literals, we get the actual values which are inside the application. It fulfills its role in terms of giving you an entry point and a map towards attacking the application. You'll know what part is GUI and you'll know what part is actually something which might be a bit more custom made: a protocol, an algorithm, et cetera.
To finish off, obfuscation is pretty much at a primitive level today. I think it's something that we're going to see a lot more of. It's very frustrating as an idea for people who spend time reverse engineering, because it's regarded as a middle tier, but there's both Java and .NET out there, especially .NET 3, that use virtual machines to load up the code. An excellent entry point is seeing where the string values are and seeing exactly what they represent. In terms of identifying the crypto used, we can see what tool is being used and we can see what changes to expect within the obfuscated code. We're starting to get a feel for the algorithms they're using. In terms of presenting a problem, which I think I'm doing here, you can also provide a solution.
The solution in this space, and it's an idea I think will have a lot of potential, though maybe it's early days for this, is the proposal to use polymorphic obfuscation. It's not just a fancy term. You need to engage the development team if you want your code not to be easily reverse engineered and to have a heavier hurdle of obfuscation, and map out the critical elements. Then you understand what an obfuscator does, and different elements of the application will be obfuscated in a different way, because you'll be feeding different algorithms to it. That's where polymorphic comes in as a notion. So any user interaction will be treated differently to a protocol implementation in terms of reversing any part of, let's say, the string literals that I just demonstrated. And the last thing is to vary the algorithms a lot more. This is something that I think we'll be seeing a lot more of. At the same time, we're nowhere near this stage, because most of the tools out there are not in the public domain, so we can't easily provide information around them.
That's it. Are there any questions?
You can do that. You're making your life more difficult. They're really not at the level of doing any advanced operations in terms of deciphering what the application is doing, what the obfuscation application is doing. So I think in the future yes, but at this stage no. Anyone?
Okay, this presentation is the tip of a very large iceberg, and yes. Yes, you've got to avoid a couple of checksums and you don't want to trigger the sandbox, but you can do real-time patching in Java and .NET.
Sure. Well, this is the thing, this is exactly a very interesting element. You've got obfuscators that break a while loop in a way that triggers a known bug in JVM 1.4, which generates an infinite loop. So the element in the IELC code being passed through the VM is being used for obfuscation, but it's a very rare instance at this stage.
Yes, sure, you have, on more than one instance. Here's an example. Let's say you're using the Apache Commons HTTP client and you're deploying a little JAR file that does whatever. Okay, if you obfuscate that, then the call to, let's say, the SSL proxy class that negotiates SSL can't be made, because that can only be made by reference, by name. So it breaks the application at runtime. But you'd never see that. Well, you'll catch it in UAT, but you'll never see it during development.
It's a big debate. You can trust the application; then do you trust the operating system? It opens another avenue. The worrying thing is that if you take something like Elucidate, which as I've said is a Perl script, there's not a lot of magic in it, and do the basic googling that I just demonstrated and run that on JAR files, you start getting some results back which are worrying. There are a lot of people putting client-side trust out there in very weird ways in the name of obfuscation, which is a tick in the box.
An IDE environment, you can sign. There's an excellent chapter in Hacking Java Exposed about how you can bypass the class loader that does exactly that. So if you combine profiling the application at a basic level with triggering your own class loader, then even a signed applet, because you have the code in your runtime environment, can be opened up and reverse engineered in a normal kind of way. A nice analogy to that is that you're signing for a nail bomb but you're still getting the nail bomb delivered to you. So you're signing for a part of code that's potentially trustworthy and secure.
And that's really interesting, because it's a recent project of mine. If you take an obfuscated app, no one knows what's inside it. Well, no one really knows what's inside it. And if you sign that... but if you sign the original part of the code, then you can do a lot more. So you can tweak, you can reverse the model, if you can see what I'm saying. So it can work to your disadvantage, adding all these features which are bypassable. The fundamental problem is client-side security, and one way or another you can influence that and you can have a huge impact.
Okay, let's go through the scenario. You've got a certificate. The certificate spawns up and says this has been signed by XYZ. Great. It's using this library, and this library is obfuscated. Great. And then at runtime you load a class loader and you patch that library. But you must have some level of influence on the client-side environment. The user will still see... you haven't triggered a worm, you haven't owned the operating system. It's basic stuff. The user will still trust the application because of the certificate, okay, but they'll still trigger whatever I want them to trigger. So it's a two-fold game in a way. You can take obfuscation and use it to an attacker's advantage, which is the scary element, because you no longer know afterwards where the library calls map out in terms of your code. You can't even do memory profiling in an efficient way.
Okay, I think that's all. Thank you very much.