So back in February I wrote a decompiler, because everybody was talking about Ethereum. I wish I'd bought coins instead; I would have made more money out of it. Just to give you a quick introduction: if you don't follow me already on Twitter, this is my handle. I'm also the organizer of a conference in Dubai. I haven't been to Vegas since 2011; I took a Vegas break. No, it's okay, I'm fine, I'm coming back again. And my new claim to fame is that I've been called a fun guy by the Shadow Brokers.

Just so you know, I won't be talking about what a blockchain is, Merkle trees, and all those things. We're going to focus on smart contracts and how to decompile them. As a Windows reverse engineer, I thought it was interesting to analyze the VM itself. I'm also releasing a tool, which is going to be open source, and I'll give the link at the end. Of course the tool is not perfect, so you're more than welcome to contribute and send a pull request.

A short overview of what we're going to talk about today: the EVM and its memory management; how we can do type discovery, which is important for static analysis and obviously if you're building a decompiler, and which can be used for both static and dynamic analysis; the known classes of bugs so far; and what to expect in the future.

How many of you are familiar with Solidity? Please raise your hand. Basically, Solidity is the compiler for Ethereum. The way Ethereum works, executing smart contracts is basically a software layer on top of the blockchain. They use a compiler called Solidity, which translates code written in a JavaScript-like language into bytecode.
Porosity is the tool I'm going to be describing and releasing today. If you're familiar with chemistry and physics, it's the exact opposite of solidity, hence the name.

So far there are a lot of accounts on Ethereum, millions of them. If you look at the number of contract accounts, it's almost one million; it's probably one million by now. But the number of verified contracts is very small, and the definition of a verified contract is very obscure. Basically it comes down to whether the source code is provided. When it comes to reverse engineering, you usually don't care whether you have the source or not. But it's interesting that since this software layer was introduced with Ethereum, we see a need for reverse engineering.

Especially since Ethereum introduced the concept of the ICO as a use case for smart contracts, we've heard a lot of stories since the beginning of the month. Here I'm mentioning two, but I think three different stories happened this month. The first one was CoinDash, where someone changed the address that was supposed to receive the funds. More recently Parity, a project started by one of the key developers of Ethereum, had a vulnerability in one of their smart contracts, and 30 million got lost. And a few days back something happened with another ICO and 10 million vanished. That's basically what happens when you write software to store money but don't have proper security checks. It's more damaging than a blue screen of death; it's a wallet of death.

So, the EVM, the Ethereum Virtual Machine. You have three concepts, account, contract, and blockchain, and they are pretty much interchangeable.
Personally, even to me the difference is quite obscure, but I mainly focus on the actual bytecode, which is stored on the blockchain. So a smart contract is basically a synonym, a fancier word, for bytecode, and it's stored inside the blockchain. It uses 160-bit addresses, and an address corresponds to an account. One of the specificities of the Ethereum Virtual Machine is that it uses 256-bit registers. But it doesn't really have registers as you would know them from traditional architectures such as x86; instead it has this concept of a virtual stack. The more you look at it, the more you see that they were kind of trying out different things. The outcome is still pretty good, but if you're going to build a virtual machine, a lot of those things are a bit shaky.

For those who are not familiar with Solidity, that's basically what it looks like. On the left is a simple coin contract. It's very simple. Usually you have a few routines and some storage memory. Even the instructions themselves are quite straightforward: you store an integer, you do a subtraction, an addition, and that's pretty much it. So the level of complexity of a contract is very far from the complexity of a kernel driver. Then you compile it using Solidity, and you get all that bytecode, which is going to end up on the blockchain. At the same time, when you compile it, it also saves the interface that other smart contracts will use to call that specific contract.

Regarding memory management, you have three different types of memory that are significant. The first one is the stack that I was mentioning before.
On a traditional architecture you would use the stack to push arguments to a function, at least with x86, not with 64-bit architectures. Here you push arguments to opcodes. There is a limited size to the stack, which is 1024 elements. Then you have two types of storage: a persistent one, which is designed, as its name says, to retain data, and another one, which is more volatile, that you can identify easily from the instructions. The volatile one is interesting because that's basically what is used to store strings. If you look at it, the way it's done is still a bit dirty, but it does the job. Like I was saying, smart contracts are very simple by design.

If you do static or dynamic analysis, and especially if you write a decompiler, one of the most important things to understand is the control flow of your program, the control flow graph. So the first thing we need to know is how to identify basic blocks. At least with smart contracts, that is so much easier than on a traditional architecture, because concepts like code obfuscation don't really exist yet with smart contracts. For sure we should expect them in the future, but they're not present yet. In most cases there is this instruction called JUMPDEST, which indicates the beginning of a new block. In most cases, I would say 85% of them, you have this instruction at the beginning of each block, and then you have a few different instructions for conditional jumps and regular jumps.
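As a rough illustration of the basic-block identification just described, here is a minimal Python sketch. It assumes the standard opcode values from the EVM specification (`JUMPDEST` is `0x5B`, `JUMP` is `0x56`, `JUMPI` is `0x57`, `PUSH1` to `PUSH32` are `0x60` to `0x7F`); the function name `split_basic_blocks` is made up for this example and is not taken from the tool presented in the talk.

```python
# Opcode values from the EVM specification.
STOP, JUMP, JUMPI, JUMPDEST = 0x00, 0x56, 0x57, 0x5B
RETURN, REVERT, INVALID, SELFDESTRUCT = 0xF3, 0xFD, 0xFE, 0xFF
PUSH1, PUSH32 = 0x60, 0x7F
TERMINATORS = {STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT}


def split_basic_blocks(code):
    """Return a list of (start, end) byte offsets, end exclusive.

    A block starts at offset 0, at every JUMPDEST, and right after any
    instruction that ends a block (jumps, STOP, RETURN, ...).
    """
    leaders = {0}
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            leaders.add(pc)
        if PUSH1 <= op <= PUSH32:
            # Skip the immediate bytes so that data such as 0x5B inside a
            # PUSH is never mistaken for a JUMPDEST.
            pc += op - PUSH1 + 1
        elif op in TERMINATORS and pc + 1 < len(code):
            leaders.add(pc + 1)
        pc += 1
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [len(code)]))


# PUSH1 0x03; JUMP; JUMPDEST; STOP
print(split_basic_blocks(bytes.fromhex("6003565b00")))
```

The sketch only finds block boundaries; linking blocks into a control flow graph still requires knowing where each jump goes.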
The difference with a traditional architecture is that on x86, for instance, you have your opcode and the destination right in the same instruction, whereas here the code first pushes the destination onto the stack and then executes the opcode. But the main difference is that sometimes it pushes the destination at the beginning of the function and then does a bunch of other instructions, so you would totally lose track of what the destination address was. That's why you need to write a pseudo-debugger that emulates most of the instructions to keep track of all the destinations. It's basically one of the main limitations of static analysis with smart contracts. Otherwise most of it could be done statically, but because of those weird scenarios where the destination is stored earlier, you have to emulate certain basic blocks in order to keep the destination of the basic block.

For stack manipulation there are basically a few instructions: duplicate, swap, pop, and push. It's pretty straightforward.

When it comes to the actual opcodes of the EVM, you have different categories. Obviously the main one is arithmetic. The EVM is mainly designed to deal with money, to store it, to create wallets for transactions, which makes sense, right? Then you have the block environment and the environmental information, where you get information about the sender and the receiver. And then you have all the memory-related operations, plus some logging operations to keep track of certain events.
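To make the jump-tracking point above concrete, here is a small Python sketch of a "pseudo-debugger" that emulates only the stack manipulation opcodes (PUSH, DUP, SWAP, POP) over one straight-line run of code and records the value on top of the stack at each JUMP or JUMPI. The opcode values come from the EVM specification; the function name, the tiny `STACK_EFFECT` table, and the use of `None` for values that are unknown statically are all assumptions of this sketch, and a real tool would need a stack-effect entry for every opcode.

```python
PUSH1, PUSH32 = 0x60, 0x7F
DUP1, DUP16 = 0x80, 0x8F
SWAP1, SWAP16 = 0x90, 0x9F
POP, JUMP, JUMPI, JUMPDEST = 0x50, 0x56, 0x57, 0x5B

# (pops, pushes) for the few other opcodes this sketch understands.
STACK_EFFECT = {
    0x01: (2, 1),      # ADD
    0x02: (2, 1),      # MUL
    0x03: (2, 1),      # SUB
    0x35: (1, 1),      # CALLDATALOAD
    JUMPDEST: (0, 0),
}


def _pop(stack):
    # An empty stack means the value came from the caller: unknown.
    return stack.pop() if stack else None


def resolve_jumps(code):
    """Return {pc_of_jump: destination or None} for one run of code."""
    stack, targets, pc = [], {}, 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += n
        elif DUP1 <= op <= DUP16:
            stack.append(stack[-(op - DUP1 + 1)])
        elif SWAP1 <= op <= SWAP16:
            i = -(op - SWAP1 + 2)
            stack[-1], stack[i] = stack[i], stack[-1]
        elif op == POP:
            _pop(stack)
        elif op in (JUMP, JUMPI):
            targets[pc] = _pop(stack)   # destination is on top of the stack
            if op == JUMP:
                break                   # unconditional jump ends the run
            _pop(stack)                 # JUMPI also consumes the condition
        elif op in STACK_EFFECT:
            pops, pushes = STACK_EFFECT[op]
            for _ in range(pops):
                _pop(stack)
            stack.extend([None] * pushes)   # result is unknown statically
        else:
            raise NotImplementedError(hex(op))
        pc += 1
    return targets


# The destination 0x0a is pushed first, other work happens, then JUMP:
# PUSH1 0x0a; PUSH1 2; PUSH1 3; MUL; SWAP1; JUMP
print(resolve_jumps(bytes.fromhex("600a60026003029056")))
```

In this example a purely linear reading of the instruction just before `JUMP` would not reveal the target, while the emulated stack does.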
Like I was saying before, the main thing here is that opcodes are more like functions, because you need to push the arguments onto the virtual stack. So here, for an addition, you push the two values you want to add, and then you retrieve the result not in a register but in the top item of the stack. That's how it would look if you wrote it in EVM pseudocode.

Then you have the EVM calls. Those are pretty interesting: they allow you to call a different contract. There are multiple types of call: the regular CALL, but also DELEGATECALL, which is what was abused recently in the Parity contract. It's also interesting from the perspective that you're basically calling third-party libraries that you don't necessarily own. That leaves a lot of opportunities for undefined behavior, and when it comes to static analysis, dynamic analysis, or even just trying to define your scope, it creates a lot of issues.

There are four exceptions, which are hardcoded contracts, including SHA-2, an identity function, and a key-recovery function. Their contract addresses are 1, 2, 3, and 4, so whenever you look at the bytecode you will notice those static addresses.

When it comes to user-defined functions, which are exported by default by each smart contract, you can easily recognize how many parameters they have based on the CALLDATALOAD instruction. It reads the environment information block, which is basically a buffer that contains all the input parameters, including the hash of the function we want to execute.
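As a small illustration of that input buffer, the Python sketch below models it as a byte string made of a 4-byte function hash followed by 32-byte argument words, which is the layout used for simple fixed-size argument types. The helper names and the sample hash bytes `12345678` are placeholders chosen for this example, not values from a real contract.

```python
def calldataload(calldata, offset):
    """Model CALLDATALOAD: read 32 bytes at offset, zero-padded past the end."""
    word = calldata[offset:offset + 32]
    return int.from_bytes(word.ljust(32, b"\x00"), "big")


def split_calldata(calldata):
    """Split call data into (function hash bytes, list of 32-byte arguments)."""
    selector, body = calldata[:4], calldata[4:]
    args = [int.from_bytes(body[i:i + 32], "big")
            for i in range(0, len(body), 32)]
    return selector, args


# Placeholder function hash followed by two arguments, 5 and 7.
data = bytes.fromhex("12345678") + (5).to_bytes(32, "big") + (7).to_bytes(32, "big")

selector, args = split_calldata(data)
print(selector.hex(), args)

# A function body typically reads its arguments at offsets 4, 36, 68, ...
a = calldataload(data, 4)
b = calldataload(data, 36)
print(a + b)
```

Counting the distinct offsets passed to CALLDATALOAD inside a function gives a reasonable first guess at its number of parameters.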
The structure of that block is pretty straightforward: the first four bytes are the function hash, which we're going to describe later, followed by the actual arguments. If you look at the pseudocode here, A and B are being recovered, and the first parameter is the offset inside that block. Then, for the addition, that's what you get. That function is very simple, it's an addition, but that's basically what it looks like in EVM pseudocode.

When it comes to type discovery, the main type you will see, and the main type you want to recognize, is the address. Whether it's the address of the sender, of the destination wallet, or of another contract, it is encoded on 160 bits. Every time you need something that is not 256 bits wide, you will see an AND operation. In some cases you'll see the mask hardcoded, but in most cases you'll see some EVM assembly optimization, like the following one, where the mask is computed dynamically. There are a few of those that we can recognize very easily. And again, if we do type discovery while emulating the code, we can simply check the mask associated with the instruction.

Now that we have seen how the EVM works, let's talk about the bytecode. You have two different categories. The first is the pre-loader code, which is in charge of copying the actual smart contract, where all the interesting stuff is, into executable memory. The second is the runtime code of the contract.
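The mask-based type discovery mentioned above can be sketched in a few lines of Python. The idea is that a mask of the form `2**n - 1` applied with AND suggests an `n`-bit value, with 160 bits suggesting an address. The function name and the returned type strings are assumptions of this sketch; the dynamically computed mask shown at the end, `2**0xA0 - 1`, is one form that compilers of that period are believed to have emitted, and is included only as an example.

```python
ADDRESS_BITS = 160


def type_from_mask(mask):
    """Guess a type name from the constant used in an AND operation."""
    # Only masks of the form 2**n - 1 (all low bits set) are recognized.
    if mask <= 0 or mask & (mask + 1):
        return "unknown"
    bits = mask.bit_length()
    if bits == ADDRESS_BITS:
        return "address"
    if bits % 8 == 0 and bits <= 256:
        return "uint%d" % bits
    return "unknown"


# Hardcoded mask: twenty 0xff bytes.
print(type_from_mask(int("ff" * 20, 16)))

# Dynamically computed mask, as an emulator would see it after evaluating
# something like: PUSH1 0x01; PUSH1 0xA0; PUSH1 0x02; EXP; SUB
print(type_from_mask(2 ** 0xA0 - 1))
```

Because the second form only exists as a value at run time, it is easiest to recognize while emulating, which matches the point made above.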
The runtime code is basically what we want to analyze; it contains all the information we want to spend time on. It contains the whole class, the whole contract, so each function, and it's basically what was produced by the Solidity compiler.

This is what the pre-loader looks like. There is an instruction called CODECOPY, which is in charge of taking the bytecode of the contract and putting it into executable memory, so we can execute it afterwards at offset zero.

Once we enter the smart contract itself, there is a dispatcher, which is in charge of splitting out all the different functions. It works basically like a giant switch statement. It first recovers the function hash with the CALLDATALOAD instruction. Here you can even see the code optimization: it first reads a full 256-bit word, and then applies a mask to extract only the first four bytes. That's the function hash. Then you enter a switch statement in which each case corresponds to an actual function.

In some cases you also have a fallback function: if there is an unknown method that the smart contract doesn't recognize, it just executes a default method. And in some cases, such as the Parity contract, which we're going to see afterwards, it blindly redirects the call to another contract. That's not the kind of thing you would see your kernel doing, you know. People would start to freak out, to be honest. But it's something that seems normal to people writing smart contracts. Some of those things, to be honest, are still obscure.
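The dispatcher described above can be recovered with a simple pattern scan. The Python sketch below assumes one common shape for each case of the switch, `PUSH4 <hash>; EQ; PUSHn <offset>; JUMPI`; real compiler output varies between versions, so this pattern is an assumption of the example and not a guarantee. The function names and the sample bytes are invented for illustration.

```python
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
EQ, JUMPI = 0x14, 0x57


def disassemble(code):
    """Yield (pc, opcode, immediate_bytes) for each instruction."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        n = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1:pc + 1 + n]
        pc += 1 + n


def find_dispatch_entries(code):
    """Map each function hash (hex string) to the offset it jumps to."""
    ins = list(disassemble(code))
    table = {}
    for i in range(len(ins) - 3):
        (_, op0, sel), (_, op1, _), (_, op2, dest), (_, op3, _) = ins[i:i + 4]
        if (op0 == PUSH4 and op1 == EQ
                and PUSH1 <= op2 <= PUSH32 and op3 == JUMPI):
            table[sel.hex()] = int.from_bytes(dest, "big")
    return table


# Two hand-assembled cases: DUP1; PUSH4 h; EQ; PUSH1 off; JUMPI (twice); STOP
sample = bytes.fromhex("80" "63aabbccdd" "14" "6010" "57"
                       "80" "6311223344" "14" "6020" "57" "00")
for h, off in find_dispatch_entries(sample).items():
    # Without symbols, name the function after its hash.
    print("func_%s -> offset 0x%x" % (h, off))
```

Anything that falls through every comparison without matching is where a fallback function, if present, would be found.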
I don't really understand why you would have a fallback function. I mean, I understand why they did it, because of this idea that they want contracts to be backward compatible and forward compatible. But it's the source of so many problems by design that, if you think about security, it does not make much sense, to be honest.

So, function hashes. The way they are computed is that they take the function name and the type of each argument, stick them together, and compute the SHA-3 (Keccak-256) of that input; the first four bytes of the result are the function hash. It's pretty straightforward. If you have the ABI, the actual interface of the contract, you can easily recompute it. If you don't, you can extract the function hash from the switch in the runtime code and create a name on the fly, like you would do with IDA, where a function simply gets a sub_ name with its offset when you don't have symbols. The ABI JSON file is the equivalent of symbols for smart contracts.

Here's the instruction I was mentioning, where it extracts the four bytes, and then that's the pseudocode for it.

Here is a comparison between purely static control flow reconstruction and emulation. As you can see, in some cases you really need to emulate the code to keep track of all the destinations and pointers.

That would be a simple contract where you have two functions, which are here. Once you start to analyze the runtime code, like I was saying, it's basically a giant switch, but we know that each case of the switch is a function.
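To show how such a function hash can be recomputed when the interface is known, here is a self-contained Python sketch. Python's `hashlib.sha3_256` implements the final FIPS 202 standard, which pads differently from the original Keccak-256 that Ethereum uses, so the sketch carries its own small Keccak-256, adapted from the structure of the public reference implementation. The names `keccak256` and `function_selector` are choices made for this example.

```python
def _rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & (2 ** 64 - 1)


def _keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol64(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant bits come from a small LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data):
    """Original Keccak-256 (padding byte 0x01), not FIPS 202 SHA3-256."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data) + b"\x01" + b"\x00" * (-(len(data) + 1) % rate)
    padded[-1] ^= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = _keccak_f1600(state)
    return bytes(state[:32])


def function_selector(signature):
    """First four bytes of the hash of e.g. 'transfer(address,uint256)'."""
    return keccak256(signature.encode("ascii"))[:4]


print(function_selector("transfer(address,uint256)").hex())
```

Given an ABI JSON file, computing this for every declared function and matching the results against the hashes found in the dispatcher restores the function names.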
And then once we decompile it, we get something closer to what's on the right of the screen.

To go a bit more into the detail of the runtime code: here, for instance, is the double function. In yellow we have the hash of that function. It's going to jump to offset 24, which is marked with the JUMPDEST instruction, and then it pushes the value 2. Then we arrive in a new block. In that case there is a JUMPDEST, but it's not a new basic block of this function only; it's used by another function as well. It's a shared basic block. And then it just does the multiplication. Same thing with triple: it reads the input parameter again, pushes a 3, and then executes the MUL instruction. If we go back to the initial source code, that's basically what it was doing. It was pretty straightforward. You do see smart contracts that are, well, I wouldn't say way more complex, but more complex than this. This is basically to illustrate how easily you can decompile the code.

Now let's look at the bug I was mentioning before, the Parity bug that happened about a week ago. Remember, when I was talking before, there were different types of call. You have CALL and DELEGATECALL; they allow you to call a third-party contract. And in some cases you have a fallback function, which allows you to execute code if a method is unknown. Here, in the constructor, the address was hardcoded. Then, in green, it's computing the function hash dynamically, and DELEGATECALL is going to execute that specific function. And then, for some random reason, you have a fallback function.
That was basically allowing you to call any function inside the wallet library and to pass any function or parameter you want. That's why I was saying some of the concepts are really obscure; that was the actual reason for the vulnerability. Now, looking at it, it's obvious, but it's a new type of bug that was discovered by the attacker, and that's a pretty good find. Once you know that type of bug, it's pretty obvious.

So, those fallback functions. If you have a switch that executes code with no actual hash, that's what your fallback function is. So it's a design issue, right? The main reason for it, keeping in mind that Ethereum is adding a software layer to the blockchain, is that if there's a security bug in that layer, well, you cannot patch the blockchain, right? That's the main thing about it: it's retaining data and moving it around. From what I've read, that was the main reason, a sort of backward and forward compatibility, because of this lack of capability to apply patches. To be honest, it does not make sense. I think it's stupid, but whatever. It's just not how you design a language that is viable, because you have too many unknowns. If you start calling third-party libraries, you don't even know what's going to be called. You have to predict all contracts and all future contracts. Imagine if your kernel were doing the same thing, you know, that would be nuts. You would start to see people rioting in the streets, you know?
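The fallback-plus-DELEGATECALL problem described above can be illustrated with a toy Python model. This is not Solidity and not the actual Parity code: the class names, method names, and the dictionary used as storage are all invented for the example. It only models the two ingredients mentioned in the talk, a fallback that blindly forwards any unknown method to a library running against the wallet's own storage, and a library initializer that never checks whether it has already run.

```python
class WalletLibrary:
    """Stand-in for shared library code; it operates on the caller's storage."""

    @staticmethod
    def init_wallet(storage, sender, owner):
        # Flaw being modelled: no check that the wallet is already initialized.
        storage["owner"] = owner

    @staticmethod
    def withdraw(storage, sender, amount):
        if sender != storage["owner"]:
            raise PermissionError("not owner")
        storage["balance"] -= amount
        return amount


class Wallet:
    def __init__(self, owner, balance):
        self.storage = {"owner": owner, "balance": balance}

    def call(self, sender, method, *args):
        # Fallback behaviour: every method name is forwarded to the library,
        # delegatecall-style, so library code mutates this wallet's storage.
        handler = getattr(WalletLibrary, method)
        return handler(self.storage, sender, *args)


wallet = Wallet(owner="alice", balance=100)

# Anyone can re-run the initializer through the fallback and become owner...
wallet.call("mallory", "init_wallet", "mallory")
# ...and then withdraw as the new owner.
stolen = wallet.call("mallory", "withdraw", 100)
print(wallet.storage, stolen)
```

In this model, either refusing to forward `init_wallet` or making it fail when an owner is already set would stop the takeover.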
As for the way that bug was fixed: some of the functions were meant to be private functions, but the attacker was able to call any function of the library directly, and could even call the constructor again, because it would not even check whether it was already initialized. Those are the types of bug you see with smart contracts. It's very far from the classic old bugs you would see, like buffer overflows.

Here is another example of a vulnerable contract, similar to what the DAO was using. The vulnerability was basically here. It's similar to a race condition: it comes down to a fallback function being reused to create a race condition in which the balance is not updated in time. For that type of vulnerability, the good thing is that, because there are not many instructions, you can tag each basic block to see what it's doing, and every time there is a call to an external contract you flag it, either as a warning or as an error. In that case we could see that the SSTORE instruction was being used afterwards, so it would be easy to analyze.

I can show you a quick demo. That's the smart contract itself. To call the tool, you only really need to provide the bytecode; if you have the symbols, the ABI JSON file, you can pass that too. Then you just run the tool, and once you give it that as input, you can easily reconstruct something very close to the actual source code. And there are some extra features, because to build a decompiler you basically build everything you need for dynamic analysis.
And for static analysis as well, so you can easily... oh, I didn't see the subtitles over there, that's cool. Yeah, you can easily use it to track potential vulnerabilities, just like you have with most compilers now. With PREfast or PREfix in Visual Studio, you have a lot of static analysis tools that can be used whenever you are writing code, right? Now, if you look at the smart contract community, they're still building all those tools. It's something very new, so a lot of the tools we would find pretty obvious with GCC or the Visual Studio compiler are not present for this type of software. The whole concept was to introduce a software layer, but it came without a lot of the testing tools that would be required for enterprise software.

So far a few classes of flaws have been identified. The first one was the race condition used against the DAO. Then the call-stack vulnerability, and there are some good papers about it: like I was saying before, the virtual stack itself is limited, and once you use all of it, well, it doesn't even return an exception. For a while there were also issues with throwing exceptions and reverting the state of the contract, concepts that are very new and were introduced recently. Then the time-dependency vulnerability: some of the instructions give you time information, but it's related to a block, so you can easily guess the future output. And DELEGATECALL is what happened with the Parity contract.
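The check mentioned for the DAO-style bug, a storage write that happens after an external call, can be approximated with a very small scanner. The Python sketch below walks the bytecode linearly and reports every SSTORE that appears after a CALL, CALLCODE, or DELEGATECALL. It ignores control flow entirely, so it is only a heuristic that over-reports and under-reports; a real tool would do this per basic block along paths of the control flow graph. The function name and the sample byte strings are made up for the example.

```python
CALL, CALLCODE, DELEGATECALL, SSTORE = 0xF1, 0xF2, 0xF4, 0x55
PUSH1, PUSH32 = 0x60, 0x7F


def flag_store_after_call(code):
    """Return a list of (call_pc, sstore_pc) pairs worth a warning."""
    warnings = []
    last_call = None
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op in (CALL, CALLCODE, DELEGATECALL):
            last_call = pc
        elif op == SSTORE and last_call is not None:
            warnings.append((last_call, pc))
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1   # skip immediates so data is not decoded
        pc += 1
    return warnings


# CALL; PUSH1 0x55 (data, not an SSTORE); PUSH1 0x00; SSTORE
risky = bytes([0xF1, 0x60, 0x55, 0x60, 0x00, 0x55])
# PUSH1 0x00; SSTORE; CALL  (state updated before the external call)
safer = bytes([0x60, 0x00, 0x55, 0xF1])

print(flag_store_after_call(risky))
print(flag_store_after_call(safer))
```

The second sample reflects the usual remedy for this class of bug: update the balance first, then make the external call.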
There is a fork of Ethereum called Quorum, created by JP Morgan, which is pretty interesting. The main reason people were a bit worried about Ethereum is that, if you're an enterprise, you cannot just have everything written transparently. So they introduced a privacy layer and permissions for smart contracts, which is pretty cool. The Quorum team is here, so I don't know if you guys want to stand up; I don't know if people are going to see. That's a pretty cool project. We see a lot of stuff happening around Ethereum, which is pretty cool. And this week Quorum released a bundle that integrates Porosity to check nodes inside the actual network, so that's pretty cool too.

Like I was saying, the main thing that is a bit worrying, the reason we hear so many stories about ICO hacks, and for sure the reason we're going to see more of those stories over the summer, is that there are no proper testing tools for that new software layer, which is pretty nuts, you know. You've seen how long it took for traditional software to get proper security tools, to be tested, even SDL and that type of high-level framework. With smart contracts it's pretty much ground zero; in most cases people have never heard of it. That's why there's a need for such tools.

Here are some screenshots of the Quorum and Porosity integration, showing how it can be integrated into the actual workflow. It's pretty good to at least think about integrating tools like that into the workflow.
It was very fast for the Quorum team to do the integration. They have a pretty good framework, so they can add more and more tools very quickly, which is pretty good. In my opinion it's a requirement if people are going to start using smart contracts seriously, especially to store money. Again, if you write a smart contract, most of the time you use it to store money, right? Not to browse YouTube and watch cat videos. I mean, if you find a zero-day in a web browser, you have to struggle to find where to sell it, but if you find a bug in a smart contract, you just take the bank, you know. It's like being the Lazarus Group, but for cryptocurrency.

So, like I was saying, for sure we will see more and more testing tools, and we can definitely expect even more issues with ICO hacks by the end of the year, since it's a new thing and everybody wants to raise money with ICOs. Regarding the tool, some improvements are required for a lot of the conditional statements.

When it comes to Ethereum and security, like I was saying, there is a fast-growing community, especially now that the main incentive is, well, either you want to steal money or you want to protect your money, right? So it's pretty straightforward in terms of motives if you want to get into smart contract security. And like I was saying, initially when I looked at it I thought, oh, why is everybody talking about blockchain, it sounds really boring. Then I saw there was a virtual machine around, and I thought, oh, maybe there's something interesting to do. For those who are familiar with virtual machine vulnerabilities, well, QEMU has a lot of them, but then you have talks like Cloudburst, which happened many
years ago, I think it was 2010 or something, where basically you would be able to do a VM escape. Now VM escapes are becoming more and more common; even Microsoft now has Hyper-V vulnerabilities and is raising the bug bounty. Well, you can be sure that if you have your own virtual machine, you can also expect bugs in it, right? And then the whole thing of claiming it's sandboxed does not really apply.

So the question now is whether Ethereum is going to stay alive, and whether their virtual machine is going to be the main virtual machine, or whether we're going to see more providers for smart contracts with their own virtual machines. I was looking at the roadmap for next year, and I saw they are planning to use WebAssembly. I had no idea what WebAssembly was, and then I looked into it. It's described as a portable, load-time-efficient format, so you have your own bytecode that can be executed by most JavaScript engines, like V8 or SpiderMonkey. So they're planning to use for smart contracts the same engine that we see in web browsers. I don't know if it's a good or a bad idea. I guess, given that it's also going to be used by all the platforms, it would also benefit from the auditing done for that specific language.
In terms of performance I've seen some cool stuff. If you look at the demos online, the guys are running almost video games with it. You can even, and I don't know if it's a good thing or not, compile C++ code into WebAssembly. I mean, from an attack surface point of view, it seems a bit confusing: you go from having a VM with a specific set of instructions to being able to compile C++. So I don't know to what extent it's going to be used in the Ethereum VM by next year, but for sure we will see stuff leveraging that.

There are some people I wanted to thank, who helped me with the paper, including the DEF CON review board. Initially I was just submitting a decompiler, and they said, but can we use it for security? I was like, well, obviously it's a decompiler, so you can do anything. So they kind of pushed me to do the security analysis for it.

You can download the slides and the actual white paper, which is more complete, in case you didn't understand my French accent, which I can understand, but you know, what can I say? I'm French, you know, I'm not going to apologize. You can download the tool at this address, and if you have any questions you can either drop me an email, or we have three minutes now, so I don't know how it works for Q and A here, but yeah, if you have any questions, let me know.