Thank you, everyone. Just a quick question: how many of you come from the IT world, the IT security world? Okay, so some of you already know what reversing is and what it is useful for, so I will try to keep it simple for the others. The purpose of today is to understand what's behind EVM bytecode, typically the bytecode that is stored on the blockchain, and to help you understand how famous security tools actually work. So, I'm Patrick, I'm a security researcher, and I came to blockchain because I was doing blockchain transaction tracking and things like that. For that reason I needed to analyze bytecode, like the bytecode of self-destructed contracts, and transactions. That's why I started my project called Octopus. It's a security analysis framework; it supports the EVM, of course, but also some other platforms like EOS and NEO. So, if you want to take a look at the bytecode of NEO and EOS smart contracts, it could be a good occasion. Let's start. I will do a quick introduction, then I will talk about the control flow graph, and finally about why you may need reversing at some point, including from a developer's point of view. Reverse engineering is actually pretty simple for smart contracts. You have the EVM bytecode and you want to translate it into EVM assembly, and reversing a smart contract is then the analysis of this EVM assembly. If you take a look at the bytecode, you can have two types of EVM bytecode. The first one is this big one with loader code. The loader code is what deploys the runtime code onto the blockchain: it runs once when the contract is created and returns the runtime code that gets stored. You will see this layout if you look at the input data of a contract creation transaction.
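As a rough illustration of separating the loader code from the runtime code, here is a minimal Python sketch. It assumes a Solidity-style loader whose tail is PUSH size, DUP1, PUSH offset, PUSH 0, CODECOPY, and it only inspects the first CODECOPY; other compilers, or constructors that copy constants earlier, will not match. The function names are hypothetical and are not part of Octopus.

```python
# Sketch: split a contract-creation payload into loader code and runtime code.
# Assumption: a Solidity-style loader tail PUSH size, DUP1, PUSH offset, PUSH 0, CODECOPY.
CODECOPY, DUP1 = 0x39, 0x80


def disassemble(code: bytes):
    """Linear sweep returning (offset, opcode, push_immediate_or_None) triples."""
    pc, out = 0, []
    while pc < len(code):
        op = code[pc]
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSH1..PUSH32 carry 1..32 bytes
        imm = int.from_bytes(code[pc + 1 : pc + 1 + size], "big") if size else None
        out.append((pc, op, imm))
        pc += 1 + size
    return out


def extract_runtime(creation: bytes):
    """Return the bytes copied by the first CODECOPY, or None if the pattern differs."""
    ins = disassemble(creation)
    for i, (_, op, _) in enumerate(ins):
        if op != CODECOPY:
            continue
        if i < 4:
            return None
        size, dup, offset, dest = ins[i - 4], ins[i - 3], ins[i - 2], ins[i - 1]
        # size, offset and dest must all be PUSH constants (immediate is not None)
        if dup[1] == DUP1 and None not in (size[2], offset[2], dest[2]):
            # CODECOPY(dest, offset, size): runtime code is creation[offset:offset + size]
            return creation[offset[2] : offset[2] + size[2]]
        return None
    return None
```

For example, a hand-assembled loader in that shape, 6080604052348015600f57600080fd5b50603f80601d6000396000f3fe, copies 0x3f bytes starting at offset 0x1d, so extract_runtime returns exactly those 63 bytes and ignores anything appended after them, such as constructor arguments.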
In that case you get the loader code, but we are more interested in the runtime code, which is the final code of your smart contract: this is the code that runs in the EVM. I will not spend too much time on that. So, the first step is to disassemble the bytecode. Pretty simple. Maybe some of you have already seen the opcode tool from Etherscan, and that's exactly what it does: it's a basic disassembler. Disassembling in general works by mapping each opcode byte to its EVM instruction. You just take the EVM instruction set, do the matching, and that works. The trickier part is the control flow graph. For those who don't use tools like Binary Ninja, IDA, et cetera, a control flow graph is a graphical representation of your program logic. If you have an if/else, for example, you can identify it directly from the graphical representation. In every case, you need to create blocks, or nodes, and edges, that is, the connections between those blocks. For that, some instructions are really critical and help you recreate the CFG. The first two are the jumps, of course, JUMP and JUMPI: they help you create the edges. The others help you decompose your inline EVM assembly into basic blocks. To simplify, it looks like this, before and after. The next step is to connect those basic blocks. A basic idea could be: I want to do static analysis, so I just check whether the instruction before a jump is a PUSH that puts the jump target offset on the stack. It could work; actually, it will work maybe 50 to 60% of the time. But at some point you will run into issues.
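The disassembly and basic-block steps above, plus the naive "PUSH right before the jump" edge recovery, can be sketched in a few lines of Python. This is a sketch under stated assumptions: the opcode-name table is deliberately truncated, blocks are split at JUMPDEST and after control-flow instructions, and jumps whose target is computed on the stack are simply left unresolved, which is exactly where this heuristic breaks down.

```python
# Sketch: opcode-to-mnemonic disassembly, basic-block splitting, naive edge recovery.
# The name table is truncated on purpose; unlisted opcodes print as UNKNOWN_0x..
NAMES = {
    0x00: "STOP", 0x0A: "EXP", 0x14: "EQ", 0x15: "ISZERO", 0x35: "CALLDATALOAD",
    0x39: "CODECOPY", 0x50: "POP", 0x52: "MSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT",
    0xFE: "INVALID", 0xFF: "SELFDESTRUCT",
}
ENDS_BLOCK = {"STOP", "JUMP", "JUMPI", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}
NO_FALLTHROUGH = ENDS_BLOCK - {"JUMPI"}  # a JUMPI can also continue to the next block


def disassemble(code: bytes):
    """Return (offset, mnemonic, push_immediate_or_None) for each instruction."""
    pc, out = 0, []
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 are followed by 1..32 immediate bytes
            n = op - 0x5F
            out.append((pc, f"PUSH{n}", int.from_bytes(code[pc + 1 : pc + 1 + n], "big")))
            pc += 1 + n
        else:
            out.append((pc, NAMES.get(op, f"UNKNOWN_{op:#04x}"), None))
            pc += 1
    return out


def basic_blocks(instrs):
    """Start a new block at each JUMPDEST; close a block after each control-flow instruction."""
    blocks, current = [], []
    for ins in instrs:
        if ins[1] == "JUMPDEST" and current:
            blocks.append(current)
            current = []
        current.append(ins)
        if ins[1] in ENDS_BLOCK:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def static_edges(blocks):
    """Edges as (from_block_offset, to_block_offset); stack-computed targets stay unresolved."""
    jumpdests = {ins[0] for b in blocks for ins in b if ins[1] == "JUMPDEST"}
    edges = []
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[1] in ("JUMP", "JUMPI") and len(block) >= 2:
            prev = block[-2]
            # only resolvable when a PUSH of a valid JUMPDEST sits right before the jump
            if prev[1].startswith("PUSH") and prev[2] in jumpdests:
                edges.append((block[0][0], prev[2]))
        if last[1] not in NO_FALLTHROUGH and i + 1 < len(blocks):
            edges.append((block[0][0], blocks[i + 1][0][0]))
    return edges
```

On 6001600657005b00 (PUSH1 1, PUSH1 6, JUMPI, STOP, JUMPDEST, STOP), basic_blocks gives blocks starting at offsets 0, 5 and 6, and static_edges returns the taken edge (0, 6) plus the fall-through edge (0, 5). A block such as DUP1, JUMP gets no outgoing edge at all, which is how orphan blocks appear.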
Typically, that happens when you have stack operations. In this case, you have a basic block with no PUSH in it, and a jump. If you don't know the value on top of the stack, you are not able to determine the jump target offset. There are multiple techniques to handle that and help us recreate the CFG. The first one that I used is dynamic analysis with stack evaluation. Before, you have this graph with some orphan blocks: those blocks were never reached while I was emulating the graph. If I only do static analysis, looking for a PUSH2 followed by a JUMP, those basic blocks are not attached to the graph. With stack evaluation it's much better, because I'm able to attach these blocks to another path in the graph, and the graph looks much better too. Once you have done that, a good next step is to assign each basic block to the function it belongs to. If you look back at this graph, you will see a typical pattern for a switch. In EVM bytecode, this switch is the dispatcher function. The dispatcher function is the entry point of the smart contract, and depending on the first four bytes of the transaction payload, called the method ID, you go down one path or another. In this example, you get two different method IDs, and you can identify them pretty quickly with this typical pattern: a PUSH4 of the function hash, or method ID; a check of whether it equals the first four bytes of the transaction data; and, if it does, a PUSH2 of the function offset and a jump to it. So, that's the dispatcher function.
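The dispatcher pattern just described can be matched with a sliding window over the disassembly. This minimal sketch assumes the classic solc layout (PUSH4 selector, EQ, PUSH target, JUMPI); newer compiler versions may reorder operands or split the selector comparisons into a binary search, so an empty result means "pattern not found", not "no functions". The helper names are hypothetical.

```python
# Sketch: recover (method ID -> function entry offset) pairs from a dispatcher.
# Assumption: classic solc shape PUSH4 <selector>, EQ, PUSH <offset>, JUMPI.
PUSH4, EQ, JUMPI = 0x63, 0x14, 0x57


def disassemble(code: bytes):
    """Linear sweep returning (offset, opcode, push_immediate_or_None) triples."""
    pc, out = 0, []
    while pc < len(code):
        op = code[pc]
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSH1..PUSH32 immediates
        imm = int.from_bytes(code[pc + 1 : pc + 1 + size], "big") if size else None
        out.append((pc, op, imm))
        pc += 1 + size
    return out


def dispatcher_entries(code: bytes):
    """Map each 4-byte method ID compared in the dispatcher to the offset it jumps to."""
    ins = disassemble(code)
    entries = {}
    for a, b, c, d in zip(ins, ins[1:], ins[2:], ins[3:]):
        # c must be a PUSH (non-None immediate) carrying the jump target offset
        if a[1] == PUSH4 and b[1] == EQ and c[2] is not None and d[1] == JUMPI:
            entries[a[2]] = c[2]
    return entries
```

Fed a hand-assembled fragment comparing against 0xa9059cbb and 0x70a08231 and jumping to 0x40 and 0x50, it returns {0xa9059cbb: 0x40, 0x70a08231: 0x50}; each target offset is then the entry block of one callable function.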
Using that, you are able to find all the basic blocks associated with each function. In this example, you get seven different callable functions, and you can find them because you did the analysis directly on the dispatcher function. Having the functions is really cool, but it's even better to have their names. The name is not stored in the EVM bytecode. The way to recover it is through the four-byte identifier, the method ID. The method ID is computed from the function's text signature: it is the first four bytes of the Keccak-256 hash of that signature. So, take this transaction and its method ID: if you use a function signature reverse lookup database, you get the mapping between the byte signature and the text signature. The most famous one is 4byte.directory, which works perfectly and has a lot of entries, so it's really good to use. You will see that each text signature gives its own byte signature, and using that you can recover the name. In this example, it's actually the Greeter smart contract, so you get the greet and kill functions. You can also use this recovery technique to get information about the function arguments. In this example, you have distributeTokens: in the first case it takes an array of unsigned integers, and in the second case just an unsigned integer, and you can see that the signatures are different. One caveat: you can have collisions between function signatures. In that case, the best way to find out which signature is the right one is to check whether the function contains CALLDATALOAD instructions, which means it takes arguments.
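Computing a method ID yourself is a useful sanity check against a reverse lookup database. One catch: Ethereum uses the original Keccak-256 padding, so Python's hashlib.sha3_256 (standardized SHA3-256) produces different digests. The sketch below therefore writes out Keccak-256 with only the standard library, modeled on the structure of the Keccak team's compact reference code; for real work a maintained library is the safer choice, and the function names here are hypothetical.

```python
# Sketch: 4-byte method ID = first 4 bytes of Keccak-256(text signature).
# Ethereum's Keccak-256 pads with 0x01, unlike hashlib.sha3_256 (0x06).
MASK = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK


def _keccak_f(lanes):
    """Keccak-f[1600] permutation on 25 lanes indexed lanes[x + 5*y]."""
    r = 1
    for _ in range(24):
        c = [lanes[x] ^ lanes[x + 5] ^ lanes[x + 10] ^ lanes[x + 15] ^ lanes[x + 20] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [lanes[i] ^ d[i % 5] for i in range(25)]  # theta
        x, y = 1, 0
        current = lanes[x + 5 * y]
        for t in range(24):  # rho (rotation) and pi (lane permutation)
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x + 5 * y] = lanes[x + 5 * y], _rol(current, (t + 1) * (t + 2) // 2)
        for y in range(5):  # chi (non-linear row mixing)
            row = lanes[5 * y : 5 * y + 5]
            for x in range(5):
                lanes[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):  # iota: round constant bits generated by an LFSR
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes absorbed per permutation: (1600 - 2*256) / 8
    padded = bytearray(data) + b"\x01"  # Keccak domain/padding byte
    padded += bytes(-len(padded) % rate)
    padded[-1] |= 0x80  # final bit of pad10*1
    lanes = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            lanes[i] ^= int.from_bytes(padded[off + 8 * i : off + 8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lane.to_bytes(8, "little") for lane in lanes[:4])


def method_id(text_signature: str) -> str:
    """Hex method ID for a canonical signature such as 'transfer(address,uint256)'."""
    return keccak256(text_signature.encode("ascii")).hex()[:8]
```

For instance, method_id("transfer(address,uint256)") gives a9059cbb, the well-known ERC-20 transfer selector. A dict built as {method_id(s): s for s in candidate_signatures}, over some hypothetical list of candidate signatures, behaves like a tiny local reverse-lookup table.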
And this way you can determine fairly well how many arguments there are and of which type. So, why use reversing? There are actually multiple reasons. I have listed four of them. There are many others, but it's just to give you an idea, and some appetite for the subject. If you are a user of a contract, you may want to do some reversing if the source code is not available. When you create a smart contract, publishing the source code is not mandatory: you can create a smart contract just by sending the bytecode. That's exactly what CryptoKitties did. CryptoKitties has four Solidity source files, and in KittyCore there is a call to the GeneScience contract, more specifically to the mixGenes function. This Solidity source code is not available on GitHub; only the three other ones are. The reason is that it contains all the mutation logic for the kitties' genes, so they didn't want it to be public or analyzed. But the community started to look at it. First, people started diffing kitty genomes, and then some people, not even IT security people, started reversing and analyzing this smart contract. There are some really interesting blog posts about that, and they also wrote equivalents of the logic in Python and other languages, so you can try it directly with command-line tools. It's really cool, and a typical application of reversing. Another reason, for a company, is of course security audits, but I will talk about that later. Another reason could be bytecode optimization. How many of you think that EVM bytecode is optimized? Okay, good. I don't know if some of you have seen this tweet from Ryan, from Trail of Bits.
He found out that a lot of smart contracts on the blockchain use exponentiation to calculate the value one. EXP costs at least 10 gas (plus 50 gas per byte of the exponent), so a lot of people spend gas for nothing. Even more, if the arguments of the EXP instruction are constants, you can simply replace it with a PUSH; really simple. And if one of the arguments is zero or one, you can compute the value directly at compile time, or reduce the expression to the runtime variable (x ** 1 is just x). These are more specific cases, but it works pretty well. Martin, from the Ethereum Foundation, tested this hypothesis on 16 random blocks and noted that 73% of all EXP invocations could be simplified and optimized. So it starts to be something. I don't know whether there has been a change in the Solidity compiler to fix that, or whether using the optimizer flag changes anything; I didn't take the time to check. But we should probably tell the Solidity developers that there is some optimization to do there. Another, more famous, reason is security audits, bug hunting and vulnerability research. I don't know if some of you use one of these tools; I'm pretty sure some of you do. These tools work directly on the EVM bytecode, and most of them use pattern matching to detect vulnerabilities. The reason they work on bytecode is that EVM bytecode is what is stored on the blockchain. For the moment, I don't remember any vulnerability caused directly by Solidity, I mean a bug that the Solidity compiler itself introduces into the code, but potentially it could be detected this way.
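The simplest case mentioned above, an EXP whose base and exponent are both constants, can be flagged with a peephole scan over the disassembly. This is a minimal sketch, not a compiler pass: it only looks at the two instructions immediately before EXP, it ignores the zero/one cases that need just one constant operand, and it does not check whether a jump lands between the instructions. The function names are hypothetical.

```python
# Sketch: flag EXP instructions whose two operands are PUSH constants, so the
# PUSH/PUSH/EXP triple could become a single PUSH of the precomputed result.
EXP = 0x0A
WORD = 2**256  # EVM arithmetic wraps modulo 2**256


def disassemble(code: bytes):
    """Linear sweep returning (offset, opcode, push_immediate_or_None) triples."""
    pc, out = 0, []
    while pc < len(code):
        op = code[pc]
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSH1..PUSH32 immediates
        imm = int.from_bytes(code[pc + 1 : pc + 1 + size], "big") if size else None
        out.append((pc, op, imm))
        pc += 1 + size
    return out


def foldable_exps(code: bytes):
    """Return (exp_offset, base, exponent, folded_value) for each constant EXP."""
    ins = disassemble(code)
    hits = []
    for i in range(2, len(ins)):
        offset, op, _ = ins[i]
        # EXP takes the base from the top of the stack, i.e. the most recent PUSH
        base, exponent = ins[i - 1][2], ins[i - 2][2]
        if op == EXP and base is not None and exponent is not None:
            hits.append((offset, base, exponent, pow(base, exponent, WORD)))
    return hits
```

On the fragment 6000 610100 0a (PUSH1 0, PUSH2 0x0100, EXP, which computes 0x100 ** 0), the scan reports (5, 256, 0, 1): the EXP at offset 5 always yields 1 and could be replaced by a PUSH1 1.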
The last reason, and the one that got me looking at smart contracts, is transaction tracking and the analysis of smart contract interactions from a threat intelligence point of view: actors like malware authors who use smart contracts to move their money without being detected, or at least try to. A typical case, and also a technique that Jay from Trail of Bits used in the talk just before me, is post-mortem smart contract analysis. If a smart contract has been destroyed and you look it up on Etherscan, you will see this, and at that point you are disappointed. But, as I told you, the smart contract bytecode, including the runtime code, is available directly in the input data of the contract creation transaction. If you just take a look at this transaction, you get the loader code and the runtime code. You just need to cut off the loader code, and that's it: you get your smart contract bytecode back. That's all for me. I hope you have learned something and that I gave you some appetite to learn about EVM bytecode. I strongly encourage you to participate in building these tools and in the community behind them; they are really good. You can also look at my other talks, more specifically about reversing in general. I also gave a talk recently about the analysis of the Hello World implementation from Parity Technologies, so if you are interested in the EVM, you may find it interesting. And I recently gave a workshop about creating patterns to detect some types of vulnerabilities. Everything is available at the link. Thank you.