Let's get started. Thank you so much for attending, and thank you to the organizers for hosting this amazing event. My name is Mark Mossberg. I'm an engineer at Trail of Bits, and today I'll be presenting joint work with my colleagues Felipe Manzano and Yan Ivnitskiy on porting binary analysis techniques over to an alternate execution environment, namely the blockchain. So we'll begin with: what is the blockchain? Wikipedia defines it as a decentralized, distributed, and public digital ledger that is used to record transactions. I've highlighted what I think are the most important parts of this definition: it's a digital ledger, and it can be used to record transactions and enable transactions between parties. Blockchains have a number of very useful properties. They're resilient because they're distributed systems: you can drop nodes off the network, and the system is designed to keep functioning. They're verifiable: anyone can recompute the blockchain and verify what's on it. They're transparent, because the data is public, and they're also immutable. That's just about all I'll say about blockchains in general; I'm not here to extol the virtues of using them. Let's move on to one particular blockchain of interest, called Ethereum. Ethereum is a very interesting blockchain-based, decentralized computation platform. It also has a cryptocurrency and a lot of other things, so there are many angles from which we could look at it, but today we're going to look at it as a computation platform. Some other facts about Ethereum: it's the second largest cryptocurrency by valuation, and it hit a ridiculous peak market cap earlier this year of over $100 billion. It's also a smart contract framework, and that's what we're going to focus on today. Smart contracts are applications; this is from the Ethereum website.
They're applications that run as programmed without downtime, censorship, fraud, or third-party interference, and these are all properties that the applications inherit from being on the blockchain. You can basically think of them as the application layer of this execution environment. Using Ethereum smart contracts, you can implement many kinds of applications for useful tasks, like managing assets, voting, auctions, crowdfunding, or making your own currency. But just like any other program written by humans, they can, of course, have bugs. And I want to point out that the amount of money you can lose from bugs in smart contracts is fairly unique to Ethereum. I actually won't focus on that too much in this talk; there's another great talk tomorrow that will dive into it more deeply. The main takeaway is that it's really important to have analysis tooling for smart contracts. We want to enable these really useful applications built on the blockchain, but if these programs keep getting massively hacked and losing tens of millions of dollars, that will never happen. So we really need better analysis tooling, and that's why we're here today. That brings us to symbolic execution. It came up briefly in an earlier talk, and we'll dive into it more deeply in this one. It's a very powerful program analysis technique that, over the past decade, has really proven its usefulness in software security and testing. Our research question was whether we could port it over to Ethereum and make it useful there. So we have a very simple agenda today. First, we'll review what symbolic execution is. Then we'll dive deeply into the internals of the Ethereum virtual machine and computation platform. Lastly, we'll talk about what it's like to combine the two and what we can do with those abilities.
So let's begin with symbolic execution. The story actually begins in the 70s with this paper by James King, "Symbolic Execution and Program Testing," which coined the term. The field was largely dormant for roughly the next three decades, and only in the last 10 years or so have we seen a real resurgence in research, with a number of really breakthrough papers. I've called out a few here: the KLEE paper in 2008, BAP from Carnegie Mellon, SAGE from Microsoft, and Mayhem. This leads up to the DARPA Cyber Grand Challenge, where symbolic execution was heavily used by nearly every team for automatic bug finding and exploit generation. Today, though, we're going to focus mostly on the approach presented in the KLEE paper, which is often called classical symbolic execution. It's useful to contrast symbolic execution with concrete execution, which everyone knows: at every point in this pseudocode program, every variable has a concrete value. When the program asks for input, the user provides some, say 42, and the program executes accordingly, so A gets 43 and B gets zero. The one really important thing that changes in symbolic execution is that variables are no longer required to have specific values; they can represent ranges of values. If we execute this code under a symbolic execution model, the input variable X is no longer concrete. Instead, we replace it with an input symbol that represents any integer. The program then continues executing: A depends on X, so A also becomes symbolic, while B is assigned a constant and remains concrete. So now we have a distinction between symbolic and concrete variables. Things really start to get interesting when we consider control flow in symbolic execution.
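To make the concrete/symbolic distinction tangible, here is a minimal Python sketch. It assumes the slide's pseudocode is roughly `a = x + 1; b = 0` (the exact code isn't in the transcript) and represents a symbolic value as a small expression tree instead of a number. Engines such as KLEE use a far richer expression language; this only shows how symbolic-ness propagates.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:
    """An input symbol: stands for any integer."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Add:
    """An addition that cannot be folded because an operand is symbolic."""
    lhs: "Value"
    rhs: "Value"

    def __str__(self) -> str:
        return f"({self.lhs} + {self.rhs})"


Value = Union[int, Sym, Add]


def add(a: Value, b: Value) -> Value:
    # Concrete operands fold to a concrete int; otherwise build an expression tree.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return Add(a, b)


def run(x: Value) -> dict:
    # Hypothetical version of the slide's pseudocode: a = x + 1; b = 0
    a = add(x, 1)
    b = 0
    return {"a": a, "b": b}


print(run(42))                   # concrete run: {'a': 43, 'b': 0}
result = run(Sym("x"))
print(result["a"], result["b"])  # symbolic a, concrete b: (x + 1) 0
```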
In this program, we have some code on the left that gets input and then makes a decision: it checks whether the variable is 42 and does something, and otherwise does something else. In symbolic execution, the var variable does not have an actual value; it could be any integer. So what happens when we execute this program? It looks like execution could go both ways, and in practice that's exactly what happens. The engine running the analysis makes a copy of the program state and explores one branch in one state and the other branch in the other. To do this, it keeps track of something called the path constraints: the constraints on the variables that must be true for that state to be reached. In the do-something state, the var variable transitions from an unconstrained symbolic integer to a constrained symbolic integer that must equal 42. Conversely, in the other state, it's constrained to be anything except 42. This next slide is a bit more complex, but it visualizes what happens when we symbolically execute some code. On the left we have some code, and on the right we have what's called the symbolic execution tree. The program on the left has a number of input variables, A, B, and C, some concrete integers, and some control flow: there's branching based on the symbolic variables, and some assignments. At the bottom, there's an assertion, and our goal is to check whether it's ever possible to violate it. We can do this using symbolic execution. On the right, you can see the state forking that happens at the branch points in the program, and you can see that the analysis fully enumerates the state space of the program.
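The forking step for the `var == 42` example can be sketched as follows. This is a hypothetical toy with constraints kept as plain strings rather than solver terms; it only shows how one state becomes two, each carrying its own path constraint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """One execution path plus the constraints required to reach it."""
    constraints: List[str] = field(default_factory=list)
    outcome: str = ""


def fork_on_eq(state: State, var: str, const: int) -> List[State]:
    # The condition `var == const` is undecided for a symbolic var, so copy the
    # state: one copy assumes the condition holds, the other assumes it does not.
    taken = State(state.constraints + [f"{var} == {const}"])
    not_taken = State(state.constraints + [f"{var} != {const}"])
    return [taken, not_taken]


def explore() -> List[State]:
    # Program: var = input(); if var == 42: do_something() else: do_other()
    taken, not_taken = fork_on_eq(State(), "var", 42)
    taken.outcome = "do_something"
    not_taken.outcome = "do_other"
    return [taken, not_taken]


for s in explore():
    print(s.outcome, s.constraints)
```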
In one particular path, this one right here, we see that it's actually possible to violate the assertion: if A is false, B is less than five, and C is true, we end up in a state that violates the assertion. Furthermore, for every state we find, we also collect the path constraints. These are very useful to have around, because we can use tools like constraint solvers to reason about them, ask questions about the program, and generate inputs. From the outside, a constraint solver is very simple: you express your query as a set of constraints, and it returns either satisfiable or unsatisfiable. If the query is satisfiable, it also returns a model, a concrete assignment of values that satisfies every constraint. In the context of program analysis, that model is basically an input that drives the program down to that state. If the constraints are not satisfiable, the solver just tells us so, and that's it. Of course, not everything is easy with symbolic execution. There are a number of challenges, and I've selected the three most relevant to us today. The first is path explosion, which was mentioned earlier. You might have the intuition that programs are huge and that you can get an effectively infinite number of states. How could you possibly manage that? It's a real problem: to make symbolic execution effective, you need strategies that prioritize the search through the state space toward the bugs you want. We also encounter this in Ethereum to a certain extent, and I'll talk about that later. Symbolic memory indexing is another fairly unique problem. Consider this snippet of assembly code, which dereferences a register. Say RAX actually holds a symbolic value. Now we have to dereference this symbolic pointer and load from it into another register. That's a weird situation: where do we actually dereference that pointer?
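As a rough illustration of the solver interface (constraints in; satisfiable plus a model, or unsatisfiable, out), here is a brute-force stand-in. It assumes small finite domains so that it can simply enumerate; a real SMT solver like Z3 does not work this way, but it answers the same question.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional

Model = Dict[str, int]
Constraint = Callable[[Model], bool]


def solve(domains: Dict[str, Iterable[int]],
          constraints: List[Constraint]) -> Optional[Model]:
    """Return a model (one satisfying assignment), or None if unsatisfiable.

    Brute force over small finite domains: a toy stand-in for an SMT solver
    such as Z3, which reasons symbolically instead of enumerating.
    """
    names = list(domains)
    for values in product(*(list(domains[n]) for n in names)):
        model = dict(zip(names, values))
        if all(check(model) for check in constraints):
            return model
    return None


# Path constraints of the assertion-violating path (A false, B < 5, C true),
# with booleans encoded as 0/1 and B restricted to a small range.
domains = {"a": range(2), "b": range(-10, 11), "c": range(2)}
path = [lambda m: m["a"] == 0, lambda m: m["b"] < 5, lambda m: m["c"] == 1]

print(solve(domains, path))                           # a concrete failing input
print(solve(domains, path + [lambda m: m["b"] > 5]))  # None: unsatisfiable
```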
It could point to a number of different locations, so how do we continue beyond this point? This is a well-known and significant problem in symbolic execution, and a number of strategies exist for handling it. One is to concretize the RAX register to a single concrete value, but of course this compromises your analysis and limits the number of states you can search. Another problem is loops: infinite loops and symbolic loops are also very challenging. Consider this code, which gets an input and then uses it as a loop bound. Under symbolic execution, the var variable doesn't have a real value, so the loop's termination condition is never definitively met, and a naive engine will just loop forever in this case. So there are a lot of challenges with symbolic execution, and we run into a number of them with Ethereum too. That finishes our brief whirlwind tour of symbolic execution. The main takeaways are that it lets us test many paths in a program, and we can do so without knowing much about the program ahead of time: the analysis systematically explores the different branches in the code, negates their conditions, and generates inputs for them. Also, using the constraints we gather, we can reason about programs and prove properties about them with constraint solvers, which is very useful. That brings us to Ethereum internals. Like I mentioned before, Ethereum is a decentralized computation platform. It has a cryptocurrency aspect too, but most interesting for us is that it includes a virtual machine. In our discussion of Ethereum today, we'll cover a number of things. We'll talk about smart contracts, and we'll talk about transactions on the Ethereum network.
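The symbolic loop problem, and the blunt fix of capping how many times a loop is unrolled, can be sketched with a worklist. `max_unroll` is a hypothetical knob; real engines use a variety of search heuristics instead of, or on top of, a fixed bound.

```python
from collections import deque


def explore_symbolic_loop(max_unroll: int):
    """Explore `n = input(); i = 0; while i < n: i += 1` with a symbolic n.

    At the loop head the condition i < n is undecided, so the engine forks:
    one copy exits the loop (adding n <= i), the other runs the body
    (adding n > i). Without the max_unroll cap this worklist never drains.
    """
    worklist = deque([(0, [])])          # (concrete i, path constraints on n)
    exited, abandoned = [], []
    while worklist:
        i, constraints = worklist.popleft()
        if i >= max_unroll:
            abandoned.append(constraints)  # give up on this path
            continue
        exited.append(constraints + [f"n <= {i}"])
        worklist.append((i + 1, constraints + [f"n > {i}"]))
    return exited, abandoned


exited, abandoned = explore_symbolic_loop(3)
for c in exited:
    print("exits:", c)
print("abandoned:", abandoned)
```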
We'll go into detail on the Ethereum virtual machine, and we'll also cover the Ethereum application binary interface and the bytecode format used for smart contracts. We'll start with smart contracts themselves. In Ethereum, there are two kinds of entities on the network: external accounts and contract accounts. External accounts are simple. They're controlled by humans and basically just have a balance that tracks how much ether they hold. Using external accounts, humans can transact with each other and send each other ether. Contract accounts are controlled by code. They also have a balance, but they also have code that executes whenever they receive a transaction. These are what we mean when we talk about smart contracts: code deployed onto the network that executes when people send it transactions. Furthermore, contracts can interact with other contracts, so we can get highly complex chains of calls between different contracts on the network. Here's a little of what smart contracts look like. I'm not sure you can read the code on the right, and don't even try; it's just to give you a sense of them. They're written in a language called Solidity, and they basically encode state machines: you declare a number of state variables, and you declare functions that modify those state variables. One thing that's really unique about smart contracts is that they contain a lot of assertions compared to code written in traditional languages, because this is how error handling is mainly done in Ethereum. When a contract hits an assert or require statement whose condition is false, a rollback mechanism reverts the state changes made by the transaction.
And like I mentioned, these contracts execute whenever they receive a transaction. I should also mention that even though they're written in Solidity, Solidity is a compiled language, so contracts are compiled to binaries, and that's what gets deployed onto the network. Because of that, binary analysis is useful: generally, we don't have the source code for many contracts. Now let's talk about transactions. These are the fundamental communication interface of the Ethereum network; that's really it. You send transactions between two entities on the network, and with them you can transfer ether between entities, deploy contracts, and interact with contracts. Interacting with a contract just means calling functions in it. The structure is pretty simple. Every entity has an address, so you specify a to address and a from address. You can specify some ether to send, and you can optionally specify an arbitrary data buffer. That buffer becomes very important when dealing with smart contracts, and we'll dig into it later in the ABI section. So let's talk about the Ethereum virtual machine, which is a pretty interesting virtual machine. It's a stack machine with a 256-bit native word size, which is huge. It has roughly 140 instructions, and these fall into a few categories. You have your standard arithmetic instructions like add, multiply, divide, and so on. You have very simple control flow: just conditional and unconditional jumps. There's a variety of memory access instructions, because there are several kinds of memory in the Ethereum virtual machine, which we'll see in a bit. And lastly, there are a number of interesting domain-specific instructions related to the blockchain: we can get the timestamp of a block, get the address of an account, revert a transaction, or self-destruct. These are all fairly odd instructions that are very domain specific.
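Going back to the transaction structure and the rollback behavior for failed `require`/`assert` checks described earlier, here is a sketch. The `Contract` class and its single `require(msg.value > 0)`-style check are hypothetical, and the rollback is simplified: a real EVM also has to deal with gas, logs, and nested calls.

```python
import copy
from dataclasses import dataclass


@dataclass
class Transaction:
    sender: str          # the "from" address ("from" is a Python keyword)
    to: str              # the "to" address
    value: int = 0       # ether sent along with the transaction (in wei)
    data: bytes = b""    # optional, arbitrary data buffer (the call data)


class Revert(Exception):
    """Signals a failed require()/assert()."""


class Contract:
    """Hypothetical toy contract: a tiny state machine over `storage`."""

    def __init__(self):
        self.storage = {"counter": 0}
        self.balance = 0

    def execute(self, tx: Transaction) -> None:
        # State is modified before the check, so the rollback really matters.
        self.balance += tx.value
        self.storage["counter"] += 1
        if tx.value == 0:                       # like require(msg.value > 0)
            raise Revert("require failed")


def apply_transaction(contract: Contract, tx: Transaction) -> bool:
    # Snapshot the state so that a failed check rolls back every change.
    snapshot = (copy.deepcopy(contract.storage), contract.balance)
    try:
        contract.execute(tx)
        return True
    except Revert:
        contract.storage, contract.balance = snapshot
        return False


c = Contract()
print(apply_transaction(c, Transaction("alice", "toy", value=5)), c.storage, c.balance)
print(apply_transaction(c, Transaction("alice", "toy", value=0)), c.storage, c.balance)
```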
There's also the concept of gas, which is basically a cost to execute each instruction, so you actually have to pay money to execute your programs on Ethereum. The reason is that these programs are Turing complete. To avoid situations where the network gets DoSed because someone intentionally or accidentally executed an infinite loop, for example, instructions have a cost, and that limits the potential for denial of service on Ethereum. And lastly, like I mentioned, this is the compilation target that smart contracts are ultimately compiled to. The Ethereum virtual machine also has a number of virtual address spaces, which is pretty novel. There are four main address spaces the EVM can talk to, and just to be clear, these are all virtual; there's no real hardware involved in any of them. The first is storage. You can think of this as a smart contract's persistent storage, almost like a disk; it's where a contract's state variables get stored. What's really interesting about it is that it's virtually infinite: it's addressed by 256-bit keys, which is a truly massive address space. It's also relatively expensive to use the instructions that write to storage. Memory is the second region. It's a volatile address space that gets cleared on every execution of the smart contract and is just used for intermediate computation. It somewhat resembles a very simple heap, almost like brk or sbrk on Linux: it expands as you need more, but expansion incurs gas costs. There's a separate region for the call data, which is the transaction data buffer. The contract needs to access it to know which function to call, to read arguments, and so on, and it is a separate address space. Lastly, we have the stack; like I mentioned, the EVM is a stack machine.
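Several of these ideas (a stack machine, 256-bit words that wrap around, and a gas charge on every instruction) fit in a toy interpreter. The opcode subset and the costs in `GAS_COST` are illustrative assumptions, not a faithful copy of the EVM's fee schedule.

```python
WORD = 2 ** 256  # EVM words are 256 bits wide; arithmetic wraps modulo 2**256

# Illustrative per-instruction gas costs for this toy opcode subset.
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5, "STOP": 0}


class OutOfGas(Exception):
    """Raised when execution needs more gas than the transaction provided."""


def run_program(program, gas_limit):
    """Execute (opcode, argument) pairs on a toy stack machine.

    Returns the final stack and the amount of gas consumed.
    """
    stack, gas, pc = [], gas_limit, 0
    while pc < len(program):
        op, arg = program[pc]
        gas -= GAS_COST[op]             # charge before executing
        if gas < 0:
            raise OutOfGas(f"out of gas at pc={pc}")
        if op == "PUSH":
            stack.append(arg % WORD)
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % WORD)
        elif op == "MUL":
            stack.append((stack.pop() * stack.pop()) % WORD)
        elif op == "STOP":
            break
        pc += 1
    return stack, gas_limit - gas


# (2**256 - 1) + 2 wraps around to 1 and costs 3 + 3 + 3 + 0 = 9 gas.
print(run_program([("PUSH", WORD - 1), ("PUSH", 2), ("ADD", None), ("STOP", None)], 100))
```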
So most instructions operate on the stack, popping their operands from it and pushing their results onto it. Now we'll get into the Ethereum Application Binary Interface, which is really the core of how calling functions and interacting with contracts works. When you call a function in a contract, you need to specify some information: which function you'd like to call, and the arguments to pass it. All of that is serialized into the transaction data buffer according to the ABI spec. It's pretty simple and has two main components: the first four bytes identify the function you want to call, and they're followed by the serialized arguments. Here's an example of a simple ABI-encoded transaction data buffer. Say we have a smart contract function that takes three unsigned integers, and we want to call it with one, two, and three. To build our data buffer, we first compute the function identifier, which is defined as the first four bytes of the Keccak-256 hash (often called SHA3 in Ethereum) of the function's signature. In this simple example, the arguments just follow after that: one, two, and three, each encoded as a big-endian 32-byte word. So it's pretty straightforward. Things get more complex with more complex Solidity data types. Say a function takes an unsigned integer and a variable-length array of unsigned integers, and we want to call it with one and the array 42, 43, 44. The one is encoded like before, but after that some other fields get introduced: an offset field and a number-of-elements field, which together identify where the actual array data lives in the transaction data buffer.
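The two encodings just described can be sketched in Python. The selector is passed in rather than computed, because Python's `hashlib.sha3_256` is the standardized SHA3-256, which pads differently from the Keccak-256 that Ethereum uses; `0xdeadbeef` below is a placeholder, not a real selector. The dynamic case assumes a function taking `(uint256, uint256[])`, with the offset measured in bytes from the start of the arguments (just after the selector).

```python
def word(n: int) -> bytes:
    # Each static argument is one 32-byte, big-endian word.
    return n.to_bytes(32, "big")


def encode_static(selector: bytes, *args: int) -> bytes:
    """Selector followed by each uint256 argument, e.g. f(1, 2, 3)."""
    assert len(selector) == 4
    return selector + b"".join(word(a) for a in args)


def encode_uint_and_array(selector: bytes, x: int, xs: list) -> bytes:
    """Encode a call to a function taking (uint256, uint256[]).

    Head: x, then the offset (in bytes, from the start of the arguments) where
    the array's data begins. Tail: the element count, then the elements.
    """
    assert len(selector) == 4
    head = word(x) + word(2 * 32)       # the head is two 32-byte slots long
    tail = word(len(xs)) + b"".join(word(v) for v in xs)
    return selector + head + tail


def words(calldata: bytes) -> list:
    # Decode the arguments back into integers for inspection.
    args = calldata[4:]
    return [int.from_bytes(args[i:i + 32], "big") for i in range(0, len(args), 32)]


SELECTOR = bytes.fromhex("deadbeef")    # placeholder, not a real selector
print(words(encode_static(SELECTOR, 1, 2, 3)))                   # [1, 2, 3]
print(words(encode_uint_and_array(SELECTOR, 1, [42, 43, 44])))   # [1, 64, 3, 42, 43, 44]
```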
Now this is starting to look more like a complex binary format, and that will become relevant later when we try to symbolically execute things. Lastly, let's talk about the bytecode format. It's actually very simple; it's really just code. It always begins with a section called the dispatch stub, and there's an implicit entry point at offset zero. The dispatch stub's job is to parse the first four bytes of the transaction data buffer and dispatch to the corresponding function. After the dispatch stub, it's really just the functions in the contract. Here's an example disassembly of a dispatch stub. It's not important to read every instruction, but it's useful to walk through to understand how it works. At the top, we have a CALLDATALOAD instruction, which the contract uses to access the call data region holding the transaction data buffer. Further down, the code pushes this hex constant onto the stack, executes an EQ instruction, and then jumps. That's how it dispatches: this circled code pushes the hex constant, compares it to the first four bytes of the data, and jumps. In other words, it looks at the data and then jumps into the bytecode based on it. We have an identical construct later in the code, with a different hex constant compared against the first four bytes of the transaction data, followed by a jump. So that's really how the dispatch mechanism works in the Ethereum virtual machine. If we send it junk data whose first four bytes don't match any of the function IDs, we hit this REVERT instruction and the state reverts.
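In Python terms, the dispatch stub behaves roughly like the sketch below: compare the first four bytes of the call data against each known identifier and jump on a match, otherwise revert. The function identifiers and bodies are hypothetical; real stubs do this with CALLDATALOAD, PUSH4, EQ, and JUMPI (among other instructions) on the EVM stack.

```python
from typing import Callable, Dict


class Revert(Exception):
    """No function identifier matched: the stub falls through to REVERT."""


def dispatch(calldata: bytes, functions: Dict[int, Callable[[bytes], str]]) -> str:
    # CALLDATALOAD zero-pads short call data, so pad to four bytes here too.
    selector = int.from_bytes(calldata[:4].ljust(4, b"\x00"), "big")
    for function_id, body in functions.items():
        if selector == function_id:     # PUSH4 function_id; EQ; JUMPI
            return body(calldata[4:])
    raise Revert(f"unknown selector {selector:#010x}")


# Hypothetical contract with two functions.
FUNCTIONS = {
    0x13371337: lambda args: "first function",
    0xCAFECAFE: lambda args: "second function",
}
print(dispatch(bytes.fromhex("13371337") + bytes(32), FUNCTIONS))
```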
So, to summarize: Ethereum is an interesting decentralized, virtual-machine-based computation platform, and using it we can write and deploy smart contract applications and interact with them using transactions. Now let's really get into the meat of things and combine symbolic execution with Ethereum. Our goals here are pretty straightforward. We want to follow the tradition of classical symbolic execution and generate inputs that exercise the functionality of the contract. We want to do this in a way that enumerates the state space and discovers failure states, but without generating false positives: if we find an error state in the program, we should always be able to generate an input that triggers it under concrete execution. Lastly, we want to allow humans to reason about and prove properties of contracts. Here's the methodology. First, we need to implement a symbolic EVM interpreter. It differs from a normal interpreter in that it can handle symbolic arguments, build up symbolic expression trees, and propagate them as the program executes. Once we have the symbolic EVM interpreter, we start executing contracts with symbolic input, and in this case symbolic input really means the fields of the transaction. There are two main inputs we can provide to a contract in a transaction: a symbolic transaction value and a symbolic transaction data buffer. The data buffer is the more interesting one, because it's what lets us pass symbolic arguments to contracts and automatically do things like find all the functions, so that's what we'll focus on today. Just to review: in a concrete transaction, we have one initial contract state, we apply a concrete transaction, and we get one output state. It's a little different for symbolic execution of Ethereum.
Now we're throwing symbolic transactions onto the network, and we no longer get one output state; we get a number of output states. How many depends on exactly what the contract does and what state forking happens when it executes. This slide just illustrates that: we make the inputs of the transaction symbolic. We classify each output state of a symbolic transaction as either alive or reverted. If the contract encountered an error and reverted, that's obviously a reverted state. If the contract executed cleanly and reached one of the valid terminating instructions, namely STOP and RETURN, we call it an alive state. Alive states are candidates for further symbolic transactions that explore more of the contract's state space. Here are a couple of applications of Ethereum symbolic execution that we've been exploring. First, we can automatically check the assertions in contracts. This works because of how assertions are compiled in Solidity: as branches to an invalid instruction. I'm not quite sure why it's an invalid instruction, but interpreters handle it the same way as a revert. Because symbolic execution tries to explore all states, if we find a state that reaches that invalid instruction, we've found a way to make the assertion fail, so we get this checking automatically, just because the analysis tries to fully enumerate the states. Function discovery is another really good application. If you have experience with traditional binary analysis, you know how hard it is to extract functions from, say, a statically compiled Linux ELF binary; finding the functions is not trivial. Things are a little easier for Ethereum because of the dispatch stub.
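A minimal sketch of the alive/reverted classification and the automatic assertion check that falls out of it, assuming each explored path has already been reduced to its constraints plus the terminating instruction it reached. The example paths are made up for illustration.

```python
from typing import List, Tuple

ALIVE = {"STOP", "RETURN"}        # clean terminations: keep exploring from here
REVERTED = {"REVERT", "INVALID"}  # a failed assert() compiles to INVALID


def classify(terminator: str) -> str:
    if terminator in ALIVE:
        return "alive"
    if terminator in REVERTED:
        return "reverted"
    raise ValueError(f"unhandled terminator {terminator!r}")


def assertion_failures(paths: List[Tuple[List[str], str]]) -> List[List[str]]:
    """Keep the constraints of every path that ends in INVALID; each one,
    handed to a solver, yields an input that makes an assert() fail."""
    return [constraints for constraints, end in paths if end == "INVALID"]


# Made-up paths: (path constraints, terminating instruction).
paths = [
    (["selector != f"], "REVERT"),
    (["selector == f", "x < 10"], "STOP"),
    (["selector == f", "x >= 10", "x != 77"], "RETURN"),
    (["selector == f", "x == 77"], "INVALID"),
]
print([classify(end) for _, end in paths])   # ['reverted', 'alive', 'alive', 'reverted']
print(assertion_failures(paths))             # [['selector == f', 'x == 77']]
```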
Remember that if we symbolically execute the dispatch stub, we'll find all paths through it, and luckily for us, every path through the dispatch stub leads to a function, apart from the one that falls through to the revert. So simply symbolically executing the dispatch stub automatically recovers all of the functions in a contract, which is a very straightforward and nice use of this technique. Now let's walk through an example of generating transactions that drive a contract into a certain state. This is an example smart contract written in Solidity, and it has some buggy code. At the bottom, guarded by this if statement, there's a possible integer overflow. However, to get to that point, you first need to prime the contract and get it into an exploitable state, so it actually takes two transactions to reach the overflow. That's still fine for our symbolic execution, since we can generate transaction sequences of arbitrary length. Here's a visualization of what happens when we symbolically execute this program. On the left, we have the initial contract state, and we submit one symbolic transaction. Out of this, we get two states. One is a reverted state, because the first four bytes didn't match the function identifier. More interestingly, the top state is the one where we discover where the function is in the contract and execute it once, which primes the contract and gets it into the exploitable state. At that point, we have one alive state, so we can submit another symbolic transaction, and from that we get two more states, one of which is the overflow. If we flip back to the code, we can see the check: if the input parameter is less than 42, it's safe.
Otherwise, it reaches the buggy code, and those are the two paths we find in the second transaction. If we dig a little deeper, we can look at exactly what's going on with the constraints. Let's pretend that the function identifier of this function is 0x13371337 ("leet leet"). For the reverted state, there are two symbolic inputs, the value and the data, and the only constraint is that the first four bytes of the data are not equal to 0x13371337. The safe path, one transaction later, is similar: the first four bytes of the data must equal 0x13371337 in both transactions in order to execute the function. For the second transaction the argument also matters, so we have a constraint that the first argument must be less than 42 to take the safe path. Lastly, the unsafe path has almost identical constraints except for the last one: remember that the if statement requires the input to be greater than or equal to 42 to reach the buggy path, and that's the constraint we have here. Using these tables of inputs and constraints, we can then use a constraint solver to generate real transactions. For instance, we can ask Z3, a solver from Microsoft: given that there are no constraints on the transaction value, what is an example value I can use here? It might return some arbitrary number, which is fine, because there are no constraints. The data, on the other hand, is constrained, but anything except 0x13371337 will work, so maybe Z3 says 0xcafecafe. The same thing happens for every path we want to generate an input for: we take its set of constraints, use Z3 to fill in the values, and generate concrete transactions that we can send on the blockchain.
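Putting the two-transaction walkthrough together, here is a hypothetical end-to-end sketch: a hand-written model of the example contract (the first call primes it, the second branches on `arg < 42`), an explorer that only continues from alive states, and a toy solver standing in for Z3. One difference from the slides: the sketch also reports the selector-mismatch revert on the second transaction, which the walkthrough leaves out. The transaction value is unconstrained on every path, so it's omitted.

```python
from typing import Iterator, List, Tuple

FUNC_ID = 0x13371337                 # pretend function identifier from the example
Constraint = Tuple[str, str, int]    # (input name, operator, constant)
Path = List[Constraint]


def run_tx(primed: bool, tx: int) -> Iterator[Tuple[Path, str, bool]]:
    """Symbolically run one transaction against a hypothetical model of the
    example contract; yield (new constraints, outcome, primed afterwards)."""
    sel, arg = f"tx{tx}.selector", f"tx{tx}.arg"
    yield [(sel, "!=", FUNC_ID)], "revert", primed     # dispatch stub: no match
    matched: Path = [(sel, "==", FUNC_ID)]
    if not primed:
        yield matched, "alive", True                    # first call primes the contract
        return
    yield matched + [(arg, "<", 42)], "safe", primed
    yield matched + [(arg, ">=", 42)], "overflow", primed


def explore(n_tx: int) -> List[Tuple[Path, str]]:
    frontier: List[Tuple[Path, bool]] = [([], False)]  # alive states
    results = []
    for tx in range(n_tx):
        next_frontier = []
        for path, primed in frontier:
            for new, outcome, primed_after in run_tx(primed, tx):
                results.append((path + new, outcome))
                if outcome != "revert":                 # only alive states continue
                    next_frontier.append((path + new, primed_after))
        frontier = next_frontier
    return results


OPS = {"==": lambda a, b: a == b, "!=": lambda a, b: a != b,
       "<": lambda a, b: a < b, ">=": lambda a, b: a >= b}


def solve(path: Path, candidates=(0xCAFECAFE, FUNC_ID, 0)) -> dict:
    """Toy stand-in for Z3: each input here is constrained independently, so pick,
    per input, the first candidate that satisfies all of its constraints."""
    model = {}
    for name in dict.fromkeys(n for n, _, _ in path):
        checks = [(op, k) for n, op, k in path if n == name]
        model[name] = next(v for v in candidates if all(OPS[op](v, k) for op, k in checks))
    return model


for path, outcome in explore(2):
    print(outcome, {k: hex(v) for k, v in solve(path).items()})
```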
These transactions will drive the contract into each of those states. Now let's talk a little about challenges; we'll cover the three main ones we encountered. First, state explosion. Here's a snippet of code from the Veritesting paper, a pretty famous paper from Carnegie Mellon, which serves as an example of a worst-case scenario for symbolic execution. The code loops over a symbolic buffer of input bytes, and inside the loop there's a branch: on each iteration, it checks whether the byte is a 'B' and, if so, bumps a counter. After the loop, if the counter is exactly 75, we hit a bug. This is really bad for symbolic execution, because every iteration of the loop doubles the number of states being explored, so by the time the loop finishes, it has produced 2^100 states. Picking out a state where the counter is 75 from all of those 2^100 states is very difficult. When we first looked at this, we knew that symbolic execution struggles to scale to large programs because you inevitably hit massive numbers of states. But smart contracts are usually very small, in the hundreds of lines of Solidity or less, and that initially seemed really promising; we thought state explosion might not be a big deal. Then we realized something: even if the smart contract itself is only 100 lines long or so, there's always an implicit infinite loop around it for receiving input, and that throws a wrench into things.
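The arithmetic behind the Veritesting example can be checked directly. Assuming a 100-byte input and a target count of 75, as described above, every path is a choice of which bytes equal 'B', so there are 2^100 paths and C(100, 75) of them reach the bug. The sketch confirms the counting formula by brute force on a small instance, since enumerating 2^100 paths is impossible.

```python
from itertools import product
from math import comb


def bug_paths(n: int, target: int) -> int:
    """Enumerate all 2**n paths through:
         for each of n input bytes: if byte == 'B': counter += 1
         if counter == target: bug()
    and count those that reach bug()."""
    return sum(1 for branches in product((True, False), repeat=n)
               if sum(branches) == target)


# For small n the enumeration matches the closed form C(n, target).
assert bug_paths(10, 7) == comb(10, 7)

# For the example's sizes, only the closed form is feasible.
total_paths = 2 ** 100
buggy_paths = comb(100, 75)
print(f"{buggy_paths} / {total_paths} = {buggy_paths / total_paths:.2e}")
```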
If we write out the metaprogram of what we're actually analyzing, it's basically an infinite loop that receives transactions and runs the contract on them. Then it becomes clearer that we have not avoided the symbolic execution problem: we still have a loop, and we still have branching on input inside that loop. I don't know if this looks familiar to anyone, but yes, this causes problems. Here's a little visualization. If we have a simple contract with three functions, our first transaction yields three states, one for each function, and after each subsequent transaction the possible state space keeps multiplying. Of course, the caveat is that this really depends on exactly what code the contract executes, but in the worst case we can see patterns like this, if not worse. Something we're currently exploring is concolic execution, which has been used very successfully in the literature: you take a seed trace and use it to guide your analysis. This lets you explore deep paths close to the seed trace without being overwhelmed by the state explosion that still occurs. Now let's talk about a couple of Solidity language features that make symbolic execution of Ethereum really hard. The first is mappings, the native hash map data type in Solidity. This is a very simple contract that uses one to keep track of users' balances: it maps from an address to an integer, and through this update function you can update your own balance in the contract. The mapping type has some pretty interesting semantics: all possible keys already exist in the mapping and are zero-initialized.
Because of this, mappings are not iterable: if you key on an unsigned 256-bit integer, for example, that's an enormous number of keys. Of course, this is implemented with sparse storage rather than by actually allocating every entry. But the most interesting part of how mappings work is the direct-mapped implementation: to store or look up an entry in a mapping, you take the hash of the key (together with the mapping's own storage slot number) and use that as the literal address in storage where the data lives. So we have this really interesting pattern where we get true constant-time access to members of our hash table, because there are no buckets; everything is direct-mapped into the address space. We can do that because the storage address space, 2^256 slots, is nearly infinite, at least virtually. This presents us with some challenges, though. First of all, accessing mappings with symbolic keys is pretty common, and that leads to hashing symbolic values, which is a hard situation. It also means more symbolic storage indexes, which complicates things further. But basically, computing the hash of a symbolic value produces a symbolic expression that is intentionally impossible to solve. As an example, look at this: we hash a symbolic variable and, let's say, constrain the result to equal this random hex value. If we could give this equation to a constraint solver and have it solve it for us, we would effectively be reversing the hash, which cryptographic hash functions are explicitly designed to make infeasible. This is pretty problematic, because if we just symbolically execute straight through hashes, we will not be able to solve the constraints in any of the later states. So this is a big problem.
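The direct-mapped layout can be sketched in a few lines. In Solidity, the storage slot of `mapping[key]` is the Keccak-256 hash of the key padded to 32 bytes, concatenated with the mapping's own slot number padded to 32 bytes. Python's standard library has no Keccak-256 (`hashlib.sha3_256` is standardized SHA-3, which pads its input differently), so this sketch uses SHA3-256 as a stand-in: the structure is right, but the slot numbers will not match real Ethereum storage. The `Storage` class, the slot number 0, and the address are hypothetical; unwritten slots read as zero to mirror the zero-initialized semantics.

```python
import hashlib

WORD = 32  # EVM word size in bytes


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256; NIST SHA3-256 uses different padding, so the
    # resulting slots differ from real Ethereum storage.
    return hashlib.sha3_256(data).digest()


def mapping_slot(key: int, mapping_slot_number: int) -> int:
    """Storage address of mapping[key]: hash(pad32(key) ++ pad32(slot))."""
    data = key.to_bytes(WORD, "big") + mapping_slot_number.to_bytes(WORD, "big")
    return int.from_bytes(h(data), "big")


class Storage:
    """Sparse model of a contract's 2**256-word storage."""

    def __init__(self):
        self.words = {}  # only slots that have been written are materialized

    def load(self, slot: int) -> int:
        return self.words.get(slot, 0)  # every other slot reads as zero

    def store(self, slot: int, value: int) -> None:
        self.words[slot] = value


# balances: mapping(address => uint256), assumed to live at slot 0.
BALANCES_SLOT = 0
storage = Storage()
alice = 0x1111111111111111111111111111111111111111  # hypothetical address
storage.store(mapping_slot(alice, BALANCES_SLOT), 100)

print(storage.load(mapping_slot(alice, BALANCES_SLOT)))   # 100
print(storage.load(mapping_slot(0x2222, BALANCES_SLOT)))  # 0: never written
```

Because the storage address is the hash output itself, a symbolic `key` turns the slot into a symbolic hash expression, which is exactly the hard case described above.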
We have a workaround that we've been experimenting with in our implementation, and it's based on concretization. As we analyze and execute through a program, we see some concrete hashes being computed: maybe user A is hashed and used in the mapping, and maybe user B is too. Then, when it comes time to compute a symbolic hash, rather than straightforwardly computing the hash and being unable to solve constraints from that point on, we constrain it based on what we know. The idea is that if a hash map was previously looked up with a certain key, chances are that key will be interesting from a symbolic execution perspective later on as well. So we replace the complex hash expression with an alternate expression object that represents everything we know about all the concrete keys; it basically just uses SMT if-then-else expressions. Using this strategy, we avoid the case where we can't solve our constraints anymore, which lets the analysis continue with solvable constraints and keep forking states. Another challenge we've been encountering is supporting all of Solidity's types, and in particular dynamic arguments. Dynamic arguments are essentially variable-length arrays: functions can receive variable-length data passed to them. However, like we mentioned before, this makes the transaction data a much more complex format, with lots of offset and length fields, which leads to symbolic indexing and memory copies later on. If the field that contains the offset to the actual data is symbolic, we will inevitably hit an instruction that tries to load from there, and that produces a symbolic index, which, as mentioned earlier, is a pretty tough problem to solve.
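A minimal sketch of this concretization idea, in plain Python rather than Manticore's actual implementation: the model records every concrete value that gets hashed during execution, and a symbolic hash is then represented as an if-then-else chain over those known keys. The `ConcretizedHash` class and its methods are hypothetical, SHA3-256 again stands in for Keccak-256, and `solve` simply checks each arm of the chain, which is the case split an SMT solver would perform on the real expression.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256 (hashlib.sha3_256 is NIST SHA-3, not Keccak).
    return hashlib.sha3_256(data).digest()


class ConcretizedHash:
    """Model hash(x) as ite(x == k1, h(k1), ite(x == k2, h(k2), ... fallback))."""

    def __init__(self):
        self.known = {}  # concrete preimage -> digest, in the order observed

    def observe(self, key: bytes) -> bytes:
        """Hash a concrete value during execution and remember it."""
        digest = h(key)
        self.known[key] = digest
        return digest

    def as_ite(self, var: str = "x", fallback: str = "unknown_hash") -> str:
        """Render the replacement expression in SMT-LIB-like syntax."""
        expr = fallback
        for key, digest in reversed(list(self.known.items())):
            expr = f"(ite (= {var} #x{key.hex()}) #x{digest.hex()} {expr})"
        return expr

    def solve(self, target: bytes) -> list:
        """Known keys x for which hash(x) == target holds under this model.

        Each ite arm is a candidate; the fallback arm is left unconstrained
        and is not enumerated here.
        """
        return [key for key, digest in self.known.items() if digest == target]


model = ConcretizedHash()
model.observe(b"user A")
model.observe(b"user B")

print(model.as_ite())
print(model.solve(h(b"user B")))  # [b'user B']: the constraint is solvable
print(model.solve(h(b"user C")))  # []: no known key hashes to this value
```

The trade-off mirrors the one in the talk: the model only "knows" preimages that execution has already hashed concretely, which is exactly the bet that previously used keys are the interesting ones.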
The workaround we've been using for this is to aggressively concretize the transaction data based on what we know. For example, take this function that takes two dynamic-length arrays, A and B. Since we're passing it a concrete-length buffer, even though the buffer has symbolic elements in it, we still know how long it is. Using that, we can compute a fair distribution of space for each argument: we take the total space, subtract the space for each argument's metadata, like the offset and length fields, and divide what's left. Here we arrive at the conclusion that there are only 32 bytes of space for each argument, which is enough for one element in each, and we can concretize everything else based on that. This gives us constraints that are much more easily solvable, and more performant analyses. Of course, the main limitation of this approach is that the state space gets artificially limited to however much data we chose to provide in our execution. If we do this, we're going to miss branches, for example ones that require the length of A to be greater than one, because we concretized it. But this has been working reasonably well in practice so far. Some other challenges that I won't get into today: to have a complete environment model for symbolic execution, you need to support complex inter-contract calls. You can't just model the environment for one contract's execution, because real contracts often talk to many others. And lastly, gas, and symbolic gas in particular, is a very interesting challenge that we also need to support. Now I'll say a few words about our implementation.
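The space-distribution arithmetic can be sketched as follows. This assumes every argument is a dynamic array of 32-byte elements (like `uint256[]`), encoded with the standard Solidity ABI: a 4-byte function selector, one 32-byte offset word per argument in the head, and, for each argument, a 32-byte length word followed by its elements. The `concretize_dynamic_args` helper is hypothetical and only illustrates the idea described above, not Manticore's code.

```python
WORD = 32      # ABI word size in bytes
SELECTOR = 4   # function selector prefix


def concretize_dynamic_args(total_len: int, n_args: int):
    """Split a concrete-length calldata buffer fairly among dynamic array args.

    Returns (elements_per_arg, offsets, lengths); offsets are measured from
    the start of the argument block, just after the selector.
    """
    # Metadata per argument: one offset word in the head, one length word in the tail.
    metadata = SELECTOR + n_args * 2 * WORD
    remaining = total_len - metadata
    if remaining < 0:
        raise ValueError("buffer too short to hold the argument metadata")
    per_arg_bytes = remaining // n_args
    elements = per_arg_bytes // WORD

    offsets, lengths = [], []
    cursor = n_args * WORD  # tails start right after the head of offset words
    for _ in range(n_args):
        offsets.append(cursor)
        lengths.append(elements)
        cursor += WORD + elements * WORD  # length word + element words
    return elements, offsets, lengths


# f(uint256[] a, uint256[] b) called with a 196-byte symbolic buffer.
elements, offsets, lengths = concretize_dynamic_args(SELECTOR + 6 * WORD, 2)
print(elements, offsets, lengths)  # 1 [64, 128] [1, 1]
```

With the offsets and lengths fixed this way, only the element words remain symbolic, so no load depends on a symbolic index. The cost is the one described in the talk: with this buffer size, a branch that requires `a.length > 1` is unreachable.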
It's implemented within a project called Manticore, which is an open-source symbolic execution tool. Historically it's been a symbolic execution tool for regular binaries for x86, ARM, and so forth, but over the past nine months or so we've added support for Ethereum to it. We made a little submodule, about 4,000 lines of Python code, and it exposes a pretty nice Python API. You can do things like launch symbolic transactions and submit your own SMT solver queries to ask your own questions about what's possible for the contract to do. It's available on GitHub and from PyPI. We've been using Manticore a lot recently, internally at Trail of Bits, for smart contract audits. Our auditors have been writing quite a lot of Manticore scripts to test their own assumptions about what a contract can do; once they narrow in on a sketchy part of the code that's worth a closer look, they'll often write a Manticore script to test it. We've also deployed it within client test infrastructure, and the pattern generally looks like this. We have a set of Manticore scripts that don't attempt to just run on the contract and find all the bugs; they're more targeted, checking specific invariants that are important for security. The general pattern is a script that initializes the state of the contract, launches a fixed number of symbolic transactions, and then asserts certain invariants in all states that are discovered. Now, if I have some time, I'll do a few quick demos of Manticore. We'll start with a simple demo of running Manticore on the contract I showed before: the smart contract that has the integer overflow path, which requires two transactions to reach.
Manticore supports a really intuitive command-line interface, so you can easily just launch it at contracts and start sending symbolic transactions and generating states. In this case you can see that it started one symbolic transaction, then started a second transaction over here, and lastly it detected that 100% code coverage was reached, so it halted the analysis. Let's dive into the workspace directory that it produced. There are a lot of files here: for each state that Manticore discovers, it generates a set of files with various pieces of information about that state. For example, in this Solidity code we have these little log lines that execute when a path is reached, so we can search through the output for the overflow path. Let's do that here. You can see that test case 4 is the path corresponding to the overflow. If we look at the files produced for test case 4, we have a lot: a file containing the constraints that define that state, a file containing the runtime code execution trace for the contract, and, probably most interestingly, a file that has concrete transaction data in it. Remember, for every path Manticore finds, it generates a set of inputs, and you can use these to concretely drive the contract into a state found in the analysis. Let's see what it generated. The first transaction creates the contract on the blockchain. The second transaction calls the test me function with this particular value, and if we look at the Solidity, the input isn't used in this section that primes the contract, so it doesn't matter; this will work just fine. Lastly, you can see that it calls test me again with this other argument.
To reach this code, the input just needs to be greater than or equal to 42, and this number looks bigger than 42, so that will work just fine. Now I'll quickly demo some of the more powerful features of the Manticore API, which is what you really want to use for auditing contracts. We're going to analyze the source code of this wallet contract. It's pretty simple: when you create it, it sets you as the owner, and then you can do things like deposit ether into it and withdraw it at a later time. The withdraw function is protected so that only the owner can withdraw the ether. So we can test this contract with Manticore to see whether it's possible to withdraw the ether even if you're not the owner, or something like that. To do this, we import Manticore and create the initial blockchain state, which is a ManticoreEVM object. Then we set up our custom environment: an external account for the creator, and an account for a simulated attacker. Next, we use a Solidity create-contract API to deploy the contract onto our emulated blockchain. After that, we submit one transaction: we call the deposit function on the contract account, with the creator account as the caller and this value as the amount of ether to send, which puts some ether into the wallet. Then we send two symbolic transactions from the attacker's account, using this API to make a symbolic transaction data buffer and using that as the data for each transaction. After the two symbolic transactions from the attacker, we check every state: in each one, we run a custom query to see whether the attacker's account balance can be greater than one. The attacker started out with an account balance of zero, right here.
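The shape of this query can be mimicked without Manticore. The sketch below is a hypothetical plain-Python model of the wallet, not the demo's Solidity source or Manticore's API: it replaces the two symbolic attacker transactions with explicit enumeration over every function the attacker could call (a brute-force stand-in that only works because this toy's input space is tiny), then checks the same end condition, whether the attacker's balance can exceed one, in every resulting state. The `change_owner` function deliberately omits an owner check, mirroring the bug the demo goes on to uncover; account names and amounts are made up.

```python
from itertools import product

CREATOR, ATTACKER = "creator", "attacker"
FUNCTIONS = ("deposit", "withdraw", "change_owner")


def run_tx(state: dict, caller: str, func: str) -> dict:
    """Execute one call against the toy wallet; reverted calls leave state unchanged."""
    s = dict(state)
    if func == "withdraw":
        if caller != s["owner"]:
            return state  # revert: only the owner may withdraw
        s[caller] += s["wallet"]
        s["wallet"] = 0
    elif func == "change_owner":
        s["owner"] = caller  # BUG: no check that caller is the current owner
    elif func == "deposit":
        pass  # the attacker sends no value in this toy, so nothing changes
    return s


# Initial state: the creator owns the wallet and has deposited 10 ether.
initial = {"owner": CREATOR, "wallet": 10, CREATOR: 90, ATTACKER: 0}

# Stand-in for two symbolic transactions: try every pair of attacker calls.
hacks = []
for calls in product(FUNCTIONS, repeat=2):
    state = initial
    for func in calls:
        state = run_tx(state, ATTACKER, func)
    if state[ATTACKER] > 1:  # the end condition queried in the demo
        hacks.append(calls)

print(hacks)  # [('change_owner', 'withdraw')]
```

As in the demo, only the end condition is written down and the search decides which call sequence satisfies it. Manticore's advantage is that the transaction data is genuinely symbolic, so it does not depend on a small, hand-enumerated input space.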
So we'll see: is there any situation where the attacker actually has ether after submitting two transactions? We'll print yes if we find a state like that. Let's go ahead and run the script. You can see that it creates some accounts and deposits some ether into the wallet, and now it's starting to send the attacker transactions. It sent the first one pretty quickly, and now, because it needs to run over every produced state, it's taking a little longer for the next one, but in a couple of seconds it will finish and we'll see whether we found any states where you can steal the ether. Of course, in most of the states we found, the attack didn't work, but Manticore tells us there was actually one state where we can steal the ether. Let's look in the output directory. The script generates this custom test case called wallet hack, so let's see how Manticore found a way to steal the ether by looking at the transaction file. We can see a transaction that creates the contract, then the first transaction, from the creator's account, depositing this amount of ether into the contract, and then our two attacker transactions. So what did it find? It found that if you call the change owner function and then withdraw, you make yourself the owner and can then withdraw all the ether. The bug in this contract was that the change owner function is public, so anyone can call it and make themselves the owner. The key point here is that all we needed to express were our desired end conditions. We just needed to say: given this contract, is there any possible state where an attacker gets any ether? That's it. The analysis then goes and finds a way to make that satisfiable. So that pretty much wraps things up here.
In summary, we found that symbolic execution is definitely possible for Ethereum and already shows a lot of promise. There are of course many interesting challenges to overcome, but there's big potential impact in this space. If you're interested, you can check out Manticore, which is available as an open-source implementation. I also want to give special thanks to my coworker, Felipe Manzano, who did most of this work. That's pretty much it. We're also hiring, so if you find this work interesting, please get in touch. Thank you all for your time and attention.