Yeah, so this came out of a fun project. I wanted to write a full end-to-end symbolic execution engine in Solidity, including the solver that we usually use in the background, and it was actually pretty fun, so I wanted to share with you what it does.

A few questions to adjust to the audience. Who here was at Hari's talk yesterday about symbolic computation in Yul? Okay. Who knows what symbolic execution is? Okay, nice. I am prepared to tell you what it is, though, for the ones who don't know. So I'm quickly going to go over some slides and then we're going to move to code.

These expressions are thrown around a lot: what's symbolic execution? What are constraints? What's an SMT solver? A lot of people know what these are, but if you don't, you kind of feel left out, and these words are used as if they were trivial. They're simple, but not trivial.

So first I'm going to tell you what's not symbolic execution, and that's concrete execution. Oh yeah, last question: who can read assembly like this? Nice, okay, that's very good.

So let's say we have this EVM assembly here and you want to run a concrete execution over it. Let's say we have this calldata. It's 12 in hex. What is that in decimal, 20? No, 18. Yeah, that makes sense. If you run this program with that calldata, what do we get? There's no return here, so let's just consider the top of the stack as a sort of return value. In this case we have a 12. We push 0 onto the stack, calldataload at that position gives us the 12, we push 3 and multiply, which gives us 36 in hex, and we add one, so we get this number at the top of the stack, right? So this is concrete execution: you have a program, you have a concrete input, and you execute the program on that input to get an output.

So now what is symbolic execution?
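The concrete run described above can be sketched in a few lines of Python. This is only an illustration: the opcode sequence is reconstructed from the spoken description (the slide itself is not in the transcript), the calldata is assumed to be a single 32-byte word holding 0x12, and `run_concrete` is a hypothetical helper that implements only the four opcodes used here.

```python
# Minimal concrete run of the snippet from the talk. The opcode list is an
# assumption reconstructed from the description, not the actual slide.
WORD = 2**256  # EVM words are 256-bit unsigned integers

def run_concrete(program, calldata: bytes) -> int:
    """Execute `program` on concrete calldata and return the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0] % WORD)
        elif op == "CALLDATALOAD":
            offset = stack.pop()
            # Read 32 bytes starting at `offset`, zero-padded on the right.
            chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(chunk, "big"))
        elif op == "MUL":
            stack.append((stack.pop() * stack.pop()) % WORD)
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % WORD)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack[-1]

PROGRAM = [("PUSH", 0), ("CALLDATALOAD",), ("PUSH", 3), ("MUL",), ("PUSH", 1), ("ADD",)]
CALLDATA = (0x12).to_bytes(32, "big")  # assumed layout: one word holding 0x12 (18)

top = run_concrete(PROGRAM, CALLDATA)
print(hex(top))  # 0x37, i.e. 0x12 * 3 + 1
```

One program, one input, one output: that is all a concrete execution gives you.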
We just said what it's not. T11 actually has a pretty good explanation of what it is: it turns the program into math and then solves math formulas. That's actually a very precise, although very vague, description of what it does.

We have the same program here, but now instead of a concrete calldata we're going to have a symbolic calldata. What does it mean to be symbolic? It means that you just use a variable instead of a concrete number. That's really all you do: you keep looking at variables. So our entire calldata is going to be cd now, it's just a variable. The top of our stack is going to be a variable called top. That's all we're doing.

Here we're going to collect constraints, which are things that must be true when we run the program. When we run calldataload of zero, we're basically doing cd of zero, right? We can also call it x, we can call it z, we can call it whatever, it's just a symbol. And it happens that that thing must be greater than or equal to zero, because we're dealing with EVM words and they're unsigned, right? The second part is that we are assigning that to x: we just make a variable out of it when we do the calldataload. Next we have this multiplication. We can make another variable for it called y, which is now x times 3. Finally, for this add we can make another variable called top, which is just the previous top of the stack plus one, so y plus 1.

So you see that for every operation we add a new variable, a new symbol that represents that expression, and we collect all these expressions. We end up with what we call constraints, and this is our symbolic encoding. You take the program, EVM bytecode or assembly here, and transform it into a set of constraints, and that's our symbolic encoding. There are many, many different ways to write these constraints from a program, which means we have different ways to symbolically encode your program. Why do we want to do this?
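As a rough sketch of the encoding step just described, the same stack walk can emit one fresh variable and one constraint per operation instead of computing numbers. The names `encode`, `v1`, `v2`, ... and the string format of the constraints are assumptions made for illustration (the talk calls the variables x, y and top); the opcode list is again reconstructed from the description.

```python
from itertools import count

def encode(program):
    """Walk the program symbolically.

    Returns (constraints, name of the variable on top of the stack).
    Constraints are plain strings here, purely for readability.
    """
    stack, constraints = [], []
    fresh = (f"v{i}" for i in count(1))  # fresh symbol generator: v1, v2, ...
    for op, *args in program:
        if op == "PUSH":
            stack.append(str(args[0]))  # constants stay concrete
        elif op == "CALLDATALOAD":
            offset = stack.pop()
            var = next(fresh)
            constraints.append(f"{var} = cd[{offset}]")
            constraints.append(f"{var} >= 0")  # EVM words are unsigned
            stack.append(var)
        elif op in ("MUL", "ADD"):
            a, b = stack.pop(), stack.pop()
            var = next(fresh)
            sign = "*" if op == "MUL" else "+"
            constraints.append(f"{var} = {b} {sign} {a}")
            stack.append(var)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return constraints, stack[-1]

PROGRAM = [("PUSH", 0), ("CALLDATALOAD",), ("PUSH", 3), ("MUL",), ("PUSH", 1), ("ADD",)]
constraints, top = encode(PROGRAM)
for c in constraints:
    print(c)
print("top is", top)
```

Here `v1`, `v2` and `v3` play the roles of x, y and top from the slides.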
So our symbolic encoding is going to turn into a system of linear inequalities. This was our set of constraints, and it turns into this system of linear inequalities. There are equalities here, but they can be very quickly translated into inequalities, so we just keep them as equalities for simplicity.

Why do we do this? The reason is that we know how to solve these things, with algorithms that you might have seen in high school or university, or that you will still see when you finish high school. And what can we do with it? We can, for example, do the same thing we did before: we can give a concrete calldata, in this case the same calldata as before, and run it symbolically. What does that mean? In this case a simple substitution does it. We substitute cd by this entire calldata, we substitute cd of 0 by 0x12, and so on, and we get that top is going to be this number. You can solve that with Gauss-Jordan elimination or Gaussian elimination, there are tons of ways of solving it, but a simple substitution will do here. It not only gives us the value for top, it gives us values for every single variable in the system. We could have computed that before with the concrete execution.

The cool thing is that we can ask many, many things with this. You can ask, for example: can top be greater than 10,000? How do we ask that?
We simply add a constraint that represents that statement in this language. So "can top be greater than 10,000" basically gives us this constraint here. And here's the answer: it is possible, in which case a solver, such a math solver, will tell us that the system is satisfiable, meaning that there is a valuation for every single variable in the system that makes the set of constraints satisfiable. It is possible that all the constraints are true at the same time. It also gives us what we call a model, which is the actual values that make the whole set of constraints valid. Similar to before, we have a value for top and we also have values for every single variable that appears in the system.

We can also ask: can top be zero? What do you think? We can represent it with this constraint, pretty simple. By the way, all these variables are over the integers. There are no reals or rationals here, everything is integers. In this case the solver says the system is unsatisfiable. It's inconsistent. There's no way all these constraints can be true at the same time. They may be true separately, but they will never be true at the same time.

What we want to do is basically exactly that in Solidity. But what exactly do we want to do? You can do several things: you can try to prove that a certain assertion is true, you can prove that a certain thing always happens or never happens, that kind of thing. For our use case, we're going to try to find unreachable branches.
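The two queries above can be mimicked with a toy search. This is not how a real solver works: `solve` is a hypothetical stand-in that enumerates x up to an arbitrary bound, so in general its "unsat" only means "no solution below the bound". For this particular system the answer is still right, because x >= 0 forces top = 3x + 1 >= 1 over the integers.

```python
def solve(extra, bound=10_000):
    """Brute-force stand-in for a solver.

    The system is: x >= 0, y = 3 * x, top = y + 1, plus the extra
    constraint passed in as a predicate over the candidate model.
    """
    for x in range(bound):  # x >= 0 holds by construction
        y = 3 * x
        top = y + 1
        model = {"x": x, "y": y, "top": top}
        if extra(model):
            return "sat", model  # a model assigns a value to every variable
    return "unsat", None

# Can top be greater than 10,000?
print(solve(lambda m: m["top"] > 10_000))
# ('sat', {'x': 3334, 'y': 10002, 'top': 10003})

# Can top be zero?
print(solve(lambda m: m["top"] == 0))
# ('unsat', None)
```

Note that the satisfiable query returns values for x and y as well as top, which is the "model" mentioned in the talk.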
So here's a piece of Solidity. We're going to analyze EVM bytecode, but this is just to show an example. I have this function that takes an integer x, and first we require that x has to be less than or equal to 10. Then we have the branches: an if that says if x is less than or equal to 50 it does something, and the rest returns false. So this branch here, the false branch of the if, is unreachable, right? Because we know that x is less than or equal to 10 from the beginning of the function, which means it's always less than or equal to 50, meaning that it will always enter the if and never get to this part of the code. If you take the true branch, it is reachable, because these constraints are satisfiable together. But these are not, right? We said x has to be less than or equal to 10, so it cannot be greater than 50. So this branch is unreachable and we can remove the whole thing. Well, not the whole thing, just the bottom.

The cool thing about trying to find this kind of thing is that we need very little support from the EVM. Of course, if you want to write a symbolic execution engine you need to write an interpreter in the first place, so you need an EVM interpreter. But we don't want to deal with every opcode, because it gets really big and really complicated, and we don't want to do things like call, create, storage and all this complicated stuff. Of course you could, and a lot of tools do that, but we don't want to do that in Solidity. The cool thing, again, about this encoding is that we only have to care about the stack operations, control flow (so jumps and opcodes that stop the execution), and relational opcodes. So basically we're going to care mostly about, as we saw before, ifs with relational operators inside, so less than, greater than, and the negation of these, so that you can get less than or equal and greater than or equal. The symbolic encoding
we're going to use is also pretty simple. Every EVM expression that we saw before is going to be transformed into a math constraint of the form a minus b less than or equal to k, where a and b are variables and k is a constant. This is also what Hari was talking about yesterday in his talk, and we get things like this, for example. Whoever was in Hari's talk, let's do it together: is this system satisfiable? Yeah, so if you sum everything on both sides, you basically get zero less than or equal to minus two, which is a contradiction. So this system cannot be satisfied.

The reason why we like this encoding is that we can use a difference logic solver, and again, as Hari explained yesterday, a difference logic solver is very simple to write. What does it do? It takes these math constraints and tells you whether or not it's possible for them to be satisfied at the same time. If it is possible, it gives you values for the variables that make it satisfiable. Otherwise it just says it's not possible at all for these things to be satisfied together. This solver is much simpler than the things you need, like ILP or SMT, when you have to solve linear combinations or even nonlinear expressions, which are the sorts of things you end up with when you start encoding arithmetic expressions and so on. As Hari also explained yesterday, this runs in polynomial time on the graph generated from the constraints, using the Bellman-Ford graph algorithm. I'm not going to go much into that. If you want to learn more about it, please rewatch Hari's talk. This algorithm is super simple.
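For readers who did not see the earlier talk, here is a minimal Python sketch of a difference logic check via Bellman-Ford. It is not the Solidity code from the project. The function name `dl_solve`, the tuple format `(a, b, k)` for `a - b <= k`, and both example systems are assumptions chosen for illustration; the three-variable system is made up so that summing it gives 0 <= -2, like the one mentioned above.

```python
def dl_solve(num_vars, constraints):
    """Difference logic check.

    constraints: list of (a, b, k) meaning var_a - var_b <= k, with
    variables numbered 0 .. num_vars - 1. Each constraint is an edge
    b -> a with weight k. Returns a list of satisfying values, or None
    if the constraint graph has a negative cycle (unsatisfiable).
    """
    # Starting every distance at 0 is the same as adding a virtual source
    # with zero-weight edges to every node.
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for a, b, k in constraints:
            if dist[b] + k < dist[a]:  # relax edge b -> a
                dist[a] = dist[b] + k
                changed = True
        if not changed:
            return dist  # converged: dist[a] - dist[b] <= k holds for all
    return None  # still relaxing after num_vars rounds: negative cycle

# Made-up system: x - y <= 0, y - z <= -1, z - x <= -1 (sums to 0 <= -2).
X, Y, Z = 0, 1, 2
print(dl_solve(3, [(X, Y, 0), (Y, Z, -1), (Z, X, -1)]))  # None

# require(x <= 10) together with x > 50, using variable 0 as the constant zero:
ZERO, V = 0, 1
print(dl_solve(2, [(V, ZERO, 10), (ZERO, V, -51)]))  # None: branch unreachable

# require(x <= 10) together with x <= 50 is fine.
print(dl_solve(2, [(V, ZERO, 10), (V, ZERO, 50)]))  # [0, 0]
```

When a zero variable is used, the returned values are relative to it: subtract the value assigned to the zero variable from the others to read off an actual model.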
This is basically almost the whole thing, and most of it is comments. So now all we need to do is put it all together.

This is the whole project. These are two test files, one for the solver itself, just unit tests for the solver, and one for the symbolic execution engine, and then you have all those files for the whole engine.

So what we want to do is write the interpreter. How do we write the interpreter? By the way, if you have questions, please just ask them right now, to make sure we keep a good flow. Actually, it's not here.

Usually when you write this interpreter for the EVM you have this context. What is the context? You have the code that you're executing. You have the program counter: which opcode are you executing right now? We have a stack, which in our case is going to be a symbolic stack. Don't worry about that now. We have a path, which is basically the list of program counters that were visited in the jumps, and we're going to use it to detect loops and exit, because we don't want to encode loops. We have an array of constraints. This is what I showed before: when you see the require that says x is less than or equal to 10 and we go into that branch, we keep that constraint, because it needs to be true for the rest of the execution. And this counter variable is just a counter to help create new variables. Where before we were creating new variables x, y, z and so on for the expressions, here they're going to be expression one, expression two, expression three and so on, and they need to be different in the branches that we execute. So that's it, we have the context. Ignore the rest for now.
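To make the shape of that context concrete, here is a hedged Python sketch. The field names follow the spoken description; the actual Solidity struct is not shown in the transcript, so the names, the `fresh` helper and the `fork` helper are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Interpreter state, mirroring the fields listed in the talk."""
    code: bytes                                       # bytecode being executed
    pc: int = 0                                       # program counter
    stack: list = field(default_factory=list)         # symbolic stack
    path: list = field(default_factory=list)          # PCs visited at jumps (loop detection)
    constraints: list = field(default_factory=list)   # must hold on this path
    counter: int = 0                                  # used to name fresh expressions

    def fresh(self) -> str:
        """Create a new symbolic variable name: expr1, expr2, ..."""
        self.counter += 1
        return f"expr{self.counter}"

    def fork(self) -> "Context":
        """Copy the state so the two sides of a branch evolve independently."""
        return Context(self.code, self.pc, list(self.stack), list(self.path),
                       list(self.constraints), self.counter)

ctx = Context(code=bytes.fromhex("00"))
ctx.stack.append(ctx.fresh())
branch = ctx.fork()
branch.constraints.append("expr1 <= 10")
print(ctx.constraints, branch.constraints)  # [] ['expr1 <= 10']
```

Copying the lists on fork is one simple way to keep a constraint collected in one branch from leaking into the other.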
Let's jump right into the execution engine. We have this recursive function called run from, which takes our context. Here you see it's basically going to traverse all the opcodes and select the opcode. This is just a check so that we exit early in loops. It becomes an under-approximation, but we don't want to do loops. We extend the path with this new opcode.

And here is the first part of what I mentioned about the interpreter. We need to care about stack opcodes, because they're going to add some numbers and a bunch of different things, jump locations, jump destinations, that we will need when building our constraints. So this just does the usual stack handling: if it's a swap you go into the stack and swap the numbers, if it's a dup you duplicate whatever is being duplicated, if it's a push you just push that number onto the stack.

Here the only thing we do is apply a function to the stack arguments. For example, if you have an add, we're not going to encode the add precisely, but we do need to consume the stack slots and put a new return value there, right? So this basically just uses these handlers we have internally, which are a bunch of function pointers. This one is generic for all the opcodes we don't care about: it takes the number of arguments the opcode takes, pops all of them from the stack, creates a new what we call symbolic variable, so a new expression 13 or whatever, and puts that expression on the stack. We don't really care what it looks like right now, and we don't really care what kind of constraints we have over it.

And here is the part that we actually care about, where we do the check. Every time we see a conditional jump, we take the argument of that jump, the condition that makes it jump, and we ask the solver: is it possible that we can jump to that location?
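The stack handling and the generic handler for "opcodes we don't care about" could look roughly like this in Python. The function names (`push`, `dup`, `swap`, `opaque`) and the way the counter is threaded through are assumptions, not the project's actual handlers.

```python
def push(stack, value):
    stack.append(value)

def dup(stack, n):
    """DUPn: duplicate the n-th item from the top (DUP1 copies the top)."""
    stack.append(stack[-n])

def swap(stack, n):
    """SWAPn: exchange the top with the item n positions below it."""
    stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]

def opaque(stack, n_args, counter):
    """Generic handler for opcodes that are not modelled precisely.

    Pops the opcode's arguments and pushes a fresh, unconstrained
    symbolic variable as its result. Returns the updated counter.
    """
    for _ in range(n_args):
        stack.pop()
    counter += 1
    stack.append(f"expr{counter}")
    return counter

stack, counter = [], 0
push(stack, 0)
counter = opaque(stack, 1, counter)  # e.g. CALLDATALOAD: 1 argument
push(stack, 10)
dup(stack, 2)
swap(stack, 1)
counter = opaque(stack, 2, counter)  # e.g. ADD: 2 arguments
print(stack)  # ['expr1', 'expr2']
```

The result of the add is just a new name with no constraints attached, which keeps the stack shape correct without encoding arithmetic.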
Because this is what we had before with the require and the if, right? We have require x less than or equal to 10, so we have that saved: x must be less than or equal to 10. Then we see if x greater than 50 and we ask: is it possible that x is greater than 50? At this point we have all the math things encoded already.

As you'll see, this makes a symbol. These are all internal helpers, not important right now. So here is where we do the check. We check if the opcode is relational: if you have a less than or greater than, or if it's an iszero, because if you need greater than or equal, we don't have that in the EVM, so we need to do iszero of less than, right? And similarly for less than or equal. If it's one of these, we take the condition, which is here, transform it into a DL expression, a difference logic expression, and call the solver. So we have this check here, which takes all the constraints and calls the little DL solver that I mentioned. We call the solver, and if the solver says unsat, we say this branch is unreachable. Why unsat? Because if the branch conditions cannot be satisfied together, the solver will say unsatisfiable, and this means that that part is not reachable: there's something in the middle that makes the code path inconsistent.

And what the solver looks like is, yeah, this is the entire solver. You take the constraints. Here we just build the graph. You just need to know how many variables you have. This is still building the graph, and then here we run the single-source shortest path algorithm, which is the first part of the solver, which is basic.
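The step where a relational condition becomes a difference logic expression can be sketched as follows. Over the integers, `a < b` is the same as `a - b <= -1`, and negating `a - b <= k` gives `b - a <= -k - 1`, which is how ISZERO of LT yields greater than or equal. The helper names and the use of a `"zero"` pseudo-variable for constants are assumptions for illustration, not the project's API.

```python
ZERO = "zero"  # pseudo-variable standing for the constant 0

def split(term):
    """Return (variable, offset) such that term == variable + offset."""
    return (ZERO, term) if isinstance(term, int) else (term, 0)

def lt(a, b):
    """Encode a < b as (va, vb, k), meaning va - vb <= k (integers only)."""
    (va, ca), (vb, cb) = split(a), split(b)
    return (va, vb, cb - ca - 1)

def gt(a, b):
    """a > b is just b < a."""
    return lt(b, a)

def negate(constraint):
    """not (a - b <= k)  is  a - b >= k + 1  is  b - a <= -k - 1."""
    a, b, k = constraint
    return (b, a, -k - 1)

print(lt("x", 10))          # ('x', 'zero', 9)    x - 0 <= 9
print(lt(50, "x"))          # ('zero', 'x', -51)  0 - x <= -51, i.e. x >= 51
print(negate(lt("x", 10)))  # ('zero', 'x', -10)  ISZERO(LT): x >= 10
```

Tuples in this form are exactly the edges a Bellman-Ford based checker consumes: `(a, b, k)` becomes an edge from b to a with weight k.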
Yeah, it's a very basic graph algorithm. Then we do a negative cycle detection, and that's really all the solver does. This is about 20 lines of code, it's really simple. And yeah, it's a logic that I really like, because it can do a lot of things while being very, very simple.

The last thing I want to show is how you use this. Here I have this contract called unreachable that has a bunch of functions, with some unreachable branches in each of these functions. All we need to do to test that is really just this: we call the same run function from the symbolic execution library with the runtime code of the contract. That's really all. And here's a simpler thing, just a simple manual test with some bytecode that I know what it does. If you run this, yeah, here you'll see that this is the complicated case with all the Solidity high-level functions. It's not the best way to report things, but it emits an event that says unreachable branch with the PC 255.

Then this tiny example. This is the same as I showed in Solidity, just written quickly in Yul, and this is the bytecode that it generates, well, that I wrote manually. We can quickly compute this together, keeping track of the stack, and then you'll see what it actually does in execution. So push zero: we push a zero. Then we do calldataload of zero, which gives us some x. Then we push 10, we duplicate the x, and we do less than, so this turns into this. This is the stack, by the way, and the top is the higher position. Now we push the tag, 0x0b, and we do a conditional jump where the false branch stops, so we don't care about the false branch. We pop these first two arguments, right, because of the jump. But because we go into a branch, this condition must be true for the rest of the execution, right? So it goes into our set of constraints. We keep going in tag one: jumpdest, dup.
We duplicate x again, we push 50, and we do an LT, which consumes the x, and this becomes the top of the stack, just a symbolic expression like y equals 50 less than x. We push 0x14 and then we jump. At this point, when we see the jump, we ask the solver: is it possible that this new condition, 50 less than x, is consistent with what we already have, x less than 10? And the solver is going to say that's not possible. Precisely at this jump is where we basically stop and say that this branch, the true branch, where you're jumping next, is unsatisfiable, so you can actually remove that entire branch, right?

But yeah, there's a lot of code behind that, but it's a lot of helpers. The main intuition is basically this algorithm of just running the interpreter, caring about a few opcodes and not caring about the other opcodes. And yeah, happy to take questions.

Super cool talk. It's wild that you can do this in Solidity. I'm wondering about overflow, though, because your assertion at the beginning was that the one example you showed was impossible, but it is very possible in the presence of overflow.

Now, that one wasn't, because you need the answer to be minus one-third, so it's not going to be possible.

Okay, sure. Yeah, but it doesn't look like you handle overflow anywhere, right?

Yeah, exactly.

So that means that your results are not correct.

Yeah, exactly. There's a bunch of soundness stuff you have to add to this.

Okay. All right. Thank you.

Yeah, sorry, I didn't see, but is this code available somewhere?

Yeah, this repo, github.com slash that thing. There are probably bugs in there, though, so feel free to fix them.

Since you wrote this in Solidity, do you have any thoughts on why you would want this to be executed on chain?

Not at all. You should not run this on chain.
Yeah. Actually, wait, let me quickly just look at how much gas this takes for this tiny program.

You said that for a given program there are a lot of different encodings you could have. Can you give a few ideas of other encodings and what the trade-offs would be?

Yeah, so this encoding basically chooses difference logic because the solver is very simple, so we can write it in Solidity, right? The encoding is tailored to produce systems of inequalities that look like the constraints a DL solver takes. But, where is it now, in Hari's talk, which I actually have here, there are different ones. If you actually want to encode add, mul, div, all these kinds of things, you need more complicated constraints, like linear combinations. Where is it, Hari? Yeah, you'll need linear combinations like this, for example, and more complicated constraints, and you'll need something like simplex or integer linear programming, or even a nonlinear solver, depending. These things are a lot more complicated. Simplex is an exponential algorithm that kind of happens to run mostly in polynomial time these days. But if you're trying to solve nonlinear constraints, that's an undecidable problem, and you're going to have a hard time solving it. So, yeah.
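Returning to the overflow question from the Q&A, a short Python check shows why wrap-around matters for soundness. Over the mathematical integers, top = 3x + 1 = 0 has no solution. With 256-bit wrapping arithmetic, which is how EVM MUL and ADD behave, there is one, because 3 is invertible modulo 2^256. This is a side calculation, not part of the project's code, and it needs Python 3.8 or later for the modular inverse via `pow`.

```python
WORD = 2**256

# Solve 3 * x + 1 == 0 (mod 2**256): x = -(3^-1) mod 2**256.
x = (-pow(3, -1, WORD)) % WORD
top = (3 * x + 1) % WORD

print(top)                     # 0
print(x == (WORD - 1) // 3)    # True: 3 * x is exactly 2**256 - 1
```

An encoding over unbounded integers therefore reports "unsat" for a query that the real machine can satisfy, which is the kind of gap the "soundness stuff" in the answer refers to.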