So hello everyone, my name is Sebastian, and today I'm going to be presenting this topic about detecting denial-of-service vulnerabilities caused by gas limits, using fuzzing and other techniques. But we're not going to stop at detecting; we're also going to look into how to generate exploits and fixes for these vulnerabilities. This is joint work with Professor Vijay Ganesh from the University of Waterloo, as well as the group of his PhD students working on smart contract security.

Probably most of you already know this, but I created the presentation for the general public, so I'm not assuming that everyone knows about smart contracts. They're just programs executed on a virtual machine, the Ethereum Virtual Machine. Calling a function in a smart contract changes the state of the EVM, and these state changes often involve transfers of funds and so on. Any type of program might contain bugs, so smart contracts probably contain bugs too, and exploiting these bugs can lead to stolen or frozen funds, as we've seen many times in the past.

On the other hand, there's the notion of gas. The EVM has this gas mechanism which charges function callers, or transaction senders, an execution fee, which is computed as gas price times gas consumed. There is a block gas limit for each block that is mined, and the gas consumed by any function call in a transaction cannot surpass this block gas limit. If it does, the transaction is reverted. This is meant to prevent resource abuse and denial-of-service attacks on the Ethereum network. However, it can cause denial of service at the smart contract level by not allowing a user to call a function and have it fully executed. This could lead to frozen funds, and frozen funds are basically lost funds.

Here's a toy example. This is something which you could naively implement if you're not familiar with Solidity or gas issues in Ethereum.
Basically, you want to reward all the users of a certain bank, let's say, and you want to push out interest payments every month or every year. To do that, you iterate over all users, do some computation based on how much deposit they have and how long they have kept it there, and send them this interest. Now, you notice that this users.length is controlled, or influenced, by users joining this bank. So the more users you have, the more iterations this for loop is going to have, and if at some point it passes the block gas limit, it's going to cause the transaction to revert. If there's no other way to push out interest, that is going to lead to a lot of unhappy users of this bank and other problems. So that's the basic idea.
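Since the slide with the toy bank example is not part of this transcript, here is a minimal Python sketch of why the payout loop eventually reverts. It is not the contract from the talk: the per-transaction overhead and per-user cost are illustrative assumptions, and only the 10 million block gas limit is a figure quoted by the speaker.

```python
# Toy gas model for the "pay interest to every user" loop.
# BASE_GAS and GAS_PER_USER are assumed values for illustration only.
BLOCK_GAS_LIMIT = 10_000_000  # block gas limit quoted in the talk
BASE_GAS = 21_000             # assumed fixed overhead per transaction
GAS_PER_USER = 30_000         # assumed cost of one loop iteration


def payout_gas(num_users: int) -> int:
    """Estimated gas for one call that loops over all users."""
    return BASE_GAS + GAS_PER_USER * num_users


def payout_succeeds(num_users: int) -> bool:
    """The call reverts once its gas exceeds the block gas limit."""
    return payout_gas(num_users) <= BLOCK_GAS_LIMIT


def max_safe_users() -> int:
    """Largest user count for which the payout still fits in one block."""
    return (BLOCK_GAS_LIMIT - BASE_GAS) // GAS_PER_USER


if __name__ == "__main__":
    limit = max_safe_users()
    print(limit, payout_succeeds(limit), payout_succeeds(limit + 1))
```

Under these assumed costs, one more user joining past the threshold is enough to make every future payout call revert, which is the frozen-funds scenario described above.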
A more famous example is this project called GovernMental. It's from 2016, and they suffered for some time due to this kind of block gas limit issue. There was a denial of service for the payout of the jackpot, which was 1,100 ETH, because the payout mechanism was using too much gas. As part of this payout mechanism, the contract was clearing internal storage using these instructions, and this was compiling to something that iterates over storage locations and deletes them one by one. Because the list was too long, it reached the block gas limit at that point in time, and that led to frozen funds. This Reddit post is the source where I took this information from. Of course, back in 2016 the block gas limit was quite a bit lower than it is today: it was under 5 million, and today it's 10 million. As we see, it keeps evolving; it keeps growing, with some exceptions. Most of the time the block gas limit is increased at a certain hard fork.

The motivation for this work, trying to detect and exploit this automatically, is that there are simply too many accounts and smart contracts on the Ethereum network to try to find this manually. During our audits, even today, we're seeing a lot of gas usage issues in smart contracts. Using state-of-the-art tools like Slither or Mythril, you can detect these issues. It's pretty easy to detect: basically, you look for loops and some computations or function calls in those loops, and there are many tools readily available that find this issue.

However, what do you do once you detect them? One thing you can do is try to remove the loops and redesign your code such that you completely avoid loops and just accumulate values as other functions are called. However, if that's not an option, you can maybe do a gas analysis to determine when exactly the out-of-gas error occurs, and you
can add something like a require statement or an assert statement to prevent the revert from happening and prevent the waste of gas. This is currently a manual, potentially lengthy and tedious process. So the solution which we propose in this work is to automatically generate these kinds of denial-of-service exploits that lead to out-of-gas at the smart contract level.

There are several challenges. The first one is: how do you determine the exact gas usage during execution? The second one is: how do you search through the large space of possible inputs to functions? There could be functions that have several parameters, or there could even be multiple functions that need to be called in order to reach a state where this kind of DoS or out-of-gas error occurs.

The first challenge is actually easy to solve thanks to web3 and Solidity features. In our approach we use the gasleft function from Solidity, and we simulate everything on top of the Ganache tool. I'm going to talk more about the second challenge. Fuzzing a large number of inputs is more tricky. There are several possible fuzzing heuristics. For instance, you can brute-force every possible input, and that's very slow. You can do a divide-and-conquer approach, which is faster, but it's not always applicable if you don't have things like integer intervals that you can easily divide into partitions. For our approach we're using reinforcement learning, which is also fast and more generally applicable, and we'll see in a second why. There are also other possible heuristics; I'm not saying this is the best one, but this is the one we chose for our project.

So the reinforcement learning approach looks like this. We model the problem as a Markov decision process, where the set of states S is all of the states of the EVM; a state is just a state of the EVM. The possible
set of actions is calling a smart contract function with some randomly chosen inputs, or also more carefully chosen inputs: increasing those inputs, decreasing them, and so on. These are the actions that the reinforcement learning agent can take. The probability of transitioning from one state to another when executing a given action is always 100%, because the EVM is deterministic. The interesting part is the reward function. The reward that the agent gets when it transitions to a state s is one minus the gas left divided by the block gas limit. This is because we're rewarding actions that consume more gas: if the transaction that led to the state s used more gas, we give a higher reward, because we want to reach an out-of-gas error. So this is pretty intuitive.

Here's a simple example of a pure function that receives an integer value n as input, iterates over all integer values from zero to that number, sums them up, and returns the sum. The goal of the reinforcement learning agent here would be to find the right value for n which leads to this kind of out-of-gas error, and we're going to see later how this code is fixed.

Another example is a slightly different function, still a pure function, that has two parameters, or you can even think about more parameters. You do some computations with these parameters, and they don't always influence the result, or the gas usage, in the same way. Here you can see that we're dividing n by m, so the goal of the reinforcement learning agent is to find a large value for n and a small value for m. But m should not be zero, because otherwise it leads to a division by zero. What are the right values for this? We're going to see later what's the right way to fix this.

And here's another example, where there's a
small contract that is vulnerable. It has a list of integer entries and several functions. The first function just adds an entry to the list (I saw a typo there). The second function gets the entry at a certain location, and the third function sums up the list of entries, returning the sum. Here the goal is to find a trace of function calls like this: adding several entries, up to n, and then summing them up. The question is, what's the value of n such that calling sum entries leads to an out-of-gas error?

The challenge is that you first need to determine which functions affect the loop bounds. As we saw before, there was also a function called get entry, and the reinforcement learning agent should not be calling that one. It would just be wasting time, because it's not going to affect the loop bounds inside of sum entries. The solution that we are taking to detect which functions affect the loop bounds is to do a reverse taint analysis and then a forward taint analysis.

For those of you who are not familiar with it, taint analysis is a form of information flow analysis where you first taint, or tag, a memory location, for example a variable x. Then you trace the flow of that tainted value through the execution of your smart contract functions, and you determine which instructions or which other memory values are affected by that tainted part. The information flow may be explicit, where you have a direct assignment or memory transfer, and it could also be implicit, where different values in memory depend indirectly on your tainted value. For instance, if you have a branch condition like "if x greater than zero", and inside of the if and else branches you assign other variables like a and b, those variables will be implicitly tainted by x.

So then we do the reverse taint analysis on that function. We slightly modified
this function, sum entries, to also include an implicit taint example here, to show that there is a possibility of an implicit taint. We start from the loop close to the bottom, and we taint the bound variable. We go up, in reverse, and we see that first the variable n is tainted by bound, because there's an explicit assignment, and the length of the entries list is also tainted explicitly. We also have an implicit tainting of those variables, but since they're already explicitly tainted, they're simply tainted. We also have a taint of the constant zero, so the instruction where bound is assigned the value zero is also tainted. Based on this analysis, we can say that the input of the function sum entries is tainted, and the state variable entries is tainted too; that is, the length of this state variable is tainted.

Once we determine this, we can do a forward taint analysis, where we just taint entries.length and start executing each function to see which instructions in which function may affect the entries length. We see that only the add entry function affects the length of the entries list; the get entry function does not. Therefore the reinforcement learning agent can just try calling this function before it checks whether it ran out of gas using the sum entries function, which is not shown here due to lack of space.

So the question here, towards the end, is: we took this approach, we did all this stuff, we ran the reinforcement learning agent, so what do we do once we know where the out-of-gas error occurs? I already hinted at the answer: we fix the code. Fixing could look something like this. If removing the loop is not an option,
you could have a require statement close to the beginning of the code, which signals that the parameter or parameters that you provided would lead to an out-of-gas error. For the second example which I showed you, you could also add this require statement after you've done all the computations, which is easier than checking values for the different inputs. The third example is also interesting, because we're not placing the require statement inside of the function which has the loop, not inside of sum entries, but inside of the function that affects the length of the loop, so inside of the add entry function. Of course, these values are just preliminary values; they're just mocks. You also don't have to hard-code the values in there: you can let them be settable by the contract owner, such that if there's a fork, or the gas limit is increased, or the cost of an opcode changes, these values can be adapted as well.

In conclusion, I just want to say that, as you probably know, loops can cause out-of-gas errors in smart contracts, and these can lead to frozen and hence lost funds. Detecting such problems is quite easy with state-of-the-art tools. However, determining exactly when they would occur, and with which inputs, is harder. We're taking the approach of fuzzing with reinforcement learning and taint analysis to generate the inputs needed for an out-of-gas error faster and in a more general way. And yeah, we're using taint analysis to guide fuzzing. That's it from my side. Thank you very much. Any questions?

Thank you so much. So now, first to the people in the room: if you have any questions for Sebastian, use the raise-your-hand feature so that I can see that you would like to ask something. And to the people in the live stream: we know you're lagging behind, so we will wait for you for one minute, and you can put your question
in the Gitter chat. Yes, just Lynn.

Hey Sebastian, nice talk.

Hi, thank you.

What tool are you using for this?

You mean for reinforcement learning, or which part?

For the whole approach.

Oh, we're building a custom tool for this one.

Okay. So it's everything built in-house?

Well, it's going to be published, but it's basically the joint effort between the university and Quantstamp, and we're going to release the code once the paper is accepted.

Okay.
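For readers following along without the slides, the reward function described in the talk (one minus gas left divided by the block gas limit) can be sketched in Python as follows. This is only an illustration under stated assumptions: the gas model for the sum-to-n loop is hypothetical, and a plain greedy doubling search stands in for the reinforcement learning agent, whose actual implementation had not been released at the time of the talk.

```python
from typing import Optional

BLOCK_GAS_LIMIT = 10_000_000  # block gas limit quoted in the talk


def gas_used_by_sum(n: int) -> int:
    """Hypothetical gas model for the sum-to-n loop (assumed costs)."""
    return 21_000 + 200 * n


def reward(gas_left: int) -> float:
    """Reward from the talk: 1 - gas_left / block_gas_limit."""
    return 1.0 - gas_left / BLOCK_GAS_LIMIT


def find_out_of_gas_input(start: int = 1, max_steps: int = 64) -> Optional[int]:
    """Greedy stand-in for the agent: double n while the reward keeps rising."""
    n = start
    best = -1.0
    for _ in range(max_steps):
        used = gas_used_by_sum(n)
        if used > BLOCK_GAS_LIMIT:
            return n  # this input would trigger an out-of-gas revert
        r = reward(BLOCK_GAS_LIMIT - used)
        if r <= best:
            return None  # the "increase n" action no longer pays off
        best = r
        n *= 2
    return None


if __name__ == "__main__":
    print(find_out_of_gas_input())
```

The search returns the first doubled input that exceeds the limit, not the smallest failing input. A bound for a require statement would need a finer search between the last passing and first failing values.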