Welcome to this workshop on how to build secure contracts using fuzzing. Before we start, just to get a sense: how many of you have used Echidna in the past? Echidna is what we're going to present. Okay.

So, who are we? Gustavo Grieco and myself, Josselin Feist. We are both security researchers at Trail of Bits. If you don't know us, we are a company specialized in high-end security research. We have expertise in blockchain, but also in other topics: for example, we do a lot of cloud-native application work, and we do a lot of cryptography, with the ZK stuff for instance. Something that I think differentiates us from other companies is that we spend a lot of time trying to apply research and program analysis in our daily job. As a result, we have built a lot of open source tools. You might know Slither, which is a static analyzer for Solidity; Echidna, which we are going to talk about today; and we have a lot of other tools, for example Tealer, which is a static analyzer for Algorand, Amarna, which is a static analyzer for Cairo, and so on.

To test your system, you have four techniques: the first one is unit tests, the second one is manual review, and the two last techniques are fully automated and semi-automated analysis. I'm assuming everyone here is familiar with unit tests. You should use unit tests, and they are usually good for checking that the system works as expected on the happy path. Something we have learned from the audits and analyses we have done is that there is no correlation between the quality and quantity of unit tests and the likelihood of having a high-severity vulnerability. We actually have an academic paper on this, where we looked over all the audits we have done, and this correlation is simply not there. The reason, our intuition tells us, is that when you write unit tests you try to cover the happy path, the things the code is supposed to do in a correct execution, while vulnerabilities usually lie in the edge cases, in the things you haven't considered. This slide shows an example of a unit test.

The second technique you can use is manual review: you go line by line and try to understand what the code is supposed to do, what it is actually doing, and whether there is a difference between the two. Doing manual review requires a specific set of skills, it's time consuming, and it's difficult to do; usually you go through a security company for a security assessment, to have people with that specific skill set.

The next technique is using fully automated tools. These are the tools that find some of the common bugs: you just click the button and the tool tells you whether that type of bug is present or not. For example, you might know Slither, which is a static analyzer for Solidity. This type of technique might give you false positives, but it is still really powerful, because it might catch critical bugs for you. For Slither, the best approach is to spend one hour on it the first time you try it. There is a triage mode, so once you have triaged the results, they won't show up in the next execution. If it takes you one hour and at the end of the day you are able to catch a critical vulnerability, I would say it's worth going through the first results. We have a list of trophies for Slither that demonstrates we have found a lot of actual bugs with it. So yes, there is a false-positive cost, some false alarms, but it's not going to take you
that much time, and it's going to provide you value. For example, we have a GitHub Action for Slither: you can connect it to GitHub, and on every pull request or commit, depending on how you configure it, it will run to see whether you are introducing a new vulnerability. Perhaps next year we will do a Slither workshop. And this is open source and free.

Okay, the last technique you can use is semi-automated analysis. These are tools to which you provide some information, where there is human intervention to explain to the tool what you are looking for. This is a bit more difficult to use, because it requires this interaction from the user. It's the technique we are going to see today: property-based testing with Echidna.

So what is property-based testing? To understand how it works, I have to introduce fuzzing. Fuzzing is a standard program-analysis technique that is used a lot in traditional security. The idea is basically: you provide random inputs to the program and you see what happens; you stress it with random inputs. The most trivial fuzzer you can write is to go to your keyboard, punch some buttons, and see what happens to your program. Again, it's well established in traditional security; there are a lot of tools: AFL, libFuzzer, go-fuzz, and so on. However, most traditional fuzzers look for memory corruption, for crashes in the program. We don't have a lot of memory corruption in Solidity; there is some, but it's not that common. What we are going to look for instead are properties of the system that can be broken, and this is why we call it property-based testing. Basically, the way it works is that the user defines invariants, the fuzzer explores the program randomly, and it tries to see whether the invariants hold or not. You can really think of fuzzing as unit tests on steroids: with a unit test you try one specific value on the program, while fuzzing tries a lot of different random values.

I've been talking a lot about invariants, so what is an invariant? An invariant is something in your system that should always be true, something that should never be false.

I have also mentioned Echidna. Echidna is our fuzzer for smart contracts. It's open source; we have been using it for four or five years now, in all our audits. You can see a list of mature codebases that actually use Echidna and have integrated it in their process. For Echidna we focus on ease of use: the invariants are described in Solidity, we have a GitHub Action similar to Slither's, and we support all the compilation frameworks. If you use Foundry, Hardhat, Brownie, Truffle, whatever, we will support it, because we use Echidna in every one of our audits, and every now and then someone comes in with a new compilation framework.

Okay, I was talking about invariants, so let's say you have a token, an ERC20 token: it has balances, you can transfer tokens. What would be an invariant? An invariant could be that, given the total supply, no user in the system should have more tokens than the total supply. If you have 10 million tokens and a user has 20 million, something is wrong. The way it works is that you take the contract in Solidity and you define invariants that describe what you are trying to check.
One way to do that is to create functions named echidna_ followed by some name. Such a function should return a Boolean: if the Boolean is true, the property holds; if the Boolean is false, the property is broken. You give both the contract and the properties to Echidna, Echidna explores the program randomly, and it tries to see whether the invariant holds.
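To make that concrete, here is a minimal sketch of what such a property looks like. The Token contract, its balanceOf function, and the fixed supply of 10,000 are hypothetical stand-ins, not the exercise's exact code:

```solidity
pragma solidity ^0.8.0;

import "./Token.sol"; // the contract under test (hypothetical)

contract TestToken is Token {
    // Echidna calls every function whose name starts with echidna_
    // and reports a failure if any of them ever returns false.
    function echidna_balance_under_total_supply() public view returns (bool) {
        return balanceOf(msg.sender) <= 10_000;
    }
}
```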
Okay, and now it's your turn. We're going to have a couple of series of exercises where you will try to apply Echidna and define some invariants on a system. You can go to this repo, or scan the QR code for the repo, check out the DEF CON branch, and open the first exercise. If you have any problem installing Echidna, if you have any question, let us know. We also welcome everyone who is not doing the exercises: even if you already know how to do this kind of testing, or even if you use fuzzing every day, that's totally fine, we will be happy to take any question, simple or more advanced, about how Echidna works. So please feel free, and we will take 15 minutes for this first one.

Okay, so the question was: what are the benefits of using Echidna over Foundry? First, I think Echidna has more features than Foundry at the moment; they have been developing Foundry for around six months, while we have been working on Echidna for four years. We support any compilation framework. Let's say you are using Hardhat because you want to do integration tests and you need some complex setup using TypeScript or whatever: if you move to Foundry, you're going to have issues, because it's more difficult to create this type of test there. You end up in a situation where you need a setup with two different compilation frameworks, and if you use some advanced options, you need those advanced options in both frameworks, if they support them at all; that's a lot of maintenance. We are agnostic to the compilation framework in that sense. We also have a couple of advanced features that other fuzzers don't have. For example, something you can do with Echidna is, instead of trying to find invariants that are broken, look for the functions that consume the most gas: you let the fuzzer run and it gives you a summary, "I can run this function with these parameters and it consumes this amount of gas." If that's the kind of thing you're looking for, it's really nice.

"Hey there, Brock from Foundry here. I'm curious: how do you go about benchmarking a fuzzer? Because for us it's something of a black box." Okay, that's a really good question. Even in traditional fuzzing, if you go through the literature on how fuzzers are benchmarked, I would say most of the benchmarks are poor. One of the issues is that when someone benchmarks their own tool, there is a bias. We have our own benchmarks to see how Echidna performs on our past audits, but obviously we are biased: it works well for us because we build the tool on our own examples. So I would say the best benchmark for a fuzzer should not come from its developers. It's also an open question what exactly you benchmark: if you say "this is faster than this other thing," it could be executing always the same thing, like calling a constant function over and over again, which is going to be faster than reaching some deep part of the call sequence. On top of that, you have bugs: what about how many bugs you found? There are a couple of academic papers where people compare this; you get a plot of how many bugs were found, and one fuzzer looks better than the other, but you don't know whether in the next hour the other one would have a peak and find a lot of things. Another thing you can do is use coverage, but coverage is not the ultimate answer either. So it's still a debate how long you should run a fuzzer for a benchmark, or even for testing. It's also a debate what we should use for benchmarking: should we use complex DeFi applications? But how many of them do we have, 10 or 20? We don't have thousands of different DeFi systems. So we are definitely interested in a deeper discussion on how to build a good benchmark set. We have the same problem with Slither, for example: how do we benchmark that our static analyzer provides good results? It's tough. We usually tend to take a practical approach, in the sense that if the tool provides value during our audits, if it helps us find bugs and makes us faster, that's good enough for us. And at the end of the day, it also depends on the invariants: if the developers don't know how to write good invariants, then no tool is going to provide some magic value. So it's tough. And I'm happy to discuss with the Foundry team, or any other team doing fuzzing; we'll be here today, so please let us know.

Okay, the other question was: is the tool only for fuzzing, or does it support symbolic execution or something like that? It's only for fuzzing; we have another tool for symbolic execution, which is called Manticore. However, and this is something we will actually discuss later, I think in practice any formal-methods-based approach has a lower return on investment than fuzzing. If you have two or three weeks to work on a project and you want to invest some resources to increase your confidence in it, fuzzing is the best solution. Also, Echidna is a tool that works together with our static analyzer, Slither, and gets value from it. In some cases, say you have a check like "if x equals some magic value": traditional fuzzing techniques have a hard time dealing with this. What we do is constant mining: we scan all your code looking for these magic values, and we replay them, and mutations of them, from time to time, so our fuzzer should be able to get inside that branch. If you have a test case where this is not working, please let us know and we can look at it. But in practice, it seems that some of the typical use cases for symbolic execution, where you have constant magic values to look for, can be replaced by constant mining and extraction.
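A toy illustration of the kind of branch constant mining is meant to reach (hypothetical code, not from the exercises):

```solidity
pragma solidity ^0.8.0;

contract MagicValue {
    bool public found;

    function check(uint256 x) public {
        // A purely random fuzzer is very unlikely to guess this constant;
        // constant mining extracts 0x123456 from the code and replays it.
        if (x == 0x123456) {
            found = true;
        }
    }

    function echidna_never_found() public view returns (bool) {
        return !found;
    }
}
```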
How many of you had issues installing Echidna or opening the exercises? No? You need to install the tool, and when you point Echidna at a test contract, it will compile it, run it inside a simulated blockchain, and give you the answer, so you don't need to connect to anything. So yes, go into the repository: this is the original one, but check out the DEF CON branch, that is the one for today, it has the more specific material. And as it says over there, if you're using a Mac you can install it that way, or you can download a release.

I also wanted to highlight one little feature that we are testing in Echidna. It also uses fuzzing, but instead of testing a property, it does minimization or maximization of some value. This is a new thing we are experimenting with; it's not property-based testing, but it's something we are trying to push. So if you want to know whether a user is capable of extracting tokens from your system without you realizing it, you can use that feature: you just tell Echidna, "can you maximize the balance of this account?", and it will try to generate the sequence that maximizes it. This is a little bit outside today's scope, but we wanted to mention it.

Okay, so our target here is a token. It has a transfer function, a classic transfer function, and it inherits from a pausable contract, a basic pausable system. What we want to do here is create an invariant such as: no user should have a balance above the total supply. To test the token, we inherit from it: we create the contract TestToken, and in the constructor we initialize the balance of the caller, the first user, to 10,000. That's the initialization: you are creating a token, and there are 10,000 tokens at one address. And now the invariant is simply that no user, here the Echidna caller, should ever have more than 10,000 tokens. Again: you deploy your token with 10,000 tokens to one user; this user should never end up with, say, 20,000 tokens.

If you run this with Echidna, Echidna tells you that this invariant, the property on total supply, was broken; it failed, and it tells you how: it just called the function transfer with destination address zero and 10,093 tokens. So what happened here? This was compiled with Solidity 0.7, so there is no overflow and underflow protection. There is an underflow problem: if you try to send more tokens than you have in your balance, the balance underflows and you end up with a really large balance.

Something interesting here is that we defined the invariant without looking at the code, without looking at the function. We never asked "is there any issue in the transfer function?"; we just defined an invariant, and by doing so we discovered that there is a bug in the transfer function. This is a nice way of finding bugs, because you don't necessarily look at the individual functions; you just define invariants and the fuzzer tries to break them for you. Does that make sense? Any questions?
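For reference, the bug has roughly this shape (a sketch, not the exercise's exact code):

```solidity
pragma solidity ^0.7.0; // 0.7: arithmetic wraps silently, no built-in checks

contract Token {
    mapping(address => uint256) public balances;

    function transfer(address to, uint256 value) public {
        // If value > balances[msg.sender], this subtraction underflows
        // and the sender's balance wraps around to a huge number.
        balances[msg.sender] -= value;
        balances[to] += value;
    }

    function balanceOf(address who) public view returns (uint256) {
        return balances[who];
    }
}
```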
Okay, so the question is: does it execute a specific function, or how does it know which functions to call? The answer is that it calls everything. In this token, if you look at the whole source code, you have the transfer function, the pausable functions, and some additional functions; the fuzzer will call everything that is external or public, everything a user can call.

Okay, the next question: what if you have a very large contract, with a lot of functions? Here you can take different approaches. Either you want Echidna to call everything, so you do nothing and just let Echidna run, which might work; it depends. Or, if you know some functions are more important and you want to target them, you can change it in Echidna's configuration file and tell it "call only these functions" or "don't call these functions." It depends on what you are looking for: if you want to increase your overall confidence, you should call everything; if you think there might be an issue in a specific function and you want to focus on it, you can blacklist or whitelist.

Okay, the question is: can you define the order of calls? You can define the order of the initialization, but not after that. I think there was another question... Okay, the question is: can you have a better log? Obviously this is a simple example, and when you do random exploration, you might call a lot of functions that are not necessary for the failure you are trying to show. The answer is yes: Echidna does what we call shrinking. Once it has found a way to break the invariant, it tries to reduce the trace: it keeps fuzzing more or less the same sequence, trying to reduce the size of the trace.

Okay, then we have the second exercise, same instructions: in the same repo, just open exercise 2. It's the same target, so you will write another invariant on the same token. The first invariant was that no user should have a balance above the total supply. Here, as we hinted before, this is a pausable system: the owner can pause or unpause the system. The invariant we want to verify is: if there is no owner and the system is paused, can someone unpause the system? That's what we're going to try. Let's take 10 minutes for this one.

I'm going to show the solution for the second one. It's the same target as the first exercise, but here we focus on the contracts inherited by the token: you have two contracts, Ownership and Pausable. You have a system with an owner, and you can pause or resume the contract. What we want to check is: if we drop the ownership and we pause the system, is it possible to unpause it? Here we have a bit of initialization to do, because we want to drop the ownership and pause the system. We do this in the constructor: we call pause and we remove the owner. So as soon as the contract is deployed, it's paused and there is no owner. The invariant is just that the variable tracking the paused state of the system is true, and this should always hold: you paused it, there is no ownership, so it should stay paused.

And Echidna tells us that it actually failed. The reason is a kind of old bug that used to be really frequent: in old versions of Solidity there was no constructor keyword, and the way you wrote a constructor was to give a function the same name as the contract. Here you have the contract Ownership and a function Owner: because the names don't match, the function Owner is just a public function, and anyone can call it and become the owner.
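The pattern looks roughly like this (a sketch of the old constructor-naming pitfall, not the exercise's exact code):

```solidity
pragma solidity ^0.4.24; // pre-0.5 style, where this pitfall existed

contract Ownership {
    address public owner = msg.sender;

    // Intended as the constructor, but the name ("Owner") does not match
    // the contract name ("Ownership"), so this is an ordinary public
    // function: anyone can call it and take over ownership.
    function Owner() public {
        owner = msg.sender;
    }
}
```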
This doesn't work anymore with more modern versions of Solidity, but it's the type of bug we found a bit too often in the past. What is interesting, again, is that we did not look at the Ownership contract, we did not look at the implementation itself: we just defined an invariant, we let the fuzzer run, and it broke the invariant for us.

That brings us to the question of how to define invariants. Okay, so the question from the room first: is defining invariants part of the auditor's work? Yes. We use Echidna in our audits, and during an audit we define invariants. And something we do is discuss them with the developers, because the developers know better than us what the system is supposed to do; we have this collaboration with them to understand the intended behavior and define the invariants.

Now, how to define invariants? Because if you have bad invariants, it doesn't matter what you do: whether you use a fuzzer or formal methods, if your invariants are not good, you are just checking something that doesn't matter. The best approach to writing invariants is not to start with a tool, and not to start by writing Solidity: start with English. Open a file, a markdown file or whatever format you like, and write in English what the system is supposed to do. Start simple; start with invariants you know how to state, with things that are not broken. Once you have five or ten simple invariants, write them in Solidity and run the fuzzer on top of them. If the invariants all hold, go back to thinking about the invariants of the system and go deeper into the invariants themselves. If something is broken, look at whether the invariant is incorrect or whether there is an actual bug, and iterate. In our experience, when we work with clients and ask them to do step one, defining the invariants in English, they often realize there are bugs right there; so it is already a very good thing to start thinking about this, even if you are not testing yet.

Okay, so if I understand correctly, the question is how to use Echidna with an already-deployed contract. You can just deploy the contract in the constructor, or, something we are not going to cover here, we have a tool called Etheno which basically takes your unit tests, takes your whole suite, and replays them for Echidna. So for example, if you have a complex integration where your unit tests deploy ten different contracts, where you deploy a mock of Uniswap or whatever you need, you can replay this with Etheno so that everything is set up.

I think there is something else in the question: what if you want to define an invariant on a Uniswap contract that your contract is using? Then you need to know how Uniswap works in order to put it in your invariant, like "if I swap something, then I get something else." And in that case you need to realize that it's difficult to write invariants involving other people's code. Even though it works and everyone is using it, every time you use a third-party contract you are importing some risk, and you need to completely
understand the other contract in order to know what its effect will be on your own contract.

"The risk case was real for us. I'm from Gearbox protocol, and we work on composable leverage; we have adapters, because we provide leverage on top of other contracts. When you combine Gearbox with Uniswap, you immediately get margin trading. Our adapters incorrectly parsed the swap path used for the check after the Uniswap call. A guy on Immunefi wrote a small test, and this test showed us that if you add some additional calldata, it could be interpreted incorrectly: our system would check the wrong balance, not the one that should be checked, and the funds could be drained. I think fuzzing could find this, but the test has to run against the real Uniswap, because we ran with mocks, and the mocks were of course created with the same bug: the mocks were okay, but the real implementation behaved totally differently. I definitely believe fuzzing should find these mistakes."

Yeah, definitely. When you create a mock, you are assuming how everything works, and if your assumption is not precise enough, you won't be able to detect something. As auditors, it's common that we have to audit contracts that interact with other contracts, say Compound. We read Compound's documentation, which says Compound works like this or like that; then we look at the code and see things that are not documented, and we go back to the developers: "look, if your contract does this or that, it will revert, and you're not testing for that." When you use a third-party contract, you are importing the risk, so either you have really good tests or you make sure you understand everything; otherwise it will be difficult to catch the bug. But yes, I think this type of bug can be found with fuzzing; the difficulty is finding an initialization that makes sense and an invariant that makes sense. That's why we put the emphasis on defining invariants: it's the key component of this technique.

"In terms of speed, is it okay to put the whole deployment script into the constructor? Because if you deploy a huge system, with contracts from external repositories like Uniswap and so on, it takes time, and if you want to test a million operations and you redeploy the whole system each time, it could take hours or days." So, when you use Echidna, the deployment happens only once; when a test finishes, Echidna rolls back to the state right after the contracts were deployed, so there is no need to redeploy. That is also why we ask developers to have fixed parameters in the deployment, in the constructor: otherwise we wouldn't know what to deploy.

I think there is a question over there. "Once you've defined your invariants, and I imagine in your audits you run through these, I'm basically trying to understand when you have confidence that this is a good invariant, and whether there are any metrics
that you use internally; I see it outputs unique instructions and unique code hashes, those sorts of things, that give you confidence in what you've done." So in practice you can look at the coverage, but usually coverage is not a good indicator, and in fairness, we do this in a time-boxed manner: we have two or three weeks, and we do our best in two or three weeks. It's tough; there is no silver bullet for this. When we write a report, we list the invariants we tested, so it's clear what was tested with tools and that everything else was not; for the rest we perhaps did manual review, or used other techniques like Slither to check other things. Unfortunately there is no good way to define this, but I personally think that talking with the developers early about the invariants is a really good thing. It's often the case that we think something is an invariant, say that some value cannot be zero, and we go to the client and ask "is this an invariant?", and they don't know, even though they designed the system. And if they don't know, that's an issue, right? We should absolutely know what the behavior of the system is, and if we don't know whether something should be an invariant or not, we should go back and re-discuss it. Security in general is not binary, it's not a yes or no; it's really a matter of how much effort you want to put in: the more effort you put in, the more confidence you will have.

"Thanks. I have a question actually more related to the earlier question. A big class of bugs that has been occurring recently, and for a while now, is reentrancy bugs. How do you deal with finding violations of invariants that correspond to external contracts in that way?" Okay, this is a really good question, and in my opinion the best tool to find reentrancy is static analysis. The real question is which technique to apply to which problem, and for things like reentrancy, static analysis is just going to be better. You can use a fuzzer, you can create reentrancy callbacks and things like that, but in practice static analysis will outperform any fuzzer at this job. For any class of vulnerability that is pattern-based, you can use static analysis and it's going to be better, in my opinion.

"One more question. Say we have a complex system and we want to do classical fuzzing with Echidna; it seems that really covering many cases requires a lot of computational power. Can you advise a cloud provider, or a way to run it, maybe for a week on a very powerful machine, to get something achievable? These pretty simple contracts can be fuzzed on a MacBook, but if we go a little bit further, many contracts, complex setups, maybe it requires more computational power." So, Gustavo, do you want to talk about echidna-parade? The question was: if you want to run Echidna in the cloud, or on a large system, how can you do it? The first thing you should know is that if you have a very large contract, Echidna can take a fair amount of memory, so first, get a good server with a good amount of memory and CPUs. The second thing is that
we have a Python companion tool called echidna-parade that runs any number of concurrent Echidna instances, so you can run 10 at a time. And we don't just run 10 at a time; we randomly shuffle the parameters, because some issues are easily found with, say, 3 or 10 transactions, while other issues are more easily found with 200 transactions in a row. So we run the tool with different random parameters, in different generations, let's say: we run Echidna for an hour, 10 instances, then we save the corpus, and you can see all the code that was covered; then we start again, seeding from the output of the previous generation, and you iterate over and over. This way you can see how your code is being explored, and if some part of the code is not explored even with 10 different instances, you can go back and say "I need to change this," because it clearly doesn't depend on the particular execution. We can give you the link; it's just a Python tool, so it's easy to use, and it's open source, like everything we do.

Okay, so again, it's really about spending time thinking about your invariants, and starting simple. If the first invariants you write immediately lead to bugs, there is something wrong with your approach: your first, simple invariants should be ones you know should hold in the system. Start simple and iterate over them.

To give you some examples: let's say you have an arithmetic library, what invariants can you have? You can have commutativity: a plus b is equal to b plus a. You can have identity: a multiplied by one should be a. Or inverse: if you add something to its opposite, it should be zero. These are not always true, but depending on what you are building, this might be the type of property you are looking for.

For a token, we already talked about the first one: no user should have a balance above the total supply. Now let's say you want to look at the transfer function. What does it do? I'm transferring tokens to someone, so at the end my balance should have decreased by the amount, and the receiver should have seen their balance increased by the amount. Say you write something like that. What you might quickly realize is: what happens if the destination is myself? If I transfer tokens to myself, my balance is not going to increase or decrease, hopefully. So this is an example where you define an invariant on transfer that seems simple, you write it in Solidity, and Echidna tells you there is an edge case: if you transfer to yourself, the invariant is broken. If you go through this example, it's not the code that is bad, it's the invariant that was bad. That's why having this iterative approach is really important: sometimes you make assumptions about your system, and you are actually wrong. And as the system gains complexity, it becomes more and more difficult to refine the invariants.
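A sketch of that naive transfer property, written with Echidna's assertion mode (which comes up again just below). The Token import and its functions are hypothetical; the failing assert on a self-transfer is exactly the edge case just described:

```solidity
pragma solidity ^0.8.0;

import "./Token.sol"; // hypothetical ERC20-like token under test

contract TestTokenTransfer is Token {
    function checkTransfer(address to, uint256 amount) public {
        uint256 senderBefore = balanceOf(msg.sender);
        uint256 toBefore = balanceOf(to);
        require(amount <= senderBefore);

        transfer(to, amount);

        // Naive expectation: both balances moved by `amount`.
        // Echidna breaks this simply by choosing to == msg.sender.
        assert(balanceOf(msg.sender) == senderBefore - amount);
        assert(balanceOf(to) == toBefore + amount);
    }
}
```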
Something else that is important to consider is returning false versus reverting. For example, an invariant you can have is: if you don't have enough funds, the transfer function should either revert or return false, depending on how you implemented the token.

Once you have this list of invariants, you can usually split them into two categories: function-level invariants and system-level invariants. Function-level invariants are usually stateless: things you can check by looking at a specific function. The arithmetic invariants I mentioned are stateless, function-level invariants; here you can craft a simple scenario just by calling the specific function. Then you have system-level invariants. System-level invariants are usually more complex, but they are also more powerful, and here you are stateful: you change the state of the contract, and you check that the invariant holds no matter what the state is. This is why it's important that Echidna calls all the different functions, because that's exactly what you want to exercise. The balance being below or equal to the total supply is an example of a system-level invariant.

For function-level invariants, instead of writing echidna_-prefixed properties, you can use a different mode: we support assertions, so you can just create a function, put an assertion in it, and check whether it holds; see the sketch after this paragraph. For system-level invariants, as we already discussed, it can be more complex, depending on the initialization of your system. If it's a simple initialization, you might be able to do everything in the constructor; if the constructor gets too large for the bytecode size limit, or for whatever reason, you might have to split it, and this is where a tool like Etheno can help.
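For instance, a stateless, function-level arithmetic property in assertion mode might look like this (a sketch; the Math library is hypothetical):

```solidity
pragma solidity ^0.8.0;

library Math {
    function add(uint256 a, uint256 b) internal pure returns (uint256) {
        return a + b;
    }
}

contract TestMath {
    // Stateless, function-level property: addition is commutative.
    // Run Echidna in assertion mode; it reports if the assert ever fails.
    function checkAddCommutative(uint256 a, uint256 b) public pure {
        assert(Math.add(a, b) == Math.add(b, a));
    }
}
```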
All right, so let's look at this particular piece of code; let's take half a minute to read it. It's basically a buy function that calls an internal function, validBuy. What we are going to do is think about which types of invariants we can have here, what they are going to test, and what type of guarantees we will get from them. Let's take a few seconds for this.

Yes, we have a question. Okay, so the question is about testing timestamp dependence. That's clearly not the case here, but for timestamp-dependent code, Echidna, when it runs, automatically increases either the block number or the block timestamp within some range. It happens that some code fails when the timestamp is increased to a really, really large number, say a hundred years out, or the end of the universe; we don't care about those, because if the smart contract has a bug that can only be triggered at the end of the universe, it will be the least of our problems.

All right, so, any idea what types of things we can test here? "The first one: we can check that the number of tokens we get follows our expectation when we send a message value, because as I can see it's a hard-coded rate of around 10. So it's a pretty simple formula: we can try different values into the function, we have a full prediction of how much we should get, and then we can verify that it works." Yes, exactly: the property relates the amount of tokens received to the number of wei sent. So the first thing: this code depends on the state; we don't have the mint function, so we don't know what's inside. However, we have the validBuy function, which abstracts exactly the thing we want to test, so we will start with validBuy, which is a pure, stateless function.

So we were thinking about invariants related to the amount we can get. Here is a very simple one, without going into specifics: if the wei sent is zero, then the user should receive no tokens at all. That's even simpler than reasoning about exactly how much the user should receive, but it's a concrete case, and it's kind of a corner case, so it can be important to test.

All right, so how can we test this? There are a couple of ways; this is one. We write a function that takes one parameter, so Echidna can put any number there; however, we restrict the input of this function to be non-zero. Then we execute validBuy, and we want to know whether Echidna can reach the statement after it, because validBuy reverts if the inputs are not the expected ones. In other words, we want to know whether we can get tokens despite sending no value.
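Roughly, the test looks like this (a sketch; the exact name and signature of validBuy are assumed from the walkthrough, not taken from the exercise's code):

```solidity
import "./Buy.sol"; // exercise contract defining validBuy (assumed)

contract TestBuy is Buy {
    // Run in assertion mode: the assert is reachable only if validBuy
    // accepted a purchase of tokenAmount tokens for zero wei.
    function testFreeTokens(uint256 tokenAmount) public {
        require(tokenAmount != 0); // asking for zero tokens is uninteresting
        validBuy(tokenAmount, 0);  // try to buy tokens while sending 0 wei
        assert(false);             // if we got here, the tokens were free
    }
}
```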
Perhaps you're wondering: what if this amount is zero? Then clearly this code will not do anything interesting, or it's just going to revert. When you're writing tests, you can put any number of requires, or preconditions as people usually call them; however, if the preconditions are too restrictive and your function reverts most of the time, you're not going to get value from the execution. Every execution that reverts inside a test, inside an invariant, is an execution you waste. In this case we only waste one case out of the full uint256 range, so it's not a big deal; but if you put a lot of requires that only a very small set of values satisfies, values that are difficult to hit randomly, or even with the techniques we use, you will need a slightly different approach; we will go into that later.

Any questions? Yes: "As you mentioned, in this case we are sacrificing only one case, the zero case. But is it going to run over the whole range of uint256? That's a really large range, and it doesn't make sense to test all of it in some cases." Yeah, exactly: there is no tool in the world that can run through the whole range. You can do it symbolically, but that is not testing all the values, it's a different thing. Fuzzing techniques sample from the input space, randomly or with heuristics. In the case of Echidna, since we compile the code and run it through our static analyzer, we detect some interesting values; in this case, 10, for instance: 10 is an interesting value, a constant that is used somewhere, so we definitely want to test with that constant.

Okay, so let's see what happens if you run it. Again, Echidna runs a number of transactions and eventually detects that the free-tokens property has an assertion failure; however, this is in the context of a hundred random transactions that perhaps do something completely unrelated. So what Echidna does next is input minimization. Input minimization is a very old technique from testing: you have a list of bytes that triggers a bug, and you remove bytes one by one, or in some random way, to get a list of bytes that still triggers the bug but is minimal, or at least a local or global minimum, depending on the type of tool. So Echidna will try to minimize every parameter. Here we have only one parameter, and that parameter is actually needed to trigger the bug. If we had more parameters, they would be minimized towards zero: if you have uints, they are reduced towards zero, because zero is the simplest value; this is an arbitrary choice defined in the code, and you can change it if you want. But in this case the parameter cannot be zero, because if it's zero the test passes; so the minimal amount is one. It's not guaranteed that you always get the smallest list of transactions that triggers the failure; this is an NP-complete problem, it cannot be solved efficiently, so it's always going to be a sampling. But in practice, even randomly removing transactions or reducing the complexity of each value gives you good answers.

All right, maybe we can just explain why this is happening. The issue here is that if you ask for a single token, you compute 1 divided by 10, and 1 divided by 10 is zero, because you are rounding down; as a result, the required amount to be sent is zero. So if you ask for any number of tokens below 10, you get them for free. This is, again, an example where we define an invariant without actually looking at the formula: we are not looking at how the formula works, we just define the invariant "if you don't send ether, you should receive no tokens," and by doing that, Echidna finds the issue. We actually routinely use Echidna to find mistakes in formulas this way.

"I guess this function is for testing purposes, but in a real situation, the wei sent is not part of the function's signature, right? You read that from the message. This could be deeper in the code; how do you do it in that case, how do you get that value into the asserting function?" So, if I understand correctly: this could be an internal part, and you could have a lot of code that gets the value from msg.value and then does something else. You can do that; it depends on your code. Here we are testing an internal function using some defined inputs, but if it was using msg.value, Echidna can handle that too: properties can also take value. And if you have a constant in your code saying msg.value should be 42, Echidna will use that constant eventually, so you should be able to hit that particular case, given that this is random sampling, of course.
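The root cause, sketched in code (the shape of the check is assumed from the walkthrough, with a rate of 10 tokens per wei):

```solidity
function validBuy(uint256 tokenAmount, uint256 weiSent) internal pure {
    // Integer division rounds down: for any tokenAmount below 10 the
    // required payment computes to 0 wei, so those tokens are free.
    uint256 requiredWei = tokenAmount / 10;
    require(weiSent >= requiredWei, "insufficient payment");
}
```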
"Can you move back to the previous slide, to the function? What I'm really wondering: we talked about this rounding error, but tokens divided by 10 is such a small amount. The same problem here could be an overflow, if I provide a huge amount of ether, because there is a multiplication by the decimals on the other side. We all know that the quantity of ETH is limited; you can't even take a flash loan and get more ETH than exists. So what is the best practice: follow the formal execution when you write fuzzing tests, or take real-world limits, since there isn't more ether than exists at the moment, and we know it's deflationary, so we can simply assume that in the future nobody could get enough to cause an overflow here?" Yeah, exactly, this is an interesting question, and it goes to the question of what your assumptions in the test are. If your assumption is "I have this token with this limited supply that should never go over something," then you can just require that the value sent cannot be more than the total supply. In the case of ether it's a bit more tricky. In fact, in Echidna, the externally owned accounts that we simulate are loaded with ether at every transaction, because you can have a very large amount of ether, you can take a flash loan; you probably cannot have enough ether to overflow a uint256, because that would be a real problem for Ethereum itself. But we can define in the Echidna config the maximum amount of value to send per transaction. So if you say "I don't care if the attacker has more than 10,000 ether, because that would mean they can do other things anyway," Echidna will happily take that limit and never send more. However, it's still the case that over a number of transactions, the accumulated ether can go over that value, so you should be careful.

There is also an approach you can take here. You are building an invariant, and the invariant has a threshold value. Either you start with a really restrictive threshold and see if it holds: if it's already breaking with, say, one ether, okay, it's already breaking, and you continue from there; if it's not breaking, you increase the threshold bit by bit. Or you take the opposite approach: you define the invariant with a really large threshold, it breaks because the threshold is really large, and then you decrease it. So either you start really limited and, depending on the results, you relax the limit, or you start without limitation and reduce it to the point where you reach a value you feel comfortable with. And that is also related to whether you prefer false positives or false negatives: if we start with very large values we could have false positives, and if we start with very small values we could have false negatives. Which ones will cause you more trouble? That is something to think about, because a false negative, a missed bug, can mean your protocol gets destroyed, while a false positive could be okay.

"Hey there. In terms of fuzzing mutation, do you do any clever things? Like, say this function has a constant of 10: would you then see the constant in the function and use it as input?" Yes, we use the constants from the function, and not only the constant itself: if we see 10, we are going to use 10, and also 9 and
11, and values around it. There are some other techniques; I can show you afterwards a few lines of Echidna's code that list all the mutations. We have interesting mutations on the list of transactions: we shuffle, and we do splicing as well, where we take one list of transactions and another one and splice them at a random position. There are a couple of fun things to look at there, but I think we should move on a little bit.

All right, so as I was saying, we have this failure. Now, even if you don't understand what the failure is, well, that is a different beast: sometimes you write your invariant, your invariant fails, and then you start the journey of understanding why it failed. We are not going to talk about that here; some people like to re-run the failure as a unit test to follow step by step what is going on. But yeah, that's a completely different type of beast, related to what happens afterwards and how you can fix the issue.

All right, so a little bit about Echidna's APIs. This is a topic that is still an open debate in some cases: what is the best way to write properties? Echidna supports a couple of different ones. First, there are Boolean properties, where a function is executed and it should return a Boolean, true or false; if the function reverts for any reason, that is the same as returning false. If we go back a little bit, you can see over there "unrecognized opcode": that is related to how old versions of Solidity used to signal assertion failures. With a Boolean property, it would instead just return false, so you know exactly how the property failed, or, as you see over there, it could be a revert. So the first option is Boolean properties, the classic way to define invariants. These come from some very old techniques, in particular QuickCheck, a property-based testing tool for Haskell and a couple of other languages, which was an inspiration for this.

Then you have assertion failures: every time an assertion is hit with false, the property fails; however, if in the context of your function you see a revert, that does not make the property fail. Here we can see that if validBuy reverts, it will not be reported, because we are using assertion mode. So you should be careful: if you care about reverts, you either have to use the Boolean type, or, if it's an external function, you can do a try/catch and check which type of revert it is. You can even fail on some types of reverts and not others, because for some cases you want the user to get a good revert message, while other types of reverts are ones you want to know about.

And finally we have the dapptools and Foundry API, where you have a function that fails if it reverts and passes otherwise; I hope the Foundry team agrees with us that this is correctly implemented. All right, there is a page on testing modes in our repository; you can go there and play with it a little more.
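Side by side, the three styles look roughly like this (a sketch; the counter system is hypothetical, and the exact trigger conventions are simplified):

```solidity
pragma solidity ^0.8.0;

contract TestModes {
    uint256 public counter;
    uint256 constant MAX = 1000;

    function add(uint256 by) public {
        require(counter + by <= MAX, "cap exceeded");
        counter += by;
    }

    // 1. Boolean property: must return true; reverting also counts as failing.
    function echidna_counter_capped() public view returns (bool) {
        return counter <= MAX;
    }

    // 2. Assertion mode: fails only if an assert fires; a plain revert
    //    (e.g. a failing require inside add) just discards that run.
    function checkAdd(uint256 by) public {
        uint256 before = counter;
        add(by);
        assert(counter == before + by);
    }

    // 3. dapptest/Foundry-style: the test fails if the call reverts.
    function testAddZeroNeverReverts(uint256) public {
        add(0); // adding nothing should always be allowed
    }
}
```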
This is a very high-level overview. So: Boolean properties are easy to define, and there are no side effects; that's what is interesting about them, any side effect of evaluating the property is reverted, so checking the invariant does not change the blockchain state. If you use assert, on the other hand, the side effects, everything you changed on the blockchain, remain; this can be really useful for testing some complex code. Assertions can be simple to define, and it is also easy to see in the code coverage whether they were reached or not. However, some code, especially old Solidity code, uses assert as a require, and that is a really bad thing; you should not do it. Use require if you actually want a precondition in your code, and use assert for testing.

And finally we have the Foundry and dapptools compatibility. The thing we don't support is pranks; we don't like to prank people, so we don't support pranks. We do support some of the original hevm cheat codes. You should be careful using them: we know there are some catches, especially related to what the Solidity compiler expects versus what you are doing in your transaction, so please be careful, you could have some issues. We rarely use cheat codes; we try to keep all our code as close to plain Solidity as possible, so you can easily port it to another tool. But we are also open to discussion: if the community agrees that we need a specific cheat code, or that we should avoid a specific one, we are open to it.

All right, exercise 4: we're going to deal with one of the exercises from Damn Vulnerable DeFi. How many of you know this amazing CTF? Ah, sorry, we can skip this one: that exercise was exactly the same as the first one, but using assertions instead of echidna_ functions; exactly the same invariant, exactly the same setup, but with a different API, just as an example. So we will go into a more interesting example.

But before that, there is something you will need, which is called multi-abi mode. Testing tools usually take a specific contract as the main contract to interact with; in the default mode, Echidna only targets a specific contract, which you give on the command line, or if you have only one contract, it uses that one. But there is something called multi-abi, which calls every contract that is deployed after the constructor and for which you have an ABI. If you deploy something as raw bytecode directly and you don't have an ABI, Echidna won't be able to call it, because it doesn't know what is there; but if you deploy a couple of tokens and several contracts and you use multi-abi, Echidna will call any function on any deployed contract after the end of the constructor. We will need this for the next example, because sometimes the bug you want to detect doesn't depend on the state of one contract, it depends on the state of many contracts, and in that case you
can be surprised by the fact that changing the state of another contract breaks your property, and you definitely don't want to miss that.

Okay, so again, how many of you know about Damn Vulnerable DeFi? A good number. These are among the first exercises of the CTF, so if you know how to solve it, this should be even easier for you. What we're going to do is take a look at this sample; I assume you already have the code. The exercise is the naive receiver one. What we want to do is be able to drain the funds in FlashLoanReceiver. To give a bit of description of the challenge: you have two contracts. You have the NaiveReceiverLenderPool, which basically allows you to take a flash loan for a fee, and you have a second contract, a user contract, that interacts with the pool. The user contract is deployed with some funds inside, and the goal is to see whether it is possible for this specific user contract to be drained.

So what we want you to do is review the exercise. If you have already done it, you know it requires sending some number of transactions; in this case, we are going to prepare everything for Echidna to rediscover this, without telling it how it can be solved. We will need two things. First, we need to initialize the code to match what the challenge's initialization actually does. Let's see: this is the flash loan part. The interesting thing here is that we don't have to care about the specific details of the code: we want to give Echidna the same scenario as the actual challenge, and we want to know if it can actually find a way to drain the contract. We can look at how the receiver function works here, but the interesting part is really the initialization: in the constructor of our test, we deploy the contracts that we have here and prepare everything, and then we use a suitable invariant, which should be really simple; you don't have to overthink it. We want you to run Echidna with that, to see if it can drain the contract, with a suitable invariant, of course. We are running out of time, but let's take 10 minutes. The first step is really just to replay the initialization from the challenge's test case in a Solidity constructor, and then to write the invariant.
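For reference, a harness along these lines (a sketch; contract names, constructor signatures, and amounts are adapted from the Damn Vulnerable DeFi setup and may not match the exercise exactly):

```solidity
pragma solidity ^0.8.0;

import "./NaiveReceiverLenderPool.sol";
import "./FlashLoanReceiver.sol";

contract NaiveReceiverTest {
    NaiveReceiverLenderPool pool;
    FlashLoanReceiver receiver;

    // Replays the challenge's initialization: deploy both contracts and
    // fund them (the test contract itself must hold enough ether, which
    // can be arranged through Echidna's configuration).
    constructor() payable {
        pool = new NaiveReceiverLenderPool();
        receiver = new FlashLoanReceiver(payable(address(pool)));
        payable(address(pool)).transfer(1000 ether);
        payable(address(receiver)).transfer(10 ether);
    }

    // Broken as soon as Echidna finds a sequence that drains the receiver.
    function echidna_receiver_not_drained() public view returns (bool) {
        return address(receiver).balance >= 10 ether;
    }
}
```

With multi-abi enabled, Echidna can call the pool's functions directly, which is what makes the draining sequence reachable at all.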
Hello — is there any thought or current support for mainnet forking, or state forking of some sort?

Not yet. hevm has support for it, but it would require us to do input/output on every transaction, so we need to check how that would impact the actual performance. So: not yet.

I have kind of a high-level question. I'm trying to think about the limitations of expressing properties as invariants. For example, say we have a temporal property we want to express: we have a wallet contract, and we want to say that any user who deposits their money is eventually able to withdraw it. Is that a fundamental limitation of Echidna, or is there a way to do it?

The notion of "eventually" needs some definition on the blockchain, right? Eventually cannot mean "in a hundred years". If you define it as, say, "I only allow time increments up to some limit", then yes, you can check a bounded version of that property. But you cannot state it as a theoretical thing — "I have a state which I don't know, and I transition to another state which I don't know" — because then you would need to know whether the original state was even reachable, and so on. It cannot work on abstract states: you always have concrete states, and you need to put boundaries on the things you can do. So if you say "a user should eventually receive this amount", and "eventually" means within some number of transactions, blocks, or timestamps, then yes; otherwise it's more like a theoretical proof, and you probably need another type of tool.

I'm curious, even in the bounded case — fair enough that we're dealing with concrete traces — it's not intuitive to me how you would express that as one of these invariants.

You would do something like this. You need state to track it down: you do a deposit, and you need the user to eventually receive something, so you keep state that tracks deposits — a mapping — and you have a function that is your invariant. It receives the address of a user, checks the mapping, and if the time since the last deposit is within one range you check one thing; if it's in another range, you check something else. So it gets checked at random points across transactions, and with that you can cover the property, given that you generate enough transactions and enough time.

So it seems like the answer is: it's capable of doing it, but it requires some manual adjustment of the code to add that state.

Yes — if the property you are testing requires extra state, you will need to add whatever state is needed; Echidna won't track state outside the contracts for you. The only thing tracked outside is the increments between blocks: Echidna will show you "between this transaction and this transaction there are 10 blocks". Everything else — say, a mapping between users and the time of certain operations — you need to keep yourself, in Solidity, in the harness.

I see — great, thank you.
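A sketch of that bounded-"eventually" pattern, for a hypothetical Wallet contract — all names and the one-week bound are illustrative:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Toy wallet, just enough to state the property against.
contract Wallet {
    mapping(address => uint256) public balanceOf;

    function deposit() external payable {
        balanceOf[msg.sender] += msg.value;
    }

    function withdraw(uint256 amount) external {
        require(balanceOf[msg.sender] >= amount, "insufficient");
        balanceOf[msg.sender] -= amount;
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
    }
}

contract EchidnaWallet {
    Wallet wallet = new Wallet();
    uint256 totalDeposited;  // state we add ourselves to track deposits
    uint256 lastDepositTime;

    // Echidna drives deposits through the harness, so the harness is
    // the depositor the property talks about.
    function deposit(uint256 amount) public {
        if (address(this).balance == 0) return;
        amount = 1 + (amount % address(this).balance);
        wallet.deposit{value: amount}();
        totalDeposited += amount;
        lastDepositTime = block.timestamp;
    }

    // Bounded "eventually": whenever the fuzzer happens to call this more
    // than a week after the last deposit, a full withdrawal must succeed.
    function checkEventuallyWithdrawable() public {
        if (totalDeposited > 0 && block.timestamp >= lastDepositTime + 1 weeks) {
            uint256 before = address(this).balance;
            wallet.withdraw(totalDeposited);
            assert(address(this).balance == before + totalDeposited);
            totalDeposited = 0;
        }
    }

    receive() external payable {}
}
```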
All right, we have some other questions.

Hi — is there any feature that allows me to guide the fuzzer, or constrain the domain of values I want to try?

To guide the fuzzer into a particular state, the easiest way is to add a small piece of auxiliary code that moves your contract's state toward it. For instance, if the protocol requires a deposit with a particular property — say it takes three parameters and you need a deposit where all three have the same value, or values that are otherwise hard to find — you can add a helper for that. But it's important to keep letting Echidna explore freely at the same time as you add information. As an auditor or a developer you are adding information; you are not just using the tool as a black box. Every state transition that is non-trivial to find, or really important, you can add explicitly and make sure it can eventually be executed — it's just one more transaction to execute. At the same time, you want to allow the tool to explore things you don't expect: if you restrict it to "I only expect users to do these types of deposits", you can be surprised later, because there may be a way to break your property using things you didn't anticipate.

Great, thanks.

And it comes back to the previous question: either you start with a lot of requirements — a lot of restrictions on what you are trying to explore — and, if the property holds, you remove some of the restrictions; or you start in the opposite direction, with no restrictions, and make it more and more restricted.
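A sketch of that auxiliary-code idea — the Protocol contract and its three-parameter precondition are invented for illustration:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical protocol with a deposit precondition that random inputs
// are unlikely to satisfy: all three parameters must be equal.
contract Protocol {
    uint256 public pot;

    function deposit(uint256 a, uint256 b, uint256 c) public {
        require(a == b && b == c, "parameters must match");
        pot += a;
    }
}

contract EchidnaGuided is Protocol {
    // Auxiliary code: one extra transaction that jumps straight into the
    // hard-to-reach state by deriving the three parameters from a single
    // input. deposit() itself stays callable, so Echidna keeps exploring
    // inputs we did not anticipate.
    function depositAligned(uint256 x) public {
        uint256 amount = 1 + (x % 1000 ether);
        deposit(amount, amount, amount);
    }
}
```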
OK — did anyone manage to at least start creating the constructor, or even run it with some invariant, or at least think about the invariant you'd need? In a couple of minutes we'll go over the conclusions and show the solution, but happy to take any additional questions.

Just from the implementation side, I'm curious which virtual machine is actually used to deploy and execute contracts.

We use hevm, the EVM implementation written in Haskell. I know it's in the process of being improved and rewritten — it was moved from Dapptools into the Ethereum repository — so we are eager to test new features. And if you use the Etheno companion tool to deploy a contract, it will use Ganache, serialize everything into a JSON file, and you can then load that into Echidna.

OK, let's quickly go over the solution so we keep a couple of minutes. The solution first requires the test contract to hold enough ether to match what is deployed in the challenge. You can see here that we deploy all the contracts we need and send each one the amount of ether it needs. Then we use a very simple property: the balance of the receiver is at least 10 ether. We don't actually need to follow exactly what the exercise says about draining the contract completely — if there is one transaction that allows you to reduce the balance of the receiver, something is wrong, and it will eventually be drained. If you run it, you will see something like this: the flashLoan function has a parameter that is the borrower, and we can control it in order to reduce the receiver's balance.
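In the harness sketched earlier, that property is a one-liner — a sketch of the idea, not necessarily the repository's exact solution:

```solidity
// If any transaction sequence pushes the receiver below its initial
// 10 ETH, fees are leaking and the funds can eventually be drained.
function echidna_receiver_keeps_balance() public view returns (bool) {
    return address(receiver).balance >= 10 ether;
}
```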
All right — we had a second exercise on Damn Vulnerable DeFi, but we're running out of time. The idea was the same: we define an initialization, which was a bit more complex, and the slightly more specific part of that exercise is that there is a callback from the contract to the caller, so in the Echidna test you also need to implement the callback for the flash loan.

This is something we touched on during the discussion: what about the other tools? There are a couple of other fuzzers out there — Dapptools, Brownie, Foundry — and at least these are open source. They might be a bit better for simple tests and for ease of use on your first invariant, because they are integrated within the compilation framework; but in the long run they might not be as powerful as Echidna, simply because we have been using Echidna for years and have tuned it to provide the best value we can.

OK, we're finished — sorry. I hope you enjoyed the workshop; we have more exercises in building-secure-contracts. Something we would recommend is to try writing invariants in your next project — who is going to try it on their next project? Nice. Thank you.

Thank you. Today we're going to talk about storage proofs. I'm going to present storage proofs and explain why they're cool, how to work with them, why you need tooling to work with them, why they're even possible, all the complexities behind them, the trade-offs, and so on. A few words about why I really believe storage proofs are cool, especially nowadays: my thesis is that Ethereum is pretty sharded nowadays, and with storage proofs we can essentially read state in an almost synchronous manner, which is a pretty nice thing to do given the circumstances. And why is it even possible? A storage proof is essentially this idea that the entire state is committed to in a cryptographic manner using some data structure — Merkle trees, Merkle Patricia trees, and so on — and we can verify any specific piece of state, at any point in time, on any domain. That's pretty nice, and it doesn't introduce additional trust assumptions: you just rely on the security of the base chain. That's the storage-proofs TL;DR and why they are cool.

Now a bit of a sponsored section: what we're doing at Herodotus. Our goal is to make smart contracts self-aware, in a way, by providing access to historical state. Like I said, my thesis is that Ethereum is pretty sharded nowadays; we want to unshard it using storage proofs, and we want to enable synchronous data access, because today we don't have really nice ways to do synchronous data access without introducing new trust assumptions. How do we achieve that? Obviously with storage proofs, and we use SNARKs, STARKs and MPC — I'll get to why we even need all this tooling.

What we're going to cover in today's workshop: all the basics required to properly understand this primitive, how to work with it, how you can generate these proofs, why they're useful, how you can access these commitments (I'll get later to what we call a commitment) in a trustless manner, and how we make smart contracts self-aware and enable historical data reads.

About the background I want you to have for this workshop — we'll start from the very basics: what a hashing function is (a very quick reminder, I hope it takes less than a minute); generalized blockchain anatomy and what an Ethereum header looks like (we are not only Ethereum-focused, but for the sake of this workshop it's best to present on this concrete example); Merkle trees, explain-like-I'm-five — I'll quickly explain the idea of how they work, and what a Merkle Patricia tree is, without going too far into the details; then the anatomy of the Ethereum state, which is pretty important for dealing with this primitive; and finally how to deal with the storage layout.

Cool — hashing functions. Essentially it's the idea that I can have a function that takes an input of any size and always returns an output of a fixed size. What's also important: there are no two inputs that will generate the same output — this is what we call collision resistance — and you cannot reverse the hashing function, meaning that given the output you don't know what the input was. A pretty useful primitive in blockchains; I assume everyone is familiar with it.

Why does it matter? Generalized blockchain anatomy: why do we call it a chain? Because we have a bunch of blocks linked together — each block contains a reference to its parent hash, and the previous header in turn contains a reference to its own parent hash. And let me remind you what the parent hash — the block hash — is on Ethereum: it's essentially the hash of the header. This is pretty important for dealing with these primitives and for making smart contracts self-aware by accessing historical state. Just keep that in mind; let's get to the next part.
No, I think I'm missing one slide — no, it's the correct one. OK, so this is an Ethereum block header. As I said, we're going to go through the example of Ethereum concretely — a bit of anatomy. To access state, we obviously need the state root. What is the state root? It's the root of the Merkle Patricia tree of the Ethereum state. We also have the transactions root, which is pretty useful if you want to access historical transactions — their entire bodies — and the receipts root, which is useful for accessing events, logs, and so on. All of these are roots of a Merkle Patricia tree — for now, just think of it as a Merkle tree. And most importantly we have the parent hash, and with the parent hash we can, in a way, go backwards.

Let's get to Merkle trees. Essentially it's the idea that I can take whatever amount of data and commit to it in a cryptographic manner using this data structure. On the left side we see a standard Merkle tree: all the data goes at the bottom and we hash each piece — you know what a hashing function is now — then we combine pairs of hashes, hash those together, and keep doing that until we get to one hash, which is what we call the root.
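To make the verification side concrete: checking a leaf against a root only needs the sibling hashes along the path. A minimal sketch for the plain binary Merkle tree just described (not the Merkle Patricia variant):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library MerkleProofSketch {
    // `index` is the leaf's position; its bits say whether the running
    // node is a left or right child at each level of the tree.
    function verify(
        bytes32 leaf,
        bytes32 root,
        bytes32[] memory siblings,
        uint256 index
    ) internal pure returns (bool) {
        bytes32 node = leaf;
        for (uint256 i = 0; i < siblings.length; i++) {
            node = (index & 1) == 0
                ? keccak256(abi.encodePacked(node, siblings[i]))
                : keccak256(abi.encodePacked(siblings[i], node));
            index >>= 1;
        }
        return node == root;
    }
}
```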
The Merkle Patricia tree — a modified Merkle Patricia tree, to be exact — is the data structure we use in Ethereum. What you see at the top here is the state root, which is the root of this tree. How does it work, and how should you think of it? It's a pretty complex data structure — I don't want to bother you with it today — but essentially there are three types of nodes: leaf nodes, extension nodes, and branch nodes. Leaf nodes contain data, branch nodes contain data, and extension nodes, at a high level, help us navigate the tree. To be honest, to follow the rest you don't really need to understand this part; but to build at the low level, as we do, you obviously need to deal with it a lot.

OK — the Ethereum state: how is it constructed? The most important takeaway is that it's a two-level structure. I mentioned that the state root is a commitment to the entire state, but that's not quite the full picture, because Ethereum is account-based: the state root is the commitment to all the accounts that exist on Ethereum. And what is an account made of? A balance (the ETH balance); a nonce, which is a transaction counter; a storage root — the root of another Merkle Patricia tree, a key-value database that holds the mapping from each storage key to its actual value; and finally the code hash, which is the hash of the bytecode. So the main takeaway: first we access the account, and once we have the account's storage root, we can access its storage.

Cool — to sum up the background, the main takeaways: given a block's state root, you can verify any piece of state for that specific block on that network; and given an initial trusted block hash, you can essentially recreate all the previous headers, which is pretty cool and important for the ideas I'll explain soon.

This is going to be a short workshop, so I won't make you code, but I will show you some concrete examples. What I want to go through with you today is how we can prove the ownership of a Lens profile on another chain. A bit of background: Lens profiles are represented as NFTs, and Lens is deployed on Polygon. How do we approach this? First, the question we need to answer is: how does Polygon commit to Ethereum L1? If we want to, say, prove the ownership of a Lens profile on Optimism, we need to know the state root of Polygon, but Ethereum L1 sits in the middle — so how do we access this on L1 in the first place? Polygon is a commit chain, and it commits a bunch of things to Ethereum every so often: on L1 we do not validate the entire state transition, we just verify Polygon's consensus, and these "checkpoints", as they call them, essentially contain state roots — not directly, but we can access them through them. This is taken from Polygon's documentation, and this is how a checkpoint looks: it's made of a proposer (who proposed it), a start block and an end block — give me a second, I'll get to those — and, most importantly, the root hash. The root hash is the root of a Merkle tree — not a Merkle Patricia tree — that contains all the headers in the range between the start block and the end block. Getting back to the earlier point: with this commitment we can prove that we know a valid state root of Polygon, verified on Ethereum.

A bit of hands-on: we want to prove that I own a Lens profile on Polygon. Number one, we go to the contracts, look through them, and see that there is essentially a bunch of logic on top of this ERC-721 — a basic ERC-721; as you can see it's an abstract contract, and it's slightly modified: instead of a standard mapping from token id to owner, we have a mapping from token id to token data. TokenData is a struct, 32 bytes in total: 20 bytes is the actual owner, and the remaining 12 bytes represent when the token was minted.

OK, but how do I actually prove it? One very important thing when dealing with storage layouts: we have something called slot indices. Each state variable has a given slot in a sort of meta-layout — I call it that; it's probably not the official term. Anyway, this mapping has slot index two — I'll get to why it's two in a second — and it maps a token id to the 32 bytes of data representing this struct; just think of it as some bytes.

I guess most of you use Hardhat, so I'll present with Hardhat. There is a very cool tool for dealing with storage layouts, called, obviously, hardhat-storage-layout. This is how you install it — literally `yarn install hardhat-storage-layout` — you add one line to your Hardhat config, write a script that is literally eight lines of code, run it, and you get this table. What does it tell you? By the way, this tool is useful precisely because this contract is abstract: other contracts can inherit from it, and they obviously inherit the storage layout too — with inheritance this can get trickier. So that's how we get the slot index: there is a column called "storage slot", and as you can see, _tokenData is marked as two. And that's it.
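A sketch of how such a layout can arise in Solidity — illustrative only, not Lens's actual source; the two placeholder variables just stand in for whatever occupies slots 0 and 1:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract LensHubStorageSketch {
    uint256 private _slot0; // placeholder for whatever lives in slot 0
    uint256 private _slot1; // placeholder for whatever lives in slot 1

    // Both fields fit in one 32-byte slot: the owner in the low-order
    // 20 bytes, the mint timestamp packed into the high-order 12 bytes.
    struct TokenData {
        address owner;        // 20 bytes
        uint96 mintTimestamp; // 12 bytes
    }

    // Declared third, so hardhat-storage-layout reports slot index 2.
    mapping(uint256 => TokenData) internal _tokenData;
}
```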
OK, but what do we do with it? How do we get the storage key? Let me check the time — OK. A bit more hands-on: how do we get the actual storage key? It sounds scary, and it's meant to be scary. We know the slot index, and I want to prove that the address 0x35… owns the profile with token id 3594. How do we get the storage key? We do this operation: we concatenate the key in the mapping — 3594, the token id; remember, we have a mapping from token id to token data — with the slot index, and hash it all together. That is the storage key. If you're interested in how to handle more complex mappings and layouts, the Solidity documentation explains it pretty well.

Now, to make sure we got the proper storage key, let's check it. Checking is super easy: one Ethereum RPC call to get the storage at a specific key — eth_getStorageAt. The parameters: we want to access the storage of the LensHub — LensHub is the contract that is essentially the representation of these profiles, and its address is 0xDb4… and so on — and the storage key is the 0x1… hash we just computed. The result is 0x000…, and we know it's 32 bytes of data split 12 and 20, so let's split it. The first part — 0x, a lot of zeros, then a small value — looks like a small number, so it is apparently the timestamp; and the second part is 0x35…57, which is literally our address. So we got it right: we have the proper storage key.
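The same derivation and decoding, written out in Solidity as a sanity check — a sketch; the values match the example above:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library LensStorageKeySketch {
    // For a mapping at slot index p, the value for key k lives at
    // keccak256(abi.encode(k, p)). Here: token id 3594, slot index 2.
    function tokenDataKey() internal pure returns (bytes32) {
        return keccak256(abi.encode(uint256(3594), uint256(2)));
    }

    // Splitting the fetched 32-byte word the way the talk does:
    // high-order 12 bytes = mint timestamp, low-order 20 bytes = owner.
    function decode(bytes32 raw)
        internal
        pure
        returns (address owner, uint96 mintTimestamp)
    {
        owner = address(uint160(uint256(raw)));
        mintTimestamp = uint96(uint256(raw) >> 160);
    }
}
```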
Cool — but how do we actually get to storage proofs? There is a standardized method in the JSON-RPC standard for Ethereum clients, called eth_getProof. Given the contract address — better to call it the account address in this case — it allows us to generate the state proof; the second argument is an array containing all the storage keys we want to prove, and there is another argument — here 0x1A… — which is the block number for which we prove the state. Let's call this method. By the way, you might wonder how we deal with this method on non-EVM chains, because on some rollups it's not supported. It's actually not a big deal: if you think about it, we just need the database, and on top of the database we can build this method ourselves — we just need to know how the storage is constructed.

This is the proof. It looks scary — it is scary: this entire object is four kilobytes of data. I mentioned that the state is a two-level structure: first we have a proof for the account itself, and then the proof for the actual storage slot. One proof is roughly 600-700 bytes — it depends: the bigger the storage, the bigger the proof, and the more accounts there are, the bigger the account proof. So that's a lot of calldata, and that's pretty bad, because we need to post this proof on chain. OK, so what's the cost on an EVM chain? Around 600k gas. That's a lot — it kills almost every application you'd want to build on top of this nice primitive. Why is it that bad? I explained at a high level what Merkle trees and Merkle Patricia trees are; there's a trade-off here. With Merkle Patricia trees the proof is slightly bigger and harder to decode — we actually need to do a bit of decoding — but we need to do less hashing. So depending on where we verify the proof, it may be more feasible to verify a proof based on Merkle Patricia trees or on plain Merkle trees.

But there is a solution: what if we snarkify such a proof, and verify it inside a SNARK? Why is that cool? Say I verify this proof inside a Groth16 circuit: the verification costs roughly 210k gas, and the proof is way under 600 bytes. We essentially get rid of the calldata, because the proof itself can be a private input to the circuit, and we can use multiple proving systems depending on the use case. Why is this very cool? First, it removes calldata. Second, it lets us deal with hashing functions that are very unfriendly to the EVM — the ones without precompiles, say Pedersen. It might be super expensive to verify such a proof on the EVM directly — lots of calldata, and an unfriendly hash — but we can do it inside the SNARK and just verify the SNARK. Another benefit: this really helps abstract the way we verify these proofs, because you don't need one generalized verifier for each type of proof; you can abstract it behind the SNARK, which is great. These numbers were taken from a very nice article written by a16z a few months ago.

I think that's pretty much it; let's get to the next slide: synchronous cross-layer state access. How can a contract deployed on one layer access the state of another L2, or of L1? I mentioned that we always need the state root. Because all of these systems have a native messaging system, we can send small commitments — for example the block hash — usually routed through L1 — and unroll from there, or send the state root directly. And we don't necessarily need to rely on messaging: we can, for example, rely on the fact that Polygon is a commit chain — all these systems commit their batches from time to time — so we can obtain the commitment from which we'll recreate the state directly on L1 and then send it onward. If, say, Polygon commits on L1, I can send that commitment from there to Starknet and do the actual verification on Starknet.

Cool, so how do we actually do that? Let's break the flow into the smallest pieces. The flow is the following: we need access to the commitment, which is either a block hash or a state root, and we can get it either by sending a message, or by relying on the fact that the chain commits (in a sense still a message) and relaying it in an optimistic manner — or we can go even crazier and verify the entire consensus.
So that's step number one: get the commitment. Step number two: we need to somehow access the state root — the commitment to the state — for a previous block or the current one, because keep in mind that these commitments are often only block hashes, and with block hashes we can recreate headers but not yet access the state. Once we have the state root, we obviously need to verify the state and storage proofs. There are multiple ways to do each of these, all with trade-offs, so let's go through the approaches.

Approach number one: messaging. I can send a message from, say, Optimism to Ethereum L1 — I can get the block hash by just calling the proper opcode, and I get it; it takes some time, but I get it. Here we rely on the built-in messaging system, which I think is fair, because its security equals the security of the rollup, and if you're deploying an application on this rollup, that's a fair assumption. Now the downsides: the message must be delivered, so it introduces a significant delay — especially when there is a withdrawal period in the middle — and it requires interacting with multiple layers: first you send the message, then you actually consume it. So it's not ideal, but the trust assumptions are reasonable.

Another approach: consensus validation. By the way, this little gremlin on the slide is supposed to be verifying a bunch of BLS signatures — I hope it's self-explanatory. A bit of intro: Ethereum now has PoS as its native consensus algorithm, which is great, because verifying the consensus is finally doable — before, verifying Ethash, the hashing function used for proof of work, was very memory-intensive, so it was essentially impossible to do inside a SNARK or on chain directly. We also now have the fork-choice rule called LMD GHOST, which is implementable — but doing all of this directly on chain is pretty expensive, so ideally we wrap it inside a SNARK. There is another downside, though. On the trust assumptions: you verify the consensus directly, so it's fine — you don't really introduce any new trust assumptions. But the biggest downside is that generating the proof takes time; so this approach is feasible, but compared to messaging the latency often ends up almost the same, you pay a lot in proving time, and it requires more advanced infrastructure.

The last approach — the one we actually use — is something we call an optimistic relay based on MPC; MPC stands for multi-party computation. Before I explain how it works, let me explain the image; I hope it's self-explanatory. It's an MPC protocol: we have multiple parties, these parties attest to something, then we have an observer who can challenge it, and finally, once everything is fine, the commitment is delivered to a specific chain — in this case Starknet. How does it work? We have a set of trusted relayers — validators, whatever you call them — and they attest that a specific commitment is valid. If we want to get the commitment — i.e. the block hash of block number x — onto Starknet, then instead of sending a message, which would be delayed, we can make an off-chain call, fetch the latest one, and relay it directly to Starknet. But it comes with a few downsides, because we do introduce some trust assumptions — though it's still OK.
How does it work in practice? We have a bunch of off-chain actors who make these calls, and it works more or less like a multisig. The reason we use MPC is that the more validators you have, the more security — but in a standard multisig approach, more validators also means more signatures: more decentralized, yes, but more expensive to verify, because you need to verify multiple signatures and post them all, and that's a lot of calldata. Such an approach is not feasible on chains where calldata is expensive — so on L1 and on optimistic rollups. So what is the MPC part actually doing? It's very simple: it produces a single signature, over a specific curve, on a specific payload — and the payload is the commitment itself. That's it; that's how we attest.

Now, why is this approach called optimistic, and why is it still secure? We just posted something on the actual L2 — and, as you may know, we can send messages from L1 to L2, and such a message can contain the proper commitment. So even if the validator set lies, L1 will never lie: you can always challenge the claim. Participating in checking these validators is super easy — literally two RPC calls: one checks the actual commitment on the source chain, the other checks the claimed commitment. If you disagree, you send the message; it costs roughly 60k gas, and anyone can do it. And the fraud-proving window is pretty short, because it's essentially how long it takes to generate the proof of consensus, if that's possible, or how long it takes to deliver the message. What's pretty cool about this approach is that it's not gas-intensive: we verify just one signature.

So those are the approaches; let's make a recap and identify the trade-offs. We have three: messaging, validating the consensus, and this optimistic relay. I categorize them along four axes: latency, gas cost, trust, and off-chain computation overhead — why list that last one? Because if we do some sort of proving, generating the proof obviously takes time.

Messaging: on latency, we're quite sad — the message needs to be delivered, and by the time it reaches the L2, L1 will already have produced new blocks, so we don't have access to the newest values. On gas cost, it's not bad but not perfect, because we need to interact with two chains: send the message, then consume it. On trust, we're pretty happy: we trust the rollup itself, which is a fair assumption. On off-chain computation overhead, very happy: there is no computation to do off chain.

Verifying the consensus: on latency we're sad, because we need to generate the proof and that takes time. On gas cost, I'd say sad, because verifying an actual ZK proof is much more expensive than consuming a message or checking a signature. On trust, we're happy, because we verify the consensus itself. And the off-chain computation overhead is significant, because we need to generate the proof.
The final approach, the optimistic relay: on latency, we're happy — we simply make a claim and post it on the other chain, that's it. On gas cost, very happy — we verify just one signature. On trust, we're not that happy but not that sad either, because the claim can still be challenged in an optimistic manner using a fraud proof. On off-chain computation overhead, pretty happy: we run an MPC protocol, so the overhead comes mostly from communication, not computation. Cool — that's part number one, these three approaches. Obviously I'm not going to say which one is best, because all of them come with trade-offs.

OK: accessing the headers. I hope this is self-explanatory, because we literally unroll everything from the trusted input, and the trusted input is again a block hash for some specific block x. If you follow the earlier slides: given a block hash you can verify the block header; knowing the block header you can read the parent hash; and knowing the parent hash you can verify the previous block header — and so on until the genesis block. So given this very small input, we can essentially unroll the state, or whatever was present on the chain, from that block all the way back to genesis. As I said, I'll explain everything on the example of Ethereum: today, all the block headers together are roughly seven gigabytes of data, so it's quite a lot. That's the high-level concept — now, what are the approaches?

The first one we call on-chain accumulation: we do this computation directly on chain. We provide all the properly encoded block headers in the calldata, together with the block hash received as the trusted input (via a message, an optimistic relay, or consensus validation), and we recursively go through all these headers and verify them. But there are many downsides: it's very calldata-intensive and very computationally intensive, and while we could store all these headers on the chain itself, even on an L2 storing seven gigabytes is a significant cost, because L2 state is reflected as calldata on L1 — so it's expensive either way. The cool thing, though, is that you get direct access to the state roots, or anything else you want.
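A simplified sketch of that on-chain unrolling, assuming headers whose RLP payload is longer than 255 bytes (true of real Ethereum headers), so the parent hash sits at a fixed offset; a production version would use a proper RLP decoder:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract HeaderAccumulatorSketch {
    // rlpHeaders[0] must hash to trustedHash; each following entry is
    // the previous header's parent, walking backwards in time.
    function walkBack(bytes32 trustedHash, bytes[] calldata rlpHeaders)
        external
        pure
        returns (bytes32)
    {
        bytes32 expected = trustedHash;
        for (uint256 i = 0; i < rlpHeaders.length; i++) {
            bytes calldata h = rlpHeaders[i];
            require(h.length > 36, "header too short");
            require(keccak256(h) == expected, "hash mismatch");
            // parentHash is the first header field: after the 3-byte
            // list prefix (0xf9 + two length bytes) and the 1-byte
            // string prefix (0xa0) come its 32 bytes, i.e. bytes [4..36).
            bytes32 parent;
            assembly {
                parent := calldataload(add(h.offset, 4))
            }
            expected = parent;
        }
        // The chain is now verified from trustedHash back to this
        // ancestor's hash.
        return expected;
    }
}
```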
The next approach is on-chain compression. We can use the same procedure as before — literally unroll and process the seven gigabytes of data — but instead of storing the headers, we just update a Merkle tree. It's a nice approach, but it again has downsides. It's very computationally intensive: with millions of headers we need to perform millions of hashes on chain, and that's expensive — but at least we save on storing data. We also need to update the Merkle tree, which is another cost. The last downside is that we need to index all the headers that have been processed. Why? Because if I later want to access a specific block header, I need to provide a Merkle path — since we only store a root in the contract and keep updating the tree, I need to know the path, so I need to index the data, and at the moment I want to access something, I provide the Merkle path. This approach is OK — I wouldn't say way better than the previous one, but it's way cheaper.

The last approach: there is a very cool primitive called Merkle Mountain Ranges — love it — and the idea is to do the same thing we did previously, but inside a SNARK. We provide this tremendous amount of data as a private input to the circuit and do the same computation — the unrolling — inside the circuit itself. Now we have a public input, which is the block hash: the commitment from which we unroll, the trusted input. The public input can literally be asserted during the on-chain verification. And as we unroll, we accumulate into a Merkle tree or a Merkle Mountain Range. Why is a Merkle Mountain Range cool? Imagine processing seven gigabytes of data in one go — the proving time would be horrible. And why would you even prove these commitments for the entire history? Do you really need that? Probably not. So let's chunk it into smaller pieces, and Merkle Mountain Ranges are a cool primitive that allows exactly this. To give you a bit of intuition: think of it as a tree of trees. Once we do all this proving off chain, we simply verify the proof on chain — verifying the proof is way cheaper than doing the computation directly on chain — and we still just provide a Merkle path, and that's it: we have access to any piece of data we want.
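To give the "tree of trees" intuition some shape, here is a toy accumulator in the spirit of an MMR — a forest of perfect trees maintained like a binary counter, with the peaks "bagged" into one root. Real MMR constructions differ in important details; this is only the intuition:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract ToyMountainRange {
    // peaks[h] is the root of a perfect tree over 2^h leaves (0 = absent).
    // Assumes nonzero leaves, since 0 doubles as the "absent" marker.
    mapping(uint256 => bytes32) public peaks;
    uint256 public leafCount;

    function append(bytes32 leaf) external {
        bytes32 node = leaf;
        uint256 h = 0;
        // Carry step, exactly like binary addition: merge equal-height
        // trees until a free height is found, so appending stays cheap.
        while (peaks[h] != bytes32(0)) {
            node = keccak256(abi.encodePacked(peaks[h], node));
            delete peaks[h];
            h++;
        }
        peaks[h] = node;
        leafCount++;
    }

    // "Bagging the peaks": fold every peak into a single commitment.
    function root() external view returns (bytes32 acc) {
        for (uint256 h = 0; h < 64; h++) {
            if (peaks[h] != bytes32(0)) {
                acc = keccak256(abi.encodePacked(acc, peaks[h]));
            }
        }
    }
}
```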
Let's do a recap again. Approach number one, on-chain accumulation; two, on-chain compression; three, off-chain compression. Three categories: prover overhead, gas cost, and storage cost — actually, gas cost should really read computational cost. Prover overhead: for on-chain accumulation, do we prove anything? Not really, so we're happy. On-chain compression: we still need to update the Merkle tree — actually, I think there's an issue on this slide, so I'll skip that cell. Off-chain compression: we're very sad, because we need to prove a significant computation, so the proving time is significant. On gas cost, the first approach is horrible — we do the entire computation on chain, so it just costs a lot. On-chain compression: a bit happier, because we do a bit less, but it's still a lot of calldata and a lot of computation — at least not much storage. Gas cost for off-chain compression: we just verify a proof, so it's cool. Storage cost for the first approach: seven gigabytes of data — horrible, very sad. Storage cost for on-chain compression: we only store the root of a Merkle tree, so we're happy. And in the off-chain case we're even happier, because we again just keep updating the tree, and the calldata we post is literally just the proof. But again, I don't want to say one of these approaches is the best one — as you see, there are trade-offs.

This next part is actually pretty easy. As you may have noticed, I was explaining the second step of dealing with storage proofs; now there's the last part, which is verifying the proof itself. Approach number one: verify the proof directly on chain. Approach number two: verify the proof inside a SNARK, then verify the SNARK. Approach number three: verify multiple proofs inside one SNARK, then verify the SNARK — and we can aggregate multiple SNARKs together, and so on, but obviously there are trade-offs, especially in proving time. Why is the first approach feasible on ZK-rollups? On Starknet, for example, calldata is very cheap, and calldata is exactly what we want to avoid with these proofs — so this approach is feasible on Starknet. But if you want to verify a proof on Optimism, where calldata is very expensive, you want to reduce it as much as possible, and for that reason you might want a SNARK. Finally, if you have many slots to prove, why not verify them all inside one SNARK? You pay in prover time, but you present just one proof at the end — so this approach is the cheapest, but only if you have multiple things to prove.

So there are trade-offs; let's identify them. Categories: prover overhead, latency, verification cost. Verify the proof directly: prover overhead doesn't exist, latency doesn't exist, because we don't prove anything; the verification cost is significant, because we need to post calldata and do the actual computation — walking the entire path, where each step is one hash. Oh, let me get back to the previous slide — I forgot something important. Why wrapping inside the SNARK matters: if you're dealing with a storage layout that uses a specific hashing function — say Pedersen, which is not available on the EVM (there's no precompile, and implementing it would be costly) — then, since Pedersen is pretty SNARK-friendly, you can do the work inside the SNARK, verify just the SNARK on L1, and abstract the hash away, which is way, way cheaper. But again: trade-offs. Back to the recap. The snarkified proof: prover overhead exists, so we're not super happy; latency, also not happy, because we actually spend time proving; verification cost, happy, because we only verify a proof. Snarkifying multiple proofs: the prover overhead is still there; the latency is still there, even bigger, because it takes longer to prove; and the verification cost makes us super happy, because we mutualize the cost of verifying multiple proofs by verifying one single SNARK proof.

OK, we went through quite a lot; let's put it all together. Imagine we have three chains and we want interoperability between them: chain Z, chain X, and chain Y. It all starts with a message — i.e. a commitment: we send the message in order to get the commitment. Say we send a message from chain Z to chain X, because on chain X we want to access the state of chain Z. Once we have the commitment, we literally recreate all the headers using one of the three approaches, and once we've recreated the headers up to the point for which I want to prove the storage, I just verify a proof — and again, for verifying a proof there are multiple approaches. Now say that on chain Y I want to access the state of chain Z, and there is no direct communication between chain Y and chain Z — so it must be routed through chain X.
By the way, I'm talking about this in a pretty abstract way — by chain X I just mean an L1 here. From chain X I'm going to send the commitment about chain Z as a message, and then simply recreate all the headers again. As you may notice, it's pretty redundant: we perform the same computation on two different chains, and we don't need to — especially if we use the third approach, generating the proof off chain. But now there is another problem: how do you actually know what you should do? You need to be somehow aware of what is happening, and for that reason we introduce an API. We don't expect developers to deal with all this complexity of choosing the right approach for the right thing; right now our API optimizes cost-wise, and soon we'll be able to optimize latency-wise. That's about our API — I highly encourage you to check it out. A few final words about it: it acts as a coordinator, and it optimizes costs because we can batch multiple things together. Once a job is done, you get a notification — via a webhook, via an event, whatever you want — so you don't need to be an infrastructure maintainer; you can just focus on building on top of this primitive. And I think that's it — questions?

On interfaces: the API is essentially a REST API for now; we'll also have a JSON-RPC interface. We have off-chain entry points, so you can request the data by making an off-chain call — calling the REST API or a JSON-RPC method — or, if your smart contract wants to access this data, you just emit an event: we catch the event and, after a bit of time, feed the specific data into the smart contract. So we have a bunch of interfaces, and speaking of the off-chain entry points: once the work is done on our side, you get a notification — it can be a webhook, we can send the information over a WebSocket — essentially whatever you want.

On different chains: oh yes, that's actually a great question. Different chains use different storage architectures, I would say: they might commit to a Merkle Patricia tree, a plain Merkle tree, maybe even a Verkle tree, and like I said, having one generalized verifier per type is not a clean approach. So we abstract it with a SNARK: inside the SNARK itself we do the proper work — we go through the tree, through the elements of the proof — and we can use the specific hashing function. For example, Poseidon is pretty popular now — I think Scroll uses Poseidon, and zkSync uses Poseidon too — and on the EVM, computing Poseidon would be pretty expensive, so you cannot verify such a proof directly. But what you can do is run the entire verification inside the SNARK, and then on L1 you don't really care what the SNARK is doing — you just verify it. That's how we deal with it: if we need it abstracted, we abstract it; if we don't, we just don't.

On use cases: oh yes, that's a good question, because I think this went super technical.
So, what do we actually do with it? At Herodotus, every two weeks we have internal hackathons, and right before the Merge we built a proof of concept we call the merge swap. Essentially we allow anyone to bridge their ETH from proof of work to proof of stake, and the way it works is that we literally built a bridge on top of this technology. The bridge works like this: you lock your ETH inside a smart contract on the proof-of-work chain; you prove on Ethereum proof of stake that you've done it; once the proof is verified, you mint your ERC-20 token and can do whatever you want with it; and if you want to withdraw back to Ethereum proof of work, you burn it, prove the fact that you burned it, and withdraw on the other side. In terms of other use cases, I think cross-chain collateralization is pretty cool, because that's a place where you want to avoid latency as much as possible and be as synchronous as possible — and that's essentially what we provide here: our latency comes only from the proving time, and with the optimistic approaches and so on there is a lot we can do. I hope that answers the question. OK, I think that's it — I have about three minutes left, so I guess we can wrap it up. Thanks!