Alright, so next up we have Marius van der Wijden, who is on the Geth team and works on fuzzing and testing, among other things. I know Danny Ryan likes to take all the credit for the merge, but these guys played a crucial role as well, and he's going to talk about some of the bugs they found and some of the things that could have gone horribly wrong, but didn't. Thank you.

Yeah, I was actually surprised by how little went wrong with the merge, and I'm really happy to stand here today and say that we didn't kill ETH. But we tried to, and this presentation is about some of the strategies we used to try to kill ETH. A big part of it is about bugs we found: interesting issues that popped up during merge testing and, before that, during our normal testing.

Today I'm going to talk a lot about consensus issues. Consensus issues are differences between an implementation and the spec. I think that's the big thing about Ethereum: we have a specification and we have multiple clients implementing that specification. For the execution layer we have four different clients that implement exactly the same thing, and we need to make sure that they really do exactly the same thing, because if one implementation accepts a transaction that the other implementations reject, the chain will split in two, and we want to prevent that.

Over the years we developed a bunch of testing strategies for exactly these problems, and the newest one, developed for the merge, is shadow forks. We take a copy of the blockchain, configure some of our nodes with the new rules, and at the merge transition the new nodes create their own chain, running in parallel to the main chain. What is really interesting is that both chains share the same state, so all transactions that are valid on one chain are also valid on the other. This means we can run tests with mainnet load, which is very important for performance testing, and later on we'll see a bunch of issues that came up during shadow forking that we wouldn't have found otherwise.

Another big part of what we do is differential testing. Because we have these different implementations, we get a really nice property: we can verify them against each other. We create an input, give it to the different implementations, collect their outputs, and verify that everyone does exactly the same thing. There are different strategies for generating the input. We have, for example, the Ethereum tests, which are static test cases where we know the expected answer. But we also built fuzzers that generate random transactions and contracts to test the different implementations against each other.

Something new that we did during merge testing is create malicious nodes. Malicious nodes are basically forks of the client software that change some of the rules. They can insert bad transactions, set header fields to really big values or no values at all, and try to break the other nodes on the network. We have them for both the consensus layer and the execution layer. Malicious consensus layer nodes can, for example, double vote, voting on two conflicting blocks, which would get them slashed, or use fake signatures, send weird network packets, and in general try to cause mayhem on the chain.
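To make the differential-testing idea concrete, here's a minimal sketch. The EVM interface and its Execute method are hypothetical stand-ins; the real tooling runs separate client binaries and compares their execution traces.

```go
package difftest

import (
	"bytes"
	"fmt"
)

// EVM is a hypothetical stand-in for one client implementation.
// In practice each implementation is a separate binary that executes
// the input and emits a trace of what it did.
type EVM interface {
	Name() string
	Execute(code, input []byte) []byte // returns an execution trace
}

// Differential feeds the same input to every implementation and
// reports the first one that disagrees with the reference: a
// potential consensus issue. Assumes at least one implementation.
func Differential(vms []EVM, code, input []byte) error {
	reference := vms[0].Execute(code, input)
	for _, vm := range vms[1:] {
		if out := vm.Execute(code, input); !bytes.Equal(out, reference) {
			return fmt.Errorf("%s disagrees with %s", vm.Name(), vms[0].Name())
		}
	}
	return nil
}
```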
This is a non-exhaustive list of some of the testing tools we have built over the years. Goevmlab, for example, is a toolkit for EVM testing that we use to create a test case, execute it on a client, and collect the output from the client. We have Hive, a really nice continuous-integration regression-testing suite, where every night we run the Ethereum tests, a set of 48,000 test cases, against the different implementations and verify that nothing has gone wrong when they update something. Then we have the malicious nodes, on both the CL and the EL side. We work with some companies, Kurtosis and Antithesis, for fuzzing and test nets. The test nets are bigger networks that we set up, run through the transition once, verify that the transition went correctly, and then scrap and start fresh. And then we have some one-off tools. TxFuzz sends interesting transactions; it's not really for finding bugs but for creating load on the network, so when we create a testnet, one of the first things we do is use TxFuzz to hammer the nodes and see if something breaks. MergeFuzz was a one-off tool that we used to fuzz the Engine API. And beacon-fuzz is a tool that has been around for a very long time, fuzzes the different beacon clients, and has found a bunch of interesting bugs.

The rest of the talk is about some of the interesting bugs we found over the years. This first one was found by Martin right before the merge. Whenever we have a consensus upgrade, whenever we have a fork, we ramp up testing: we take out all the old tools we have and try to break things. Here the Besu client had a bug in its gas calculation. As you can see, the gas goes down and down, and at some point there's an underflow and the gas jumps extremely high. This is a consensus issue, because other clients compute it correctly, and on networks where Besu is the only client it would actually be a DoS vector: the nodes would keep running the transaction and never stop. Well, they stop once that amount of gas runs out, which is basically never.
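Here's a minimal sketch of that class of bug, with hypothetical names: with unsigned arithmetic, subtracting a cost larger than the remaining gas wraps around to an enormous number unless you check the bounds first.

```go
package gas

import "errors"

var ErrOutOfGas = errors.New("out of gas")

// UseGas charges cost against the remaining gas. Without the bounds
// check, an unsigned subtraction like 10 - 11 wraps around to
// 18446744073709551615, so instead of halting with "out of gas" the
// EVM keeps executing with a near-infinite gas budget.
func UseGas(remaining, cost uint64) (uint64, error) {
	if cost > remaining {
		return 0, ErrOutOfGas // must check before subtracting
	}
	return remaining - cost, nil
}
```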
Then we have the death of Kintsugi. We set up the Kintsugi testnet, which was the really big testnet where we also invited the community to participate in testing. On Kintsugi we ran the bad block generator on the execution layer, which mutates blocks. For example, it created a block with invalid extra data, and Nimbus was unable to sync because of it. We also had TxFuzz running, and it created a transaction containing the REVERT opcode, which triggered a consensus issue in EthereumJS.

And then we had the three-way consensus split, which actually broke the network. It was a split between Geth, the Teku-Geth combination, and Besu and Nethermind. The fuzzer had replaced a block's hash with its parent hash. Such a block should be rejected, because the hash doesn't match the payload we're given over the Engine API. This check was actually in the spec, but some of the clients didn't implement it correctly. Besu did not have the check at all. Nethermind had the check, but it also cached payloads by block hash, so it looked up the wrong block hash, saw that it was in the cache, and just assumed the payload was valid. This split the network into Geth on one side and Nethermind and Besu on the other, with Nethermind and Besu being wrong and Geth being the correct client.

We thought, okay, we'll fix it, keep the network running, and try to get the network back together. But during this time the bad block generator created another issue: a block where the block number was set to one. In Geth we have a cache to check whether we need to sync. Teku executed all the forks of the blockchain (there were around 30 different competing forks) simultaneously, which flushed that cache, so we queried the database by parent hash and block number minus one. Block number minus one was zero, the query failed and triggered another sync cycle, and because we were now trying to sync to the genesis block, which violates some preconditions, Geth panicked and the Teku-Geth node shut down. After this three-way consensus split we decided at some point that it was just too hard, and we deprecated the Kintsugi testnet.
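The check at the heart of that split is conceptually tiny. Here's a minimal sketch using go-ethereum's types of roughly what a client has to do when a payload comes in over the Engine API; the function name and surrounding plumbing are made up.

```go
package engine

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
)

// validatePayloadHash recomputes the block hash from the header
// reconstructed out of the payload and compares it against the hash
// the payload claims. Besu skipped this check entirely; Nethermind
// had it, but consulted a cache keyed by the attacker-chosen hash
// first, so a bogus hash already in the cache was treated as valid.
func validatePayloadHash(claimed common.Hash, header *types.Header) error {
	if computed := header.Hash(); computed != claimed {
		return fmt.Errorf("block hash mismatch: computed %x, claimed %x", computed, claimed)
	}
	return nil
}
```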
I already talked a bit about Hive. We use Hive to execute a bunch of tests, most of them spec tests: for the specification we create test cases, and Hive found an incredible number of bugs, so shout-out to Mario, I think he's sitting here. For example, we had a division by zero in the exchangeTransitionConfiguration call, and some rules around the timestamp didn't really work. Hive found a lot of differences between Geth and the spec.

Then we had the Testing the Merge effort, which was really nice: over 400 people got involved with testing the merge. They sent transactions on the testnets, set up nodes, reported issues, and, most importantly, created documentation. We should be better at creating documentation ourselves, but unfortunately we are not, so we kind of rely on the community to educate other people in the community. They also found some interesting issues in Go Ethereum related to how it's used, and the community found a bunch of other bugs in the different implementations.

And then we had the shadow forks. The cool thing about shadow forks is that we're testing with the real network and real data. On the first shadow fork we saw the gas limit quickly dropping. The issue was that the default gas limit in Geth was actually 8 million, and we had never caught that; we caught it for the first time during these shadow forks, because on mainnet the miners had voted to increase the gas limit. Another issue was a memory blow-up during re-orgs: there was some weird issue where a node would re-org 600,000 blocks, and this kept eating memory until the node crashed. Re-orging 600,000 blocks is not something that should actually be possible, but finding this issue showed us that the re-org procedure was taking up far too much memory, so we changed it.

Then we had the base fee endianness bug. The endianness between the execution layer and the consensus layer is different, so consensus layer clients need to change the endianness of the fields. Prysm used the wrong endianness and thus created bad blocks once the base fee went over 255, because up to 255 the value fits in a single byte, where the byte order doesn't show. There's a small sketch of this class of bug below.

I would really like to get you guys involved. Testing the Merge was a great way for the community to be part of this, and we have a bunch of new upgrades, a bunch of interesting stuff, that we also want to test. So test the Surge, test the Verge, and eventually we also want to test the Purge. This community effort makes me really happy; so many people are interested in this and in contributing their time. If you want to become part of it, please do. I want to give a special thanks to all of these client teams and the people involved in testing the merge, and a special shout-out to Martin, I hope he's watching right now, for keeping us and Ethereum stable. We're also currently hiring within the Ethereum Foundation to grow the testing team, so if you're interested in a job at the Foundation helping us test the new upgrades, contact Mario.

And since I have another eight minutes, I have a bunch of backup slides with more cool bugs and issues that we found. You might remember this one: the bug that took down Infura. The memory returned by the RETURNDATACOPY opcode was shallow-copied, so when it was modified, it broke something. We actually found the bug rather quickly and fixed it, but the fix wasn't announced; we created a new version of Geth and kind of forgot about it. Someone later found the bug and triggered it on mainnet, because they saw that only 1% of the nodes on mainnet still ran the broken version. The problem was that this 1% included Infura's nodes, so for three or four hours Infura went down. That's why you shouldn't use a centralized RPC provider, but that's another issue; go run your own nodes. And yeah, we kind of had to take the blame for this one.

This one was also really interesting. We have fuzzers that run continuously on OSS-Fuzz, and at some point we got an email: hey, we have a panic. Whenever we get an email about a panic, that's usually pretty bad, because it means someone could just send a transaction to the network and all of the Geth nodes would go down, and that's not great. The issue: you can see the comment that says the shift should equal b minus one, but the shift was actually b, not b minus one, and so that's the fix for the issue. We could trigger this bug within the modexp precompile when the modulus was extremely large.

Then we have some bugs that came in via the bounty program. We have a bounty program: if you find something, then instead of triggering it on mainnet, send us an email. We get especially pissed if you send us an email and then go trigger the bug on mainnet, so send us an email first and talk to us, and you will actually get a bounty. The bounty size was increased to 250k, and we actually quadrupled it for the merge, so there's a lot of money to be had in finding consensus bugs.

Here's one of those: when you input the int256 minimum, minus 2 to the power of 255, Nethermind's implementation of the modulo operator would negate the result, returning 2 instead of minus 2. All other nodes computed minus 2, Nethermind nodes computed 2, and so they would split off onto a different fork.
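For context, the EVM's signed modulo (presumably the SMOD opcode here) is defined so that the result takes the sign of the dividend. A minimal sketch of the intended semantics, using big.Int for readability; real clients use fixed 256-bit words and have to reinterpret them as two's-complement values first.

```go
package vm

import "math/big"

// smod sketches the EVM's SMOD semantics: x and y are signed values
// and the result takes the sign of x, the dividend. Go's big.Int.Rem
// already behaves this way. The Nethermind bug negated the result for
// a corner-case input, turning -2 into 2, which is exactly the kind
// of single-value disagreement that forks a network.
func smod(x, y *big.Int) *big.Int {
	if y.Sign() == 0 {
		return new(big.Int) // the EVM defines x SMOD 0 == 0
	}
	return new(big.Int).Rem(x, y)
}
```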
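And going back to the base fee endianness bug from the shadow forks, here is a sketch of that class of mistake, not Prysm's actual code: SSZ encodes the base fee as a 32-byte little-endian uint256, and a value up to 255 has only one significant byte, so a missing byte-order swap is invisible until the base fee crosses 255.

```go
package main

import (
	"fmt"
	"math/big"
)

// toLittleEndian32 encodes a big.Int into the 32-byte little-endian
// form that SSZ uses for uint256 fields such as the base fee.
func toLittleEndian32(x *big.Int) [32]byte {
	var out [32]byte
	be := x.Bytes() // minimal big-endian encoding
	for i, b := range be {
		out[len(be)-1-i] = b // reverse into little-endian order
	}
	return out
}

// buggy sketches the mistake: copying the big-endian bytes without
// reversing them. For any value that fits in one byte the two
// encodings coincide, so the bug only shows up above 255.
func buggy(x *big.Int) [32]byte {
	var out [32]byte
	copy(out[:], x.Bytes())
	return out
}

func main() {
	for _, v := range []int64{255, 256} {
		x := big.NewInt(v)
		fmt.Println(v, toLittleEndian32(x) == buggy(x)) // 255 true, 256 false
	}
}
```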
This one was a DoS via a malicious snap request, found by Gary and Martin from the Geth team. The issue was that someone could craft a weird GetTrieNodes packet, a network packet in the snap protocol, that requested a missing trie node, and crash a Geth node with it. These kinds of issues are not that bad, because you only crash one node. Crashes within the EVM are usually worse, because with those you can crash a bunch of nodes at once. And actually, while investigating this one, we created a fuzzer for it, and the fuzzer found a second panic.

I actually wrote my master's thesis on fuzzing EVM implementations and created the FuzzyVM program. FuzzyVM found an issue in the copy opcodes: CALLDATACOPY, CODECOPY, and RETURNDATACOPY. These opcodes consume three items from the stack: the destination, the source, and the length of the data to copy. Nethermind would halt execution if the length was zero. You can see it here with one of our tools, where we put two different traces of execution side by side and compare them: on one side the execution just halts. That's a consensus issue.

Another issue found by FuzzyVM was a denial of service in Besu. When you called the modexp precompile, Besu would read all of the parameters even if the base length and modulus length were zero, so you could supply a really, really big exponent and crash Besu nodes.

And this is the newest one we found: the 1.10.22 bug. 1.10.22 was actually the Geth version that we wanted to use on mainnet for the merge, and it contained a regression that could corrupt the local state. We have these trie nodes, which is where all the contract state and accounts are stored, and when we modify them they need to be flushed to disk in the same order in which they are inserted into what we call the dirty cache. So if you modify D, B, and A, you want to insert D first, B second, and A third. What we actually did was insert B first, then A, and when we went to insert D we saw that B was already inserted, so the code assumed that everything was correct and we never inserted D. We ended up with some dangling trie nodes.

That's it. Thank you very much for coming, and thank you for helping test the merge. I hope to see more people testing in the future. Thank you.