If you're in the wrong room: this talk is about testing the merge with parallel universes. My name is David Searle and I lead a team at a small startup that's been working with the Ethereum Foundation for just over a year now. Who likes bugs? I brought some bugs along with me today, so you're going to have to be good at catching. Who would like a bug? I've got some bugs I'd like to give to various members of the audience. Anybody like one? Okay, here we go. This is why you should start with the front. I've got three more to go. There we go.

We've been spending the last year really stress testing. The talk is about testing Ethereum, and it looks at a pretty sophisticated and unique testing methodology from a company called Antithesis, which I belong to. The company uses deterministic simulation techniques that let us explore a whole host of iterations of complex, distributed technology running in a simulated environment. Let me just get my little clicker. I brought a few stars along to the show. First, I'll explain who I am. Little did you know that Doctor Strange and Cher are also going to participate in this presentation. I've got about 25 years' worth of experience in the tech industry. For the last two and a bit years I've been working for this business, Antithesis, and they've made me into a bug hunter by trade. I never thought testing could be fun, but there's nothing like hitting a segfault and getting a real kick out of it. That's what I do on a day-to-day basis. This is my house in the UK; you can tell from my accent. As you can see, I've also got a few GPUs to sell. If you're interested, please let me know. No joke.
The day of the merge came round, and we had been doing a lot of work with all the various clients, all that hard, arduous testing, making sure that we caught every last bug. What was going to happen? Was it really going to be smooth sailing? I think we saw this morning that it was, which is great news. I was wondering if we were going to see this. You have to press play on the video. There is sound as well. I couldn't believe it when I saw this, but it's real. It's pretty bad. Not my front garden. Testing is really important, and the merge was a significant event. That's why there was so much celebration across the world when the merge took place. Congratulations to anybody who was involved in the merge; it's a phenomenal job, and I think we can hold our heads up high.

I say testing is hard. You only have to go on Quora and ask what makes software testing difficult. It's a bit of a high-level question, apologies for that, but two answers really struck me. The first: fundamentally, this is an impossible requirement, because absence of evidence is not evidence of absence. We often run simulations in parallel universes hitting a whole host of code, and you ask, is that good? Who knows? We can only really measure how much coverage we've got, and it may be good, it may be bad. That's a phenomenal requirement. The second is about understanding: testers have to think about all the possible scenarios where issues may arise and ensure they are handled by the code. How do you even approach that with a distributed architecture like Ethereum's? We've got a whole host of different clients using different architectures; we've got the execution layer and the consensus layer. It's amazing to see it actually working, but the complexity of it is just phenomenal.
If we look at what Ethereum has done to approach this task, and this is not looking specifically at Antithesis, since we are just one part of the equation and I think very much a complementary one: Ethereum has used unit testing and testnets. We've had a host of shadow forks and testnet merges. There are technologies called Hive and Kurtosis that bring similar pieces to the equation. Then we have Antithesis doing the deterministic piece, which allows us to really get into the detail of all the different iterations that are possible. On top of that, there's a host of different fuzzing technologies that Ethereum uses to try to isolate and hammer areas of concern. Throughout the entire year a huge amount of testing has been done, and static analysis has also been used for code audits, making sure things were caught before the merge took place.

Let me step back a little to the "testing is hard" statement. It is a phenomenal obstacle to get over. How do we find every iteration, and how do we find those hard-to-reach bugs that may not be common, but will be catastrophic if they do occur? If we look at the right-hand side, this is a representation of a distributed network. We'll say this node is running Lighthouse and Geth. They're running on an operating system, leveraging a CPU and a file system; they've got different processes happening, and inside those processes different threads being used as well. Even within a single node there's a lot of complexity going on. We can test a node, maybe a combination of Lighthouse and Geth working together, and make sure we're pushing the boundaries.
If we then think about how we have a network, a whole host of different nodes running together, perhaps with underlying pieces like databases on top of that, the complexity of all those communication channels operating means the search space for hidden-away little gems is enormous. I call them gems, but they're bugs, basically. They can hide away for a long time. You may never reach that part of the search space in your testing; you may never find that particular condition where the network was slow and there was a bit of thread pausing going on in a certain node. A whole series of things has to line up for a bug to manifest. This is really where we step in, because we have a complete picture, and this is what we build inside our simulation. We have the ability to run the entire collection of consensus clients and execution clients in a simulation, run it, and understand exactly every outcome we can see. That's pretty intense.

Fundamentally, how do you do that in a deterministic fashion? If we find a certain situation where Lighthouse crashes, can we represent that again? Can we replay it? That's the first thing you do, right? Can we reproduce it? Often bugs are not reproducible: you try something, and they move around. How do we debug this? Again, this is where we step in. What do we do? We leverage the ability to auto-generate networks inside the simulation environment and stand up all the containers necessary to bring up Ethereum. We then use network faults and other types of faults to inject chaos into that running environment. Obviously it's great to see the software handle that; it's designed to handle these types of things. But under duress, we see things going wrong. We use fuzzing technology to hit the entire system. Not just individual pieces of software: the entire system is fuzzed.
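To make the "can we replay it?" idea concrete, here is a minimal sketch of why deterministic simulation makes a whole run reproducible. This is not Antithesis's actual implementation; the fault names, node count, and `simulate` function are invented for illustration. The key property is that every random choice flows from one seed, so the entire fault schedule is a pure function of that seed.

```python
import random

# Hypothetical fault palette; real harnesses have many more.
FAULTS = ["partition", "packet_drop", "delay", "node_kill", "thread_pause"]

def simulate(seed, steps=10):
    """Run a toy 'network' for `steps` ticks, injecting one fault per tick.

    Because every random choice comes from random.Random(seed), the full
    sequence of injected faults is a pure function of the seed."""
    rng = random.Random(seed)
    trace = []
    for tick in range(steps):
        fault = rng.choice(FAULTS)       # which fault to inject
        target = rng.randrange(4)        # which of 4 simulated nodes it hits
        trace.append((tick, fault, target))
    return trace

# Replaying with the same seed yields an identical fault trace, which is what
# makes a crash found under one schedule reproducible on demand.
assert simulate(42) == simulate(42)
```

In a real deterministic simulator the same principle extends to network timing, thread scheduling, and disk behaviour, so a segfault found once can be replayed exactly.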
That allows us to deterministically replay the complete orchestration of a situation. Not just one particular application: we know that in this combination of different clients, under these network conditions, something goes wrong. We use strategies inside the system to seek out rare events. When I look at the numbers of what we've been doing over the last year, we hit a huge number of code edges. A huge amount. How do we seek out those edge cases that, if they do happen, can be pretty catastrophic? How do we get to them and uncover them to help the client teams debug and fix? We have all this wrapped up in a tool set that is available to us, and we share it with the rest of the client teams and the EF, and that's been really useful.

So we have the individual testing going on, which is fantastic; we continue to promote that and say keep going, and I think we can all agree it has been tremendously successful and useful. Then we have our parallel universes. We run not just one simulation; we're running simulations literally every day, generating and exploring the various search spaces that exist across any number of different branches inside the repositories. What does this mean for us? There's a little help from Doctor Strange here. Hopefully you've seen it. It's not playing. This is what I look like when I'm doing the work. "I went forward in time to view alternate futures, to see all the possible outcomes of the coming conflict." "How many did you see?" "14,000,605." For us, winning is finding these edge cases. Winning isn't "all tests passed". We want to find those intricate combinations and iterations where we've covered 14 million different scenarios and we've got, maybe not more than one, but at least one we can hang our hat on. That's an example of what we're doing.
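The "seek rare events" strategy mentioned above is, in spirit, coverage-guided search: keep the action sequences that reached a state you had not seen before, and extend those rather than starting from scratch. Here is a toy sketch of that idea; the state machine, scoring, and `explore` function are all invented for illustration, not the real system.

```python
import random

def step(state, action):
    """Toy system: most actions reset to zero; a rare chain of 'rare'
    actions climbs toward the hard-to-reach state 3."""
    if action == "rare" and state < 3:
        return state + 1
    return 0

def explore(rounds, seed=0):
    rng = random.Random(seed)
    seen_states = set()
    corpus = [[]]                        # action sequences worth extending
    for _ in range(rounds):
        parent = rng.choice(corpus)
        seq = parent + [rng.choice(["common", "rare"])]
        state = 0
        for a in seq:                    # replay the sequence from scratch
            state = step(state, a)
        if state not in seen_states:     # new coverage: keep this sequence
            seen_states.add(state)
            corpus.append(seq)
    return seen_states

# Guided extension of promising sequences reaches the rare state 3 far faster
# than uniformly random sequences would.
states = explore(2000)
assert 3 in states
```

The real strategies are far more sophisticated, but the shape is the same: bias exploration toward the parts of the search space that keep revealing new behaviour.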
We think that's a pretty good analogy. If we move forward, here we are: this is what it looks like in our world. This is an example of the output of just one run. We ran this for 13 hours of wall-clock time, which is to say, if you look at the wall and there's a clock on it, it lasts 13 hours, but it actually allowed us to exhaust 536 hours' worth of testing. You can see here we're talking about an enormous number of edges being seen. Branches are how we capture decision points: an if-then-else statement inside the code is a very simple example. We can branch off at that point, exhaust both avenues, and actually see what happens in both situations under those test conditions. That's a branch, and the code edges seen come from the instrumentation, so we can see which functions and pieces of code are actually being executed. Pretty insane: just one run, and we've got 180,000 edges being seen across the entire network. It's a busy network, and with all the different clients running, there's a lot going on.

We see ourselves as a complementary piece. This has been a great example of a project where we can work alongside all the other pieces the EF and the different client teams are using. We're just another layer in a growing list of strategies, and I think that's testament to the approach the EF has taken to really drive this resilience: making sure we take advantage of every way to ensure nothing untoward happens in the future. More upgrades are coming, I'm sure; we've got EIP-4844 around the corner, and it's certainly something to look at, making sure we're involved at every step of the way. This is a very simple illustration of what we've been doing: building all of the clients, putting them in place, and actually establishing Genesis.
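To pin down the difference between a branch and an edge as used above: a branch is a decision point in the code, and an edge is an observed transition between two instrumented points. This toy sketch (the `probe` mechanism and function names are invented; real instrumentation works at the compiler level) shows how exercising both sides of one `if` covers both outgoing edges:

```python
edges_seen = set()
_prev = [None]          # last probe hit; persists across calls

def probe(label):
    """Record the edge (previous probe -> this probe)."""
    if _prev[0] is not None:
        edges_seen.add((_prev[0], label))
    _prev[0] = label

def classify(n):
    probe("entry")
    if n % 2 == 0:      # a branch: two different edges can leave "entry"
        probe("even")
        result = "even"
    else:
        probe("odd")
        result = "odd"
    probe("exit")
    return result

# Exercising both sides of the branch covers both outgoing edges.
classify(2)
classify(3)
assert ("entry", "even") in edges_seen
assert ("entry", "odd") in edges_seen
```

Counting distinct edges seen across a run is how a number like "180,000 edges" is arrived at, just at the scale of a whole simulated network rather than one function.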
We have been using this Genesis block and then moving forward towards the merge; in our world, we would have the merge positioned. We establish a chain and then start fault injection, to the point where we've got a whole host of different faults running. You can see them labeled below: anything from partitioning, which is a great one for encouraging forking of the chains, to delays and dropped packets. We can stop the nodes themselves. We put this thing through its paces: we stop things, we pause things, we kill things, we bring them back up. In some respects it's quite realistic, since in the real world things do get rebooted and validators come back up again. We have all of that happening inside the simulation, plus any number of threads being paused and released. It's busy. Every configuration of faults that we put into a simulation is completely deterministic. If we know a run ended in a segfault, the environment is basically a pure function of the entire system, and we know we can get the exact same output again, which I think is a tremendous piece of technology to have under the hood.

What are the numbers like? In one year we've conducted 31 years' worth of 24/7, non-stop testing. That has processed and explored over 50 million code edges. Out of that we've had 45 assertion violations, some of which were catered for, but 33 of them were logged as bugs. They were pretty catastrophic.
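The fault palette just described (partition, delay, drop, stop, pause, kill, restart) can be pictured as a schedule of timed events applied to nodes. This is a toy model only; the `Node` class, client-pair names, and schedule format are invented, and link-level faults like partition and delay are deliberately left out of the node state here.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.state = "running"

    def apply(self, fault):
        if fault in ("kill", "stop"):
            self.state = "down"
        elif fault == "pause":
            self.state = "paused"
        elif fault == "restart":
            self.state = "running"
        # partition / delay / drop act on links, not node state, in this model

def run_schedule(nodes, schedule):
    """Apply (tick, node_name, fault) events in tick order; return final states."""
    for tick, name, fault in sorted(schedule):
        nodes[name].apply(fault)
    return {n.name: n.state for n in nodes.values()}

nodes = {n: Node(n) for n in ("lighthouse-geth", "nimbus-besu")}
schedule = [
    (1, "lighthouse-geth", "kill"),      # node goes down...
    (2, "nimbus-besu", "pause"),
    (3, "lighthouse-geth", "restart"),   # ...and is rebooted, as in real ops
    (4, "nimbus-besu", "restart"),
]
final = run_schedule(nodes, schedule)
assert final == {"lighthouse-geth": "running", "nimbus-besu": "running"}
```

Because the schedule is plain data, the same sequence of faults can be fed back in verbatim, which is what the pure-function property above buys you.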
These were things that would really bring an actual node down: panics, segfaults, nil-pointer exceptions. There's one I'll bring up in a minute with a little example. Things that you don't want living in your code base. We would forward those through to the different client teams, and they've been able to eradicate them, and through the successful merge it's been great to see our efforts put to the test.

Who wants an example? I'll give you some. Here's the output of one of the runs; this would come through in an email. Hopefully you can see it. It's a bit of an odd chart, with a lot of material on the right-hand side, but down here you can see a whole host of assertions that we've got running across the entire network. This one is flagged as a fail: we've got a fail on a segfault. This is absolutely real: "illegal storage access, attempt to read from nil?". You get some really interesting ones, like "please report this, please fix this". So this is an example, and we're asking, what do we do now? How do we take it to the next step? It's all very well saying you've got a problem. You can see here this is for the node that's running Nimbus and Geth, so we know there's an issue there. Okay, fine: have we got any other occurrences on any of the other combinations? We do.
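Checking for "other occurrences on other combinations" is, in spirit, a deduplication problem: reduce each crash to a signature and see which client combinations share it. This sketch is invented for illustration (the crash data, frame names, and signature scheme are not from the real reporting system; real deduplication is typically fuzzier than a fixed frame count).

```python
from collections import defaultdict

def signature(stack, depth=2):
    """Use the innermost stack frames as a stable identity for the crash."""
    return tuple(stack[:depth])

def group_crashes(crashes):
    """crashes: list of (client_combo, stack). Bucket combos by signature."""
    groups = defaultdict(set)
    for combo, stack in crashes:
        groups[signature(stack)].add(combo)
    return groups

crashes = [
    ("nimbus-geth", ["decodeJson", "handlePayload", "main"]),
    ("nimbus-besu", ["decodeJson", "handlePayload", "run"]),
    ("prysm-geth",  ["applyBlock", "processSlot", "main"]),
]
groups = group_crashes(crashes)

# The same signature across two combos suggests a cross-client bug, not a one-off.
assert groups[("decodeJson", "handlePayload")] == {"nimbus-geth", "nimbus-besu"}
```

A crash that shows up under one client pair might be an environmental fluke; the same signature across several pairs points at shared code or a protocol-level problem.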
Let's look at an example of an actual log entry coming through. This is a unified log: the serialized activity occurring across the entire simulated environment, which is a bit of a mouthful. In the third column, the blue column, you can see all the different combinations. We've got Clyde, we've got Prysm, Clyde and Erigon, Nimbus and Nethermind, Nimbus and Besu, all serialized, and you can see we've got a concept of time running through it. This blown-up piece is the stack trace showing us what is happening. Nimbus is having some issue: it's receiving some JSON, trying to figure it out, and clearly not doing too well, because it basically crashes the entire node. So we go, okay, this is cool, there's something of interest here. Another example here, this one from Prysm. I don't love seeing panics, but it does show there's something of value here, that we're really stressing the entire environment. Again: invalid memory address. Okay, cool. What do we do now? Do we just send it to the client team? Well, yes, you can, and they might say, I wrote that code, I can see what's going on here. But Cher had to do it, didn't she? If she could turn back time. The great news about deterministic environments is that you can turn back time. So we jump into the actual environment and ask: when does this manifest? It crashes; was it a second ago, five seconds ago? What happened in the execution path of all the different operating nodes, with all the fault injection going on? Where do we see this thing manifest? We want to look back so many seconds. We may want to turn on packet capture, which isn't on by default, simply because of the amount of data; we want to be quite selective about where packet capture happens. Then we can look at the data and rerun it. Obviously, with determinism, you worry a little: will it still be deterministic if I suddenly turn on packet capture at a moment in time? But what we've seen is that, generally, once something has manifested, we can actively turn things on to help debug and work out what's going on. There's some really interesting stuff coming next year which I can't share, but it's really cool.

So this is what we can do today, and this is what we've done with the EF. Look at this mountain of data: all the different scenarios, branches, and paths of execution that have happened, all available in one massive data set. We can run this kind of analysis, with time on the X axis versus the probability of the bug occurring on the Y axis. The top right is where the bug is seen. We can look at the common routes across all the data and bring them together to see where the execution goes, and where the probability of this actually becoming a bug suddenly changes. And lo and behold, we have a huge jump here, from literally 0.05% probability of the bug occurring to over 50%, this jump in the middle. So we know how many seconds to go back; now we can replay that simulation, turn back time (I won't sing it), and see what's going on. Otherwise you're looking for a needle in a haystack. That has been really valuable in our efforts.

What's next? Well, the merge is great, and we're doing some post-merge testing on some stable branches; we've got things in fairly good order there. But we know Ethereum is not standing still. We've obviously got EIP-4844 just around the corner, danksharding, withdrawals, a whole host of new capability: brand-new product, brand-new code, brand-new testing. One piece in there that's interesting is using things like malicious clients. What if I brought up an environment with a client that is not doing what it should be doing? Can that cause issues? How does the code in the rest of the network handle that client, and does it keep operating? There are other pieces here too, like merge code cleanup: there are changes to published pieces of code that are in place now, and all those changes introduce problems that could appear downstream. So, pretty cool. We're working actively with the various clients and open to broadening that relationship further. That's the end of the talk, so, Q&A. Any questions?

"You showed us the probability of finding a bug. How did you calculate that probability? Which model were you using, and how complex are the calculations?" That's a good question. Because of the amount of data we've got available, we know every execution path that occurs without the bug, and we have all the different outcomes that do translate into the bug happening. So with some fairly simple calculation we can see the path that's trodden and worn: if we see this happening, we can calculate the probability of the bug occurring. There are obviously other branches off that well-trodden path where the execution diverges and the bug doesn't manifest. That's how, for every point in that graph, we can calculate whether it does or doesn't contribute to the bug.

"How do you isolate unique bugs, so you know it's the same bug?" That's a good question. We see a huge number of bugs that manifest and show duplication, like the one you saw with Nimbus and Geth. We run that literally every day against the right commit, which is in the history, so we'd see that occurrence again. We can see very quickly that this isn't just an edge case that happened on one combined client set; it's a problem across the board. We have all of that wrapped up in our reporting.

"What do you think about applying formal verification in this huge environment? Do you think it's a possibility?" I'm actually not the best person to answer that question, but bring it to us at the end, because I've got some people who can answer it for you. Please reach out to us and keep the conversation going, and I hope you enjoyed the talk.
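As a footnote to the probability question in the Q&A above, a toy version of that calculation might look like the following. The run data, event names, and the idea of conditioning on single events are invented for illustration; the real analysis conditions on execution paths over time, but the arithmetic is the same simple frequency counting described in the answer.

```python
from collections import defaultdict

def conditional_bug_probability(runs):
    """runs: list of (events_in_run, hit_bug).
    For each event, estimate P(bug | event was seen in the run)."""
    seen = defaultdict(int)
    buggy = defaultdict(int)
    for events, hit in runs:
        for e in events:
            seen[e] += 1
            if hit:
                buggy[e] += 1
    return {e: buggy[e] / seen[e] for e in seen}

# Fabricated run records: which events occurred, and whether the bug fired.
runs = [
    ({"start", "sync"}, False),
    ({"start", "sync"}, False),
    ({"start", "partition"}, False),
    ({"start", "partition", "reorg"}, True),
    ({"start", "partition", "reorg"}, True),
]
probs = conditional_bug_probability(runs)
assert probs["start"] == 2 / 5   # baseline: the overall bug rate
assert probs["reorg"] == 1.0     # the "jump": reorg always precedes the bug here
```

The jump from the baseline rate to a much higher conditional probability is exactly the kind of signal described in the talk: it tells you which point in the timeline to rewind to before replaying the simulation.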