Okay, we are recording. Welcome to this merge testing call. I guess I have two goals, at least, for this hour. First, to get on the same page with regards to what we actually need to do to test the merge. And ideally, second, to roughly see if we can assign some groups or some folks to kind of own some of these tasks. Because I think in the coming months this is going to become the main thing we end up spending time on, so we want to make sure it happens. Mikhail has put together a great doc outlining, I guess, his view of everything that needs to be done, so that's probably a good place to start. Maybe it makes sense to start with your doc, Mikhail, but go a bit high level, right: not necessarily into every specific test, but just make sure that we all agree about the categories of tests that we need, if that makes sense, and see if there's anything that's missing. I know there was also some stuff we had on the merge readiness checklist, and I haven't cross-checked to see if everything that's in your doc is there and vice versa. But yeah, maybe you can start by just walking us through the buckets of different testing that you imagine we'll need.

Yeah, sure. So I can share my screen. Do you see it? I see it, yeah, cool. Regarding the readiness checklist: yes, these documents are not yet in sync, but they are all linked to each other, so we'll need to update the merge readiness checklist with whatever we end up with. Okay, so regarding this test plan: this is a draft. First of all, there are the spec documents. And then this is one of the most important sections here: what do we actually want to test. The main categories are, like, unit tests. We have the unit tests in the consensus specs, so I've just listed the methods here that must be covered with tests; we have these two EIPs; and we have block tests, and we have retesteth, the tool that is used to test client implementations, to check that the client implementations satisfy the requirements of this and that EIP. I think this kind of test is to be implemented with that tool. Regression testing: here, I guess, we have the block tests for previous hard forks, and we just want to be sure, and I think this is a pretty standard procedure, to run all tests before the fork block, which in our case will be the transition block, just to be sure that the proof-of-work part of the client works as expected and hasn't been affected by the changes introduced in this EIP. In other words, I think it's a separate topic; I'm not sure if we want to cover it today. Then, this is more or less settled, at least for me; probably I'm missing something here, so feel free to add, and feel free to interrupt me at any point. So this standalone testing is pretty well understood. Now, where things become complicated is when we go to the integration testing, when we need to test CL and EL in different combinations. I think, and this is my assumption, that hive will cover all or almost all of these things. And this probably is a very strong assumption, I don't know, because I don't have much experience with hive. So we can test the engine API: that the EIP implementations satisfy the spec via the engine API interface.
So literally just take an EL, run it, and do some transition process testing, just using the engine API interface. It should be doable, and I think it's easier than the next step, where we test fully featured CL and EL clients in different combinations. So, like here: just take two clients, and probably several instances of two client implementations, play around with them in hive, and write those tests and scenarios. There is a huge list of tests and scenarios here, and it's not the final list. Each test case should be written out in detail; the detailed test cases are not done yet. Then we have system tests, which are testnets, and I don't think they should be covered in this document; I'd rather see them in other documents. So that's it, and some useful resources are also listed here. This is all that came to my mind with respect to testing. There are other links, like the test fixtures for 4399 and for the engine API made by Mario, and mergemock, and so forth, also here. So this is what has appeared recently. That's basically it. Any questions so far? Any suggestions?

Hey guys. I'm curious if there's anyone on the call with expertise in hive who can validate that assumption, or talk about how ready it is to handle these cases you mentioned. Maybe there's not. Yeah, it would be very good for that to be verified, and to understand what we need to extend hive with, or whether it's even possible to implement all this kind of stuff. I'm not a huge expert in hive, but I think all of this can be done. Right now hive runs static test cases; we run the general state tests. We also run sync tests, where we have two different clients syncing from each other, and also GraphQL tests and stuff like this. So hive should be able to handle all of this pretty easily. But yes, it has to be implemented. Yeah, that'd be great.

I think I saw there was a PR to add... sorry, I was just going to say, I think there was a PR to add these two client simulators to hive that was merged a couple weeks ago. Yes, but we'll still need a full simulator. Yeah, I think there still needs to be an implementation of simulators that connect the EL and CL clients. I think that one just spins up a genesis, and we need a simulator that connects to the CL, a simulator that is a CL mock, and an EL mock if you want. Did you use any of the notes from the interop to write this? Yeah, it's partially based on those notes. I might have missed something while moving those notes from pictures to text. So you think that we can take... yeah, I was also thinking about looking at this PR. Also, the question: can we use mergemock in any way to write a simulator? So I think mergemock can be a simulator, or basically part of a simulator. Right now, if you want to run the consensus tests against a client, there's a simulator that runs and reads the tests from the Ethereum tests repo, converts them into a genesis file, then takes the block RLPs and directly gives them to the clients, and at the end checks to see if the head block is correct. And after the merge, you won't be able to just simply import block RLPs; you'll also need to take engine API directives.
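As a rough illustration of the simulator shape just described, here is a minimal sketch in the style of hive's Go `hivesim` package (Suite/ClientTestSpec), assuming it works as in current simulators; `importPreMergeBlocks` and `drivePostMergePayloads` are hypothetical stubs standing in for the fixture-loading and engine API logic:

```go
// A minimal sketch of a post-merge consensus simulator for hive.
package main

import "github.com/ethereum/hive/hivesim"

// Hypothetical stubs standing in for the real fixture and engine API logic.
func importPreMergeBlocks(c *hivesim.Client) error   { return nil }
func drivePostMergePayloads(c *hivesim.Client) error { return nil }

func main() {
	suite := hivesim.Suite{
		Name:        "engine-consensus",
		Description: "Imports pre-merge block RLPs, then drives post-merge blocks via the engine API.",
	}
	suite.Add(hivesim.ClientTestSpec{
		Name: "import and transition",
		Run: func(t *hivesim.T, c *hivesim.Client) {
			// Pre-merge: feed block RLPs, as the existing consensus simulator does.
			if err := importPreMergeBlocks(c); err != nil {
				t.Fatalf("pre-merge import failed: %v", err)
			}
			// Post-merge: deliver payloads via the engine API instead of block import.
			if err := drivePostMergePayloads(c); err != nil {
				t.Fatalf("engine API transition failed: %v", err)
			}
		},
	})
	hivesim.MustRunSuite(hivesim.New(), suite)
}
```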
So I see mergemock as something that would sort of help replace that situation. I have some experience in writing simulators for hive; I wrote the transaction simulator. So yeah, I can take a stab at it and see, but if someone else wants to write a simulator... Yeah.

So I guess I can add on to this. For Prysm, we have this end-to-end test suite. What it basically does is what hive does, but for consensus layer clients. It launches the deposit contract and then it starts beacon nodes with a bunch of validators; it starts multiple beacon nodes and validators, it does all the state and epoch transitions, and then it simulates slashing conditions, etc. But right now it only supports Prysm. So we would definitely like to support more client implementations; we've been talking about that. And I think that's essentially what hive wants to achieve as well. So I'll be happy to point you guys to this repo, and maybe we can get it to be compatible, and even port some of the work into hive. That would be great, because our end goal is definitely to do more multi-client e2e testing, and now with the merge there's an additional client to hook up. It sounds like a good time to basically combine the efforts. Yeah, it seems so.

So Mario has also written a bit about that, around the program to execute static test cases against the execution layer. And this may be also... yes, this one here. Yeah, basically, that's not as full-fledged as hive; it's very, very simple. It just sends engine directives one after the other and verifies the response from the client. But I'm sure that hive is much more powerful and capable. Maybe if I could somehow port the test cases that I have there into something that hive can process, that would be, I think, even better. I guess the way I've been seeing it is that we would take a lot of the code that mergemock is, and just create kind of a simulator out of it, and then that could read test fixtures like you've written. Is that something separate to hive, or what do you mean? I was thinking that we would take a lot of the code from the mergemock tool, add some code to read the static fixtures like you've written, and create a simulator out of it, rather than having it be a separate repository: just have it become a simulator. That's the best way to do it: have the static test cases and then feed them to the simulator.

I guess the only kind of weird thing is that mergemock currently sends blocks over devp2p before the merge happens, so it's very much a real-world test scenario. I don't know if that's what we want that simulator to be before the merge. So it won't need to... yeah, I see. Right, because if you're running an execution layer client, it is almost as if it's really going through the merge, because it's receiving things through the same kinds of code paths that it would in the real thing. Yeah, with such a simulator that is definitely covered. Right, so it just sends... basically, this should be, yeah. It seems like the simulators are pretty focused: there are some that focus specifically on devp2p tests, and there are the consensus tests, and those are done via the import subcommand of clients.
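On the wire, "sending engine directives one after the other and verifying the response" amounts to a sequence of JSON-RPC calls like the sketch below. The method names (`engine_newPayloadV1`, `engine_forkchoiceUpdatedV1`) and the port are assumptions that have shifted across engine API spec versions, and the payload fields are elided:

```go
// A minimal sketch of driving an EL client through engine API directives.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func callEngine(url, method string, params []interface{}) (map[string]interface{}, error) {
	req, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": params,
	})
	resp, err := http.Post(url, "application/json", bytes.NewReader(req))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out map[string]interface{}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

func main() {
	url := "http://127.0.0.1:8550" // assumed engine API endpoint
	payload := map[string]interface{}{ /* execution payload fields elided */ }

	// Directive 1: deliver the payload, then check the status in the response.
	res, err := callEngine(url, "engine_newPayloadV1", []interface{}{payload})
	fmt.Println(res, err)

	// Directive 2: make it the head, then check the fork choice result.
	res, err = callEngine(url, "engine_forkchoiceUpdatedV1", []interface{}{
		map[string]interface{}{
			"headBlockHash":      "0x...", // elided
			"safeBlockHash":      "0x...",
			"finalizedBlockHash": "0x...",
		},
		nil, // no payload attributes: not asking the EL to build a block
	})
	fmt.Println(res, err)
}
```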
And my understanding is that it's focusing specifically on testing the correctness of the consensus implementation, rather than also having to potentially deal with networking things. The transaction tests, for example, also run via devp2p, and the idea there is to create large transactions or large blocks, or things that should be dropped. But yeah, I think it makes sense to do it via the code path that will actually be used. Okay. So it means, if we do it with the code paths that are actually used: pre-merge, all blocks go via devp2p, and then we switch, for example, to the engine API to initiate the transition and finalize it. Is this what you mean? I don't know if that was directed to me, but I think that's what I mean. Yeah.

So do we need two simulators for this goal, or do we need one simulator that can do both? I think we need multiple anyway: we need one for static test cases, and synchronizing different clients from each other should also be a separate one. Under static test cases, do you mean deterministic test cases? Yeah, exactly: exactly this sequence of actions, which leads to exactly this result. While with sync it's... actually, I think we use static test cases for almost everything, even in sync. What sync does is you feed one client a couple of predetermined blocks, and then you synchronize the other client from it. Yeah, I think it makes sense to have the stuff in hive be deterministic, because otherwise you will get a lot of flaky stuff, I don't know. And non-deterministic testing, fuzzing, and everything else should not live in hive. Yeah, I agree. So, going after race conditions in the software via hive: it's not made for this, I think.

I have a question regarding mergemock: at the moment, does mergemock support the engine directives? Yes. Okay. I guess, regarding hive, the engine API is the easiest one to start with: create a simulator to test the engine API implementations, which will then be used to test the EIP implementations. Just try to create such a simulator via hive: this is my main thought towards the first step of integration testing of the merge. Makes sense. And the next question is: who's willing to start looking into it? Maybe a way to frame this is: does somebody have the bandwidth to look at this now without being distracted from working on the merge itself? Right, because obviously if, say, Marius or Terence spend their time on this, they're not working on the geth or Prysm implementation for the merge, and that's not ideal. So I guess I'm wondering: does anybody not working on a client have the bandwidth and the skill set to add these simulators to hive? I can take a look. Awesome, great. Who said that? Alex. Oh, Alex, okay, sorry. It's on my list of things to do; I've just been doing other things as well and slowly approaching it. But this is important, so I'll take a look. And I would like to try to port the test cases that I prepared. Yeah, I can look at this. Awesome, okay. Cool, so this should be sort of resolved now, right?
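The deterministic sync flow described above can be sketched outside any framework: feed predetermined blocks to client A, point client B at it, and poll until B reaches the expected head. The endpoints, enode URL, and expected hash below are placeholders, and the block-feeding step is client-specific:

```go
// A rough sketch of a deterministic two-client sync check.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func rpc(url, method string, params ...interface{}) (interface{}, error) {
	body, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": params,
	})
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Result interface{} `json:"result"`
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out.Result, err
}

func main() {
	b := "http://10.0.0.2:8545" // placeholder endpoint of client B

	// 1. The predetermined chain is fed to client A out of band (the import
	//    mechanism varies per client).
	// 2. Connect B to A (placeholder enode URL).
	rpc(b, "admin_addPeer", "enode://...@10.0.0.1:30303")

	// 3. Poll B until its head matches the expected deterministic result.
	const expectedHead = "0x..." // placeholder
	for i := 0; i < 60; i++ {
		head, _ := rpc(b, "eth_getBlockByNumber", "latest", false)
		if blk, ok := head.(map[string]interface{}); ok && blk["hash"] == expectedHead {
			fmt.Println("synced to expected head")
			return
		}
		time.Sleep(2 * time.Second)
	}
	fmt.Println("sync did not converge")
}
```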
Another question I have: we should probably have a list of tests and scenarios somewhere, in one place. Right. Proceed. Okay, go ahead. I feel like maybe we should link something in the readiness checklist where we can basically say: this is the kind of master testing document, and then track who's working on what, ideally, just so we have it all in one place. Yeah, sure. Though the wiki format does not suit this kind of purpose well, I guess. Right, because nobody can edit it, or everybody can edit it, or something. A GitHub issue would be good. Sorry, what? GitHub issues. Yeah, pick some repo, you can track it there. Yeah, but these tasks touch different parts of the software, and the specs are in different repos and so forth. So, what about some kind of meta-spec thing? Although I kind of feel like that was a little confusing, personally. Will we use Google Sheets or anything like this? How about a Google spreadsheet with links to the implementations of the test cases? Yeah. I feel like I had this kind of test sheet some time ago. I mean, you're listing the test cases that way, right? Yeah. Okay.

Dimitry, do you want to work on testing these EIPs? If that includes working on hive, no, I don't have that much availability now. No, I'm not speaking about hive now; I'm speaking about standalone tests, like the regular block tests that you usually create. Yeah, for the DIFFICULTY opcode, we are going to make tests for the RANDOM opcode. It looks quite straightforward once clients have implemented the merge: we just make those tests in our usual format. I thought that perhaps we would need a hive upgrade for all the existing tests that we have, because the consensus changes, and the script importing blocks would need to get confirmation from a simulator that a block is legitimate, something like this. So, when I implemented 4399, I implemented a way to pass the random field into a state test and execute a state test with post-merge rules. I'm not sure if that's the right way to do it, or if we should rather have a new import command that says: import those tests as if they were under post-merge rules. I thought we could just use the blockchain test format, as the block header is not changed: we just treat the nonce and mixHash fields according to the EIP and test the RANDOM opcode that way. Yes. Do we really need the state test format for that? What you need is to switch the client to the proof-of-stake consensus rules; we need to detect that. I guess it's not that big of a problem, but we have to think about how to do it. Which client? I guess you can probably use a terminal total difficulty that is already reached at genesis, I don't know. For geth, I use the transition tool to generate the tests; it accepts the fork flag, through which I tell it on which rules to execute, via the evm command. And when running on hive... something might change there. What I was talking about is that geth executes all the tests through hive. What do you mean? Yeah, all the existing tests, the blockchain tests that we have, are executed in hive, on geth, through the execution script, which runs the geth command line. And I think there is a genesis config file which tells which fork to activate.
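The "terminal total difficulty reached at genesis" trick mentioned above translates into a chain config roughly like the following sketch. The `terminalTotalDifficulty` field name follows geth's genesis config; other clients configure this differently, and the fork blocks and values shown are illustrative:

```go
// A sketch of a genesis config that puts the chain past the merge from block 1.
package main

import (
	"encoding/json"
	"fmt"
)

const genesisConfig = `{
  "config": {
    "chainId": 1337,
    "homesteadBlock": 0,
    "berlinBlock": 0,
    "londonBlock": 0,
    "terminalTotalDifficulty": 0
  },
  "difficulty": "0x0",
  "gasLimit": "0x989680",
  "alloc": {}
}`

func main() {
	var g map[string]interface{}
	if err := json.Unmarshal([]byte(genesisConfig), &g); err != nil {
		panic(err)
	}
	// With TTD 0, the genesis difficulty already meets the terminal total
	// difficulty, so every subsequent block is validated under PoS rules.
	fmt.Println("TTD:", g["config"].(map[string]interface{})["terminalTotalDifficulty"])
}
```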
I don't know if that's going to work now, after the transition, because of the second client, which is supposed to talk to geth and authorize blocks, right: the validator. That's why I think we need to have a simulator that can send pre-merge blocks and then also send blocks via the engine API. The engine API is the protocol that is used by consensus layer clients to communicate with execution layer clients, and this is going to be the natural code path to drop the post-merge blocks onto execution layer clients. Currently the geth scripts on hive do not use it, right; they would need to be upgraded. And I looked at the command; I think I can implement in retesteth the same thing the hive script is doing for the geth client. It's a slow way of executing the tests we have, because each test will require a client restart, so it might take a longer time. I'm talking right now about executing the existing tests that we have after the PoS merge, so that they will work on hive.

Also, you've been saying that 4399, the RANDOM opcode, is straightforward. What do you think about 3675? It's more involved, I mean. 4399 is easy to implement because RANDOM is just read from the block header. The block format doesn't change, so the test format doesn't change, and we just create a test where we put bytecode with RANDOM and check that the value is taken from the block header. Whereas in the state test format, we would need to change the format again for this random field. Yeah, and unless someone really needs it, I mean, we can do it, but maybe we just do it in blockchain tests. Yeah, I don't think state tests are that valuable here.

I'm trying to create this document with test cases, yeah, for the RANDOM opcode. I'm thinking whether it's a good idea to use the same document for all tests: probably use different tabs, if it's a Google sheet, for hive and for other stuff. It's a list of implemented test cases, right? Yeah, a list of all test cases, each with a status: implemented, in progress, not started. Yeah, I see. I think that checklist works for basically the high-level categories: for example, say, 3675 tests, 4399 tests, adding an engine API simulator to hive, and then just link those to whatever repos or issues make sense, so that we can use the readiness checklist as the kind of high-level view, which links back out to potentially a set of issues, or just a tracker issue or whatnot, in those different repos. Because yeah, it's hard to create a new place where everybody goes when we already have one. So I think there's a lot of value in just trying to move as much stuff as we can there and then have each repo organize on its own. Yeah, I think that's a good and reasonable idea. So they will be kind of separate, independent from each other. Exactly, but at least in the merge readiness checklist we can link to all of them, because clearly it's a different skill set to do the hive tests versus the actual blockchain tests and whatnot. So I think just linking to wherever those live is probably the way to go. Yeah, cool, I think it's a good idea. The issue with this might be that I don't know...
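The blockchain-test idea for EIP-4399 can be made concrete with a tiny contract: the header format is unchanged, `mixHash` carries the randomness, and a contract executing opcode `0x44` (DIFFICULTY, repurposed as RANDOM) should observe that value. The sketch below just shows the bytecode and the post-state assertion; it is not an actual test filler:

```go
// A sketch of the EIP-4399 blockchain-test check.
package main

import "fmt"

func main() {
	// Contract code: 0x44 (RANDOM) PUSH1 0x00 SSTORE,
	// i.e. store the header randomness into storage slot 0.
	code := []byte{0x44, 0x60, 0x00, 0x55}
	fmt.Printf("test contract bytecode: 0x%x\n", code)

	// Post-state check (pseudo-assertion): under post-merge rules,
	// storage[0] of the contract must equal the block header's mixHash.
	headerMixHash := "0x...placeholder set by the test..."
	fmt.Println("expect storage[0] ==", headerMixHash)
}
```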
Like, yeah, right, I don't know yet, because I'm not that familiar with hive, which simulators we are lacking to implement this hive stuff. It's unclear. Yeah, I think that makes sense. I think we can just list basically the high-level ones that we put out, like for the engine API, for the consensus layer, and then kind of, yeah. What about the people who are doing it in hive just making a PR to the readiness checklist with better details? For instance, I'm not sure that mocking the EL makes a lot of sense or adds a lot of value, but some of the things, like this one, could be verified only if the EL is mocked somehow, and I don't know how to do this, actually. So yeah, let's not discuss it; this is just one of the examples of uncertainty on my side: how to implement this, is it even possible, what to do. Does anybody have any questions with respect to this document?

I have a question about the standalone part. I saw you listed some helpers from the validator guide of the consensus specs. So I wonder: we only test these cases as unit tests in the pyspec; we didn't provide any test vectors for these tests, I mean the validator guide tests. Would it be useful to create test vectors for the validator guide? We didn't provide them before because this is not really part of the actual consensus, so clients can implement them in their own way. Yeah, that's a good question. I think we just need them to be tested; we just need to test the spec itself without exposing any vectors for the validator guide. I have just listed all the functions that I met in the consensus specs merge folder; this is how this section appeared. So, yeah. Got it, thanks.

I have a question regarding the engine API static tests. Do they have to be implemented on top of hive? Because from what I heard so far, it seems much more suitable to run static test vectors for this. Well, I think the idea with hive is that it's just this encapsulated piece of software that runs all of the tests that we care about against Ethereum clients. Right now we're kind of manually doing a simulator by running the EL clients locally and then running mergemock and having it send requests to the EL. I think that could all just live in a hive simulator, and then it would spin up a Docker container for each client and run all of the different tests against each one of them. Okay, that makes sense. I think one of the upsides of hive is also that it can be integrated in CI, right? So it can run on CI and run all these tests. I don't know if client implementations have integrated it; I guess yes, but I'm not sure. For instance, does geth run hive on CI? Well, hive pulls the new releases from all the clients and then tests them, so I'm not sure if we trigger that by CI or if this is just done by hive itself. Do you mean that it runs if there is anything new in the repository? Yeah, it's almost constantly running. Oh, yeah, cool, that's great. Awesome. So using hive we have this kind of automation, and we can have this kind of automation for consensus clients as well. In fact, we will have to have it for consensus clients, because some of the tests involve the consensus layer as well. Okay.

So, regarding fuzzing: I implemented a fuzzer for the engine API a while back. That of course became stale with the new specs.
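As a rough illustration of what such an engine API fuzzer can look like, here is a strategy-picking fuzz loop in the spirit of the modes described just below (random junk versus coherent create/execute/forkchoice sequences). Everything here is a sketch: the directive strings stand in for real JSON-RPC calls, and the strategy names are made up:

```go
// An illustrative strategy-driven fuzz loop for the engine API.
package main

import (
	"fmt"
	"math/rand"
)

type strategy func(rng *rand.Rand) []string // returns a sequence of directives

func randomJunk(rng *rand.Rand) []string {
	// Malformed, meaningless calls that a correct client must reject
	// without crashing.
	return []string{"engine_newPayloadV1(<garbage payload>)"}
}

func buildAndApply(rng *rand.Rand) []string {
	// A coherent sequence: ask for a payload, execute it, make it the head.
	return []string{
		"engine_forkchoiceUpdatedV1(head, payloadAttributes)",
		"engine_getPayloadV1(payloadId)",
		"engine_newPayloadV1(payload)",
		"engine_forkchoiceUpdatedV1(payload.blockHash, nil)",
	}
}

func main() {
	rng := rand.New(rand.NewSource(42))
	strategies := []strategy{randomJunk, buildAndApply}
	for i := 0; i < 100; i++ {
		s := strategies[rng.Intn(len(strategies))]
		for _, call := range s(rng) {
			// A real fuzzer sends the call and checks that the client
			// neither crashes nor accepts invalid input.
			fmt.Println("send:", call)
		}
	}
}
```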
Once the specs are pretty stable, I'll take another stab at it and let it run on an EF machine or something. So if anyone is interested in fuzzing on the execution layer client side, or maybe also on the consensus layer client side, they can talk to me and we can figure stuff out. And the basic usage is just to drop in messages and see if clients respond, right, that they don't crash? Yeah, we have like four different modes, I think. Some just do random tests, really random tests that should never work; some do things like create a payload, execute the payload, set the fork choice to this payload. The fuzzer itself can choose between those different strategies. And it already found an issue in geth, where it kept creating blocks, then kept setting the head back, reorging the chain until genesis, and then it crashed on something. So it's a really, really nice way for the execution layer clients to test their stuff, because it exercises really interesting interactions. Yeah, that's great.

And do we do EVM fuzzing? Yes, we do EVM fuzzing; I'm also doing that. Pari and I, I'm not sure he's on the call, are doing differential fuzzing, EVM fuzzing: creating a state test and then executing it on all of the different clients and looking at the traces, checking that they execute the same opcodes and that they have the same state root in the end, and stuff like this. And we plan to basically scale this up a bit, because right now it's only one machine running those tests, and we would like to take it further. Are you running them to test all the client implementations, or are you running them in pairs? No, all the client implementations. Well, not quite all of them, I guess: Erigon, Besu, Nethermind, OpenEthereum; those are the clients that are currently run there. If any of the clients are listening in and want to be added, you need to implement something like EIP-3155, which is a way to trace transactions, and then contact me and we'll figure out how to add your client. Yeah, fuzzing is a very important thing, one of the tools. Please give me the link as well, so I can have a look at it. You mean the document or the fuzzer? Okay.

Also, I should say that adding the consensus layer clients to hive means that part of the resources of the teams that implement consensus layer clients will need to be dedicated, of course not full time, to looking at hive if something comes up. Yeah, it will need some attention to the tests. Maybe this is naive, but ideally, we can get people that are not on the core teams to do the implementation and then just reach out to those teams if there's a problem with the client itself or whatnot. It's kind of an 80/20 thing, where most of the work is not done by the client devs; obviously, if there's an issue with adding Lighthouse or adding Prysm, that might be quite team-specific, and we're going to need their help, but ideally it's not the core thing they work on. So: have people that are not on those teams try to help set that up. I'm not sure how realistic this is, but we can give it a try.
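The differential check described above can be sketched as follows: run the same state test on two clients, then compare their EIP-3155 trace output line by line and check the final state roots. The trace files are assumed inputs, and a full tool would parse the JSON fields (pc, op, gas, plus a summary line carrying stateRoot) rather than comparing raw lines:

```go
// A rough sketch of comparing EIP-3155 traces from two clients.
package main

import (
	"bufio"
	"fmt"
	"os"
)

func readLines(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var lines []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		lines = append(lines, s.Text())
	}
	return lines, s.Err()
}

func main() {
	// Hypothetical trace outputs produced by two clients for one state test.
	a, err := readLines("clientA.trace.jsonl")
	if err != nil {
		panic(err)
	}
	b, err := readLines("clientB.trace.jsonl")
	if err != nil {
		panic(err)
	}
	// Step-by-step comparison: the first diverging line points at the opcode
	// where the two EVMs disagree.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			fmt.Printf("divergence at step %d:\n  %s\n  %s\n", i, a[i], b[i])
			return
		}
	}
	if len(a) != len(b) {
		fmt.Println("trace lengths differ:", len(a), "vs", len(b))
		return
	}
	fmt.Println("traces match (including the final stateRoot summary line)")
}
```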
I'm planning to focus on testing-related things for the next couple of months; I'm just trying to finish up some things unrelated to hive right now. So if people want to immediately jump in, we could set up a weekly sync or something and just talk about it. I don't know if anybody is interested in jumping in now. I would be really interested. I'm still focusing on the implementation for most of my time, but I think that will shift in the next couple of weeks as the implementation gets stable, and then I'm also shifting to testing. Cool.

Also, do people in general agree with the high-level outline of the test plan? Anything missing here? I think we have to test some more edge cases, things like what I did with the devnet: mining during the transition and stuff like this. Yeah, I agree. It's neither integration tests nor static test cases; it's more like... I think it's defined as system tests. Yeah. Yeah, it's almost like QA in a way, right? We need people to stand up these networks, make sure that they work, and try to break them. Yeah, and also document all of the findings. For example, sometimes I don't get peers, and because of that I observed that it matters in which order I shut down the consensus layer and execution layer clients. If I shut down the execution layer client first, the consensus layer client goes into some kind of panic mode and tries to send payloads without getting responses, and then once I restart the execution layer client, the consensus layer client still doesn't 100% work, or stuff like this. That was something that I saw with Lighthouse. But yeah, I think it's important for everyone who runs this kind of software on the testnet to document every weird issue that happens and send it to the teams. Yeah, I've got three or four reports already, and I was a bit slow in taking care of them, but they will be taken care of. Sorry. Yeah, there are comments from Pari in the chat.

Yeah, it just should be done somewhere. And again, for clarity: if people want to write this stuff down in random places, wherever is most convenient for you, that's fine, but then just open a PR against the merge readiness checklist to add a link there. That can ensure that we have one place where we're keeping track of it; I think it's unrealistic to get people to write all these things down in the same place, but hopefully we can just aggregate the links. You already have the link, I think. So these high-level categories are good; we can start with them to update the merge readiness checklist. I think it could be done this week. And even if we don't have an issue or any other tracking document under each of the categories yet, I think that's also fine; they can be added later, once created. Yeah, we don't need everything now.

And I guess, yeah, we only have about three minutes left, but one thing worth noting: it feels like if we get all of this done, we've basically covered the base cases. And then I think it's maybe a reasonable strategy to start with this and, depending on how well it goes, then look at stuff like we talked about: stress testing networks with larger blocks or things like that.
It feels like all those things can come after; just ensuring we have a really solid foundation over the next couple of months, even if that's all we get, is already really valuable.

Hey guys, by the way, I wanted to add one thing; Pari mentioned it in the chat. By the way, hi, I'm Rafael. I'm working on getting some bigger networks running, with all the client combinations. And this thing can either run private testnets, like in a local network, or even expose the nodes to the public internet, and then you could connect all of them. The way I'm doing this currently is using Kubernetes. With that, we should be able to create kind of big testnets on the go, and it spins up pretty fast, so we can have a full-fledged new network in a couple of minutes. Yeah. Awesome. Cool.

Well, thanks everyone. Let's keep this conversation going on the Discord. I feel like we have a lot of next steps here. Thanks a lot, everyone. Thank you, everybody.