I'm Terence, I'm in core dev, and as of earlier this week we're part of Offchain Labs, so I had to change my name real quick before I get in trouble. So let's just start. This talk is about the consensus layer interfacing with hybrid PBS. It's not your typical searcher/builder MEV talk; we want to understand what it means for the consensus layer to interface with hybrid PBS along dimensions such as latency, faults, and censorship, and then we'll also talk about mitigations. Just to reiterate: everything in purple is what we'll go over — relayers, mev-boost (which is a relay aggregator), the consensus layer client, the execution layer client, and the validator client. This is what my team and other teams have been working on over the last few months.

So, background: why are we here? There are options, and options are nice. The first option, which many validators still use today, is normal block production. As the consensus layer client, you know which validators you're serving and when they're proposing a block, so you build the block for the validator. Post-merge, the consensus layer client uses the execution layer client to prepare the payload: it puts the payload in the block, passes the block to the validator, the validator signs and returns it, and the consensus layer client broadcasts it. The separation of concerns is nice here, because the execution layer client and the consensus layer client are both very complicated pieces of software, but they only need to know each other through the Engine API. The second option, which people are starting to use more and more now, is outsourcing block production. As a consensus layer client, if I want to participate in the MEV game, I can use the relay network.
I can say: hey, relay network, can you propose a block for me? And that block is usually more profitable. In the background, the consensus layer client can also talk to the execution layer client: hey, can you build me a backup block, so that if the relay network doesn't work, I can still use it. This is the paradigm we're heading towards, and it's important to understand what it means. These are today's numbers — hopefully you can see them; I captured them this morning. 55% of network participation is using MEV block production, i.e. outsourcing block building. Out of that 55%, 81% is dominated by the Flashbots relay, and there are seven active relays.

So let's talk about the first risk: latency. When you propose a block normally, under your local setup, you have your consensus layer client, execution layer client, and validator client. How does this work? You ask the execution client to prepare a payload, you pass the payload to the validator client, the validator client signs it, and you broadcast it. Simple, easy, very easy to reason about. But MEV block production is a commit-and-reveal approach. Why commit and reveal? Because as a relay or a builder, you don't want the validator to steal your transactions if they're in cleartext. So you make the validator sign first and return the signature, and only then do you hand over the full transactions. Given it's commit and reveal, there are more steps in the middle: you ask for the header, the relay network returns the header, you sign the header, you submit the signed header, and the relay network returns the full payload, with the transactions in cleartext. So as you can see, there are two more steps here. And besides that, it all happens over a different network — it's not local anymore.
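As an aside, the commit-and-reveal flow just described can be sketched as below. The method names (`get_header`, `get_payload`, `sign`) loosely mirror the spirit of the Builder API endpoints, but the signatures here are simplified assumptions, not a real client interface.

```python
def propose_via_relay(relay, validator, slot):
    """Hedged sketch of the commit-and-reveal Builder API flow."""
    # Commit: fetch only the execution payload *header* -- the relay
    # withholds the transactions so the proposer cannot steal them.
    header = relay.get_header(slot)
    # The validator signs a blinded block containing just the header.
    signed_blinded = validator.sign({"slot": slot, "header": header})
    # Reveal: submit the signature; only then does the relay respond with
    # the full payload (transactions in cleartext). In practice the relay
    # typically also broadcasts the block itself to save a round trip.
    payload = relay.get_payload(signed_blinded)
    return signed_blinded, payload
```

The two extra round trips (header fetch, then payload reveal) are exactly the steps that local block production does not have.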
Ideally, you want the relay network to broadcast the block as fast as possible instead of handing it back to you to broadcast. Fortunately, most of the relays now publish the block themselves instead of just passing it back, so you do save some latency there, which is nice.

So let's ask ourselves: do the additional round trips and latency matter? Like I said before, when your EL and your CL are both in your local setup, it's lightning fast and reliable — it just goes through hardware, through electronic circuits. But with a relay network it's different: you're actually talking to some infra provider, GCP or AWS, in some region. I was able to capture some numbers. Unfortunately I haven't been home for the last two weeks, so these numbers are slightly old, and they're from Goerli with a thousand validators — not many validators, and the network topology may be slightly different — but I hope we can get a picture here. Not even using mev-boost, just talking to the relay directly, it takes about three times longer to propose a block, given the additional latencies.

So what does additional latency mean? When you propose a block, it usually works like this: you run fork choice to get the head, you build a block on that head, you get a payload from the execution engine, and you broadcast the block. Then at the four-second mark — one third of the seconds per slot — attesters vote on what the head of the chain is. If the attesters did not see your block, your block may get orphaned, and that's not ideal. You do not want to lose a block. So this is what we don't want.
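To make the timing concrete, here is a minimal sketch of the attestation deadline, using the mainnet value of 12 seconds per slot. `likely_attested` is a hypothetical helper for illustration; real clients also account for validation time and propagation before the cutoff.

```python
SECONDS_PER_SLOT = 12                         # mainnet value
ATTESTATION_CUTOFF = SECONDS_PER_SLOT / 3     # attesters vote at the 4s mark

def likely_attested(arrival_seconds):
    """True if the block arrived (within its slot) before attesters vote
    on the head; a later block risks losing attestations and being
    orphaned."""
    return arrival_seconds <= ATTESTATION_CUTOFF
```

Under this model, the roughly 500 ms of extra arrival latency measured later in the talk directly eats into the four-second budget.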
We don't want something that takes up so much time in the middle — the getHeader, the signed header — and suddenly it's a late block. This, to me, is worrying.

Let's look at some more numbers: block arrival latency differences. This was actually captured on mainnet, on my home setup with 300-megabit bandwidth — about as "home setup" as it gets. We captured over 15,000 samples, and the MEV blocks took about 500 milliseconds longer to arrive. What does this mean against the attestation timeline? If the block takes longer to arrive, it eats into the time you have before attesters vote, and attesters may unfortunately miss it. You don't want to be the late block, because you'll get orphaned. In the top example, block C was supposed to build on block B, but it built on block A because block B was late. In the bottom example, block E is not building on block D — because of proposer boost or something, D was supposed to be the head but it's not — so block E built on block B, and therefore C and D got orphaned.

Now another set of numbers: from September 17 to 27, 50% of the orphaned blocks actually came from relays. Here are the orphaned block slots, the relays, the validator IDs, the entities. And that's unfortunate. You could ask: OK, maybe those blocks were going to get orphaned one way or another — we don't know. But still, 50% of them came from relays. It's not all sunshine, rainbows, and butterflies — there is risk to this. We often tell people, hey, you should use a relay because it's more profitable, but it also comes with risk. If using a relay makes your block two to three times more profitable but you get orphaned 10% of the time, is it worth it?
That's something you have to ask yourself. Another risk with latency, I think, is centralization. The whole point of mev-boost is to keep validators decentralized, because now we can extract MEV from home — everyone gets equal access to MEV. You don't want people to start realizing: oh, if I have better latency, I won't get orphaned, so I'm going to move out of my home setup to get closer to the relay. That becomes a negative externality. This is what we don't want.

So what's the takeaway? Latency matters, both for hybrid and maybe even for full PBS, and latency can lead to centralization risk, which we have known for a long time already. And it's actually really hard to optimize away network latency. At the client level, I don't think there's much we can do. On the mev-boost side, there are talks about replacing JSON encoding with SSZ — there's some improvement there. From the relay's perspective, I hope they have a really good network config, a lot more peers, a robust infrastructure. I think that's about as much as we can do, because unfortunately, if someone just has a slower internet connection and wants to use a relay, they're going to be affected. So it's important that we educate people about these risks.

Besides latency, there are also faults. What does fault mean here? Sorry, the slide got a little messed up there. A fault can happen when you ask for the header and the relay network fails to return it — that's the commit phase. Or a fault can happen when you submit the signed header and the relay fails to reply with the payload — that's the reveal phase. We'll focus on these two types of faults. The first category is getHeader faults: you call getHeader, and the relay network fails to reply correctly.
These can be categorized as: a malformed header, a consensus-invalid header, a payment-invalid header, or a non-conforming header. We'll go over them one by one.

What does it mean for a header to be malformed? It's syntactically invalid — it has an invalid structure or an invalid signature. Can the CL client detect it? Yes, it can: if it's not the right structure, you know it's wrong when you unmarshal it, and you can also verify the signature. So this type of fault we can detect and mitigate. This is fine.

Another type is a consensus-invalid header: the block hash is invalid, or the transactions are invalid. For this we unfortunately cannot validate, because we cannot see the full transactions — we cannot calculate the block hash ourselves. That's something we just have to trust blindly. Unfortunate, but it is what it is.

Then you have the payment-invalid header. This means the builder promised to pay the proposer some ETH but failed to pay. The consensus client cannot detect that type of fault; we trust the relay to simulate it. That's why the relay is a trusted party for us.

Then there's the non-conforming header. When the validator registers, it basically says: hey, this is the gas limit I want to use — but the returned header's gas limit is incorrect, or the timestamp is incorrect, or the parent block hash is incorrect. Can the consensus client detect that? Yes, it can.

So those are what we call the commit faults. Now we jump to the reveal faults, the second type. What are the reveal faults? The payload could be invalid, or the payload could be unavailable. And keep in mind, there is no falling back for these.
Because you've already produced the signature. At this point you can complain on Twitter, you can complain somewhere else, but there's not much you can do to salvage the block.

A malformed payload is similar to a malformed header: it's invalid, or the full payload does not match the header. Can the consensus layer client validate it? Yes, it can — but hey, it's too late. At that point you've lost the block already; there's not much you can do. Same with a consensus-invalid payload: you can validate the transactions now, because you see everything, but again, it's too late.

Then there's payload unavailable. This means the relay network just went to sleep — say the relay decided to turn off — and it never replied with the payload. It did not fulfill its commitment, and you know when that happens, because you never receive the payload. And still there's nothing you can do, because you already signed something; your signature is out there.

We have this concept of falling back to the execution layer client: if getHeader goes wrong, you can produce a block with your local execution client, and that's fine. But if getPayload fails, you cannot produce a block with your local execution client — you don't want to double sign.

The getHeader call, the commit, can fail two ways: it can either fault or time out. We actually prefer a fault: you get the response right away and can start producing a local block right away. A timeout kind of sucks, because you have to wait for it — say, a second — and then you've lost a second of your precious time.

So let's go through some relay incidents. I don't want to sound like I'm calling out their faults, but I do think it's important to go over these incidents so that we can learn as a community.
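Before the incidents, the fallback rule just described can be condensed into a tiny decision sketch — the stage names and return values are illustrative, not a real client API:

```python
def next_action(stage, failed):
    """Fallback rule: a getHeader failure can fall back to the local
    execution client, but after the blinded block is signed, a getPayload
    failure must NOT fall back -- signing a second block for the same
    slot would be a slashable double proposal."""
    if not failed:
        return "continue"
    if stage == "get_header":
        return "build_local_block"   # nothing signed yet: safe fallback
    if stage == "get_payload":
        return "give_up"             # header already signed: no double sign
    raise ValueError(f"unknown stage: {stage}")
```

This is why a fast getHeader fault is the least bad failure mode: the proposer still has time and permission to build locally.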
The first incident: September 16, Flashbots relay. They failed to marshal the deposits in the payload reply. The damage: we missed three blocks. The second one: Bloxroute relay. The relay did not validate the block, and replied with a consensus-invalid payload. We missed 88 blocks there. The third one is Bloxroute again, another consensus-invalid payload, and the damage was 15 missed blocks. So these things do happen — faults happen, in both the commit and the reveal phase.

So we need some mitigations, like a circuit breaker. For example, as a beacon client, I can detect when there's a liveness failure. The liveness condition is determined by the client — say, you missed three slots in a row, or the chain missed eight of the last 32 slots. If that's triggered, we just default to the local execution engine. This is to protect against a dominant relay or builder going offline. It doesn't solve the cases I mentioned before — those happen maybe 0.1% of the time — but against a big outage it's a stronger defense.

Then you have the relay monitor, so you can monitor relays' behavior — safety and liveness — and performance, such as latency. People can see how the relays are performing and decide, based on that, whether they want to connect to a relay. It just makes the information more available.

Then we have features like bid filtering. As a proposer, you can say: hey, I only want to use a relay if it gives me a bid over this value. So there are some nice things you can do there.

So what's the takeaway? We're still early, but we need more robust relays, and we also need a way to hold relays more accountable.
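A rough sketch of such a circuit breaker, with the thresholds from the talk (three consecutive missed slots, or eight missed out of the last 32) as assumed, configurable parameters:

```python
def use_local_engine(recent_slots, consecutive=3, window=32, max_missed=8):
    """Circuit-breaker sketch: decide whether to bypass the relay and
    fall back to the local execution engine.

    recent_slots: list of booleans, True = slot had a block, newest last.
    """
    # Trip on a streak of consecutive missed slots (counted from newest).
    missed_streak = 0
    for had_block in reversed(recent_slots):
        if had_block:
            break
        missed_streak += 1
    if missed_streak >= consecutive:
        return True
    # Trip if too many slots were missed within the recent window.
    return recent_slots[-window:].count(False) >= max_missed
```

The point is defensive: if a dominant relay or builder goes down, the chain degrades to local block production instead of stalling.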
For example, a simple idea: you monitor all the missed slots and orphaned slots live, and you poll the relay APIs. If you see the missed slots are coming from a relay, you shout out loud — on Twitter or something — so people know: hey, there's an incident happening live, you need to turn off this relay or switch to local block production.

In terms of fault preference: I prefer a getHeader fault, then a getHeader timeout, then a getPayload fault, because with a getHeader fault you can still propose a local block and it's most likely fine. And I do believe relay quality will improve over time — it's still relatively early, they're still learning, we're still learning, and this will improve.

OK, last section: censorship. That's something we have been talking a lot about. As of today, 49% of mainnet blocks are built by OFAC-compliant builders, and that's unfortunate — that's nearly half already. So we have to understand who can censor. The builder can censor: as a builder, you can refuse to include OFAC-sanctioned transactions in your blocks. And the relay can censor as well. So it's really hard.

What's the problem here? The problem is that mev-boost is a neutral piece of software — it doesn't care about censorship — and the UX of figuring out how to defend against censorship is still early. Right now, essentially, everyone just chooses relays that are non-censoring, and that's it. But it's hard to figure out who is censoring at a given time — you just look at the news, do a Google search, and that's it. We need more information there. So, potential solutions.
These are very experimental ideas off the top of my head. You can have some form of active inclusion, such as mev-boost crLists, or you can have a censorship oracle.

For active inclusion, at a high level it works like this: the proposer expresses the intent to force certain transactions into the payload, the relay has to present those transactions to the builder, and the proposer will only accept a header if those transactions are included or the block is full. For this you do need some sort of multiproof to make sure the transactions are actually included, so the consensus client can do the validation. What's the downside? More timing pressure: as a proposer, you need to send the transactions during the previous slot, so you need to figure out the timing for that. Proposers also now need access to the mempool and to execution layer state. And there's some latency complexity here, because it may take longer to propose a block.

Then you can also have a poor man's version of censorship filtering. Basically, as a proposer, you sample the mempool for the top N transactions by gas fee at a given time, and when the relay replies, it has to show a proof that those top N transactions are included — unless the block is full or the gas price has moved. So we don't try to force specific transactions in; we just want to make sure the top N transactions actually get in, under the assumption that someone being censored can always use a higher gas fee. It's not ideal, because you lose the inclusion control, but it's probably easier to implement. Then there are also ideas like a censorship oracle.
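Before getting to the oracle idea: the top-N filtering check above could be sketched roughly as follows. The transaction shape and parameter names here are assumptions for illustration, not a proposed spec.

```python
def check_top_n(mempool, payload_tx_hashes, n, block_full, block_base_fee):
    """Poor man's censorship check: the top-N mempool transactions by fee
    must appear in the payload, unless the block is full or the
    transaction no longer clears the block's fee floor."""
    top = sorted(mempool, key=lambda tx: tx["fee"], reverse=True)[:n]
    missing = [tx for tx in top if tx["hash"] not in payload_tx_hashes]
    if not missing:
        return True
    # A missing transaction is acceptable only with a valid excuse.
    return all(block_full or tx["fee"] < block_base_fee for tx in missing)
```

A `False` result would flag the header as suspicious rather than prove censorship — fee markets move between the sample and the block.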
You can introduce a new actor to detect censorship, kind of like the relay monitor. As a proposer, or as mev-boost, when I receive a header, I can ask this oracle: hey, is this censored? If it says yes, I won't use it. But then again, you're putting trust in a new actor, and that's obviously not ideal — who is going to monitor the censorship oracle? There's always that problem.

There's been a lot more research going on, which I'm really happy to see. Vitalik posted research a couple of weeks ago on how we can constrain builders without bringing block production burdens back to the proposer — there are ideas like using a block prefix — and I highly recommend you take a look at that. And there's one from Barnabé, I think from last week, which I haven't had time to read yet; it basically enforces proposer commitments on chain, and I think that's very neat. I definitely want to take a look at that very soon.

So what are the takeaways? It's important to figure out who can censor, and who can detect the censorship, because there are so many actors in the picture: proposers, mev-boost, and the relays. It's important to figure out who does what. And I do think we need to leverage the Builder API a lot more, because the Builder API is probably the best thing we have today — the best defense. If we use the Builder API more, we can provide more ways to organize defenses against censorship, such as inclusion lists and block prefixes. There's a spectrum of solutions out there, but the simplest solutions typically come with stronger trust assumptions, so I'm not sure that's the way we want to go.

Some final thoughts. For me, censorship resistance should be the highest priority aside from scaling and withdrawals.
What is Ethereum if 50% of the transactions are censored? That's something we have to ask ourselves. And I do think hybrid PBS is basically our best tool to defend against that, because it allows fast iteration. Once we jump into full in-protocol PBS, we lose that, because everything becomes hard-fork based. With hybrid PBS you have the Builder API; you can play around with it and figure out what works best and what doesn't. And I know the people working on hybrid PBS — mev-boost, relays, builders — have been getting a lot of bashing, and I don't think it should be that way. I think we should be working together as one team to advance censorship resistance. So definitely a shout-out to all the teams working on that — they are the real heroes. That's all I have today. Thank you so much for having me.

Thank you, Terence. We have some questions for you.

Hi. Regarding latency, the numbers you showed were more or less the same as Dani's on the first day. So if 50% of the network proposes through relays and 50% of the orphaned blocks come from relays, I would have taken that as a sign that latency did not matter — that would be the expected ratio.

Yes, but I do think we can do better from the orphan point of view. If 20-plus blocks get orphaned today and we could get that down to 10-plus, that's better. I'm coming at this from the orphans' perspective: when something gets orphaned, it's obviously not ideal, because those transactions should have been included, and there may be some UX concerns there.

Hey, Terence, thanks for a super informative talk. Can you say a little bit more about the three relay faults that happened recently?
How were the malformed headers or payloads generated, how was that mitigated, and how do we prevent it in the future?

Right. I would say the relay landscape still has a lot of work to do — we need more testing, more spec tests, more end-to-end tests. In terms of the faults: the first one was Flashbots failing to marshal the deposits, and at that point it's too late, because someone has already submitted the signed header — the signature is out, so there's not much you can do. They failed to marshal because they did not test the payload with deposits embedded. The second and third incidents are basically the same: the Bloxroute relay did not validate the payload, which is what a relay is supposed to do — make sure the payload is valid before passing it to the proposer — but it just did not validate it.

Hi, Terence, thanks for the talk. I was curious about the out-of-protocol crLists and the proofs of transaction inclusion. Have you done any research into it? Is it practical? Would the builders or the relays actually be able to calculate the proofs in time?
Yeah, this is still under research. I think someone from Flashbots just opened a PR, so I need to look at it. But at a high level, I think it should work like this: say you're proposing at slot N. At slot N-1 you have access to the mempool, and you see some transactions getting filtered. At slot N-1 you present those transactions to the relay, the relay presents them to the builder, and the builder builds a block that includes those transactions and sends it back to the relay. Then at the end, after the signature is done, when you get the payload, it can include a multiproof to prove that those transactions are indeed included in the payload. Is it practical? I don't know yet — it's still under research.

Hi, Alex Mead, Coin Metrics. Thank you so much for your talk, really loved it. Just wondering if you would comment on non-public relays — do they exist? Is this a possibility?

As of now, I don't know of any non-public relays; I only know the seven relays presented on mevboost.org. And I also think that if there were a non-public relay, it would probably be hard for us to know — if, say, Coinbase were using an internal relay that's extracting MEV and not open to the public, that's not something we could easily find out.
Why do we think that proofs help with what he was talking about before — including our transactions? Why can't you just disconnect from the relay if the transactions are not included once you see the actual payload? Even if there is a proof, the relay could always just not release the block, or the block could be invalid, or there could be all kinds of faults that make the proof meaningless, and what you end up doing anyway is disconnecting from the relay in either case. So can we just make it simpler on ourselves: you say, please include these transactions; if they don't, you disconnect from them, and next time it won't happen.

Yeah, I fully agree. I think that's possible, and there may be a better solution there.

Hey — apologies if I missed this before, but you said that 50% of the network is making use of mev-boost. Do you have more fine-grained numbers, like institutional stakers versus home stakers? At the institution I work at, there are security considerations — mev-boost is sort of a centralized actor at the moment.

Yeah, there is a site for that — I'm happy to share it with you after. There's a site that actively tracks, for example, Lido, Coinbase, Binance — the top three — which relays they're using and what percentage goes to each relay. There is active tracking on that, but I don't have the data in the slides, sorry.

Given all of this, for a validator that's deciding whether or not they should run mev-boost, what would be your advice?
I would say — and this is a hard question for me personally; my team and I have had debates about it — I think for now it's too early to be using this type of technology unless you know what you're doing, just because we don't have much public information out there. I'll give you an example: in the second incident, when the Bloxroute relay failed, it took them six hours to turn off the relay, which they could have turned off right away. But as a public validator, we don't know, right? I don't have access to the information to say: hey, something's wrong, I need to shut off mev-boost. The best people can do is go to Twitter, and I'm not sure that's the best medium for this kind of thing. So I would say wait — that's my advice.

You mentioned at some point that if you can't reach the relay, the consensus client will reach out to your execution client and build your own block — it defaults to that. That makes sense if you can't get the header. But you mentioned there were two situations, one where you sign the header and return it and then can't get the payload back. Is that not a slashing risk?

Right. Once you sign the header, you've passed it to the relay, and you don't want to use your local block anymore — that would be a slashing risk, because you'd be signing two blocks.

So you don't default to building your own block in that situation?

No.

Okay, I misunderstood. Thank you very much, Terence. Amazing talk.