We work on the Lighthouse client, so we're going to be talking about reducing beacon chain bandwidth for institutional and home stakers, and I'm going to leave it to Diva, who's a Bogotá local and our peer-to-peer networking expert, to kick us off.

Okay, so the first thing we need to check, in order to analyze how to reduce bandwidth for institutional and home stakers, is what the bandwidth looks like right now. We are going to be looking at data from Lighthouse, like Age said. Lighthouse is one of the production beacon nodes; there are many others, Teku, Prysm, Nimbus, Lodestar. We are the second most popular one, and while the data we're going to be looking at is from Lighthouse, we are certain this applies to any other client across the network, so keep that in mind.

The first thing to notice is that the bandwidth of a beacon node is proportional to the number of validators it has attached. You can see that it increases as the number of validators goes up, but then at 64 validators it stays more or less the same. 64 is going to be a really important number in this talk; Age is going to explain further why it's important. For now we simply need to remember that we are going to stop at 64, and this is where bandwidth becomes more or less stable.

Now, if we look at how many validators every beacon node on the network has, we get three main groups. First, those that don't have any validator attached, plain non-staking beacon nodes. In this graph you can see a very big bar, the one with zero validators; those are a very big chunk of the network, almost 60% of it. Then we have those that we call home stakers, which have fewer than 10 validators; most of them have just one validator. And then we have those that we call institutional stakers.
Those are the ones that have more than 64 validators, which is the big bar at the end.

Now, why do we care about bandwidth to begin with? There are a couple of downsides to high bandwidth. The first one is, of course, increased cost, especially if that cost comes from running in the cloud, where you're charged for data transfer. It's also a matter of diversity, because high bandwidth means you can only run a full node on devices that can process all the data you're receiving from the network, and only if you can pay for the services these nodes run on. I see a few nods here, my colleagues for example.

We also know that there is going to be an upgrade to the network that is going to bring a big, big increase in bandwidth, like a lot. So we need to start tackling this problem right now, before that happens. That's another big reason why this is important. And another really important reason, when we talk about a diversity problem: the truth is, if you are a home staker running your beacon node with your single validator attached, and then your brother or your sister starts playing video games, you're going to have a bandwidth spike and you might miss attestations, or blocks. So basically you're losing money simply because of very short spikes in bandwidth. We don't want that.

Now let's check where the bandwidth is coming from. For the sake of this talk we're going to separate protocols into discovery and non-discovery. Why is that? Because discovery has its own transport and its own encryption, and it runs over UDP, whereas all the protocols we're going to call live p2p run over TCP: gossipsub and request-response. The one we are going to focus on mainly in this talk is gossipsub. Now that we have this separation between discovery and non-discovery, let's check where the bandwidth is coming from.
This is what the total bandwidth looks like for one of our nodes over the last couple of days. Is it discovery? Is it live p2p? Where is it? Now we have two graphs: the purple one is all the discovery bandwidth, and the one at the bottom is the live p2p bandwidth. Forget about the scales; this is just a not-very-scientific analysis of the shape of the bandwidth. I guess we can more or less agree that the total bandwidth has the shape of the live p2p graph, whereas discovery is, well, very magical, as we will see. The total bandwidth is around 250 kilobytes per second. Live p2p is around 200 kilobytes per second, and discovery is between 40 and 60 kilobytes per second. So for the sake of analyzing bandwidth, discovery is really not important, and we're going to focus on live p2p.

As I was saying, we have gossipsub and request-response, and we're going to focus on gossipsub. Gossipsub is a publish-subscribe protocol: we publish messages to the network, those get disseminated to other peers, and we receive messages on the subnets, or topics, to which we are subscribed. We have large amounts of data, and we split that between subnets. Here we have 64 attestation subnets which, again, Age is going to talk about further in a moment. Then we need to think about how many subnets we are subscribed to, since this is the reason we get so much bandwidth. The rule right now is that we subscribe to one subnet for each validator the node has attached, and subscribing to a subnet means a lot more bandwidth.

Now, a very simple example of how gossipsub works. Gossipsub has a lot of parameters, but one we care a lot about is the mesh degree. This example uses a mesh degree of three. We are going to start with our guy in the middle, that one.
I hope you can see the red circle right in the middle. This guy has a validator attached and is going to publish a block. It's going to pick three peers and disseminate the message, and then each peer is going to do the same with another three peers, and so on. If you think about this for a moment, you'll realize it's very likely that a node receives one single, unique message many times, from very different nodes, across very different paths in the network from the source to itself.

Okay, so why do we like gossipsub, or this way of doing message propagation? It makes for a very robust network, in the sense that we are sure that if we publish a message, with this kind of topology, the message is going to reach all other nodes in the network in a timely manner. We also get low message latency: the time between the creation of the message at the source node and when you receive it is not very long, and this is, as I was saying, because of shorter paths, fewer hops. But then again, we have a lot of duplicates, and this redundancy results in high bandwidth.

This is a graph we have for our nodes, so this is real data on gossipsub duplicates. This is going to be weird, but I'm going to ask you to focus on this number, the beacon block. We know one block is published each slot, so on this topic we're supposed to receive a single unique message. However, we are getting six or seven copies across the network. That means an amplification factor of about seven. So imagine what happens if we were able to reduce that kind of duplication; that's a huge, huge gain there.

Okay, so a summary of the current state. Bandwidth is proportional to the number of validators a node has; Age is going to speak about how we intend to tackle this.
And the other issue is the high amplification in gossipsub: how are we going to reduce duplicates without harming the network? So I'm going to leave it to Age to continue this talk.

Yeah, thanks, Diva. I'm going to have a crack at explaining how we can potentially reduce some of this. I'm Australian, and I can see a few Australians in the audience, and I think they'll agree with me that we have terrible internet, like ridiculous internet. I run a validator at home, and someone watching Netflix in another room causes me to lose attestations. So, as a disclaimer, I have a personal vendetta to try and reduce this bandwidth so that I stop losing attestations.

There are two solutions we have to these kinds of problems, which I want to briefly touch on to give you a feel for what we're trying to work on. One of them is called minimizing topic subscriptions. Let me try and explain that, if you haven't covered this kind of stuff before. A validator, in an epoch, needs to publish some messages on these things called subnets. In order to do that, you need to have peers on those subnets to publish the messages to. The problem is that peer discovery is a slow process; it takes a while to actually find peers, and it's even harder to find a peer on the specific subnet where you need to publish an attestation. So the problem we've got, essentially, is that we need a stable set of peers on each subnet, so that we can find them easily when we need to publish a message. It's not really an easy problem to solve. The way we currently solve it is that every time a validator is attached to your beacon node, that beacon node is required to subscribe to one subnet for a very long period of time, about 27 hours, or 256 epochs.
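That rule can be written down as a tiny model. The constants come from the talk (64 attestation subnets, 256-epoch subscriptions); the function name is just illustrative, not anything from a real client:

```python
ATTESTATION_SUBNET_COUNT = 64   # total attestation subnets (from the talk)
SUBNET_DURATION_EPOCHS = 256    # roughly 27 hours per long-lived subscription

def long_lived_subnets(attached_validators: int) -> int:
    """One long-lived subnet subscription per attached validator,
    capped at the number of subnets that exist. This cap is why
    measured bandwidth flattens out once a node reaches 64 validators."""
    return min(attached_validators, ATTESTATION_SUBNET_COUNT)

for count in (0, 1, 10, 64, 2000):
    print(count, "validators ->", long_lived_subnets(count), "subnets")
```

A node with 2,000 validators subscribes to no more subnets than one with 64, which matches the plateau in the bandwidth graph.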
What that means is that the more validators you have attached to your beacon node, the more subnets you need to subscribe to. And subscribing to a subnet means you have to receive all of the messages on that subnet, verify those messages, and then send them on to other peers. That's a lot of bandwidth and a lot of processing. So essentially you're supporting the network at the cost of bandwidth on your node; you're being a good actor.

There are a number of downsides to this approach. One is that it's not enforceable. As a beacon node, you can essentially lie and say, oh, I don't have any validators, even though you've got 2,000 attached, and just not subscribe to any long-lived subnet so you don't consume any bandwidth. So you're kind of incentivized to do this, and there's no way any other node on the network can tell whether you're lying or not. The next thing is that our subnets are potentially oversubscribed. We actually have quite a large number of beacon nodes on mainnet today, and when we originally designed this process, we didn't really realize how many nodes would be participating in the network. So potentially we have more nodes than we need on each of these subnets, which would lead us to think we have excess bandwidth on the network that we can remove.

So the idea being proposed is: why doesn't every single beacon node on the network just subscribe to one (or a few) subnets? There are some benefits to this, and the general discussion is in an issue on the specs repo. What would the bandwidth graph look like if we did this? If each beacon node was subscribed to one subnet, rather than having bandwidth proportional to the number of validators attached, you would get this green line on the graph, which hopefully everyone can see.
That green line sits at around 500 kilobytes a second at the moment, from what we've measured. So essentially everybody on the network that has validators wins in this scenario, except those that have exactly one. Sorry for the people in this room that only have one validator; your bandwidth will increase by something like 50 to 100 kilobytes per second.

The other benefit we get: from a quick scan we did of the DHT, the current nodes on the network, we found that 57% of nodes apparently don't have any validators attached. So either they're lying to us, or they really don't have validators and aren't participating in any of the subnets. If we transition to this state, they have to start participating and helping out, and it lowers the bandwidth for the institutional stakers, the ones with a high number of validators. So essentially all the beacon nodes are now contributing to subnet stability.

It sounds pretty good that we get a massive reduction in bandwidth, but there is a cost, and the cost is the density of nodes you can find on these subnets. If no one was subscribed, we wouldn't be able to find peers to publish our messages to on these subnets. We still need a decent density, so that if you just randomly look through the peer set, you can actually find nodes on those subnets and hold onto them. This graph here is a distribution of the current density, and what the density would look like if we switched to one beacon node per subnet. At the moment it's roughly 8%: you randomly pick a node, and there's an 8% chance it exists on any given subnet. If we switch to one beacon node per subnet, it drops to about 1.5%, which is 1 in 64, as you would expect. The benefit is that this is configurable.
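The density numbers quoted here fall out of a one-line calculation, where the subnets-per-node ratio is exactly the knob being discussed. This is just a sketch of the arithmetic under a uniform-assignment assumption:

```python
ATTESTATION_SUBNET_COUNT = 64

def subnet_density(subnets_per_node: int) -> float:
    """Probability that a randomly picked peer is subscribed to any
    given subnet, assuming subnet assignments are spread uniformly."""
    return subnets_per_node / ATTESTATION_SUBNET_COUNT

# One subnet per node gives 1/64, about the 1.5% from the graph;
# raising the ratio to two or three subnets per node scales it linearly.
for k in (1, 2, 3):
    print(k, "subnet(s) per node ->", f"{subnet_density(k):.1%}")
```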
We don't have to say one beacon node, one subnet; we can say one beacon node, two subnets, or one beacon node, three subnets. So we can adjust that ratio, and whether that's feasible depends on the number of nodes on the network, which is something we need to measure. But the benefit we get is that for the institutional stakers, which represent roughly 10% of all the validating nodes on the network, bandwidth drops by over 90%, which is kind of handy. And as I mentioned before, everyone that has more than one validator will see reduced bandwidth. Fundamentally, we're asking: do we need all of these nodes on the network supporting these subnets, at the cost of the huge amount of bandwidth they're currently using? Potentially not, and it's customizable.

The second thing is that it's also enforceable, because we would tie a beacon node's node ID to a subnet. So when we connect to a beacon node and it's not subscribed to that subnet, we know it's being naughty, or lying to us, and we can kick it off. So we actually get the enforceability property. That's one solution, and it's kind of low-hanging fruit: it's relatively easy to do and we get substantial gains.

The other thing Diva was talking about is message amplification in gossipsub, and whether there's anything we can do about that. The idea we want to push forward is a concept called Episub. Gossipsub is a protocol that exists in libp2p, as Diva was mentioning, and the libp2p folks, largely run by Protocol Labs, have had an evolution of gossipsub in mind for quite some time. They've talked about Episub for a long while and done a bunch of research; in particular Vyzo, Dimitris, who works for Protocol Labs, has had this vision for Episub for a while, but has never had the push or the drive to do it.
But now that we're seeing quite substantial amounts of bandwidth, it's probably time we try and realize this thing. I just want to briefly go over the concept, what it's going to do, and how it could help us with bandwidth.

Okay, so Episub. The biggest problem we have is that if you have a high mesh degree, as Diva was saying, you have high connectivity between nodes in your network. Every node could be connected to another eight or ten nodes, and then you get huge message amplification: every time you send one message, most of the network will probably have to download it eight times. So if you increase that message by one megabyte, you're actually increasing it by eight times, eight megabytes, because that's the amount of bandwidth that has to be downloaded and then propagated.

Naively, you would first think: okay, why don't we just lower the mesh degree? I did try that at one point, but as everyone points out, it's not a safe thing to do. We don't really know how the network is going to behave. We can try it on testnets, but the topology, the structure, of a testnet doesn't look like mainnet. If we just lowered the mesh degree to two, for example, on mainnet, you might just stop receiving blocks. So we'd have to do that with either great care or a lot of testing. It's an interesting idea, but we can probably do better. Another idea is to dynamically adjust the connectivity, but we run into a similar problem. Typically, a lot of the people that use our client suggest lowering the peer count, which is a false economy; it doesn't really help. So that's also not the best solution.

The idea with Episub is to minimize duplication and latency by keeping the mesh at the same connectivity but making it a little more efficient. And by efficient, I mean we're trying to reduce the number of duplicates and, at the same time, either maintain or lower the latency inside the mesh.
And the mesh is just a subset of the peers you're connected to; that's where you receive your messages from. So it sounds like we're trying to win on two fronts, less bandwidth and lower latency, which sounds like something we shouldn't be able to do, but let me explain the general principle and maybe it will make sense.

The general principle is: you're a node on the network, you're receiving all of these messages, and a lot of them are duplicates. You start collecting statistics on which nodes are sending you these duplicates, and which ones are sending you late duplicates. You'll end up with some distribution over the number of duplicates you get and their latency. It could be that Paul over here in the front constantly sends me some duplicate three seconds late, whereas Sean sends it to me straight away, instantly. So I come up with what we call a choking strategy: we look at the distribution of peers sending us duplicates, and to the consistently late ones, as opposed to the low-latency ones, we send a message called a choke. In this example we say: I'm going to choke Paul, because he's sending me late messages all the time. What the choke message does is tell Paul to no longer send me these messages over the mesh. Instead, he does a process called gossip, where he just sends me the message ID, which is a much smaller thing. Over time, based on your choking strategy, you should end up with a more efficient mesh, where you're receiving messages with a lower number of duplicates, maybe from just one or two peers, while the rest are choked and you still receive gossip from them. And if the peers you have in your mesh are slow, and the peers you have choked start sending you message IDs before the mesh delivers the actual messages, there's an unchoke message: we can unchoke them and put them back into the mesh.
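One toy version of such a choking policy, just to make the idea concrete. The peer names come from the example above; the class name, thresholds, and sample counts are all made up for illustration, since the real strategy is left pluggable:

```python
from collections import defaultdict
from statistics import mean

class ChokeStrategy:
    """Toy choking policy: record how late each mesh peer's duplicates
    arrive, then choke peers whose duplicates are consistently late.
    Episub leaves the real strategy open; this is one simple instance."""

    def __init__(self, min_samples: int = 5, late_ms: float = 500.0):
        self.latencies = defaultdict(list)  # peer -> ms after first copy seen
        self.min_samples = min_samples
        self.late_ms = late_ms

    def on_duplicate(self, peer: str, ms_after_first: float) -> None:
        """Called whenever a duplicate arrives from a mesh peer."""
        self.latencies[peer].append(ms_after_first)

    def peers_to_choke(self) -> list[str]:
        """Choke peers with enough samples whose mean lateness is high."""
        return [peer for peer, samples in self.latencies.items()
                if len(samples) >= self.min_samples
                and mean(samples) > self.late_ms]

strategy = ChokeStrategy()
for _ in range(6):
    strategy.on_duplicate("paul", 3000.0)  # duplicates ~3 seconds late
    strategy.on_duplicate("sean", 10.0)    # near-instant duplicates
print(strategy.peers_to_choke())           # only the consistently late peer
```

A choked peer keeps gossiping message IDs, so the unchoke path described above can bring it back if the remaining mesh peers turn out to be slow.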
So ultimately the idea is that you're dynamically changing your mesh, which peers are sending you messages and how fast they're doing it, to make it more efficient. Hopefully that makes sense; I'll take questions afterwards.

Does this work, I guess, is the question. There have been some preliminary simulations of this. As I mentioned, Vyzo from libp2p and Protocol Labs has done some work here. He's built a generic simulation for the Go version with 250, 500, and 1,000 nodes. This is with a mesh degree of six, so there's roughly a six-times amplification if you look at the messages in these simulations. But in pretty much all of them, what this graph shows is that you get roughly a 50% reduction in duplicates from Episub. Now, as I said, the choking and unchoking strategies are left somewhat generic, and I think we can tune them significantly, especially if we target the Ethereum network. I think we can get better versions of this, but at face value it looks like you can reduce duplicates by 50% just by adding these kinds of messages.

The next thing we should probably talk about is latency. I suggested we could get a reduction in duplicates and a reduction in latency. The initial results Vyzo has completed in simulation suggest that latency actually increases. On the left is a gossipsub latency distribution, where the buckets are milliseconds since the message was received, and on the right is Episub. We receive more messages with higher latency in these simulations. But as I was saying, these simulations are somewhat generic; their topology does not look like the Ethereum network we have, and they can be highly tuned to what we need. There's a lot of research work going on there, and there are a few people in the audience to whom I think I've promised results for a mainnet version of this.
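To see where the roughly degree-times amplification in those simulations comes from, here is a much cruder sketch than the libp2p simulations: a one-directional random mesh where every node forwards each message to its mesh peers on first receipt, so every delivery after the first is a duplicate. All parameters are illustrative:

```python
import random

def simulate_flood(num_nodes: int, mesh_degree: int, seed: int = 42):
    """Flood a message from node 0 through a random mesh. Every node
    forwards to `mesh_degree` peers the first time it sees the message.
    Returns (total copies delivered, nodes reached)."""
    rng = random.Random(seed)
    # Each node picks `mesh_degree` distinct outgoing mesh peers.
    mesh = {n: rng.sample([p for p in range(num_nodes) if p != n], mesh_degree)
            for n in range(num_nodes)}
    received = [0] * num_nodes
    seen = {0}        # node 0 is the publisher
    frontier = [0]
    while frontier:
        next_frontier = []
        for node in frontier:
            for peer in mesh[node]:
                received[peer] += 1      # every delivery costs bandwidth...
                if peer not in seen:     # ...but only the first is useful
                    seen.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return sum(received), len(seen)

copies, reached = simulate_flood(500, 6)
print(f"{reached} nodes reached, {copies} copies sent, "
      f"~{copies / reached:.1f}x amplification")
```

Since each reached node forwards exactly once to six peers, the network as a whole transmits about six copies per node for one logical message, which is the redundancy Episub's choking tries to cut down.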
So we at Sigma Prime have built essentially a production version of Episub inside gossipsub. The advantage is that it's backwards compatible, so we can release it in our clients and it will work with every existing peer, every other gossipsub node on the network. But if there happens to be another Episub node on the network, we can start getting this bandwidth minimization. The fact that it's backwards compatible is super handy; we can just release it whenever we want. We're also working on a mainnet simulation, so that we can simulate the bandwidth that exists on mainnet, apply Episub, and get more specific, more robust data on what this would actually look like on Ethereum mainnet. We'll publish results from this very soon, and if you're interested, just let us know.

The title of the talk, if I remember it correctly, was reducing bandwidth for institutional and home stakers. So if you're an institutional or home staker and you came here asking how to get all these bandwidth gains, what do you need to do to get a 90% reduction? The answer is: nothing. You just have to wait; we'll release it, maybe run a Lighthouse node, and hopefully we'll be able to reduce the bandwidth. That's the end of our talk. Thank you. We have some minutes for questions.

Great talk, guys. Is this backwards compatible, meaning that clients can gradually roll it out over time, or is it something we all have to upgrade to at once?

Here we're handling, basically, protocol versions. We can run against nodes that are Episub compatible, and if they're not, then we just run gossipsub version 1.1. That's what we mean by that. Yeah, we add a protocol ID into gossipsub, so when you connect to a peer you can identify which protocols it supports. If it doesn't support Episub, you don't choke or unchoke it.
If it does support Episub, you can choke and unchoke. So it works perfectly with 1.1 nodes.

I just want to ask: why do we want to reduce bandwidth for institutional stakers? Isn't it a nice property that there are no economies of scale, at least if it's enforceable that they have to subscribe to things? It seems weird to me as a concept to shift the load from institutional stakers to normal beacon nodes, or home stakers, even if it's only a little; generally it seems like we want that lack of economies of scale.

The good part of this is the part Age mentioned about it being enforceable. With almost 60% of the network being nodes that don't have any validators, we're not so sure that's true; those might as well be institutional stakers. So the truth is, this is more about being fair across the network than about targeting something that's better for institutional stakers; it's just that they happen to get more of the gains.

So I know with 4844 we're going to be exploding our bandwidth costs. I was curious whether that scales with the number of validators you have. To repeat the question: we're going to increase the block size in 4844, and how does that apply here? Under current gossipsub our bandwidth increases with the number of validators you have, and I was curious whether that's also the case for the increase in 4844. Yeah, it's not. The bandwidth increase from validators is due to the subnet subscriptions. When you increase the block size, everybody feels that, because everyone subscribes to the block topic; that's felt uniformly. Here, the part that matters for what Mark asked is mainly the amplification factor, not so much how bandwidth behaves with respect to the number of validators.
Since we have an amplification factor of about six, without this improvement we're going to have huge block sizes which, if we continue doing things the way we are, will each be sent across the network six times. That's insane. So it's related more to duplicates in that part than to the number of validators per node.

You showed that bimodal distribution where there are a lot of one-validator nodes, then that 60% that were just regular full nodes, not validators, and then the institutional nodes with more than 64. I understand how Episub helps everyone, how it helps the home staker, the Netflix problem. But the institutional stakers are all running data center nodes anyway, so I don't understand as much how subscribing to fewer gossipsub topics helps them.

I think this is a very similar question to the one over here. Fundamentally, we were looking at whether the network as a whole may be consuming more bandwidth than it needs to. Depending on how you build your client, you can be clever about which peers you connect to. At the moment you have these institutional nodes, right? But they're not just institutional: there's usually a parameter, I think in most clients, called subscribe-all-subnets. Even if you have one validator, you can subscribe to all the subnets, and the reason you do that is that you get some benefit from seeing all the attestations, a slight increase in performance. So it's not necessarily just institutional people. But the institutional... I forgot my train of thought. All right, I did have a point. Maybe here. Part of what Age was saying is that when subscribing to a subnet, we also advertise in discovery that we are inside that subnet. So that means we need to find peers, using discovery, that are useful to us.
So one strategy used by people who have the bandwidth for it, to be sure that all their publishing is timely, is subscribing to all subnets regardless of how many validators they have. This is what Age was saying: maybe it's not exactly true that all nodes advertising more than 64 subscribed subnets actually have more than 64 validators attached.

Yeah, so I remember my point. My point was that we have these institutional stakers, or people subscribed to all the subnets, and from a client's perspective they become more valuable than every other node on the network. If you have two peers connecting to you, and one is subscribed to just one subnet while the other is subscribed to all 64, you're more inclined to keep a connection to the second one, because in case you need to send a message on one of the subnets, he or she is connected to it. So you're left with maybe 10% of nodes being super valuable across the network, whereas everyone else you kind of just throw away; they're not all that valuable. Whereas if we transition over to this scheme: one, it's enforceable, which is something we want; two, we're not entirely sure whether the amount of bandwidth we're using across the entire network is necessary, so we reduce that, as maybe a side effect; and three, all the nodes become equally valuable to you. And because we tie it to the node ID, the node ID specifies which subnet a node is supposed to be subscribed to, so when we're doing discovery queries we can actually search more efficiently for nodes on a specific subnet. So there are a number of benefits; it's not just that we're helping institutional stakers, that's just a byproduct.
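The node-ID-to-subnet idea can be sketched as follows. The concrete mapping is still being discussed in the specs issue mentioned earlier, so this hash-based function is purely hypothetical; the point it illustrates is that any peer can recompute the assignment, which is what makes it enforceable and discovery-friendly:

```python
import hashlib

ATTESTATION_SUBNET_COUNT = 64

def expected_subnet(node_id: bytes) -> int:
    """Hypothetical deterministic mapping from a node ID to the subnet
    that node must serve. Because any peer can recompute this, a node
    that isn't subscribed to its assigned subnet is provably misbehaving,
    and discovery can filter candidates by node ID alone."""
    digest = hashlib.sha256(node_id).digest()
    return int.from_bytes(digest[:8], "big") % ATTESTATION_SUBNET_COUNT

node_id = bytes.fromhex("aa" * 32)  # example 32-byte node ID
print("node must serve subnet", expected_subnet(node_id))
```

On connect, a peer whose advertised subscriptions don't include `expected_subnet(node_id)` can be disconnected immediately, which is the enforceability property described above.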
Hey, cool talk, guys, thanks. You mentioned reducing bandwidth and reducing latency to avoid missing attestations. Something we've looked at as well is effectiveness ratings for validators, where misses can result in penalties or reduced rewards. Can you speak to how this would help with effectiveness ratings, if it would?

I guess the first part is what I opened with: my personal attestation effectiveness drops when someone watches Netflix. So I imagine for a lot of home stakers that are on the bandwidth limit, or whose upload speed is quite low, like in Australia, we reduce the bandwidth requirements a lot. Diva said that when you have these peaks in bandwidth you can miss attestations, you don't publish them in time; and it's not just missed ones, you also get penalized if they're late. Even if they still get included in a block, but later, you get less, and that lowers the attestation effectiveness. From a lot of the people that use Lighthouse, we find there are a couple of main reasons that impact attestation effectiveness: one is bandwidth, and the other is CPU limitation, processing, if you're running a node that's overburdened. And topic subscription is another thing that can overburden a node, because if you're subscribed to a lot of subnets, say you have five validators attached and five long-lived subnets, you have to get all the messages and process them, so the processing also hurts you and lowers your average effectiveness. So if we get this in, it all should, in principle, improve effectiveness, both because of the bandwidth and because it lowers the CPU usage of your node.

Would it be possible for a node to detect that it's been choked? If so, could it somehow combine that with being dishonest about its subnet subscriptions and grief or otherwise stall its local node graph? Sorry, the last part?
If it could combine that with being dishonest about its subnet subscriptions, could it stall or otherwise grief its local node graph? Oh, wow. Choking is explicit, so a node is actually going to know it's being choked; we're asking it to stop sending messages to us. I know it sounds similar to what happens with choking strategies in, I don't know, file sharing, but it's different in the sense that we're the ones asking the peer to stop sending the messages. So it's kind of a benefit for them.

When we connect to a peer, let's say in this new regime where every node has to subscribe to a topic based on its node ID: we look at its node ID, and we know it's supposed to be subscribed to, say, topic three. When we connect via gossipsub, it should send us its subscriptions, and if it's not subscribed to that topic, we know straight away that it's being malicious or faulty, and we can just disconnect it. There are technicalities where it can claim to be subscribed but then not cooperate. The next phase is that we try to form a mesh: if we're connected to, say, a hundred peers, we only really form a mesh, at the mesh degree, with maybe three of them. So in principle a peer could always just say, no, you can't join my mesh, I'm full, I'm full, I'm full. That's one way to grief us. In that scenario it still has to forward the gossipsub messages. There's a mechanism inside gossipsub called gossipsub scoring, introduced in 1.1, which attempts to mitigate censoring, where you claim to be subscribed but do nothing, and the whole network essentially tries to kick you out of the mesh. In terms of being choked: we only choke peers that are in our mesh.
So only peers that are subscribed and have grafted with us, forming this mesh connection, which means they have to be subscribed and have to be sending us messages. If they're not sending us messages, 1.1 scoring will kick them out, and then we choke or unchoke them. As for a peer being malicious and trying to cheat that system: the choking strategy is abstract and can be implemented independently on each node, so you don't know what any given peer's choking strategy is; they could just pick random nodes in their mesh and choke them. So I think there is an avenue of security research there that hasn't been done yet, but I imagine we can probably solve that with some of the scoring parameters in gossipsub. Thank you very much, Diva, and thank you very much, Age. Thanks.