I'm going to be presenting from the packet of beans, so we'll see how that goes. I did not know that was a thing until this conference. Light clients, Eth2 light clients: how light is light? They're really integral to Eth2, and we're going to see why. We'll cover what a light client is, and I'm going to try to give you something like the thinking behind how a light client works and how to think about the light client universe.

A brief introduction about me: I'm an Ethereum 2 developer at ChainSafe Systems. I have a Twitter and a GitHub, so you might interact with me online; these are my profiles right now. I'm working on a project called Lodestar, where we're building a TypeScript Ethereum 2 ecosystem. We started with the beacon chain, we're also building a light client, and we're going to be involved with helping with the developer tooling as well, trying to onboard Eth1 developers into the Eth2 ecosystem. We chat on Discord, so if anyone wants to talk to us we're available there, and also on Twitter and all the other relevant places. ChainSafe is where I work. It's a great group of guys and gals, mostly based out of Toronto, and we all work on protocol-level blockchain development. They're all online, and they're really great people.

So this is what we're going to be covering. We're going to start off with what a light client is, why we need light clients, and why we need them in Ethereum 2, and get a sense of what a full node, a light client, and a light node are. It's probably pretty obvious, but in Ethereum 1 we haven't really been using light clients. We use Infura; we use MetaMask, which uses Infura; we use Etherscan. We're not really used to interacting with light clients, and there's been a lot of research into making them more usable, more like a default experience. But we're going to need to start there with Ethereum 2, for some reasons that I'll get into.
We'll go into some background that you'll need, that we'll want to cover before we get into the meat of things, which is the actual light client and how we're going to be syncing data.

So, some motivation: what is a light client? A light client is software looking to securely consume blockchain data, with resource requirements that scale logarithmically with the total blockchain state. Logarithmic means that as the blockchain grows and grows and grows, the amount of data you need to process as a light client barely changes at all. One way to think of it: if the total size of the chain is squared, the light client's work only doubles. Say the chain went from a thousand transactions a day to a million transactions a day; that would only require roughly a doubling of what a light client would need to process.

So why do we need light clients in Ethereum 2? I think they're going to be first-class citizens in the new version of Ethereum, so let's think about some cases where we're going to need them. First, resource-constrained environments. I assume everyone here has a smartphone on them, but who here has the chain synced on their phone? No one? Okay. That's probably why you have MetaMask on your phone. MetaMask uses Infura, and Infura is a great service, but it's not exactly decentralized, and you're losing out on some of the guarantees you get from the blockchain when you're relying entirely on Infura. Another really interesting case for light clients, especially in this very burgeoning ecosystem of blockchains, is to have other blockchains act as light clients to your blockchain. Once we have Ethereum 2, it would be really cool if the Ethereum 1 blockchain were able to validate the data of Ethereum 2 in some way. You can think of that as a light client into Ethereum 2.
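To make that logarithmic claim concrete, here's a tiny sketch. The helper name `proofLength` is mine, purely for illustration: a Merkle proof carries roughly one hash per level of a binary tree, so squaring the amount of data only doubles the light client's verification work.

```typescript
// Illustrative only: a Merkle proof needs roughly one 32-byte hash per
// tree level, i.e. ceil(log2(leafCount)) hashes for a binary tree.
function proofLength(leafCount: number): number {
  return Math.ceil(Math.log2(leafCount));
}

// Going from a thousand leaves to a million (a squaring of the data)
// only doubles the proof size: 10 hashes -> 20 hashes.
console.log(proofLength(1_000), proofLength(1_000_000));
```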
And if we make things easy enough, if we make things small enough, it becomes more practical to do that in more places. Another thing to consider: in Ethereum 1 it's really easy for everyone to get the whole state of the world. In Ethereum 2 we are moving to a sharded architecture where it's not actually going to be feasible for everyone to store all the data, even if you wanted to, unless maybe you ran a bunch of servers on AWS. Practically speaking, most people aren't going to store everything. And even within the protocol itself, validators are going to be required to sync shards at different points. They're going to be required to propose new blocks for different shards, so they'll have to sync up at that point, and they'll have to attest to different parts of the recent shard state through crosslinks. During these cases they're going to need to quickly download parts of shards that they don't have, and they need to do that in a way they can verify. They're not going to be using an Infura-style system for that; they're going to need actual proof that what they're getting is legitimate.

With that said, let's cover some things that some of you probably know, because it'll really help with the thinking and understanding of light clients. Merkle proofs are basically the keys to the castle for light clients. I'm sure a lot of you know this, but we'll cover it briefly. Merkle proofs are a way to verify some piece of data within a larger set of data, where you only need a logarithmic amount of data to validate that something is authentic. Usually, when we want some kind of data, we're assuming it's part of a Merkle tree.
A Merkle tree is just a hash of successive hashes of successive hashes, and you end up with a root hash that says something about all the data underneath it. Very importantly, when we're thinking about Merkle proofs, we have this root that we trust; this scheme of verifying data only works when we have a trusted root. The roots are often what's stored in a blockchain. So when you're verifying a Merkle proof, usually the only thing you know is the root; everything else is unknown to you, and that's why you're requesting a proof in the first place: you're trying to get some information that you don't already have.

When we're given a chunk of data, what we're doing with the proof is linking it from that bottom piece all the way up to the root we already know. We want to be able to recreate the root that we trust. What the proof actually is, is one intermediate node per level in the tree, starting from the bottom, so we're able to build up to the very top of the tree, to the root. At that point you can compare the recreated root against your previously trusted root, and if they match, then you have proven the entire chain all the way down to the bottom. What's really cool about this is that you only need one node per level in the tree, and that number grows very, very slowly with the total amount of data. That's why this is so important for light clients: it's only a small amount of work to verify a proof, even if there's a giant amount of data involved.

A side tangent: multiproofs. They're an extension of the Merkle proof idea. With a multiproof, we're proving data about multiple pieces, multiple leaves, in the tree.
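To make the one-node-per-level idea concrete, here's a minimal sketch of proof verification in TypeScript, assuming SHA-256 and 32-byte nodes; the function names are mine, not from any Eth2 library.

```typescript
import { createHash } from "node:crypto";

// Hash two 32-byte nodes together (Eth2 uses SHA-256).
function hashPair(left: Buffer, right: Buffer): Buffer {
  return createHash("sha256").update(Buffer.concat([left, right])).digest();
}

// Verify a Merkle proof: one sibling node per level, working leaf-upward.
// The low bit of `index` at each level tells us whether our node is the
// left or the right child, i.e. which side the sibling goes on.
function verifyProof(
  leaf: Buffer,
  proof: Buffer[],
  index: number,
  root: Buffer
): boolean {
  let node = leaf;
  for (const sibling of proof) {
    node = index % 2 === 0 ? hashPair(node, sibling) : hashPair(sibling, node);
    index = Math.floor(index / 2);
  }
  return node.equals(root);
}
```

In a four-leaf tree, the proof for one leaf is just its sibling leaf plus the node covering the other subtree: two hashes for four leaves, three for eight, and so on, which is the slow logarithmic growth we want.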
Very similar to a Merkle proof, you want to build back up the tree to the root. What makes a multiproof a little different is that you can share the intermediate nodes between all of your paths, so you can be a bit more efficient than having one separate proof per branch. We ended up taking advantage of this for Ethereum 2 light clients.

Another thing to think about is the difference between proof of work and proof of stake light clients. Proof of work light clients are kind of easy, because everything you need is within the protocol. You download all the headers and you verify everything based on protocol rules: you verify the proof of work, you verify that each header points back to the previous header, and that's all you need to do. Once you're at the final header, the Merkle roots are right there, and you can request proofs against them. So the steps are: sync up the headers, then request proofs.

Proof of stake is a little different, because the headers alone aren't sufficient to verify proofs. You also have to keep track of the stake of the validators, because in a proof of stake world we're governed by a supermajority of stake. We have to ensure we're on the chain with the most stake behind it, and the only way to do that is by keeping track of the validators, their balances, and how they have voted. So it's a different beast than proof of work, but it's an opportunity to do things a little differently, and maybe a little more efficiently.

I want to take another little detour into what's called SimpleSerialize, or SSZ, which is a spec we have for Ethereum 2. It's a way to consistently and easily Merkleize our data structures.
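Here's a rough sketch of the shared-intermediate-node idea, using generalized indices (root = 1, children of node i are 2i and 2i + 1). The helper names are mine, and this is not the actual Eth2 multiproof wire format, just the core trick: leaves being proven and helper nodes live in one map, so a shared sibling is never sent twice.

```typescript
import { createHash } from "node:crypto";

const hashPair = (l: Buffer, r: Buffer): Buffer =>
  createHash("sha256").update(Buffer.concat([l, r])).digest();

// Reconstruct the root from nodes keyed by generalized index.
// Whenever a node and its sibling are both known, we can compute their
// parent; we repeat until the root (index 1) is filled in.
function multiproofRoot(nodes: Map<number, Buffer>): Buffer {
  const known = new Map(nodes);
  let progressed = true;
  while (!known.has(1) && progressed) {
    progressed = false;
    // work deepest-first so parents get computed bottom-up
    for (const i of [...known.keys()].sort((a, b) => b - a)) {
      const parent = i >> 1;
      if (i > 1 && !known.has(parent) && known.has(i ^ 1)) {
        const left = i % 2 === 0 ? known.get(i)! : known.get(i ^ 1)!;
        const right = i % 2 === 0 ? known.get(i ^ 1)! : known.get(i)!;
        known.set(parent, hashPair(left, right));
        progressed = true;
      }
    }
  }
  return known.get(1)!;
}
```

In a four-leaf tree, proving the two right-hand leaves separately would take two proofs of two nodes each; together they only need one shared helper node, the subtree root on the left.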
Let me go through an example. What SSZ does is give us a consistent way of creating Merkle roots for any kind of data structure we have. For example, we have a Checkpoint, which is an epoch and a root. If we wanted to make a Merkle tree out of this, we would just put each of these elements into chunks and then create a hash from those. We can get a little more complicated: we have a thing called a Crosslink, and we can still create a Merkle tree from that; we just pad it with some zero data. And then we can compose these data structures. Here's another example, AttestationData, which uses a Checkpoint and a Crosslink.

Because things are built out systematically, it becomes very easy, thinking back to the proofs we went over earlier, to traverse the tree and create a proof for any sub-element of a data structure. These structures can get really, really nested in the Ethereum 2 world. We put Merkle roots within a lot of these data structures: we can link beacon blocks to our beacon state, we can link our beacon state to certain trusted beacon blocks, we can link our shard blocks to beacon blocks. So you can imagine, say, the data root within this Crosslink: that data root is actually a Merkle root of some other data, and you can imagine there being a tree below it. You can end up navigating down this tree further and further into the past, or into a shard, or back into the beacon chain. We've designed this to be as friendly as possible for creating proofs that can take you wherever you need to go, where wherever you need to go is some place in the chain.

With that, we can finally, finally get into the light client itself. Let's start with syncing.
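As a rough sketch of the Checkpoint example, here's how SSZ-style Merkleization can look, assuming SHA-256, 32-byte chunks, and little-endian integer packing; `merkleize` and `checkpointRoot` are my own helper names, not a real SSZ library's API.

```typescript
import { createHash } from "node:crypto";

const hashPair = (l: Buffer, r: Buffer): Buffer =>
  createHash("sha256").update(Buffer.concat([l, r])).digest();

// Pack a uint64 into a 32-byte chunk, little-endian, zero-padded.
function uint64Chunk(value: bigint): Buffer {
  const chunk = Buffer.alloc(32);
  chunk.writeBigUInt64LE(value, 0);
  return chunk;
}

// Merkleize a list of 32-byte chunks: pad with zero chunks up to the
// next power of two, then hash pairs level by level to a single root.
function merkleize(chunks: Buffer[]): Buffer {
  let size = 1;
  while (size < chunks.length) size *= 2;
  let level = [...chunks];
  while (level.length < size) level.push(Buffer.alloc(32));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(hashPair(level[i], level[i + 1]));
    }
    level = next;
  }
  return level[0];
}

// Root of a Checkpoint { epoch: uint64, root: Bytes32 }:
// one 32-byte chunk per field, then Merkleize the chunks.
function checkpointRoot(epoch: bigint, root: Buffer): Buffer {
  return merkleize([uint64Chunk(epoch), root]);
}
```

A Crosslink with, say, five fields would Merkleize the same way: five chunks padded with three zero chunks to reach eight, then three levels of hashing, which is the zero-padding mentioned above.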
We looked at the proof of work model of syncing: download all the headers and process them one by one. What we need to think about here is: how can we get these trusted Merkle roots in our proof of stake system, and can we do it simply? Less data and fewer steps is better for our light client needs. But what is trusted? Trusted means roots that are attested by a certain amount of stake, roughly two thirds. And the stake votes are what give weight to the chain, so we need to make sure that the roots we're getting are roots that mean something.

There are two key insights that change how we can progress in syncing. The first: instead of syncing headers by hash, progressing one at a time, we can use the stake to skip ahead to a currently trusted header without having to verify every single header along the way. Instead of relying on the cryptographic link from each header to the previous header, if we already know who's validating, we can count all the votes and jump ahead to a place much further on. Imagine a blockchain where the validators never changed and the balances never changed: if we knew everything, we would be able to skip immediately to the very head of the chain as soon as we saw everyone voting for that head. The trick with Ethereum 2 is that the validators are always changing, so we need to keep track of that, which makes it a little more complicated.

That brings us to the second key insight: instead of tracking all of the validator balances and all of the votes, if we can find a place where we only need to track a few of the validators, that will really lower our requirements for keeping track of the stake.
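The skip-ahead idea can be sketched like this. These are hypothetical shapes for illustration, not the real spec types; actual Eth2 uses BLS aggregate signatures over committees rather than per-validator vote objects.

```typescript
// Hypothetical shapes for illustration; not the actual Eth2 types.
interface Validator {
  pubkey: string;
  balance: number;
}

interface Vote {
  pubkey: string;
  headerRoot: string;
}

// Instead of verifying every header one by one, count the stake behind a
// candidate header: if at least 2/3 of the tracked committee's balance
// voted for it, a light client can jump straight to that header.
function canSkipTo(
  headerRoot: string,
  committee: Validator[],
  votes: Vote[]
): boolean {
  const balances = new Map(committee.map((v) => [v.pubkey, v.balance]));
  const total = committee.reduce((sum, v) => sum + v.balance, 0);
  let supporting = 0;
  for (const vote of votes) {
    if (vote.headerRoot === headerRoot) {
      supporting += balances.get(vote.pubkey) ?? 0;
    }
  }
  // integer-friendly check for supporting / total >= 2 / 3
  return supporting * 3 >= total * 2;
}
```

The whole difficulty, as above, is knowing which `committee` to count against, which is exactly why the choice of committee in the next section matters.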
So where do Eth2 validators validate? One place is the crosslink committees. These committees change every epoch, and they attest to recent shard data and to the beacon block root, which is great; but we would have to count the votes across all committees in that case, so that isn't helping. They also attest to recent checkpoints, but again as the whole validator set. So the crosslink committees aren't really a great place for light clients, because we would still need the whole validator set; they're not helping us out.

The other place we can look is the shard committees. These are the committees that propose new blocks within shards, at least in the current scheme, and they only change every 27 hours. This is actually a much better place for light clients: if we're trying to keep track of the votes and the validators, those things only update very infrequently here, and because they update infrequently, that's less work we need to do as a light client.

That brings us to the sync protocol. What we end up doing is we sync to a certain shard. Oh, well, my time is up. But I tweeted out my slides, and they cover the rest of all this if you want; it's on HackMD and I'll put it in the HackMD for this presentation.