Hello, everyone. My name is Robert Habermeier. I work at Parity Technologies, and one of the projects I worked on was a light client for Ethereum. If you're not familiar with Parity, basically we build an Ethereum node, and I'm going to talk about one mode of operation of that node. First, I'd like to split up the classifications of Ethereum clients. We can split these pretty easily into three categories. The first is a full node. When you run a node today, you're mostly just running a full node. That means you download all the block headers and check the consensus process, and then you download the block bodies as well and check that all the transactions have been executed correctly and are actually valid. This gives you basically full security: you've checked absolutely everything you can. But what we've seen is that as the blockchain grows heavier and longer, it takes a very large amount of time and resources, both storage and computation and memory, to actually synchronize a full client and use it. So it's not exactly user-friendly in this sense; if you wanted to run a client on your laptop, you couldn't necessarily run a full node. The second kind of client we talk about is a light client. In a light client, you verify the block headers, but you're not checking the transactions. That means you're checking that mining or staking has been done correctly, but you're not actually executing the transactions to see whether the output state correctly follows from the previous state and the new transactions that have been added. The last kind of client, which I'm not going to talk about too much today but which is relevant to the protocol I'm about to discuss, is a thin client.
A thin client isn't checking the consensus process itself; rather, it has somebody else do that checking, trusts the result, and can then fetch data about the chain in the same way a light client would. So a light client has these security guarantees: it checks the validity of headers, but it doesn't necessarily check the validity of state transitions. And this might leave us open to some kind of attack; perhaps a miner would introduce an invalid state transition. The thing is, these kinds of attacks are not particularly likely, because first of all they have to be targeted: you have to target a specific person. That's because of the assumption we have that miners, or at least a majority of miners, will only build on top of valid blocks. When we talk about the group of miners as a whole, we say they're acting honestly. So they couldn't perform this kind of attack on the whole network, but they could on a specific targeted person, if they took up all of that person's peer slots and started sending them fake headers. But we also have some routing mechanisms, for example something called k-buckets, which are Sybil-attack resistant. That means if you have an established set of peers, it's very difficult for an attacker to fill up your peer slots and push your valid peers out. So I would say light clients are very useful for low-value or medium-value use cases. Of course, if you're transacting millions of dollars, you would probably want the security of a full node, because that kind of activity might leave you open to attack. And those kinds of attacks, just to clarify, are really about tricking a user into taking actions based on showing them a wrong state; an attacker can't actually steal money without the user taking action themselves. So we have some network protocol goals.
The first is to minimize round trips and bandwidth, because in a light client you are requesting data from your peers, and if you want to run this on a mobile device or on a low-bandwidth connection, you need to minimize those round trips and reduce that latency. We also want provable data, which means you don't rely on any trust assumption beyond the fact that miners are applying state transitions correctly. And of course we have full clients serving the light nodes over the network, and we don't want them to get DoSed or overloaded, so we need a sensible way to meter the requests being made to them. In the next section, I'm going to discuss a few of the tools we use to achieve this. The first is back-referencing. When you make a network request, you can put multiple requests in the same packet, where later requests in the packet can rely on the unresolved outputs of prior requests. So for example, you can say: I have a block number and I want the header for that block; once I have the header, I want the state of this account at that block; and once I have that account, I'd like to inspect some of its storage. This is actually a very powerful tool, because we can express very fluid kinds of requests in a single round trip to the network, which means the response time of the light client is going to be much lower. Another tool we have is mostly useful for things like checking the result of a transaction execution on the light client. Naively, you could start to execute the code: you fetch the code from the network, then you see that the execution needs some account, so you take one round trip to get that account, come back, execute a little further, and fetch some more data from the state.
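The batching-with-back-references idea above could be modeled roughly like this. This is a minimal sketch, not Parity's actual wire format: the request kinds, the `("backref", i)` encoding, and the toy `serve` function are all illustrative.

```python
# Illustrative sketch of back-referencing: a batch of requests where later
# entries refer to the outputs of earlier entries by index, so the whole
# chain of dependent lookups resolves in one round trip.

def resolve_batch(batch, serve):
    """Resolve requests in order, substituting ('backref', i) arguments
    with the output of request i from the same packet."""
    outputs = []
    for kind, arg in batch:
        if isinstance(arg, tuple) and arg[0] == "backref":
            arg = outputs[arg[1]]  # filled in by the serving node, not the client
        outputs.append(serve(kind, arg))
    return outputs

# A toy "full node" that answers each request kind.
def serve(kind, arg):
    return f"{kind}({arg})"

# One packet, one round trip: block number -> header -> account -> storage.
batch = [
    ("header_by_number", 100),
    ("account_at", ("backref", 0)),
    ("storage_of", ("backref", 1)),
]
print(resolve_batch(batch, serve))
# -> ['header_by_number(100)', 'account_at(header_by_number(100))',
#     'storage_of(account_at(header_by_number(100)))']
```

The key property is that the client never sees the intermediate values before sending the packet; the serving node substitutes them as it goes.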
You can see that over time we're jumping back and forth into the network just to execute that transaction and see what the result would be. Instead, what we can do is fetch, up front, the subset of the state tree (the state is stored in a tree) which is necessary to execute that transaction fully. So if you're a dapp writer targeting the light client, and you're calling the eth_call RPC to see what the output of some transaction would be, this can be resolved in a single round trip on the light client, as opposed to many. It's actually unpredictable what kinds of data are going to be requested, even if the contract state itself has not changed. So another thing we're looking into is ways of proving that a contract doesn't do things like branch on the amount of gas provided to the transaction, so that we can prove that if the contract state has not changed, the output will always be the same. That lets us re-execute from the cached state proof over and over again until the state finally does change, which is very useful for reducing bandwidth as well. The metering system we use is something we call request credits. Serving nodes give their peers credits. These credits are pretty arbitrary: each node comes up with its own pricing scheme, based on data it gathers over time about how long these requests take to serve. The light client peers that want to request data then spend these credits to get things like block bodies or to see what the outputs of a transaction execution would be. We could make this micropayment-based in the future; that's another active area of research, because I think it's very unlikely that in the future your average person will be running a full node, yet we need full nodes on the network to serve the requests of light clients.
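The cached state-proof idea can be sketched as follows. This is a simplification, with invented names (`MissingState`, string state keys): execution resolves entirely locally while the proof covers every key it reads, and a read outside the proof is exactly the point where another network round trip would be needed.

```python
# Sketch: re-executing a call against a cached partial state (a "state proof").
# If execution touches a key the proof doesn't cover, we need another round
# trip; otherwise the whole call resolves locally on the light client.

class MissingState(Exception):
    pass

def run_call(reads_needed, proof):
    """`reads_needed` is the list of state keys the call reads, in order.
    `proof` is the subset of state fetched up front in one round trip."""
    values = []
    for key in reads_needed:
        if key not in proof:
            raise MissingState(key)  # would cost an extra network round trip
        values.append(proof[key])
    return values

proof = {"balance:0xabc": 10, "slot:0x01": 42}
print(run_call(["balance:0xabc", "slot:0x01"], proof))  # resolves locally
try:
    run_call(["balance:0xabc", "slot:0x99"], proof)
except MissingState as e:
    print("need another round trip for", e)
```

If the contract's reads are the same whenever its state is unchanged (in particular, it doesn't branch on gas), the same proof can be replayed indefinitely.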
So you need an incentivization layer, and request credits, although right now they're simply time-based, provide a sort of foundation that we could build micropayments onto in the future. It's a very generic mechanism that we could plug a whole lot of different schemes into. And of course these different requests have different costs: in the case of a transaction execution, you're paying per unit of gas, per unit of computation, whereas if you're fetching a bunch of headers, you're paying per header. Another cool thing is publish-subscribe, which is fun to bake into the protocol. Normally you're polling for things like log events changing, which means that on the light client you may be checking bloom filters and fetching a bunch of headers from the network which may be false positives. In fact, the full-node peers know better than we do when something has changed, so allowing subscriptions at the protocol level means we can have a much more efficient light client. The caveat is that it's somewhat difficult to catch out peers who don't publish when they're supposed to. For example, if we're tracking log events in some contract, and a peer is supposed to send us a message when a log event is issued but doesn't, it's fairly difficult to catch them out without doing some kind of periodic checking of the validity of the data. But if the network is mostly honest, and the damage peers can do by omitting log events isn't that high, it's very unlikely that they would do it. So now that I have you all here, I would like to make a little bit of a block pruning proposal. Right now we have all nodes storing the whole blockchain, and that's a pretty high storage requirement. In fact, we can do much better. One thing we can do on a light client is store almost none of the history.
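The request-credit metering described above might look something like this in miniature. The prices, refill rate, and class shape are invented for illustration; the real point is that each serving node prices work units however it likes and refuses requests when a peer's balance runs out.

```python
# Sketch of a request-credit meter: the serving node charges per unit of
# work (per header, per unit of gas, ...) and refills each peer's credits
# over time. All numbers here are made up for illustration.

class CreditMeter:
    def __init__(self, balance, refill_per_sec):
        self.balance = balance
        self.refill_per_sec = refill_per_sec

    def refill(self, elapsed_secs):
        self.balance += self.refill_per_sec * elapsed_secs

    def charge(self, kind, units):
        # Different requests have different costs: per header, per unit of gas...
        prices = {"header": 1, "gas": 0.01, "block_body": 5}
        cost = prices[kind] * units
        if cost > self.balance:
            return False  # request refused; the peer must wait for a refill
        self.balance -= cost
        return True

meter = CreditMeter(balance=100, refill_per_sec=10)
print(meter.charge("header", 50))    # True: 50 credits for 50 headers
print(meter.charge("gas", 10_000))   # False: needs 100 credits, only 50 left
meter.refill(5)                      # 5 seconds pass -> +50 credits
print(meter.charge("gas", 10_000))   # True now
```

Because the meter is just a balance plus a pricing function, a micropayment scheme could later replace the time-based refill without changing the rest of the protocol.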
We can do this through a tool called canonical hash trees, which we abbreviate to CHT. (Geth actually has canonical hash trees too, but they work slightly differently; I can explain the difference.) In Parity, the basic idea is that for every batch of, say, 2048 or 4096 blocks, you make a tree, like a Merkle tree, mapping block number to the canonical block hash as the light client has seen it. So the light client syncs the network and sees that in ancient history, block one had a certain hash. We have a little diagram here. It can build this table and essentially condense the tree made from this table down into a single 32-byte hash. So instead of the light client storing all those 2048 block headers, it's just storing a single 32-byte hash. And if we have a full-client peer who has the same history, they can give us a Merkle proof against that 32-byte root to give us any block header in the batch, even though the light client itself has discarded it. That's a pretty powerful tool: it means that instead of storing megabytes and megabytes, or possibly gigabytes, of headers, we're just storing a few kilobytes, no matter how long the chain gets, even to hundreds of millions of blocks. The reason we choose to have separate trees per batch is so that the light client can calculate them as it goes, and furthermore so that full nodes can recalculate them on the fly. The full nodes don't necessarily need to store these roots; when one receives a request for the canonical block hash of a certain block number, it can reconstruct the tree on the fly as it serves the request. And we can do something similar on full nodes. We already use distributed-hash-table-style routing in the network protocol, although it's not really used for anything: right now we expect that all nodes are storing all data, which is an assumption we'd like to relax.
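A toy version of the CHT idea, with two loud simplifications: a plain binary Merkle tree stands in for the actual tree structure, and SHA-256 stands in for Keccak. The shape of the idea is the same: 2048 block hashes condense to one 32-byte root, and a full node can prove any single entry with a short branch.

```python
# Toy CHT: condense 2048 (block number -> canonical hash) entries into one
# 32-byte root the light client keeps, plus the Merkle proof a full node
# would serve. sha256 and a plain binary tree stand in for the real scheme.
import hashlib

SIZE = 2048  # blocks per CHT

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def cht_root(leaves):
    """Merkle root over the batch's block hashes, in block-number order."""
    layer = list(leaves)
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves, index):
    """Sibling hashes a full node sends to prove one leaf against the root."""
    branch, layer = [], list(leaves)
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify(leaf, index, branch, root):
    """What the light client checks, holding only the 32-byte root."""
    acc = leaf
    for sibling in branch:
        acc = h(acc + sibling) if index % 2 == 0 else h(sibling + acc)
        index //= 2
    return acc == root

block_hashes = [h(str(n).encode()) for n in range(SIZE)]  # stand-in hashes
root = cht_root(block_hashes)   # all the light client keeps: 32 bytes
branch = merkle_branch(block_hashes, 1234)
print(len(root), len(branch), verify(block_hashes[1234], 1234, branch, root))
# -> 32 11 True
```

Note the proof is only log2(2048) = 11 hashes, so "give me the canonical hash of block 1234" costs a few hundred bytes instead of storing 2048 headers.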
So we have this idea of splitting up nodes by node ID. Every node has a 160-bit ID (it may be a slightly different number; exactly how long the ID is isn't particularly important). In the routing mechanism of each node, we split up all its peers into buckets prefixed by certain IDs. That means each node tries to have a set of peers whose IDs are distributed fairly uniformly across the ID space. In the DHT, or distributed hash table, world, what we do is say: nodes with an ID close to the hash of the file we're trying to store are responsible for storing that file. We can do something fairly similar for block pruning, and say that nodes whose ID starts with 0001 are going to serve the first out of every batch of blocks. So we can split the whole blockchain into, say, batches of 2048 or 4096 blocks, and say these batches proceed in cycles of 16 or 32 or 256. A node whose ID starts with 0001 serves one batch out of every 256, and a node whose ID starts with 0004 serves the fourth batch out of every group of 256. The amount of space we can cut down is really determined by how many peers the average node is likely to have. Right now, it's not really reasonable to assume that every node can connect to 256 peers, but 16 or 32 is fairly reasonable. And that means you can always find a peer who can serve you the next 2048 or 4096 blocks, while those peers themselves have cut their storage requirement down by quite a large amount. I'm actually really interested to hear what you all think about this proposal; please find me afterwards. I think it's a pretty good idea. Another thing we can do on light nodes to get up to speed really quickly is warp sync. The way it works right now is that you have to synchronize the whole blockchain, essentially.
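One way the batch-assignment part of this proposal might be realized is sketched below. This is not a finalized spec: the prefix length (4 bits), cycle size (16), and batch size are all illustrative parameters, chosen to match the "node whose ID starts with 0001" example.

```python
# Sketch of the pruning proposal: split history into batches of 2048 blocks,
# cycle through them in groups of 16, and let a node's ID prefix decide which
# batch within each cycle it serves. Parameters are illustrative only.

BATCH = 2048
CYCLE = 16  # each node serves 1 out of every 16 batches

def assigned_slot(node_id: int) -> int:
    # Use the top 4 bits of a 160-bit node ID as the slot within the cycle.
    return node_id >> 156

def serves_block(node_id: int, block_number: int) -> bool:
    batch_index = block_number // BATCH
    return batch_index % CYCLE == assigned_slot(node_id)

node = 0x1 << 156  # an ID starting with binary 0001...: slot 1
print(serves_block(node, 2048))       # True:  block 2048 is in batch 1
print(serves_block(node, 0))          # False: batch 0 belongs to slot-0 nodes
print(serves_block(node, 2048 * 17))  # True:  batch 17, and 17 % 16 == 1
```

With 16 slots, each node stores roughly 1/16 of the chain's bodies, and as long as your peer set covers all slots, some connected peer can always serve the next batch.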
You have to go from the genesis header and walk the whole chain all the way to the head. But in fact, most of the time users don't care about that very ancient data, so we can introduce something like warp sync to get users caught up to the head of the chain quickly. In the current formulation of Ethereum, you have this sort of thin-client level of security as soon as you have warp-synced, but then when you download the ancient block headers later, you can jump up to the regular light-client level of security, so you have exactly the same security proposition as if you had synced directly from the genesis. In future versions of Ethereum, most specifically with EIP96, or in a proof of stake that finalizes blocks economically, you can jump directly to the head of the chain basically without sacrificing security. Ancient block download might still be useful in that case, if you want to see what the state of a contract was months ago. But that can be done, say, on Wi-Fi on a mobile device: you warp sync when you first start the app regardless of your connection, but you don't want to waste bandwidth over cellular data downloading all those ancient blocks, so you wait until you're on Wi-Fi to finish that process. That's a pretty useful way to do things. Now I'd like to talk a bit about some RPC pitfalls: if you're developing on a light client, what works and what doesn't? We have all these RPC requests that seem fairly simple, ways to gather data about the blockchain from the node, but some of them may be doing a lot of work under the hood even on full nodes, and on light nodes it's important to know which requests are going to be the most expensive. First of all, getting logs can be very expensive, especially if you're trying to get all the logs in the history of everything.
That means that if you're on a light client, which isn't storing all the headers, you need to download all those headers, then check bloom filters, which may have false positives, and then finally get receipts and scan them for logs. That's a lot of bandwidth. If you're just watching the head of the chain for log events, it's actually fairly performant, but if you're trying to get historical logs, it can be very difficult. The reason is that logs aren't embedded in the Merkle tree of the state; that's why they're cheaper to emit, but it's also why they're more expensive to request for more ancient data. There are a few approaches you can take. You could have your peers search for you, but again, you don't necessarily know that they've given you all the data they were supposed to. You can search locally, which is the long and involved process of downloading everything that I just described. Or you can keep some metadata: light clients could keep some metadata that helps them search, but that raises the storage requirement significantly, and there's not really a good way to bring it down. If we're targeting, say, the mobile space, where users don't have much storage, it may not really be feasible, whereas in the laptop case you do have that storage, and maybe searching logs locally can be effective. Another problem is getting things by hash. For recent data, transactions, and receipts, you don't know anything but the hash if you've just broadcast a transaction, because that transaction may not even have been mined yet. But the problem is that hashes, by design, leak no information about the block number.
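To make the historical-log scan described above concrete, here is a miniature of the bloom-filter step. Ethereum's real logs bloom is 2048 bits with three bit positions per topic derived from Keccak; this sketch keeps that shape but uses SHA-256 as a stand-in, so the exact bit positions are illustrative.

```python
# Miniature of the per-header logs-bloom check a light client performs:
# only headers whose bloom matches need their receipts downloaded and
# scanned, but a match can be a false positive. sha256 stands in for Keccak.
import hashlib

BLOOM_BITS = 2048

def bloom_positions(topic: bytes):
    # Three bit positions per topic, like Ethereum's logs bloom.
    d = hashlib.sha256(topic).digest()
    return [int.from_bytes(d[i:i + 2], "big") % BLOOM_BITS for i in (0, 2, 4)]

def bloom_insert(bloom: int, topic: bytes) -> int:
    for b in bloom_positions(topic):
        bloom |= 1 << b
    return bloom

def bloom_maybe_contains(bloom: int, topic: bytes) -> bool:
    # False means definitely absent; True means "maybe present": fetch receipts.
    return all(bloom & (1 << b) for b in bloom_positions(topic))

header_bloom = bloom_insert(0, b"Transfer(address,address,uint256)")
print(bloom_maybe_contains(header_bloom, b"Transfer(address,address,uint256)"))  # True
# A different topic is almost certainly a miss, letting us skip the receipts:
print(bloom_maybe_contains(header_bloom, b"Approval(address,address,uint256)"))
```

The expensive part for the light client isn't this check; it's that every header in the search range must be downloaded before the check can even run.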
If it's a block hash, you don't know which peer to ask for the block with that hash. Transaction hashes are even a little bit worse, because you have to find out where in the blockchain the transaction occurred, and there are multiple different forks where it could occur. And actually, with one of the new EIPs, we can have the exact same transaction included in the chain multiple times, which just means even more work for the light client to check. For serving these transactions and receipts, it means that full nodes have to maintain a fairly expensive index mapping transaction hashes to their locations in the blockchain. And lastly, estimating gas. The state proofs I talked about before can be used for estimate_gas and eth_call, but the thing with estimate_gas is that the typical approach is to binary-search to find out exactly how much gas a transaction will take. You start at some very high upper bound, then try a lower bound, and then you narrow in, you zoom in, on exactly the amount of gas at which the transaction begins to succeed. The problem is that in the general case, contracts can branch on the amount of gas provided. That means the state proof you fetched for the upper-bound case may not be enough to re-execute the lower-bound case, so you might have to take more round trips in general. So this is something to use fairly sparingly, only when really necessary; and also think about the contracts you write, so that they don't play poorly with light clients. And lastly, I'd like to take a little bit of time to talk about the light client plus Whisper ecosystem.
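The binary search behind estimate_gas looks roughly like this. The `try_execute` callback is a stand-in for re-running the call against a fetched state proof; the bounds are illustrative (21,000 is the minimum cost of a plain transfer). If the contract branches on gas, different `mid` values may touch different state, which is exactly where the extra round trips come from.

```python
# Sketch of estimate_gas: binary-search for the minimum gas limit at which
# a transaction succeeds. `try_execute(gas)` returns True on success and
# False on out-of-gas; bounds are illustrative.

def estimate_gas(try_execute, lo=21_000, hi=8_000_000):
    while lo < hi:
        mid = (lo + hi) // 2
        if try_execute(mid):
            hi = mid        # succeeded: the answer is mid or lower
        else:
            lo = mid + 1    # out of gas: we need more than mid
    return lo

# Toy transaction that needs exactly 53,000 gas.
print(estimate_gas(lambda gas: gas >= 53_000))  # -> 53000
```

Each failed probe on a light client potentially means fetching a fresh state proof, which is why the talk suggests using this sparingly.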
We've seen that light clients can be very useful on mobile devices, but there are also things like Whisper, which people are going to talk about a bit more, which is for decentralized, encrypted messaging, essentially with darkness as well: it's like a global publish-subscribe system where people can't actually see what topics you're listening for. That makes it possible for some pretty interesting platforms, like Status, to emerge, where you can have dapps running on your phone that are messaging with people and doing medium-latency state channels. It could even be used, for example, as the state-channel micropayments layer for the light client's request credits. And of course, we have a roadmap. This is actually something you can mostly use today: it's in our beta releases, and you can just run Parity with the --light flag. The UI is still being rebuilt around light-client best practices, because it was written mostly with a full client in mind, but we're undergoing that rewrite basically now. If you access the RPCs directly, or if you're using the Chrome extension and you have some dapp running independently of it, it basically will work; I even registered a few names on the ENS using it. It does work in that case, if you have peers. Another really interesting project we're starting is compiling the light client down to WebAssembly. We're writing everything in this language called Rust, which is very low-level: it sits at roughly the same level as C or C++, which means it has no garbage collection or runtime, and that means it can be compiled down to WebAssembly. So we could have this same light client implementation embedded directly into a browser window, and I think it'll be very interesting to see how dapp developers use that.
If you'd like to follow the progress, I would recommend you do so at the paritytech/parity repository. You're all developers and you love open source, so you'd better get writing. Well, thank you very much.