Before I explain what fraud proofs are, I'm going to motivate why we actually need them and what problem we're trying to solve. As we all know, at the moment there is thought to be a huge trade-off between how much decentralization your blockchain system has and how much on-chain throughput you can get. The argument goes that the bigger the block size, the bigger the blockchain, and the more expensive it is to run a full node, so everyone will have to run a light client instead. And the problem with light clients is that they accept invalid blocks: if a block contains invalid transactions, they will happily accept it as long as the majority of the consensus has agreed on it. So if there's a 51% attack, for example, you could trick a light client into accepting blocks that generate money out of thin air, double-spend, or otherwise break the protocol rules, because light clients effectively assume that the majority of the consensus is honest, which is a really bad assumption that we should try to eliminate. So the big question we might want to ask ourselves is: how could we make it possible for light clients to reject these invalid blocks, just like full nodes do? If a full node receives an invalid block, it will reject it; how can we make a light client do the same thing, so that they too don't have to trust the miners and don't have to assume that the majority of the consensus is honest? The solution is to use fraud and data availability proofs. What I'm presenting today is based on a paper we released last month, joint work with Alberto from UCL and Vitalik from Ethereum Research. The link is on the screen, and we have code as well. The basic idea of fraud proofs is that you have light clients and full nodes: light clients only download block headers, while full nodes also download the entire block, including the transaction data. If a full node detects an invalid transaction in a block, it sends light clients a compact proof that the block contains an invalid transaction, and the size of that proof should be significantly smaller than the size of the block. The original Bitcoin white paper actually briefly mentions a concept like this, called alerts, in a single sentence. The idea of alerts, as Satoshi proposed them, was that a full node could send a light client a message alerting them that a block is invalid, which would cause the light client to re-download that block and validate it itself. The problem with this is that, as a full node, I could just lie to all the light clients and say that all the blocks are invalid. So in the worst case, the efficiency of the system for light clients is no better than downloading the whole blockchain; it boils down to running a full node again, so it doesn't really work. There have also been discussions in the Bitcoin space about compact fraud proofs, which is what I just described, but some of these earlier proposals require a different fraud proof for every single way to violate the rules of the protocol: for example, one fraud proof for double spends, one fraud proof for UTXOs not existing, and so on.
So what we're going to do here is simplify everything so that you only need one fraud proof. To understand how this could work, we have to remind ourselves that we can generalize the blockchain as a state transition system, which is what Ethereum actually does. Effectively you have a transition function that takes as input the state of the blockchain and some transaction, and every transaction modifies the state in some way. So between block X and block X+1 you have a number of transactions that modify the state, and if you apply those transactions sequentially you get many intermediate states, as you can see in the diagram. Now, what happens if a transaction included in a block is invalid and modifies the state in a way that is illegal or not allowed? If there's a 51% attack, a miner could insert such a malicious transaction into the chain, and light clients would happily accept it. So we need a way for full nodes to prove that this has happened, so that light clients can reject the block as well, and we need to do this in a compact way. To do this we need some way to commit to the state of the blockchain in the block headers. At the moment Ethereum uses a Patricia tree to represent the entire state of the blockchain in a single root, and you can represent the entire state as a key-value store, i.e. the accounts in the blockchain and their values. But one of the recently proposed changes for Serenity is to change this structure to a much simpler one called a sparse Merkle tree, which is a much simpler way to represent the entire state as a single Merkle root. The basic idea of a sparse Merkle tree is that it's a normal Merkle tree, but with an insanely large number of leaves. If you wanted, for example, one leaf for every possible SHA-256 hash, your Merkle tree would have 2^256 leaves. You might be asking how it's even possible to compute such a tree. There are some neat tricks you can do: for example, since most of the leaves in this tree will be empty, with a zero or default value, the vast majority of the intermediate nodes in the tree will have the same value, so you don't have to recompute every single node, because you know that the vast majority of these nodes only have children with zero values. So effectively it is pretty much as efficient as a standard Merkle tree. And if we do this, then we can use a concept called stateless clients.
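As a rough illustration of that default-value trick, here's a minimal sketch in Python; the names and structure are mine, not the paper's implementation. The hash of an empty subtree at each height is precomputed once, so computing the root only ever touches the paths to the non-empty leaves.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

DEPTH = 256  # one leaf per possible SHA-256 output: 2**256 leaves in total

# default[i] is the hash of a completely empty subtree of height i; computing
# these 257 values once covers the vast majority of the tree's nodes.
default = [h(b"")]
for i in range(DEPTH):
    default.append(h(default[i] + default[i]))

class SparseMerkleTree:
    """Stores only non-empty leaves; empty subtrees fall back to defaults."""

    def __init__(self):
        self.leaves = {}  # leaf index (a 256-bit int) -> leaf hash

    def update(self, key: int, value: bytes):
        self.leaves[key] = h(value)

    def _node(self, height: int, path: int) -> bytes:
        # Hash of the subtree of the given height rooted at `path`.
        if height == 0:
            return self.leaves.get(path, default[0])
        if not any(k >> height == path for k in self.leaves):
            return default[height]  # nothing stored below: reuse precomputed hash
        return h(self._node(height - 1, path << 1) +
                 self._node(height - 1, (path << 1) | 1))

    def root(self) -> bytes:
        return self._node(DEPTH, 0)
```

For example, `t = SparseMerkleTree(); t.update(5, b"account state"); t.root()` hashes only on the order of 256 nodes rather than anything close to 2^256. (The linear scan in `_node` is for brevity; a real implementation would index stored keys by prefix.)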
So instead of imagining the blockchain as a state transition system, you can imagine it as a state root transition system. If we use this sparse Merkle tree to represent the state of the entire blockchain as a single Merkle root, and we include this root in every block header (which is what Ethereum is doing right now with a Patricia tree), then you can also imagine many intermediate roots between block X and block X+1, so you effectively have an execution trace for every single transaction in a block. What you can do, then, is include not just the final state root in the block header but also the intermediate state roots. For example, you could include the new 32-byte state root after every single transaction, or only after every few transactions; it's basically a trade-off between how big the fraud proof is and how much extra data you want to put on chain. If you have these intermediate state roots in the blocks, you can easily generate a fraud proof. Effectively, the fraud proof consists of the pre-state root of the transaction, the post-state root, the transaction itself, and the witnesses of that transaction. The witnesses of a transaction are simply the Merkle proofs of all the state keys that the transaction accesses in the state Merkle tree; you also need the Merkle proofs for the transaction itself and for the pre- and post-state roots. So that's what the fraud proof looks like. We now have a working unified fraud proof system that only requires one single kind of fraud proof, rather than the many different fraud proofs that some of the earlier ideas were proposing.
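To make the shape of that proof concrete, here is a minimal sketch in Python. It's my own illustration, not the paper's code: `StateWitness`, `apply_tx`, and `recompute_root` are hypothetical stand-ins for the protocol's state tree and transition function, and the Merkle branch checks are only noted in comments.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in: in the real scheme each witnessed key/value comes
# with a Merkle branch into the pre-state root; here it is just a dict.
StateWitness = Dict[bytes, bytes]

@dataclass
class FraudProof:
    pre_state_root: bytes    # intermediate state root before the bad transaction
    post_state_root: bytes   # intermediate state root the block claims after it
    transaction: bytes
    witnesses: StateWitness  # the state entries the transaction reads or writes
    # (omitted: Merkle proofs that the transaction and both roots are in the block)

def check_fraud_proof(proof: FraudProof,
                      apply_tx: Callable[[StateWitness, bytes], StateWitness],
                      recompute_root: Callable[[StateWitness], bytes]) -> bool:
    """Return True if the proof shows the block committed an invalid state root."""
    # Step 1 (omitted here): verify every Merkle branch in the proof against
    # proof.pre_state_root and against the block header.
    # Step 2: re-execute just this one transaction on the witnessed state subset.
    new_state = apply_tx(proof.witnesses, proof.transaction)
    # Step 3: fraud is proven iff the recomputed root disagrees with the
    # post-state root the block committed to.
    return recompute_root(new_state) != proof.post_state_root
```

The point is that a light client re-executes only one transaction against a handful of witnessed keys, which is why the proof stays compact regardless of block size.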
But the biggest problem with this is something called the data availability problem, which says: okay, what if a miner only distributes the block headers to the light clients, so light clients know the block headers, but no one has the actual transaction data of the block? Just because you know the Merkle root for the transactions doesn't mean you know the data behind that Merkle root. If a miner does this, a full node wouldn't be able to generate a fraud proof, because they simply would not have the data to generate it. So we need some way to guarantee the availability of the data in the blockchain. Vitalik proposed a neat way to do this using erasure coding. The idea of erasure coding is that if you have some data that is X pieces long, you can blow that data up to 2X pieces, and then you can recover the entire data from any X of those 2X pieces. What that effectively means is that this is no longer a 100% data availability problem but a 50% data availability problem, because you can recover 100% of the data from just 50% of it. So if a miner wanted to hide even a single byte of the data, they would have to withhold at least 50% of it. So what we can do with this is require miners to commit to the Merkle root of this blown-up, erasure-coded version of the block data, extended to 2X pieces instead of X pieces. Then clients can use this construction to get a guarantee that the data is available: they randomly sample different pieces of the erasure-coded block. If we assume that the miner is trying to do this attack and has hidden 50% of the data, then there is a 50% chance that the first sample will land on a part of the block that is unavailable. And if you sample a part of the block that is unavailable, i.e. you request that part of the block from the network and don't receive a response, then you don't accept that block. If you keep doing more samples, you get a very high probability guarantee that the block is actually available: after S samples there is a 1 - 2^-S chance of landing on an unavailable piece at least once, if 50% of the block is unavailable. And again, if you land on an unavailable piece, you don't receive a response, and you don't accept the block. One caveat to note (the real problem comes on the next slide) is that for this to work you need a sufficient number of light clients making enough samples that, between them, they can reconstruct the entire data. If you only have one light client, there aren't enough samples to cover 50% of the block, so the scheme wouldn't work. There are ways to get around this, so that even the first few clients can't be fooled, but I'm not going to cover them in this talk; they will be in the full version of the paper, along with more analysis and graphs showing exactly how many light clients you need. Now, the problem with the scheme I've described so far is: what if the miner incorrectly applies the erasure code, and just inserts gibberish in the extended part of the data? That wouldn't be useful to anyone, because if you actually lose 50% of the block, that gibberish is not going to help you reconstruct it. And if you wanted to prove to a light client that the code was applied incorrectly, you would basically have to give them the entire original data and have them recompute the erasure code themselves to check it. That's going back to square one, because you need the entire block to do the check, so the fraud proof would be as big as the block itself, which is what we were trying to avoid in the first place. The way to fix this is to use multidimensional erasure coding. The idea is that you arrange your block data into a square: you chop it up into a number of pieces, arrange those pieces into a square, and then apply the erasure coding on each row and column of this square, extending it into a bigger square.
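Here's a minimal sketch of that row-and-column extension, under the simplifying assumption that each piece is a field element modulo a prime; a real implementation would use a standard Reed-Solomon library over bytes, but the structure is the same. The names are mine.

```python
P = 2**61 - 1  # a prime, so each piece can be treated as a field element mod P

def rs_extend(vals):
    """Extend k values to 2k, Reed-Solomon style: evaluate the unique
    degree-(k-1) polynomial through the points (i, vals[i]) at x = k..2k-1,
    via Lagrange interpolation. Any k of the 2k values recover the data."""
    k = len(vals)
    def eval_at(x):
        total = 0
        for i, y in enumerate(vals):
            num, den = 1, 1
            for j in range(k):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            total = (total + y * num * pow(den, P - 2, P)) % P  # den^-1 mod P
        return total
    return list(vals) + [eval_at(x) for x in range(k, 2 * k)]

def extend_square(pieces, k):
    """Arrange k*k pieces into a k-by-k square, extend every row, then every
    column, producing the 2k-by-2k extended square."""
    assert len(pieces) == k * k
    rows = [rs_extend(pieces[i * k:(i + 1) * k]) for i in range(k)]   # k x 2k
    cols = [rs_extend([rows[r][c] for r in range(k)])                 # 2k x 2k
            for c in range(2 * k)]
    return [[cols[c][r] for c in range(2 * k)] for r in range(2 * k)]
```

Each row and column of the extended square then gets its own Merkle root, which is what makes the compact per-row and per-column fraud proofs described next possible.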
And if one of these rows or columns was incorrectly extended, the size of the fraud proof is limited to a single row or column, because a light client only needs a proof that one row or column was incorrectly computed to know that the block is invalid. This is much better, because it has the efficiency of O(sqrt(block size)) rather than O(block size). In the example here I'm using two-dimensional encoding, but you could use higher dimensions if you wanted to, although personally I don't think it's worth it because there are other trade-offs. With this 2D scheme, a miner would have to hide 25% of the data to hide the whole square. So here's a graph showing, for a light client that makes S samples, the probability of landing on an unavailable part of the block if the miner has hidden 25% of the square. The higher this probability, the higher the probability of the light client detecting that the block is unavailable, so we want it to be high. You can see that after three samples the probability is about 60%, and after 15 samples it's about 99%, so you can get very high probabilities from a reasonable number of samples. Some people might consider 99% too low; if you wanted 99.99%, you would need 80 samples. But I think 99% is reasonable, because it basically means that a miner who wanted to run this attack against someone would, on average, need to mine about 100 blocks to get one chance of fooling the light client into thinking an unavailable block is available. That increases the cost of the attack 100x, which I think makes it unreasonably expensive. We have some performance measurements; I'm not going to go into detail on them because I want to take questions, but everything is quite efficient. Under these parameters, assuming an intermediate state root after every 10 transactions, the state fraud proof is about 14 kilobytes; everything else is less than a few kilobytes, basically. The biggest trade-off, though, is that a light client that wants a guarantee that the data is available also has to download the axis roots (every row and column in the square has its own Merkle root), and that increases the header size by about 10x. But you only have to do this if you want the data availability guarantees; you can still run super-light clients with no data availability guarantees. I think this is quite a reasonable trade-off. In terms of computation, it's all quite efficient. Generating the state fraud proof looks quite expensive here, but that's because my sparse Merkle tree implementation is quite inefficient. Verifying fraud proofs, verifying sample responses, and verifying data availability fraud proofs all take sub-millisecond times. If you want to look at the paper, the link is there, and the code is on GitHub: there's code for the data availability coding using Reed-Solomon coding, a sparse Merkle tree implementation, and a prototype of the fraud proofs themselves.
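As a quick sanity check on the numbers from that graph, here's the simple with-replacement approximation of the detection probability (the paper's exact analysis samples distinct pieces and accounts for multiple clients, so its figures, like the 80 samples for 99.99%, differ somewhat; this is just my back-of-the-envelope version):

```python
def detect_prob(s: int, hidden: float) -> float:
    # Chance of hitting at least one withheld piece in s independent samples,
    # when a fraction `hidden` of the extended block has been withheld.
    return 1 - (1 - hidden) ** s

# 1D code: the miner must withhold 50%; 2D code: only 25% of the square.
print(detect_prob(3, 0.25))   # ~0.58, i.e. "about 60%" after 3 samples
print(detect_prob(15, 0.25))  # ~0.99, i.e. "about 99%" after 15 samples
```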
Thank you, I'll take some questions now. Yes? So, is the reason for not adding another dimension simply that two dimensions is the right amount that the miner needs to fake? So actually, if you add more dimensions it gets worse from a probabilistic perspective, because the miner has to hide a smaller fraction of the pieces of the block to hide the whole block. With the one-dimensional scheme they have to hide 50%; with the two-dimensional scheme, only 25%; with the three-dimensional scheme, 12.5%. That means you have to sample more pieces to get the same probabilistic guarantees. The other problem with higher dimensions is that they dramatically increase the number of Merkle roots that a client has to download. So I don't really think it's worth it, unless you have insanely large blocks or insanely large intervals between blocks. Yes? So could this scheme be used in the context of Plasma, to force the operating party to either provide the proof or get slashed? Because as far as I can see, you cannot know whether the provider of the proof is unavailable, whether they're being attacked, or whether there are other issues. So, I'm not an expert on Plasma, but there was a thread on Ethereum Research where that question was asked, and Vitalik's response was that it would be quite reasonable to do that. Sorry? Yeah, with the deposit. I think it would be quite reasonable to make the Plasma operators erasure code their blocks and then make the clients do samples. Maybe you could even arrange that on the main chain, after some delay, the main chain samples some of the pieces as well for extra security. Yeah, I think it would be quite reasonable. Yes? So I'm just wondering: if you change one entry in the state, how big a part of the erasure-coded data structure will change? How do changes propagate? Our data structures are based on Merkle hashing, so each state write just changes a single entry and a logarithmic number of intermediate tree nodes. How much data will change in the erasure code after each state write? Okay, so let's go back to the erasure code. This is the erasure-coded version of the transaction data, so what you're asking is: if you change a single transaction, how much of the erasure code would change? Well, first of all, I'm not sure why that would matter, because that would change the Merkle roots as well, so I don't think there's a specific attack you can build from that. But it would actually change a significant portion; not the entire erasure code, I think, but at least, for example, the entire fourth quadrant of the square. It doesn't really matter, though, because it would also change the Merkle roots of the rows and columns. To your right. Back here. Hi. Hello, I was wondering if you explored more succinct ways of generating and verifying the proofs; it seems to me that could allow you to go higher-dimensional. Yeah, so Vitalik has some blog posts on using things like proofs of proximity to make it possible to verify that the erasure code was constructed correctly without having to rely on fraud proofs.
And this is something completely different: if you wanted to completely eliminate fraud proofs, you would need a way for light clients to succinctly verify that the state transitions in the block were valid, and also to succinctly verify that the erasure code was constructed correctly. There are some ideas on how to do that on Vitalik's blog, but there are currently some trade-offs that need to be ironed out. If I recall correctly, one of the ideas had the drawback of potentially making Merkle branches 100 times bigger in the worst case. But it's definitely something that people are looking into. Let's give him another round of applause, please.