Next up, we have Martin Kleppmann from the University of Cambridge speaking on Byzantine eventual consistency and the fundamental limits of peer-to-peer databases. Okay, hello everyone. So my name is Martin Kleppmann. This is work I've done together with Heidi Howard; we're both at the University of Cambridge. There's a lot of detail behind this which is in a paper you can find on arXiv, so I'll leave the details to that. The problem that I want to talk about today is the assumption of how many nodes are going to be malicious in a Byzantine fault tolerant system. Generally, for Byzantine agreement, the model is that you have up to f malicious nodes, and you need 3f + 1 nodes in total in order to tolerate those f malicious nodes. Which means, of course, that you need some measure to prevent Sybil attacks, to avoid that limit of f nodes being exceeded. So on the one hand you can have the permissioned blockchain approach, where you have centralized control over the set of nodes in the system, or you can use some kind of finite resource, like proof of work, in order to avoid the Sybil attacks. Now, for me personally, there are not enough sad faces in the world to express how I feel about proof of work. But I can see the need for wanting a permissionless system. And so this has led us to asking: we know that for Byzantine agreement, if the number of faulty nodes exceeds f, then basically all bets are off; the algorithm cannot guarantee any safety properties or any liveness properties, and weird stuff can simply happen. But maybe there are some types of application in which you can actually tolerate any number of malicious nodes. So we don't need to assume that fewer than one third of the nodes are faulty, nor do we even need to assume a 51% honest majority or anything like that. For those applications, it's okay to have arbitrary numbers of malicious nodes and still get the job done.
Now, this won't include cryptocurrencies or things like that; they clearly need this limit. But it would be really nice if we could characterize exactly which applications can tolerate arbitrary numbers of malicious nodes, because those then don't need Sybil attack countermeasures. They don't need any sort of voting. They don't need any sort of proof of work. So what we would like is this: imagine the space of all possible applications that people might build. We want to figure out where the dividing line is between the set of applications that require 3f + 1 nodes and Sybil countermeasures, and the types of applications that do not require Sybil countermeasures because they can tolerate any number of malicious nodes. That dividing line is the purpose of this work. And we're going to introduce this term of Byzantine eventual consistency for the types of applications that can tolerate any number of Byzantine faulty nodes. Byzantine eventual consistency we can define in terms of a couple of properties. One is a simple liveness property: if one correct replica applies an update, then eventually all of the correct replicas will apply that update. The second property is convergence: if you've got any two replicas that have seen the same set of updates, then they must be in the same state. And that can be achieved, for example, using CRDTs. I'm not going to talk about those in this talk, but there's a lot of existing work on that, which also Victor mentioned briefly just now. And finally, we can talk about invariants on the data, and I'll explain in a moment what I mean by that. There are a couple more technical properties that I haven't put on the slide here, but you can find them in the paper. Crucially, we want to ensure that these properties hold regardless of the number of malicious nodes in the system. So this idea of an invariant is quite an important one.
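The convergence property mentioned above can be illustrated with a minimal sketch (this is not the paper's formalism, just a toy example): a grow-only set CRDT, where merging is set union, so any two replicas that have seen the same set of updates end up in the same state regardless of delivery order.

```python
# Toy sketch of the convergence property using a grow-only set CRDT:
# two replicas that apply the same set of updates, in any order,
# end up in the same state.

class GSet:
    """Grow-only set: the only update is adding an element."""
    def __init__(self):
        self.elements = set()

    def apply(self, update):
        self.elements.add(update)

    def merge(self, other):
        # Merge is set union: commutative, associative, idempotent.
        self.elements |= other.elements

a, b = GSet(), GSet()
for u in ["u1", "u2"]:
    a.apply(u)
for u in ["u2", "u1"]:   # same updates, different order
    b.apply(u)

# Convergence: same set of updates implies same state.
print(a.elements == b.elements)   # True
```

Because merge is union, applying it repeatedly or in different interleavings cannot drive the replicas apart.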
By an invariant, what I mean is a function that conceptually takes the entire state of a replica and returns either true or false, depending on whether the invariant is satisfied or not. Typical invariants might be things like: every account has a non-negative balance. That is essentially the double-spending prevention that you need in any cryptocurrency; that is, you can't spend more money than you have. So that is one particular example of an invariant, but you can imagine many other invariants. If you think about this in database terms, then for example a foreign key constraint in the database is the type of invariant that may or may not be satisfied, or a uniqueness constraint on a particular value. So if you want a particular username to be unique, for example, or a domain name in the domain name system should be unique, and so on, then that again is a type of invariant. And we can now reason about whether updates are invariant confluent or not, so I'll explain what I mean by confluence. Imagine you have two replicas A and B; they're both initially in the state s, and the invariant I is satisfied for both replicas. Now replica A performs some update u1 and moves into state sA, and we're going to assume that sA still satisfies the invariant. Moreover, concurrently to that, replica B is going to apply some different update u2, and we're going to assume that in the state sB the invariant is also satisfied. Under all of those assumptions, let's say the two replicas now exchange their updates. They both merge them into some state, and based on the convergence property earlier we know that this merge is always possible. And if the invariant is still satisfied in this merged state, then we say that the updates u1 and u2 are invariant confluent with regard to the invariant I. Now, this may seem a bit technical, but I'll give you an example.
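The idea of an invariant as a boolean function over the replica's state can be sketched very simply (the dict-of-balances representation here is just an illustrative assumption, not the paper's model):

```python
# Sketch: an invariant is a function from the whole replica state to a boolean.
# Here the state is modeled as a dict of account balances, and the invariant
# is the double-spending-prevention rule: no account may go negative.

def non_negative_balances(state: dict) -> bool:
    return all(balance >= 0 for balance in state.values())

print(non_negative_balances({"alice": 10, "bob": 0}))   # True
print(non_negative_balances({"alice": -5, "bob": 0}))   # False
```

A uniqueness constraint would be another invariant of the same shape: a predicate over the full state.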
So let's say that our invariant is that we want no negative account balances in the system, and the two updates are updates that decrease the balance of the same account. Then it could happen that each update individually is safe; that is, each update individually reduces the balance only to some non-negative value. But if you take both of the updates combined, that will cause the balance to go negative. So in that merged state the invariant would no longer be satisfied, and therefore these two updates are not invariant confluent with respect to I. However, if we change it instead so that the two updates actually increase the balance of some account, then it's perfectly fine to perform those two updates independently from each other. We know that we can always merge them together and we will never end up with a negative account balance, because the only thing that can happen to the account balance is to increase (assuming we don't have any overflow). And so in this case, u1 and u2 are invariant confluent with respect to I. What we did is to prove, using this definition of invariant confluence, that it's possible to ensure Byzantine eventual consistency with an arbitrary number of malicious replicas if and only if all of the updates are invariant confluent with respect to all of the invariants of the system. And this now tells you why a cryptocurrency is not amenable to Byzantine eventual consistency: the double-spending prevention that it needs makes it not invariant confluent. But there is a large class of other types of applications, still useful apps, that are invariant confluent and which we can therefore implement in this model with an arbitrary number of Byzantine faulty replicas. And they're quite exciting. Now, I'd like to briefly show you an algorithm for actually implementing Byzantine eventual consistency.
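The account-balance example above can be worked through concretely. This is a sketch that checks confluence for one specific starting state and pair of updates (a real invariant-confluence argument would have to quantify over all reachable states); the balance is a single integer and each update is a signed delta, both illustrative assumptions.

```python
# Sketch of the invariant-confluence example: start from a shared state s,
# apply two concurrent updates u1 and u2 independently, then merge
# (here: applying both deltas) and re-check the invariant.

def invariant(balance):          # no negative account balance
    return balance >= 0

def check_confluent(s, u1, u2):
    sA, sB = s + u1, s + u2                    # each replica applies its own update
    assert invariant(sA) and invariant(sB)     # each update is individually safe
    merged = s + u1 + u2                       # merged state after exchanging updates
    return invariant(merged)

# Two withdrawals of 7 from a balance of 10: each alone leaves 3 >= 0,
# but the merged state is 10 - 7 - 7 = -4, so not invariant confluent.
print(check_confluent(10, -7, -7))   # False

# Two deposits can only increase the balance: invariant confluent.
print(check_confluent(10, +7, +7))   # True
```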
And the idea here is that you've got any number of replicas, and they can connect to each other pairwise, and in this pairwise communication they need to figure out what updates they need to send to each other so that at the end they both have the same set of updates. So let's say we have replicas A and B here. A starts off with updates u1 and u2; B starts with updates u1 and u3. What we want now is that A is going to send u2 to B, and B is going to send u3 to A, and then they will both have the same set of updates. We need to ensure that this protocol for exchanging the updates is resilient to any Byzantine faulty replicas that might be there. One thing we might do, for example, is to make one update depend on another update, and whenever a later update depends on an earlier one, we include a hash of the earlier update in the data of the later update. This gives you a hash graph, just like in Git or IPLD or a million other systems that use hash graphs. This now has a nice property: in order to represent the set of updates that you have, you only need to find the heads of this graph, that is, the updates that have no updates depending on them, and you can just send the hashes of those over the network. That already encodes the information of what updates you have, and based on that the nodes can figure out whether they're in the same state or not. If the hashes are the same, we know that the two nodes must be in the same state; they must have the same set of updates. The difficulty, the annoying thing, is that if the hashes are not the same, you have to walk backwards in this graph until you find a common ancestor whose hashes are known to the other side. And that involves a lot of round trips back and forth as you go step by step backwards in this hash graph. If your hash graphs are very long, this becomes very slow.
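The hash graph idea can be sketched like this. The update encoding below (payload plus sorted dependency hashes fed to SHA-256) is a simplified assumption for illustration, not the actual Automerge wire format; the point is that the heads of the graph summarize the whole set of updates.

```python
import hashlib

# Sketch of a hash graph: each update embeds the hashes of the updates it
# depends on, and a replica's set of updates is summarized by its "heads",
# the updates that no other update depends on.

def make_update(payload: str, deps: list[str]) -> dict:
    data = payload + "|" + ",".join(sorted(deps))
    return {"hash": hashlib.sha256(data.encode()).hexdigest(),
            "payload": payload, "deps": deps}

def heads(graph: dict) -> set[str]:
    # Heads = hashes that no update in the graph depends on.
    depended_on = {d for u in graph.values() for d in u["deps"]}
    return set(graph) - depended_on

u1 = make_update("set x=1", [])
u2 = make_update("set y=2", [u1["hash"]])   # u2 depends on u1
u3 = make_update("set z=3", [u1["hash"]])   # u3 also depends on u1

replica_a = {u["hash"]: u for u in (u1, u2)}   # A has u1, u2
replica_b = {u["hash"]: u for u in (u1, u3)}   # B has u1, u3

# Exchanging just the heads is enough to detect that the replicas differ,
# without sending the full set of updates.
print(heads(replica_a) == heads(replica_b))   # False: they must reconcile
```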
So we developed an optimized algorithm which can sync up these hash graphs very efficiently, usually in just a single round trip; occasionally, with small probability, it can require more round trips. The idea is like this. We assume that the two replicas have some part of their hash graph in common, and some other part of the hash graph was added since their last sync. And they can remember what the outcome of the last sync was. So A remembers: the last time I talked to B, these were the hashes that were the heads of the graph at my last sync with B. Likewise, B can remember what the hashes were at the last sync with A. From remembering those hashes, they can now work out all of the updates that were added to this hash graph since the last sync between these two nodes; we can identify exactly which subgraph of the hash graph was added. Now, one option would be to simply send everything that was added since last time. But the nodes might actually have got part of that graph from some other nodes, because there are lots of nodes all syncing pairwise. So we want to figure out only which of those added parts of the graph are not yet known to the other side, so that we don't end up unnecessarily sending updates that the recipient already has. And there's a fairly simple trick we can use to do that, using Bloom filters. A Bloom filter is a probabilistic data structure for encoding a set. We can take the set of hashes that were added since the last sync and put those into a Bloom filter, which is essentially a fairly compact byte array representation, and send those Bloom filters over the network to the other side. Those will allow each side to figure out which changes it has that the other node does not have, and those they will then send to each other.
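The Bloom filter step can be sketched as follows. This is a simplified illustration of the idea, not Automerge's actual sync protocol; the filter sizes, the hashing scheme, and the update identifiers are all assumptions made for the example.

```python
import hashlib

# Sketch of the Bloom filter exchange: one side encodes the hashes it added
# since the last sync into a Bloom filter, and the peer sends back only the
# updates whose hashes are definitely not in the received filter.

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from the item via salted SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, item: str) -> bool:
        # False positives are possible (costing an extra round trip),
        # but there are no false negatives, so no update is wrongly withheld.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# A tells B which update hashes it added since their last sync...
added_by_a = {"hash-u2", "hash-u4"}
filter_from_a = BloomFilter()
for h in added_by_a:
    filter_from_a.add(h)

# ...and B replies with only the updates A definitely does not have.
added_by_b = {"hash-u2", "hash-u3"}
to_send = {h for h in added_by_b if not filter_from_a.may_contain(h)}
print(to_send)   # {'hash-u3'}, barring a Bloom false positive
```

The asymmetry of Bloom filter errors is what keeps this safe: a false positive only means a missing update is detected in a later round, never that sync silently diverges.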
And this provides us a really efficient way of syncing up two nodes, and it also has the nice property that it is resilient: if some of the replicas in the system are Byzantine faulty, they can't, for example, cause other nodes to go out of sync with each other. So the whole thing is resilient, and it achieves Byzantine eventual consistency in a system with arbitrarily many malicious nodes. We have implemented this algorithm in a CRDT library that I work on called Automerge. There's also a blog post on this hash graph reconciliation algorithm, if you're interested in that, and of course the paper, as I mentioned. So that's all from me; happy to take any questions.