The talk is going to be on the sleepy model of consensus. Then she is going to be on the clock. OK, thanks for the introduction. I'm very happy to be here. I'll be talking about the sleepy model of consensus, and this is joint work with Rafael Pass. OK, so this happened in May, earlier this year. British Airways had an outage, and this was what the airport was like. People were stuck there. And this happened because of a systems failure. And something strikingly similar happened to Delta Airlines in, I guess, September last year. There was this one day where all the flights were cancelled. Everything was shut down. All the booking systems were shut down. And Delta Airlines suffered a 100 million US dollar loss in revenue because of this system failure. OK, so the moral of the story is that we need replication for robustness. This is a very simple idea, but it is precisely this very simple idea that gave birth to an entire area of research, distributed systems, and there have been 30 years of work in this area. OK, so in distributed systems, we care about a very important abstraction called state machine replication. It is also referred to as a linearly ordered log, or consensus. In state machine replication, imagine we have a set of servers. In this case, we have Google Wallet servers. Google Wallet, for instance, wants to avoid the kind of disaster that happened to Delta Airlines. So this set of servers, let's imagine they want to agree on an ever-growing, linearly ordered log of transactions. There are two very important security properties that we care about, namely consistency and liveness. Consistency says that all the honest nodes must agree on the log. It could be, let's say, that your log is a little longer than mine, but nonetheless, our logs have to be prefixes of each other. OK, and liveness says that whenever a client submits a transaction, the transaction has to appear in all of the honest nodes' logs fairly quickly.
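To make the two properties concrete, here is a minimal sketch (my own illustration, not from the talk), modeling each node's log as a Python list of transactions; all function names and the `max_wait` parameter are assumptions of this example:

```python
def is_prefix(short, long):
    """True if `short` is a prefix of `long`."""
    return long[:len(short)] == short

def consistent(log_a, log_b):
    """Consistency: two honest logs may differ in length, but the
    shorter must be a prefix of the longer."""
    if len(log_a) <= len(log_b):
        return is_prefix(log_a, log_b)
    return is_prefix(log_b, log_a)

def live(tx, logs, submitted_at, now, max_wait):
    """Liveness: a transaction submitted at least `max_wait` ago must
    appear in every honest node's log."""
    if now - submitted_at < max_wait:
        return True  # still within the allowed confirmation window
    return all(tx in log for log in logs)
```

Note that consistency deliberately permits one log to run ahead of another; only disagreement on a common prefix is forbidden.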
OK. In the beginning, this may seem deceptively simple. After all, what is difficult about reaching consensus on a linearly ordered log? Indeed, the problem would be trivial if all the nodes were honest and behaved correctly. But if it is the case that a subset of these nodes can be corrupt, and these corrupt nodes can behave arbitrarily, then as it turns out, the problem becomes highly non-trivial, and that's why, you know, there's so much work in this space. OK, so for the rest of the talk, whenever I mention consensus, I specifically mean state machine replication. State machine replication is not new, and in fact, it is widely applied in practice. For example, Google has a service called Chubby, and the protocol behind Chubby is a Paxos-like consensus protocol that is tolerant of crash faults. And of course, whatever Google does, all the other Silicon Valley companies have to copy. So roughly speaking, the open-source counterpart of Chubby is Apache ZooKeeper, and pretty much every other Silicon Valley company besides Google adopts Apache ZooKeeper to replicate their computing infrastructure. Traditionally, when we talk about consensus, the kind of scenario that conjures up in our minds is exactly what I talked about: there's a single organization, and there are a dozen nodes that interact over a fast local area network. And what really changed our view of distributed consensus is cryptocurrency. This is extremely exciting, you know, we have Bitcoin, Ethereum, and now we can do consensus on a large scale over the internet. OK, so this is really amazing, and there's a lot of excitement in this space. For instance, this is a picture I found online of industry consortiums, and these consortiums are trying to build blockchains for banks, and these banks want to build a distributed network amongst themselves. For instance, I talked to companies that are trying to roll out solutions for banks in China.
The scale we are looking at: in China there are like hundreds of banks, and if each bank deploys 10 nodes, we can easily have a 1000-node scale. So everything is a lot larger than classical deployments, and it's among multiple organizations. So interestingly, when you talk to blockchain and cryptocurrency people, it seems like the common wisdom is that classical consensus protocols aren't robust enough for these large-scale deployments, and people want blockchains. OK, this is kind of a nice intuition, but it's also not very satisfying, because for one thing, it doesn't answer the question: what kind of robustness do we want in these large-scale deployments? It doesn't articulate what robustness means. In the rest of the talk I'm going to try to answer these questions, at least partially. So first I'll talk about how we define robustness. I'll give one possible definition, then I'll talk about why classical protocols fail to achieve this notion of robustness, and then in the end I'll answer the question: how can we achieve this robustness notion? So before I begin, I wanted to say that cryptocurrency is an interesting area because it seems like the empirical success outstrips our scientific understanding. I have a colleague who made an interesting comment. Traditionally, when you ask a researcher, let's say a professor in a university, "we have this technology, what is the gap from the research prototype until the technology matures and gets deployed in practice?", traditionally you'd expect an answer between, like, five to ten years. But for cryptocurrency, if you try to answer this question, it's like negative six months. Okay, alright. So let us begin with the first question: exactly what robustness properties do we care about for these large-scale deployments? Let me make an analogy. Suppose we want to have 300 million people vote. What can happen?
What's going to happen is that only maybe 160 million will show up, right? Not everyone's going to show up. And perhaps we can hope that among the people who do show up, the majority are honest, because if the majority are corrupt, who knows what will happen? Who will be the next US president? Okay, alright. So in this paper we define a new model for distributed consensus called the sleepy model, and in the sleepy model, nodes can either be sleepy, which means they're offline, or they can be active, which we also call awake. And sometimes nodes can go to sleep and then wake up later, right? So you may have been asleep for a long time, and then you wake up. And the point is that when you wake up, you are supposed to continue to enjoy the consistency and liveness properties. And this is not a requirement in classical consensus. In classical consensus, if I ever go to sleep, then I'm automatically treated as corrupt; I'm treated as a crash fault. And later, when I do wake up again, I don't enjoy any of these consistency and liveness properties anymore. Okay. So this is just further clarification on the model. Who is corrupt and who goes to sleep is decided by the adversary; let's say Dan is the adversary. Okay. A malicious node, as we said, can behave arbitrarily. The adversary can delay all the honest messages, but only up to some delay upper bound, and this delay upper bound is known to the protocol. So messages can be, let's say, delayed or reordered, but the protocol knows exactly what this maximum delay is, and all the nodes that are online can receive messages from other honest nodes within this maximum delay. Okay. So this is some clarification. A meaningful question that we can ask in this model is: can we reach consensus if 51% of the online nodes are honest? Right, not everyone can be honest, but we want that, among the online nodes, the majority of them are honest.
And before I answer this question, I want to mention that this is actually the best you can hope for, because in our paper we have an impossibility result showing that if less than 50% of the online nodes are honest, then consensus is impossible in this sleepy model. This is also very interesting if you contrast it with the classical setting: in the classical setting, if you have a PKI, you can tolerate arbitrarily many faults. But here, even if you have a PKI, this impossibility result still holds. So that's why this model is fundamentally different from the classical one. Okay. So when I ask this question, here are some of the implicit assumptions that I'm making. First, the protocol doesn't know how many nodes are going to show up. It could be that 30% show up, it could be that 99% show up; maybe I'm expecting 99% to show up but only 1% showed up. Nonetheless, even if only 1% showed up, we want the 1% of people who showed up to still be able to reach agreement. Okay. And maybe, you know, you didn't show up in the beginning, but you somehow join later, and that's okay, because as soon as you join, as long as you are an honest node, you should be able to enjoy the consistency and liveness properties. Okay. So what I mean is that this seems like a very simple and very natural question, right? And you would expect that with 30 years of work in this space, such a question would have been resolved. But the surprising bit is that that's not the case, and it turns out all the classical protocols fail in this model. Moreover, not only do they fail, none of these protocols work even if we are willing to assume that 99% of the online nodes are honest. So that's a very strong statement. Okay. I'm going to very quickly explain why classical protocols don't work. There are two types of classical protocols: synchronous and asynchronous. In the synchronous model, messages are delivered immediately, in the next round.
In the asynchronous model, messages can take some unknown amount of delay, and the protocol doesn't know how long the delay is. Okay. So why do classical synchronous protocols fail? The fundamental reason is that sleepiness is a kind of asynchronous behavior, because imagine I go to sleep now and I wake up later, and when I wake up I receive all the pending messages. So that's modeling a kind of asynchronous behavior where messages can be delayed by the duration of the sleep. And classical protocols that require synchrony assumptions to work don't work in the sleepy model. Maybe the more interesting question is why classical asynchronous protocols also don't work in the sleepy model. The reason is that in the classical models, if a node goes to sleep, it is automatically treated as corrupt. So, typically, the way asynchronous protocols work is that I ask people to vote, I collect votes from two-thirds of the people, and then I make progress. But if it so happens that only 1% of the people show up, then even if I wait an unbounded amount of time, I will never be able to collect votes from two-thirds of the people, so the protocol will get stuck there. And the fundamental problem here is that we don't know a priori how many people are going to show up. So to quickly recap, I've explained one specific notion of robustness and why classical protocols fail. For the remaining, how many minutes? For the remaining 12 minutes, I will try to talk about how we can achieve this notion of robustness. This is just a note to say that in this talk I only mention one notion of robustness; there are other notions, but those could fill several other talks. So how can we achieve robustness? We are going to draw inspiration from cryptocurrencies again.
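The reason the classical asynchronous approach gets stuck can be sketched in a few lines; the names here are my own, and the point is only that the quorum is computed from the known total n, not from who actually shows up:

```python
def quorum_reached(votes, n):
    """Classical asynchronous protocols make progress only after
    collecting votes from two-thirds of all n registered nodes."""
    return 3 * len(votes) >= 2 * n
```

If n = 1000 nodes are registered but only ten of them ever wake up, `quorum_reached` never becomes true, so the protocol waits forever, exactly the failure mode described above.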
The community's wisdom is that Bitcoin's Nakamoto consensus is very, very robust. And in fact, the amazing thing is that Bitcoin actually has been up and running for, maybe now it's nine years, for more than nine years. Maybe there were some minor attacks, but nothing really major, and that's why Bitcoin is referred to as the honey badger of money; a honey badger is a very robust animal. The good news is that we can actually now mathematically prove that Bitcoin's blockchain protocol is indeed capable of reaching consensus when 51% of the online nodes are honest. So that's good. But on the other hand, this protocol is hugely wasteful, right? I recently read a report saying that Bitcoin's electricity consumption is more than that of Ireland. So we certainly don't want to run the protocol like this, and this is just for confirming like three transactions per second, by the way, all this electricity consumption. Okay. And of course, the interesting question is: can we achieve Nakamoto's robustness, but without having to pay for expensive proof of work? Okay. So we want to keep Nakamoto's blockchain but remove the proof of work. To continue, I will very quickly recap Nakamoto's blockchain protocol; I assume that many of you already know how it works. And then I'll talk about how to remove the proof of work. Okay. What is a blockchain? This is a blockchain. Okay. So how does Nakamoto's blockchain work? Say I have this blockchain and I want to extend it. What do I do? I'm going to take a hash function, and I hash the block I want to extend, the transactions I want to confirm, and a candidate puzzle solution, which on the slide is this green jigsaw puzzle piece. And if the hash outcome is less than D, which is the difficulty parameter, then this green value is a good puzzle solution, and luckily I get to mine the next block. Okay. And the assumption is that the hash is like a random oracle; it's a random function.
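The mining step just described can be sketched as follows. This is a toy illustration of the idea, not Bitcoin's real block encoding, and the easy difficulty parameter is chosen so the sketch finds a solution quickly:

```python
import hashlib

# Deliberately easy difficulty: roughly 1 in 16 candidate solutions succeeds.
D = 2 ** 252

def pow_hash(parent_hash, txs, nonce):
    """Hash the block being extended, the transactions, and a candidate
    puzzle solution (the nonce), read as a 256-bit integer."""
    data = f"{parent_hash}|{','.join(txs)}|{nonce}"
    return int(hashlib.sha256(data.encode()).hexdigest(), 16)

def mine(parent_hash, txs, max_tries=100000):
    """Try nonces until one hashes below D; this brute-force search is
    the 'work' in proof of work."""
    for nonce in range(max_tries):
        if pow_hash(parent_hash, txs, nonce) < D:
            return nonce
    return None
```

Because the hash behaves like a random function, lowering D makes valid nonces rarer and the search proportionally more expensive.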
So the best way for me to find a puzzle solution is through brute force: I'm going to try many of these green values until I find one that satisfies the condition. And of course, trying many of these green jigsaw puzzle pieces is going to be expensive, and that's where the proof of work comes in. Okay. And an important aspect of the blockchain protocol is that if you see multiple chains, you are going to pick the longest chain, and that's important for security. Because let's say Dan paid me some money to buy my Alfa Romeo, my car, and now he wants to erase this transaction to double-spend his money; he needs to basically mine a longer fork, right? And if he doesn't have enough computation power, he is never able to mine a longer chain, and therefore he cannot erase the past. Okay. So essentially, if a transaction is embedded deep enough in the chain, it is considered to be secure enough, and it can be considered confirmed. Okay. So that's, very quickly, how Bitcoin's blockchain protocol works, and now I want to focus on how to remove the proof of work. One interesting thing we realized is that proof of work is kind of a leader election process, so to remove the proof of work, one idea is to restrict the puzzle space. Right, in Bitcoin you have to try many, many puzzle solutions until you find a valid one. But before I explain the protocol, let's assume we are in a permissioned setting, right? We have a set of nodes, and everyone knows this set perfectly. Okay, it is a permissioned setting, not permissionless. So this was how Nakamoto worked, but now we want to restrict the puzzle space. What do we do? Imagine this is a round-based protocol, and in every round we want to elect a leader, and we don't want people to try many puzzle solutions, so the puzzle solution is basically your own identity, right?
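The longest-chain rule and depth-based confirmation can be sketched as below; the choice of k = 6 is Bitcoin folklore and an assumption of this example, not a number from the talk:

```python
def pick_chain(chains):
    """Longest-chain rule: among the chains a node has seen, adopt the longest."""
    return max(chains, key=len)

def confirmed(chain, block_index, k=6):
    """A block counts as confirmed once it is buried at least k blocks deep,
    so erasing it would require mining a longer fork from before it."""
    return len(chain) - 1 - block_index >= k
```

This is why Dan's double-spend fails without majority computation power: rewriting a buried block means out-mining everyone else for at least k blocks.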
So let's say Dan wants to be elected as a leader. He's going to take this hash function and hash his ID and the current round number, and if the outcome is less than the difficulty parameter, then Dan is the leader. And if he is the leader, he can basically take his secret key and sign the block he wants to extend, together with the set of transactions he wants to confirm and the current round number, and this allows him to mine a new block. And everyone can verify that the block is correct by verifying that indeed Dan is the leader of this round and that he signed all of these things correctly. Okay. So this seems like a nice idea, right? The question is whether this protocol gives you security. Well, I guess the answer is no, it's a trick question, and there are two reasons why this protocol isn't secure. One is that if Dan is elected in some round, he can use this credential multiple times: honest nodes are only going to use the credential once, but the adversary can use it multiple times, for example to extend many different forks. And the second issue is that honest nodes are only going to mine in the present, right? Every round, I'm only going to use the current round number, whereas the adversary can use future round numbers. So the adversary has many more choices than the honest nodes, and that's not fair to the honest nodes. Okay, so to fix the protocol, our idea is to add some constraints on the timestamps in the blockchain. So how do we do it?
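The proof-of-work-free election can be sketched as follows. All names here are my own illustration, not the paper's code, and the 1-in-20 election probability is an assumption; the key point is that the puzzle space is restricted, so one identity means one hash evaluation per round:

```python
import hashlib

MAX_HASH = 2 ** 256
# Assumption: each node is elected with probability ~1/20 per round.
D = MAX_HASH // 20

def elected(node_id, round_number):
    """Leader election: hash your identity with the round number; you are
    the leader if the hash falls below the difficulty parameter."""
    h = int(hashlib.sha256(f"{node_id}|{round_number}".encode()).hexdigest(), 16)
    return h < D

def try_mine(node_id, round_number, parent_block, txs, sign):
    """If elected this round, sign (parent, txs, round) to form the new block."""
    if not elected(node_id, round_number):
        return None
    payload = (parent_block, tuple(txs), round_number)
    return payload, sign(payload)
```

Note that exactly the two attacks mentioned above apply to this sketch as written: nothing stops an adversary from calling `try_mine` with the same winning round on many different parent blocks, or from precomputing future rounds where `elected` returns true.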
First, we require that in a valid blockchain, these timestamps, these round numbers, have to strictly increase. This prevents the adversary from reusing the same round number twice in the same chain. And also, like I said, we want to prevent mining in the future, so honest nodes are only going to accept a chain if all the timestamps are not in the future; they are in the present or in the past. Okay, so these constraints intuitively seem to constrain the adversary a lot, but of course the question still remains: is the protocol secure after these fixes? Okay, so this actually is a non-trivial question. The answer is yes, but the proof is non-trivial. One reason the proof is non-trivial is that even though previous works have proven Nakamoto's blockchain secure, it turns out that in this protocol the adversary has more choices than in Nakamoto's blockchain. So in our paper we have a detailed proof that shows how these stochastic results still hold despite these new attacks. Okay, so like I said, the previous Nakamoto blockchain analyses basically don't work in our setting, and I'm going to skip the proof in the interest of time; if you want to learn about the proof, you should look at our paper. The paper also contains additional results, for example on stronger notions of security: the protocol I talked about has only static security, but if you want to get adaptive security, you have to do something more. Okay, applications of the sleepy consensus protocol: consortium blockchains, remember what I said about a distributed ledger among multiple banks. And you can also use the sleepy consensus protocol to build a proof-of-stake protocol. We have another paper about Snow White, which basically takes the sleepy consensus protocol, and we show how to do robust committee reconfiguration over time, such that the committee reflects the current stake distribution, so we can get a proof-of-stake protocol. Okay, alright, to conclude, we have
come a very long way. We started here, with these classical distributed systems within Google and Facebook, and now we are in this exciting new world where we want to build internet-scale distributed systems; we want to transact with other people that we don't know on the internet. And there are so many exciting open problems; even just for consensus itself, our understanding of distributed consensus at this large scale is fairly limited. Okay, before I end, I want to mention an upcoming event, generously sponsored by NBI and Tsinghua, and co-organized by Tsinghua and Cornell; I hope I'll see some of you guys there. Question: Is there a difference between the sleepy model and just saying that messages can be arbitrarily delayed? Because it seems like being sleepy is the same as just not getting your messages for a time. That's a very good question: what's the fundamental difference? In the sleepy model, we actually don't know how many people are going to show up. What you are asking is closer to the classical asynchronous model, where messages really can be delayed arbitrarily long, but those protocols require knowing exactly how many nodes there are. So for instance, we know that in the classical asynchronous setting we can design protocols secure against nodes crashing, and the way these protocols work is: I know N people want to participate in the protocol, I'm going to have everyone vote, I will count two-thirds times N votes, then I take some action, and then you vote again, and again I wait for two-thirds of the people to vote. The point is that in the sleepy model, there may be nodes that never wake up, so if you do this, it's not going to work, because I'm expecting two-thirds times N people to vote, and the protocol is stuck there forever. Maybe one more question. Question: You elect a leader to mine the next block, but what if the leader is not online? That's
a good question. So, if the leader is not online: I didn't talk about difficulty adjustment, but potentially you can do difficulty adjustment. Right now, without it, the protocol just slows down: you are doing random selection, so eventually you elect someone who is online, and that node is going to sign the block. But it is true that if only half of the people are there, the protocol slows down by a factor of two, right, if I don't adjust the difficulty parameter. And at some point, maybe there are not enough people online, and the protocol slows down a lot, because you haven't had a chance to adjust the parameter. Still, as long as some of these people are online, eventually one of them is going to get elected. Let's say there's only one person online; then you need to wait a linear number of rounds. If, roughly speaking, I'm electing a leader every round, then after a linear number of rounds that one online node is still going to get elected, and when he gets elected, he's going to make progress. But of course the protocol slows down by a factor of n unless you do difficulty adjustment, and it is indeed possible to do difficulty adjustment in a sleepy-like protocol. For instance, Nakamoto's blockchain has a difficulty adjustment mechanism that looks at the block generation rate in the past two weeks, and it will set the mining difficulty parameter accordingly for the next period; I think it's every 2016 blocks, which is roughly two weeks. Alright, thank you.
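The Bitcoin-style retargeting mentioned in that last answer can be sketched as follows. The 2016-block window and the factor-of-4 clamp match Bitcoin's actual rule; the function itself is my own simplification:

```python
def retarget(old_difficulty, actual_seconds, window=2016, target_spacing=600):
    """Adjust difficulty based on how long the last `window` blocks took,
    aiming for one block every `target_spacing` seconds."""
    expected = window * target_spacing  # 2016 * 600 s, roughly two weeks
    # Blocks arrived too fast -> raise difficulty; too slow -> lower it.
    new_difficulty = old_difficulty * expected / actual_seconds
    # Bitcoin clamps each adjustment to at most 4x in either direction.
    return min(max(new_difficulty, old_difficulty / 4), old_difficulty * 4)
```

In a sleepy-style protocol, the same idea would let the election probability rise when few nodes are awake, recovering the lost factor-of-n in block production.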