that you have learned something about distributed consensus. Obviously it takes a lot more than just the consensus protocol to build a cryptocurrency, right? For instance, we need to make sure that the incentives are aligned and that there's a good governance policy. In the interest of time, I'm not going to be able to cover these other topics in this talk.

Okay, so let's begin. I'm going to begin with a little bit of storytelling. Early in May this year, there was this one day when all the British Airways flights out of London were cancelled, and this was because of a systems failure. This is a picture showing people lined up at the airport. Quite interestingly, in September last year something very similar happened to Delta Airlines. For this one day everything was shut down: you couldn't fly, you couldn't book any flights, and Delta lost a hundred million US dollars in revenue last year because of this systems failure.

So what's the moral behind these stories? We need replication and robustness, right? It's a very simple idea, but it is precisely this very simple idea that gave birth to an entire line of work called distributed systems, and this has been around for 30 years. It's not something new. In distributed systems we care about a very important abstraction, which I will call state machine replication, so let me quickly explain what state machine replication is.

State machine replication is the task of agreeing on an ever-growing, linearly ordered log. Let's imagine we have a set of servers, in this case Google Wallet servers, and obviously Google Wallet doesn't want the kind of disaster that Delta Airlines has had. So these servers would like to agree on a linearly ordered log of transactions, and there are two very important security properties that we care about, namely consistency and liveness. Very quickly, consistency says that all of these honest nodes must agree on the same log.
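Why an ordered log is the thing the servers need to agree on can be seen in a minimal Python sketch. The wallet-style transfers, the `apply_log` name, and the rule of skipping transfers the sender cannot afford are illustrative assumptions, not something taken from the talk. The point is only that two servers replaying the same log from the same starting balances end in the same state, while a different order can give a different result.

```python
def apply_log(log, initial_balances):
    """Replay an ordered log of (sender, receiver, amount) transfers."""
    balances = dict(initial_balances)
    for sender, receiver, amount in log:
        # Skip transfers the sender cannot afford, so replay is deterministic.
        if balances.get(sender, 0) >= amount:
            balances[sender] = balances.get(sender, 0) - amount
            balances[receiver] = balances.get(receiver, 0) + amount
    return balances


# Hypothetical starting balances and log, for illustration only.
start = {"alice": 5, "bob": 0}
log = [("alice", "bob", 5), ("bob", "carol", 3)]

server_1 = apply_log(log, start)
server_2 = apply_log(log, start)
print(server_1 == server_2)         # same log, same state
print(apply_log(log[::-1], start))  # a different order gives a different state
```

In the reversed order Bob's payment to Carol is attempted before Bob has any money, so it is skipped, which is why the order of the log matters and not just its contents.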
However, note that the network can have delay, right? So it could be that your log is a little ahead of mine. That doesn't matter; nonetheless, your log has to be a prefix of mine, or mine of yours. That's what consistency says. For liveness, whenever an honest client submits a transaction, we want this transaction to appear in all of the honest servers' logs very quickly.

Any questions at this point? If you have questions, feel free to interrupt. The audience has shrunk a little bit, which is good for asking questions.

Yes, "quickly". Typically there's a technical definition with a liveness parameter. The liveness parameter can be some polynomial function of parameters of the execution, like the security parameter and the number of nodes. For the purpose of this talk, the protocol I'm going to describe confirms in one to two rounds, but in general you can specify this parameter as a function of other parameters of the execution. All right, that's a very good question.

At first sight this definition seems deceptively simple, right? What can be so hard about agreeing on a linearly ordered log? Indeed, if all of these nodes behave correctly, the problem is trivial; there's nothing hard about it. What's interesting is when some of the nodes are compromised. Let's say they have malware, and then they can deviate arbitrarily from the protocol. Even in these cases we want the remaining set of honest servers to still respect these security properties. That's why the problem is highly non-trivial.

For the rest of the talk, whenever I mention the word consensus, I exclusively mean state machine replication. Like I said, state machine replication is not something new. It's not like blockchains invented state machine replication. It has been around for a really long time, and it has been making real-world
impact for a really long time. For instance, every Silicon Valley company employs some instance of a state machine replication protocol to replicate its computing infrastructure. Google, for instance, runs a service called Chubby, and behind Chubby is this protocol called Paxos, which is a crash-fault-tolerant version of state machine replication. Obviously, whatever Google does, every other Silicon Valley company is going to copy, and that's why we have an open-source counterpart, I mean sort of a counterpart, called Apache ZooKeeper. Pretty much every company in Silicon Valley apart from Google employs Apache ZooKeeper.

Traditionally, when we talk about distributed consensus, the kind of scenario that it conjures up in our minds is exactly what I said: there's a single company, there are five to ten nodes, and the nodes are interconnected with a fast local area network. So, small scale. What is really amazing is that with cryptocurrencies like Bitcoin and Ethereum we now have empirical evidence that distributed consensus is actually possible on a really large scale, on the internet. In many ways this is extremely exciting, and that's why banks and these large industry consortia are all super excited about this. We want to replicate the success in the permissioned setting: we want consortium blockchains, and the banks want to build a distributed ledger amongst themselves. For instance, China's central bank wants to build such a distributed ledger among all of the banks in China. As for the kind of scale you are looking at, there are easily a hundred banks in China, and if every bank contributes ten nodes, we are looking at a thousand-node scale.

Perhaps not surprisingly, all these large industry consortia and companies in the space are racing to create the dream protocol for large-scale consensus. So all of
this is very exciting, but at the end of the day, when you sit down and think about it, perhaps it doesn't make any sense. Why, after 30 years of work on distributed consensus, does everyone still have to roll their own implementation? After all, implementing consensus is very much like implementing crypto: it's very tricky and very error-prone, and for crypto we tend to think this is not something you should roll your own implementation for.

To understand why there still isn't a dream protocol for large-scale consensus, it helps to understand the consensus landscape, so I'm going to talk about the landscape a little bit before I start to talk about Thunderella. Roughly speaking, there are two broad classes of approaches: there's classical, and there's blockchains.

By classical I mean protocols like PBFT and Paxos. What do we know about classical consensus? These protocols are fast: most of the time they can confirm transactions in a constant number of rounds. On the other hand, these protocols are also notorious for being incredibly complex, and there are implications for what complexity means on a large scale. I'm going to revisit this complexity point a little later in the talk. There are also other issues with these classical protocols. For instance, from my earlier research at Cornell, we now mathematically understand why these classical protocols aren't robust enough for large scale. This is a point that I won't have time to articulate in this talk, so you have to take a leap of faith that we now mathematically understand what robustness means on a large scale and why these protocols aren't suited.

Of course, at the other end of the spectrum we have blockchains. The amazing thing about blockchains is that they're not just an empirical success. We now mathematically understand that these protocols actually reach consensus
using a mathematical mechanism that departs completely from classical consensus. In other words, blockchains are also a theoretical breakthrough.

Whenever we talk about blockchains, of course, your first reaction might be that these protocols are extremely wasteful, because the miners are solving computationally expensive puzzles. In order to make progress in this talk, I'm going to ask you to take a leap of faith again and imagine that the proof-of-work problem has already been solved. We did have an earlier work that shows how to remove the proof of work from blockchain-style consensus and still retain precisely the same mathematical properties of these protocols. This is a paper called Sleepy Consensus; it's going to appear at Asiacrypt this year. In the rest of the talk, whenever I mention blockchains, it can be either proof-of-work-based or non-proof-of-work-based.

In comparison with classical consensus, there are many advantages of blockchains. For instance, they're very robust. Their robustness has been empirically proven; Bitcoin, for instance, is often referred to as the honey badger of money. And the protocols are very, very simple. If you look at the protocol rule, it's essentially that everyone picks the longest chain and tries to extend the longest chain. That's it. Simplicity in a large-scale distributed system can often be your friend.

On the other hand, even though the proof-of-work problem can be solved, blockchains still suffer from slowness. As we know, in Bitcoin it takes about 10 minutes per block, and often, to get enough confidence, you have to wait for six to ten blocks, which means your transaction confirmation is on the order of an hour. I know Ethereum parameterizes the block interval differently, but the point is that there are security implications when you reparameterize the block interval. From this mathematical analysis by
Pass et al., we understand that if you want resistance to a 49% attack, your block interval should be 60 times the maximum network delay. In other words, the block interval cannot be too small with respect to the network delay. So this actually seems like an inherent drawback of blockchain-style consensus.

So this is the overall landscape, and what it tells us is that the dream protocol for large scale is unfortunately still eluding us. In the rest of the talk I'm going to try to answer this question, and here's the plan. I'm going to start talking about how classical consensus works, and I'm going to go on until I get stuck. When I do get stuck, I'm going to try to combine the best of both worlds and get a new consensus protocol. Thunderella is an extremely simple consensus protocol; at the end of the talk I'm going to summarize the protocol in two sentences. Any questions at this point?

All right, so without further ado, let's talk about consensus. Today we have Vitalik versus the superheroes. Vitalik will act as the leader, and everyone else is a voter. Some of these voters can be compromised and behave maliciously; for instance, in this case Loki is corrupt. Also keep in mind that the leader himself can be corrupt, so Vitalik can also be malicious.

Here's, roughly speaking, how these classical consensus protocols work. Vitalik, as the leader, is first going to make a proposal. The proposal contains a batch of transactions tagged with a sequence number, and the sequence number determines where in this very long log the batch will land. The final protocol will contain many instances of this little protocol, but for simplicity let's focus for now on the little instance with a single sequence number. When Vitalik makes the proposal, everyone's going to vote. Here, imagine that a golden vote is a vote for the golden transaction, which is the transaction that Vitalik has proposed.
And if you are corrupt, obviously you don't have to vote according to the protocol rule. Loki in this case is voting for the red transaction and the blue transaction instead of the golden one. Then, step number three: I'm going to wait until I hear enough people voting for the same transaction, and then I'm going to confirm, or in other words output, the transaction. So this is a very simple voting protocol, and I'm going to try to describe what kind of properties it can achieve.

Before I do that, I want to stress that the most important invariant of this protocol is that honest nodes only vote uniquely for every sequence number. If I'm honest, I'm going to wait until I hear the first proposal from Vitalik, and I'm only going to vote for that first proposal I hear. I'm not going to vote for anything else.

To understand the security of the protocol, let's consider concrete parameters: n = 3f + 1, where n is the total number of nodes and f is the number of corrupt nodes. This is saying, let's imagine the fraction of corrupt nodes is less than one-third. In this case I claim that it's sufficient to wait for two-thirds of the people to vote. So I'm going to set my threshold to 2f + 1, which is essentially waiting for two-thirds of the people to vote, and once I have enough votes I'm going to confirm.

So why does this protocol achieve consistency? Here's a very simple argument. Imagine we have Spider-Man and Iron Man. Spider-Man has heard 2f + 1 people voting for the red transaction; I'm going to call this the red quorum. Iron Man has also heard 2f + 1 people voting for, let's say, the orange transaction. These two quorums may not be the same. However, by a very simple pigeonhole principle, given n = 3f + 1, it's easy to see that the two quorums must overlap in at least f + 1 nodes, so they must intersect at an honest node, and that's the key observation. If you think about it, this
is very, very easy math. Because of this, and remember that I said honest nodes vote uniquely, we can conclude that the red transaction must be the same as the orange one in this case. So consistency is guaranteed. Interestingly, I want you to notice that in this argument I never relied on the leader being honest. In fact, the argument holds even when the leader is corrupt. The only fact I'm relying on here is that honest nodes vote uniquely.

Let's now imagine there are two possible worlds. On the left-hand side, imagine Vitalik is honest, and on the right-hand side, Vitalik is corrupt. When Vitalik is honest, everything's all good: we have both consistency and liveness. Consistency I've just proved. Liveness is very easy to see, because if Vitalik is honest, he is going to propose the same transaction to everyone, and all the honest people will vote on the same transaction. Because there are at least 2f + 1 honest people, soon enough I'm going to hear 2f + 1 votes, and then I can make progress.

On the other hand, what if Vitalik is corrupt? If Vitalik is corrupt, fortunately we still have consistency; like I said, the consistency argument didn't rely on Vitalik being honest. However, we don't have liveness anymore. Why is this the case? Because if Vitalik is corrupt, he can propose different transactions to different people, and everyone's going to end up voting for a different thing. You can wait and wait and wait, but you never collect enough votes for the same transaction, and you just get stuck there.

So now the crux of consensus is to solve the liveness problem when the leader is corrupt. That's what we want to be able to achieve, and if we can achieve that, then everything's good. So how can we guarantee liveness? If you look at these classical protocols like PBFT and Paxos, they rely on a very complicated leader re-election mechanism, and in PBFT
this is called view change. I don't want to have to explain this. Again, this is the anatomy of a wide class of classical-style protocols. There's a very, very simple voting path, the normal path, as I explained, and all these protocols have a similar voting path. But when the leader is corrupt, they go to this very complicated recovery path, and that's where things get extremely complicated.

Here's an interesting anecdote. Chain.com is a San Francisco-based startup company; they signed a high-profile contract with Visa. The way they deal with this problem is that they only implemented the normal path and basically just ditched all of this complicated recovery path. What's going to happen is that if the protocol is under attack, things go wrong, or the leader is corrupt, then you have to do this manually. And it's not going to be fine if you have a hundred banks running the consensus protocol, because you have to go to all of them and say, okay, now sync your log to this state and then maybe reboot.

So essentially we are stuck at this point, and like I promised earlier, when we get stuck we are going to try to combine the best of both worlds. All right, any questions at this point?

Okay, so here's our idea. We still have this very simple voting path, but we remove the complicated stuff and replace it with a blockchain, and that's the idea behind Thunderella. It's easily said, and there are a couple of tricks to doing this correctly, which I'm going to explain.

Before I explain the protocol, here are the kinds of guarantees we can achieve with Thunderella. Number one, we are fundamentally a blockchain-based protocol, and that's why we are almost as simple as the blockchain itself, and we are just as robust as the blockchain too. 95% of the time in practice you are going to live in the fast path, which we also call the optimistic path, and in the fast path
you are going to confirm transactions with a single round of voting. So not even a block interval, just two to three actual network rounds. And when you're under attack, it's not the end of the world, because you fall back to the blockchain's performance and guarantees.

Thunderella can be instantiated for both permissioned and permissionless settings. For concreteness, when I talk about the scheme I'm going to assume the permissionless setting, and more concretely let's imagine what Ethereum wants to do. As the previous speaker talked about, Ethereum wants to move towards proof of stake, but as a first step the goal is to have the stakeholders form a committee and vote on top of a blockchain. Currently this is a proof-of-work blockchain, but in the future they want to replace it with a proof-of-stake blockchain. I'm not going to talk about how to elect the committee and the leader. As I said up front, this is outside the scope of this talk, but I want you to take a leap of faith and imagine that there exist mechanisms to elect the committee and the leader from the set of miners and stakeholders.

We are going to make a couple of assumptions in order to achieve our worst-case guarantees. We are going to assume that the majority of the miners are honest; if this is a proof-of-work blockchain, we are assuming that the majority of the computation power is honest. For the committee, we are also going to assume that the majority of them are honest, but they don't have to be online. If the entire committee is offline, it's not the end of the world, because you can always fall back to the blockchain.

I'm going to take a slight detour to talk about a scheme that others have considered. I like this example because it helps to illustrate what approaches work and what approaches don't. Here's a very simple idea: I'm going to run the
blockchain, and I'm going to have the committee vote on the blocks. If I see enough people vote on the same block, I'm going to confirm the block immediately, without waiting for more blocks to grow. So what do you think about this very simple idea, where people vote on blocks and you confirm as soon as you hear enough votes on the same block?

It turns out that this scheme actually doesn't give you consistency, and let me explain a very simple reason why. At some point in time, maybe people see fork A and they all vote for this fork. So this fork collects enough votes, and people would have confirmed it very quickly. But because of the network delay, even if there is no attack, the blockchain can have forks in an organic fashion. So it's possible that in the end fork A doesn't survive, and it is fork B that survives to the end. This would be problematic, because if you had confirmed fork A earlier, you would have risked inconsistency with everyone else in the end. Is this clear?

Okay, so yes: if we are going to assume an honest majority for the committee, you have to wait for three-quarters of the people to vote.

That is something that I said is outside the scope. For instance, Ethereum has a mechanism to elect a committee, maybe consisting of 1,000 to 2,000 voters, or validators, so you know exactly who the committee members are.

Blockchain protocols are not partition tolerant. That is a point I will mention at the end of the talk.

All right, so the point is that even if we can fix the consistency problem, this is not the kind of protocol that we want, because you don't really want to be voting on the blocks. If you are voting on the blocks, you are slow to start with, because you are subject to this one block interval. Like I said, we want to confirm in
two to three actual round trips; we don't even want to wait for a single block interval. So that's our goal, and in our paper we technically refer to this property as responsiveness: you don't want to wait for any a priori set synchronization delay.

Now let's get back on the right track. This was the kind of scheme I talked about earlier, so I'm going to quickly recap it, but now I'm going to keep in mind that there are many instances of this little voting protocol. Vitalik makes a proposal, everyone votes, and we are going to wait to hear three-quarters of the committee vote. That's because we are assuming an honest majority of the committee; earlier we used slightly different parameters. Any two such quorums overlap in at least half of the committee, so they still share an honest node.

Here, every batch of transactions has a sequence number. This means that this transaction has collected enough votes, and I'm going to call this notarized. So here one, two, three, five, and six are notarized, but four is missing. Because we have to process these transactions in a linear fashion, in this case I can only process the first three transactions, and for lack of a better term I'm going to call this the maximum lucky sequence. I'm always going to confirm the maximum lucky sequence.

Again, the problem I'm trying to solve here is how we get liveness when the leader is corrupt, or, let's say, when the committee is not online. That's where we want to make use of the blockchain. We haven't used the blockchain yet, and we are going to use it for two purposes. First, to collect evidence to detect when something has gone wrong, so the blockchain can tell us, okay, now the fast path has failed and you should fall back to the blockchain. And once we make such a detection, we'll make use of the blockchain to enter the slow mode. Of course, you don't want to be stuck in the slow mode forever. Once you are back in the
blockchain mode, you can always use a smart contract to re-elect a new leader and a new committee, and then you can try to re-bootstrap a fast path. So those are numbers one and two, and I'm going to talk about how to achieve one and two respectively.

First, how can we collect evidence? This detection mechanism needs to be robust. In particular, we want that the faulty nodes cannot falsely accuse Vitalik. If Vitalik is honest, we don't want him, or the current committee, to be falsely accused.

I know that today, when we think about the blockchain, we tend to think that the blockchain collects transactions. For the purpose of this talk, I'm going to ask you to think of the blockchain slightly differently. When I mine a block, I'm going to put in the block everything I've observed in the protocol so far. This is a conceptual way to think about it. When you actually implement the protocol, there's a way to implement it such that you don't have to put so much information in the blockchain, and that's going to be more scalable. But for simplicity, for this talk, I'm going to ask you to imagine that when miners mine a block, they tell the blockchain everything they have observed so far.

So here, the transactions tagged with a sequence number are notarized, and otherwise they're unnotarized. Imagine the following is happening: in some block there is a red unnotarized transaction. Normally what's going to happen is that if the leader is honest and observes this, he will propose it very quickly to the committee members, and soon enough this red guy is going to become part of a lucky sequence. Suppose this has not happened even after kappa blocks, where kappa is a security parameter. Then something has to be wrong. If red has not become part of a lucky sequence, either the leader is corrupt and trying to censor this red transaction, or maybe the committee is just not online. In either case we want to
fall back to the slow mode. Any questions? Yes, I'm going to explain how to go to the slow mode; it's actually slightly tricky to do it correctly.

So how do we fall back to the slow mode? Now we have all observed this and we all want to fall back. How do we do this? The tricky part here is that when we all want to fall back, our fast-path logs may have different lengths, because of network delay: your log can be a little longer than mine. So essentially, at this moment we have to decide on the cutoff, and this will be the cutoff of the fast-path log before switching to the blockchain.

Okay, yes?

So the red transaction is like one of the transactions that appears to one of the miners that were able to finalize a block, or how does it work? How do we get the red transaction?

I'm not sure that I understand the question, but I'll be happy to discuss it.

Like, who picks the transaction that gets checked against this security sequence, and where do you get it?

If you pay a transaction fee, you can put transactions on the blockchain. Normally this transaction should become part of a lucky sequence: everyone should notarize it, the leader should propose it. If that has not happened, then the fast path has failed.

But then you, as the initiating party, have to go and call, "My transaction didn't get sequenced into the lucky sequence, so now I'm calling off the security"?

Right, so everyone's making this check. Everyone can see the blockchain, and everyone is going to make this check to see if the blockchain has...

Yeah, so it's assuming that there are more miners that were able to finalize? Because it still works in the sense that an individual miner finalizes the block, or do all of the miners participate in the block?

This sounds like a very interesting question. I think maybe we should take it offline, because I would like to understand your question a little better before I
can answer it. Okay, all right.

So the problem we want to solve is how people agree on the cutoff. This seems like a very simple problem, and you may have some ideas in mind for how to resolve it. But the first thing to notice is that this is an agreement problem in itself, and here we need both consistency and liveness. Earlier, when we were talking about the fast path, we punted on liveness. Here we cannot punt on liveness anymore; we have to solve the agreement problem in full.

So again, how do we do it? The blockchain is your friend. The key is that you don't just immediately enter the slow mode; you introduce a grace period. The grace period is for people to cool down, and once you pass the grace period you enter the slow mode.

How does the grace period work? Imagine, roughly speaking, that at this block we start to detect that something has gone bad. Now everyone stops participating in the fast path, and we just tell each other what we know, and we tell the blockchain what we know. So the fast path stops at this point. Soon enough, after kappa blocks, where kappa is the security parameter, let's look at the prefix of the blockchain from the very beginning up to here. I claim that the maximum lucky sequence contained in this prefix is going to be at least as long as everyone's off-chain log, and this is the cutoff we are going to decide on. Once we decide on this cutoff, we can proceed and move to the slow mode.

To quickly recap, this is the protocol: you use the blockchain to detect that the fast path has failed, then you leverage a grace period to enter the slow mode, and when you are in the slow mode you can always try to re-elect a leader and bootstrap another fast path.

And to quickly recap the security guarantees: when the miners are majority honest and the committee is also majority honest, we
can have our worst-case guarantees. However, when things are good, let's say the leader is honest and online and three-quarters of the committee are also honest and online, then we confirm transactions in two network round trips, not even waiting for a single block interval.

A small side note is that this parameter is actually tunable. Maybe I'm very paranoid and I don't want to trust that the majority of the committee is honest, which is fine. For instance, I can say I want security as long as a single member of the committee is honest, and that single honest member doesn't even have to be online. That's okay too, but you pay a little bit in terms of the condition that's necessary for the fast path. In other words, you would have to require that all the committee members are honest in order to be fast.

Okay, so here are the two sentences to remember about Thunderella. Here's how Thunderella works: when things are good, we conduct a single round of voting, and when things go bad, we use the blockchain to do a view change. It's as simple as this. In our paper we have a lot of other discussion on how to elect the leader and the committee, and on practical optimizations that make the protocol even faster and more scalable, and we have formal proofs of security.

I've mostly focused on the permissionless setting, but Thunderella can also be instantiated for the permissioned setting. So if you have a company that's commercializing consortium blockchains, and your customer wants high volume and fast confirmation, Thunderella can also be a good choice. I'm actually talking to a couple of companies, including Hasara and Cryptape, to understand if their customers would like something like Thunderella.

Okay, so I'm going to try to conclude, and before I conclude I want to talk about some of the deeper insights we have gained in this process. So why,
after 30 years of research, can we suddenly claim to be a lot faster and simpler than everyone else? There is actually a reason, because Thunderella is in fact a new theoretical paradigm.

These are what classical protocols are like, and the reason why they're complex is that they work with asynchrony. These protocols try to reach consensus in asynchronous or partially synchronous networks, where the protocol isn't aware of the maximum network delay. This means that in the protocol I cannot, you know, wait five seconds to receive a message and then, if the message doesn't arrive, assume that it never will. So asynchrony makes life a lot more difficult, and in some sense we baited and switched to a synchronous protocol underneath.

This is also saying that blockchains are by nature synchronous, and there are a couple of ways to understand this. In a blockchain, as we know, we have to set the block interval, and as I said, the block interval has to be reasonably large with respect to the network delay for the protocol to retain consistency. But interestingly, in a position paper with Rafael at CSF, we show that in the permissionless setting, when you are not sure how many nodes are going to show up, any consensus protocol has to be synchronous. Asynchronous consensus is basically not possible when you are not sure how many nodes are going to show up, and this actually has a very simple lower-bound proof.

So now we can look at things from the perspective of asynchrony versus synchrony. Asynchronous protocols are fast: because the protocol doesn't know the network delay, the only way to make progress is to take actions as soon as I receive messages, so of course the protocol is going to proceed as fast as the network makes progress. This is all very nice, but if you try to work with asynchrony directly, there are problems. Not only are the protocols complex, there
are also fundamental barriers. For instance, there is a very well-known lower bound by Dwork et al. that any asynchronous protocol can tolerate only fewer than one-third corruption, and like I said, if you want permissionless it's going to be even worse, because asynchrony is not possible with permissionless. Okay, if you look at the line of work in, let's say, SOSP, and these are top systems conferences, right, and also the systems that Google and Facebook implemented, none of these systems actually considered synchrony. And why is this the case? So classically we tend to think that synchronous protocols are kind of slow, because in a synchronous protocol there is a very important parameter to set, which is the synchrony delay, right? So if your average network delay is, like, one second, you might want to set the synchrony delay to be 10 seconds just to be safe, because if this assumption is violated your protocol can lose all security guarantees. Okay, so the classical wisdom is that, you know, synchronous protocols are kind of slow, and in Google's scenario they actually want microsecond latency, right? So our insight is that this classical wisdom is in some sense incorrect. So Thunderella is very interesting because most of the time, 95% of the time, we live in the asynchronous world, and when things go wrong, right, in the 1% of the time, we fall back to the synchronous mode. So that's why we can circumvent these fundamental barriers related to asynchrony, right? Like, there's the one-third barrier, which we can overcome because we can tolerate minority corruption in a permissionless setting, and in fact if you are in a classical setting we can even tolerate arbitrarily many faults, for instance if you instantiate the underlying blockchain with a Dolev-Strong-like protocol. So we can theoretically circumvent these barriers related to asynchrony. Okay, so to conclude, you know, simplicity is really your good friend, especially for large-scale distributed systems, and our company is currently
hiring, so if you are interested in being one of the key contributors of an early-stage startup, you can email these addresses. Thank you very much.
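
The latency argument made in the talk (a fast path that runs at the actual network delay most of the time, with an occasional fallback that waits for the conservative synchrony delay) can be illustrated with a little arithmetic. The following is only a rough sketch, not the protocol itself: the function name `average_latency` is hypothetical, and it assumes that each confirmation costs exactly one network delay on the fast path and one synchrony delay on the fallback path, which is a deliberate simplification.

```python
def average_latency(fast_fraction, network_delay, synchrony_delay):
    """Average confirmation latency under a simplified two-mode model.

    fast_fraction:   fraction of confirmations that take the fast path (0..1)
    network_delay:   actual network delay, in seconds (fast-path cost, assumed)
    synchrony_delay: conservative delay parameter, in seconds (fallback cost, assumed)
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    fast_cost = fast_fraction * network_delay
    slow_cost = (1.0 - fast_fraction) * synchrony_delay
    return fast_cost + slow_cost


# Numbers mentioned in the talk: about 1 s actual delay, 10 s synchrony delay,
# and the fast path used 95% of the time.
mixed = average_latency(0.95, 1.0, 10.0)
always_synchronous = average_latency(0.0, 1.0, 10.0)
print(f"mixed: {mixed:.2f} s, always synchronous: {always_synchronous:.2f} s")
```

Under these assumptions the average stays close to the actual network delay, even though the synchrony delay is set ten times larger for safety.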