Hi everyone, I'm Erica, and today I'm going to tell you about Tardigrade, an atomic broadcast protocol for arbitrary network conditions, developed as part of joint work with Jonathan Katz and Julian Loss. First of all, what is atomic broadcast? Atomic broadcast is a fundamental problem in distributed computing in which parties receive input values over time and need to agree on a growing ordered sequence of values. To complicate things, some of those parties are Byzantine, or faulty, and those parties may behave arbitrarily. In the setting we're going to consider, we have point-to-point authenticated channels between all of the parties. Messages sent by an honest party eventually arrive, meaning messages can't be dropped, and we assume a trusted dealer performs setup for a public key infrastructure (PKI for short), threshold signatures, and threshold encryption. Let's say that you're an engineer who's considering using an atomic broadcast protocol for some application. You've done your research and know that two of the most widely studied models are the synchronous model and the asynchronous model. There's also the partially synchronous model, but we won't consider that in this talk. In the synchronous model, messages arrive within a fixed delay, denoted by Delta. In the asynchronous model, on the other hand, there's no upper bound on message delay. Furthermore, you know that the optimal fault tolerance, assuming a PKI, is T less than n over 2 in the synchronous model and only T less than n over 3 in the asynchronous model. How do you choose which one is right for your use case? And maybe more importantly, what happens if you're wrong? If we take a protocol that is secure in an asynchronous network and stick it in a synchronous network, it will be perfectly happy as long as there are no more than n over 3 faults. But it can't be secure if there are more than n over 3 faults.
If we had guessed correctly and just used a synchronous protocol, we would have been secure up to n over 2 faults. On the other hand, suppose we have a protocol that is secure in a synchronous network, where the delay is at most some fixed Delta, and we run that protocol in a network where the message delay might exceed Delta. If the delay exceeds Delta even by a tiny bit, and even if the number of faults is less than n over 3, our proof of security no longer holds, and it's not clear what will happen. To give us an idea of what's been done before, I'll briefly mention a few related questions that have been considered in the literature. This is by no means exhaustive, but hopefully it provides a reference point. There's been prior work looking at networks that might be synchronous or partially synchronous, at temporary partitions or sluggish faults, and at tolerating up to n over 2 crash faults if the network is asynchronous. Most closely related is our prior work on network-agnostic protocols for Byzantine agreement and also secure multi-party computation. Thinking about these different choices led us to the following question: is it possible to design a protocol that tolerates strictly more than n over 3 faults if the network happens to be synchronous for a fixed value of Delta, and still tolerates some lower number of faults if the network is asynchronous? We made three main contributions on this topic. The first is a lower bound showing that for any TA and TS such that TA plus 2 TS is greater than or equal to n, it's impossible to have an atomic broadcast protocol that is secure against TS faults in a synchronous network and also secure against TA faults in an asynchronous network. Our second contribution is Tardigrade, a protocol for atomic broadcast that achieves the optimal resilience, i.e., that gives us exactly those guarantees for any TA and TS such that TA plus 2 TS is less than n.
We also discuss how to make Tardigrade adaptively secure. Our third contribution is Upgrade, another atomic broadcast protocol with security guarantees in both synchronous and asynchronous networks. Compared to Tardigrade, Upgrade has better communication complexity, but tolerates an O(epsilon) fraction fewer corruptions and is not secure against an adaptive adversary. Designing a protocol that is efficient and also achieves adaptive security for the optimal number of faults remains an open question. Before we continue, let's formally define the problem we're trying to solve, so we're all on the same page. In atomic broadcast, each party has a local buffer of values called transactions and a write-once array of blocks. For our purposes, a block is just a set of transactions. Values are added to parties' buffers over time via some external mechanism. We don't assume that transactions are added to every party's buffer at the same time, or even that every transaction is eventually added to every honest party's buffer. So for example, if we have five parties, then perhaps P1 has a buffer with transactions 1, 2, and 3, while P2 has seen transactions 2 and 4, and they're all going to get together and try to agree on a sequence of blocks that includes those transactions. The security properties that we care about are consistency, completeness, and liveness. Consistency says that if two parties have each output a block at the same position in their array, or chain, then those blocks should be the same. Completeness says that each party eventually outputs a block at index K, for all K. And finally, liveness says that if a transaction is in all honest parties' buffers, then each party should eventually output a block that contains that transaction. Sometimes you'll see blockchain protocols that achieve a stronger version of liveness, especially in synchronous settings, but this weaker definition is fairly standard for settings that don't necessarily assume synchrony.
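The data model and the consistency property above can be sketched in a few lines. This is a toy illustration, not code from the paper; the class and function names are my own:

```python
from dataclasses import dataclass, field

@dataclass
class Party:
    """Local state of one party: a buffer of pending transactions and a
    write-once array (chain) of output blocks, each block a set of
    transactions."""
    buffer: set = field(default_factory=set)
    chain: list = field(default_factory=list)

def consistent(p: Party, q: Party) -> bool:
    """Consistency: wherever both parties have output a block at the same
    index, the two blocks must be equal (only the common prefix is
    compared, since one party may simply be further along)."""
    return all(b1 == b2 for b1, b2 in zip(p.chain, q.chain))

# The five-party example from the talk: P1 and P2 start with different
# buffers but must agree on the blocks they output.
p1 = Party(buffer={"tx1", "tx2", "tx3"})
p2 = Party(buffer={"tx2", "tx4"})
p1.chain.append(frozenset({"tx1", "tx2", "tx4"}))
p2.chain.append(frozenset({"tx1", "tx2", "tx4"}))
print(consistent(p1, p2))  # True
```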
Next, I'm going to give an overview of the lower bound, which states that there is no atomic broadcast protocol that is network-agnostically secure, meaning secure against TS faults in a synchronous network and against TA faults in an asynchronous network, if TA plus 2 TS is greater than or equal to n. The proof follows from a generalization of Toueg's classical lower bound for randomized Byzantine agreement. The intuition is that when TA plus 2 TS is greater than or equal to n, an honest party might not be able to distinguish between executions where the network is synchronous and a set of malicious parties are refusing to participate, and executions where the network is asynchronous and messages from a set of honest parties are very delayed. To talk things through with an example, let's assume towards a contradiction that we do have a protocol that achieves these properties for n equals 5, TA equals 1, and TS equals 2. And suppose that P1 is honest and hasn't heard from P4 and P5. One explanation is that the network is asynchronous and P4 and P5 are honest; their messages have just been delayed. On the other hand, it could be that we're in a synchronous network and P4 and P5 are malicious and just refusing to participate. If they're malicious, then P1 can't wait for them forever, because this contradicts liveness. On the other hand, if they're actually honest and P1 moves on without them, we might violate consistency. So in the proof, we formalize this intuition by proving that there is a synchronous execution with at most TS faults and an asynchronous execution with at most TA faults that are indistinguishable from the perspective of an honest party, and then showing that in at least one of these executions, security must be violated. As a quick interlude before we learn about the protocol we named Tardigrade, I'd like to tell you a little bit about the animal by the same name.
This friendly microscopic fellow, sometimes known as a water bear, is capable of surviving extreme heat, cold, radiation, and pressure by entering a state called cryptobiosis, which is pretty apt. In one experiment, 68% of tardigrade subjects survived exposure to the hard vacuum of outer space. Not many animals, or atomic broadcast protocols, can say that. So without further ado, here's Tardigrade, the protocol. A quick disclaimer before we get started: in the paper, we go into detail about the block size and ways to improve the throughput, liveness, and communication complexity using threshold signatures and threshold encryption. For the purposes of this talk, I'm going to present a simplified version, assuming no upper bound on block size, so we can just focus on the protocol flow. With that out of the way, let's get started. The process for agreeing on a new block has four main stages: an input stage, two agreement stages, and an output stage. In the input stage, each party signs their whole buffer and sends both their buffer and that signature to all other parties. Once a party has received buffers and signatures from enough different parties, they bundle them together and input them to the agreement stages. We call these bundles of signed buffers pre-blocks for short. Now, the two agreement stages have the same goal, to agree on a set of pre-blocks, but each stage has different security properties. I'll talk more about that in just a moment, but for now let's fast-forward. Eventually, the second agreement stage outputs a set of pre-blocks. At that point, each party combines all of the transactions into a final block and outputs it to their array. Okay, now we're ready to fill in some more details. During the initial input sharing phase, I have two timers running. If I'm able to form a pre-block by the time the first timer goes off, I'll input it to the first agreement sub-protocol, which we call block agreement.
Then, if I output a set of pre-blocks from that phase before my second timer goes off, I'll take that output and input it to the second agreement phase. If the network is synchronous, then everything works out: I receive all the inputs I need before the first timer goes off, and the first agreement phase completes before the second timer goes off. Of course, the network might not be synchronous, in which case I need a backup plan. The backup plan is simple. If the first timer goes off while I'm still waiting to receive enough inputs, or if the second timer goes off while I'm waiting on an output from the first agreement stage, I'm going to give up on the first agreement stage. I'll just wait to gather a pre-block, if I haven't already, and then input it directly to the second agreement phase. In either case, whichever path we end up taking, once I output a set of pre-blocks from the second agreement phase, I combine them into a block, and that completes one epoch of the protocol. We keep repeating this process to agree on more and more new blocks. So that's how it works, but why does it work? Basically, each agreement phase achieves stronger guarantees in one setting and weaker guarantees in the other. The first agreement phase, which we call block agreement, can be viewed as a form of validated multi-valued agreement. If the block agreement protocol is run in a synchronous network with up to TS faults, then all parties will agree on a set of pre-blocks before the timer runs out. On the other hand, if it's run in an asynchronous network, then it might not terminate in time. But if some honest party does receive output, they still output a set of pre-blocks that satisfies a minimum validity property. The second agreement phase, meanwhile, is similar to a standard asynchronous common subset, or ACS, sub-protocol.
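The input stage and the fast-path-versus-fallback decision just described can be sketched as follows. This is a toy model under loudly stated assumptions: HMAC stands in for a real digital signature, the pre-block threshold k and all names are placeholders (the paper pins down the actual parameters), and the interactive sub-protocols are collapsed into simple values:

```python
import hashlib
import hmac

def sign(key: bytes, buffer: frozenset) -> bytes:
    # Stand-in for a real digital signature (HMAC used for illustration only).
    msg = ",".join(sorted(buffer)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def form_preblock(received: dict, k: int):
    """received maps a party id to its (buffer, signature) pair. A
    pre-block is a bundle of signed buffers from at least k distinct
    parties; return None if we must keep waiting."""
    return dict(received) if len(received) >= k else None

def run_epoch(preblock, block_agreement_output, timer1_ok, timer2_ok):
    """One epoch, as a pure function of what had happened by each timer.

    Fast path: a pre-block formed before timer 1 expired, and block
    agreement produced output before timer 2 expired; that output feeds
    the second (ACS-like) stage. Otherwise, fall back: skip block
    agreement and feed the pre-block directly to the second stage."""
    if (preblock is not None and timer1_ok
            and block_agreement_output is not None and timer2_ok):
        return ("fast", block_agreement_output)
    return ("fallback", preblock)

buf1 = frozenset({"tx1", "tx2"})
buf2 = frozenset({"tx2", "tx4"})
received = {1: (buf1, sign(b"key1", buf1)), 2: (buf2, sign(b"key2", buf2))}
print(form_preblock(received, k=3))               # None: only 2 contributions
print(run_epoch(received, None, True, False)[0])  # fallback: timer 2 expired
```

In a synchronous run both timer flags hold and block agreement delivers, so every party takes the fast path; any missed timer routes the party onto the fallback path instead.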
If it's run in a synchronous network with up to TS faults, we only guarantee a weak validity property, because that's not really what it's meant to do. But if it's run in an asynchronous network with at most TA faults, we get full security. So to review, we offer three main contributions: a lower bound showing that this notion of network-agnostic security is impossible if TA plus 2 TS is greater than or equal to n, and two constructions, one with optimal TA and TS, and one with better communication complexity. And like our friend the tardigrade, our constructions are able to survive in whatever environment they end up in. That concludes my talk. Thank you for watching.