Next we have Florian Suri-Payer from Cornell University, who will be presenting "Basil: Breaking Up BFT with ACID Transactions."
I'm really excited to be able to talk to you today about a system we built called Basil, where we try to rethink how to build performant and expressive BFT databases. Before I get into Basil itself, I want to start by briefly building some context by revisiting the simple but beautiful abstraction of totally ordered ledgers that lies at the core of most BFT or blockchain systems. Now, this is a really powerful abstraction because it allows mutually distrustful parties to replicate and share data, and regardless of some number of parties being compromised or acting maliciously, they can still agree on a common view of the system state. And by totally ordering transactions we can trivially maintain traditional ACID guarantees such as atomicity or isolation, which makes it easy for database applications to materialize key-value stores on top.
This is a really powerful and desirable abstraction for our applications to build around, but implementing it in a scalable fashion is challenging. On the one hand, ordering is costly. There are both permissionless and permissioned systems; I'll be focusing on the permissioned ones. We usually need several round trips of communication between the replicas, we usually rely on a dedicated leader to act as a sequencer, and we usually have costly and quite complex recovery protocols that, not without reason, give BFT a bad rep for being overly complex. On the other hand, while executing transactions sequentially is trivially safe, it's an obvious throughput bottleneck. And to make matters worse, this kind of order-execute pattern usually forces us to use restrictive transaction models such as stored procedures, one-shot transactions, or even UTXO models that in practice database systems don't actually use very much.
So in fact, there was this interesting study in 2017 performed by CMU that showed that the majority of database systems use stored procedures less than 10% of the time. So we want to strive to do better than that.
In general, it's not hard to see that all this coordination, this ordering, this sequential execution is overly heavy-handed for most real-world workloads, which consist largely of commutative operations. The transactions of Alice and Bob, who are trying to buy a race car and gelato respectively, could have been safely executed in parallel. And that's by no means a new observation. We heard several talks earlier about sharding and how we can use it to increase parallelism in our systems. But what I want to highlight is that, while we definitely agree that sharding is necessary to scale our resources horizontally, using it to achieve parallelism is more of a side effect that depends on our workload and the quality of our partitioning. Within each shard we still retain this total order. So for the most part, sharding is a band-aid that allows us to recoup some of the parallelism that we've artificially removed by totally ordering. Additionally, when we have multi-shard transactions we need to run some sort of distributed commit protocol, such as two-phase commit. And now we are enforcing coordination for consistency twice: once to replicate within shards, and once across shards to maintain safety. We also can't shard arbitrarily, because increasing the number of shards for multi-shard transactions increases the communication cost for coordination linearly, which is especially costly with signatures in BFT.
Now, what we propose instead in Basil is to take a page from our crash-failure-based counterparts and, instead of implementing this total order, implement executions that are equivalent to a total order, that is, serializable. So take, for example, the execution on the bottom left.
What serializability states is that Alice's, Bob's, and Charlie's transactions can safely execute in parallel and in a non-atomic fashion, because their operation outcomes are equivalent to the sequential execution of first Alice, then Bob, then Charlie. So in Basil, we take this idea and we build a BFT transactional key-value store that offers interactive and serializable ACID transactions to scale the abstraction of a totally ordered fault-tolerant log.
The way we realize this is by building Basil around a core ethos that we call independent operability, which roughly states that all operations that can be independent should also be executed independently. So in Basil, we adopt a client-driven approach in which each client drives its own transaction processing and is responsible for its own progress. And as part of that design, we try to strike a balance between optimism, which allows for aggressive parallelism, and robustness to failures. It's not hard to imagine that empowering Byzantine clients too, and letting them be optimistic, is a slippery slope, and doing so safely and robustly is the main challenge in Basil.
At a high level, Basil is made of three core components. First, a concurrency control mechanism that allows for optimistic parallelism while ensuring serializability. Second, a commit protocol that is integrated with the concurrency control to avoid redundant coordination and efficiently ensure consistency both within and across shards. And lastly, a fallback protocol that allows clients to retain independent operability and liveness in the face of Byzantine client failures. Now, it's actually not immediately obvious how to even define serializability in the presence of Byzantine clients, so in the paper we also spend some time defining that. In the next slides, I'll briefly go over parts of these protocols, but if you're curious, I do encourage you to read the whole paper if you want to know more about the details.
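As a rough illustration of the serializability idea described above (this is not code from Basil), the sketch below checks whether an interleaved history of reads and writes is equivalent to some serial order of transactions. It uses the textbook conflict-graph test rather than anything specific to the talk. The function name `serial_order`, the `(txn, op, key)` tuple format, and the example histories are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter, CycleError


def serial_order(history):
    """Return an equivalent serial order of transactions, or None if the
    interleaved history is not conflict-serializable.

    history: list of (txn, op, key) tuples, with op being "r" or "w".
    """
    # preds[t] holds the transactions that must precede t in any
    # equivalent serial order.
    preds = {txn: set() for txn, _, _ in history}
    for i, (t1, op1, key1) in enumerate(history):
        for t2, op2, key2 in history[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same key, and at least one writes.
            if t1 != t2 and key1 == key2 and "w" in (op1, op2):
                preds[t2].add(t1)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        # A cycle means no serial order explains the observed outcomes.
        return None


# Hypothetical histories: Alice and Bob touch different keys, so they
# commute; Charlie conflicts with Alice and must be ordered after her.
interleaved = [
    ("alice", "r", "racecar"), ("bob", "r", "gelato"),
    ("alice", "w", "racecar"), ("charlie", "r", "racecar"),
    ("bob", "w", "gelato"), ("charlie", "w", "racecar"),
]
# Two clients read the same key before either writes it (a lost update).
lost_update = [
    ("alice", "r", "x"), ("bob", "r", "x"),
    ("alice", "w", "x"), ("bob", "w", "x"),
]
```

Calling `serial_order(interleaved)` yields an order in which Alice precedes Charlie, while `serial_order(lost_update)` returns `None`. The point of the sketch is that only conflicting operations constrain the order; commutative ones, like Alice's and Bob's, impose no ordering at all.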
So let me start by very quickly outlining how execution works in Basil. Clients use this interactive transaction model and execute their transactions in parallel with other clients. Reads are sent to local replicas and, importantly, since those could be Byzantine, clients need to make sure that, first, these are valid values, which you can do by just adding signatures, and second, that they are not tricked into reading arbitrarily stale values. Writes, instead, are buffered locally in order to delay their visibility until commit time. That's an important property for recovery that I'll touch upon in a little bit, but the intuition here is that we don't want Byzantine clients to be able to leave incomplete transactions lying around arbitrarily.
Now, of course, any such speculative execution performed by clients in parallel might not be serializable, or Byzantine serializable, at all, so we need to validate transactions before being able to commit them. The way we do that is by having clients submit their completed transactions to all replicas in all involved shards, which then vote on the local safety of this execution by running an efficient multi-version concurrency control check. Clients just collect these votes from a shard, and if sufficiently many deem this transaction locally serializable, then we can conclude that it's safe to commit on that shard. When we have multi-shard transactions, clients simply aggregate the quorums of votes to form a two-phase commit decision. And in most cases, when there's no contention or failures, this decision is immediately durable, and clients are able to directly commit in a single round trip and return to the application. Now, of course, that might not always be the case. When we have contention, concurrent transactions may conflict, which causes some transactions to either receive fewer commit votes, in this case Bob, or to be forced to abort and retry their transaction.
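The vote aggregation described above can be sketched roughly as follows. This is a simplified illustration, not Basil's actual protocol. The function names and the two thresholds `commit_quorum` and `fast_quorum` are hypothetical parameters, the concrete values in the example are chosen only for illustration, and the sketch deliberately treats every abort as needing the extra logging round.

```python
def shard_commits(votes, commit_quorum):
    """A shard votes to commit if enough replicas deem the transaction
    locally serializable. votes: list of "commit" / "abort" strings."""
    return votes.count("commit") >= commit_quorum


def decide(votes_by_shard, commit_quorum, fast_quorum):
    """Aggregate per-shard replica votes into a two-phase commit decision.

    Returns (decision, needs_logging_round). Both thresholds are
    assumptions of this sketch, not values stated in the talk.
    """
    all_commit = all(
        shard_commits(votes, commit_quorum)
        for votes in votes_by_shard.values()
    )
    if not all_commit:
        # Simplification: this sketch does not model any abort fast path.
        return ("abort", True)
    # The decision is treated as immediately durable only if every shard
    # reached the larger fast quorum (no contention, no failures).
    fast = all(
        votes.count("commit") >= fast_quorum
        for votes in votes_by_shard.values()
    )
    return ("commit", not fast)


# Illustrative setup: two shards with six replicas each.
uncontended = {"shard_a": ["commit"] * 6, "shard_b": ["commit"] * 6}
contended = {"shard_a": ["commit"] * 6,
             "shard_b": ["commit"] * 5 + ["abort"]}
conflicting = {"shard_a": ["commit"] * 6,
               "shard_b": ["commit"] * 3 + ["abort"] * 3}
```

With `commit_quorum=4` and `fast_quorum=6`, `decide(uncontended, 4, 6)` commits without a logging round, `decide(contended, 4, 6)` commits but needs one, and `decide(conflicting, 4, 6)` aborts.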
Now, in that case, we do need an additional round trip to make this two-phase commit decision durable. But what is really cool in Basil is that we can do this on only a single shard, no matter how many shards are involved in the execution in total.
Another thing that is really cool, which I don't have time to go into in detail but do want you to take away, is that Basil's commit protocol, with this client-driven design, is built in a way that allows neither clients nor groups of Byzantine replicas to single-handedly dictate the outcome of a transaction, that is, whether it should commit or abort. In the paper we formalize this as a general BFT system property that we call Byzantine independence. This is a really strong property that most traditional state machine replication or BFT protocols don't meet, because the leader that they rely on as a sequencer has undue control over the ordering and can inject transactions for front-running. This ordering concern is really important in many financial applications, and I conveniently saw that Mahimna is going to talk about this later, so I'll leave it to him to convince you of its importance. The takeaway here should instead be that Basil, with its client-driven design, sidesteps all of these concerns of ordering and fairness entirely.
Now, what I did sweep under the rug in the previous few slides, but kind of alluded to previously, is the fact that by empowering clients, I mean in particular also empowering Byzantine clients. If their transactions are commutative, then that has no impact on other clients, but if they do conflict, then misbehaving clients can cause contending transactions to stall indefinitely, or even force them to abort and retry. The way we address this in Basil is by allowing any client to drive the commit protocol for any transaction.
Now, that's perfectly safe because, A, as I already told you, we delay transaction visibility until commit time, so we're not accidentally finishing incomplete transactions. And B, like we just talked about, clients are not able to decide the outcome of the transaction, and hence it doesn't really matter which client coordinates the commit. This recovery protocol has some nice properties. It usually requires only a single round trip. It also involves only a single shard, and does so with linear communication. Now, like with most BFT protocols, the details can get fairly involved, but what I find especially cool is the fact that, A, clients are in charge of their own liveness, and B, recovery only affects contending transactions. That's unlike existing view change protocols, which have to halt all transaction processing when the leader is faulty. Of course, with this client-driven design there are several other challenges, such as dealing with equivocation or livelock with multiple clients, which I won't talk about now, but reading about those is one of the many pleasures you'll have when reading the full paper.
At this point, I want to cut short this ever so brief overview of Basil and give you a quick recap. We built Basil, which is a replicated and sharded BFT database. It implements interactive transactions, which offer a flexible application interface for developers. And it meets both the properties of Byzantine serializability and Byzantine independence that I alluded to, which we define in more detail in the paper. On the performance side of things, it allows transactions to execute in parallel rather than sequentially, it allows them to commit across shards in just a single round trip in most cases, and it does so with linear communication complexity and without incurring the potential scalability or fairness bottleneck of a leader.
And lastly, like we were just talking about, it allows for independent failure handling for commutative transactions.
All right, so before we end, I want to briefly talk about how this design translates to practice. We implemented a prototype and evaluated it on three common OLTP workloads, TPC-C, Smallbank, and Retwis, which all use interactive transactions but experience varying levels of contention between their transactions. On the left, we compare Basil to TAPIR, a recent state-of-the-art crash-failure database that also uses a similar client-driven design and also integrates the distributed commit and replication layers. Here, Basil's main overheads stem from the fact that it requires signatures and larger quorums for BFT, but nonetheless it offers quite competitive performance given the increased security and resilience of BFT. On the right, I'm showing you a comparison of Basil against two BFT baselines that are both also built to offer interactive transactions and concurrent execution, but that, unlike Basil, follow the standard modular approach of layering two-phase commit and concurrency control atop black-box consensus protocols, in this case HotStuff and PBFT, or rather its implementation, BFT-SMaRt. Here we can see that Basil significantly outperforms both, mainly because it avoids this redundant coordination during distributed commit, which reduces latency down to a single round trip, and that in turn translates to throughput on contention-bottlenecked workloads.
Now, it's obviously great that Basil achieves this performance during gracious execution, but that was partially achieved on the back of the premise of empowering Byzantine clients as well. So to validate how robust Basil remains when clients do misbehave, we also evaluated the impact that Byzantine clients specifically have on the transaction processing of correct clients, right?
Because that's in line with this ethos of independent operability, of each client driving their own transaction. Clients may fail in many ways. For one, they may stall their transactions during commit, which causes contending transactions to block, but Basil's recovery protocol allows correct clients to very quickly complete those transactions, and in fact, in most cases, without having to abort at all, because our concurrency control mechanism also allows clients to form read dependencies on prepared but uncommitted transactions. So these failures are just turned into blocking, and after some short delay the transactions finish anyway. As I'm showing here, the impact is fairly manageable.
Instead, the worst failure that clients may cause, which is however also detectable, is that they may try to equivocate: they may try to replicate inconsistent results or proposals during the logging phase of the distributed commit. That incurs a significantly higher performance penalty, as you can see in the red box, because recovery is more involved. But fortunately, like I told you earlier, Basil doesn't actually allow clients to choose for themselves what values to propose. So this was an artificial simulation where we allowed them to always equivocate. In practice, and you can see that in the green box, the ability to equivocate is exceedingly rare, and that makes the strategy impractical or infeasible to pursue. So overall, we can show that despite more than 30% of transactions being faulty, Basil is not just live but has robust performance, because failures only affect conflicting transactions, and when they do, those transactions can be recovered swiftly.
All right, at this point, I want to end by concluding that we showed with Basil that, using this client-driven design, it is possible to still build this abstraction of the BFT totally ordered log, but to do so in both a highly concurrent and yet highly resilient way.
And I know we had very little time to talk about the interesting technical bits, so feel free to ask me in the Q&A or send me an email. In our paper, which I've linked here, you can find detailed discussions of the protocols that compose Basil and several other microbenchmarks to help understand the performance profile. Thank you.