So thanks everyone for joining today's virtual Hyperledger Meetup. We have Jakov Menević speaking about a Byzantine fault tolerant consensus library for Hyperledger Fabric. If you have any questions throughout the presentation, please feel free to add them in the Zoom chat and we will get to those. We welcome your comments, questions and feedback, and with that I will hand it off to Jakov. Thanks.

All right, so let's start. First of all, a few words about me. I'm a senior research software engineer at IBM Research. My current interests, and also what I currently work on, are central bank digital currency and BFT consensus. I also do some applied cryptography and implementations, for example threshold cryptography, such as threshold signatures. I've been a maintainer of the Hyperledger Fabric core, meaning the orderer and the peer components, since early 2017, really since the inception of Fabric. And lastly, and this is what this talk is about, I led the development and integration of the BFT consensus that Fabric currently has. That effort started in mid 2019, and you will see during the presentation where we are today.

So what am I going to talk about today? First of all, what is Byzantine fault tolerance. I'm also going to cover the history of consensus in Fabric. By the way, this talk is for technically oriented people, so I assume you have some familiarity with Fabric and that you know what consensus is. I'm not going to define these things, but feel free to open a new tab and read while I'm speaking. I'm also going to talk about the BFT library that we have integrated into the official Hyperledger Fabric version: how we integrated it, and the various nuances. And I'm going to give some deployment tips in case you actually want to run it in pre-production or, eventually, production.

So what is Byzantine fault tolerance? The scientific literature, and also reality, identifies two failure models. The simplest failure model is the fail-stop model, where nodes, servers, can either crash or become unreachable: someone cut the cable, some router is unreachable, so you cannot communicate with the node. However, there is another failure model in which the node can do anything it wants. What do I mean by anything? First of all, you're not assured that a computation this node is performing will ever end; the node can be stuck in some infinite loop. It may return an incorrect result of a computation, and it may also deliberately return a misleading result. And lastly, the node might be so malicious that it returns a correct result to one node and an incorrect result to another node. And of course, a protocol that can only work in a CFT model will not work well in a BFT model, in the sense that you no longer have safety and liveness guarantees. As we will see in the next few slides, if you take a Raft protocol and let it run with nodes that can behave arbitrarily, you might get a fork: some nodes will agree on one value, some nodes will agree on another value.

So why is Byzantine fault tolerance even important? For all kinds of reasons, actually. The first reason is that you may have software bugs, and software bugs may cause arbitrary behaviors. So a Byzantine protocol is somewhat more resilient than a CFT one to software bugs.
Of course, if you have a software bug that affects everyone, all nodes, then you have a problem. Another reason is that in some cases you have parties that run these nodes, and they may deliberately behave in a malicious manner, either to gain some economic incentive or just out of spite, because they want to crash the network, for example. And lastly, nodes might be compromised by hackers. So if you want your protocol to be really resilient, you might want to consider using a Byzantine fault tolerant protocol.

So let's compare some protocols that we know are very popular. In the crash fault tolerant protocol family, we have the Paxos and Raft protocols. In the BFT protocol family, we have PBFT, which is really the ancestor of probably all BFT protocols, and we have the relatively new protocol HotStuff. In the CFT case, the assumption is that up to less than half of the nodes are faulty; in BFT, up to less than a third are faulty. However, as I said before, in the BFT case a faulty node can behave in any manner it wants, while in the CFT case it can only crash or become unavailable.

Now, about leader rotation. The leader is usually the node that proposes the next batch to be agreed upon, and if this node crashes, for example, you need the protocol to replace it. How is it done in CFT? In CFT, leader rotation is opportunistic. Basically, a node can say: hey guys, I'm OK, I know the previous leader crashed, make me the leader. And the rest of the nodes can agree or disagree. Usually it's the fastest node that does that, the fastest node that saw the previous leader crash. In BFT you cannot have this, because then a malicious node would always be the fastest node. So instead you have a round-robin protocol that rotates the leader role among the nodes.

And lastly, in a CFT protocol a single node can dispatch requests from a client, and this single node can make sure that the transaction will eventually be ordered. For example, in Raft, if the transaction reaches the leader, the leader will include the transaction in a block; if the transaction reaches a follower, the follower node will simply send it to the leader. In BFT, however, we cannot assume a single node will behave correctly; a single node may want to preserve its network bandwidth, so it may not forward the transaction to the leader. So in BFT, we would like to send the transaction to as many nodes as possible, or to everyone.

Any questions so far? Let's see in the chat. I think there are no questions. OK, so let's go on. Next, I'm going to talk about consensus in Hyperledger Fabric, meaning the history: how did we get to where we are now? As a brief reminder, Hyperledger Fabric has two types of nodes: ordering service nodes and peer nodes. Peer nodes process and validate transactions that appear in blocks. But how are these blocks even formed? Transactions are sent to ordering nodes, and the ordering nodes totally order these transactions; the output of the ordering nodes is blocks. The same blocks reach all ordering nodes and are then disseminated to all peers. At the beginning of Hyperledger Fabric, in early 2017, we only had a Kafka-based ordering service.
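As a quick aside, the resilience bounds from the protocol comparison above can be summarized in one place (n is the total number of nodes, f the number of faulty nodes tolerated):

```latex
\begin{align*}
\text{CFT (Paxos, Raft):}    \quad & n \ge 2f + 1 \quad\Longleftrightarrow\quad f < n/2 \\
\text{BFT (PBFT, HotStuff):} \quad & n \ge 3f + 1 \quad\Longleftrightarrow\quad f < n/3
\end{align*}
```

So, for example, four BFT nodes tolerate one arbitrarily faulty node, whereas three CFT nodes tolerate one crashed node. With that recap, back to the history.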
The way the Kafka-based ordering service worked is that transactions were sent to the ordering service nodes, which were basically front-ends to Kafka. They would forward the transactions to Kafka and then all read them back from Kafka in the same order, and then each ordering service node would cut blocks in the same way, so the blocks would all be identical. However, unfortunately, this didn't really work out very well. Why? Because Kafka, as you know, depends on another deployment, ZooKeeper, and someone needed to actually deploy the Kafka nodes, which are a message broker. So this kind of deployment was complex and very centralized; we cannot really say that it was distributed.

Then in mid 2019 we released the Raft-based ordering service. The Raft-based ordering service used the etcd/raft library, the same library that is used in etcd and in Kubernetes. And that ordering service actually improved many things. First of all, it was easy to deploy: everything ran in a single process, with no more dependency on Kafka or ZooKeeper, so it had no management overhead. However, it was only distributed; it was not really decentralized.

Someone asked a question about quantum mechanics; it's not really relevant to BFT, so I'm not going to read questions that are not relevant to this topic. But there is a good question: what is the difference between distributed and decentralized? I will get to that in a few slides. Good question. So as I said, Raft was distributed, but it was not decentralized, in the sense that it makes no sense for several parties to each run a Raft node if these parties do not trust each other. Why? Because, as I said, Raft is only a CFT, crash fault tolerant, consensus algorithm, and therefore a malicious node in Raft may deviate from the protocol and cause a fork in the blockchain, among all kinds of other problems. So it is distributed in the sense that if you run Raft, you need to assume that all parties trust each other, or that no party is malicious. A decentralized setting is really what you have in BFT or in today's permissionless blockchains, where no party actually trusts another party.

So as we said, Kafka and Raft are both only crash fault tolerant, and we didn't have a Byzantine fault tolerant consensus in Fabric. Then, in 2018, a couple of researchers published a research work that integrated a Byzantine fault tolerant consensus algorithm into Fabric. Let's quickly go over how it worked. On the right side you have Fabric peers, and then you have the Fabric ordering service nodes. Imagine a client; the client sends her transaction to an ordering service node. The ordering service node, which is written in Go (the Fabric core is written in Go), would then forward the transaction through some kind of shim, which is like a proxy, to a set of BFT nodes written in Java, because the implementation of their paper was Java-based. The BFT nodes would run their consensus, and the shims, these proxies, would read all the transactions back in the same order. You can think of it basically as a Byzantine fault tolerant version of Kafka, because that's also how it is done with Kafka. And then each shim would forward the batch of transactions, in the same order, to the ordering service node that it came from.
And then each ordering service node on its own would sign a block of transactions, and the block would be disseminated to peers. This was obviously a very big leap from what we had before. However, this deployment had some problems. The first problem is that actually maintaining and operating this thing is not that easy; it's actually the same situation you had with Kafka. Imagine that you now want to add a new BFT node. You cannot express that in Fabric terms: you cannot send a Fabric transaction to this BFT node cluster and have something happen. So the operational deployment was disconnected from Fabric itself. The second problem: if you remember, I said that each ordering service node would cut its own block. That also means each ordering service node would sign its block on its own, so each block was only signed by a single ordering service node. But we are in a decentralized, Byzantine setting, so there is no guarantee that a malicious node will only cut blocks identical to everyone else's. What we really needed was for the ordering service nodes to communicate among themselves, compare these blocks, and co-sign each other's blocks, because a single signature in the Byzantine setting doesn't make much sense.

So as of 2018 we had the BFT-SMaRt ordering service node. It is decentralized to some extent. But as I said, it is not easy to deploy: it's not a single process, because we have this proxy and the BFT nodes written in Java while the ordering service node is in Go, and there is management overhead. Then, in early 2020, we released a new type of ordering service node. We called it SmartBFT because it is based on the BFT-SMaRt protocol, which I will explain next. And it fulfills all these criteria: it is easy to deploy, there is no management overhead, and it is decentralized.

Questions so far? All right, so let me dive into the library. The first thing I want to say is that the library itself is hosted in an external GitHub organization called SmartBFT-Go; the name of the repository is consensus. "Can we pick any of these consensus options when using Fabric?" Yes, of course, that's what I'm explaining. Apart from the actual library implementation, in 2021 we presented a paper about it at the ICBC conference; the title of the paper is the one you see here. It is about the BFT consensus library for Fabric, and the initial contributors to this library are the ones you see listed here. You might wonder why I'm even showing you these GitHub commit statistics. Well, I'm the one presenting, so I want to give credit to the other people who worked with me on this. Obviously I'm not the only one who worked on it; I led the development, but it is a team effort. I also want to mention that this project is, in its essence, an engineering project. We didn't have some kind of groundbreaking theoretical breakthrough; it's mostly engineering. And in engineering, in my opinion, if you didn't write any code, you didn't actually make a big contribution. So that's also why I want to show that everyone here actually contributed to this library.

So the SmartBFT library that we developed is a BFT library that was designed from the ground up with Fabric in mind. What do I mean by that?
It is embedded into the Fabric ordering service process, as goroutines, as an integral part of the actual code base. If the ordering service node runs multiple channels, you will have multiple instances of this library. And the library uses all the utilities that the Fabric ordering service node can provide to it.

In terms of tradeoffs: you know that in engineering there are always lots of tradeoffs. We chose to sacrifice throughput in order to get protocol simplicity, ease of administration, and decent latency. What do I mean by protocol simplicity and ease of administration? These things are not really well defined: throughput is well defined, latency is well defined, but it's not clear what protocol simplicity or ease of administration means. So let me explain. There is a question here in the chat: "Is SmartBFT built using the Go language?" Indeed, yes, SmartBFT is built using Go.

So what do I mean by ease of administration? The SmartBFT library preserves the semantics of Fabric. For example, configuring a SmartBFT orderer is almost the same as doing it for Raft, and as I said before, Raft is very easy to maintain and manage. It's also very easy to do dynamic reconfiguration. Dynamic reconfiguration is something that is considered complex in the consensus literature, but in SmartBFT it is really easy to do, and I will explain why in a few slides. Reconfiguration means node addition and removal, in case you wondered. Also, there is an iron rule in Fabric that configuration transactions must end up in their own blocks: if a block contains a configuration transaction, this is the only transaction in the block, and there is a ton of code in Fabric that depends on this fact. This library honors that fact despite being application agnostic. The library doesn't actually understand that there is a Fabric on top of it that uses it, but we built the API in a way that allows Fabric to express its own semantics and enforce them on the library. And lastly, blocks only contain transactions that Fabric's rules actually approve. So unlike the BFT-SMaRt-based ordering service I showed before, an invalid transaction will never actually get into a block.

So what do I mean by easy to integrate and use? I want to compare with the Raft library. As you know, there is a Raft ordering service node, and as someone who was involved in the integration of the Raft library into Fabric, it was really not easy, because the Raft library API works somewhat like this: you ask the Raft instance, hey, give me the next command; the Raft instance gives you the next command; and then you need to process this command and actually do what the Raft library tells you to do. The only problem is that you need to understand Raft in order to know how to process the command. And when I say understand Raft: there are a ton of fields that you need to comprehend, to understand what they mean, and to handle properly when you process the commands from Raft. So this is a nightmare to integrate, not only into Fabric but into any other system, unless you are very familiar with Raft.
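To illustrate what "processing commands from Raft" means, here is a condensed sketch of the driving loop that etcd/raft expects the application to run. This is written from memory and simplified, not verbatim library code (the import path also varies across etcd versions), and saveToWAL, send and apply are placeholders the integrator must supply:

```go
package raftdemo

import (
	"time"

	"go.etcd.io/etcd/raft/v3"
	"go.etcd.io/etcd/raft/v3/raftpb"
)

// Placeholder persistence, transport and apply hooks. In a real
// integration, each of these hides a lot of Raft-specific logic.
func saveToWAL(st raftpb.HardState, ents []raftpb.Entry) {}
func send(msgs []raftpb.Message)                         {}
func apply(ent raftpb.Entry)                             {}

// runRaftLoop drives an etcd/raft node the way an application must.
func runRaftLoop(node raft.Node) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			node.Tick() // advances Raft's internal election/heartbeat timers
		case rd := <-node.Ready():
			// Order matters: state must be durable before messages
			// that refer to it are sent out.
			saveToWAL(rd.HardState, rd.Entries)
			send(rd.Messages)
			for _, ent := range rd.CommittedEntries {
				apply(ent) // includes decoding raftpb.EntryConfChange entries
			}
			node.Advance() // tells Raft this Ready batch was processed
		}
	}
}
```

Every placeholder hides Raft-specific knowledge, such as what must be persisted before what, and how configuration-change entries are applied, which is exactly the integration burden just described.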
The SmartBFT approach we took is different. First of all, the SmartBFT library identifies the primitives that a BFT protocol needs, such as communication, cryptographic operations (signatures, for example), and state replication. These primitives are abstracted into interfaces that are called dependencies. And from that point on, the job of the person who wants to integrate the SmartBFT library into their application is done. Why? Because the entire lifecycle is managed by the library itself. You don't actually need to understand how SmartBFT works in order to integrate it as a consensus engine. If we take a closer look, we have three layers. The topmost layer is the library itself, with the components which I will explain in a few slides. The middle layer is the dependency layer: these are the interfaces that the application exposes to the library, and the application that uses the library needs to implement them. And the bottommost layer, in the case of Fabric, is the actual components of Fabric that are used to implement these dependencies.

So now I want to switch over to the SmartBFT consensus protocol. Any questions so far? "Is adding and removing orderers still done by updating the consensus section of the channel config block?" Excellent question. The answer is: somewhat, yes. (I see we have some disturbance from the participants; I just muted them. Okay, thanks a lot. Let's continue.) Yes, so the answer is that it is somewhat by updating the consensus section, but there is a small difference, which I will explain in a few slides. Another question: "Are transactions containing smart contracts (chaincode) treated equally?" That's also a good question. They are treated equally in the sense that an honest BFT leader will consider them equally. What I mean is that we don't have any kind of fairness guarantee: you are guaranteed that a transaction will eventually get into a block, but you're not guaranteed when. The SmartBFT library does not provide, for example, front-running protection. If you want to use it to build some kind of stock trading application, or something like an auction, you would need to implement encryption or commitments and the commit-reveal yourself. You cannot rely on the library to ensure that if one transaction is submitted before another, it will get into a block before the other one. But yeah, good question.

So let's go into the actual protocol. The SmartBFT consensus protocol is, I would say, almost identical to the BFT-SMaRt protocol. The actual agreement on blocks works very similarly to PBFT. The only difference is that in PBFT you can agree on several blocks at the same time, which I like to call a pipeline. In SmartBFT, as in BFT-SMaRt, there is no pipeline: you agree each time on a single block. Now you might think: why would you do something like that? It's very slow, right? A pipeline makes your throughput higher, because you can work on several blocks at the same time; you don't need to wait until the agreement on the current block ends before sending the next one. So although it's true that doing only a single block at a time is slow, and therefore throughput is low, it buys you simplicity.
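Before unpacking that simplicity argument, it may help to see what the dependency idea from the three-layer picture looks like in code. The sketch below is illustrative only: the names and signatures are simplified for exposition, and the authoritative definitions live in the api package of the SmartBFT-Go/consensus repository:

```go
package consensusapi

// Comm abstracts point-to-point communication between consensus nodes.
// Fabric implements it on top of its cluster communication layer.
type Comm interface {
	SendConsensus(targetID uint64, msg []byte)       // protocol messages
	SendTransaction(targetID uint64, request []byte) // request forwarding
	Nodes() []uint64
}

// Assembler lets the application turn an ordered batch of raw requests
// into its own notion of a proposal -- in Fabric's case, a block.
type Assembler interface {
	AssembleProposal(metadata []byte, requests [][]byte) (proposal []byte)
}

// Signer and Verifier abstract the cryptography; Fabric backs them
// with its MSP layer.
type Signer interface {
	Sign(message []byte) []byte
}
type Verifier interface {
	VerifySignature(signerID uint64, message, signature []byte) error
	VerifyRequest(request []byte) error
}

// Synchronizer is invoked when the node detects it has fallen behind;
// it replicates missing blocks underneath the library and reports the
// latest sequence it reached.
type Synchronizer interface {
	Sync() (latestSequence uint64)
}

// Application receives the final, signed decisions for commitment.
type Application interface {
	Deliver(proposal []byte, signatures [][]byte)
}
```

The point of this shape is that the integrator implements plain interfaces and never has to drive the protocol's state machine directly.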
Why does a single block at a time buy simplicity? Because if you only have a single block at a time, you know that the state of your entire protocol can always be contained in a single block. So in SmartBFT we actually encode the state of the protocol inside the Fabric blocks, and we ride on top of the Fabric blocks in order to preserve our state. When a Fabric block is committed to the ledger, so is the state of the entire protocol. There is a corner case, related to view changes, which I'm not going to talk about, but almost always the entire state is encoded in a single block. This also means that it is very easy to do reconfiguration. Why? Because once you submit a reconfiguration transaction, you know that there are no other transactions that may invalidate it, or be affected by it, being agreed upon in parallel.

The next thing I want to mention is censorship resistance. A proper BFT protocol should be able to detect whether the leader is censoring transactions. Imagine you have a client, and the client sends its transaction to all nodes. Suppose the leader is replica zero, and replica zero is malicious: for some reason it doesn't like this client and doesn't want to order its transaction. The way this is handled, both in BFT-SMaRt and in SmartBFT, is that once the transaction reaches a replica, a BFT node, the replica starts a timer. When the first timer expires, the replica sends the transaction to the leader node. Why do we do that? Because it might be that the client did not send the transaction to the leader replica, or maybe it tried to send it but failed. So we want to make sure that the transaction actually reached the leader replica. After a second timeout, we say: okay, this leader is probably malicious, it is censoring this client. And then we start another protocol, which is called a view change. By the way, the BFT-SMaRt paper uses different terms; it's not called a view change there. In our library we adopt the official PBFT terms, so we have views, and the names of the messages we send are also very similar to the PBFT paper.

So what is a view change? In a view change, the nodes agree on the next leader. The way it is done is that the nodes first all broadcast: hey, we want to do a view change. Then every node that receives a quorum of such view-change messages (sometimes it's a quorum, sometimes it's less than a quorum, like f plus one; it depends on the variant of BFT, let's not get into it) sends its last known block to the next leader, in what is called a view-data message. The next leader aggregates all these last known blocks and sends a message called a new-view message, which effectively contains all these view-data messages. And this is the message that makes all the other nodes switch to the new leader, and then the regular BFT protocol continues.

So how does this all work in the library? Let's follow a simple flow. We have a transaction, which in the library is called a request, not a transaction. A request is received by a controller, and the controller, as you can guess from its name, orchestrates the entire transaction flow. The controller enqueues this transaction into a request memory pool. Then, at some point, the controller reaches out to a batcher component and asks for the next batch.
The batcher component turns to the memory pool and says: hey, do you have any transactions for me? The memory pool returns the transactions to the batcher, and then the controller has a batch in its hands. However, it doesn't actually know how to create a Fabric block. Why? Because the SmartBFT library is application agnostic; it doesn't know what Fabric is. This is where we actually use the dependencies. The controller, the SmartBFT library, reaches into the assembler, which is a dependency, and the thing that implements the assembler is Fabric itself. It says: hey, here is a batch of transactions, can you construct something for me? Fabric, disguised as an assembler, creates a real Fabric block, without the signatures, and returns it to the controller. Then the controller proposes this block, which is a batch, to a view component. The view is the component that actually runs the BFT agreement protocol. It reaches into the communication dependency, which is Fabric again, and asks it to broadcast the messages. It might also reach into the verifier dependency; the verifier dependency, as implemented by Fabric, will for example check signatures of nodes or signatures of transactions. Again the view reaches into the communication dependency; also, sometimes we need to sign some message — I will get to that a little later — so we reach into the abstracted signer dependency, which is implemented by the MSP layer in Fabric, and so on and so forth. Eventually the view component has in its hands a batch of transactions, which is a block, and signatures on the block. It gives this back to the controller, and the controller gives it back to the application; in the case of Fabric, this is how signed Fabric blocks are written into the Fabric ledger. In some cases the controller might decide to abort the view, or, if the controller realizes it is behind — some other nodes have a higher proposal number than it does — it may reach into the synchronizer dependency, which again is implemented inside Fabric. The synchronizer basically syncs the ledger underneath the library; while this is happening, the library is not doing anything, and then the synchronizer returns with the latest sequence. There is also a view-changer component — I will talk about it a bit later — which is the component I mentioned before that orchestrates the entire view change.

Okay. Now, what I have explained to you until now about the SmartBFT library is very much aligned with the scientific literature; it follows the BFT-SMaRt protocol. However, there are some small differences. First of all, in the literature, if a leader crashes, how do you detect it? Well, you detect it when there is some transaction and you come to suspect the leader is censoring it. But what if no transactions actually happen? Maybe it's night, and at night, I don't know, the clients are sleeping; there are no transactions. We obviously don't want to wait until people come in the morning, try to buy their coffee, and only then the BFT nodes realize: oh wait, the leader is actually offline, it has been offline the entire night, and now we need to do the view change, which takes time. Can we do better? In SmartBFT, the leader actually does something very similar to what is done in Raft.
The leader sends heartbeats in a periodic manner, and if the follower nodes, the nodes that are not the leader, see that the leader has not sent a heartbeat for some time, they start a view change. The motivation for this is that we want to detect leader crashes proactively instead of reactively.

Something extremely important is signature piggybacking. In Fabric, as I mentioned before, blocks need to be signed by ordering nodes. Why? Because a block in Fabric might be transferred between different nodes: between two ordering service nodes, or between an ordering service node and a peer node, because peers pull blocks from the ordering service nodes, and they need to verify that these are actually the right blocks and that no other block was substituted. In a Byzantine environment, a malicious node might try to forge these blocks. To prevent that, we have the nodes sign the blocks, and each block carries signatures from a quorum of ordering service nodes. How do we actually do that? During the agreement on the block, in the last message of consensus, the commit message, each node attaches its own signature on the block, and the commit step involves a broadcast, so each node's signature is broadcast to all the other nodes.

However, let me ask you: is this really a good idea? Because what we're doing here is sending a signature on the block before we actually finish the consensus round. When does the consensus round finish? When can we actually commit this block to disk? We can only commit the block to the file system once we, as an ordering node, receive a quorum of commit messages. But I just said that we sign the block when we send the commit message, not when we receive a quorum of commit messages. So why is it safe? The reason it is safe is that once we start the commit phase, we know we have observed a quorum of prepares. So we know that, even in a case of a crash, this block is going to be the block for that sequence, in the sense that there cannot be a different block for that sequence, even if something bad happens. That's why it is still safe to sign the block and to piggyback the block signature on the commit message.

Now, note that we are collecting a quorum of signatures, which is around 2f plus one (it depends on what f is, but usually it's 2f plus one). So we collect 2f plus one signatures on a block. Had we done the signature collection after the consensus phase, after the block had been persisted to the ledger, it would be sufficient to collect only f plus one signatures; we would not need 2f plus one. So it's a trade-off, in some sense.

Any questions so far? "Before you dive into how this library is integrated into Fabric, can you go through performance?" In terms of performance — can you remind me at the end of this talk? I have some backup slides. We did a performance evaluation when we submitted the paper, and the paper was published, so the performance evaluation can be seen in the paper.
However, the performance evaluation was conducted on a much more, I would say, ancient version of this library, so I will touch upon that in a few slides. But briefly: when we ordered actual, real Fabric transactions, the throughput in a local area network was around two and a half thousand, 2,500 transactions per second. In a wide area network deployed across the globe, it was around 1,000 transactions per second. So it's not much. However, as I will say in a few slides, there is work being done to improve this performance. And I would also say that the Fabric peer itself is slow anyway. Even if SmartBFT did far more than 1,000 transactions per second, it would be meaningless, because then the bottleneck would be the peer: the peer can only do around 2,000 transactions per second with GoLevelDB, maybe 3,000; with CouchDB it's much less. It doesn't make much sense to have a very fast consensus if the actual transaction processing layer is slow anyway.

Someone asks: "Does SmartBFT obsolete, to some extent, both implementations, Raft and Kafka?" Okay, so Kafka is already obsolete — I will talk about this in a few slides, but in the latest version of Fabric, Fabric 3, which we released a preview of on the 1st of September, Kafka is no more; you cannot run Kafka anymore. I will not say that SmartBFT obsoletes Raft, because Raft has a much greater throughput than SmartBFT: Raft has a pipeline, it can do many batches at the same time. So if you want to run in a centralized setting, if you trust all your parties not to be hacked and not to do malicious things, you should use Raft. If you want decentralization and Byzantine fault tolerance, you can use SmartBFT.

Another question: "Is changing the number of message passing, f plus one versus 2f plus one, a new development?" No. First, it's not the number of messages passed; it is the number of signatures you need to collect in order to certify a block. And it's not a new development; it simply stems from the correctness argument of the protocol. If you collect the signatures during consensus, as we do in SmartBFT, then you need 2f plus one. If you collect the signatures after consensus, after the dust settles and the block is in the ledger — which also means after you yourself, as a node, have assembled 2f plus one commit messages — then you only need f plus one signatures. So it's a property of when you do it; it's not a number of messages, it's a number of signatures.

"Is there a way to migrate from the older Kafka to SmartBFT?" The way would be to first migrate from Kafka to Raft, and then from Raft to SmartBFT. "How can we verify that the SmartBFT implementation is consistent with the theory?" Good question. You can read the paper and convince yourself. But as I said before, the actual protocol of SmartBFT follows the BFT-SMaRt protocol almost identically, and the BFT-SMaRt protocol has been peer reviewed by theorists in academia for many years. So the protocol itself should be safe. Of course, you can always have implementation bugs.

Okay, no more questions, so I will now continue to the integration into Hyperledger Fabric. Initially, the integration was released into a fork of Fabric, not into the official Fabric.
As you can see here, it was in a fork under the SmartBFT-Go organization. Then integration into the official Fabric began in early 2022, and lots of bug fixing and testing was done during this time, so to some extent we benefited from it being deployed in some environments. We finished the integration of SmartBFT into the official Fabric on the 1st of September, which is, not by coincidence, the Byzantine calendar new year. The main RFC that explains this is the one you see here, and I also put up the GitHub profiles of the people involved in this process — again, to give credit; this is a team effort after all. In terms of RFCs, there are three merged RFCs related to BFT. The implementation of the third RFC has been completed; the implementation of the top two has not, and I will discuss that a bit later.

Block validation policy. As I said before, when Fabric runs in BFT mode, blocks are accompanied by a quorum of signatures from orderers. How is this done in Raft? Since Raft is CFT, you only need a single signature. And the question is: how do you verify a block signature? In Fabric, we have something called the block validation policy. In Raft, the block validation policy is the following — it's a technical detail, it's called an implicit meta-policy, not important — and the rule here means "any writers," which basically translates to: any orderer node from an orderer organization can sign any block, and it is sufficient to verify only a single signature from any ordering service node. Obviously, for BFT this is not sufficient. However, when we integrated the BFT library into Fabric, we also wanted to pave the way for other consensus protocols. As I said before, SmartBFT assumes we collect 2f plus one signatures, but maybe other consensus protocols would collect a different number. Therefore, we wanted to be able to convey, in some manner, a block validation policy that can fit many types of consensus protocols. So the question is: how can we express this policy in a consensus-agnostic way, so that you don't need to understand the consensus protocol in order to know how to verify a block?

The solution is that we encode the policy as a signature policy. In Fabric, we have something called a signature policy, and when we create a signature policy for BFT, you can think of it as the tree-like graph on the right. At the root of the graph you have a threshold, in this case three, which determines how many sub-policies need to be satisfied in order for the top policy to be satisfied. A signature policy is recursive: each policy can recursively have child policies, and so on. In the case of SmartBFT, we put a threshold of 2f plus one, a quorum, at the root, and the direct descendants are policies such that each one is satisfied by a single signature from a specific node. For example, in the policy you see here, we have four nodes in total; the quorum in the case of four nodes is three, so we only need three signatures to satisfy the block validation policy.
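As a mental model — using an illustrative structure, not Fabric's actual protobuf policy types — the tree just described could be written like this:

```go
package policydemo

// Policy is a simplified stand-in for Fabric's recursive signature
// policy: either a threshold over child policies, or a leaf that a
// specific identity's signature satisfies.
type Policy struct {
	Threshold int      // how many children must be satisfied
	Children  []Policy // sub-policies (empty for a leaf)
	SignedBy  string   // leaf case: identity that must have signed
}

// quorumPolicy builds the block validation policy from the slide: a
// root threshold of ceil((n+f+1)/2) over n leaves, each satisfied by
// one specific orderer's signature.
func quorumPolicy(orderers []string) Policy {
	n := len(orderers)
	f := (n - 1) / 3
	quorum := (n + f + 2) / 2 // integer form of ceil((n+f+1)/2)
	leaves := make([]Policy, 0, n)
	for _, id := range orderers {
		leaves = append(leaves, Policy{Threshold: 1, SignedBy: id})
	}
	return Policy{Threshold: quorum, Children: leaves}
}
```

With four orderer identities this yields a root threshold of three, matching the example on the slide.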
Now, how do we encode this thing? When we create the genesis block, which is the first block of the blockchain, we encode this policy into the genesis block. Then, if we want to add or remove nodes, we issue config transactions, as we always do in Fabric. And we made it so that the SmartBFT orderer itself validates the config transaction: it looks at the policy, and if the policy doesn't make sense, it says, hey, this policy doesn't make sense, I'm going to reject this config update. If the policy makes sense, it accepts it.

Another thing I wanted to talk about is how we encode this policy in a consensus-agnostic manner. In previous versions of Fabric, Fabric 1 and Fabric 2, in order to verify a signature of some consensus algorithm, or in order to know how to bootstrap its communication, you actually needed consensus-specific code. Why? Because all these things — the ID of the node, the host and the port, which are used for communication bootstrapping, and the identity, which is used to know how to verify the node's signature — were encoded in a part of the Fabric configuration block that was opaque, in the sense that it was not part of the actual Fabric framework but consensus-specific. And this creates a problem: if you want Fabric to support several types of consensus protocols, you would need to write code that parses and verifies the format of each and every consensus protocol. We don't want that, because it makes integrating new consensus protocols very hard. So what we did in Fabric version 3 is that this part is no longer opaque; instead, it is a first-class citizen of the configuration block structure. It is now located in another part, which I will show in the next slide. This also means that you configure all the nodes in the general part of the configuration, and the consensus-specific parts only deal with things related to consensus. The nodes are now found in a place in the config called the consenter mapping, and you can bootstrap the communication and the block verification policy without needing consensus-specific code. This is explained in the RFCs.

Lastly, how do we prevent block starvation? I have said several times by now that each block is signed by a quorum of ordering service nodes, so a malicious ordering service node cannot forge a block: to forge a block, it would somehow need to make a majority of ordering service nodes sign something they did not agree to. However, as I said, peers pull blocks from ordering service nodes, and ordering service nodes also pull blocks from other ordering service nodes. So what a malicious node can do is simply avoid sending blocks. It can say: I don't want to send you a block; or: I don't have any block for you — despite the fact that new blocks have been produced. This is block starvation, block withholding, censorship; you can call it many things. The way we solve it in Fabric is that when a node, an ordering service node or a peer node, wants to pull blocks in a BFT setting, it connects to many ordering service nodes. From a single ordering service node it fetches full blocks, and from the rest it fetches block headers and their corresponding signatures.
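A minimal sketch of the detection idea behind this scheme, with hypothetical types (the real Fabric block delivery code is considerably more involved):

```go
package blockpull

// HeaderSource is a hypothetical view of an orderer from which we only
// pull block headers and their signatures, not full blocks.
type HeaderSource interface {
	LatestAttestedHeight() uint64 // height of the newest properly signed header seen
}

// detectWithholding reports whether the node serving us full blocks is
// starving us: delivered is the height it has delivered so far, and
// rest are the header-only sources. If at least f+1 sources attest to
// a greater height, at least one of them is honest, so newer blocks
// exist and our block source is withholding them.
func detectWithholding(delivered uint64, rest []HeaderSource, f int) bool {
	ahead := 0
	for _, s := range rest {
		if s.LatestAttestedHeight() > delivered {
			ahead++
		}
	}
	return ahead >= f+1
}
```

On detection, the puller would simply switch to fetching full blocks from a different orderer.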
This way, you can detect whether new blocks were formed: if new blocks were formed, you will see that new block headers and their corresponding signatures have been sent to you by honest nodes.

Any questions before I go into deployment tips, which is the last part of this presentation? "Now that transactions must be sent to all ordering service nodes, does the peer gateway service support that, to achieve the transaction flow?" Excellent question, and the answer is very short: yes. "Are there any validations that orderers do before signing?" Yes, there are all kinds of validations they do before signing. In every step of the consensus protocol, an ordering service node does all kinds of validations, depending on the step. As an example, the BFT agreement works like this: the leader sends a message that contains a batch of transactions, called the pre-prepare in PBFT. Each follower node then goes through all the transactions in parallel and verifies the signatures, and also verifies that the transactions are well formed. In the case where the batch contains only a single transaction and this transaction is a configuration transaction, the follower node tries to simulate this configuration change in its head. If the configuration change doesn't make sense, the follower says: okay, I have received a configuration change from the leader and I don't think it makes sense; but the leader should have validated it before including this configuration transaction in a batch; therefore the leader is malicious — and then it starts a view change. So the answer is yes, there are many validations done by orderers before signing.

So, deployment tips. First of all, you have two guides here that you can read. The first guide covers, in general, how to manage a SmartBFT orderer; the second covers how to add or remove a BFT orderer. Something very important to mention is that in the configtx.yaml section you have all kinds of configuration parameters for the BFT protocol, as you see here. Everything here can be configured on the fly except the request pool. If you want to change the maximum number of transactions in the request pool — the in-memory pool I mentioned that holds the transactions — you will need to restart the node; unless, of course, we or someone else implements dynamic reconfiguration of the request pool as well.

Something extremely important: you need to be very careful when you do a reconfiguration transaction. Why? Because if you do a reconfiguration and the resulting configuration doesn't make any sense — for example, if you set a request forward timeout of one millisecond, or similarly extremely low timeouts — what will happen is that you will lose consensus; I mean, you will lose a quorum. The nodes will start doing leader changes, and it will never end. And you cannot undo the reconfiguration, because to undo it you would need to send a configuration transaction, but for that transaction to be applied, you need consensus. So be very careful when you actually make these reconfiguration changes. What I recommend is scripting and automating it, so that everything is automated or scripted, to eliminate any possible human error.
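In the spirit of that advice, here is a hedged sketch of the kind of client-side sanity check a deployment script could run before submitting a reconfiguration. The option names are patterned on the SmartBFT timeouts exposed through configtx.yaml, but the thresholds and the check itself are illustrative, not an official tool:

```go
package configcheck

import (
	"fmt"
	"time"
)

// BFTOptions mirrors, illustratively, a few of the SmartBFT tuning knobs.
type BFTOptions struct {
	RequestForwardTimeout  time.Duration
	RequestComplainTimeout time.Duration
	ViewChangeTimeout      time.Duration
}

// sanityCheck rejects obviously dangerous values before they are ever
// submitted as a config transaction -- because once an unworkable
// configuration is committed, undoing it requires the very consensus
// it just broke.
func sanityCheck(o BFTOptions) error {
	const floor = 500 * time.Millisecond // illustrative lower bound
	if o.RequestForwardTimeout < floor ||
		o.RequestComplainTimeout < floor ||
		o.ViewChangeTimeout < floor {
		return fmt.Errorf("a timeout is below %v; refusing to submit", floor)
	}
	if o.RequestComplainTimeout < o.RequestForwardTimeout {
		return fmt.Errorf("complain timeout must not be shorter than forward timeout")
	}
	return nil
}
```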
Another thing you might notice is that nowhere here do we mention f — f being the assumption of how many nodes can fail. The reason is that the whole idea behind SmartBFT is to make it as easy as possible for a person to use, so f is computed implicitly from n. If you have 10 nodes in your deployment, then f is three; if you have nine nodes, then f is two (there is a small sketch of this arithmetic below, after the status update).

There's a question: "In Raft there was a limitation of adding orderers one at a time. What is the limitation for BFT, besides losing quorum?" Good question. That limitation no longer exists in — sorry, I got confused for a second with some other limitation. Let me put it this way: you should only do one config change at a time, not because of the Raft limitation, but simply because you want to take it slow. When you add a new node — actually, I'm looking at the next slide — you first need to give it the config block through the channel participation API. You make sure that this node replicates all the blocks, and only afterwards do you actually add it to the channel, meaning to the configtx. Why? Because if you do it in the opposite order — if you first add it to the channel and only then give it the config block — imagine you are running a blockchain that has two million, I don't know, ten million blocks in it. It will take a while for this node to catch up, and during the time it catches up, it cannot participate in BFT. Essentially, you now have one extra node, but this extra node doesn't add resiliency or availability, because it cannot replace a failed node. That is why I recommend doing it one node at a time. There was a problem in Raft: the way we did config changes in Raft was not atomic, because Raft would write the Fabric block and then issue a Raft reconfiguration, so it was not atomic. In BFT, it is atomic. So the problem I think you mentioned doesn't exist in BFT; but nevertheless, please only do it one node at a time. Don't try to add two nodes one after another; it might end badly, that's all. Theoretically it's possible — just don't try it.

The last thing I want to cover is the current status of this whole effort. SmartBFT is part of the Fabric v3.0 preview. It is not GA yet, so officially you should not use it in production, but in my opinion it is stable enough that you can start testing it and playing with it. We have public samples you can play with: we have the test network, and the test network in Bash; you can play with them. I tried — it actually works. Things left that are still in progress: first of all, the block starvation prevention mechanism I mentioned before is still in progress. Actually, most of the code is already there; it's waiting for code review, and I expect it to be merged pretty soon. Other things we're looking at are performance enhancements. For example, the current batching mechanism in SmartBFT is very inefficient; we have a new implementation of the batching mechanism, we just need to finish integrating it.
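Here is the small sketch of the implicit f-and-quorum arithmetic promised above — the operator specifies n, never f; the quorum formula is the standard ceil((n+f+1)/2) from the BFT literature:

```go
package quorum

// maxFaulty returns the largest f such that n >= 3f+1 still holds.
func maxFaulty(n int) int { return (n - 1) / 3 }

// quorumSize returns ceil((n+f+1)/2), the number of matching messages
// (and, with signature piggybacking, block signatures) a node collects.
func quorumSize(n int) int {
	f := maxFaulty(n)
	return (n + f + 2) / 2
}

// Examples from the talk: maxFaulty(10) == 3, maxFaulty(9) == 2, and
// quorumSize(4) == 3 -- three signatures out of four nodes.
```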
Of course, for this to be production-ready, we need better test coverage, and we need to do some more comprehensive system testing.

There is a question here: "Does BFT also support data recovery and restoration, and are there any limitations associated with it?" I'm not sure what you mean by data recovery, but let's put it this way — it's a good question. In BFT, specifically in SmartBFT (I can only speak about this specific implementation), the content of the file system should be identical across all nodes. What is this content? As I said before, the state of the protocol resides inside the Fabric blocks: each Fabric block contains a snapshot of the state, except in some cases. When you have a view change — when you want to change leaders — we write this view change into the write-ahead log. Why? Well, obviously, if we're doing a view change, there is no consensus, because the leader is dead. So we write this into the write-ahead log, but this write is deterministic, so it should be uniform across all the nodes. So theoretically, if you take the write-ahead log and the Fabric ledger, you can use them to restore from a backup. I hope this answers your question. "Data recovery, like if the blockchain and the ordering service go down at the same time because of a server failure." Yeah, okay, good question. You can, if you really want, shut down all the nodes at the same time. It should be supported, because we actually write the decisions into the write-ahead log. In my opinion, this is part of the system testing that anyone should do before going to production. It's a good question.

Any other questions? Because I have finished on my end. "Any considerations on orderer leader election getting stuck?" Yeah — oh, actually, that's a good point; I first wanted to show this performance analysis that we did long ago when we submitted the paper. In a local area network, the throughput we got is around two and a half thousand transactions per second; in a wide area network deployed across the globe, we got around 1,000. Now, regarding the question about leader election getting stuck: leader election in SmartBFT is round-robin based. The way it works is that you try to move to the next leader, and if you fail to do that within a timeout, you try to move to the leader after that. So it can get stuck, maybe, if you have no connectivity at all between the nodes: if the network is completely down, you will have view change after view change after view change. Eventually the network should stabilize, and then you will be able to elect some leader in some future view.

Another question: "Can you run separate consenter sets, using Raft and SmartBFT for different channels?" That's a good question. I can tell you that no, but the reason is not the one you expect. It's actually mentioned in the RFCs: the communication framework used by Raft is currently different from the communication framework used by SmartBFT, and I think in the code we only initialize one of them. So you won't be able to run one channel with Raft and another channel with BFT. But, you know, if it's important enough, if you have an important use case — this is an open source project, so I invite you to contribute.
I didn't look too deeply into the code, but the way I remember it, it is not that far-fetched to implement something like that. If you really want it, I invite you to contribute to the upstream. This is open source, right? The way open source works is that if you have a use case, you are free to contribute your patch, your pull request, and if it's good enough, other people can benefit from it.

"Is there a limit to the number of nodes you can run in SmartBFT?" I don't think there is a theoretical limit, but there is a practical limit. The thing is that in single-leader protocols like SmartBFT, like PBFT, the leader node does the heavy lifting in terms of broadcasting the transactions. Imagine you have 100 nodes and you're trying to broadcast a block of, suppose, 10 megabytes — it's a lot, but let's say the block is 10 megabytes and you have 100 nodes. The leader would need to push one gigabyte per block; at one block per second, that's a gigabyte per second, which is like eight gigabits per second. It's a lot. That is why, empirically, you cannot run too many nodes with SmartBFT. However, I can say that in my group we are working on more scalable consensus protocols, where the network load is more evenly distributed between the nodes; these protocols are, by their nature, more scalable, so you can run more nodes.

Any other questions? Okay, thanks. "I'm going to ask about the mechanisms to prevent front running, and whether there were any use cases that you stumbled upon — were you able to solve that issue?" Good question. I would say two things. First of all, SmartBFT itself does not provide any front-running protection; the reason is simply that we didn't have the need to implement it. From what I know from the literature, front-running prevention is usually done in one of a few ways. One is some kind of threshold encryption scheme: if you're a client, you encrypt your transaction under a threshold public key, you send the transaction in encrypted form, and then the nodes run a threshold decryption protocol — yes, one way of implementing it is under a computational assumption. Another way is to use secret sharing: you can encrypt your transaction with a symmetric encryption key, secret-share the symmetric key between the nodes, and then they reconstruct the key and decrypt. That's one family of approaches to front running. Another way is a commit-reveal scheme. So there are all kinds of ways; we just didn't have a use case for that. Again, this project is open source: you can fork it, implement it, and add the implementation, and if the community sees that your implementation is robust enough, good enough, maybe you will get it merged. "What about something like smart order book routing?" Smart order what? "Smart routing for order books." Smart routing for...? "Order books, like for an exchange."
"Like when you have a market-making mechanism, and you have the market maker, and you have web sockets pulling all the pricing data from the different exchanges, and then the market maker that's ultimately making the decisions, and then the front-running mechanism, which would be able to have the proper oracles to understand, like, if it's bad, what to do." That makes sense — no, I understand. As I said, we don't have any kind of front-running protection, and it's also not on the roadmap. As I mentioned, if someone wants to fork the project, incorporate front-running protection, and then contribute it upstream, that's possible. I'm not against it; it's just not implemented.

There is another question: "Any thoughts on parallelizing leaders in SmartBFT, like in Mir-BFT?" That's a good question. I can say the following: we are currently working on a new BFT protocol that also has something like this. However, parallelizing leaders — in general, parallelizing block creation — sounds very good in theory, but implementing it correctly, with dynamic configuration changes, is not trivial. The goal of this SmartBFT project was to be used in production, in an actual production system; therefore, as I said, it is an engineering project, and the focus here is engineering stability and correctness. Of course, you could eventually also have a production-ready Mir or Mir-like protocol — I'm not saying you can't; it's just a bigger effort.

Any other questions? "Can you share the slides?" Yes, I will send the slides to Dave after this talk ends and he will share them. I am not flying the Pegasus though. Okay, I think that's enough. If anyone has any additional question or wants to contact me, just google my name; you can reach me on GitHub or LinkedIn. I'm also sometimes on Discord — my name there is Yakov M, so you can find me there too. Dave, are you there? "Sorry, Yakov, I've been on another call; I haven't been able to listen to the last part. Are we all set?" Yeah, I mean, we're done. "Okay, great. Well, thanks for your time today, and thanks everyone. I will send the link to the recording and the slides to everybody, so look for that in your email. Thanks." Sure, thanks for having me. Bye-bye. Bye, everyone.