This is normally addressed by databases, and luckily there's a lot of innovation and research in this field. The main problem these innovations try to solve is how to add, modify, and remove information from a database while keeping it in a consistent state, and doing so at high performance, which normally means in parallel. Modern databases use a lot of tricks to achieve this: multi-version concurrency control, write-ahead logging, shadow paging. These are all clever techniques to ensure consistency while staying very performant.

Now, I would love to talk about these things, but the blockchain is different. We can forget about all that. To see why, let's look at what type of information our software needs to store, retrieve, and query. We have blockchain facts such as: this hash is the hash of this transaction; this transaction is valid under a certain rule set; this hash is the hash of a block header; et cetera. These facts have something interesting in common: once they are established, they are immutable and indisputable. At some point they may no longer be interesting to the software, but they can never change.

This is very good news for our database architecture, because it makes parallelization with concurrent readers and concurrent writers really easy. We just have a bucket of facts that we add to. We don't have to worry about these facts being consistent with each other. We don't have to worry about the order in which the facts come in. We don't even have to worry about two processes trying to write the same key at the same time, because we know that if they do, they will also be writing the same value.
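To make this concrete, here is a minimal sketch of such a bucket of facts in Rust, the language BitCrust is written in. The names (`FactStore`, `insert`) are illustrative rather than BitCrust's actual API, and a coarse mutex stands in for the lock-free concurrent hash table a real store would use:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// An append-only store of immutable facts, keyed by hash.
/// Entries are only ever added, never modified or removed.
struct FactStore {
    facts: Mutex<HashMap<[u8; 32], Vec<u8>>>,
}

impl FactStore {
    fn new() -> Self {
        FactStore { facts: Mutex::new(HashMap::new()) }
    }

    /// Writers never conflict: two processes writing the same key are,
    /// by construction, writing the same immutable value, so it does
    /// not matter which one wins.
    fn insert(&self, key: [u8; 32], value: Vec<u8>) {
        self.facts.lock().unwrap().entry(key).or_insert(value);
    }

    fn get(&self, key: &[u8; 32]) -> Option<Vec<u8>> {
        self.facts.lock().unwrap().get(key).cloned()
    }
}
```

Because a key can only ever map to one value, `or_insert` makes racing writers harmless; no ordering or consistency machinery is needed.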
Is this common knowledge? Well, yes and no. Yes, in that it is well known that immutable data is much easier to parallelize and to make concurrent. But no, in that current implementations don't utilize this. In fact, they introduce a lot of mutable state: big mutable objects that need to be updated in a consistent way.

It starts with the notion of the blockchain itself. A block has a parent, but two blocks can have the same parent if they're racing to become the longest chain, so the structure programmers actually work with is a tree, not a chain. It may be an oddly shaped tree, because the side branches are constantly being cut off, but it's still a block tree, not a blockchain. What do implementations do? They ignore this. They maintain a main chain and keep the side branches separately, and whenever a side branch becomes the longest, a complicated process called a reorg takes place: the blocks on the main chain have to be undone, and the blocks of the side branch have to be reapplied to become the new main chain. This essentially turns the entire main chain into one big mutable object which must be updated in a consistent way, and that destroys any chance of parallelization. A reorg procedure cannot be parallelized.

Related to this is the notion of a UTXO index: an index over all transaction outputs that are currently unspent. In itself it's not a bad idea to index this, because it's what you frequently need. But the UTXO index maintains, at one specific spot in the blockchain, exactly which transaction outputs are unspent at that spot. That means the entire index has to be updated in a consistent way, again destroying any chance of parallelization.

So I think we can do better. I think we can utilize this immutability instead of fighting it. As we will see, we will have to introduce some mutable data, but the key to writing high-performance, scalable software is to minimize the amount of mutable state.

Let's start by taking our facts and putting them into key-value stores. We have one store that holds transactions by hash, and for each transaction its validity state: which rule set it is valid under. We have one store that holds headers and, for each header, again its validity state. And we have one store that holds the contents of blocks, that is, which transactions are in each block. For this last store we use an interesting format: we store each transaction as an index into our transaction database, and for each input of a transaction we store an index to the output it spends. This is going to be useful, as I'll explain in a moment.

First, though, let me note that while these are all key-value stores, we certainly don't want to use general-purpose key-value stores for them. Stores like LevelDB or RocksDB are designed specifically for mutability, and besides, they are designed to handle any type of data, from a few gigantic objects to a billion tiny ones. For our purpose, we know exactly what our access pattern is, and we know the data is immutable. We also know exactly the type of data that's going to go in; in fact, for a large part we know exactly which data is going to go in. That is a rare luxury when designing storage, because normally you have no clue what's coming in when you write something generic. So we can use a much simpler approach than these stores do: a simple store with one large concurrent hash table as the root, and append-only data. It's a very simple design, but it will outperform any generic key-value store.

So if we store things this way, what can we query? What kind of questions can we ask our storage? We can obviously ask for transactions or for validity states, but we can also answer questions like: is this transaction output currently unspent? To do so, we load the block contents at the current tip and start scanning backwards, looking for the output we're trying to spend. If we find another input spending it, we know we have a double spend. If we find the transaction that contains the output, we know it's currently unspent. We just load a block, and if the output isn't referenced there, we load the previous block and keep scanning backwards. That way we can check whether an output is currently unspent, spent, or doesn't exist at all. Scanning blocks like this may seem slow, but we store them in a very compact form, just 64-bit indexes. Scanning 64-bit values sequentially is something computers are really good at: it's just streaming through cache lines, and we can do it for every input in parallel.
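Here is a sketch of that backward scan, assuming a block-content layout like the one just described. The `Record` enum and its names are illustrative; the real format would pack these as plain 64-bit values rather than a Rust enum:

```rust
/// One entry in a block's content list: either a transaction, or an
/// input spending a specific output of an earlier transaction.
#[derive(Clone, Copy)]
enum Record {
    Transaction(u64),               // index into the transaction store
    Spend { tx: u64, output: u32 }, // spends output `output` of tx `tx`
}

enum SpendStatus {
    Unspent,     // we reached the creating transaction first
    DoubleSpend, // another input already spends this output
    Unknown,     // not seen in the scanned range of blocks
}

/// Scan block contents backwards from the tip. `blocks` is ordered
/// tip-first, and each block lists its records oldest-first, so we
/// iterate each block in reverse. This is a sequential scan over
/// 64-bit values, cache-friendly, and can run for every input of an
/// incoming block in parallel.
fn check_unspent(blocks: &[Vec<Record>], tx: u64, output: u32) -> SpendStatus {
    for block in blocks {
        for rec in block.iter().rev() {
            match *rec {
                Record::Spend { tx: t, output: o } if t == tx && o == output => {
                    return SpendStatus::DoubleSpend;
                }
                Record::Transaction(t) if t == tx => {
                    return SpendStatus::Unspent;
                }
                _ => {}
            }
        }
    }
    SpendStatus::Unknown
}
```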
So when a block comes in, we can ask, for all the inputs in the block: are the outputs being spent here currently unspent? Is this transaction valid at this point? It looks like we already have the basis of a store that can handle all the queries our software needs.

But as promised, we do need to introduce some mutable data, and the first piece is rather crucial, because I just said we need to find the current tip. The current tip is the block header with the most accumulated work. We could find it in our store, but only by scanning all our headers. So instead we maintain, consistently at all times, one index pointer into our header table that represents the current tip, and we update it with each block. Note that this one mutable variable also serves as a complete replacement for the reorg. Instead of having a chain and having to undo and reapply all these blocks, the only thing we have to do when a side chain becomes the longest is move the current tip. That's all.

So now we have a fast implementation of a store. It could be used as is, but it's not yet fast enough, and that has to do with the scanning. I said the scanning is really fast, and this is true. Most transactions spend recent outputs; that's just how it works in practice. But some transactions spend an output that has been sitting in the chain for many years, and in the scanning model that would require scanning hundreds of thousands of blocks backwards to see whether the output is unspent at this point. We have to speed that up, and for that we introduce another index, another mutable data structure: the spend index. This is a very small, bitwise index over all transactions and outputs, which maintains, at one point in time, exactly which transactions exist and which outputs are spent or unspent. The trick is that instead of maintaining this spend index at the current tip, we let it trail a few blocks behind. We first scan blocks backwards, and once we reach the spend index, we do a single lookup. The benefit of letting it trail is that we don't lose any of our concurrency: we can still have fully concurrent writers and fully concurrent readers, as long as we ensure that we only read from the spend index for blocks that have already been completely written. So we don't lose any concurrency with this mutable structure.
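Continuing the sketch from above, a trailing spend index could look roughly like this; the bitmap layout and names are again assumptions for illustration:

```rust
/// One bit per transaction output; a set bit means "spent". The bitmap
/// is complete up to `trail_tip`, a block a few blocks behind the
/// current tip, and is only ever advanced over fully written blocks.
struct SpendIndex {
    bits: Vec<u64>,
    trail_tip: u64,
}

impl SpendIndex {
    fn is_spent(&self, output_index: u64) -> bool {
        let word = (output_index / 64) as usize;
        let bit = output_index % 64;
        word < self.bits.len() && (self.bits[word] >> bit) & 1 != 0
    }
}

/// First scan the few blocks past the trailing index (usually only a
/// handful, since most spends are recent), then fall back to a single
/// bitmap lookup. Readers only consult the bitmap for blocks that are
/// already completely written, so writers and readers stay fully
/// concurrent.
fn is_unspent(
    recent_blocks: &[Vec<Record>], // blocks between the tip and trail_tip
    index: &SpendIndex,
    tx: u64,
    output: u32,
    output_index: u64, // global position of this output in the bitmap
) -> bool {
    match check_unspent(recent_blocks, tx, output) {
        SpendStatus::Unspent => true,
        SpendStatus::DoubleSpend => false,
        SpendStatus::Unknown => !index.is_spent(output_index),
    }
}
```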
So now we have a full model that is really fast and really light on resources, which we can use for almost all the queries and questions we need. Almost, because we are still missing some queries, and that has to do with the mempool. This model doesn't have a mempool: when a transaction comes in, it is immediately written to disk as an unconfirmed transaction. I think this is a pretty good idea, because unconfirmed transactions are actually quite bad candidates to keep in RAM. You're going to have to write them anyway, and after that you no longer need them in memory, unless you opt for some kind of transaction auction model, but I think we can agree that would be silly. So we have to write them anyway, and once written, this just works for the node: you can look them up, you can see that they're unconfirmed, you can relay them.

The query you cannot do is ask for the set of unconfirmed transactions, because that would mean scanning all your transactions to find the unconfirmed ones, which is way too slow. So for miners to create blocks, they need an index into this transaction database of everything that's unconfirmed. It's essentially just a map-reduce index that maps each transaction to a certain priority, so that they always have a mutable view of which unconfirmed transactions they could put in a block at any point in time. And it's not only used by miners: full nodes also need this index for fast block propagation. Schemes like Compact Blocks and Xtreme Thinblocks rely on predicting what a miner is going to put in a block, and Graphene relies on the same mechanism. So full nodes, too, need a priority map telling them what is likely to end up in the next block.
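A sketch of such a priority map, ordered by fee rate. The choice of fee rate as the priority, and all names here, are assumptions for illustration rather than BitCrust's actual index:

```rust
use std::collections::BTreeMap;

/// A mutable view over the unconfirmed transactions: maps a priority
/// (here, a fee rate) to indexes into the on-disk transaction store.
/// The transactions themselves are already written; only this small
/// index changes as transactions arrive and confirm.
struct UnconfirmedIndex {
    by_priority: BTreeMap<u64, Vec<u64>>, // fee rate -> tx indexes
}

impl UnconfirmedIndex {
    fn new() -> Self {
        UnconfirmedIndex { by_priority: BTreeMap::new() }
    }

    fn add(&mut self, fee_rate: u64, tx_index: u64) {
        self.by_priority.entry(fee_rate).or_default().push(tx_index);
    }

    /// Once a transaction confirms, it simply drops out of the index.
    fn remove(&mut self, fee_rate: u64, tx_index: u64) {
        if let Some(txs) = self.by_priority.get_mut(&fee_rate) {
            txs.retain(|&t| t != tx_index);
        }
    }

    /// Highest-priority candidates first: what a miner would put in a
    /// block template, and what a relaying node would predict for
    /// compact-block style propagation.
    fn block_candidates(&self, max: usize) -> Vec<u64> {
        self.by_priority
            .values()
            .rev()
            .flatten()
            .copied()
            .take(max)
            .collect()
    }
}
```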
So now we have a model which does introduce some mutable data, but which keeps that mutable data to an absolute minimum. That means we have vastly reduced the number of I/O operations, because with immutable data you only ever have to write everything once. It also makes the model much easier to parallelize: you can easily process two blocks that come in at the same time, or verify every transaction within a block in parallel. So I believe a model like this is key to scalability, and to larger blocks as well.

This is the model we're using in the BitCrust node software, which is full-node software in development. The storage layer is done and we're working on the network layer. It's still a work in progress, but I wanted to share this nonetheless, not just to show what we're doing with BitCrust, but also because I believe this idea, or its individual parts, can be useful for other projects, other implementations, and other developers. Thank you.

Okay, wow, that was huge. Did you just present us BitCrust? Yeah, the model of it. The model of it, okay. You had better look up BitCrust; I just googled it, and it's written in Rust. All right, questions, let's go.

Hi Tomas. You've had some pretty interesting ideas around UTXO commitments and fast syncs as well, and these seem like somewhat contradictory ideas. Could you give your viewpoint on how UTXO commitments work in this model, or whether they even matter?

Yeah, I don't think they're contradictory per se. Conceptually, of course, you still have the idea of the UTXO set at a certain point. The UTXO commitment scheme that I've developed, and that is now being reviewed, is maintained separately from your storage, and that can be done here as well, because you can do it per block. The fast-sync method that comes out of it also still works, because fast-syncing the UTXO set is essentially nothing more than fast-syncing the pruned blockchain: you're just syncing all the data from the blockchain except everything you don't need. So that doesn't conflict with this model. But I do want to note that there are other ideas for next-generation UTXO commitments. Some of those are also maintained separately from the storage, but some, and these may be the most efficient ones, are really integrated into the storage: you store the UTXO set in a way that directly gives you UTXO proofs. Although those may be the most efficient, they would conflict with this model, and the drawback of building something like that is that it removes the flexibility to improve your storage. So for now we're good, but yes, there are some ideas that would conflict with it.

As there are no solutions, only trade-offs, and we've looked at many of the benefits of this approach: what do you feel are the main costs or vexing issues?

I think the individual parts of this approach are simply improvements that can be made. The difficulty is that it's quite a different model, which means it's going to take quite some time and effort to get it equally stable, and it's not something you can easily integrate into existing code. So it's not going to be easy to build a node of the same quality as we currently have, just because so many things have to be done differently. I think that's the big drawback. There could be other drawbacks specific to the model, but I don't know what they would be; the parts themselves are generic improvements.

You spoke a little about pruning, but I think I missed that bit. Can you say a bit about how pruning works in this model, or whether it's possible at all?

Yeah, pruning doesn't really change much; you can still prune everything. With a UTXO set, you only keep the unspent transaction outputs, so every spent output is pruned automatically. Here, we keep all the transaction outputs, so you have to prune explicitly, in the sense of having a process or thread running in the background that deletes everything that is too old. But you end up storing the same amount of information either way, so it doesn't cost any extra disk space. The approach is generally the same: you prune old data, and you're left with what you still need.

And the space used by that pruned data, can it be overwritten, or do you need to rearrange things?

You don't need to rearrange anything, no. There are two options. You can use numbered files, which is what we're doing, and simply delete the old ones. Or you can use one file, zero out the pruned regions, and use a trick called truncate, which means those regions no longer take up disk space.
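For what it's worth, on Linux the usual way to make a zeroed region of a file stop occupying disk space, while keeping all file offsets (and therefore all stored 64-bit indexes) valid, is to punch a hole with `fallocate(2)`. A minimal sketch, assuming Linux and the `libc` crate; this is one reading of the "truncate" trick, not necessarily the exact call BitCrust uses:

```rust
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

/// Deallocate `len` bytes at `offset` in `path`. The file's size and
/// all offsets past the hole are unchanged; the region simply reads
/// back as zeros and no longer consumes disk space.
fn punch_hole(path: &str, offset: libc::off_t, len: libc::off_t) -> std::io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    let ret = unsafe {
        libc::fallocate(
            file.as_raw_fd(),
            libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE,
            offset,
            len,
        )
    };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```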
This specialized key-value database seems like it could be valuable to other projects too. Is it tightly integrated into BitCrust, or can it be reused, by us for example?

Yes, certainly. It is built separately, as its own crate. It's in Rust, but it can also have a C interface. The individual modules of BitCrust are in fact written with the idea in mind that they can each be integrated separately through a C interface. So yes, that's possible.

I have a question: you didn't mention any performance results, and on your site it says the performance results were removed. Anything to show us on performance?

No. Last year I only had the validation model for this, with how the spend tree works, and there I saw quite a good improvement over the existing implementations. And that alarm is telling me my time is up. Unfortunately I don't have any new numbers; I wanted to have the fully integrated network layer ready, but I can't present it yet.

Any other questions? All right, let's wrap up then. Tomas, thank you very much.