Hi, everyone. I'm Leo Reyzin, and I'm very excited to tell you about this work. It's hot off the press in the sense that the code was just released yesterday, so you are seeing it for the first time; the paper has been on ePrint for a little while, but the code is just out. This is joint work with Dmitry Meshkov, Alexander Chepurnoy, and Sasha Ivanov, and I'm going to talk about authenticated dynamic dictionaries: what they are and how we apply them to cryptocurrencies.

Here's our motivation. Alice wants to pay 14 bitcoins to David, and she writes a transaction to that effect. You want to validate the transaction. Part of the validation is stateless: you look at the transaction and check that the syntax is right, that the fields are filled in and match, and that there's a valid digital signature under Alice's public key. All of that you can validate just by looking at the transaction. But the big piece that's hard to validate is making sure that Alice actually has those 14 bitcoins to give away. That part is stateful: you have to know how much Alice has based on prior transactions. Maybe you know that Alice actually has 36 bitcoins, and that's good enough. But of course you're not just processing transactions for Alice. If you're really validating transactions, you have to maintain a key-value store mapping public keys to the amounts they currently hold, and you have to look things up in that store to validate each transaction.

The problem with this key-value store, this dictionary data structure, is that it's big and growing. Maybe today it's not huge, but it will grow if this thing is going to scale; today it's about one and a half gigabytes in Bitcoin if you serialize it. And things get worse if you have a blockchain for many assets, because then you need one key-value store for every asset you're dealing with. That's the problem we're going to try to address.

So the question is: you have this state, this key-value store; where do you keep it? There are roughly two answers. You can keep it on disk, and then validation is slow; that has actually been used for denial-of-service attacks, forcing validators to do disk seeks, so a weak laptop isn't going to be able to keep up. Or you can keep it in RAM, because you've bought a lot of RAM, in which case you're limiting the ability of weak devices to validate, and you get more centralization of the cryptocurrency, which defeats the democratizing purpose of cryptocurrencies, at least as some envision it. So we'd like to enable validation on weak devices.
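To make the stateful check concrete, here is a minimal sketch in Python of the balances dictionary and the lookup and update it requires; the names, amounts, and encoding are illustrative assumptions of mine, not Bitcoin's actual representation.

```python
# Minimal sketch of the stateful check described above: a dictionary
# from public keys to balances. Names and amounts are illustrative.
balances = {"pk_alice": 36, "pk_david": 5}

def validate_stateful(sender_pk: str, amount: int) -> bool:
    """Does the sender's recorded balance cover the payment?"""
    return balances.get(sender_pk, 0) >= amount

def apply_payment(sender_pk: str, recipient_pk: str, amount: int) -> None:
    """Update the dictionary once the transaction is accepted."""
    if not validate_stateful(sender_pk, amount):
        raise ValueError("insufficient balance")
    balances[sender_pk] -= amount
    balances[recipient_pk] = balances.get(recipient_pk, 0) + amount

apply_payment("pk_alice", "pk_david", 14)
assert balances["pk_alice"] == 22 and balances["pk_david"] == 19
```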
The observation that has been made before us is that you really don't need to store this huge data structure. As the verifier of a transaction by Alice, at this moment you're only interested in how much money Alice has; the rest of the data is there, but right now you don't care about it. So you want a proof of this one single fact: how much money Alice has. This idea had been floating around in various early versions and was crystallized more precisely by Bill White: why don't we use authenticated data structures, so that Alice proves as part of the transaction that she has the right amount of money, or somebody proves it on her behalf? We will authenticate this key-value store and show that the value associated with Alice's public key is 36, so you can subtract 14 from it and still remain positive.

Let me go into a little more detail on how we're going to do this. Imagine we put all the public keys, in sorted order, at the bottom of a tree, and hash them pairwise, level by level, all the way up. That's called a Merkle tree, in case you haven't seen one before, and the hash at the very top is the Merkle root. We're going to put that Merkle root into the block header. For now, assume this Merkle root can be trusted; I'll explain later why it can be. Just live with that assumption for a moment: assume the root hash of the data structure is a true indication of who has how much money.

Now, at the moment I'm verifying the transaction in which Alice sends 14 bitcoins to David, if I want to prove that Alice really has that money, I send an authenticating path: the part highlighted on the slide. That's the Merkle path, the hashes of the siblings along the path from the leaf to the root of the Merkle tree, and anybody validating the transaction can check that this is indeed a leaf in the Merkle tree. This is fairly standard Merkle stuff; I've said nothing new yet if you've seen Merkle trees before. The point, though, is that a light verifier can check an entire block of transactions without storing the huge key-value store, just by verifying these Merkle paths; you get a light verifier with a full verifier's security guarantees.

So each transaction will include the Merkle path proving that Alice has the correct amount. From the verifier's point of view: the verifier gets the block header (again, I'm assuming the root hash can be trusted; we'll see why in a little bit), takes the root hash, the transactions, and the proof for each one, does the Merkle verification, and outputs yes or no. This matters because we're trying to prevent denial-of-service attacks: Merkle paths are short, logarithmic in the size of the key-value store, so the verifier can do this quickly and denial of service should not be an issue. That's an important goal.
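Before we get to updates, here is a minimal sketch of the path verification just described, assuming a simple (sibling hash, sibling-is-left) path format and SHA-256; the leaf encoding is a made-up placeholder, not the format from the paper.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_path(root: bytes, leaf: bytes, path) -> bool:
    """Check a leaf against a trusted root. `path` lists
    (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    node = H(leaf)
    for sibling, sibling_is_left in path:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

# Tiny two-leaf example: Alice's balance authenticated under the root.
alice_leaf, bob_leaf = b"pk_alice:36", b"pk_bob:10"
root = H(H(alice_leaf) + H(bob_leaf))
assert verify_merkle_path(root, alice_leaf, [(H(bob_leaf), False)])
```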
Now let's go back to the prover for a moment. What happens to the prover? The prover in this case is the miner, the one putting transactions into the block, and a transaction is going to modify values. If Alice sends 14 bitcoins to David, Alice's 36 becomes 22 (I hope it's not too late in the afternoon for that math), and the hash values above her leaf change all the way up to the root. Then David gets the money, and the hash values above David's leaf change too. A bunch more transactions come in, a bunch more things change, and there's a new root hash, and that is what gets included in the next block: the miner who puts the transactions into the current block also has to put the new root hash into the next block.

And this is where things get interesting, where something new has to come in: the verifier needs to check that this new root hash is correct. That is essential to verifying the chain. The next block's root hash has to match what happens to the root hash when the transactions take place; otherwise you don't actually know that the accounts data structure is correct. Maybe the miner gave herself a ton of money in the process and now has a root hash that says she has a ton of money. That's not good. So in addition to the verification we just talked about, the verifier also needs to compute the new root hash, match it up, and make sure it's correct. It is really this process that enables us to go from one block to the next.

This is different from the Merkle trees you've probably seen, because we need to support not only authentication but also updates of the root hash by a verifier who does not have the data structure. The verifier doesn't have the Merkle tree. The miner, of course, has the Merkle tree and can update the root hash; that's standard binary tree stuff. It's the verifier who doesn't have the whole thing but still needs to be able to recompute the root hash.

The operations we need to support: the easy one is updating the value of an existing key. That's essentially the same as authenticating the key, because the authenticating path is enough to perform the update: subtract 14, add 14, that sort of thing. More interesting is that you will also be inserting new keys, because new accounts come online, and deleting keys, say when an account reaches zero balance. Maybe you want to get rid of zero-balance accounts, maybe you don't; it depends on your application. In binary trees these operations are interesting because insertions and deletions, if you want a good binary tree that doesn't become unbalanced, require you to rebalance the tree. That means looking at a bunch of nodes and moving them around, and you have to think about how a verifier who never sees those nodes is going to compute the new root hash. That's where things get interesting.

But if we manage to do all that, then we can go back to the genesis block and verify from the beginning, and that's how we really know this root hash can be trusted: we go back to the beginning and do it one step at a time. So what you want, in order to trust this accounts data structure that is now authenticated and whose root hash is in the block, is the ability to compute the next root hash when changes take place. And we know what the changes are: they are the transactions included in the block.
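For a plain value update, the point is that the same sibling hashes that authenticated the old leaf are enough for the verifier to recompute the root. Here's a minimal sketch under the same assumed path format as the earlier sketch; insertions and deletions with rebalancing are the genuinely hard cases the tree-specific machinery below addresses.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def recompute_root(new_leaf: bytes, path) -> bytes:
    """Fold a modified leaf up the same authenticating path that
    proved the old value: (sibling_hash, sibling_is_left) pairs."""
    node = H(new_leaf)
    for sibling, sibling_is_left in path:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node

# Continuing the two-leaf example: Alice's 36 becomes 22, and the
# verifier checks the miner's claimed new root against its own result.
bob_hash = H(b"pk_bob:10")
claimed_new_root = H(H(b"pk_alice:22") + bob_hash)  # what the miner publishes
assert recompute_root(b"pk_alice:22", [(bob_hash, False)]) == claimed_new_root
```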
The transactions are serialized; we know exactly what they are. But we need to be able to perform them without having the entire Merkle tree, to compute the new root hash without the tree that stores the key-value pairs.

Just to emphasize that this is not the typical setting: you've probably seen authenticated data structures before, even dynamic ones where things change all the time, as in certificate transparency, where the history of changes is recorded. In those systems the verifier is typically not checking the new root; the new root is given by someone else. Here, the verifier actually computes the new root. This model is called the two-party model: we have only provers and verifiers, as opposed to the more traditional three-party model you've probably seen, which has a third party who gives you the new root. We don't have that. We have only provers and verifiers, and nobody is trusted.

That doesn't mean we're the first to do this; there is prior work in this model, and I want to go over it now. All the prior work relevant to our story is based on Merkle trees or some variant thereof. The differences are only in how you structure and rebalance the underlying tree, because binary trees, if you remember your Data Structures 101, come with different balancing algorithms; that's where these schemes differ. Of course, the better you balance the binary tree, the closer your leaves are to the root, and the closer your leaves are to the root, the shorter your proofs. Proof length is the one thing that matters for us, so we're going to compare prior work in terms of proof length.

The first data structure that explicitly works in the two-party model is by Papamanthou and Tamassia from 2007, and as implemented there it is a skip list, essentially a variant of a binary tree; I won't go into what it is, let's just look at its performance. There are two operations we worry about: updating an existing value, which is essentially just a lookup proof, and inserting a new value. Both proof lengths are 1.5·h·log n, so it's probably a good idea to define h and n. n is the number of entries in your key-value store: the number of leaves, the number of public keys, the number of people in your system. h is the length of a hash, because we're sending hash values, after all. The best you can hope for is h·log n, simply because binary trees are like that, so this is 50 percent worse than the best-case scenario; that's not bad. The problem with this approach for our purposes is that it requires trusted randomness. Skip lists are inherently randomized; if the randomness is bad, you get an unbalanced tree, and then you get very long proofs, which defeats the purpose: the verifier can be denial-of-serviced with linear-length proofs instead of logarithmic ones.
That's bad, and we don't have a source of trusted randomness, unless we come up with something clever, because it's the provers who might be interested in making the verifiers slow, and the provers are the ones putting the transactions in. So that's the problem with this approach for us.

There is an approach by Miller, Hicks, Katz, and Shi from 2014 that works for generic data structures: a beautiful paper that can turn any data structure into a two-party authenticated one. The specific thing they implement is red-black trees with a plus at the end; the plus means all the relevant data has been pushed down to the leaves. They achieve (h + k)·log n. What is k? It's the length of a public key, the thing stored at the bottom. In our setting it's roughly the same as the length of a hash, 256 bits, at least at 128-bit security, so you've essentially doubled the optimum: slightly worse than skip lists in terms of length. The reason is that the construction is generic; it works for any data structure. What is considerably worse is the insertion of new keys, which is worse by a factor of three. That's because of the red-black tree algorithms: they happen to require a lot more nodes, so the verifier needs to know all of those nodes to perform the insertion, and the proofs get long.

There is also a ton of work on three-party solutions, going back to Naor and Nissim, that I'm not going to go over, because they don't work for us: they don't allow the verifier to compute the new root.

So where do we come in? We tried very hard to find the right binary search trees for this problem, and we settled on AVL+ trees. What are AVL trees? They were the first balanced binary trees, designed by Adelson-Velsky and Landis in the 1960s; you cover them in Data Structures 101, perhaps. The nice thing is that for both of these operations they give us exactly h·log n. They are optimal; they're as good as you can do. They don't require trusted randomness, and they don't have that factor of 2 or factor of 3: the proofs are what they are. Given the time, I won't tell you exactly where the improvements come from; AVL trees are cool, look in the paper.

But I do want to show you some implementation results. What do you get with AVL+ trees? In this plot, proof length is on the y-axis and tree size is on the x-axis, on a log scale from about a thousand to about a million. The upper line is the proof length for the Ethereum trie; the lower line is ours. You can see that our proofs are three times shorter than Ethereum's, and Ethereum doesn't give you the ability to compute the new root, while we do. That's one data point, but we didn't just compare against Ethereum: we implemented a bunch of other data structures one could hope to use, treaps and skip lists and others, and our line is the bottom one; it performs better than all the others.
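For a rough sense of scale, here is the back-of-envelope arithmetic behind those bounds. This is my own illustration; real proofs also carry keys, values, and per-node structure bits, which is why the measured figures below are somewhat larger than the bare h·log n.

```python
import math

# Back-of-envelope proof lengths for n = 10**6 keys, with 256-bit
# hashes (h = 32 bytes) and 256-bit keys (k = 32 bytes).
n = 10**6
h = k = 32
log_n = math.log2(n)  # about 20 levels

print(f"optimal h*log n:              ~{h * log_n:.0f} bytes")            # ~640
print(f"skip list 1.5*h*log n:        ~{1.5 * h * log_n:.0f} bytes")      # ~960
print(f"red-black lookup (h+k)*log n: ~{(h + k) * log_n:.0f} bytes")      # ~1280
print(f"red-black insert ~3x that:    ~{3 * (h + k) * log_n:.0f} bytes")  # ~3830
```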
The upshot: if you've got about a million keys in your system, your proof size is about 765 bytes at 128-bit security. Deletion proofs are about 50 bytes longer; deletions are a bit harder, but not much longer. This result also improves on many three-party papers: if you know the Crosby and Wallach paper that benchmarks a bunch of three-party authenticated data structures, this improves on it by a considerable amount.

We have one more improvement, and that's probably all the time I'll have. Imagine a bunch of transactions in a block. They are going to share proofs: as the paths go up the tree, they share some hashes, so we can improve the result by combining the proofs. For example, what is the proof for the value of Alice? It's the red stuff on the slide: one hash value at each level of the tree, down to the leaves. What is the proof for Bob? The blue stuff: again, one hash value per level. But we don't need to send both the red and the blue. The purple hash, where they overlap, we save on because we send it only once; it's good for both. At the next level we save again, sending nothing instead of two values, because what was red and what was blue can now both be computed from the leaves. So you save in two ways by combining proofs: some things you don't need to send at all, and some things you send only once.

Here's what this compression buys. In this graph the black line is no compression: a straight line. The red line is what you get if you just apply gzip, which should be good at squeezing out repeated values, because that's roughly what it's designed for. The blue line is ours. On the x-axis is the log of the batch size, how many operations you put together; on the y-axis is the proof length per operation. Say you put two thousand operations together, which is what you need for a thousand transactions, because each transaction changes two things: a thousand minuses plus a thousand pluses make two thousand operations. If that's your batch, out of a million keys, you save about a factor of two in proof length per operation: you get about 370-byte proofs per operation on your dynamic authenticated data structure. And it's a very nice curve, going down linearly on a log scale.

I think I'm out of time, so let me conclude: you can actually go down from expensive machines to cheap machines if you allow yourself to use authenticated data structures. In the paper there's more detail, of course, including a simulation. The x-axis is the number of blocks in your chain; the black line is the time it took to process a block when you have to go to external key-value storage, the hard drive or the solid-state drive, to get your key-value associations, and the red line is what happens when you do hashes instead, verifying our Merkle proofs. The paper is on ePrint, and the code is public, under a CC0 license, effectively public domain: do with it whatever you want. It will be incorporated into the Waves platform, an actual multi-token cryptocurrency. Thank you.
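As a closing illustration of the proof-batching idea above, here is a toy count of how many distinct sibling hashes two batched Merkle paths need compared with two stand-alone proofs. It captures only the send-once saving; the real batch format saves more, since some hashes become recomputable from the operations themselves and are not sent at all.

```python
# Toy count of the batching saving: in a tree with 2**depth leaves,
# sibling hashes on the shared upper part of two proof paths are
# sent only once.
def sibling_positions(leaf_index: int, depth: int):
    """(level, node_index) of each sibling on the path to the root;
    index ^ 1 is the sibling of a node at the same level."""
    positions, idx = [], leaf_index
    for level in range(depth):
        positions.append((level, idx ^ 1))
        idx //= 2
    return positions

depth = 20  # about a million leaves
alice = sibling_positions(3, depth)
bob = sibling_positions(5, depth)
print(len(alice) + len(bob))       # 40 hashes as two stand-alone proofs
print(len(set(alice) | set(bob)))  # 23 hashes when the proofs are combined
```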