And go. Hello, everyone. My name is Dmitry Khovratovich. I'll be giving a talk about zero-knowledge hash functions. But unfortunately, I'm unable to attend the lightning talks today, so I'll give a very short announcement at the very beginning to draw your attention. You might have heard that last year some really fancy verifiable delay functions were suggested based on RSA, and they are planned to be used in Ethereum. This is actually a very important part of the new Ethereum designs, and the Ethereum Foundation encourages analysis of the new assumptions that were used to build these verifiable delay functions. If you want to write a good paper, if you want to analyze some assumptions, if you want to just bring up new ideas, they will all be rewarded. Just go to the website rsa.cash.

OK, now for the talk. This is joint work with my co-authors Lorenzo, Christian, Arnab, Markus, and Sebastian, from many parts of the world: the University of Bristol, TU Graz, and myself from the Ethereum Foundation and Dusk Network. It is about how we build hash functions that allow very fast set-inclusion proofs, but not only set-inclusion proofs; other things as well.

So why do we need different hash functions for zero knowledge? First, a bit of an example: how private cryptocurrencies work. How do cryptocurrencies work where we want to hide how much we spend and who the senders and recipients are? It basically starts with someone signing a transaction, which has some secret inside. This secret is hashed: suppose the secret is k, and there is some metadata there. The result, the signature plus the hash, is added to a Merkle tree. The Merkle tree is there to make future proofs possible. So you add the transaction to the Merkle tree publicly, but then you are able to spend it privately. How do you spend it privately?
Basically, after it has been added to the Merkle tree, you prove that you know there is a leaf in the tree for which you know the secret. But you don't disclose which leaf it is or what the secret is. To prove that, you have to provide a Merkle path in the tree, but you have to provide it in zero knowledge. This path consists of several hash function calls. And the heavier the function is for zero knowledge, the more expensive the proofs are. No matter which kind of zero-knowledge system you use, Pinocchio, Sonic, PLONK, STARKs, Bulletproofs, these can be very expensive. If you use hash functions which are not really suited for such proofs, you get the effect Zcash had in the very beginning, where such proofs took 40-something seconds to construct because of SHA-256.

So the traditional functions that we have been building for years are not so suitable for SNARKs, the zero-knowledge proving systems, or STARKs, or others. Why exactly? Because such proofs are constructed by expressing the proof verification algorithm as a circuit. And this circuit is not over the 32-bit or 64-bit words which are natural for our computers, and over which such hash functions have been designed. Instead these are usually different objects: prime fields or binary fields. The thing is, if you take a regular hash function and transform it so that it can be expressed over a prime or binary field, it becomes much more expensive in terms of the number of gates, or the degree of the polynomial if you encode it with polynomials. So the proof generation time is one of the most important metrics we're going to use. This effect of 45 seconds for a Zcash proof was because SHA-256, when it was put into a prime field, was very, very expensive: the circuit is very large and contains a lot of gates. We basically want to minimize that. We want to make the resulting proofs much faster.
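The spending proof described above boils down to recomputing a Merkle path inside a circuit. A minimal non-ZK sketch of that path recomputation, with SHA-256 standing in for the circuit-friendly hash (the function names here are illustrative, not from any library discussed in the talk):

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    # Stand-in for the circuit-friendly hash; SHA-256 is used here only
    # for illustration. In the ZK setting, every call to h() becomes a
    # sub-circuit, which is why h must be cheap as a circuit.
    return hashlib.sha256(left + right).digest()

def merkle_root(leaf: bytes, path: list, index: int) -> bytes:
    """Recompute the root from a leaf and its authentication path.
    A zero-knowledge proof shows knowledge of (leaf, path, index)
    matching a public root, without revealing them."""
    node = leaf
    for sibling in path:
        if index % 2 == 0:
            node = h(node, sibling)
        else:
            node = h(sibling, node)
        index //= 2
    return node
```

The prover's work is one hash invocation per tree level, which is why the per-call circuit cost dominates everything that follows in the talk.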
Well, not exponentially faster, of course, but faster by a very large constant. So what kind of hash function do we need? We need a hash function that operates in a big prime or binary field, so it can be used in different fields; maybe it should be a family of hash functions. It should be very good in certain metrics like circuit size or the degree-size product. And of course, it should be secure: it should be collision resistant, it should be preimage resistant. The thing is, we haven't been designing such hash functions for years, because we were so concentrated on performance on the x86 architecture.

There have been some suggestions. For example, there was an idea to use so-called Pedersen commitments for hashing, where you basically have a couple of elliptic curve points, and if you want to hash two inputs, you interpret them as integers, multiply one point by one number and the second point by the second number, and take the x-coordinate of the result. But the resulting hash, even though it can be proven collision resistant as long as the discrete logarithm is hard on this elliptic curve, still has many problems: there are homomorphism properties, preimage resistance is not ideal, there are length extensions in the current form, and so on. So these are not really great for the purpose of a zero-knowledge-friendly hash function.

One of the first candidates for a better zero-knowledge hash function appeared several years ago; it's called MiMC. MiMC operates very, very simply; it's one of the simplest designs you can imagine. Suppose you have a field, you have a key, and you have an input x. You add the key to the input, then you raise it to the power of three, then you add the key again plus some constant, a different constant every time, then you raise to the power of three again, and so on, and you repeat this for 100, 200 rounds or so until the degree is high enough.
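The MiMC round structure just described can be sketched in a few lines. A toy version over a small prime, with illustrative round constants; note that for this particular prime, x cubed is not a bijection (gcd(3, p-1) = 3), so the sketch uses x to the fifth, which foreshadows the S-box discussion later in the talk:

```python
# A minimal MiMC-style construction, illustrative only, not a real instance.
p = 2**61 - 1   # toy prime; real instances use ~256-bit curve scalar fields

def mimc(x: int, k: int, constants: list) -> int:
    """Each round: add the key and a round constant, then apply a power map.
    MiMC originally cubes, but x^3 is only a bijection when gcd(3, p-1) = 1;
    for this p we use x^5 instead (gcd(5, p-1) = 1), keeping each round
    invertible. Rounds are repeated until the algebraic degree is high."""
    for c in constants:
        x = pow((x + k + c) % p, 5, p)
    return (x + k) % p   # final key addition
```

Because every round is a bijection in x for a fixed key, the whole map permutes the field, which the talk relies on for the sponge mode that comes next.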
So basically, it works very nicely and the properties are good, but it's a bit non-trivial to generalize to a wider state. Currently it takes just one field element as input, and making it a compression function is somewhat non-trivial.

Our designs are called Poseidon and Starkad, two names because one is for prime fields and the second for binary fields. But zero-knowledge research evolves so quickly that when we designed them a year ago, there were promising zero-knowledge proof systems operating in binary fields, but now they're not popular anymore. Most zero-knowledge proof systems nowadays operate in prime fields. That's why Poseidon is more important, and I will talk mainly about it.

We decided to start with the mode of operation, and the sponge mode is one of the most popular and simplest ones. It operates very simply. Suppose you have a secure permutation F; by permutation I mean not that it permutes inputs, but that it's a bijective transformation. We work on field elements, meaning that the rate and capacity R and C are not bits anymore but field elements, and we count an integer number of field elements: two, three, four, five. Not that many, because the fields are quite big. For example, if you have a 256-bit field, or a bit smaller, then the capacity is just one field element. The idea is that if you are going to hash a long message, you split it into parts and you repeatedly add chunks into the internal state, not touching the capacity part. The advantages of this approach: there is no key schedule, you can put the key into the input, the analysis is simpler, and we know how to design good permutations. At least we know that for the binary case, the bit case; let's see how we can do it for prime fields. For prime fields, we outlined the following, maybe a bit sophisticated, approach, but it works very nicely.
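The sponge mode over field elements can be sketched as follows. The permutation here is a meaningless placeholder bijection standing in for the real one, and the padding rule is a simplified illustration, not the one specified in the paper:

```python
# Sponge hashing over field elements: a structural sketch only.
p = 2**61 - 1            # toy prime field
RATE, CAPACITY = 2, 1    # counted in field elements, as in the talk
WIDTH = RATE + CAPACITY

def permute(state):
    # Placeholder bijection (each coordinate is an invertible affine map);
    # a real instance applies many SPN rounds instead.
    return [(5 * x + i + 1) % p for i, x in enumerate(state)]

def sponge_hash(msg, out_len=1):
    """Absorb `msg` (a list of field elements) RATE elements at a time,
    never touching the CAPACITY part directly, then squeeze out_len
    elements. Simplified 10*-style padding to a multiple of RATE."""
    state = [0] * WIDTH
    msg = msg + [1] + [0] * ((-len(msg) - 1) % RATE)
    for i in range(0, len(msg), RATE):
        for j in range(RATE):
            state[j] = (state[j] + msg[i + j]) % p
        state = permute(state)
    out = []
    while len(out) < out_len:
        out.extend(state[:RATE])
        state = permute(state)
    return out[:out_len]
```

The key design point the talk makes is visible here: only the rate portion of the state is ever touched by message input or output, and the capacity is what carries the security.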
So it's still a substitution-permutation network: S-boxes, linear transformations, S-boxes, linear transformations. You are all familiar with this approach because it's very traditional in symmetric crypto; in AES you have the same, in SHA-3 you have more or less the same. The only non-trivial thing here is that our rounds are not identical. We have S-boxes applied to each element of the state in the beginning and in the end, but in the middle we don't need an S-box on every element; we apply only one S-box. Why do we do that? It's a very important design decision, because apparently the most powerful attacks are algebraic attacks, and we want to protect against them. We want to increase the degree of the polynomial: if we express the outputs of the permutation as functions of the inputs, we want the degree of all these output wires to be very high. And apparently, if you drop S-boxes from the middle rounds, the degree remains more or less the same. By dropping these S-boxes, we save on the number of constraints, and our circuit becomes much smaller.

So what are the S-boxes? Like in MiMC, they can be power functions. They could be x cubed, but x cubed is not a bijection over the prime fields that are given to us by the elliptic curves we're working with, so usually it's x to the fifth. You can also use the inverse, but the inverse has a subtle issue: when you design a constraint, you have to add an extra constraint for zero, which might not be that convenient. And apparently there are attacks on this construction which work a bit better for the inverse; that's why we don't use it. We mainly recommend x to the fifth or x to the seventh, or exceptionally, on some MNT curves, we have to use x to the eleventh. The motivation for using one S-box I already gave; apparently the baggage of cryptanalysis we have accumulated over recent years is mostly inapplicable.
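The full-rounds/partial-rounds structure just described can be sketched as follows. The round counts, constants, and linear layer here are all illustrative stand-ins, not a real Poseidon instance (which derives them from the security analysis and uses a fixed MDS matrix):

```python
# Skeleton of the Poseidon round structure: R_FULL full rounds split
# around R_PARTIAL partial rounds. All parameters are illustrative.
p = 2**61 - 1
WIDTH = 3
R_FULL, R_PARTIAL = 8, 22   # example counts only
ALPHA = 5                    # S-box x^5, a bijection when gcd(5, p-1) = 1

def sbox(x):
    return pow(x, ALPHA, p)

def mix(state):
    # Stand-in invertible linear layer; a real instance multiplies the
    # state by a fixed MDS (e.g. Cauchy) matrix.
    s = sum(state) % p
    return [(s + 2 * x) % p for x in state]

def poseidon_perm(state, constants):
    half = R_FULL // 2
    for r in range(R_FULL + R_PARTIAL):
        state = [(x + constants[r * WIDTH + i]) % p
                 for i, x in enumerate(state)]
        if r < half or r >= half + R_PARTIAL:
            state = [sbox(x) for x in state]   # full round: S-box everywhere
        else:
            state[0] = sbox(state[0])          # partial round: one S-box
        state = mix(state)
    return state
```

Each partial round contributes one S-box instead of WIDTH of them, which is exactly the constraint saving the talk describes, while the algebraic degree still grows each round.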
All these differential and linear attacks stop working after just a few rounds because the S-boxes are very big. But now the most powerful attacks are algebraic attacks: interpolation attacks, Gröbner basis attacks, higher-order differential attacks. They become much more powerful; they work on a bigger number of rounds. And what is more important, we are not very good at estimating their complexity. In particular, we are still revising the Gröbner basis attacks, and the complexity estimates for the algorithms most effective at Gröbner basis computation are not very precise. That's why the numbers of rounds we put in here are based on somewhat pessimistic estimates of how these algorithms work, but of course there can be revisions.

What else is interesting to say here is that most of the analysis we have conducted applies both to the prime and the binary case. Apart from that, we have to provide really many parameter sets, because the functions can be used in very many different contexts and with different proof systems. And actually, if you switch the curve, you also have to switch the hash function, because the underlying field is different, so the constants are different and maybe the number of rounds is different. But it all works out. We support a really big number of modes: we support Merkle trees explicitly, of different arities; we support variable-length hashing and constant-length hashing; we even support authenticated encryption, and of course if you put authenticated encryption into a SNARK, you get a kind of verifiable encryption. So this can also be used for verifiable encryption in the context of zero-knowledge proofs. Many implementations are available, mainly third-party ones. So what else?
Yeah, I've already said that Poseidon is for prime fields, which are now the most popular. Starkad is for when you have to work with a binary field; it's a bit different, because some attacks start working in binary fields, so the design differs slightly.

Sometimes there are questions about how to use sponges in Merkle trees. We have already outlined it in the paper, but just for completeness, it works as follows. You apply the permutation to an input where the capacity element is fixed and the rest is message input, and you take just one output element. You take one element from each of the branches; if you have, say, a 3-to-1 tree, you take one element from each child, and they become the input elements to the next sponge call, and it goes further up the tree. The only thing is that you have to be really careful here, because you want domain separation between Merkle trees and regular hashing, and you may also have to take care that some leaves can be empty leaves. That's why we have played with the capacity element a bit, to get the domain separation right there.

Apparently, when you work with zero-knowledge proof systems, things become not so easy. We cannot pack many values together into one element. We cannot, for example, put nonces and part of the key into one field element, because when we prove that we know the key, it becomes non-trivial to separate one from the other; it adds extra constraints and is just not convenient. So we have to allocate separate elements for different parts of the input. We are constantly adapting the design for the zero-knowledge proof systems that keep appearing; last year something like 10 new systems were suggested. And there are very technical details of how we make it possible in SNARKs.
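The Merkle-tree usage just described, one permutation call per node with a domain tag in the capacity element, can be sketched like this. The permutation is again a placeholder, the tag value is arbitrary, and the sketch assumes a power-of-two number of leaves:

```python
# Fixed-length 2-to-1 Merkle compression with one permutation call per
# node. The capacity element carries a domain-separation tag, as the
# talk describes. All concrete values here are illustrative.
p = 2**61 - 1
MERKLE_TAG = 3   # hypothetical domain-separation constant

def permute(state):
    # Placeholder for the real permutation (see the SPN sketch).
    return [(7 * x + i + 1) % p for i, x in enumerate(state)]

def merkle_compress(left, right):
    """Children fill the rate, the tag fills the capacity, and a single
    output element is kept as the parent node."""
    state = permute([left % p, right % p, MERKLE_TAG])
    return state[0]

def tree_root(leaves):
    # Assumes len(leaves) is a power of two.
    level = [x % p for x in leaves]
    while len(level) > 1:
        level = [merkle_compress(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Using a different tag for tree hashing than for variable-length hashing is what prevents a tree node from ever colliding with a plain sponge digest.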
In SNARKs we create constraints, and for these constraints we compute linear combinations of elements, which come from the matrix multiplication, and eventually we have as many constraints as there are S-box relations. So how does it count? For example, we can compare explicitly with a Pedersen hash. If you have a Merkle tree over a billion elements, which is quite natural if you work with big sets, like a set of transactions or a set of people or certificates, then depending on which tree arity you use, you can go down to about 4,000 constraints in the SNARK metric, whereas a Pedersen hash would need about 40,000, so roughly 10 times more. This effectively means that our hash functions will be about 10 times faster than a Pedersen hash for Merkle tree applications.

That was for regular SNARKs like Pinocchio and Groth16. These are some numbers for Bulletproofs, which is actually very interesting. When we submitted the paper to this conference, Bulletproofs was still a popular design, and we calculated numbers showing that we can make proofs verifiable within one second. Bulletproofs is a very interesting proof system because it doesn't need a trusted setup, but it is now, I think, going out of fashion, and new proof systems like PLONK are taking its place. You see that we have different instances of the hash function for different curves, and that's why these numbers differ. Of course, for the 381-bit prime field the proofs take longer to create and verify, by a small factor.
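A back-of-envelope version of that comparison: the constraint count for a Merkle proof is just path depth times the per-call circuit cost. The per-call figures below are hypothetical, chosen only to reproduce the roughly 4,000 versus 40,000 gap for a 2^30-leaf tree that the talk mentions; they are not measured numbers:

```python
# Rough Merkle-proof constraint arithmetic with hypothetical per-call
# costs, illustrating the ~10x gap the talk cites.

def int_log_ceil(n, base):
    # Smallest d with base**d >= n, computed with exact integer math.
    d, x = 0, 1
    while x < n:
        x *= base
        d += 1
    return d

def merkle_proof_constraints(n_leaves, arity, constraints_per_call):
    depth = int_log_ceil(n_leaves, arity)   # hash calls on the path
    return depth * constraints_per_call

poseidon_like = merkle_proof_constraints(2**30, 4, 270)   # hypothetical cost
pedersen_like = merkle_proof_constraints(2**30, 2, 1350)  # hypothetical cost
```

Higher arity shortens the path, which is why supporting wide trees explicitly pays off in the constraint count.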
We can adapt our hash functions to many proof systems that have just appeared. For example, we can modify PLONK itself to make it suitable for our hash function, and we get something like a 25-fold increase in performance, because we can make the polynomials much smaller and we don't need a wire layout as sophisticated as the one for generic PLONK. So with our hash function you can have much faster SNARK proofs.

There is also a very interesting system that appeared recently called RedShift. It's interesting because it's both post-quantum and trustless: it doesn't use a trusted setup, being built entirely on Reed-Solomon commitments and using some STARK ideas. The disadvantage of RedShift is that the proofs are quite big. For the frequently benchmarked circuits of one million gates, the proofs are one megabyte. But with our hash function we can make them smaller; even for Merkle trees, I made some estimates, and if we want Merkle proofs for sets of one billion elements, we can have proofs as small as 12 or 15 kilobytes, and this is post-quantum and trustless. I think it's really nice and should be really fast. We haven't programmed it for RedShift yet, but I think it's a very promising technique. There are also some ideas for how you can do it for STARKs and how you can express other relations between S-boxes; I will skip that.

One last important thing I would like to cover is very recent. Some people asked us to create an encryption mode, and it turns out that if the shared key is created with some sort of Diffie-Hellman, Elliptic Curve Diffie-Hellman, then we can combine Diffie-Hellman with an authenticated encryption mode like SpongeWrap, put our permutation inside, and effectively get a very fast and compact design.
So basically the idea is that the recipient has a key on an elliptic curve which is built on the scalar field where the proof system operates. There are two curves: one curve is used by the proof system, and the other, called an embedded curve, is where people have their keys. The idea is quite simple. First, the users create a shared key point using Diffie-Hellman on this embedded curve, and then you put this key point as input into the SpongeWrap. So the nonce, the key, and the length of the message are inserted into the SpongeWrap, and then after one call you can use it as a keystream, and after the last message call you output a tag. These steps can be put into a SNARK circuit, and if the curve is embedded, you can use the standard machinery for the Diffie-Hellman circuit, and it all works with a rather good number of constraints, something like several hundred per call, including the Diffie-Hellman step.

There are several projects that already use our design. Sovrin uses it for revocation checks. Dusk Network uses it for private proof of stake. There are also two others where I've seen that they're using it, but I don't know the details: Loopring and Coda. Feel free to join, and we have a number of instruments to help you select a good hash function. There is a website, poseidon-hash.info, which will grow and will have more directions on how to use the hash function properly.
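The keystream-and-tag flow just described can be sketched in a SpongeWrap-flavoured duplex over field elements. The permutation is a placeholder, and the Diffie-Hellman key agreement on the embedded curve is abstracted away to a single `shared_key` field element (all names here are illustrative):

```python
# SpongeWrap-style authenticated encryption over field elements: a
# structural sketch, not the scheme from the talk. The DH step is
# abstracted to `shared_key`.
p = 2**61 - 1

def permute(state):
    # Stand-in for the real permutation.
    return [(11 * x + i + 1) % p for i, x in enumerate(state)]

def encrypt(shared_key, nonce, msg):
    """Absorb key, nonce, and message length; then produce one keystream
    element per message element, duplexing the ciphertext back into the
    state; finally output a tag."""
    state = permute([shared_key % p, nonce % p, len(msg) % p])
    ct = []
    for m in msg:
        c = (m + state[0]) % p       # keystream addition in the field
        ct.append(c)
        state[0] = c                 # feed the ciphertext back in
        state = permute(state)
    return ct, state[0]              # ciphertext and authentication tag

def decrypt(shared_key, nonce, ct, tag):
    state = permute([shared_key % p, nonce % p, len(ct) % p])
    msg = []
    for c in ct:
        msg.append((c - state[0]) % p)
        state[0] = c
        state = permute(state)
    if state[0] != tag:
        raise ValueError("authentication failed")
    return msg
```

Because both directions evolve the state from the ciphertext, the decryptor recomputes exactly the encryptor's tag, and putting this loop into a circuit costs a handful of constraints per permutation call plus the DH sub-circuit.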
There is also a parameter generator; it's almost finished, I haven't linked it yet, but I will add the link to poseidon-hash.info. You can input your curve, the output size you need, and the security level, and it will tell you the number of rounds, the constants, the matrices and everything, maybe even test vectors and some links to sample code you could use. Yeah, thank you for your attention.

Any quick questions? I have a couple. In your sponge construction, on your picture, you had two complete levels at the front and two complete levels at the back, which made four. Where was it? There, yes. You said rounds with a full S-box layer. So you have two and two, but on one slide you seem to say eight. Is it really two and two, or is that just simplified? This is simplified. The other question is: can you use this in Picnic-type signature constructions? Yeah, yeah. Is it any better than Picnic? I'm not that good at this signature thing, but, yeah.

What can you say about the construction of the matrix? Is there more detail to share, or is it just a huge thing? Sorry? The construction of the MDS matrix. Well, I would call it a Cauchy matrix, basically: just an MDS matrix where every element is one over the difference of a value indexed by the row and a value indexed by the column. A standard MDS matrix; you can use any MDS matrix here.

One last question. If you could go back to your benchmark slides, you had the one with the comparison to Rescue and Pedersen hashes. Yeah, this one. Could I ask what Pedersen hashes you benchmarked there? Henry, could you take a photo? The question is: what implementation of Pedersen hashes did you use, and what did you benchmark? I didn't benchmark; I just took this number from the Zcash spec.

Okay, thank you. Thanks to the speaker and everyone from the session again. We have some important information. After the next session will be the lightning talk session.
Now, anyone who has not turned up at Real World Crypto before will have no idea what we're going to do, so please listen. For the lightning talk session, you will queue up here at the end of the next session, and then you will be called onto the stage to give, depending on how long the queue is, a talk with no slides. You have no slides, you just have to speak, and I will cut you off depending on the length of the queue. The more people in the queue, the less time you have; the fewer people in the queue, the more time you have. So if you want to game the system, get all of your friends into the queue to start with, so it's very long and I'll make sure everyone has a very small amount of time; then you get them all to leave, and then you, at the very end, can have the entire time if you want. So you can game the system, it's adversarial, but that's how we're going to do it. You will queue from this side of the stage, by these steps, please. Thank you.