Hi everyone, so my name is Youssef El Housni, I work at ConsenSys on gnark, and I'm going to try to sell you gnark. Well, actually I'm going to talk about some of the algorithmic optimizations in gnark that make it fast. So we are a team of five so far, and we are building two libraries in Go. One is called gnark, which is an easy-to-use open-source library for SNARKs, and the other one is gnark-crypto, which is a cryptographic library in Go. So gnark under the hood is basically composed of these components: a front-end where you write your circuits, a back-end for proof generation and proof verification, and underneath you have this gnark-crypto library, which is a pairing-based cryptography, elliptic-curve cryptography, and finite-field arithmetic library. The whole stack is written in Go, with no dependencies. In the back-end we have, so far, Groth16 and PlonK with two polynomial commitments, KZG and FRI. In the front-end we have a standard library with MiMC, ECDSA, EdDSA, pairings in circuits, BLS in circuits. And we have APIs for both native-field and non-native-field arithmetic. In gnark-crypto we have a bunch of elliptic curves: BN, BLS12, BLS24, 2-chains with BLS12 and BLS24, Twisted Edwards. We have fast multi-scalar multiplication, fast pairings, KZG, FRI, Plookup, and we have also recently implemented the sumcheck protocol and GKR. In the finite-field arithmetic we have different sizes ranging from 256 to 768 bits, Goldilocks as well, and it performs very well on different targets. So the usual workflow of SNARKs is the same in gnark: we have a circuit that we write in plain Go, so not in a DSL, then we compile it to some constraint system, and then we call the setup, prove, and verify APIs. So it is fairly easy to change the elliptic curve you want to use and the constraint system you want.
So we just change, for example, here, BN254 by BLS12-377, and R1CS, which is Groth16, by SCS, which stands for sparse constraint system, which is PlonK. We have a playground where you can play with it in the browser with Groth16 and PlonK, to see how you can write your circuit in Go. And then you can download the constraints and look at what they look like, both in Groth16 and PlonK. So why gnark? We have no DSL, plain Go, and no dependencies at all. We compile large circuits in a few seconds. And since you write your circuits in plain Go, you can use the standard Go tooling to debug, test, and benchmark your circuit. We also developed a cool thing, which we call the constraint profiler: by just adding two lines of code, you get this figure here, where you see how many constraints each function consumes. Yeah, and several packages are already audited by Algorand and fuzz-tested by Geth for about one year now. And there is one code base that performs well on different architectures: CPU, mobile, WASM. On mobile, we are 70% faster than the baseline in the ZPrize. So the question is why gnark is that fast. Here I give the example of the Groth16 SNARK prover on the BN254 curve, which means MSM computations, FFT computations, and parallelism. I took the examples of two of the most used libraries: arkworks, in Rust, and Circom with the rapidsnark backend, in C++. Two kinds of circuit sizes: one is 65k constraints, the other one is 8 million constraints. This is an AWS AMD machine. And we see that gnark performs very well for both kinds of circuits. There are some libraries that are heavily optimized for large circuits, for instance; gnark is optimized for both. So this was the prover side. For the verifier side, the Groth16 SNARK verifier is mainly a pairing computation over BN254, and again, we see that on the same machine, gnark is very fast.
So it's a bit more than one millisecond on this machine to verify a proof, which is mainly a multi-exponentiation and a pairing computation. Yeah, so it's a PDF — it was interactive, but anyway. So the question is why gnark is that fast. Remember this diagram from the very beginning: we have a front-end, a back-end, and gnark-crypto underneath. I will go through some of the algorithms that are highly optimized in gnark. We start by writing a circuit C. We generate the proof pi of C, which means that we call gnark to compute FFTs and mostly MSMs. I'm giving the example of a Groth16 proof over BLS12-377, just to concentrate on the algorithmic optimizations, so there is no non-native ("wrong-field") arithmetic today — but it also works with non-native-field arithmetic. So it calls an MSM over BLS12-377. Then we write a circuit C' that verifies the proof pi, and we generate the proof pi' of the circuit C' — the aggregation — which means that we call gnark to compute a multi-scalar multiplication over the BW6-761 curve, the outer curve. And we verify the proof, step five, which means that we call gnark to compute the pairing. I will be talking about the optimizations in those points that are in boxes: mainly MSMs, pairings, and writing circuits. First, MSMs over BLS12-377. This is a graph from the ZPrize submission. I'm comparing to arkworks because it implements BLS12-377. We are up to 47% faster for point counts ranging from 2^8 to 2^18. This is tested on mobile, on a Samsung Galaxy. And here I have two versions of gnark, one using twisted Edwards curves and the other one using the short Weierstrass curve. One is 40 to 47% faster, the other one is 20 to 35% faster. And the question is why. Both implementations do not use precomputation but use parallelism, though in a different way — I'm not going to talk about this.
So we use 2-NAF bucketing, which halves the number of buckets. This is not used in arkworks. But most important here is the curve form and the coordinate system. We prove that any inner curve can be written in twisted Edwards form with a = -1, and we extended the coordinate system — we call it custom extended (X, Y, T) coordinates — in order to make the computations faster. So why is that? Let's call a b-bit MSM an MSM of size N with scalars of b bits. All libraries implement this variant of Pippenger, the bucket-list method, which goes in three steps: we reduce the b-bit MSM into c-bit MSMs for some fixed c, we solve each c-bit MSM efficiently, and then we combine the c-bit MSMs into the final b-bit MSM. So the overall cost is this one: the minus one in blue is when you use the NAF encoding of the scalars, otherwise you have 2^c. This overall cost can be explicitly broken into what I call mixed additions and re-additions, plus additions and doublings. For large MSMs, what matters most is the mixed additions and re-additions, because they scale with the number of points N; the others are constant. If you look at all the shapes of elliptic curves and all the coordinate systems that are out there — you can look at the EFD (Explicit-Formulas Database) webpage — twisted Edwards with a = -1 in extended coordinates takes 7 multiplications for a dedicated addition, compared to, for example, 11 in arkworks with Jacobian coordinates. When I say re-addition, I mean that those points are re-added into the buckets, so they are the same points. And if you look at the unified addition — which means we have no if-else branches to handle exception cases — it is one multiplication more, but that multiplication is by a constant, 2d. So we came up with a custom coordinate system: instead of storing the tuple (X, Y), you store (Y - X, Y + X, 2·d·X·Y).
So you can do unified additions without branching at the same cost as the dedicated addition, 7 multiplications. Yeah, and this is basically one of the optimizations that makes the MSM faster in gnark. The second box was writing the circuit C' for the verification of pi. In the previous presentations, we've talked about how expensive a pairing check inside a circuit is, right? There is a long line of research on pairing computation outside of circuits, and we can compute a pairing over BLS12-377 outside a circuit in less than one millisecond. But if you port those optimizations, mutatis mutandis, inside a circuit, the number of constraints is roughly 80,000 in R1CS. We were able to reduce it to 11.5k constraints. There are a few implementations so far: one in arkworks, one in libsnark, and I believe a new one in the zk-pairing library by 0xPARC. The one in arkworks, which was the state of the art for this computation, was 19,000 constraints; we were able to reduce it to 11.5k. The paper is here, you can check it, but the main ideas are basically these. The inverse is not costly in R1CS, so you can do double-and-add in affine coordinates. We have a different representation of the line so that we get sparse multiplications by the line, R1CS-wise. We use torus-based arithmetic inside the circuit: it uses inverses, which is why it isn't used outside of circuits, but inside a circuit it makes sense. The final exponentiation also uses Karabina cyclotomic squarings instead of Granger-Scott, which is not the choice outside of circuits. And for both the Miller loop and the final exponentiation, we use short addition chains. But the trick here is that normally, in a short addition chain for double-and-add, you would optimize for the doublings, because doublings are faster than additions. But constraint-wise, it's the opposite.
Doubling costs more constraints than addition, because the line slope has a square: it costs two constraints instead of one, which is just a division. So the result is that a Groth16 verifier is only 19k constraints, a BLS signature is 14.8k constraints, and a KZG verifier is just 20,000 constraints. The last box: pairings over BW6-761. We can compute this on an Intel machine here in 1.22 milliseconds, compared to 1.71 milliseconds. And the question is why. Actually, we do not implement the same formula. A pairing of P and Q is this m(P, Q), which is what we call the Miller loop — it is a loop — and a final exponentiation, which is the exponentiation by (q^6 - 1)/r. The optimization comes from the Miller loop. The original BW6-761 paper has this formula for computing a pairing: f_{u+1,Q}(P) · f_{u^3-u^2-u,Q}(P)^q, which is two Miller loops of lengths u+1 and u^3-u^2-u. When you look at those, they have bit sizes 64 and 190, and their Hamming weights in 2-NAF are 7 and 31. What we did in gnark is observe that u^2-u-1 has a 2-NAF Hamming weight of just 12, compared to 31, and we rearranged the equation so that we get this second equation. Basically, you still have two Miller loops, but the second Miller loop, this one, reuses the result of the first one — it is not starting at 1 as in the first formula. And this is just a line computation, and this point is already computed in the first Miller loop. So you really have just two Miller loops as before, but with a much smaller loop length. The exponentiations by q are cheap because these are just Frobenius maps. We also have a paper about this — it is on ePrint, you can check it out. And we have a HackMD blog post here that explains the differences between these two and other algorithms, because this one is just for a single pairing, and for multi-pairings we have other algorithms.
So a novel algorithm based on the Tate pairing: because on this kind of elliptic curve, BW6, G1 and G2 are defined over the same field, you can use the Tate pairing instead of the ate pairing, and you can use the endomorphism there, and for multi-pairings it was way faster than what we had previously. So this is just a couple of the optimizations I talked about in gnark — it is really optimized across the whole stack. If you have any questions, please feel free to contact us by email or Twitter. There are some useful links here, and I'm happy to take any questions. Thank you.

[Q] Hi. Good talk. Thank you. I want to know: why did you choose Go instead of Rust, for example?
[A] Good question. So this work was started before I joined ConsenSys — it was already in Go. But, yeah, we think that Go is also fast. Also, there are many libraries and projects out there that are using Rust, and there are many blockchains that are using Go, so it was nice to have a SNARK library written in Go to get native integration with them.
[Q] You mentioned you guys had support for lookups in gnark. Is that already released?
[A] Yeah, it is in gnark. But we do not have PlonK with Plookup; we have just the Plookup argument in gnark, which we will use for the zkEVM.
[Q] I guess, so you're saying you can use Plookup, but how do you use it without the PlonK integration?
[A] So we do not have the integration so far, but we have just the Plookup argument.
[Q] Okay. Got it.