Hi, my name is Carsten Baum, and I will now present the Mac'n'Cheese zero-knowledge proof system for Boolean and arithmetic circuits with nested disjunctions. This is joint work with Alex Malozemoff and Marc Rosen from Galois and Peter Scholl from Aarhus University. As a quick overview: Mac'n'Cheese is a commit-and-prove style zero-knowledge proof for arbitrary circuits over any field. It has specific optimizations that allow it to support nested disjunctions efficiently, and it is practically efficient. Our implementation, which includes both the pre-processing and the online phase of the zero-knowledge proof system, achieves 140 nanoseconds per proven AND gate, or 1.5 microseconds per multiplication over a 61-bit field. Asymptotically, our proof system sends approximately one field element per multiplication, or three field elements if no amortization is used.

Before I go into the details of the Mac'n'Cheese proof system, let me recap some definitions which are necessary for understanding how Mac'n'Cheese actually works. So let us assume that we want to show that a statement x is in a language L. One view is that there is a Turing machine which accepts x together with a witness w in polynomial time. An alternative view, which we use in this paper and which is much more useful and generally used in many zero-knowledge proof systems, is that there exists a circuit C, which can be obtained from the statement x, and we say that x is in the language L if there exists a witness w such that the circuit, when evaluated on the witness w, evaluates to zero. So if and only if the circuit evaluates to zero do we say that x is in the language, and here w is considered as a witness.
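To make the "C(w) = 0" view concrete, here is a minimal sketch of my own (not code from the paper): a circuit given as a gate list over a prime field, evaluated on a witness, accepting exactly when the output is zero. The gate encoding and the example statement are arbitrary illustration choices.

```python
# Minimal sketch (illustration only): the "C(w) = 0" view of language
# membership, with a circuit given as a list of gates over a prime field.
P = 2**61 - 1  # a 61-bit prime field, matching the field size in the talk

def eval_circuit(gates, witness):
    """Evaluate a circuit on witness values; wires[i] holds wire i's value.
    Each gate is (op, in1, in2) and appends its output as a new wire."""
    wires = list(witness)
    for op, a, b in gates:
        if op == "add":
            wires.append((wires[a] + wires[b]) % P)
        elif op == "mul":
            wires.append((wires[a] * wires[b]) % P)
        elif op == "sub":
            wires.append((wires[a] - wires[b]) % P)
    return wires[-1]  # the last wire is the circuit output

# Statement: "I know w with w*w - 9 = 0". Wires 0, 1 are the inputs (w, 9).
gates = [("mul", 0, 0), ("sub", 2, 1)]
assert eval_circuit(gates, [3, 9]) == 0   # w = 3 is a valid witness
assert eval_circuit(gates, [4, 9]) != 0   # w = 4 is not
```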
If we just write out that circuit, we can alternatively view it as many gates with inputs and outputs, as shown here. The inputs of the top level of gates are fed with the individual field elements of the witness w, then one evaluates each of the gates throughout the whole circuit until the output of the last gate has been obtained, and we then consider whether the output of that last gate is zero or not. We can use this formulation when constructing a zero-knowledge proof, where the view is now that the prover knows the circuit C, and so does the verifier, and the prover additionally has the witness w. Now the prover wants to convince the verifier that the statement is true, that is, that C on w evaluates to zero. So the two run an interactive protocol, at the end of which the verifier either accepts or rejects.

This usually comes with three properties that we want from zero-knowledge proofs. The protocol is complete: if w is indeed a witness for C, then an honest verifier will always accept. Additionally we have soundness, meaning that if the statement is not true, then the prover cannot convince the verifier. In this work we use knowledge soundness, meaning that if a prover can convince the verifier with decent probability, then one can extract a true witness w for the statement. And a protocol is zero-knowledge if all the verifier learns is whether the statement is true or not; in particular, it does not learn any information about the witness w.

A well-known technique for constructing a zero-knowledge proof is the so-called commit-and-prove paradigm. Here one uses homomorphic commitments in order to construct a zero-knowledge proof, and the starting point is the protocol due to Cramer and Damgård, which uses commitments based on discrete-log assumptions.
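Discrete-log-based linearly homomorphic commitments of the kind Cramer and Damgård use can be illustrated, with toy parameters far too small to be secure, as Pedersen-style commitments Com(x; r) = g^x · h^r: multiplying two commitments yields a commitment to the sum of the committed values, which is exactly the homomorphism commit-and-prove relies on. The group, generators, and values below are my own toy choices, not from the paper.

```python
# Toy Pedersen-style commitment (illustration only; parameters are far
# too small to be secure). Com(x; r) = g^x * h^r mod p, with g, h of
# prime order q in Z_p^*.
p, q = 23, 11     # q divides p - 1; exponents live mod q
g, h = 2, 13      # h = g^7 here; in practice nobody may know this discrete log

def commit(x, r):
    return (pow(g, x % q, p) * pow(h, r % q, p)) % p

# Linear homomorphism: Com(x; r) * Com(y; s) = Com(x + y; r + s), so the
# verifier can derive a commitment to a linear function without interaction.
x, r = 4, 5
y, s = 7, 2
lhs = (commit(x, r) * commit(y, s)) % p
rhs = commit((x + y) % q, (r + s) % q)
assert lhs == rhs
```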
So let's assume that we have a circuit C and we want to convince the verifier that C applied to some witness w is zero. The prover commits to the individual elements of the witness w, all of them as individual commitments; it then computes the outputs of all the intermediate gates and proves in zero-knowledge that the commitments to the outputs of each gate are consistent with the inputs of that gate. By recursively applying this proof for each of the gates, we get in the end that the output can be opened to zero if and only if the witness was actually valid for the circuit C.

This commit-and-prove approach only needs linearly homomorphic commitments, and this is well known. What this means is that the verifier can compute a commitment to alpha·x + beta·y + gamma from commitments to x and y and publicly known values alpha, beta and gamma, without any interaction with the prover. So the parties do not have to communicate in order to transform commitments into linearly related commitments. If we want to evaluate linear gates, or prove anything about the output of a linear gate, then we only have to apply this homomorphism and do not need any interaction to prove the relation. For multiplication gates, what one usually does is the following: the prover commits to a random multiplication triple as well as to the product z that should come out of the multiplication gate, and then one uses the so-called Beaver circuit randomization technique to prove that the commitment to z opens to the product of x and y, using the multiplication triple that the prover additionally committed to. These are the general ideas which you can also find in the Mac'n'Cheese protocol, although we alter them in order to obtain a highly efficient zero-knowledge proof system.
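Beaver's circuit randomization can be sketched at the level of the committed values as follows (a plain-value illustration of the arithmetic with the commitment layer omitted; the variable names are mine, not the paper's). Given a committed triple (a, b, c) with c = a·b, the parties open epsilon = x − a and delta = y − b, and the claim z = x·y then reduces to a relation that is linear in the remaining committed values, which the homomorphism handles.

```python
# Sketch of Beaver circuit randomization over a prime field (commitment
# layer omitted; only the arithmetic of the check is shown).
import random

P = 2**61 - 1

def beaver_check(x, y, z, a, b, c):
    """Return True iff z = x*y, assuming the triple satisfies c = a*b.
    Only eps and delta are opened; the final expression is linear in the
    committed values z, c, b, a, so no further interaction is needed."""
    eps = (x - a) % P      # opened: masks x with the random a
    delta = (y - b) % P    # opened: masks y with the random b
    return (z - c - eps * b - delta * a - eps * delta) % P == 0

x, y = 12345, 67890
a, b = random.randrange(P), random.randrange(P)
c = (a * b) % P
assert beaver_check(x, y, (x * y) % P, a, b, c)          # honest product passes
assert not beaver_check(x, y, (x * y + 1) % P, a, b, c)  # wrong product fails
```

Expanding the final expression shows it is identically zero exactly when z = x·y, since xy − ab − (x−a)b − (y−b)a − (x−a)(y−b) = 0.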
The foundation of our proof system is so-called vector oblivious linear evaluation, where oblivious linear evaluation, or OLE for short, can be seen as a generalization of oblivious transfer to an arbitrary field F. Here the sender inputs alpha and beta into an OLE box, while the receiver inputs r and obtains the correlation alpha·r + beta. The sender learns nothing about r, while the receiver learns nothing beyond this linear relation. Vector OLE is a version of OLE where, across multiple oblivious linear evaluations, all instances share the same alpha. There exist very fast instantiations, based on the work of Boyle et al., that allow you to obtain lots of VOLE correlations; these in particular need very little communication, approximately 0.4 bits per correlation, and a small amount of computation, approximately 80 to 85 nanoseconds per vector OLE instance.

In our work we instantiate the commitments using linearly homomorphic MACs, which are built from this vector OLE. So let's say we have a random vector OLE instance where the prover inputs r, while the verifier inputs alpha and beta. The prover additionally obtains the value m, which we now consider as an information-theoretic MAC on the message r under the key (alpha, beta). This on its own is already linearly homomorphic over F, as one can easily verify. Now, how can this be used as a commitment scheme? Assuming we can make lots of these correlations for random r, then in order to commit to a fixed value x, the prover simply sends the difference between x and the random value r to the verifier.
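The MAC and the commit step just described can be sketched as a toy one-process simulation (my own illustration; in the real protocol the prover of course never sees alpha or beta). The prover holds (r, m) with m = alpha·r + beta; to commit to x it sends d = x − r, and the verifier adjusts its local key to beta' = beta − alpha·d, after which m = alpha·x + beta' holds, so m is now a MAC on x.

```python
# Toy one-process simulation of a VOLE-based commitment (illustration,
# not the paper's code): prover holds (r, m), verifier holds (alpha, beta),
# and the VOLE correlation guarantees m = alpha*r + beta.
import random

P = 2**61 - 1

alpha = random.randrange(P)              # verifier's global key
beta = random.randrange(P)               # verifier's local key
r = random.randrange(P)                  # prover's random value
m = (alpha * r + beta) % P               # prover's MAC from the VOLE

# Commit to a fixed x: prover sends d = x - r; verifier updates its key.
x = 42
d = (x - r) % P                          # the only message sent
beta_prime = (beta - alpha * d) % P      # verifier's local adjustment
assert m == (alpha * x + beta_prime) % P # m now MACs x under (alpha, beta')

# Zero-check special case: committing to 0 the same way, the MAC itself
# equals the adjusted key, so the prover only needs to send m0.
r0, beta0 = random.randrange(P), random.randrange(P)
m0 = (alpha * r0 + beta0) % P
d0 = (0 - r0) % P
beta0_prime = (beta0 - alpha * d0) % P
assert m0 == beta0_prime
```

Opening a commitment is then just sending (x, m) and letting the verifier recheck m = alpha·x + beta'.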
The verifier locally updates his value beta, and one can easily see that by adjusting beta, both parties together now hold a correlation which MACs the value x using alpha and beta-prime as keys. In order to open a commitment, the prover simply sends x as well as the MAC m to the verifier, who checks the relation with alpha and beta-prime. This is only as binding as the size of the field, so if we want to use this with, for example, bits, then one would either have to repeat this multiple times or use subfield vector OLE, where r comes from a subfield of the field that alpha and beta come from. One can also easily see that this is a designated-verifier scheme: the verifier has some secret randomness, namely alpha and beta, and if the prover would ever get hold of alpha and beta then it could forge proofs, so the verifier must keep these secret.

Using these information-theoretic MACs, we can now outline a simple version of the Mac'n'Cheese protocol. The prover initially commits to all the elements of the witness, then evaluates the circuit gate by gate and proves that all the gate outputs are committed correctly. For the linear gates, proving this is easy, because we can use the linear homomorphism of the information-theoretic MACs. To prove that multiplications are done correctly, the prover runs an optimization of Beaver's approach: in addition to the output of the multiplication, it commits to an auxiliary value c which is derived from a random value a (remember that random commitments can be done cheaply with vector OLE), where a is multiplied with one of the terms of the multiplication that we want to verify. The verifier then sends a random challenge, which is used to create and open a linear relation, and the prover additionally shows that this linear relation evaluates to zero. The good thing about zero checks is that one only has to send the MAC, because the verifier already knows what to expect, namely that the committed value must be zero. In particular, the verifier then only has to check whether the value being sent is the same as the value it already has, namely the value beta. This simple multiplication check sends three field elements per multiplication, as one can easily verify, but this only holds for large fields; for small fields the overhead may be even bigger.

So the question is whether we can do multiplication verification with less overhead, and we show that this is indeed possible by using the recursive dot-product check of Boneh et al. This increases the round complexity for checking multiplications from a constant number of rounds to something that is logarithmic in the number of multiplications that we want to check; this can again be brought down to a constant using the Fiat-Shamir transform. The communication improves to 1 + epsilon field elements per multiplication, or per verified AND relation, where the constant epsilon is 0.008 for one million multiplications, or one million ANDs, verified at once. If you want to learn in more detail how this actually works, I'd like to refer you to the paper.

As already mentioned in the beginning, a special feature of Mac'n'Cheese is that it can deal with nested disjunctions more efficiently, so I will now explain what these are and how we handle them in our proof system. For a disjunction, we consider that there are multiple sub-circuits, in this example c1 and c2, and our circuit will evaluate to zero, meaning the witness is valid, if the first circuit evaluates to zero, or the second circuit evaluates to zero, or maybe both of them at the same time. Usually, if we just feed such a circuit into a proof system, then we would have to spend communication on both of these sub-circuits. What we want to achieve is that we only communicate information that is proportional to the longest branch. That means that if, for example, we have 100 parallel branches and only
one of them evaluates to zero, then our proof only needs to communicate for one of the 100 branches, and we save a factor of 100 in communication. The key observation that we make is that the prover's messages that have to be sent in order to prove that one of the branches evaluates to zero all appear to the verifier as random field elements. Whether you prove that a multiplication is correct or commit to an additional value, all the verifier ever sees are random field elements. So what we do is the following: since the verifier cannot distinguish between the individual branches when messages come in, we only send it messages that can be used to evaluate the true branch; but since it does not know which branch these messages are for, it will just use the same messages for all the branches at the same time. So in our example, if the first circuit is the true one, we only send the messages for the first circuit, but the verifier will also try to verify the second sub-circuit using them, and it will never notice that it is just getting garbage there.

Now, in more detail, how do we do disjunctive proofs in Mac'n'Cheese? Let's assume we have m different clauses c1 to cm, where there is a witness that makes at least one of them, call it ci, evaluate to zero. As mentioned, the prover sends the messages for evaluating this one branch ci. The verifier sends random challenges for all the branches, as before. Both parties then evaluate each branch locally based on the random challenges from the verifier, and this implicitly defines the commitments to the outputs of all of the branches. One can prove that for any branch which the witness does not make evaluate to zero, one definitely gets something other than zero, even when evaluating on garbage messages, so the prover cannot use this to cheat in the process. So one of these commitments must be zero; the others are random, but the prover can determine what they actually are. The parties then use an OR proof which shows that there exists one of these commitments that is actually zero; this is a standard technique, used already by Cramer, Damgård and Schoenmakers in '94 for sigma protocols, which we adapt to commit-and-prove style systems.

The overall communication of this scales with the length of the longest branch, plus an additional O(m) field elements which are necessary in order to perform the OR proof. Ignoring the O(m) for a second, this improves over the naive approach by a factor of m, since in the naive approach we would have to prove all branches towards the verifier. One can also modify this approach to support statements where the prover shows that k out of the m sub-circuits evaluate to zero; here one replaces the 1-out-of-m OR proof with the threshold OR proof of CDS '94. Again, this improves the overall communication to essentially k times the longest sub-circuit, whereas the naive approach needs to communicate all of the sub-circuits. We can also prove disjunctions inside disjunctions, so we can nest our disjunction proofs into each other, with multiple layers at the same time.

As mentioned, in order to prove such a disjunction we have to communicate something that scales with the longest branch of the disjunction, which seems unavoidable; but additionally we have to communicate m field elements in order to do the OR proof, where m is the number of branches. Now let's see how one can reduce this overhead. To understand what is going on, let's say that instead of doing an OR proof, we do an alternative proof where we show that the product of all the output commitments of the individual branches is actually zero. This is equivalent, at least for a simple disjunction, to an OR proof, because only if at least one of these values y_i is actually zero will one get
to a value that is actually zero. Now, why can this be done with less overhead? Let's assume we have two branches, where the first output y1 is actually zero and the second, y2, is some random value. Using just one multiplication we can do the OR proof: we multiply y1 with y2 and prove that the product is actually zero. If we have more branches, say y3 and y4, we do the same thing, but observe that combining y1 and y2 with a multiplication gate forms a sub-circuit, and y3 and y4 together with their multiplication gate is again a sub-circuit. So we can recursively apply the technique from before: we take the output of the left multiplication gate, use it in the simulation of the right branch, and do a multiplication proof on both of these branches. Again this evaluates to zero if one of the two is actually zero, and the nice thing is that we only had to communicate something for the true branches of this circuit, whereas for the top-right multiplication gate we did not actually have to send information. If one does the math, this means that one only has to send something that scales logarithmically in the number of branches, instead of linearly.

As mentioned in the introduction, we implemented Mac'n'Cheese, and I now also want to report on the efficiency of our proof system. Mac'n'Cheese has been implemented in Rust by us, both the offline phase, meaning the vector OLE generation, and the online phase, in a non-interactive fashion using the Fiat-Shamir transform. It is implemented on top of the swanky secure computation framework by Galois, and we plan to open-source it as soon as possible, although some parts are currently being re-implemented. Our Mac'n'Cheese implementation supports input from the zkInterface project. We support various fields, such as F_p for the prime p = 2^61 - 1, or also proofs that are just over F_2. Our optimizations, meaning nested disjunctions, are used in our Mac'n'Cheese implementation, and we also have the optimized multiplications which only communicate approximately one field element per multiplication.

We benchmarked the performance of our implementation in a setting where prover and verifier are connected over a network with 95 milliseconds ping and 2.25 megabytes per second of bandwidth. Our benchmarks include the runtimes necessary to pre-compute the vector OLEs. In this setting, Mac'n'Cheese requires 1.5 microseconds to prove one multiplication gate over F_p with p = 2^61 - 1, or 140 nanoseconds to prove one AND gate in a circuit over F_2; both of these use our amortization technique that only communicates approximately one field element per multiplication. To show that our improvement for disjunctions is actually visible in terms of communication: if we have multiple branches, each consisting of one billion AND gates, then in order to prove eight of these branches we only have to send 75 bytes more than when we prove a single branch. At the same time, the proof runtime still increases, because the prover and the verifier still have to evaluate all of the branches locally, but we get this improvement in terms of communication.

Now, with all of this, let's put Mac'n'Cheese into perspective with respect to other works that do large-scale zero-knowledge proofs. Stacked Garbling was introduced by Heath and Kolesnikov and uses garbled circuits in order to prove statements about circuits over F_2; here we can see that Mac'n'Cheese communicates a lot less per AND gate, while they also support disjunctions as we do. Wolverine can possibly also be extended to support disjunctions, and the performance of Wolverine is comparable, in terms of communication and round complexity, to our simple Mac'n'Cheese protocol, meaning without the
amortized proofs. Line-Point Zero-Knowledge, introduced by Dittmer et al., has efficiency comparable to our batched Mac'n'Cheese approach in terms of communication over large fields. Line-Point Zero-Knowledge has been further developed into the QuickSilver protocol, which has performance comparable to ours: they get rid of the epsilon factor that we have and achieve twice as much throughput, in terms of millions of multiplications per second, as we do. But to the best of our knowledge, QuickSilver cannot, at least not straightforwardly, be extended to also support the nested disjunctions that Mac'n'Cheese has.

In summary, Mac'n'Cheese is a designated-verifier zero-knowledge proof system that is particularly suitable for large-scale zero-knowledge, i.e., zero-knowledge for large circuits. We use vector OLEs in order to instantiate information-theoretic MACs and obtain a commit-and-prove protocol. Essentially, most of the communication flows only from the prover to the verifier, so we get to a constant number of rounds using the Fiat-Shamir transform, and we can optimize disjunctions in order to reduce communication, though unfortunately not computation. Our communication is approximately one field element per multiplication gate for any field, and, as mentioned, we have a highly efficient implementation that is as efficient as the state of the art. If you want to learn more, I'd like to refer you to the paper, or if you have questions, feel free to contact us either during the online talk at CRYPTO or by email. And with this, I'd like to thank you for your attention.