Hello everyone, thank you for joining this pre-recorded talk for our paper titled "Banquet: Short and Fast Signatures from AES", presented at PKC 2021. This is joint work with Carsten Baum, Daniel Kales, Emmanuela Orsini, Peter Scholl and Greg Zaverucha. In this video, I'll first go through some key facts of the paper, then I'll explain the idea of zero-knowledge proofs of knowledge from MPC. I will then present our technical contribution for inverse verification, which enables us to build the Banquet signature scheme. Finally, I'll touch on some implementation details.

First, the key facts. Our Banquet signature scheme is built using an MPC-in-the-head-based zero-knowledge proof of knowledge, to which we apply the Fiat-Shamir transform to achieve non-interactivity. Its EUF-CMA security is based on the one-wayness of the AES block cipher, with a slightly modified key generation algorithm, in the random oracle model. One key feature is that it requires no public-key assumptions, which is the same as for other MPC-in-the-head-based signature schemes and makes it quite attractive. It is work that follows the same line as Picnic, which is now in its third iteration as a NIST alternate candidate, and I will highlight here that the Picnic scheme is based on the LowMC block cipher, which has about 600 AND gates. It is also a continuation of previous work called BBQ, which was the first attempt to instantiate Picnic-style signatures with the AES block cipher, which has about 11 times more AND gates than LowMC. Our paper presents several improvements over these two previous works. Not only do we rely on a better assumption, in the sense that we use AES instead of LowMC, but we also achieve better performance in both signature size and speed. Here are some numbers; I've highlighted the Picnic3 and the Banquet results. We see that our latest implementation achieves almost comparable running times, only one millisecond more than Picnic3, for only seven kilobytes more in signature size, considering that we use a circuit that is 11 times bigger. We can further reduce our signature size by increasing the parameter N, the number of parties, which I'll explain later, but then we have to pay with increased computation time.

Now I will describe the general idea behind the proof-of-knowledge framework. To build a zero-knowledge proof of knowledge from MPC, we aim to prove claims of this form: that we know a witness such that a circuit, evaluated on a public statement and this witness, outputs one. The proof of this claim is the ability to simulate an N-party MPC protocol which computes this circuit. In a simplified way, we can say that the prover generates and commits to the views of these N parties. The verifier then asks to see some of them and checks that they are consistent, so that each message sent by one party to another was indeed received correctly and without modification, and it also checks that the circuit outputs the correct value. The soundness of the zero-knowledge proof is guaranteed by the probability that the verifier sees inconsistent views: if the prover cheated in the construction of one of these views, there is a high probability that the verifier will notice. The zero-knowledge property comes from the semi-honest security of the MPC protocol: because the verifier opens only part of the views, the secret sharing prevents any knowledge leakage.
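To make this commit-and-open pattern a bit more concrete, here is a minimal Python sketch of my own; it is illustrative only and not the actual Banquet protocol. The prover additively secret-shares a witness, commits to each party's "view" (reduced here to just its share), and the verifier opens all but one view and checks the commitments. The parameter N, the modulus and the hash-based commitment are hypothetical choices.

```python
# Minimal sketch of the MPC-in-the-head commit-and-open pattern (illustrative
# only, not the actual Banquet protocol; the party "views" here are just shares).
import hashlib, secrets

N = 16                      # number of simulated parties (hypothetical value)

def commit(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def share(witness: int, n: int, modulus: int = 2**64):
    """Additively secret-share 'witness' into n shares mod 'modulus'."""
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    shares.append((witness - sum(shares)) % modulus)
    return shares

# Prover: share the witness and commit to every party's view.
witness = 0xC0FFEE
views = share(witness, N)
salts = [secrets.token_bytes(16) for _ in range(N)]
commitments = [commit(salts[i] + views[i].to_bytes(8, "big")) for i in range(N)]

# Verifier: challenge all but one party; check the openings are consistent.
unopened = secrets.randbelow(N)
for i in range(N):
    if i != unopened:
        assert commit(salts[i] + views[i].to_bytes(8, "big")) == commitments[i]
# The unopened share is what keeps the witness hidden.
```

The unopened share hides the witness, which mirrors the semi-honest security argument above.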
Building signature schemes from such zero-knowledge proofs has been done before. The latest advancement in the Picnic scheme uses the KKW technique, presented at CCS 2018, where the idea is to compute the circuit with an MPC protocol. This circuit uses correlated randomness, i.e. binary masks and multiplication triples, which must be verified: the verifier must be convinced by the prover that this correlated randomness is correct. Here the Picnic scheme uses a cut-and-choose technique, which then allows for a communication-efficient MPC protocol. The drawback is that hundreds of cut-and-choose copies have to be generated for only a few tens to be kept. Not only this, but if you want to compress the proof to three rounds, then the circuit has to be executed for each of the hundreds of cut-and-choose copies, and all of these executions have to be discarded when only tens are kept. To give an idea, in Picnic you have to generate 252 copies of the circuit when only 36 will actually be challenged by the verifier. Now, once this is set up, the circuit that Picnic uses is the LowMC binary circuit over F2, for which we have a plaintext x, a ciphertext y, and a key w, and the circuit evaluates to one if and only if the block cipher produces y on input x with key w.

When we first designed the BBQ signature scheme, the idea was to move away from LowMC and use AES instead. Because AES is 11 times bigger as a binary circuit, we had to change the framework, and we decided to use an arithmetic circuit over F_{2^8} instead. This meant that instead of AND gates, we use inversion gates, which are essentially equivalent to the S-box operation in AES. To perform these inversions, we use a masked inversion technique where the input s is first masked with a random r, which costs one of these pre-processed triples. This masked value is opened and inverted publicly, and r^{-1} is then removed locally by the parties, which produces the inverse of s. This technique has two drawbacks. The first is that it requires the randomness r to be non-zero; since it is random, this fails only with small probability, and the parties simply restart if that happens. The other drawback is that it requires the input to be non-zero, and this is where we modify the key generation of AES to simply require that the plaintext x and the key w of the block cipher be chosen such that this never happens. In practice, this only reduces concrete security by one to three bits.

The next paradigm that we can explore in MPC-in-the-head is to move away from this cut-and-choose for verified randomness, and instead let the MPC protocol itself do the verification. This comes from sacrificing techniques in MPC, where a lot of correlated randomness is created and some of it is sacrificed to verify the rest. Here we have the prover inject the results of multiplications as part of the witness for the relation. This means that there is no need to compute through the circuit anymore; instead, the MPC parties execute a verification protocol. For example, they can sacrifice one suspicious triple to verify another. This leads to efficiency optimizations, such as batching possibilities. So now we can sketch a protocol where the MPC parties receive suspicious multiplication results from the all-knowing prover, and then verify them by sacrificing equally suspicious random triples.
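As a toy illustration of the masked-inversion step described above, here is a Python sketch over GF(2^8). The share handling is simplified and the triple-based product s*r is opened directly, so this is an illustration of the idea rather than the real BBQ/Banquet sub-protocol; the party count n and the example input are arbitrary.

```python
# Sketch of the masked-inversion idea (illustrative; the product [s*r] would
# come from a pre-processed triple in the real protocol).
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """Inverse via a^(2^8 - 2) = a^254 (a must be non-zero)."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result

def additive_shares(x: int, n: int):
    """XOR-based additive sharing of x among n parties."""
    shares = [secrets.randbelow(256) for _ in range(n - 1)]
    last = x
    for sh in shares:
        last ^= sh
    return shares + [last]

n, s = 4, 0x53                       # S-box input s, assumed non-zero
r = secrets.randbelow(255) + 1       # random non-zero mask
r_shares = additive_shares(r, n)

# Open the masked value and invert it publicly.
opened = gf_mul(s, r)
public_inv = gf_inv(opened)          # (s*r)^(-1) = s^(-1) * r^(-1), now public

# Each party multiplies its share of r by the public constant, removing r^(-1)
# and leaving an additive share of s^(-1).
inv_shares = [gf_mul(public_inv, ri) for ri in r_shares]
reconstructed = 0
for sh in inv_shares:
    reconstructed ^= sh
assert reconstructed == gf_inv(s)
```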
This would result in a protocol which requires the communication of four elements for each nonlinear gate of the circuit, plus one extra element. However, it also leads to a protocol which inherently has more than five rounds, which means that non-interactive soundness cannot be proven straightforwardly. And this first way of doing inverse verification is quite naive, so now I'm going to describe how we can improve on it. First, to describe the naive approach a little more: in the context of inversions, the prover injects these suspicious inverses into the MPC-in-the-head executions. That means that at the end, the parties end up with m pairs, each of which should allegedly multiply to one. The naive protocol is then to cast each of these pairs as a multiplication triple with expected output one and verify each of these multiplications with a random triple, and this leads to the communication cost of four times the size of the circuit.

The beginning of our improved check uses two polynomials, S and T, defined by interpolating the inputs s and the injected inverses t. The polynomial P is defined as their product. By setting the evaluation of P on the points one to m to be equal to one, we encode the relation that we are verifying; if we wanted to verify a different relation between S and T, this is what we would change. What we are left to do is for the proof to check that P is indeed equal to the product of S and T. To do this, we can sample a random element of the field, open the three polynomials at that point, and check that the equality holds there. Our soundness bound comes from the Schwartz-Zippel lemma, which says that a non-zero polynomial has a bounded probability of evaluating to zero at a random point. In our case, the polynomial Q is defined as P minus S times T, which is non-zero if one of the inverses t has been wrongfully injected, i.e. is not actually the inverse of the corresponding input. One thing to note here is that opening the values of S and T at a random point would leak information, because the polynomials are not randomized. So we do exactly that: we add an additional random point to each polynomial to prevent this leakage. We can then bound the soundness of this part of the verification by noting that P is of degree 2m, so if we choose a field which is much larger than the number of gates in the circuit, this soundness bound becomes negligible.

So here is our improved protocol: we commit to a randomized polynomial S, which contains the inputs, and a polynomial T, which contains the inverses. P can be defined on its first m points in the clear, because we expect these first m evaluations to be equal to one; however, the prover needs to specify the remaining m+1 points, because P is of degree 2m. This adds an additional communication cost, and the parties can then open P, S and T evaluated at the random point R, which has a further constant cost of three field elements. In total, we see that the communication cost is only about two times the number of gates, so we've almost halved the cost of the communication already. And we've removed any kind of cut-and-choose, so we're not executing the circuit wastefully anymore, and we don't have any triples. Except that we do: one is hidden in the evaluation at the extra point required for randomization.
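Here is a toy version of this polynomial product check. For readability it works over a large prime field rather than the binary extension field the scheme actually uses, and the modulus, point choices and circuit size m are illustrative only.

```python
# Toy version of the polynomial product check (illustrative; the real check
# works over a binary extension field and on secret-shared values).
import secrets

P_MOD = (1 << 61) - 1          # a large prime field, chosen for illustration

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (low degree first)."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % P_MOD
    return res

def poly_eval(coeffs, x):
    """Horner evaluation of a coefficient list at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def interpolate(xs, ys):
    """Lagrange interpolation, returning a coefficient list (low degree first)."""
    n = len(xs)
    result = [0] * n
    for i in range(n):
        num, denom = [1], 1
        for j in range(n):
            if j == i:
                continue
            num = poly_mul(num, [(-xs[j]) % P_MOD, 1])
            denom = denom * (xs[i] - xs[j]) % P_MOD
        scale = ys[i] * pow(denom, -1, P_MOD) % P_MOD
        for k, c in enumerate(num):
            result[k] = (result[k] + scale * c) % P_MOD
    return result

m = 4
s_vals = [secrets.randbelow(P_MOD - 1) + 1 for _ in range(m)]
t_vals = [pow(s, -1, P_MOD) for s in s_vals]          # honest injected inverses

# One extra random point on each polynomial prevents leakage when opening.
xs = list(range(1, m + 2))
S = interpolate(xs, s_vals + [secrets.randbelow(P_MOD)])
T = interpolate(xs, t_vals + [secrets.randbelow(P_MOD)])
P = poly_mul(S, T)                                     # degree-2m product

# The relation "each pair multiplies to one" shows up as P(i) = 1 for i = 1..m.
assert all(poly_eval(P, i) == 1 for i in range(1, m + 1))

# Verifier's check: open S, T, P at a random point and test the product there.
r = secrets.randbelow(P_MOD)
assert poly_eval(P, r) == poly_eval(S, r) * poly_eval(T, r) % P_MOD
```

A cheating prover would have to commit to a P that differs from S times T as a polynomial, and by Schwartz-Zippel such a P agrees with the product at a random point with probability at most 2m divided by the field size, which is why the field must be much larger than the circuit.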
Here I'll just note that the extra randomness in the polynomial S is added to prevent pairs from interacting with and correcting for each other, but it doesn't add anything to the communication. Now that we have this check, we can improve it further. To see how, let me first rewrite what we are verifying. The horizontal vector here is the S polynomial with the randomizing coefficients added, and what we are essentially checking is that the inner product of this randomized vector with the vector of injected inverses equals the sum of the random coefficients. What we can do is factor our number of nonlinear gates m into m1 and m2, and instead of verifying one inner product of size m, we verify m2 inner products, each of size m1. So we just rearrange our s and t values to fit into these inner products. To do this, we define multiple S and T polynomials to match the vectors I was just describing, but we use a single P polynomial, which is the sum of the products. This leads directly to the generalized verification protocol. The commitments to the T polynomials still cost m elements, but the commitment to the P polynomial now requires m2 elements because its degree has changed. When we open the polynomials evaluated at R, we require more than before because we are evaluating more polynomials. But when we sum it all up, we see that the total cost is equal to the size of the circuit plus some terms in the square root of that size, instead of twice the size. For a high number of gates, this is significant.

So now we can put all of this together into the final signature scheme. The key generation, as I said before, samples an AES key and a plaintext of the appropriate length such that the execution of the AES algorithm doesn't present any zero input to its S-boxes. The BBQ paper already contained an analysis showing that this reduces the security of the assumption by only one to three bits concretely. Then, for signing, we have the parameters m, the number of nonlinear gates; m1, the factorization parameter; N, the number of parties in the MPC; tau, the number of parallel repetitions of the proof used to amplify soundness; and lambda, a lifting parameter. AES is executed over F_{2^8}, but because of the soundness requirement of our check, we need to embed the values we are checking into a much larger field to ensure that the soundness error is negligible. To create a signature, the prover simulates tau parallel instances, each of them with N parties. The parties are run on a sharing of the secret key, and the witness also includes the sharings of the inverses; this is the injection of the nonlinear outputs. We use random oracles to generate the randomizing coefficients, the polynomial opening point, and the selection of the views to open, which leads to a seven-round protocol that we then make non-interactive. Verification is just a recomputation, which is why we achieve very similar runtimes for prover and verifier. We state here our theorem, which shows that the security depends on some random oracle assumptions, some pseudorandom distributions, and most importantly the fact that the function which maps the key to the ciphertext block, the plaintext being public here, is a one-way function. This is where our one-way-function assumption on AES lies.
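Going back to the batched check from the beginning of this segment, the following sketch extends the previous one, reusing its P_MOD, poly_mul, poly_eval and interpolate helpers. The exact placement of the randomizing challenge coefficients differs in the real scheme, so treat this only as an illustration of rearranging one length-m inner product into m2 inner products of length m1 with a single sum-of-products polynomial.

```python
# Toy version of the batched check (continues the previous sketch; P_MOD,
# poly_mul, poly_eval and interpolate are reused from there).
import secrets

m1, m2 = 4, 3                          # factor m = m1 * m2 nonlinear gates
s_vals = [[secrets.randbelow(P_MOD - 1) + 1 for _ in range(m1)] for _ in range(m2)]
t_vals = [[pow(s, -1, P_MOD) for s in row] for row in s_vals]
a_vals = [[secrets.randbelow(P_MOD) for _ in range(m1)] for _ in range(m2)]  # challenges

xs = list(range(1, m1 + 2))            # m1 points plus one extra for randomization
S = [interpolate(xs, [a * s % P_MOD for a, s in zip(a_vals[j], s_vals[j])]
                 + [secrets.randbelow(P_MOD)]) for j in range(m2)]
T = [interpolate(xs, t_vals[j] + [secrets.randbelow(P_MOD)]) for j in range(m2)]

# Single product polynomial of degree 2*m1: the sum of the m2 pairwise products.
P = [0] * (2 * m1 + 1)
for j in range(m2):
    prod = poly_mul(S[j], T[j])
    for k, c in enumerate(prod):
        P[k] = (P[k] + c) % P_MOD

# If every pair multiplies to one, P at each of the first m1 points equals the
# sum of the challenge coefficients used at that point.
for k in range(1, m1 + 1):
    expected = sum(a_vals[j][k - 1] for j in range(m2)) % P_MOD
    assert poly_eval(P, k) == expected

# Verifier's single random-point check, as before.
r = secrets.randbelow(P_MOD)
lhs = sum(poly_eval(S[j], r) * poly_eval(T[j], r) for j in range(m2)) % P_MOD
assert poly_eval(P, r) == lhs
```

With m1 and m2 both around the square root of m, the per-repetition cost of committing to the extra polynomial points and openings grows only with the square root of the circuit size, which is the saving described above.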
Finally, to discuss the implementation a little. To explain our parameter selection, we consider that an attacker can cheat by resampling the challenges of each intermediate round until they match a guess it has made. Say that the adversary guesses the first-round challenge correctly in tau_1 of the parallel executions and the second-round challenge in tau_2 more; it then has to guess the remaining tau_3 third-round challenges to win all of the executions. If we let P_i denote the probability that it has correctly guessed the challenges of round i, we can write the cost formula for a given strategy. We then simply require that this cost be greater than two to the kappa, for our security parameter kappa, for every possible strategy, and we increase our parameters until this is reached. We furthermore found that choosing m1 to be about the square root of m gives an appropriate balance between speed and signature size.

This table demonstrates the variation in performance as we vary the parameters; these are all for AES-128, and below you can see the factorization of the number of S-boxes that we use. We see that by increasing N, the number of parties, we of course increase the computation time, because the prover has to simulate more parties, but we also reduce the signature size. Each line further divides the results between the different lambda values, which determine how far we lift the values during the check. This affects soundness, and we see that the required number of parallel repetitions tau differs as well.

We've optimized the implementation in different ways. First, we noticed that all interpolation points have the same x-coordinates, so we can pre-compute some coefficients, as in the sketch below. Also, instead of interpolating polynomials for each party, the prover, who has an overview of everything, can reconstruct the values in unshared form and then interpolate, which takes a factor of N out of the computational load. And for the S and T polynomials, the inputs are the same for each parallel execution, they're just shared in different random ways, so we can reuse these common values across executions to further optimize the Lagrange interpolation. All of this reduces the runtime by almost 30 times, and with some further improvements we reach the runtimes we've reported. Finally, I've included a table which compares our Banquet scheme, in the three bottom lines, with some other state-of-the-art post-quantum signature candidates in terms of signature size. As before, we see that we are competitive with Picnic3 in runtime and really not far off in signature size, especially with our higher numbers of parties. And I'll highlight again that we use the AES circuit, which is much more complex than Picnic's LowMC. So thank you for your attention during this video. Here is the ePrint number for our paper, and please feel free to contact me or my co-authors if you have further questions.
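As a small illustration of the fixed-x-coordinate optimization mentioned above, here is a sketch that again reuses the prime-field helpers P_MOD, poly_mul and poly_eval from the earlier sketches; the real implementation works over binary extension fields and on per-party shares. The Lagrange basis polynomials are precomputed once, so every subsequent interpolation becomes just a linear combination of y-values.

```python
# Sketch of precomputing Lagrange coefficients for fixed x-coordinates
# (continues the earlier sketches; P_MOD, poly_mul and poly_eval are reused).
def lagrange_basis(xs):
    """Precompute the Lagrange basis polynomials for fixed x-coordinates."""
    basis = []
    for i in range(len(xs)):
        num, denom = [1], 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            num = poly_mul(num, [(-xj) % P_MOD, 1])
            denom = denom * (xs[i] - xj) % P_MOD
        inv = pow(denom, -1, P_MOD)
        basis.append([c * inv % P_MOD for c in num])
    return basis

def interpolate_fast(basis, ys):
    """Interpolation is now just a linear combination of the precomputed bases."""
    coeffs = [0] * len(basis[0])
    for b, y in zip(basis, ys):
        for k, c in enumerate(b):
            coeffs[k] = (coeffs[k] + y * c) % P_MOD
    return coeffs

# The basis depends only on the x-coordinates, so it is computed once and then
# shared across interpolations (and, in the scheme, across parallel repetitions).
xs = list(range(1, 6))
basis = lagrange_basis(xs)
poly = interpolate_fast(basis, [3, 1, 4, 1, 5])
assert all(poly_eval(poly, x) == y for x, y in zip(xs, [3, 1, 4, 1, 5]))
```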