Hello everyone, this is Dragos, and today Sameer and I are going to talk about maliciously secure matrix multiplication with applications to private deep learning. This was joint work with Hao, Miran, Yongsoo, and Sameer, and it started about a year ago while Sameer and I were interning at Microsoft Research in Redmond. Good times.

So what is multi-party computation? We have a bunch of parties, Alice, Bob, and Charlie, who want to compute a function over their inputs while keeping those inputs private. That is basically what multi-party computation allows you to achieve. In the realm of private machine learning, we can replace Alice's and Bob's inputs with models and Charlie's input with a picture. In this case, Charlie would like to get the prediction of Alice and Bob's model without revealing his beautiful cat. How are they going to do that? Using multi-party computation, obviously.

So what does it mean to compute f(A, B, C)? Well, for machine learning it boils down to a bunch of private matrix multiplications and private comparisons. In this paper we focus on obtaining private matrix multiplications more efficiently. Think of it like this: Alice and Bob hold a secret matrix A, Charlie holds another secret matrix B, and they want to compute the product. To give you a rough idea of what is happening, our result is that we obtain more efficient convolution layers in the dishonest majority setting.

Throughout this talk we work with additive secret sharing. What does that mean? The model you saw on the previous slide is additively secret shared between Alice and Bob, such that each individual share reveals nothing about the model; only by combining the two shares can they reconstruct it. The same goes for the grumpy cat: each party holds a share of the grumpy cat and learns nothing from it, but if they put their shares together they can reconstruct the grumpy cat.

We focus on the land of dishonest majority. In our running example, the model is ResNet-50, which is a big convolutional network, and we want the prediction of this secret model on a secret image. Classically, with known techniques, evaluating that model on the secret image would take around five terabytes of communication between the parties; we show it can be done with around 41 gigabytes. We believe this is quite cool.

There was some prior work in this land of multiplying secret matrices. The first one we recall is SecureML. There, we have two parties, both of which actually follow the protocol, and they want to multiply two matrices; they use this to do training and also predictions on secret models. Their idea was: if you want to evaluate a machine learning model, you go to the basic building block, which is multiplying two matrices. So how do you multiply matrices?
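To make additive secret sharing concrete, here is a minimal Python sketch. It is my own toy illustration over a prime field, not code from the paper:

```python
import secrets

P = 2**61 - 1  # a prime; shares live in the field Z_P

def share(x, n_parties=2):
    """Split secret x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Combine all shares; any strict subset is uniformly random."""
    return sum(shares) % P

cat_pixel = 42
alice, bob = share(cat_pixel)
assert reconstruct([alice, bob]) == cat_pixel
```

Each share on its own is a uniformly random field element, which is exactly why a single share reveals nothing about the cat.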
Well, if you manage to create a random matrix triple, you can multiply two secret matrices very efficiently. What I mean by a random matrix triple is that the parties hold secret shares of matrices A, B, and C such that, if they sum their shares, they can reconstruct the secrets, and C = A × B, where A and B are chosen uniformly at random from a field or ring. So you can perform secret matrix multiplication quite fast using random matrix triples.

Now, there is also the case where we have more than two parties: we can have three parties, and with an honest majority, matrix multiplication can be done quite fast. Why? Because dot products are very efficient in the honest majority setting. Suppose you have two very big vectors: their dot product can be computed with constant communication overhead, independent of the length of those vectors. That is the main idea people use to obtain matrix multiplication with an honest majority. There is also honest majority with active security, tolerating at most one corrupt party that can arbitrarily deviate from the protocol; that too can be done using dot products, because dot products are efficient with an honest majority. Lots of recent work has focused on making these dot products as efficient as possible.

But what about dishonest majority? We realized there is not much research on how to generate matrix triples with a dishonest majority, meaning that all but one of the parties can arbitrarily deviate from the protocol, and we still want to protect the honest party's share. What we do, in a nutshell, is take the idea from SecureML but apply it to convolution triples: instead of generating matrix triples, we generate convolution triples. We add zero-knowledge proofs on top of that, the newer variant of zero-knowledge proofs for homomorphic encryption by Baum et al., and we incorporate new homomorphic-encryption-based techniques for matrix multiplication from CCS two years ago.

In our paper we work in the preprocessing model. What does that mean? If a bunch of parties, the same parties you saw on the last slides, want to compute a function over their inputs, they can split the process into two phases. Before those two phases, you first unroll the function f into a series of additions and multiplications. Then, whenever you go through f and encounter a multiplication, you need to have generated some sort of correlated randomness, for example Beaver triples.

What is an additively shared Beaver triple? Each party holds random numbers a_i, b_i, c_i such that, if you sum the shares of a to reconstruct a, sum the shares of b to reconstruct b, and multiply the two, you get the reconstruction of c. This is a crucial point: when you do MPC in the preprocessing model, you need to use these Beaver triples. How? For every multiplication you perform, you fetch a fresh Beaver triple and use it, as in the sketch below.
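Continuing the toy sketch from above (same share/reconstruct helpers and modulus P; again a toy, not the paper's protocol), this is the classic way one Beaver triple is consumed to multiply two shared values:

```python
def beaver_mul(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Consume one Beaver triple (a, b, c) with c = a*b to compute x*y."""
    # In the real protocol each party broadcasts x_i - a_i and y_i - b_i;
    # summing these opens the masked values e and d, which leak nothing
    # about x and y because a and b are uniformly random.
    e = (reconstruct(x_sh) - reconstruct(a_sh)) % P
    d = (reconstruct(y_sh) - reconstruct(b_sh)) % P
    # Each party computes its share of the product locally, no interaction.
    z_sh = [(ci + e * bi + d * ai) % P
            for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + e * d) % P  # one designated party adds the public term
    return z_sh

x_sh, y_sh = share(6), share(7)
a, b = secrets.randbelow(P), secrets.randbelow(P)
a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
assert reconstruct(beaver_mul(x_sh, y_sh, a_sh, b_sh, c_sh)) == 42
```

The matrix version uses exactly the same identity with matrix products; the expensive part, and the subject of this work, is producing the correlated (A, B, C) cheaply in the first place.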
We are not going to talk in detail about how you use it, but the high-level idea is: consume a correlated random Beaver triple from the preprocessing phase during the online phase. We call it the online phase because that is when the parties get together and share their inputs; it is the input-dependent phase, while the preprocessing phase is the input-independent phase.

Okay, so what else do we need to understand the contributions? Homomorphic encryption. What does that mean? We have a party, Charlie, who has some data, and Charlie can send this data to the cloud by first encrypting it and then sending over the ciphertext. The cloud only sees garbage, because the data is encrypted. The cloud can then perform some computation, transforming this ciphertext into another ciphertext, such that in the end it decrypts to some function f applied to the plaintext. So the process goes like this: I encrypt my data and send it to the cloud; the cloud does the computation associated with some function f; and when I receive the ciphertext back, I decrypt it and see f applied to my original data. That is basically homomorphic encryption: you can do plaintext operations by just manipulating the ciphertexts. And now I am going to pass the baton to Sameer. Sameer, can you catch this?

Thank you, Dragos, for the pen. Let me start by giving the main result of this work. Prior work, in particular in the triple generation phase, has a large communication overhead: if you want to generate a matrix triple of size n by n, you require order n³ communication. In this work, we demonstrate how to do this Beaver triple generation for matrices with order n² communication.

So let's take a look at how we do this. I am going to present the triple generation protocol as SPDZ does it, which is the state-of-the-art protocol we build on, and then how we change it. Conceptually, the idea is very simple. Each party locally generates its shares a_1 through a_n and b_1 through b_n, and because of the linearity of the homomorphic encryption scheme, the encryptions of these can simply be added to get encryptions of a and b, the underlying secret values. Given this, to generate a triple we need to compute an encryption of the product a times b, and this is where the multiplicative homomorphism of the encryption scheme comes into play. Each party simply broadcasts its value; since the values are now encrypted, no one can decrypt them. A quick detail here: the encryption scheme is such that the public key is known to all the parties, but the secret key is distributed among them. So all the parties broadcast their encryptions and then compute c = Enc(a) × Enc(b), which, by the homomorphic property, is simply an encryption of a times b. Finally, the last piece of this picture is a distributed decryption protocol, which takes the ciphertext c, the encryption of a times b, and splits it into additive shares c_1 through c_n such that their sum is the decryption of the ciphertext. All of this is prior work.
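To see concretely what "Enc(a) × Enc(b) decrypts to a·b" means, here is a toy Python example using textbook RSA, which happens to be multiplicatively homomorphic. This is only an illustration of the homomorphic property; the actual protocol uses the lattice-based BFV scheme, and textbook RSA used this way is completely insecure:

```python
# Toy textbook-RSA parameters (tiny and insecure, illustration only).
p, q, e = 61, 53, 17
n = p * q                           # public modulus, 3233
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n   # operate on ciphertexts only...
assert dec(c) == a * b      # ...yet the underlying plaintexts got multiplied
```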
The way we change this is conceptually very straightforward. Instead of having a_1, a_2, ..., a_n all be shares of a single number, we have them all be matrices whose sum is the matrix A. And what we change is that the ciphertext product becomes a matrix product: under homomorphic encryption, a ciphertext now encodes a matrix, and we perform a matrix product over these encoded matrices.

The first thing to note is that, since we are working in a dishonest majority protocol, we really have to account for the fact that many of the parties could act maliciously. We need to ensure that the inputs are correct, that the noise used to generate the ciphertexts is bounded, and so on, because violations of these could leak secret keys through the distributed decryption protocol. To achieve this, zero-knowledge proofs are used, and they enforce these properties. Also, since this is a dishonest majority protocol, each of these shares is an authenticated share, so we have MACs on each of them. That is a detail I am going to skip because it is fairly straightforward: you do another ciphertext product with the encryption of the MAC key, which is done publicly, and then run a distributed decryption protocol.

The biggest changes, to some extent, are the following. First, we do the matrix multiplication over ciphertexts for arbitrary-size matrices. This has some technical challenges as well as a number of systems improvements, some of which I will talk about later. The second important contribution is the elimination of the sacrifice step by using a larger-depth homomorphic encryption scheme. In particular, we use a depth-2 homomorphic encryption scheme, so we can simply do a number of public operations one after the other without having to run a distributed decryption or a resharing protocol in between.

Now I am going to move to the zero-knowledge proof part. The starting point is that each party simply encrypts its local value and then broadcasts it. The zero-knowledge proof follows the very standard paradigm of a Sigma protocol, which has three phases: commitment, challenge, and response. The commitment phase is simply where each party broadcasts its own input. For the challenge phase we build on a state-of-the-art protocol, the one from TopGear, where every party simultaneously acts as a prover and a verifier, validating its own encryption while also ensuring the other parties' encryptions are well formed. That is why the challenge randomness is generated jointly, with everyone generating it locally and then combining it; here this is modeled by an F_Rand functionality. Finally, in the response and verification phase, the parties broadcast openings of certain variables, which statistically hide the secret values, and the verification checks certain bounds and so on. And that is the zero-knowledge proof. A sketch of the whole triple generation flow follows.
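Here is a runnable end-to-end sketch of the matrix triple generation flow just described, with a mock standing in for the homomorphic encryption layer. MockHE and dist_dec are stand-ins of my own: MockHE provides no secrecy whatsoever (its "ciphertexts" are the plaintexts), and the zero-knowledge proofs and MACs are omitted; the sketch only shows the protocol structure:

```python
from functools import reduce
import numpy as np

P = 2**16 + 1  # toy plaintext modulus, stand-in for BFV's

class MockHE:
    """Stand-in for BFV so the flow below runs; not an encryption scheme."""
    def enc(self, m): return m % P
    def add(self, c1, c2): return (c1 + c2) % P       # additive homomorphism
    def matmul(self, c1, c2): return (c1 @ c2) % P    # the depth-1 product

def dist_dec(ct, n_parties):
    """Mock distributed decryption: split the 'ciphertext' into n
    additive shares of the underlying plaintext matrix."""
    shares = [np.random.randint(0, P, ct.shape) for _ in range(n_parties - 1)]
    shares.append((ct - sum(shares)) % P)
    return shares

def generate_matrix_triple(n_parties, n, HE):
    # Each party samples a random matrix share and broadcasts its
    # encryption (together with a ZK proof of well-formedness, omitted).
    A_sh = [np.random.randint(0, P, (n, n)) for _ in range(n_parties)]
    B_sh = [np.random.randint(0, P, (n, n)) for _ in range(n_parties)]
    ct_A = reduce(HE.add, [HE.enc(s) for s in A_sh])  # linearity: Enc(A)
    ct_B = reduce(HE.add, [HE.enc(s) for s in B_sh])  # linearity: Enc(B)
    ct_C = HE.matmul(ct_A, ct_B)          # Enc(A) x Enc(B) -> Enc(A @ B)
    C_sh = dist_dec(ct_C, n_parties)      # O(n^2) data exchanged overall
    return A_sh, B_sh, C_sh

A_sh, B_sh, C_sh = generate_matrix_triple(3, 4, MockHE())
A, B, C = (sum(s) % P for s in (A_sh, B_sh, C_sh))
assert np.array_equal(C, (A @ B) % P)
```

Note that every broadcast in this flow is a ciphertext encoding an n × n matrix, which is where the order n² communication comes from.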
The key differences here also include the fact that, for the first time, we use BFV as the homomorphic encryption scheme, and we show that it has certain advantages in particular parameter regimes: we can reduce the communication overhead of the zero-knowledge proof by a small factor, because certain parameters change from being statistically hidden to information-theoretically hidden. For the details, I again defer to the paper.

With that, I am going to spend some time on the evaluation, the experimental results of this work. This slide shows a big chunk of the results in both the LAN and the WAN settings. To briefly explain: on the left we have the matrix sizes, going from 128 × 128 matrix triples up to about 1000 × 1000. On the right we have four columns: total time over 1 core, total time over 16 cores, and then SPDZ over 16 cores and SPDZ over 1 core. The SPDZ columns refer to the prior state-of-the-art protocol, SPDZ, including all the advances that have happened over the years, and the total time columns refer to our protocol; the 16 and 1 refer to implementations over 16 cores versus one core. The top table is the LAN setting and the bottom table is the WAN setting; these are fairly standard network settings, whose details we give in the paper.

Instead of presenting all these results, I am going to focus on a few specific numbers and highlight some of the key takeaways. The first is a fairly straightforward question: what is the performance compared to prior art? In the LAN setting we are about 3.6 times faster. In the WAN setting, where the communication improvements, which is really what this work is about, start playing a bigger role, we are about 36 times faster than prior work. The other cool bit is that these are asymptotic improvements: as we go to larger matrix sizes, which are common, for instance in ResNet, where 1000 × 1000 is a reasonable matrix size, the improvement gets even better, because it is an asymptotic improvement. So that is the first result.

The second result is also very interesting. Comparing across the two tables, we are comparing performance in the LAN setting versus the WAN setting, where the WAN setting has much worse network conditions. Prior art suffers about an 11× slowdown, whereas our protocol sees just a 13% slowdown. This is a really interesting result because it highlights that the bulk of the cost of our protocol comes from computation rather than communication: the only thing that changes between a LAN and a WAN deployment is the communication, because the computation is the same in both. So our protocol is really compute bound, while the prior protocol is communication bound. This has important implications, because computation can be parallelized and can be improved with better hardware, so there is hope there; but communication really depends on the network infrastructure we have, and if we have to send data across the US, say from the east coast to the west coast, we have to incur something like a 70 or 100 millisecond round-trip time, which cannot be improved further, or at least is much harder to improve than compute. So it is a really cool result that the protocol is compute bound, and I think it shows how big an improvement this protocol makes.

With that, I am also going to talk a little bit about the details of the ciphertext-ciphertext matrix multiplication. We use the state-of-the-art protocol from CCS 2018, which essentially transforms a matrix product into something that is homomorphic-encryption friendly, in terms of dot products and additions, where the dot products also involve certain rotations; each of these is implemented as a transformation on the encrypted matrix. This protocol optimally uses the number of slots in the homomorphic encryption scheme, and in that sense it is fairly optimal, and it gave us great performance. However, we also have a number of systems improvements, which are described in detail in the paper. I just want to mention lazy key switching, which is reordering when we do the key switching so as to improve performance; hoisting, which optimizes the case where we have to generate a number of rotations of a ciphertext; and blocking, sketched below, which addresses the following question: if you want to generate a matrix triple of size 1000 × 1000 without blowing up the homomorphic encryption parameters too much, because that slows down performance, how do you do it in smaller chunks, so as to support arbitrary-size matrices? Those are just some of the techniques, and they are presented in detail in the paper.
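As a plaintext illustration of the blocking idea, here is a minimal NumPy sketch of my own (not the paper's code): a big product is decomposed into fixed-size block products, each of which would map to one ciphertext-level matrix product, so the HE parameters only need to accommodate the block size rather than the full matrix:

```python
import numpy as np

def blocked_matmul(A, B, block=128):
    """Compute A @ B as a sum of block x block products."""
    n = A.shape[0]  # assume square n x n matrices with block dividing n
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                # Each block product would be one ciphertext matrix product.
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block])
    return C

A = np.random.randint(0, 100, (512, 512))
B = np.random.randint(0, 100, (512, 512))
assert np.array_equal(blocked_matmul(A, B), A @ B)
```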
With that, I am quickly going to mention some future work on the next slide. This work opens up some really interesting questions. The first is the extension to rings. This work demonstrates triple generation over fields, but rings are much more amenable to efficient implementation, so with works such as SPDZ2k, which operate in the dishonest majority setting, it would be interesting to see how this triple generation idea can be combined with those works. The second really interesting direction is the emerging line of work on silent preprocessing, that is, silently generating this preprocessing material. Can we apply the same idea on the matrix triple side, and in particular, can we generate matrix triples silently? I have a quick reference here to one of the papers, which might be interesting, and any of the authors would be happy to talk about this offline or in the question-and-answer session.

With that, I would like to thank you for your attention, and I am happy to take questions. Our paper is up on ePrint, so feel free to reach out to any of the authors; the email addresses are mentioned there. Thank you.