Hi everyone, so this is a bit of a book report about what happened when I tried to convince some machine learning people that zero-knowledge proofs were very relevant to them. Along those lines, this is joint work with a bunch of machine learning professors.

Okay, so the motivation we came to this problem with is that right now, today, a lot of the least trusted parties in society are running machine learning models that have broad societal impacts. Now, you may think that blockchain is a very low-trust space, but I bet that people trust Twitter even less. And I don't think people trust credit scores very much either. Nonetheless, these things are changing a lot of the ways that we interact in society. A second thread is that machine learning as a service has been growing in popularity, especially recently, with the rise of large foundation models. So in this paradigm, you're maybe making a slide and you want a cool image. So you'll say, hey, OpenAI, I want an image of teddy bears working on AI research underwater. And OpenAI will send you back this beautiful image to put in the slide. Now, the core problem in both of these use cases is that, as the user, you have no insight into what the model runner is actually doing. You have no idea whether OpenAI actually ran DALL-E 2, or whether they just have a farm of digital artists who are being badly exploited. So we want some notion of trust and verification in this type of model inference.

So if you ask the computer scientists, they'll tell you that there are actually many approaches to this beyond ZK, so we had to learn about them. The first one they'll bring up is multi-party computation. In this case, a number of separate parties perform the machine learning operation together. And of course, because they're collaborating, this requires that the parties are online simultaneously. And there's a one-of-n trust model for privacy and correct execution: you only need one of the n parties to actually behave honestly. So it's possible for MPC to provide both privacy and validity. But unfortunately, it requires pretty high interaction and high bandwidth, and at least at present, there's a really high compute overhead if you want to ensure validity. So as a result, when we looked into the literature, it looks like MPC can only handle relatively small models today.

Another technique that's pretty popular is homomorphic encryption. The idea here is you take your input data, encrypt it, and send it to some server that runs a machine learning model on the encrypted data without learning anything about it. So in this case, homomorphic encryption is only targeted at privacy, not at guaranteeing validity. The benefit is that you don't need to interact with the server beyond sending your encrypted input. But the downside is that there's a really massive compute overhead. For MPC, I said there was a big compute overhead, but trust me, this is really massive. And so as a consequence, homomorphic encryption today, as far as I know, can only handle quite tiny models, I mean models for recognizing digits.

Okay, so now we come to ZK. What is ZK adding that MPC and homomorphic encryption don't bring to the table here? With ZK, while the server is doing inference, it can generate a proof of valid execution. And with that proof, anyone can verify that the output was correctly generated.
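To make that one-of-n trust model a little more concrete, here is a toy sketch of additive secret sharing, the basic primitive behind many MPC protocols. This is an illustration, not anything from the talk's implementation: it handles only a linear layer with public weights, and real MPC protocols need considerably more machinery for nonlinearities and for catching misbehaving parties.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(x, n):
    """Split secret x into n additive shares that sum to x mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % PRIME)
    return shares

def linear_layer_mpc(inputs, weights, n_parties=3):
    """Each party locally computes a weighted sum over its own shares.

    Any n-1 shares of a value look uniformly random, so the input stays
    private as long as at least one of the n parties is honest."""
    input_shares = [share(x, n_parties) for x in inputs]
    # Party p computes sum_i w_i * (p-th share of x_i) on its own.
    partials = [
        sum(w * xs[p] for w, xs in zip(weights, input_shares)) % PRIME
        for p in range(n_parties)
    ]
    # Recombining the partial results reveals only the final output.
    return sum(partials) % PRIME

print(linear_layer_mpc([3, 1, 4], [2, 7, 1]))  # 3*2 + 1*7 + 4*1 = 17
```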
So, coming back to ZK: in this case, ZK can ensure validity, that is, that the model was correctly executed, and it can ensure a weaker notion of privacy than MPC and homomorphic encryption. That is, for ZK, the model runner needs to know everything about the input, whereas that's not necessary in MPC and homomorphic encryption. But once you have the proof, the input can be private from the rest of the world. Another upside is that there's no interaction required, and a downside is that there's still a pretty massive computation overhead over the cost of inference. So in this talk, I'm going to describe a method we developed to scale the application of ZK from pretty small models to relatively large models by reducing the compute overhead.

Okay, so let me first situate you in the task and what I mean by small and big. A lot of prior work on ZK for neural network inference focused on two benchmark tasks, MNIST and CIFAR-10. The task in MNIST is you get a black-and-white image of a handwritten digit, and you have to recognize which digit it is; the post office uses something like this. In CIFAR-10, you get 32-by-32-pixel images from 10 classes of things like airplane, automobile, horse, and ship, and you have to choose which of those 10 classes applies to each image. In our work, we scale things up to approach the much larger ImageNet dataset. In this case, there are 1,000 classes; ImageNet happens to contain many breeds of dog, so around 200 of the 1,000 classes are actually dog breeds. The images are much higher resolution, at 224 by 224. And this is really the first standard large benchmark dataset for this image classification task.

Okay, so here's what we can do. We chose a model called MobileNet v2, which was originally developed to be run in low-resource environments such as mobile phones. For a range of input resolutions, we're able to completely SNARK the execution of the model. As you can see, the proving time for a single input is still relatively large: even for the smallest input resolution of 96 by 96, we take over two minutes, but we are able to scale to the full ImageNet resolution in about 20 minutes. This is using the Halo 2 backend with the original IPA commitment scheme. As you can see, the proof size is relatively small, and the verification time, thankfully, is not as large as the proving time, although it is perhaps still quite large.

Okay, so in the rest of the talk, let me talk a little bit about how we did this and what we can do with it. When you think about SNARKing the execution of a neural network, we came at it by dividing the problem into three pieces. The first piece is that it turns out what is expensive to execute in a neural network on a GPU and what is expensive to prove in a ZK circuit are quite different, so we tried to select the best architecture for this task. The second is that we had to devise a pretty optimal way of arithmetizing this neural network inference operation into a ZK circuit. And last, we selected a proving system, largely because we were most familiar with Halo 2, so that's not a very principled choice. But one thing I wanted to note is that we decided to go with a general-purpose proving system instead of a proving system devised specifically for neural networks. The reason we wanted to do that was to plug into the existing tooling ecosystem around Halo 2 and other similar general-purpose proving systems.
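To give a flavor of what arithmetizing quantized inference means before diving into the details, here is a small illustrative sketch; this is mine, not the project's circuit code. The point is that a quantized dot product is pure integer multiply-and-add, which maps directly onto prime field arithmetic and can be checked row by row with a Plonkish custom gate, whereas the floating-point version has no similarly cheap encoding.

```python
import numpy as np

# A toy int8-quantized linear layer: y = W @ x with 32-bit accumulation.
# Every value is an integer, so each multiply-add can be expressed
# directly over a prime field inside a ZK circuit; no floats needed.
rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)  # quantized weights
x = rng.integers(-128, 128, size=8, dtype=np.int8)       # quantized input

acc = W.astype(np.int32) @ x.astype(np.int32)  # exact integer result
print(acc)

# The floating-point version of the "same" layer carries scale factors,
# and its rounding behavior has no similarly cheap field encoding.
scale_w, scale_x = 0.05, 0.1  # made-up dequantization scales
y_float = (W * scale_w) @ (x * scale_x)
print(y_float)  # approximately acc * scale_w * scale_x
```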
Okay, so to dive in a little bit more on what types of architectures are easy to handle in ZK, the first thing to note is that it's very hard to deal with floating-point numbers within a ZK circuit, since every variable in a ZK circuit is, under the hood, an element of a large prime field. So that essentially forces us to work with quantized models, that is, models that were specifically designed to have all of their weights be 8-bit integers. Now, the level of quantization in machine learning generally gives you a trade-off between accuracy and how few bits you quantize the model down to. And in ZK, this translates to a trade-off between accuracy and proving cost. So we did a first pass at this among just off-the-shelf models, and it seems like models optimized for low-resource environments, particularly on the edge, are quite good at this. So we chose to SNARK MobileNet v2; you can download this model from the TensorFlow website.

The second thing we did, to make our search over architectures a bit simpler, is we wrote a transpiler from the TensorFlow Lite model format to Halo 2 circuits, and then we applied all of our optimizations to the individual building blocks of this transpiler. This allowed us to handle the many weird low-resource neural networks without massive programming overhead. Finally, in the arithmetization backend, we used the Plonkish arithmetization from Halo 2, and we actually only used two types of features of Halo 2: we handle all the linear layers via custom gates, and we use lookup arguments to essentially build a lookup table for all the nonlinearities. And finally, there's a small subtlety that we have to readjust the fixed point in the quantization, and we're able to do some optimization by shoving that readjustment into the lookup table as well.

Okay, let me now finish by just giving you a general sense of one application that we can do with this. There are four settings, as Jason alluded to: you can make the model either public or private, you can make the data either public or private, and you can combine these combinatorially. We have a couple of ideas in each quadrant. If your model is private and your data is public, maybe you're trying to sell your model, and you want to demonstrate to the buyer that your model is any good without just revealing the model altogether. If your data and model are both public, then we think this could be useful for on-chain verification, as Jason alluded to. And finally, if your data is private, then in both the private-model and public-model settings, we think this could be useful for conducting an audit, for example in legal discovery, without forcing revelation of the entire dataset that you hold.

Okay, so just to give one sample back-of-the-envelope calculation of how much it would cost to use our SNARK for verified evaluation of a machine learning model's accuracy: we made a very basic protocol where a prospective model buyer asks a model seller to verify the accuracy of a model on a randomly chosen test set. In this case, you randomly sample from the test set, find the accuracy on the random sample, and from that, you get a statistical estimate of the overall model accuracy. So we ran the numbers, and it turns out that if you want to know the accuracy to within 5% at 95% confidence, you need to sample about 600 times, and that will cost about $90 to verify. If you want to be within 1%, that's going to cost you a little over $2,000 with our current implementation.
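As a rough sanity check on those numbers, here is the kind of sample-size calculation involved. This is a reconstruction using a standard Hoeffding bound, not necessarily the exact bound behind the talk's figures, so the constants come out somewhat different from 600; but the per-proof cost implied by the talk (about $90 / 600 ≈ $0.15) and the roughly 1/ε² growth in sample count line up with the quoted jump from about $90 to a little over $2,000.

```python
import math

def hoeffding_samples(epsilon, delta=0.05):
    """Samples needed so the measured accuracy is within +/- epsilon of
    the true accuracy with probability at least 1 - delta, using the
    Hoeffding bound 2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon**2))

COST_PER_PROOF = 0.15  # dollars; implied by ~$90 for ~600 proofs

for eps in (0.05, 0.01):
    n = hoeffding_samples(eps)
    print(f"within {eps:.0%}: {n} samples, roughly ${n * COST_PER_PROOF:,.0f}")
# within 5%: 738 samples (~$111); within 1%: 18445 samples (~$2,767).
# The ~25x blowup is the 1/eps^2 scaling behind the $90 vs $2,000 gap.
```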
And just to give you a sense of whether that could be reasonable: people are generally willing to spend in the high five figures, maybe low six figures, just to acquire data to train their models. So maybe it's worthwhile to pay a couple thousand dollars to verify that what you're buying is legitimate.

All right, so just to summarize: we constructed what we believe is the first SNARK that can scale to ImageNet-scale models. For this, we had to choose a correctly quantized model and write a transpiler from TensorFlow Lite to a Halo 2 Plonkish arithmetization. And then we ran some benchmarks on whether this sort of technique is reasonable for some concrete applications of verified inference. Going forward, we're excited to explore some more applications of verified inference and also to try to scale this to different types of models, particularly transformers. Thanks, and I don't know if there's any time for questions.

Thank you. I'm going to say there's time for one. Who's it going to be? Oh, and what do you know, it's our next presenter. That's so smooth. I'll just hand you this mic and you can keep it. Wonderful.

So, one question. You mentioned a couple of optimizations you did. Can you reveal some optimizations that you think you could do in the future?

That's a very hard question, since if there was an optimization we thought we could do, we generally just did it. One thing I didn't mention is that we tried to reduce the number of lookup arguments by sharing them across layers. In general, this actually modifies the computation slightly, so there's a little trade-off between the machine learning model's accuracy and the proving speed.
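To unpack that last answer a bit, here is a toy sketch of the idea, again mine with made-up parameters rather than the actual circuit: fold the fixed-point readjustment and the nonlinearity into a single precomputed table, then share that one table across layers. The sharing forces every layer onto the same rescaling shift, which is the slight modification to the computation, and hence the small accuracy cost, mentioned above.

```python
# Toy version of the trick described above: fold the fixed-point
# readjustment and the nonlinearity into one precomputed table.
SHIFT = 8       # shared fixed-point rescale: multiply by 2^-8
ACC_BITS = 16   # accumulators are range-checked to [-2^15, 2^15)

def requant_relu(acc):
    """Rescale the accumulator, apply ReLU, and clamp to the int8 range.
    (For MobileNet's ReLU6 you would clamp at the quantized value of 6.)"""
    return min(max(acc >> SHIFT, 0), 127)

# Precompute once; in the circuit this becomes a single lookup argument
# that every layer's (accumulator, activation) pairs must satisfy.
LO, HI = -(1 << (ACC_BITS - 1)), 1 << (ACC_BITS - 1)
TABLE = {a: requant_relu(a) for a in range(LO, HI)}

def apply_nonlinearity(accumulators):
    # Both calls below consult the SAME table, so both layers are forced
    # to use the same SHIFT: that is the slight change to the computation
    # that trades a bit of model accuracy for fewer lookup arguments.
    return [TABLE[a] for a in accumulators]

print(apply_nonlinearity([-300, 512, 5000, 30000]))  # layer 1: [0, 2, 19, 117]
print(apply_nonlinearity([-1, 256, 20000]))          # layer 2: [0, 1, 78]
```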