I think I'll change gears a little bit in terms of the presentations we have already heard. In my presentation I will overview some techniques in advanced crypto and what has been happening with the efficiency of the solutions, somehow setting the stage for the idea that, once we have this improved efficiency, maybe this is the time to start thinking about these constructions being used in practice, and maybe this is also motivation for standardization in these areas. But of course, let me define what advanced crypto will mean for us, or at least for this talk: advanced crypto will be everything beyond encryption and digital signatures, so beyond cryptography that thinks about protection of data at rest and in communication. In particular, in this talk I will touch on three areas of advanced cryptography: secure multi-party computation, differential privacy, and zero knowledge. These are the three of my choice because I think all three of them have actually been used in practical applications, including not only academic papers but companies looking at them and using them in practice.

Then I have this graph, a creation of my own: for a while, research in cryptography has been going upwards, coming up with new functionalities, new constructions, and new capabilities, while in practice we have been stuck more or less using signatures and encryption. I would say that maybe in the last couple of years we have seen an upward slope where, also in practice, we have started implementing some of the fancier functionality that comes from cryptography, and my hope is that in the future these two slopes, research in crypto and use in practice, will align. The motivation for this talk came when I listened to the talk of Shai Halevi at CCS last year, where he made the claim that advanced crypto is needed, fast enough to be useful, and not generally usable yet.

When we talk about advanced cryptography, there are many things that come into the mix. In particular, we have to think about trade-offs between efficiency and utility; about the specific setup of parties we are looking at; about the assumptions on communication channels and the computational resources available at each party; and we have to try to put our finger on where the trade-off between efficiency and utility makes sense for our application. Some of the insights in this talk also came from a workshop on privacy-preserving machine learning that we organized at NeurIPS last year, and we have another one of these at CCS this year.

As for the need for advanced crypto, I don't think there is much to pitch to this audience. Basically, our challenge is that we have data, which is arguably the most valuable resource nowadays, and we want to use it and analyze it; that's how many things work in practice. Then there are many challenges related to the privacy of this data, and these advanced crypto techniques bring the promise that we can obtain utility from this private data without sacrificing privacy. So we are striving for this functionality. And since this is a standardization workshop, while I'm describing these solutions and their efficiency, the main message is that many of these technologies are ready to start being used in practice, and standardization will definitely help adoption, because companies like to have standards to follow when they use some of these advanced techniques.
These advanced cryptography notions come with much more complex security and functionality properties, so it is interesting and challenging to convey those properties in a standard in a way that lets people who are not necessarily experts use them. Also, there is a plethora of constructions and techniques for each of these primitives, each giving slightly different guarantees, so when we standardize we have to think about what exactly we standardize out of this multitude of techniques. These techniques also have a wide range of applications, and oftentimes, to solve real application problems, we need to use more than one of these tools. So a question for standardization is whether it should be driven by, or at least aware of, the applications we are trying to solve first with these techniques.

With this, I will continue with my overview of recent work, and I need to make the disclaimer that I might have missed your work. This wasn't on purpose; this is just an overview that gives you a flavor of what has been happening with the efficiency of these techniques. I will start with secure computation, or privacy-preserving computation, where I will distinguish two scenarios. The first scenario is one where we have few parties participating in the computation; they have more or less equal computational resources, communication is available among all of them, and they are available for the whole execution of the protocol. The second scenario, which I will refer to as federated learning, is a setting where we have one central party with a lot of computational resources and many weak parties that interact with this central server. These parties cannot talk among each other, they can only talk to the server, and they are also very unreliable, so some of them may disappear during the execution of the protocol; we would like to tolerate dropouts in this setting.

So let's look at what we know in the first setting, and I will start with fully homomorphic encryption. We already heard about initiatives in this area. Fully homomorphic encryption gives us the capability to take some data encrypted, give it to another party, and ask this party to compute any function we want on the encrypted data. This construction has been known since 2009. What do we know now in terms of efficiency? The most expensive operation under FHE is multiplication, and currently we can compute a circuit of multiplicative depth 20, so 20 consecutive multiplications, in about 62 milliseconds. If we look at a specific application coming from the iDASH competition, which considered training with logistic regression on 1,500 patient records with 18 features, one iteration of gradient descent was taking between 0.4 and 3.2 hours using FHE. So this is roughly where recent FHE efficiency stands.

Another application that has been looked at a lot in practice is private set intersection: two parties each have a set, and they want to find the intersection of their inputs without revealing anything more about their data. There have been many, many works that look at this question.
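To make the private set intersection functionality a bit more concrete, here is a minimal sketch of the classic Diffie-Hellman-style PSI idea; this is my own illustrative toy (semi-honest parties, a toy modulus, helper names of my choosing), not the specific protocols whose running times I quoted, and a real deployment would use elliptic-curve groups and a vetted library.

```python
# Hypothetical sketch of Diffie-Hellman-style PSI for two semi-honest parties.
# Toy parameters only; not the construction from any particular paper.
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus for illustration; NOT a secure group choice

def hash_to_group(item: str) -> int:
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def mask(elements, key):
    # Raise each hashed element to a secret exponent: H(x)^key mod P.
    return {pow(hash_to_group(x), key, P) for x in elements}

def remask(masked_values, key):
    # Apply the other party's exponent on top: (H(x)^a)^b = H(x)^(a*b) mod P.
    return {pow(v, key, P) for v in masked_values}

# Alice and Bob each pick a secret exponent.
a_key = secrets.randbelow(P - 2) + 1
b_key = secrets.randbelow(P - 2) + 1
alice_set = {"alice@example.com", "carol@example.com", "dave@example.com"}
bob_set = {"carol@example.com", "dave@example.com", "erin@example.com"}

# Each side masks its own set and exchanges it; the other side adds its exponent.
alice_double = remask(mask(alice_set, a_key), b_key)   # H(x)^(a*b) for Alice's items
bob_double = remask(mask(bob_set, b_key), a_key)       # H(y)^(a*b) for Bob's items

# Doubly-masked values match exactly on the intersection.
print(len(alice_double & bob_double))  # -> 2
```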
So what do we know now, in the semi-honest and malicious security settings? In the semi-honest setting, if we want to intersect sets of size 2^20 we can do this in just a couple of seconds, and if we want malicious security it takes a couple of hours. This functionality, or an extension of it called private intersection-sum, has also been used by Google in the setting of aggregate ad attribution.

Another functionality that has been of great interest is private information retrieval, where we have one database owner and a party that wants to submit queries, and the goal is to answer these queries without the database owner learning anything about the query. These constructions usually rely on homomorphic encryption, either additive or fully homomorphic. One of the most recent works in this setting allows you to answer a private information retrieval query on a database of size 2^22 in about 12 seconds. One caveat is that here the items in the database are 288 bytes, so the efficiency also depends on the size of the entries.

These were two or three examples of functionalities that have been of interest and have been used in practice. But of course we know that we can do general two-party computation, where two parties with inputs x and y can evaluate any function that depends on their joint inputs. This has been the subject of a lot of work in the last 10 years, and I'm talking only about implementations; MPC has been studied for over 30 years. These are two graphs that show the slope of improvement in implementations in the malicious and semi-honest settings, for the question of securely evaluating AES: one party has the key, the other party has the input, and they want to evaluate AES on it. Initially, the semi-honest solutions 10 years ago took hours; currently we can do the secure evaluation in under one millisecond. In the malicious setting we went from months 10 years ago to about 10 milliseconds today. So the message here is that we have seen tremendous improvement in the efficiency of two-party computation, which brings MPC into the realm of cryptographic techniques that could be usable in practical applications, and we should think about how to make this more usable using standards.

Of course, one of the most enticing applications of MPC is machine learning, where my conclusion, at least from the techniques that exist so far, is that out-of-the-box use of MPC is not the most efficient way to go if you want to use MPC for machine learning. If you want to combine MPC and machine learning, you have to put effort into both directions, in the sense that you have to look at your ML algorithms and make them MPC-friendly. Fixed-point computation is what's efficient in MPC, while ML algorithms usually rely on floating-point computation, so changing the algorithms to be friendly to fixed-point evaluation is a good thing to do. Non-linearity is expensive in MPC, so it helps to use approximations that avoid non-linearities.
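To illustrate what MPC-friendly fixed-point computation means, here is a small sketch of fixed-point encoding over additive secret shares; the modulus, precision, and helper names are arbitrary choices for illustration and are not taken from any particular framework I mentioned.

```python
# Hypothetical sketch: mapping real-valued ML data to fixed-point ring elements
# that can be additively secret-shared between MPC parties.
import random

MODULUS = 2**64          # shares live in the ring Z_{2^64}
FRAC_BITS = 16           # fixed-point precision: value ~= integer / 2^16

def encode(x: float) -> int:
    """Map a real number to a ring element (two's-complement style)."""
    return int(round(x * (1 << FRAC_BITS))) % MODULUS

def decode(v: int) -> float:
    """Map a ring element back to a real number."""
    if v >= MODULUS // 2:          # top half of the ring represents negatives
        v -= MODULUS
    return v / (1 << FRAC_BITS)

def share(v: int, n_parties: int = 2):
    """Split a ring element into n additive shares that sum to v mod 2^64."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((v - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

# Adding two secret values only requires adding shares locally, no interaction.
a, b = encode(1.25), encode(-0.5)
sa, sb = share(a), share(b)
local_sums = [(x + y) % MODULUS for x, y in zip(sa, sb)]
assert abs(decode(reconstruct(local_sums)) - 0.75) < 1e-4
```

Multiplication of shared fixed-point values is where the cost shows up in practice, since it needs interaction plus a truncation step to keep the precision bounded; that is exactly why MPC-friendly algorithm design matters.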
We have also seen a lot of work trying to adapt MPC to the needs of machine learning computation. For example, taking advantage of the fact that in many ML applications we only need approximate final results: we have seen MPC approaches that leverage this approximate correctness, as well as FHE constructions that do the same. The machine learning territory is also one where we can really play with the trade-offs between efficiency and accuracy; many of these regression-based algorithms with many iterations allow you to do this in a nice, seamless way. One final point is that a lot of machine learning involves computation on sparse data. There are standards such as the sparse BLAS that focus on how to optimize algorithms when dealing with sparse data, so you could think of these as something to take advantage of in the context of MPC as well.

Let me give you a few examples of what has been done in MPC for machine learning. One example is computing distributed linear regression. In this particular work we were looking at a vertically partitioned database, and we were experimenting with iterative solutions based on fixed-point gradient descent, including conjugate gradient descent (CGD). One bottom line is that if you want to solve systems of linear equations of dimension 500 and you are happy with an approximate solution, where CGD with only 20 iterations already gives you something very meaningful, you can do this in a couple of hundred seconds. If you want to solve the full linear regression, with 500,000 records and 500 attributes, you can do this in about two hours.

Of course, when we talk about machine learning, neural network computation is one of the favorite applications. Here we can distinguish between neural network inference, where one party has the model, the other has the input, and you want to do the classification in an MPC manner, and training in secure computation, where the inputs are partitioned between two parties. I'll list some of the most recent works and what they achieve. This is a work that looks at convolutional neural networks for private prediction; the solution combines MPC with FHE in an interesting and non-trivial manner. These are some evaluations on the MNIST dataset, which is about classifying handwritten digits, and you can see that with different topologies for these convolutional neural networks the runtime goes from a couple of milliseconds to about one second; the main distinguishing factor is really the communication, which varies a lot. Of course, MNIST is considered a toy example for real machine learning applications. CIFAR is a little further along, and the evaluation with CIFAR shows that the time quickly goes up, but it is still about 13 seconds, which is arguably acceptable, and communication again goes up. Once the neural network topology becomes more complicated, communication is the first thing that seems to go up.

A more recent work looks at the same question but for binarized neural networks, special types of neural networks that use only binary values, and they show new results. One thing is that when you're doing these types of machine learning applications, you really need to look at the accuracy you're getting out of your computation, because without accuracy this is really meaningless.
So in this work they manage to get very good accuracy for, again, the MNIST dataset, and they improve on the runtime of the previous work: the runtime is now approximately 0.1 to 2 seconds, and communication is again the main distinguishing factor as you move to topologies that give you higher accuracy.

Maybe the most recent work I'm aware of in this setting is a work on two-party secure neural network training and prediction. It looks at binarized neural networks and finds a nice way to leverage oblivious transfer for many of the computations in this context. They also look at several real datasets, including smartphone accelerometer and gyroscope sensor data, datasets for predicting thyroid and cancer diseases, and a German credit dataset that tries to classify clients' transactions as good or bad. Here are their results; maybe this is a little small, but essentially they show that if you want to run neural network training you will need several days if you are happy with lower accuracy, and if you really want to get close to the accuracy coming from floating-point training, this will take you weeks if not months. One message here, though, is that even in the clear, neural network training can take weeks for large datasets, so these numbers are not necessarily impossible in practice in view of what happens with training in the clear. For prediction, they also show how to do inference that usually takes a few seconds for smaller datasets and up to 40 to 60 seconds for some of the larger ones.

Since I'm running low on time, I will just quickly mention that we also have this work that looks at sparsity: what can you do with MPC if the level of sparsity of your data is public and you want to optimize for it? The main message is that you can get a lot of improvement. Many of these datasets are very sparse, with non-zero entries at the level of 1, 2, 3 percent, up to 10 percent, and you can significantly optimize your MPC for this setting.

Now let's go to the other setting for MPC, the federated learning setting I mentioned, where we have many parties interacting with one powerful server. This is a setting that is really of interest to many companies. Google has a secure aggregation protocol that allows you to accumulate gradient updates coming from smartphones in a manner that reveals only the aggregate of these updates. The numbers they have demonstrated show that if you want to handle vectors of size 100K and you have about 500 clients, you can do this in about 1,200, and this scales more or less linearly with the number of clients; on the other hand, if you fix the number of clients at about 500 and increase the dimension of the vector you are aggregating, you again get linear scaling of the time, up to about 500 for the largest parameters. So the ballpark runtime is in the range of minutes to hours; if you don't demand a real-time response but run this from time to time, it is still within the realm of usable efficiency.
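To give a feel for the pairwise-masking idea behind this kind of secure aggregation, here is a minimal sketch; it assumes pre-shared pairwise seeds and omits the key agreement, the secret sharing of seeds that handles dropouts, and the authenticated channels that the real protocol needs.

```python
# Hypothetical sketch of pairwise masking for secure aggregation.
# Each pair of clients shares a mask; one adds it, the other subtracts it,
# so the masks cancel in the server's sum while hiding individual updates.
import random

MODULUS = 2**32
NUM_CLIENTS = 5
DIM = 4

def prg(seed, dim):
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]

# Private per-client vectors (e.g., quantized gradient updates).
updates = [[random.randrange(100) for _ in range(DIM)] for _ in range(NUM_CLIENTS)]

# Assume every pair (i, j) already shares a seed (in practice via key agreement).
seeds = {(i, j): random.randrange(2**62)
         for i in range(NUM_CLIENTS) for j in range(i + 1, NUM_CLIENTS)}

def masked_update(i):
    out = list(updates[i])
    for j in range(NUM_CLIENTS):
        if i == j:
            continue
        m = prg(seeds[(min(i, j), max(i, j))], DIM)
        sign = 1 if i < j else -1   # same mask added by one client, subtracted by the other
        out = [(o + sign * v) % MODULUS for o, v in zip(out, m)]
    return out

# The server only sees masked vectors, but the masks cancel in the sum.
agg = [0] * DIM
for i in range(NUM_CLIENTS):
    for k, v in enumerate(masked_update(i)):
        agg[k] = (agg[k] + v) % MODULUS

expected = [sum(u[k] for u in updates) % MODULUS for k in range(DIM)]
assert agg == expected
```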
Another work looks at the same question but with a different architecture: it assumes that you can split your server into two or more non-colluding parties. Then you can apply different techniques. This work, which has also been implemented in a deployment with Firefox, looks at this model and shows how to do least-squares regression, and basically this table tells you how much overhead, or slowdown, the MPC introduces compared to the plaintext computation: depending on the regression dimension, the overhead goes from about five times to about twelve times.

Now I want to turn to the next primitive I mentioned in the beginning, which is differential privacy, and differential privacy asks a different question. So far we were talking about multiple parties computing on their inputs in a way that reveals nothing more than the output of the computation. An equally important problem is asking how much the output of this computation itself reveals, and differential privacy is the technique that looks at this question and tries to give a meaningful measure of privacy for the evaluation, especially when we are computing over the inputs of many parties, or over databases of people, users, and so on.

For differential privacy we have two main settings. One of them is called the central model, which assumes we have a trusted aggregator: all the clients send their inputs to this trusted aggregator, which computes answers to aggregate functionalities, and these answers are made public. But we are not protecting the privacy of the individual inputs against this trusted aggregator. The other model, called the local model, wants to remove the trust in this aggregator: it requires a mechanism where each participant adds noise to their own data, and the untrusted aggregator computes the aggregate based on the reports it receives, so you guarantee privacy of the individual inputs even with respect to the aggregator. The notion of differential privacy requires that if you run the computation on databases that differ in a single record, the output distributions are close and cannot be distinguished.

There has been a lot of work in the central model: we have general mechanisms that achieve differential privacy there, and we also have many specialized methods for functionalities of interest such as empirical risk minimization, stochastic gradient descent, Bayesian inference, and many others. One could argue that the second model, the local model, is much more relevant in many practical applications, and this is what companies like Google and Apple have been looking at, especially in the setting of collecting private statistics from users. Of course, this second model is also the one where it is much more challenging to come up with good solutions. One example that has been studied in the local model is the question of heavy hitters: if each of your clients submits, for example, different words or different values, how do you detect the ones that occur most often?
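Before going into the results, and just for reference, the closeness requirement I described informally above is usually formalized as (epsilon, delta)-differential privacy; this is the standard textbook definition rather than anything specific to the works mentioned in this talk.

```latex
% (epsilon, delta)-differential privacy: for all neighboring databases D, D'
% (differing in a single record) and all sets S of outputs of mechanism M,
\[
  \Pr[M(D) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D') \in S] + \delta
\]
```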
One of the most recent works on this heavy hitters problem gives you a way that essentially incurs error on the order of the square root of N, where N is the number of participants, which is also the lower bound in this local model; and still this new construction guarantees fairly good accuracy between the true counts and the predicted counts when you look at the most frequently occurring element, or the element at rank 10 or rank 100. What this work also achieves is much better behavior in the tail, so it gives you good accuracy even as you move down from the most frequently occurring elements.

Another functionality you want to compute in this setting is frequency estimation. There, the RAPPOR work from Google was the first solution that was implemented, and people have been comparing against it. This work from USENIX Security 2017 introduces new hashing mechanisms that allow you to scale much more gracefully as epsilon increases, which essentially means that once you start allowing more privacy leakage, you actually get much more utility in return; one of the biggest challenges in this setting is getting acceptable utility. Local differential privacy usually gives you very good privacy, but the challenges are related to the utility you get in this model. So this work shows how to make these trade-offs much more gracefully, and it also gives you a much higher rate of true positives, where true positives refers to the setting in which you want to correctly detect the counts that are above a certain threshold.

There has also been a direction of very interesting works on what is called amplification of differential privacy. There is a general way to compose differential privacy with secure computation, which essentially says that you can take a central-model differential privacy mechanism and turn it into a local-model one by using secure computation to implement the aggregator; however, in practice this often yields quite inefficient solutions. So what people have been looking at is an intermediate model, called the shuffle model, which assumes that your aggregator is untrusted but that there is a shuffler: all the inputs coming from the clients are first shuffled before being sent to the untrusted aggregator.
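As a toy illustration of where the shuffler sits in this pipeline, here is a sketch using basic binary randomized response as the local randomizer and an in-process permutation as the shuffler; the parameters and the debiasing step are my own illustrative choices, not any of the constructions discussed in this talk.

```python
# Toy shuffle-model pipeline: clients apply local randomized response,
# a shuffler permutes the reports, and the analyzer debiases the counts.
import math
import random

EPSILON = 1.0
P_TRUTH = math.exp(EPSILON) / (math.exp(EPSILON) + 1)   # prob. of reporting the true bit

def local_randomizer(bit: int) -> int:
    return bit if random.random() < P_TRUTH else 1 - bit

def shuffler(reports):
    shuffled = list(reports)
    random.shuffle(shuffled)        # breaks the link between a client and its report
    return shuffled

def analyzer(reports):
    n = len(reports)
    observed = sum(reports) / n
    # Debias: observed = (2p - 1) * true_fraction + (1 - p), solve for true_fraction.
    return (observed - (1 - P_TRUTH)) / (2 * P_TRUTH - 1)

true_bits = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
estimate = analyzer(shuffler(local_randomizer(b) for b in true_bits))
print(f"true fraction ~0.30, estimated {estimate:.3f}")
```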
How you implement the shuffle is a separate question, but one thing to point out is that in this model you suddenly start to rely on the other users honestly contributing their inputs; if they don't, your privacy degrades. So this model introduces further assumptions about the other users, but it allows much better trade-offs between privacy and utility. Essentially, what we know is that while the local model incurs error on the order of the square root of n in the number of parties, the shuffle model allows you to get down to logarithmic error. In particular, this paradigm has been implemented using SGX for the shuffler, and it has been demonstrated that you can handle computations with up to 10 million inputs in about two hours. Another more recent work, which will be presented here, shows how to formalize the shuffle model further and obtain much better bounds; it also extends the results to larger values of epsilon and gives further insight into this new model.

People have also studied ways to combine secure aggregation with differential privacy so as to distribute the addition of noise in a better way. One of the challenges there is that you would like to minimize the communication going from the small devices up to the server. Some of these works study how to use different types of noise for differential privacy, for example binomial noise, which converges to Gaussian noise when we have many clients, and they study how, for different values of epsilon, the level of privacy, you can minimize your communication. These solutions also rely on techniques called quantization, which essentially help you round when you are working with real numbers; the main point is that quantization doesn't usually interact very nicely with Gaussian noise, but it composes quite nicely with binomial noise. So this line of work has been looking at these questions and exploring how far we can reduce the communication of federated learning techniques.
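Here is a rough sketch of the quantize-then-add-discrete-noise idea; the quantization grid and binomial parameters are arbitrary placeholders of mine, not values from the works I mentioned, and real systems would calibrate them to a target epsilon.

```python
# Hypothetical sketch: stochastic quantization of a real-valued update plus
# centered binomial noise, which stays in integer arithmetic (and so composes
# cleanly with integer-based secure aggregation).
import random

LEVELS = 256                 # quantization levels per coordinate
CLIP = 1.0                   # assume updates are clipped to [-1, 1]
BINOM_N, BINOM_P = 64, 0.5   # binomial noise parameters (illustrative only)

def stochastic_quantize(x: float) -> int:
    """Map x in [-CLIP, CLIP] to an integer in [0, LEVELS-1], rounding randomly."""
    scaled = (x + CLIP) / (2 * CLIP) * (LEVELS - 1)
    lo = int(scaled)
    return lo + (1 if random.random() < scaled - lo else 0)

def binomial_noise() -> int:
    # Centered binomial noise; approaches a Gaussian as BINOM_N grows.
    return sum(random.random() < BINOM_P for _ in range(BINOM_N)) - int(BINOM_N * BINOM_P)

update = [random.uniform(-1, 1) for _ in range(8)]
noisy_ints = [stochastic_quantize(x) + binomial_noise() for x in update]
print(noisy_ints)   # small integers: cheap to transmit and to aggregate securely
```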
This brings me to the last type of functionality I wanted to discuss, which is zero-knowledge proofs. We already heard a lot about zero knowledge: we would like to have a way, for example, to prove that an encryption of some value lies within some range and to produce a proof for this statement; of course, we can generalize this to any statement, and we have many solutions that achieve this functionality.

Why do we care about zero-knowledge proofs? I think it's unnecessary to justify that here. Of course, blockchain applications have been tagged as one of the potential uses, and there are many more; when you talk about machine learning, maybe you care about proving that your machine learning computation was done correctly, so you would like to produce a zero-knowledge proof for your inference or for your training. When we talk about zero-knowledge protocols, there are many measures to consider: prover efficiency, verifier efficiency, succinctness (how long the proof is), whether the construction is interactive or non-interactive, and of course whether you need a trusted setup. A trusted setup essentially means you need to compute a CRS, a common reference string, that depends on secret information, and if this secret information is revealed, all the soundness guarantees of your protocol disappear. Oftentimes, as Ran mentioned in his presentation, SNARKs, the succinct non-interactive arguments of knowledge, are of primary interest, and other solutions less so, but of course there are settings in which those have applications as well. For SNARKs, most of our understanding is that we require a trusted setup; there are some works looking at hybrid solutions that can function either as a SNARK or fall back to a regular construction that does not require a trusted setup.

So here is what we know about the efficiency of SNARKs first. Most SNARK implementations rely on the quadratic arithmetic program approach, instantiated in different ways. A work from USENIX Security last year looks at how to distribute the prover in a SNARK construction; they parallelize the prover's work and see how far they can optimize it. What they manage to achieve is a prover overhead of about 10 microseconds per gate of the computation. The verifier in these zero-knowledge SNARKs is usually very efficient, which is one of the appeals of the construction; here the verification time is 2 milliseconds plus 0.5 microseconds per input of the zero-knowledge statement. Looking at some of the applications they consider, the prover time for a 700-by-700 matrix multiplication is 74 seconds, and for linear regression, one of the examples relevant to machine learning, they can generate a proof that training on 20,000 points in dimension 500 was done correctly in 95 seconds.

Still in the setting of zero knowledge with trusted setup, there is another work that really aims at improving the prover time. Most zero-knowledge SNARK constructions require the prover to be quasi-linear in the size of the circuit being evaluated; this work brings the prover down to linear in the circuit size and trades this off against the size of the proof, which now becomes logarithmic in the circuit size and linear in the depth of the circuit. This work is based on the GKR technique, and they essentially show how you can improve on SNARKs where the main improvement you should look for is in the prover: they manage to do a little better than all the other SNARK improvements in terms of prover time.
In terms of verification time and proof size, the SNARK constructions still achieve better efficiency. They also looked at some examples for image scaling, where you change an image from high to low resolution: they show that for images of about 10^6 pixels the prover time is about 100 seconds, the verification takes about 10 to 20 seconds, and the proof is also very small.

If we want to move outside the realm of trusted setup, which is a requirement that often raises issues in practical implementations, in particular, who is going to generate this trusted setup for you, there has been a plethora of work on zero knowledge without trusted setup. There, the proof size varies from logarithmic to linear in the circuit size, the verifier's work also varies from logarithmic to linear, and the prover's work is quasi-linear, oftentimes polynomial, in the size of the circuit. We have constructions based on discrete log, on MPC-in-the-head, and on the IOP model. The canonical example people evaluate with is often the computation of a Merkle tree, and most of the examples go up to about 2^8 leaves in the Merkle tree. You can see that the prover time in this setting again takes about 100 seconds, depending on the implementation; the verification time varies a lot across implementations, but some of them give you a couple of seconds, while others really scale with the size of the circuit. So the message here is that we can handle Merkle proofs for these sizes of Merkle trees in times that are fairly reasonable.

One last work I wanted to talk about is the work on ZK-STARKs, which looks at a different proof paradigm, interactive oracle proofs, and claims new trade-offs compared to existing works, including works that don't rely on a trusted setup; ZK-STARKs themselves do not rely on a trusted setup. Maybe most interesting is to look at one of their applications, which is DNA profile matching: essentially you have a database of DNA profiles, which goes up to different sizes, with their actual experiments going up to databases of 2^32 DNA profiles, and what you want to compute a proof for is that some DNA profile is not included in the database. For this type of computation, depending on the size of the database, the prover runs for several hours, which may be usable in settings where you want to prove once that you are not present, and the verification time is quite efficient; the trade-off again goes into the size of the proof, which scales roughly linearly.

With this, I think I'm out of time, so I will wrap up: we have lots of advanced crypto techniques that have achieved efficiency very close to the point where we can start using them in practice. Using these techniques often requires an expert who knows how to apply them, and this brings a challenge for standardization: how do we get around the issue that we need an expert to use these techniques, and what is the best way to standardize and offer insights that people who are not experts could use as well? There have been the standardization efforts that have already been discussed, and there are also many resources for MPC, for zero knowledge, and for differential privacy.
I'll conclude with Shai Halevi's words: it's time to put these tools to use. Thank you.

I have a question for you. As far as I know, there are no non-interactive zero-knowledge protocols in the plain model, without setup; that's Goldreich-Oren. So there are no non-interactive zero-knowledge protocols without trusted setup; they don't exist, right? What I mean is that one should say that all these protocols are in the random oracle model: they are non-interactive and they claim to be zero knowledge, so there has to be some assumption on the model. It's not just you; I think there are a bunch of papers in this workshop with titles saying non-interactive zero knowledge without trusted setup, which is something that doesn't exist. It doesn't matter, there is a proof that, without any trusted setup, non-interactive zero knowledge does not exist. It's basically the question of the unstructured string versus the structured one, so maybe the term "untrusted setup" or "without trusted setup" is misleading; it is still a setup.

That's a fair point. In many of the settings where this untrusted setup has been used, the assumption is essentially an unstructured reference string, but for those, this is the classification that was used in my slides as well.