Okay, good evening, ladies and gentlemen. I'll do it like the SBB, so that you can be sure you are in the right place. Today we have the third lecture of this year's Einstein Lectures, dedicated to mathematics, and Frau Tretter will now briefly introduce the speaker.

Ladies and gentlemen, on behalf of the Mathematical Institute and the University of Bern, I would like to welcome you back, or for the first time, to the Einstein Lectures 2019, and it is my pleasure to once again welcome our Einstein lecturer, Prof. Shafi Goldwasser, one of the 50 most influential computer scientists of our time. This is perhaps a bit of a long introduction, but since I know that some of you are here for the first time, let me just go through the impressive number of distinctions that Shafi holds. She actually has two professorships, at the Weizmann Institute and at MIT, and since 2018 she has been the director of the Simons Institute for the Theory of Computing. Among the honors she has received, fittingly in this month in which all the Nobel Prizes are awarded, is the ACM Turing Award, which she received in 2012 with Silvio Micali; that is the Nobel Prize of computer science, which of course did not exist when the Nobel Prizes were first created. As for her other honors: Shafi has been elected to many national academies. She was invited to brief the US Congress on cryptography. She has received many honorary doctorates, so many that her website cannot keep up with them. I found another one today, in addition to the ones from her alma mater, Carnegie Mellon, and the University of Oxford; I found one from 2017, from a place I have visited myself, Ben-Gurion University of the Negev, and there may be even more. So this is really, really impressive, and we are very, very happy that you agreed to come here to Bern and to share with us your great enthusiasm and, I would say, revolutionary ideas about mathematics and computer science, which have inspired many people, and might be a shock for some too, but that's necessary to make progress. So we are very much looking forward to your third Einstein lecture. Shafi, please.

Hello? Yeah. Okay, hi everyone, and thank you for coming again, or coming for the first time, as Christiane said. So this is my third lecture, and the topic is cryptography for safe machine learning. I gave two talks before. The first one was called, I guess, "The Cryptographic Lens", where the focus was to give you some examples of how fairly basic ideas, which certainly were motivated by basic questions in cryptography, have made an impact on technology and on some scientific disciplines. And then yesterday I talked about something completely different. Today I'm going back, in a sense, to continue where I left off in the first lecture, and that is to talk about another area, a more futuristic area, where I think that cryptography has a role to play. And that's what I call safe machine learning. So again, last time I talked about electronic commerce. I talked a little bit about quantum computing, the fact that we have to prepare for an age where quantum computers may hurt the existing cryptosystems. I haven't talked about cryptocurrencies; I think people know about Bitcoin and so forth. But I mentioned zero-knowledge proofs, which are a technology that can enable having cryptocurrencies with anonymity.
And I talked a little bit about cloud computing: remember, we talked about delegating computation to the cloud, an interactive proof between a user and a cloud where the cloud does the computation and then proves to the user that it did the computation correctly. There is an asymmetry in terms of computational power: the cloud may have lots of computing power, the user has a little, and the user is still able to verify the correctness of the claims that the cloud makes about the computation. But I really think that the next frontier, or frontiers, is how to enable safe machine learning. And of course, in order to do that, I need to explain what I mean by safe machine learning.

So first of all, in terms of what machine learning is: there are certainly a lot of articles people read, even in the popular press, so I just have a few very general slides about that. As a field, machine learning lies somewhere in the intersection of artificial intelligence, statistics, and theoretical computer science. It emerged from these three disciplines, or at least ideas from these three disciplines play a role in machine learning. If I had to describe it in the most general way, I would say it's a field that explores the construction of algorithms that can learn from and make predictions on data without being explicitly programmed. So rather than the traditional way, where you write a program and you might prove properties of this program like running time and correctness, here you are given data. And from the data, somehow, in some automated fashion, you come up with, you learn, something. The machine learns from the data to make predictions or other things; we'll see some examples shortly. The idea is that you usually take this data and build some model from sample inputs. We'll be more concrete in a moment, but let me just say that there are many machine learning models, and regardless of what the model is, there are usually, or always, two stages.

The first stage, and I'll refer to it throughout the talk, is a training phase. In phase one, you are given some training data. Training data may be examples; say, an example of a cat, an example machine learning people like to use. Maybe there are a lot of cat lovers, or there's a competition between the cat lovers and the dog lovers. In any case, training data might be a picture of a cat, labeled as a cat (that's what I mean by a label), or a picture of a dog, labeled as a dog, a picture of an airplane, labeled as an airplane, and so forth. It could also be data that's unlabeled: there's just lots of data available, and there might be some clusters, a way to cluster this data even without labels. But there's always some training data of one form or another. And often you assume that it's drawn from an unknown distribution: there is some underlying distribution, and we don't necessarily know what it is. In the first phase, given this data, you want to generate some sort of hypothesis; sometimes it's called a model. For example, in the case of labeled data: how is the labeling of the data done? And you test the quality of this hypothesis by comparing it against the data, to see if this model is a good predictor of the labels.
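To make those two ingredients concrete (labeled samples from an unknown distribution, and a hypothesis tested against them), here is a minimal toy sketch, my own illustration rather than anything from the lecture: the hidden concept is a threshold on [0,1], and we fit the most consistent threshold from the data.

```python
# Toy sketch of the training phase: labeled data drawn from an unknown
# distribution, a fitted hypothesis, and its measured accuracy.
import random

random.seed(0)
SECRET_THRESHOLD = 0.62            # the unknown concept labeling the data

def draw_labeled_example():
    x = random.random()                    # a sample from the distribution D
    return x, int(x >= SECRET_THRESHOLD)   # label assigned by the concept

train = [draw_labeled_example() for _ in range(1000)]

def accuracy(t, data):
    """Fraction of examples on which the threshold-hypothesis t agrees."""
    return sum(int(x >= t) == y for x, y in data) / len(data)

# Hypothesis class = thresholds; pick the one most consistent with the data.
best_t = max((x for x, _ in train), key=lambda t: accuracy(t, train))
print(f"learned threshold ~{best_t:.3f}, "
      f"accuracy on training data: {accuracy(best_t, train):.3%}")
```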
Now, the data is large, so obviously the model has to be small: a hypothesis with a small description that is consistent with this data. That's phase one, and we'll call it the training phase. The reason I have this arrow on the side of the slide is that it's often an iterative process: you get some training data, then you get more and more, and you modify your hypothesis until it somehow converges, or stabilizes; no matter how much more data you give it, the model doesn't change, doesn't predict any better by looking at more data. And then people talk about the accuracy of the model: it might not explain the data 100%, but it might explain it 90%, or 99%. So this is the general idea.

The second phase of machine learning is that whatever model you developed in the first phase is now used for the future. For what type of tasks? Here are three examples. It might be used to classify new data drawn from that distribution. (Excuse me; for some reason the wireless keeps turning on, and that disturbs things, so let me just see how to turn the Wi-Fi off. Okay.) So, you've generated a model, and let's say this model classifies pictures into cats, dogs and airplanes. Now you can take some new data and use the model to classify whether it is a picture of a cat or a dog or an airplane. That is one task, a classification task. Another task might be that you want to generate new data similar to what you've seen. In other words, you have seen lots of cats and dogs, and now you want a model that generates pictures of cats; not the cats you've seen, but new cats, things that would be seen by humans and classified as cats. Or it generates pictures of dogs, or maybe essays, maybe paintings. That's a generative model rather than a classification model. Another possibility is that the model just explains the data: it tells you, if it's a distribution, what the average is, what the standard deviation is, what some of the moments are. So these are typical tasks of machine learning's phase two, and we can call them classification, generation, and explanation. In this talk, for simplicity, I will focus just on training and classifying. So let's not talk about generation; but the same kind of treatment would be useful for generation or explanation as well.

Okay, so let's be concrete. I said that every machine learning model has these two phases. So let's take a particular example of what our problem could be. Let's say there is this box, this black box. The reason I'm calling it a black box is that you can't look inside, but you are told, or you believe, that there is some sort of formula in this box. The formula here uses ANDs and ORs; it doesn't matter, some formula. So I call this box C: it's a formula C, or concept C, that takes variables x1, x2, x3 and then outputs a label. These variables, if you like, could be the pixels of an image, and then the answer would be cat, dog, or airplane. Other examples: C could be used to answer things like, is an email message spam? You get an email message and then there is a determination: go to the spam box, don't go to the spam box. It could be a picture, an MRI of a suspect tumor, and the answer would be malignant or non-malignant. It could be: should a student be admitted to college; that would be an application. It could be: should a bank loan be approved, or should a suspect be released on bail.
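For concreteness, here is a toy rendering of such a black box (hypothetical code, my own): a hidden Boolean formula over features that we can only query, never inspect, including the two kinds of access the lecture turns to next, queries of our choosing versus random past examples.

```python
# A black-box concept C: a hidden formula over features x1..xn.
import random

def _hidden_concept(x):
    # The secret formula, e.g. (x1 AND x3) OR (NOT x2). Not visible
    # to the learner, who only sees input/output behavior.
    return (x[0] and x[2]) or (not x[1])

def membership_query(x):
    """Strong access: feed an input of our choosing, get its label."""
    return _hidden_concept(x)

def random_example(n=3):
    """Weak access: a random input with its label, like past records."""
    x = [random.random() < 0.5 for _ in range(n)]
    return x, _hidden_concept(x)

print(membership_query([True, False, True]))   # True
print(random_example())                        # e.g. ([False, True, True], False)
```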
So this is a very generic description of what it is that we're trying to do. Again, x1 through xn would be a vector of features (we usually call the inputs features), and then there is a determination. Now, this box works in mysterious ways. You assume there is such a box; you don't have a way to look at its internals. Obviously, you would love to learn this box. You would love to learn how it is that this magic box is able to make determinations of whether you should release someone on bail or not, such that they would not be a flight risk. Or you would love to know how the right box accepts students to college so that they would be successful, and so forth: this amorphous box that may exist. But the question, of course, is how; it seems quite hard to learn this perfect box. And maybe the more interesting question is: what is it that you know about this box? What kind of query access do you have to it? Are you able to feed it pictures and get an answer? Are you able to feed it a description of a suspect's record and get a perfect answer as to whether you should release them on bail or not? Or do you just get random examples of what has happened in the past? What kind of access you have to this box of course determines how easy it's going to be to learn it, and whether it's possible to learn it at all.

Okay, so we're still in a somewhat vague world, but that's good enough, I think, for our purposes. Still, as theoreticians or mathematicians, you would love to define this more exactly as a mathematical problem: you want to define what it means to successfully learn, what the query model is, how you can access this box. And in 1984 there's this very famous paper by Leslie Valiant, called "A Theory of the Learnable". He came up with essentially just a definition of what it means to learn this box; it's a theorist's definition. There are a lot of words on the slide; you don't need to read them, so let me summarize. He defines something called probably approximately correct (PAC) learning. He says: given examples x together with the value the box gives on x (so, labeled examples), drawn according to some unknown distribution D, a successful learning algorithm generates a hypothesis h, a new box if you like, where h agrees with c approximately, and with high probability, on inputs from D. What does that mean? It means that the probability that h and c disagree on an input drawn from D is small. In other words, it's possible that they disagree, but you require that this be an unlikely event. So this is the definition; in some sense it just puts down in mathematics what I said in words on the previous slide. And then there was a lot of work on different types of concept classes, on which formulas, if they are simple, are easy or hard to learn; lots of papers. I think people were probably more optimistic to begin with, expecting many more things to be easy; but learning is actually a hard task even for certain simple-looking concepts.
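In symbols (standard PAC notation, not copied from the slide; epsilon is the accuracy parameter and delta the confidence parameter), the requirement on the learner's output h is:

```latex
\Pr_{\text{training samples from } D}\Big[\;
    \Pr_{x \sim D}\big[\, h(x) \neq c(x) \,\big] \;\le\; \varepsilon
\;\Big] \;\ge\; 1 - \delta
```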
And now, what does cryptography have to do with it? I just want to say that historically, going back to the 80s, there was actually a relationship between cryptography and machine learning as defined by Les Valiant. Maybe because, you know, he was at Harvard, and it's close to MIT; at MIT there were Rivest, Shamir and Adleman, Rivest at least, and they were friends and they went back and forth. And the place cryptography came in was, in some sense, in saying: okay, it's very nice to ask what we can learn; what cryptography was used for is to show what you cannot learn. An immediate question a complexity theorist or a cryptographer asks is: is it really true that you can learn everything? Are there concepts which are not learnable according to Valiant's definition, which are not PAC learnable, probably approximately correct? And in fact, very soon after that question was asked, Kearns, who was at the time a graduate student of Valiant's, and Valiant showed that there are lots of concepts you cannot learn, that are impossible to learn. And the more interesting results of this nature are under the assumption that RSA encryption is secure. I talked about encryption in the first lecture, this problem of communicating between two parties without ever meeting, public-key encryption. What they showed is that if encryption of the type I talked about exists (actually, you don't even need it to be public-key), then it is impossible to learn certain concepts. So what is good for one field, that is, good cryptography, implies hard-to-learn concepts. Once you think about it a little, it becomes kind of clear: good cryptography means that the adversary cannot learn what you've sent. So there is a learning problem implicit in good cryptography, and by exhibiting good cryptography, you exhibit something that is impossible to learn, for the learning theorist. Great. So in some sense we can summarize it by saying that bliss for crypto is a nightmare for machine learning. And interestingly, there were a lot of these results, showing more and more things you couldn't do in machine learning, using more and more sophisticated types of cryptography.

And then there was a paper, I think it was '93, that appeared in a cryptography conference, which learning people wrote. They said: modern cryptography has had considerable impact on the development of learning theory, the impact being to show the limits of what you can learn, and virtually every hardness result in Valiant's model has its origin in a cryptographic construction. So they said: in this paper we're going to give you results in the reverse direction. Here is a problem that comes from learning theory. It's not a problem that you cryptographers think about, but it seems to be hard for us. So instead of you giving us a hard problem, we are telling you: here's a natural problem that's hard for us; maybe you can use it for cryptography. And in fact, the problem they came up with is related to something I talked about in the first lecture. The problem from learning theory was about solving a system of equations. For people who remember from the first lecture (otherwise, you can ignore these next two slides; I just wanted to make this connection): there is some secret s, which is a vector s1 through sn, and this time its entries are zero-one. And one of the problems that people in learning encountered was: how do you solve a system of equations in these zero-one variables if, instead of being given the equations exactly, the right-hand sides have been flipped from zero to one, or one to zero, with some small probability? Suppose I didn't give you the equations exactly correctly; I flipped some of the answers on the right, and I ask you to solve the system now.
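Here is a small sketch of that noisy-equations problem, learning parity with noise, as I understand the setup above (illustrative parameters, my own code):

```python
# Noisy linear equations mod 2: given pairs (a, <a,s> mod 2) where a
# fraction of the right-hand sides is flipped, recover the secret s.
# Without the flips, Gaussian elimination solves this easily; with
# them, no efficient algorithm is known.
import random

random.seed(0)
n, m, flip_prob = 8, 20, 0.1
s = [random.randrange(2) for _ in range(n)]      # the secret 0/1 vector

def noisy_equation():
    a = [random.randrange(2) for _ in range(n)]  # random coefficients
    b = sum(ai * si for ai, si in zip(a, s)) % 2 # true right-hand side
    if random.random() < flip_prob:
        b ^= 1                                   # flip with small probability
    return a, b

equations = [noisy_equation() for _ in range(m)]
print(equations[0])   # one sample: (coefficients a, possibly-flipped label b)
# The LWE variant from the first lecture replaces bits by numbers mod q
# and the bit flips by small Gaussian noise on the right-hand side.
```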
That seems to be a very hard learning problem. And they came to the cryptographers and said: hey, maybe you can use it for crypto; maybe you can embed a cryptosystem here and get something out of it. And in fact it happened to be the case (I'm just jumping to the next slide) that the very problem I showed you in the first lecture, where instead of zero-one entries you have numbers, and on the right-hand side you are adding Gaussian noise, is a problem that we now use in cryptography all the time. So it started with people from learning theory, who had a natural question they couldn't solve, bringing it to the cryptographers, and then the cryptographers changed it a little bit. We now have this learning-with-errors problem, which I talked about in the first lecture, and which we are using as a candidate for encryption schemes that are robust against quantum computers. So this is interesting, mostly for those who study the history of science: the fact that these two fields talk to each other has helped, in a strange way, to produce a problem that we as cryptographers now use all the time. It is a candidate for battling quantum computers; it has a lot of versatility; it allows what we call homomorphic encryption, and so forth.

Okay, now we move to the real topic of this lecture. All this development, these hard problems and this "what's hard for learning is good for crypto", was in the 80s. But today, in these last couple of years and going forward, which is the main topic of this lecture, impossibility results are actually not negative news, not a nightmare for learning: they are going to be very positive news for the machine learning of today. Whereas in the past a good result for one field was a bad result for the other, now we're going to see that good cryptography, interesting cryptography, is actually going to propel machine learning forward. So, why and how? The way I want to proceed from here is to explain to you why cryptography is really necessary for machine learning. The first observation is this. I said there was the PAC model of Les Valiant; it is beautifully clean, but hard: many concepts cannot be learned. Still, we all know that machine learning is now hailed as the avenue for progress. And it seems that the difference is that we have a lot of data. The huge amounts of data being collected worldwide have made it so that learning models that existed in the past, like neural nets, which people knew about 30 years ago but which weren't successful then, all of a sudden, with the advantage of lots of training data, achieved huge successes.
And that's true for health, for finance, for economic growth, for infrastructure, for traffic patterns, for energy usage, for vision, for natural language understanding, for threat prediction, for policing (deciding which neighborhoods to police), for bail (deciding who's a flight risk), for credit rating. In all of these fields, people are using lots and lots of training data to build models, hypotheses; they get good accuracy with respect to this data and can use it for prediction in the future, or they are already using it for prediction. And in fact, some of these things are not just papers. I think that in New Jersey, in the courts, they are actually already using algorithms: judges are using them, at least as aids, to help decide whether to let someone out on bail or not. I think in New Orleans there is a system the police are using when they have to decide where to send cars, to which neighborhood, because they don't have enough police cars and want to use them wisely. And this is all based on data about where crimes were committed and who was a flight risk. In any case, my point with this whole slide is that all of a sudden there's a shift of power: decisions that used to be made in different ways are now made, or at least aided, by these automatic systems that learn from data of the past to make predictions about the future. And in fact, if you look at the largest companies by market cap: in the 2000s they were oil companies, energy companies; now they're all data companies. If you think about the market, it's Amazon, Apple, Microsoft, Facebook, and so forth. And there are all these quotes you can see, saying data is the new oil, data will become a currency, and so forth.

So this is all very good and well. The thing is that this sudden shift of power, which happened very quickly, in the last couple of years in some sense, does carry the risk of leaving us unprotected, unregulated, with this power in the hands of entities that were never really mandated to hold it. And these risks are clear. But I'm not saying this in order to be a fearmonger. There is always a risk, and the public's immediate reaction is to say: oh, this is terrible, we shouldn't do it. I'm not mocking that reaction, but it is an immediate one, and we should wait a minute. The reason for this rush is that there is a lot of power in data, and one should try to think of how to use it in a positive way. And the systems that we have today don't work perfectly either: if we think about letting people out on bail, many people, at least in the United States, are incarcerated, sitting and waiting for someone to make the decision whether they go out on bail or not. So there is something to be said for an efficient system that makes decisions. I guess what I want to say is that the new task ahead of us, where cryptography in my opinion could be of huge help, is to ensure that this power of machine learning algorithms is not abused. And the kind of cryptography I'll mention during this talk was developed in the last 30 to 40 years, and it does not concern communication. Often, people who don't do cryptography think that cryptography is really about sending secret messages or logging into computers with passwords.
And that was somewhat the focus of my first lecture. But really, in the last 30 years or so, the focus of cryptography has been not on secure communication but on what we call secure computation. We know how to encrypt, we know how to do passwords quite well; maybe it's not quantum-resistant, or we're in the process of switching to quantum resistance, but we have the definitions and good techniques. Secure computation is something else. The idea, as I will explain as I go along, is that computations on data also have a security aspect that you want to guarantee. While you are processing data, the data might be left unprotected; can you protect the data even while it's being processed? That's a key concern when you talk about secure computation. Another concern arises when you want to share data among several parties without giving your data away. A lot of work since the mid-80s has been on secure computation rather than secure communication. And all of this work suits these problems in machine learning very well, and can go a long way toward safe machine learning, where abuse is not done, or at least not to the extent that it could be. The way I want to proceed is to tell you some immediate challenges that you need to think about when you think about machine learning, and to tell you how, for some of these challenges, cryptography is very obviously useful.

So the first challenge is this. It's a straight, clear deduction: if the power of machine learning comes from data, and the data is data about individuals (how people drive, what illnesses they have, how much energy they use, what relationships they have), and if this is what enables the power, then, since we don't want this power to be abused, we want to ensure the privacy of both the data and the model that's being learned from the data, so that we keep this power in the hands of those who generated the data. And maybe it's not just a question of privacy; it's also a question of value. In other words: if it's my data, why shouldn't I have a say in who monetizes it, and why shouldn't I be able to monetize it myself? There are three places where privacy concerns clearly come up: during the training phase, where you are taking data and using it to train the algorithm (remember, I said there was training and classifying); during the classifying phase; and even later, after you already have a model, you might not want this model to be stolen by someone. Why? Well, first of all, maybe you've put a lot of effort into building the model; but also, even though the model is a succinct description of the data it was trained on, you can sometimes use it to reverse-engineer and recover that training data. So these are three clear places where privacy can be lost, and people are addressing how you can do machine learning, how you can do training and classifying and use a model, without losing this privacy. In particular, how are we going to do that? New ideas come up all the time, of course, but there are 30 to 40 years of cryptography, and a lot of what's being done in the field now is taking this toolbox that's out there and trying to see which tool fits which problem.
Here there are a lot of tools; I will talk about each of them during the slides, so this is just a laundry list, a Chinese menu. There's something called multi-party computation, a tool that enables different data centers holding different data to talk to each other and jointly compute some outcome that is defined as a result of all of the data, without giving the data to each other. Homomorphic encryption (the letters got scrambled here on the slide) is something I talked about the first time: it's a way to encrypt data and then, while it's still encrypted, to do evaluation on it without decrypting it, and then give back the encrypted result. There's something called secret sharing, which is an interesting, almost magical tool: you can take a piece of information that you really want to keep secret, and instead of storing it somewhere (because you're afraid somebody might break into that place), you break it into pieces. Think of a puzzle that you break into puzzle pieces; you store each puzzle piece on a different computer, and you do this mathematically, of course, with the guarantee that unless someone sees most of the puzzle pieces, they will not be able to put it together or learn anything about the puzzle. So it's a way to share a secret so that attacks on individual pieces don't yield the entire secret. Another type of method is something called garbled circuits: if I have a program, this is a way to somehow rewrite that program in a garbled fashion, so that even though you can run the program, you can't really tell how it works; again, one of these uses without looking inside. And there's something called differential privacy, which is yet another field; it tells you how to take data and modify it by adding noise, so that you can't recover what the original data was, or even know whether a particular piece of data participated at all, but you can still get a statistical signal out of the data. You've added noise small enough that, if you have a lot of data, each entry with small noise, you can still learn something from the data overall, but you haven't learned about particular data entries.

All right, so these are the methods, and now let me say specifically how each such method (I'm not going to use all of them) is used. Let's go back to this idea that we have training and classifying; here I drew a schematic showing what the stages are. There's a training phase. Let's say this happens at Google, or Apple, or some large server: they have training data, they work on it, and they come up with a model. That's one phase. The other phase is the classification phase, where the model has already been created, and now you have, in some sense, a game between the server that has the model (the prediction model, say for cats and dogs and so forth) and a client with a piece of data, maybe a particular picture or a particular MRI. What happens during the classification phase is that the client brings the data, the server brings the model, and at the end the prediction lands in the hands of the client. Okay, so this is the schematic of what goes on, and there are two separate privacy questions. One: at training time, this data is often sensitive and valuable. If we go back to the medical examples, it could be immunization records, medication allergies, genomic data.
This is something that, whether by regulation or by desire, you want to keep secret; you might even want to keep it secret from the company, like Google, that's building the training model, and that's not the only entity you would want to keep it secret from. Also, this data often comes from multiple sources: it's not one place giving Google the data, there are lots of sources. An example people often talk about is Apple: when you type text, they predict what the rest is. It's extremely annoying, because sometimes they predict incorrectly. But how do they come up with this prediction model? Apparently they use a lot of typed queries and requests of many users, and they build a model of what is likely to be the next word in your sentence. These models are also personalized, but they need data from many sources. So what you want to make sure is that, while these sources are contributing training data, they don't necessarily have to tell each other their data. So there are two problems here: one, you don't necessarily want the server to learn the data; and two, if there are many entities combining data together, maybe they don't want to tell each other what their data is.

So what is useful here? For the problem of combining data that comes from different sources, there is secure multi-party computation, with this picture. What we know, even from the 80s, is a sort of universal theorem, I'll call it, which says that these distrustful parties (people who really may not trust each other at all to keep privacy) can compute any program F on their data via a distributed protocol. By distributed protocol I mean: they do computation locally, they send messages to their neighbors, they get responses, they do more computation, they go in phases, and at the end they say, okay, we have the answer; at the end you reveal the result, and nothing but the result. There are some conditions on when this is possible. You need encryption to exist, for example; or sometimes, depending on the setting, you say maybe there's no encryption, but I know that a majority of these people is honest, and under the assumption that the bad guys are fewer than a majority, you can compute the function without telling each other the data. In fact, this has made so much headway that there is even a congressional bill in the US Congress which talks about it, and it says: "the term secure multi-party computation means a computerized system that enables different participating entities in possession of private sets of data to link and aggregate their data sets for the exclusive purpose of performing a finite number of pre-approved computations without transferring or otherwise revealing any private data to each other or anyone else." That is, in English, really what the definition would be; it's actually remarkably accurate. And not only can you define it; you can actually use mathematics to achieve it, and I will of course tell you how. The basic idea really comes from arithmetic on polynomials. Here's one way to think about it. The point is that I can take a program and transform it as follows: I can think of the inputs of the program as variables. Say I have data and I want to run a distributed protocol with you, but I don't want to give you my data. What I do is define a polynomial such that, given enough interpolation points on this polynomial, you could reconstruct my data, but I give each one of you a different interpolation point. If all of you get together, you can reconstruct the polynomial and figure out my data; but if not enough of you get together, then all polynomials are equally likely, and you learn nothing about my data.
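For the mathematicians, a minimal sketch of that polynomial idea, a toy version of Shamir-style secret sharing over a prime field (my illustration; real protocols add much more):

```python
# Split a secret into n interpolation points of a random polynomial
# whose constant term is the secret; any t points reconstruct it,
# and fewer than t reveal nothing.
import random

P = 2**61 - 1   # a prime modulus; assumes the secret is smaller than P

def share(secret, n, t):
    # Random polynomial f of degree t-1 with f(0) = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Party i receives the interpolation point (i, f(i)).
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, n=5, t=3)
print(reconstruct(shares[:3]))    # any 3 of the 5 shares -> 123456789
# Adding shares pointwise adds the underlying secrets, which is the
# seed of computing on shared data in multi-party protocols.
```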
You can then think of the steps of the program as operations, essentially summing and multiplying polynomials, and people can do the summing and multiplication of polynomials by working with their interpolation points rather than with the entire symbolic polynomial. This is just for the mathematicians in the audience, to give you a hint that there is actually something behind this, that it is possible. At the end, all the parties who follow this kind of recipe will hold interpolation points on a polynomial which, if you could reconstruct it, would give you the output of the computation.

Okay, so let's go back to privacy, to the general picture. This was during training: this is how many parties can combine their inputs and enable a server to learn a model. In fact, I know that at Google there is a whole division that works on something called federated learning, and federated learning is essentially this multi-party computation with a new word, and without using this type of mathematics but a more dumbed-down version, because their requirements are not as strong as what you can achieve in theory.

The other part is what happens during the classification stage. Now the model has been trained already; you've got a succinct, beautiful model that can predict cats. And the thing is, somebody has their favorite cat, or maybe their MRI, and they don't necessarily want to share their MRI with the company that built the model, but they want to use the service. How could you do that? How do you go through a classification stage where the server keeps its model private and the client keeps its data private? There are two concerns here: the server doesn't want to give up the model, the client doesn't want to give up the data, and you want to achieve both while still getting correct predictions. An example might be a hospital that has a great model, and a doctor, maybe a rural doctor, who wants to use the services of the hospital and sends over the MRI results or the medical records. This is almost by definition what we call a two-party protocol. There are two parties here, the server and the client, and they want to do a two-party computation, like in that earlier picture with lots of data servers, except there are just two, and they both have data; the model itself, the description of the model, is also data. There are results on how to do this even from the 80s, by Andrew Yao. But it's not that we froze in 1986 (sometimes you wish we had frozen in 1986); in any case, there has been a lot of research since then on this creature, which I mentioned in the first lecture, called homomorphic encryption. It started in 2008 with Gentry, then lots of other papers; some of the best systems are by Brakerski, Gentry and Vaikuntanathan, and by Brakerski and Vaikuntanathan, students of mine, one at Weizmann, one at MIT, now both faculty members. It is now possible to encrypt data in such a way that, if
you think about these two parties (this might be the hospital, which I drew as a cloud, and this might be the client, the doctor with the MRI), it's possible for the doctor to just encrypt the data, and for the hospital to run the machine learning model, which is just a program, on that data without decrypting it. What it returns is the encrypted prediction of the model. So this is exactly what you wanted: you encrypt the data, the other side runs the model on the encrypted data and returns the encrypted prediction; and since it's your key (you encrypted the data), you know how to decrypt the prediction. It's pretty remarkable that this can be done. Of course, in practice it means you have to look at particular types of programs that do prediction, at particular ways of representing data; you have to make sure your encryption does this efficiently. So there are a lot of questions about data structures, about how to represent, how to process, how to batch these things, questions of bandwidth and so forth. But it's already an engineering problem. Not to make light of engineering problems: without doing them correctly, nobody is ever going to use this. Someone here told me that Einstein had a patent on refrigerators. The idea of cooling existed, but the idea of actually using it for something we all use today is remarkable. Apparently, though, he sold the patent, which turned out badly for him economically. Maybe that's another requirement of basic research, that you actually make zero money. In that case, I'm really hopeful that, because of the startup I have, I won't share in that fate. So far, anyway.

Okay. There has been huge development since 2008 in this field, both in speed and even in deployment. You can do things like multi-key homomorphic encryption: it's not just one client, there can be lots of clients, each encrypting data in a different way, and yet the cloud can compute a program on all of these different encryptions and return an encrypted result, such that if the clients collaborate through a multi-party computation, they can figure out the prediction. Now, there's a group here, I know, that works with Christian Cachin and does cryptography, so this is just for you, and for anybody else who wants to listen. There are these two technologies, multi-party computation and homomorphic encryption, and they solve the same problem; by technologies I mean two theories, two bodies of papers. They each have advantages and disadvantages. One of them is more efficient computationally, so it's faster, but requires a lot of communication; the other is less efficient, with high computation cost, but efficient in terms of communication: the number of bits you have to send from the data owner to the server and back is small. One of them uses Boolean circuits as its native model; the other uses arithmetic computations and is very linear-algebra friendly, so if you are working in machine learning, where a lot of linear algebra is used, it might be the tool of choice. This is just to say that this is a whole field, where you can debate which mathematics to use to solve the problem; but the basic problem is the same.
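As a toy illustration of the homomorphic, linear-algebra-friendly approach (my own sketch, using a textbook Paillier scheme with demo-sized primes; nothing here is secure or taken from the lecture): the client encrypts its features, the server evaluates a linear model on ciphertexts only, and only the client can decrypt the score.

```python
# Additively homomorphic (Paillier) evaluation of w.x + b on encrypted
# features. Toy primes only; real keys are thousands of bits.
import math, random

# --- client: key generation ---
p, q = 101, 103                     # hopelessly insecure demo primes
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1                           # (n, g) is the public key

def L(u): return (u - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n) # secret decryption value

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# --- client encrypts its features and sends the ciphertexts ---
features = [3, 1, 4]
cts = [enc(x) for x in features]

# --- server: computes on ciphertexts only (never sees 3, 1, 4) ---
w, b = [2, 5, 7], 10
score_ct = enc(b)                   # encryption needs only the public key
for i, wi in enumerate(w):
    # multiplying ciphertexts adds plaintexts; powering scales them
    score_ct = (score_ct * pow(cts[i], wi, n2)) % n2

# --- client decrypts the prediction score ---
print(dec(score_ct))                # 2*3 + 5*1 + 7*4 + 10 = 49
```

(The schemes actually used for neural nets are lattice-based and support multiplication as well; this only shows the flavor of computing on encrypted data.)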
All right. Now, when people talk about machine learning, there are lots of models; you could even talk about least-squares regression, or logistic regression, or any kind of regression model, and that's machine learning too. But these days people talk about deep neural nets. So what's a deep neural net? Here is a dog; as you can tell, a kind of angry dog, or at least a dog that has to go to the dentist. I have a toothache; I think maybe that's why I chose it. In any case, these networks look like this. This is the input: if it's a dog, it's the pixels that describe the dog picture. Then there are these levels; the networks are called deep because as the number of levels grows, the network becomes deeper and deeper. What happens on these lines is that you have some weights (at first, random weights), and you take a weighted linear combination of the inputs; at some point you have a result. And what happens in these yellow circles is that there is some sort of activation function, which says: if the result is bigger or smaller than some value, then you decide, say, 0 or 1. Usually real numbers come in, you do some calculation on them, and you decide what comes out of the yellow circle as input to the next stage. I'm dumbing down what these operations really are: they are linear combinations, where the coefficients are the weights; then you take the resulting value and ask whether it is big or small. How you determine whether something is big or small is what's called an activation function, and these are often sigmoid functions, the tanh function, and so forth. You want some nice properties from these functions, and the truth is that, as far as I can tell, people have experimented with different activation functions and saw that some of them produce networks with better accuracy of prediction (say, on cats and dogs) and some with less, and this is how they've decided which activation functions to use. So this is what the model looks like; no cryptography in it yet.

What would it mean for cryptography, say at classification time? It would mean that the dog is encrypted: the image given to this network is an encrypted image, and this protects the client, the one who has the pictures of the dog. But the weights, all the values on these wires, are also encrypted in some sense, and this protects the model. So, given what I said before, how does this whole thing come about with respect to this kind of computational model, and have people actually tried to do it, or is it just on paper? At this point there is a lot more work, but as far as I know the first two works were these. One was by a group at Microsoft in 2016, in a paper called CryptoNets, using homomorphic encryption based on lattices. Their idea was this: to use cryptography as-is on this problem would be too slow, because cryptography usually works with bits and doesn't use real numbers, while these neural nets work with real numbers, and accuracy is a real problem. So they said: let's convert real numbers, fixed-precision real numbers, to integers, because that is what we know how to work with in encryption schemes.
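A toy sketch of that conversion (my illustration of the CryptoNets-style recipe, not their code): quantize weights and inputs to fixed-point integers, and use squaring, the activation CryptoNets adopted, so the whole forward pass needs only additions and multiplications, the operations homomorphic encryption handles.

```python
# Quantized, HE-friendly forward pass: integers only, square activation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)                                 # input features
W1, W2 = rng.random((4, 8)), rng.random((1, 4))   # trained weights

# Reference network in real arithmetic (squaring replaces ReLU/sigmoid).
real_out = W2 @ (W1 @ x) ** 2

# Fixed-point version: scale by S, round to integers, track the scale.
S = 2**8
xi, W1i, W2i = (np.rint(v * S).astype(np.int64) for v in (x, W1, W2))
int_out = W2i @ (W1i @ xi) ** 2    # only + and *, evaluable under HE
approx = int_out / S**5            # undo (S*S)^2 from the square, *S from W2

print(real_out, approx)            # close; the gap is rounding error
```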
And then they said: those activation functions, the functions that made the decisions, are hard to implement under encryption, because the inputs are now encrypted. So let's take a different function, one that also determines bigger-or-smaller-than-zero, even if it's not the one that achieves the greatest accuracy for machine learning; it will be good enough. My point is that there are two new ideas here. One is that if you want to do it securely, with the data encrypted, and you want to do it efficiently, then maybe you're going to have to sacrifice some accuracy. Say, from 97% accuracy on a set of images you would go down to 90% accuracy in order to provide privacy. So there's some cost to it. It's not necessarily inherent; maybe it's possible to match the accuracy and still do the encryption quickly, but right now the state of affairs is that you have to sacrifice. I was told I speak too fast, all right, so I'll slow down. So trading accuracy for privacy is one thing. The other big idea, or big challenge, is this: maybe, when you're developing machine learning models, as a machine learning person, you should think about how to develop cryptography-friendly models. In other words, as you're developing a model, you say to yourself: there are certain operations that are friendlier to cryptography, in the sense that if I wanted to run them on encrypted data, these operations would behave much better; can I build my machine learning models out of these kinds of primitives? I don't see at the outset why not. There was a semester at the Simons Institute on the theory of deep learning, with really, I think, the biggest experts in the field, and they were talking about all kinds of concerns in machine learning. I gave a talk toward the end of the semester about this, and surprisingly, this is not something they necessarily think about, because we're all specialists, right? You want the best accuracy. I think a combination of people who understand both fields could be of great benefit.

In any case: there was a paper that used homomorphic encryption to do neural nets with privacy, and there was a paper that used multi-party computation, called DeepSecure (from Visa, maybe; or maybe not, I think from a research lab), which used the multi-party computation technology. In both of them, they did this step of trading accuracy for efficiency. But you can ask yourself (this all seems very ad hoc): can you have something almost like an engineering rule that tells you which crypto primitive to use for which problem? And there is a rule of thumb, which says roughly that if the computation is linear and the circuit size is super-linear, then fully homomorphic encryption is better than multi-party computation. Never mind, this is already for the specialists; the point is that there are reasons why you would use one versus the other. And in fact there is a new paper, from 2018, which shows how to combine both methods, and that's what they show how to do. They say: there are these layers of the network, and their advice is to use fully homomorphic encryption on the linear layers, and two-party computation on the nonlinear layers, where the activation functions live.
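Schematically, the advice looks like this (a toy routing of layers, my own rendering of the idea; no actual cryptography here):

```python
# Route each layer to the technology that suits it: homomorphic
# encryption (HE) for linear algebra, two-party computation (2PC)
# for the nonlinear activations.
layers = [
    ("conv1", "linear"),
    ("act1",  "nonlinear"),
    ("dense", "linear"),
    ("act2",  "nonlinear"),
    ("out",   "linear"),
]

def protocol_for(kind: str) -> str:
    # Rule of thumb from above: linear operations are cheap under HE
    # (low communication); comparisons are cheaper inside 2PC circuits.
    return "HE" if kind == "linear" else "2PC"

for name, kind in layers:
    print(f"{name:>6}: evaluate under {protocol_for(kind)}")
```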
In any case, there is something to study here: how to use these methods in combination, and how to change the methods to enable private machine learning. Okay. Now, another point: all of this is very new. There is the theory (say it started, for homomorphic encryption, in 2008; then people were writing papers, and it's just 2018, ten years), and there is machine learning progressing at the same time. You want to figure out which theory to use for machine learning: how quickly can this move, how much time does it take? The answer, really, is that it changes rapidly. It starts with feasibility, which is a theoretical paper. Then there are papers about concrete efficiency, where they write down exactly how much it would cost. Then there is actually building it, which is apparently much different from writing a paper about it, and there are a lot of proofs of concept. A proof of concept could be graduate students writing a system, or a research laboratory coming up with a system and publishing it as part of a conference paper, with some graphs. But if you really want to deploy it in real-world applications, it's more than a proof of concept, and in fact even that is happening. Today, over the last five years, there are also libraries, libraries you can find online for how to use homomorphic encryption, and there's more than one. The first was a library developed by people at IBM. Then there's a library developed by some people in Korea, which is very interesting, because they have taken this idea of sacrificing accuracy for efficiency a much bigger step further than others: they've used their knowledge of analysis (even though I think a lot more can be done) to come up with approximations of these activation functions that are friendly to cryptography. And then there's the library that we use in our startup, Duality, which is PALISADE; Microsoft is very active as well. So these days there is actually a lot of work out there in the systems world trying to build these things, because of the recognition that you want to do machine learning and you have to keep privacy.

Okay. So I told you about privacy during classification and training. I didn't talk about model stealing; that would be for when you invite me for some other lecture, next time perhaps. Instead, I want to continue with the challenges. This was the challenge of privacy: how to do machine learning, training and classification, while keeping privacy. But privacy is not the only concern. What are the other concerns? Another one is this: say you've trained the model keeping privacy, and you know how to classify keeping privacy; what about tampering? What do I mean by that? If these models are so important (they're going to decide bail, they're going to decide whether you get credit, a credit score, from a bank, they're going to decide whether you get a medical treatment or not, based on whether the tumor is malignant or not), then there is also big potential for tampering: for control, for profit, for introducing bias, who knows. In some sense it's a big tool, and it's out there; so you could
imagine that there would also be bad actors. So what you would like is to come up with methods to minimize the influence of training data that someone may have provided with the idea of making a profit in the future. There is work in optimization theory on how to minimize this risk from the training data. From cryptography, one thing you might want is this: say a model has been developed, and the company you hired to develop it, the hospital, claims this model is consistent with their data. Is there any way to check that when it goes out into the world? Can you actually prove to a client that this model is consistent with the data that exists? And can you do it for a client that doesn't possess the same amount of computing power you have, doesn't have access to data the way you do, maybe can only sample random data, maybe has access only to lower-quality data? So we're back to this idea of proofs: there is a powerful entity, which may have access to a lot of data and a lot of computation, and a weak entity that wants to check that this model actually is consistent. And when I say check, I mean that if it isn't, they will catch it with high probability. It's the same kind of paradigm I talked about yesterday and the day before, where you have an interaction; what's different here is that the input might be data samples, not just a graph or a formula but samples from a distribution, and you want to prove consistency of a model with data. And finally, the last thing, for the systems people: you obviously need to develop secure infrastructures to run these models, because if they run on insecure infrastructures, then all this mathematics is very nice, but something completely different can happen.

So there are new methods being developed. There is a field called robust statistics that deals with minimizing the influence of maliciously chosen data. Statistics usually assumes you can draw random samples from a distribution; but what if some of the samples are corrupted? How do you come up with statistical methods that are robust to that? It's not a new field, but I think there are novel results when you talk about data of higher dimension. Think about it: if you have a bunch of data items and you want to compute their mean, the mean is very sensitive to somebody changing the data, while the median is much less sensitive. But doing this for higher-dimensional data is a harder problem, and there are a lot of new methods for it. Then there are also new interactive proofs for statistical analysis, which is exactly what I said: the verifiers might have just a few samples, and they still want to check consistency of the models. Sometimes these verifiers (this is some work I'm engaged in with my students) are very simple, and you could even imagine that there would be a way to check models using hardware. Anyway, this is futuristic research, or current research.
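A one-dimensional toy of that robustness point (my own example): corrupt 5% of the samples and watch the mean break while the median barely moves.

```python
# Mean vs. median under a few maliciously chosen samples.
import random, statistics

random.seed(1)
clean = [random.gauss(5.0, 1.0) for _ in range(100)]   # honest samples
corrupted = clean[:95] + [10**6] * 5                   # 5% adversarial

for name, data in (("clean", clean), ("corrupted", corrupted)):
    print(f"{name:>9}: mean = {statistics.mean(data):>12.2f}   "
          f"median = {statistics.median(data):.2f}")
# The mean jumps to roughly 50000; the median stays near 5. In high
# dimensions, achieving the same robustness is much harder, which is
# where the new work comes in.
```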
What's another challenge? I think this one is linked to the GDPR, which comes from Europe. Suppose a large company is not supposed to use your data; or rather, they can use your data, but they are supposed to make sure that nobody can trace the fact that they used it. That's part of the GDPR requirements. The requirements are stated in English, and it's kind of hard to understand exactly what they require mathematically, though you could try to define it. But in any case: what if they don't follow what they claim to follow? How can you tell? Do you need a major fiasco in order to be able to tell? So this is a very interesting question: how can you trace unauthorized use of your data in a model? There's also this rule in California, a similar rule, maybe even stronger than the GDPR: a digital privacy law granting consumers more control over, and insight into, the spread of their personal information online. For example, I think this law also says you have the right to be forgotten: say you gave your data, and now you say, no, I want to erase it, I want to take it back; you have a right to do so. It's a very interesting question how you do that. For one thing, what if the data has been given to other agencies? Well, there are methods that could guarantee it; of course they would slow everything down, but they exist. What interests me on this slide is: how do you check that they did it? Suppose they claim they used the methods that exist; how do you actually verify that? So this is what I want here, with this tracing: to develop methods that trace training data used for learning a model, unless some privacy-preserving mechanism was used. You want a sort of if-and-only-if: if they used a privacy-preserving method, then you cannot find people's information; but if they didn't, then you can trace it, you can catch them. So you want to embed some kind of trap that lets you catch someone who doesn't use privacy protection. Again, this is work in progress, but the goal, I think, should be clear: how do you embed something into your training data so that later, if people did not anonymize your data or add noise to it, you could, by using the model in some clever way, rediscover the thing you embedded? And this is proof that they didn't do what they were supposed to do. Sometimes this is called watermarking: you could watermark the data, so that if you later find the watermark, you know they didn't process the data the way they should have.

What's another challenge? A lot of these systems, especially these neural nets with the angry dog, and machine learning in general, use randomness. The reason they use randomness is that, in some sense, we're trying to find some global minimum, the best model, and we're somewhat groping in the dark: we start from random points, then we use something called gradient descent, and we converge to the right solution. Randomness is very important. So the question is: where is this randomness coming from? It seems to be key, but the assumption is that you use perfect randomness. Again: what type of randomness is good enough? Does this randomness have to be secret? Does it have to be unpredictable? How do we make sure that it's really non-manipulable? How is it generated? These are open questions (I understood that the last talk is allowed to be more futuristic), but they're not open questions of the sort that can't be solved. If you're young and you've got the energy and you're hungry, I think these are the type of beautiful problems that can be approached. I don't mean hungry for food. What's another challenge? The other challenge I mentioned already: define specialized cryptographic functions
which are, in a sense, machine-learning complete, so that if you could compute them quickly, you could solve machine learning problems quickly, with privacy. And I decided that today I'm going to finish early, in contrast with the previous times. So I think one should think about these machine learning challenges as opportunities: opportunities for using in practice what we've developed for 30 years, things we developed in theory really out of basic interest, and opportunities for developing new theory, both for cryptography and for machine learning. My last remark is that I am the director of the Simons Institute at Berkeley, and we are dedicating a lot of effort to this direction, both to machine learning and to privacy, in several different threads. We have three semesters: one, which already happened, on data privacy; one happening right now, which has to do with blockchains; and next semester will be about integer lattices, how to use them to come up with cryptographic applications, and algorithms to solve lattice problems. So those of you who are interested should write to me, or look at the website, and come to the workshops. Thank you.