It is my great honor to introduce Shafi Goldwasser. As her image here shows, she is currently the director of the Simons Institute at UC Berkeley. That's the top one. She's a professor at MIT and a professor at Weizmann. But Shafi's greatness is not in the places that she is, but rather in how she has totally shaped our field as we know it, with such amazing results. And this is only a partial list: zero-knowledge proofs, information-theoretic multi-party computation, semantic security, PCPs, multi-provers. And the list just goes on and on and on. She is the 2012 Turing Award winner. She has two Gödel Prizes. This is very rare; there are people who have one, but two is really unique. She's the ACM Grace Hopper Award winner and the RSA Award winner. She's a fellow of the American Academy of Arts and Sciences and the National Academy of Sciences. She's an ACM fellow, and she's also a fellow of our own organization, the IACR, which makes us happy. In 1980, my mother went to a pool and met another woman there. And that woman told my mother that she has a daughter who is a graduate student at Berkeley and is very talented. My mother listened. This was Shafi's mother, of course. But the truth is, I think that even Shafi's mother underestimated this very unique, extraordinary, and unlimited talent that Shafi has, and really couldn't even imagine how Shafi would go on to change the world. So please, help me honor Shafi Goldwasser. Thank you, Tal, for the introduction. I have to say that my mother underestimated, but I also had a father, and he always overestimated. So maybe on average they got it right. In any case, they're both gone, but they were quite something. So this is the second time that I'm supposed to give this talk, as some of you know, but this time I'm actually going to give it. And in the meantime, I managed to change affiliation, or add an affiliation.
So the title of my talk is "Cryptography and Machine Learning: What Else?" There are two ways to interpret this "what else". One of them is: what else do people talk about these days, except for machine learning? But what I mean is: what else is there to do, in addition to what people in the cryptography community are already doing, where machine learning is concerned? And I've learned a tremendous amount preparing for this talk. I'm not an expert in machine learning; I'm an expert in cryptography. But if you learn even a fraction of what I've learned in preparation, then I think I will have done a good job. But be aware that this is work in progress. So I just want to say that I actually attended the first Crypto, which was in 1981. And there's a subset of people here who were there as well. Not a large subset, but some. And it was extremely exciting and extremely informal. By informal, I mean both in the interactions between people, but also in the type of work that people were doing. They had great ideas, and they were presenting them. There weren't definitions yet, or at least no agreement on what the definitions should be. But it was incredibly fruitful and creative. Sometimes people like to say that this was when it was an art, and that later it turned more into a science. And the reason I'm mentioning that is because I think we're in a similar place in some areas of machine learning today. So, just to illustrate how far crypto has gone from being informal to being a part of theoretical computer science: at the Simons Institute, which is an institute dedicated to the theory of computation, there are going to be these three programs. One is essentially on data privacy; maybe you think of differential privacy when you think about data privacy, but there might be other definitions. Another semester is going to be about blockchains.
And then a third semester is going to be about lattices: fully homomorphic encryption and other applications of hard problems on lattices. So it has become such a part of theoretical computer science that there is a three-semester continuation exploring it as such. But not only that. In fact, this audience probably, to a large extent, knows all these consequences of cryptography on the world: electronic commerce with RSA; cryptocurrencies, which are all the rage; even quantum computing, in some sense, was really jump-started by Peter Shor's factoring algorithm on a quantum computer, when he was trying to break RSA. That was just so fascinating, and it illustrated the strength and power of quantum computers if they were built. And cryptography has also become a part of theoretical computer science: ideas from cryptography have led to probabilistically checkable proofs, which essentially gave the class NP a new definition, and to pseudorandom number generation. So the impact of cryptographic research has been incredible, on the world and on the world of theory. And my point in this slide is that the next frontier, in my opinion, is really developing cryptography which is friendly for machine learning. That's going to be the thrust of my talk. And, this is going to be my ending slide, but I'll say it now for those of you who are going to taper off: there are two opportunities here. One, which people are already pursuing, is using the cryptographic machinery that's been developed over many, many years in the space of machine learning. The other is developing a new theory for cryptography, motivated by machine learning applications.
So the outline of the talk is: first, there's going to be a historical part about the connection between machine learning and cryptography, which does not start today but in the mid-80s. Then I'll talk about a field which I'll call safe machine learning; you could call it another name, but you'll see why "safe" is the word I use when I get to it. That is, the challenges that are present today and the opportunities they present for cryptography. And finally, a sampling of what has already been done about these challenges, which is no small thing. In some sense, it would have been better to give the talk last year, because there are about 15 new papers in this area, within a year, which I had to at least skim. Or should I say that Vinod had to skim them and explain them to me; that's full disclosure. In any case, for those of you who need an introduction, I have a very short one. Machine learning sits somewhere in the intersection of artificial intelligence, statistics, and theoretical computer science, the way I see it. And if I had to give an explanation to my mother or my father or other non-technical people, I would say that it essentially explores how to come up with algorithms which can learn and make predictions on data without being explicitly programmed. So without writing the algorithm, somehow the algorithm will emerge from the data, and then it can be used to make predictions on future data that we may encounter. Okay, so there are lots of machine learning models. We'll talk about a few of them in this talk. But regardless of how you look at it, it seems like they all have two phases. Phase one is usually called learning or training; I'm going to use those terms again and again. What happens then is that you're given some samples of input.
You call this training data. These are inputs which are usually labeled. If it's a picture, the label might say there's a cat in it, or a dog, or a giraffe. If it's a bank loan, if it's a feature vector that describes an applicant to a bank for a house loan, then it might be labeled by "grant a loan" or "don't grant a loan", okay? Or it might be labeled by "grant a loan at a certain level", and so forth. So there's some notion of an input, which is usually a vector of features, and then there's a label, a decision about that input. And it's drawn from some unknown distribution. Let's explicitly call this distribution D; this D is going to come up again and again. What the machine learning algorithm should do in the first phase is generate some sort of hypothesis about the data that has been seen. Sometimes it's called a hypothesis, sometimes it's called a model. These days it's called a model; in the 80s we called it a hypothesis. And what do you do with this hypothesis? That is phase two, and you may do different things. I listed three things here. You may use this hypothesis so that next time you get a new piece of data, not part of the training data (a feature vector again, in the examples we talked about: a picture that may have a dog or a cat or a giraffe, or the features of someone applying for a loan), you will use the hypothesis from the first phase to make a decision on this new data. Another thing you could do is not worry about classification at all. Maybe you want to generate new data which is similar to the data that you've seen in the past. There's something about this distribution you don't know exactly; you would like to come up with a distribution D' which is hard to distinguish from D, okay? And a third thing you may want to do is just explain the data.
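The two phases can be sketched very concretely. Here is a minimal toy example, not from the talk, of my own construction: a learner that fits a single threshold on one-dimensional labeled samples (phase one), then uses the learned hypothesis to classify fresh inputs (phase two). All names here (`train`, `predict`) are hypothetical.

```python
def train(samples):
    """Phase 1 (learning/training): from labeled examples drawn from some
    unknown distribution D, output a hypothesis -- here, just a threshold.
    samples: list of (x, label) pairs with label in {0, 1}."""
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    # Place the threshold midway between the classes (assumes separable data).
    return (max(zeros) + min(ones)) / 2

def predict(h, x):
    """Phase 2: apply the hypothesis h to a fresh input, not seen in training."""
    return 1 if x >= h else 0

training_data = [(1, 0), (3, 0), (7, 1), (9, 1)]
h = train(training_data)  # the learned "model": the threshold 5.0
```

Real models are vastly more complex, but the shape is the same: training data in, hypothesis out, then predictions on new data.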
So you've seen all this data. You don't necessarily want to predict new data or generate new data, but you want to understand it. If it's a distribution, say a geometric distribution, you want to know what the parameters are. If it's Bernoulli, you want to know what the probability of flipping the coin is, and so forth. So the way I see it, there are these three tasks that people are working on: classification, generation, and explanation, okay? And no matter which model we're talking about, these are the two phases. I'm not trying to put you to sleep; I know most people know this, but I'm trying to put you in a good mood. In any case, let's be a little bit more concrete. Say there is a black box, and in it there's a formula, okay? Let's say it's a DNF. So it's a formula in three variables, and it's an OR of ANDs, okay? You know that you could feed in x1, x2, x3 and an output would come out, but you don't know exactly what the formula looks like. This kind of formula is extremely expressive. x1, x2, x3 could be pixels of something being analyzed by a doctor to tell you whether a picture of a tumor is malignant or not. It could be whether a bank loan should be approved, whether a suspect should be released on bail, or whether an email is spam or not, okay? So this is extremely expressive. Being that it's so expressive, wouldn't it be wonderful if we could learn C? It's hidden in a box; I don't know exactly what it does with x1, x2, x3 so that at the end it can give me these correct answers. So I'd love to learn how C works, but that might be pretty hard. How hard is it? In order to talk about this theoretically and formally, we need to define what it means to successfully learn something, to successfully learn this box.
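The hidden-box setup can be made concrete with a tiny sketch, my own illustration rather than anything from the talk: a secret three-variable DNF locked in a "box", which the learner can only probe through input/output queries (membership queries) or labeled random examples. The particular formula and function names are hypothetical.

```python
import random

def hidden_dnf(x1, x2, x3):
    # The secret concept C: an OR of ANDs, e.g. (x1 AND NOT x2) OR (x2 AND x3).
    return (x1 and not x2) or (x2 and x3)

def membership_query(x):
    """The learner sees only input/output behavior, never the formula itself."""
    return hidden_dnf(*x)

def random_example(rng):
    """A labeled example (x, C(x)) drawn from a distribution D (uniform here)."""
    x = (rng.random() < 0.5, rng.random() < 0.5, rng.random() < 0.5)
    return x, hidden_dnf(*x)

# The learner's view: query the box, or collect labeled samples.
r1 = membership_query((True, False, False))   # first term fires
r2 = membership_query((False, True, False))   # neither term fires
```

Learning means reconstructing (an approximation of) `hidden_dnf` from this access alone.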
And what information is actually made available to me about this hidden C? Is it just locked in a box where I see examples? Can I ask it questions, which is called membership queries? Maybe some information leaks about it. Maybe I can specify some of the variables and see the entire concept. There are many different types of access you may think of. In '84 there's a fundamental paper by Les Valiant called "A Theory of the Learnable", where he introduced a formal definition of what it means to successfully learn. This definition incorporates complexity theory, so he talks concretely about complexity theory and probabilities, okay? And this is a beautiful anchoring place to look at in terms of how you would go about defining success and failure and so forth. So this is the definition, and it essentially says: given training examples drawn according to some unknown distribution, a learning algorithm is successful if it generates a hypothesis which agrees with the real concept approximately, with high probability. So there's a double hedging here. He says: I only require success with high probability, and I don't require the hypothesis to be exact; it just has to agree most of the time. And he specifies the parameters epsilon and delta. Great. And efficient means polynomial time. Polynomial in what? In n, the length of the input, and in the size of the description of the concept. So he allows the running time to grow with the size of the description of the concept. Great, so that's the definition. And in the original paper, I think Valiant is very optimistic. He shows a bunch of positive results, for monotone DNF formulas and some other fairly small, simple classes. At least if you read it today, it seems like he's optimistic. Optimism is good, yeah.
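The "double hedging" with epsilon and delta can be written out as a formula. The following is my restatement of the standard PAC criterion from Valiant's definition, in the notation of the talk (concept c, hypothesis h, distribution D):

```latex
% PAC learning: an algorithm A learns a concept class C if, for every
% concept c in C, every distribution D over inputs, and every
% \varepsilon, \delta > 0, given labeled samples drawn from D it outputs
% a hypothesis h such that
\Pr_{\text{samples} \sim D}\Big[\ \Pr_{x \sim D}\big[\,h(x) \neq c(x)\,\big] \le \varepsilon\ \Big] \ge 1 - \delta,
% i.e. "approximately" (error at most \varepsilon) "probably" (with
% probability at least 1 - \delta), running in time polynomial in
% n, |c|, 1/\varepsilon, and 1/\delta.
```

The inner probability is the hedge on exactness; the outer one is the hedge on the luck of the sample.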
In any case, as far as DNF, which he already poses as an open problem in his paper, it took many, many years for progress to be made. I summarize here what is known. It's NP-hard in the general case if you require that not only do I come up with a hypothesis H that describes what happens in the box, but that this hypothesis is itself a DNF. But if you say, you know what, I don't care that it's a DNF, it could be something else, some other algorithm expressed in some other model, then we can make some headway, and essentially the best algorithm known runs in time 2 to the (roughly) cube root of n. If you furthermore restrict the distribution to be, say, the uniform distribution, you can do a little bit better, n to the power log n. And if you further say that the distribution is uniform and you're allowed to ask membership queries, then Jackson in '94, about ten years later, showed how to do this in polynomial time. So here we have one type of class, DNFs, okay, and this is the progress that's been made over the years, and there's still an extensive effort by people to understand this problem. So all this talks about machine learning. We are at Crypto 2018, not '17, okay. So where does the history of cryptography in machine learning start? It really starts fairly soon after Valiant comes out with this paper, because he asked a question, whether in the paper or informally, of Ron Rivest. That's why I have this here: there's Harvard, where Valiant was sitting, and there's MIT, where Ron was sitting, and there was Michael Kearns, who was walking between the two places. I think he was a graduate student at Harvard, and he went on to do a postdoc with Ron at MIT. So the question came up: okay, DNF maybe we can learn, other models we can learn, but can we show that there's something we cannot learn?
So: there is something in a box that has a polynomial-size representation, maybe even a simple representation, but we provably cannot learn it. Because we are complexity theorists, we want to know the bounds of what we're working on. And in fact there is a series of results. The first one is by Kearns and Valiant, which shows that if RSA is secure, so if factoring is hard (that is another candidate assumption), then yes, there is some concept that is not PAC-learnable, okay? And the proof, if you think about it simply: what are the examples? An example is an RSA encryption, x to the e mod n, with the modulus and the exponent, and the label is, say, the least significant bit of the x behind x to the e mod n. And we know that predicting this is as hard as breaking RSA. So here we have a concept which, if RSA is hard, cannot be PAC-learned. But what about if we could ask membership queries? Then there is a formal treatment, even though informally people understood this, by Pitt and Warmuth, where they say: you know what, if you have a pseudorandom function, based on whatever assumption you like, then you can ask questions, and we know that even though you can ask questions, you cannot compute the value of a pseudorandom function on a fresh x. So that's another kind of immediate result you get. And what does it mean? Well, the more efficient your implementation of the pseudorandom function, the more it implies that there are lower complexity classes in which this thing hidden in a box can be computed extremely quickly, and still you cannot necessarily learn it, because this shows that there are such concepts which you cannot.
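The Kearns-Valiant style construction can be sketched in a few lines. This is a toy illustration of my own, with an absurdly small textbook modulus, not a secure instantiation: examples are RSA ciphertexts x^e mod N, each labeled by the least significant bit of the hidden plaintext x. A learner that predicts labels on fresh ciphertexts would be computing lsb of RSA preimages.

```python
import random

p, q, e = 61, 53, 17           # toy parameters; real RSA uses ~2048-bit moduli
N = p * q                      # N = 3233, phi(N) = 3120, gcd(e, phi(N)) = 1

def labeled_example(rng):
    """One hard-to-learn example: (ciphertext, lsb of plaintext)."""
    x = rng.randrange(1, N)
    ciphertext = pow(x, e, N)  # the "feature vector" the learner gets to see
    label = x & 1              # lsb(x): the bit the learner must predict
    return ciphertext, label

rng = random.Random(42)
examples = [labeled_example(rng) for _ in range(5)]
```

The labels look random to any efficient learner, yet the concept has a short description (the trapdoor d with e*d = 1 mod phi(N) decrypts and reads off the bit).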
And then there are lots of other results, some of which I was involved in, which essentially show the following. Say we take a pseudorandom function and make it constrained, like the last talk was discussing; that is, you have a key that allows you to compute the function on some inputs but not on other inputs. In a sense, you can think of this as: I even give you a partial description of the model that's hidden in the box, okay? And yet you cannot learn the model on other values, okay? So in any case, every time we have a cryptographic construction, and the more powerful we let the adversary be while still showing security, in some sense it's like saying: let the machine learning algorithm have more and more power, and still it can't do it. Okay, so far I've only talked about the difficulty of classification, right? You have some examples, you want to give them a label of plus or minus. What about generation? Those of you who have seen these beautiful papers coming out of Google Brain or wherever, where they say: we see lots of cats, and now we're going to come up with lots of artificial cats. That is, we generate new cats. Not really new cats, but pictures of new cats, or new dogs, or new giraffes, or maybe successful college essays, or CVs that will get you a job, or slides for a keynote talk; that would be good. Or plays by Shakespeare. Maybe we could just see a lot of examples and then generate something that looks indistinguishable. This idea was actually defined before the current success with cats and so forth, by Kearns, Mansour, Ron, Rubinfeld, Schapire, and Sellie. In fact, they define it formally. There's a distribution in a box. You can press a button and get samples, samples of cats if you like, if you think of cats. I don't know why there's this obsession with cats.
I don't even like cats. Let's say dogs. You can get lots of dogs. And now you'd like to come up with your own small algorithm which generates things that cannot be distinguished from the original distribution. And of course there's a question of how you define "cannot distinguish", okay? Is it your model that cannot distinguish? Is it any polynomial-time algorithm that cannot distinguish? I will leave that aside. But even for that, we can show something. In fact, there's a paper by Naor which is more sophisticated than what I'm saying, but essentially it says that if there are digital signatures which are secure against chosen-message attack, then there exists a family of distributions which are hard to generate, okay? And this even though you can, in some sense, recognize the distribution as valid: there's a notion of evaluating whether a sample is correct or incorrect, even evaluating the probability that something comes up, and he shows a special signature scheme which rules out generation for the particular distribution defined by the signature scheme. All right, so far it looks like: what has cryptography done for machine learning? It's told them what they cannot do, okay? Well, that's something. And in fact, machine learning has returned the favor in spades. In a paper in '93, Blum, Furst, Kearns, and Lipton introduced the problem called learning parity with noise. They actually write explicitly in their paper, if you look it up: modern cryptography has had considerable impact on the development of computational learning theory; virtually every intractability result in Valiant's model comes from some cryptographic hardness. So now they're saying: we're going to give back, because we're going to give results in the reverse direction, by showing how to construct cryptography from assumptions on the difficulty of learning.
Okay, so they're saying: here's a problem that we have focused on in learning theory, okay? Natural for learning people to look at. And we claim that you can use it as a core for building cryptosystems. So what's the problem? The problem is this. You've got a bunch of equations in variables. Sorry, they're all s's, secrets, or x's; they should be either all s's or all x's, but depending on when I was working on the talk, I changed my mind. In any case, there are these secret variables, s1 through sn. We don't know what they are; it's a binary vector. We've got some equations in these variables. The thing is, if we really did have the equations, we could do Gaussian elimination and find these secrets. But we don't. We have noisy equations. What does that mean? It means that we take the value of each equation and flip it with some probability rho. And that's what you see: you see the coefficients, you see the noisy answers, and you don't see the variables. Your goal is to come up with s1 through sn. So here's a beautifully, clearly stated problem. Of course, it becomes easier or harder depending on this noise: the closer it is to one half, the harder it is. But they define it with a constant probability, say one third or something like that. And they show some cryptographic constructions based on it, which I'll talk about in a minute. But what do we know about this problem, really? There's RSA, based on factoring, which has been around for a long time. What do we know about this problem? What we know is that since '93, the best algorithm known runs in time 2 to the order of n over log n, essentially exponential. And this was a breakthrough, because plain exponential is trivial. And it took many, many years, in work by Brakerski, Lyubashevsky, Vaikuntanathan, and Wichs, to show more cryptographic-like properties of this function.
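The LPN problem as just stated fits in a few lines of code. Here is a minimal sample generator, a sketch of my own with hypothetical names: secret s in {0,1}^n, random coefficient vectors a, and each label <a, s> mod 2 flipped independently with probability rho.

```python
import random

def lpn_samples(s, m, rho, rng):
    """Generate m noisy parity samples (a, b) for secret bit-vector s."""
    n = len(s)
    samples = []
    for _ in range(m):
        a = [rng.randrange(2) for _ in range(n)]
        b = sum(ai * si for ai, si in zip(a, s)) % 2  # noiseless parity <a, s>
        if rng.random() < rho:
            b ^= 1                                    # flip with probability rho
        samples.append((a, b))
    return samples

rng = random.Random(1)
secret = [1, 0, 1, 1]
samples = lpn_samples(secret, 10, 1 / 3, rng)
```

With rho = 0 Gaussian elimination over GF(2) recovers s from n independent samples; the noise is exactly what defeats it.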
And what they showed is a worst-case to average-case reduction, but with very, very high noise. So this was not constant noise, but noise pretty close to one half. It says that when the problem is really hard, we can show worst case to average case; and which worst-case problem? Actually, not such a hard one. So in some sense it's the noise that makes it a very hard problem, and the equivalence is to an easy worst-case problem, but it's something, because it has taken a very long time to show anything. Okay. Now, the biggest, truly revolutionary implication of introducing this learning parity with noise comes with the work of Regev in 2005. Because Regev says: you know what? Forget about q equal to 2, forget about working over Boolean secrets. Let's go and work over Z_q. So everything is mod q. We take all our integers, all our variables, all our coefficients in Z_q, and the noise is no longer a bit flip, because we have an element of Z_q; instead, let's add Gaussian noise, some sort of small, bounded-interval Gaussian noise, okay? And he says: this problem generalizes the previous one, right? Because q just went from 2 to something larger; the Boolean flip went to Gaussian noise. What about this problem? Well, it's certainly not any easier, okay? The thing is that now we can start working. What Regev shows in his original paper is that this problem really is as hard as some hard lattice approximation problem, not an easy one, and that it is also hard on average. This is dovetailing with work by Ajtai, which showed worst-case to average-case reductions. So we now have a problem which we can actually prove something about. It's as hard as a problem that's been studied in geometry: finding a short vector in an integer lattice, actually in an approximate sense, but still it's something one can bite their teeth into. Though teeth depend on the dentist, I guess.
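Regev's generalization is a small change to the LPN sampler above. The following is again my own illustrative sketch, not a secure implementation: everything now lives in Z_q, and the Boolean flip is replaced by small additive noise (here a bounded uniform error standing in for the discrete Gaussian that real LWE uses).

```python
import random

def lwe_samples(s, m, q, noise_bound, rng):
    """Generate m LWE samples (a, b) with b = <a, s> + e mod q, |e| small."""
    n = len(s)
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-noise_bound, noise_bound)  # small noise; real schemes
                                                    # sample e from a discrete Gaussian
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return samples

rng = random.Random(7)
q = 97
secret = [rng.randrange(q) for _ in range(8)]
samples = lwe_samples(secret, 10, q, noise_bound=2, rng=rng)
```

Setting q = 2 and letting the noise be a bit flip recovers LPN, which is the sense in which LWE generalizes it.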
But in any case, the best known algorithm still runs in 2 to the big O of n over log n. It's a different algorithm today, but the running time has not been improved over the LPN case. This is not so surprising, since this is a harder problem, but there's a lot more machinery, maybe a lot more structure, and so forth. What's revolutionary about Regev's introduction is not just that he can prove worst-case to average-case results; it's that this problem is extremely versatile cryptographically. This is the problem that really allowed people to start thinking about, and actually achieving, fully homomorphic encryption: not homomorphic encryption for just one or two operations, but for any polynomial-time algorithm, for any sequence of operations. We can do things like leakage-resilient cryptography, functional encryption, attribute-based encryption, and much, much more. So it's really revolutionary in the sense that this is a problem one can work with and do things we didn't know how to do before, okay? In fact, here is a slide, thanks to Daniel Masny, which on one side shows what you can do with learning with errors, and on the other side what you can do with learning parity with noise, and the axes show the noise level. As the noise becomes smaller, the problem becomes easier, and if it becomes too easy, probably you can break it, so let's not go there. In any case, with learning with errors you can do more, whereas with learning parity with noise, you really can do things only in the high-noise regime.
So what do I mean when I say "do stuff"? You do the usual things: you start with, say, pseudorandom number generators, which you can build from one-way functions; then you go to public-key encryption, which requires some sort of trapdoor ability; then you go to hashing; and then you talk about pseudorandom functions, homomorphic encryption, and so forth. For more and more things we want to do, we can do them with learning with errors. With learning parity with noise, people are working very hard to do them as well. Why? The feeling is, well, actually, it's not just a feeling: working mod 2 is much more efficient. When we are working mod q and q is large, we're talking about large integers, and all the arithmetic costs more. So there's a very attractive prospect of making all of this work with learning parity with noise, where everything is Boolean. Great. One more thing: there's huge quantum significance to this whole development. Because, as you know, we can build quantum computers. I don't know if you were in Scott's talk, but it's really around the corner; in fact, he says it is around the corner in downtown Santa Barbara. In any case, Google, Microsoft, IBM, and many other companies are out there trying to build quantum computers with a significant number of quantum bits. And this is the cartoon that always comes up: "How's your quantum computer doing?" "Great, the project exists in a simultaneous state of being both totally unsuccessful and not even started." "Can I observe it?" "That's a tricky question." So it's a cartoon, but there's a lot of truth in it. It brings up the open problems: How do you observe it? How do you verify it, and so forth? And what do you do with it, in any case?
The thing is, that's all very nice to laugh at, but if this actually takes off, we've got all those applications, electronic commerce and cryptocurrencies and all that, and they have to be based on something which is quantum-resilient. Therefore NIST has put out this call for post-quantum cryptography. And what does this have to do with my talk? It has to do with the fact that essentially all the candidates are based on one version or another of learning with errors, because this problem that came from learning theory seems, so far, to be quantum-resilient. So whereas we know how to factor on a quantum computer, were it to exist, we don't know how to solve these problems on a quantum computer, were it to exist. So at least it's a candidate we can bite our teeth into and develop signatures, encryption schemes, and so forth that could possibly replace what's in store right now. All right, so where does that leave us? It seems like what I've said is that the bliss of cryptography is a nightmare for machine learning. And if that were where my talk ended, I would have chosen a different topic, because, okay, they can give us hard problems, and we can tell them they can't do anything. But that's not where the story ends; that's about the middle of my talk. I hope I have more time. Yeah. So the reason I did go through that first part, where we showed impossibility results, is because I think these impossibility results may be positive news for the second part of the talk. The fact that there are some tasks machine learning algorithms cannot solve is not necessarily a bad thing. Okay, let's keep that in mind. So now let's move on from the 80s, okay? All of that was '86, or even '84; lots of results continue, but the thrust of the definition is from '84. So what's happened since?
In some sense, I'd like to compare where we are in cryptography as a field with how I see machine learning. Adi is sitting here, and his son is one of the kings of machine learning, so he would probably disagree with me, but I don't care. Well, I do care, but we'll discuss it later. In cryptography, if you think about theory and practice, I think the fabulous thing about this field is that over the years they've gotten closer together rather than further apart, in the sense that people in practice are actually implementing things which satisfy theoretical definitions, and quite well, and it's only getting better. So as a field, we have accepted the role of theory, and in some sense it's fundamental to the very question of cryptography, because without rigor, without proofs, what are you giving me? You're giving me a system and claiming it's not going to break; and when it breaks, you have to give me some analysis, some formal guarantees. In machine learning, it's more like algorithms, right? If it works, it works. What that exactly means is unclear. In any case, if I had to use the same cartoon of theory and practice: the theory of machine learning is doing great, but the excitement right now, right this second, August 2018, is really in the practice, essentially with these deep neural nets, and to a large extent it's lacking the theory, which still has to be developed. So there's a departure here between theory and practice. And, this is a terrible slide, but I have to read it: the practice of machine learning is really too important to leave to the practitioners, okay? Why is that?
Because people are claiming, and it's used in many ways, and I believe it also, that it's going to help us with disease control, predicting financial markets, advertising, economic growth, traffic control, facial recognition, speech recognition, threat prediction, even computer viruses and spam and so forth. And if you noticed the last three bullets, about policing, bail, and credit rating, we're kind of sliding into a dangerous zone. If you read the popular press, next time you apply for a loan, a machine learning algorithm will decide if you get it or you don't. If you are a police chief and you need to decide where to send your police unit, you will run a machine learning algorithm that will tell you where to send them. And if you are a judge and you need to decide whether to release someone on bail or not, and you're lazy, then you just run a machine learning algorithm that analyzes the data of the past. In fact, this is not just my slide: apparently in New Jersey, the judges are using these algorithms, developed by some company that had access to all the information about judges deciding on bail in the past, to decide on bail. And other states are considering it. Another thing, I think this is in New Orleans, is that they're using a machine learning algorithm to decide where to send the police cars. So this is actually being used, okay? Now what does that mean? It means there's a sudden shift of power. Things that were decided before by people, and you might say, people, who needs them, they're not smart enough. Still, we have to realize that now it's an algorithm that's gonna make decisions about things which are quite important. So there is a sudden shift of power; we just have to recognize that, and maybe it's a good one. But it's certainly a fact.
And in fact, this is a slide from my Aunt Farnia, with some analysis of the largest companies by market cap. It used to be the oil companies, like Exxon, and banks and so forth. And now, in 2016, these are all high-tech companies, Apple and Facebook and Amazon and Microsoft, and there are these quotes saying that data is the new oil, these are data companies, and data will become a currency. Why am I saying that? Because this shift of power is going to those companies, it seems, that have a lot of data, since a lot of this prediction is based on knowing the past and having a lot of data about the past, and essentially this can leave us unprotected, unregulated, and so forth. So these are things that you hear all the time, and now you hear them again. The thesis for the rest of my talk is this: cryptography is a field where, for many, many years, we have paid attention to how to ensure the privacy and correctness of computation. We have developed lots of methods, whether it's multi-party computation when you talk about computing on private data, or interactive proofs when you talk about verifying that a computation was done correctly. We've got a lot of tools. And this means that we should be able to play a central role in ensuring that the power of algorithms is not abused. I think it's very, very clear that that's the case. Now the question is just to do it. So what I've done is essentially come up with eight challenges, okay? Some of them are challenges that are already being attacked, and let me just go through them as quickly as I can, so I can tell you what's being done today. So first of all, if we think about this power of machine learning, where does it come from? It comes from data. Data about whom? About us, about individuals, right?
So, data about where we drive, who we talk to, what we like to buy, our genetic profile and so forth. So obviously, ensuring the privacy of both the data and the model, during training and classification, even if it's not mandated by regulations, is a way to keep this power with the people. So privacy is crucial, because machine learning stands to really change the way life goes. And secondly, okay, let's say that the inputs are private, and now there is a model that some company developed for the state of New Jersey or the state of Louisiana to use for bail or policing. But who says that that model actually fits the data? And who says that it hasn't been tampered with? Maybe to introduce bias, or some sort of trapdoor, so that at some critical juncture you can make sure that somebody who's important to you gets out on bail, gets a loan, doesn't get tracked by the police. So this is clear, it's obvious, right? We have to make sure that these things are done in a tamper-proof way. And the question is how. The extra benefit of these two challenges is that they're an opportunity for using stuff that we spent many years writing lots of papers about; we had schemes, but who was using them? Well, now, because of the blockchains and because of this application, I think they actually will be used. But that's a side benefit; it's more important really to address these challenges. There are a lot of things people are doing here, which I will talk about in the third part of the talk. All right, next thing. So, Aleksander Madry put this slide up. People have probably heard this term, adversarial machine learning. The idea of adversarial machine learning is that there's a model, and you try to come up with an example, an input that you feed that model after the training phase, in the second phase, that it's gonna misclassify. So here it seems useless.
Why would you want to misclassify the pig as an airliner, kosher or non-kosher? Why would you want to do that? Well, that's entertaining for talks, but people talk about self-driving cars: you don't want them to misclassify a stop sign as a yield sign. Or even things closer to home: what if you have software that's intended to flag something as a virus or as spam, and someone can just tweak their spam message, or tweak the virus, so that your classifier doesn't catch them, just by playing with it a little bit? And when I say playing with it a little bit: these results of turning the pig into, I don't know, a United airliner, these results don't actually require having a lot of information. They require being able to ask questions back and forth of a classifier of images, the pig recognizer. The thing is, of course, it's not totally honest, because when they ask questions back and forth, they ask a lot of questions, that's one. Two, when you get an answer, it's not just telling you pig or airliner; it actually tells you the probability that it's a pig or the probability that it's an airliner. So it actually gives you a bit of information about how your underlying machine learning algorithm is working. But the fact is that you can misclassify things galore, and what do we have to do with this? Well, as cryptographers, we really do have vast experience in mathematical modeling of adversarial behavior. So they're playing, and it's incredible, because very small perturbations of these images can yield these incredible consequences. But how do you actually go about fixing this? You would think that you would need to formally define the class of changes that can be made, and then prove security with respect to that class of changes, okay? And in fact, there is work by Madry, which he talked about yesterday, that does exactly that.
So he looks at images, and then he defines a class of attacks which are domain specific. What would you do to an image? You may rotate it, you may shift some pixels. So he defines this class, and then he proves things. What does he prove? First of all, he has this notion of robust training, where when he's training his algorithm, he's also giving it some adversarial examples, adversarial according to the class that he defined before. And then he shows that once he does that, misclassification is much harder, misclassification restricted, of course, to the adversarial behavior that he proved against. Then he shows some bounds. He says, look, I could do this, and it's a good thing to do, but how much work is it? Training a machine learning model is not simple, especially not the kind of models that he's looking at, with image classification. So how much more data do you need in order to train a robust network, robust to those transformations that he's defined? But my point in all of these results, which are fascinating, is that he defined the class of transformations, and then he proved security, or adversarial robustness, with respect to that class. And to me it's extremely reminiscent of the days of leakage resilience, where we had those beautiful papers on timing attacks and cache attacks, and give me another attack, acoustic attacks. No, seriously, these things were amazing. They would break RSA, or they would break these schemes, and then what you wanted was to define larger classes, which of course should include these attacks, and prove something mathematically with respect to those attacks. And it seems very similar; even though I know that it's different, in spirit I think it is similar. Okay, next thing.
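To make the "small perturbation" idea above concrete, here is a minimal sketch of an adversarial example against a toy linear classifier. The weights, input, and epsilon are all invented for illustration; real attacks of this flavor target deep networks, but the gradient-direction nudge is the same idea.

```python
# Toy adversarial example against a linear classifier.

def classify(w, x):
    # linear classifier: label 1 if the score w . x is positive, else 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

w = [1.0, -2.0, 0.5]                 # the model's weights
x = [0.6, 0.1, 0.3]                  # honestly classified as 1 (score 0.55)
eps = 0.2                            # size of the allowed perturbation

# FGSM-style attack: move every coordinate by eps against the score's sign,
# lowering the score by eps * sum(|w_i|) = 0.7 -- enough to flip the label.
x_adv = [xi - eps * (1.0 if wi > 0 else -1.0) for xi, wi in zip(x, w)]

assert classify(w, x) == 1 and classify(w, x_adv) == 0
```

Robust training, in the spirit just described, would add such perturbed points, with their correct labels, back into the training set.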
Yeah, okay, so the holy grail in this adversarial machine learning, and I think everybody knows that, is that we want to somehow build a model that will embed in it a challenge, so that in order to misclassify, you would have to solve this challenge. Nobody knows how to do that. It's very difficult, right? Because everything is so empirical. A challenge that looks like a plane? And regular planes, you know, should still be classified as planes, without having to embed cryptographic challenges in them. I'm not saying I know how to do it, but it's clearly the holy grail, and this relates to the Debbie Downer thing, because what we talked about in the first part of the talk was how to come up with counterexamples to learning. So in some sense, that might give us some sort of guideline for how to start working toward this holy grail. Okay, next challenge. So we've left adversarial machine learning behind. There are all these laws coming up, which are good laws; I mean, they're not necessarily well written, but they have good intentions. There's GDPR, and now there's a California law, a digital privacy law, which essentially grants consumers more control over, and insight into, the spread of their information online. So regardless of what the law actually says, what is the point here? The point is that the consumer should know if a company that has their data is giving it to another company, or at least that it's giving it in such a way that it's not gonna be traceable back to them or harm them in some way. In other words, we want ways to trace unauthorized use of your data and of your model, and it means it would be very interesting to develop methods that could be used for tracing data. Of course, without introducing more vulnerability, because it seems obvious that if I introduce ways to trace my data, I'm actually introducing a way to find out about me. Okay, so wait a second.
But you know, these types of problems don't really bother us, because we've solved paradoxes like these before. And here is a conjecture from the reception yesterday, from a discussion with Mary Waters and Guy Rothblum: maybe you could show that data tracing is possible unless some sort of privacy-preserving learning algorithm was used, so it's a double-edged sword. Maybe you could show that if you can't trace it, it's because they used, I don't know, differential privacy or something else, and then that would be good. It would almost be a proof that they used a privacy-preserving method with your data. Okay, next. What about tracing unauthorized use of the model? So there's a beautiful work here, a sequence of works. By the way, a lot of people are gonna be upset at me because I don't reference many things, but you really should not be upset at me, because I can't; it's too much, and I can't remember all the names and so forth. But in any case, here's one paper, with Benny Pinkas, that appeared at USENIX just last week, where they show how to watermark a model. They have this beautiful title, turning your weakness into your strength. By the way, that's one thing this field has: the names are fantastic. Every system has a beautiful name, there are so many acronyms; I'm in awe of people coming up with these names. But in any case, what they mean is that they watermark deep neural nets by training the network to accept some planted adversarial examples. So in some sense, the fact that there are adversarial examples, like the one with the pig and the airliner, they're saying, you know, that's a good thing: I'm gonna put something in there that only I know I put. It's gonna misclassify something galore, and that's gonna serve as my watermark. Okay, five: fairness, accountability, and debiasing.
So right now there's this whole community, it's called FAT, fairness, accountability, and transparency, where they come up with definitions and algorithms for how to take machine learning models and make them fair, okay? And Cynthia talked about that yesterday. I think that we have some crypto-style definitions which could be useful. They talk about similar people having to be classified in a similar manner. What does similar really mean? How do they define that? They have some sort of metric. But to me it seems that definitions like simulation-based security, and indistinguishability, might be very interesting here, because in some sense what you wanna say is that with one person you can do what you can do with the next. I could give a whole talk on this, but I won't. Then there's the question of randomness. At least for the machine learning models that I'll talk about in a minute, the neural nets, randomness seems to be very important. And I've never heard any discussion about what kind of randomness you need to guarantee success. Does it have to be cryptographically strong? Unpredictable? Something else? How does it affect stability? So I think that's a very interesting domain. As we know, in the end, randomness for generating secret keys, if it's done incorrectly, can be detrimental. And they often talk about the brittleness of all this: somebody comes up with a neural net, and then somebody else tries to reproduce it and can't. Maybe it has to do with the randomness. Or maybe the fact that you can translate adversarial machine learning examples from one model to the next, which is surprising, you attack one and then the attack seems to work on another, is because they use the same randomness or closely related randomness. I have no idea, but it's worth a study.
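The reproducibility point just made can be illustrated in miniature: the trained weights are a deterministic function of the seed that drives both the random initialization and the example order. The model and data below are invented; only the seed-dependence matters.

```python
# Same seed: bit-identical model. Different seed: different model.
import random

def train(seed, steps=10):
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)            # random initialization
    data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]   # learn y = 2x
    for _ in range(steps):
        x, y = data[rng.randrange(3)]     # random example order
        w -= 0.01 * (w * x - y) * x       # one SGD step on squared error
    return w

assert train(0) == train(0)               # reproducible given the seed
assert train(0) != train(1)               # brittle across seeds
```

Whether the randomness needs to be cryptographically strong, or merely unpredictable, is exactly the open question raised above.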
Finally, I think this one is probably very tractable: to define some specialized cryptographic functionalities which are sort of machine-learning complete. That is, cryptographic functionalities such that, if you implemented them well, you could implement safe machine learning, and of course these have to be functionalities which are efficient for cryptography. All right, so I wanna end this part here, and then go to my last part, by saying that there's a real opportunity for developing new theory, and I think that that's great; how exciting. So this is the third and last part of my talk, and obviously I'm not gonna get through all of it. There is a lot of work on ensuring the privacy of both data and model, during classification and during training, and some work on model stealing as well. What do I mean by that? I'll say in a minute. I wanna say that there are lots and lots of works, as I said, 50 or so, and in some sense, if you think about it, we know that a lot of these things are possible just from general results, right? How efficient is it? We know asymptotically. Then there's the question of how you build a system, where you can analyze the concrete efficiency. Right now we're in the stage of proofs of concept; I don't think any of these things are ready to be shipped. But this is the very natural progression of how these things go. So I said that you could use cryptographic technologies of the past. What technologies? Everything; it's like the kitchen sink. Garbled circuits of Yao from '82, homomorphic encryption from Gentry onward, secret sharing of Shamir, differential privacy, and multi-party computation. All of these techniques come into play. So which one? Well, that's a really good question. It seems like each one has its merits, depending on whether we're doing training or classification and what trust assumptions we're willing to make.
And really, what people are doing is like a Chinese menu, even though it doesn't look like Chinese food; it's a pick-and-choose approach. You have all this technology out there: maybe I'm gonna use this, and then I'm gonna connect it here, and voila, a system comes out. And I mean that in the best possible way. So let's see, very briefly, the kinds of things that are being done. Here's a picture. There's a training phase, as we said, and there's a classification phase. So if you think about classification, what are the security issues anyway? There are two. On the side of the client, who wants to classify whether he's getting a loan or not, or something else, they want to keep their data private. In this picture, it's a doctor who has the images, the medical records, of her patient. And then there's the hospital, which has developed a model, because it had lots of patients and knows how to classify tumors as malignant, and it wants to keep the model parameters, the hypothesis, secret; it worked pretty hard to get it. So there are these two competing concerns. And it seems like, okay, what do you want? Let's go home: two-party computation, that was already done in 1982. Why do we even have to discuss this? Performance, performance, performance. So essentially, I'm gonna skip this, but you can talk about the pluses and minuses of using two-party computation versus using encryption that has some sort of homomorphic properties to it. The big line here, without looking at all the pluses and minuses, is that homomorphic encryption allows small communication: you're encrypting the input and you're encrypting the output, okay? Whereas with garbled circuits and multi-party computation, the communication is very high, because it's proportional to the size of the circuit, of the computation that you're doing, but the computation is more efficient. So that's the tension here.
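To make the "homomorphic properties, small communication" option concrete, here is a toy Paillier-style additively homomorphic scheme evaluating a linear classifier's score on encrypted features, so only one ciphertext per value travels. The primes are laughably small, the weights and features are invented, and a real private classifier would need a further protocol for the comparison step.

```python
# Toy Paillier-style additively homomorphic encryption; insecure parameters.
import math
import random

p, q = 293, 433                       # toy primes; utterly insecure
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)          # Carmichael's lambda of n
g = n + 1

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    L = (pow(c, lam, n2) - 1) // n
    mu = pow(lam % n, -1, n)          # modular inverse of lambda
    return (L * mu) % n

w = [3, -2, 5]                        # model weights (held by the server)
x = [4, 1, 2]                         # client features, sent encrypted
cts = [enc(xi % n) for xi in x]

# Multiplying ciphertexts adds plaintexts, so ct**w scales the plaintext:
score_ct = 1
for wi, ci in zip(w, cts):
    score_ct = (score_ct * pow(ci, wi % n, n2)) % n2

assert dec(score_ct) == sum(wi * xi for wi, xi in zip(w, x)) % n   # 20
```

The server computes the whole weighted sum without ever seeing the features, which is exactly the small-communication trade described above.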
That's not the only tension; there are others. Garbled circuits work on binary inputs; they're really for Boolean circuits. Homomorphic encryptions work for arithmetic. So depending on whether the computation, the naive thing that you're doing, starts out Boolean or starts out arithmetic, and in fact it will often start out working with real numbers, you may prefer one technology over the other. All right, so I was gonna talk about some work that I did in 2015, which works on very simple classifiers, like linear classifiers, decision trees, naive Bayes. In that case, it's very clear that you wanna choose an encryption scheme where you can encrypt m and encrypt m prime and compare whether m is greater than m prime, or something of that sort. That's the kind of encryption scheme you would use to build a classifier where both the input being classified is kept secret and the hyperplane, what classifies as zero or one, is kept secret. But the real interesting game in town is not that. You see that dog? He doesn't look very happy, but I think I was in an aggressive mood when I was downloading these pictures. The real interesting question is: what about these deep neural nets, which everybody says are the future and which are being used? What is the challenge, really, in classification and also in training? So the way these things work: there's this dog, and he's not physically being input; obviously it's a picture of a dog, which is a bunch of pixels, and each pixel has some value, the color, or some sort of gray scale if you wish, that is input to the neural net. Then there's what's usually called layers, and at the end there's an output, which is a bunch of nodes that have probabilities in them. Was this a dog? Was this a cat? Was this a man, or was this no one?
Or neither one of these three. And obviously, if this is perfect, the top blue one is gonna have a one, it's a dog, at least I think it's a dog, although it's an angry dog, and the others are all zeros, but that's not always gonna be the case. Now, what is the issue for cryptography? These intermediate orange balls, what do they do? There are weights on these wires, and what the balls do is multiply those weights by the input variables and then compute something called an activation function on this value, which is a nonlinear computation, okay? So that's what we have to remember: there's a bunch of linear stuff, a weighted sum, and then you do something nonlinear. Why is that important for cryptography? What are examples of these nonlinear functions? Logistic functions, ReLU, which is essentially a max, the hyperbolic tangent. In any case, the point is that our cryptography, our fully homomorphic encryption and MPCs, is really good when what we are computing are low-degree polynomials. That was on the slide that I skipped really quickly. This is not a low-degree polynomial, which means that if the inputs come in encrypted and we want to evaluate this whole thing on encrypted input, with FHE or MPC or whatever, it's going to be hard, because these functions are going to require very, very deep circuits to compute. Essentially it's beyond our capacity, because it would mean we would need to do bootstrapping, which means big parameters, a lot of noise, and we can't do it as is. The beauty is that, still, there is work, and the first such work that I've seen was from Microsoft, the group of Kristin Lauter, where they actually do it anyway. So what do they do? The first work is CryptoNets. Essentially, there are first a few problems to address.
In these neural nets, you have these fixed-precision real numbers, and you need to convert them to integers, because that's what the homomorphic encryption works with. And you're going to multiply, you're going to add, these things are going to grow; you somehow have to make sure they don't grow beyond what you can handle. Then there is the question of this logistic or non-linear function: what do you do there? So they say, okay, let's not use it. Let's use a low-degree polynomial which will approximate it well, and they propose to use a squaring function. So there's a z squared there. The point is, I think that's a really big idea: trading accuracy, in some sense, for efficiency. And there are two ways to go here. One is to work really hard and be able to do the logistic function, okay? And the other is to go to the machine learning people and tell them, listen, this is better for us. You want to have secure ML? Why don't you do your neural nets with these kinds of functions? Whether it's this function or another one is a different question. To my surprise, even Madry yesterday, who's extremely savvy in all this, didn't know about this work, about the kinds of functions that are used in the intermediate layers of neural nets in the crypto work, okay? I think that Ohad Shamir and a few other people have actually worked on analyzing the squaring function and how well it does compared to the logistic function and so forth. This is the kind of work that should be strongly encouraged. Okay, it's not only FHE, or homomorphic encryption, that's being used in this context; there are also people, in this work DeepSecure, using garbled circuits and optimized implementations for the sigmoid and for tanh, activation functions that are actually used by the machine learning people. So they didn't switch to some other functions.
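The accuracy-for-efficiency trade described above, swapping the logistic activation for a low-degree polynomial that homomorphic encryption can evaluate cheaply, can be seen numerically. Below is a degree-3 Taylor approximation of the sigmoid, for illustration only; CryptoNets itself goes further and simply uses the square, z**2.

```python
# A low-degree polynomial standing in for the logistic activation.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def poly_sigmoid(z):
    # degree-3 Taylor expansion of the sigmoid around 0
    return 0.5 + z / 4.0 - z ** 3 / 48.0

# Close on [-1, 1], where the activation's inputs are assumed to lie;
# the approximation degrades quickly outside that range.
errs = [abs(sigmoid(t / 100.0) - poly_sigmoid(t / 100.0))
        for t in range(-100, 101)]
assert max(errs) < 0.005
```

An FHE scheme only has to multiply and add to evaluate the polynomial, at the cost of this small, bounded error on the assumed input range.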
They took the ones that are being used in the machine learning algorithms and did implementation-specific optimizations. So then there is this rule of thumb telling me when FHE is better than MPC, because here it's FHE and here it's multi-party computation. Here's the rule of thumb: if the computation is linear and the circuit size is super-linear, use homomorphic encryption, because the computation is too large, okay, to use garbled circuits, and yet it's small enough to use homomorphic encryption successfully. And in fact, there's this work that was also at USENIX, just recently, not Newsweek, USENIX, same difference, by Juvekar, Vaikuntanathan, and Chandrakasan, where they combine the two approaches. Essentially what they do is they say this: there is that linear layer, right? So the encrypted inputs come in, and first of all we sum; that we can do under homomorphic encryption. Now comes the non-linear part. The suggestion is that now there will be a protocol between the classifier, the model holder, and the user, who has the key for the encryption, and somehow they'll do a two-party computation to compute this non-linear layer, and then we go to the next stage. Of course, this whole thing makes sense only when either the model is something that you don't know, or there's some efficiency gained by this approach versus you computing the model yourself, because you could imagine the model owner giving you an encrypted model and letting you go do it. Anyway, it's a very interesting work, with good performance, and the key here is combining these two technologies. I know that I'm kind of out of time; I'm gonna take five more minutes, okay? Two? Ah, thank you very much. So this is classifying. Linear classifiers are simple; deep neural nets are difficult, okay? What about training?
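The hybrid just described, linear layers evaluated under homomorphic encryption with a short two-party protocol for each non-linear layer, has the following structure. The cryptography is mocked out here with plaintext stand-ins, and the weights are invented; only the alternation between the two technologies is illustrated.

```python
# Structural sketch of the HE + 2PC hybrid; no actual crypto is performed.

def he_linear(W, x_enc):
    # would run homomorphically on ciphertexts: just weighted sums
    return [sum(wij * xj for wij, xj in zip(row, x_enc)) for row in W]

def twopc_relu(z_enc):
    # would be an interactive two-party protocol for the nonlinearity
    return [max(0.0, z) for z in z_enc]

layers = [[[1.0, -1.0], [0.5, 2.0]],      # invented toy weights
          [[1.0, 1.0]]]
x = [3.0, 1.0]                            # client input, conceptually encrypted

for i, W in enumerate(layers):
    x = he_linear(W, x)                   # homomorphic linear layer
    if i < len(layers) - 1:
        x = twopc_relu(x)                 # interactive non-linear layer

assert x == [5.5]
```

Each pass through the loop is one round of the combined protocol: cheap homomorphic sums, then a short interaction for the part FHE handles badly.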
So training is a nightmare, as far as I can see, because before, we had to go through the network once and compute this non-linear function at every level of the network. But when I do training, essentially, this is the canonical picture: there's weight one and weight two, and we think about these as the weights on the wires that you do the weighted sum with. When you train, you don't know what those weights are. So you start with something random, and then you improve those weights according to what you learn from the training data, and you improve, improve, improve, till you find a place that's optimal in some sense. In this picture here, these are the original weights, and this is the loss, how good these weights are. Let's say you start somewhere here, and you wanna roll all the way down here to find the right W1, W2, in this case, though usually there are lots of weights, that minimize the loss of the classification. How do you do that? You take a training input, with the weights you have right now, you run it through the network, and then you see whether it classified well or not. If it did, that's good; if it didn't, it tells you how to change your weights, and then you use another input, and so forth. Of course they don't use one input at a time; there are lots of inputs, and they take batches of inputs and do that, okay? But that's a lot of non-linear operations. It's a lot of operations altogether, so I think this slide is incorrect: it's worse. If you compare training to classification, you multiply by the size of the data, which is not the same as before, the size of your data set, how many examples are in your data set, and more, okay? So, very difficult.
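The training loop just described, start from random weights, run examples through the model, and nudge the weights against the error, can be sketched in miniature. A toy linear model on invented data, fitting y = 2x + 1:

```python
# Minimal SGD training loop: random start, repeated gradient nudges.
import random

random.seed(1)
data = [([x], 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b, lr = random.random(), random.random(), 0.05   # random start, step size

for epoch in range(2000):
    for xs, y in data:                    # "batches" of size one, for clarity
        pred = w * xs[0] + b              # forward pass
        err = pred - y                    # how wrong were we?
        w -= lr * err * xs[0]             # gradient step on the weight
        b -= lr * err                     # gradient step on the bias

assert abs(w - 2.0) < 0.01 and abs(b - 1.0) < 0.01
```

Every one of those thousands of updates repeats the forward pass, which is why doing this under encryption multiplies the classification cost by the size of the data set and more.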
Second of all, we need lots of training examples, and often we need to get them from many different entities, maybe individuals, maybe companies, maybe hospitals. The training data doesn't come from one source, so it's not two-party computation anymore, where somebody is holding the data and somebody wants to train; it's actually lots of people who may be contributing data, and one party that wants to train. So if you wanna keep the privacy of those lots of people, how do you do it? MPC, that's right, actually true. So there is a paper called Federated Learning, or a concept called federated learning, where they say, okay, let's say that your training data is what you've typed on your laptops, and the machine learning model wants to predict the next word. So if I write crypto 2018, the word keynote comes out, or something of that sort. The thing is, they want to have lots of users typing and to develop a model from all those users' inputs, but the user, let's say, doesn't wanna tell them what they're typing. So the first idea they suggest is, you know what, each user can locally train a neural net, okay? There are some initial weights that everybody knows, and then each one improves them depending on their own inputs, and then they send the deltas of the improvement to the server. And their first argument is, look, I'm just sending the delta of the improvement, I'm not sending my inputs, so I'm already a little better off. But it's not good enough, because these deltas can leak information about my inputs.
So the second idea is, instead of sending the delta to one place, we're gonna split things up. Over here there was one server; we're gonna split this into several servers, let's say three or four, and we're trusting, somehow, so Google splits into three agents, that they don't all talk to each other, at least some of them don't talk to each other; there's some non-collusion assumption going on. And now each user secret-shares their delta of the weights among those servers, and the servers all get together and do a weighted sum. A weighted sum we know how to do very efficiently. So that's the idea there. And what else do I wanna say? I wanna say that even though I had that picture saying that training is a nightmare, it's being done. Hats off, if I had a hat. It's unbelievable: there's a lot of work on training approximate logistic regression and other kinds of regression. The main idea there, which I think is a big idea, is how to approximate these standard computations, like the logistic function and so forth, by other functions which have similar performance and are friendly to cryptography. In fact, there's a very beautiful new homomorphic encryption whose title is homomorphic encryption for approximate arithmetic. The idea here is, you know what? We have always insisted that homomorphic encryption has to give me the same answers as the unencrypted computation. Let's relax that: it's not gonna give the same answers, it's gonna give approximately the same answers. When I'm using this for some application anyway, I only need an approximation, I don't need exact computation; that's good enough. And that can make my homomorphic encryption much faster. So I think this is beautiful work, and it's an example showing how machine learning, in my opinion, and logistic regression motivated the invention of a completely new homomorphic encryption scheme.
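The secret-shared aggregation step from the federated learning discussion above can be sketched as follows: each user splits their (integer-scaled) weight delta into additive shares, one per server. Any single server sees only uniform noise, yet the servers' local sums add up to the true total of all deltas. The numbers are invented.

```python
# Additive secret sharing of per-user deltas across non-colluding servers.
import random

q = 2**31 - 1                              # work modulo a large prime

def share(delta, n_servers):
    shares = [random.randrange(q) for _ in range(n_servers - 1)]
    shares.append((delta - sum(shares)) % q)   # shares sum to delta mod q
    return shares

user_deltas = [5, -3, 10, 2]               # one scaled delta per user
n_servers = 3
server_inbox = [[] for _ in range(n_servers)]
for d in user_deltas:
    for s, sh in enumerate(share(d % q, n_servers)):
        server_inbox[s].append(sh)

partial = [sum(box) % q for box in server_inbox]   # each server sums locally
total = sum(partial) % q                           # combine the partial sums
assert total == sum(user_deltas) % q               # equals 14
```

The whole aggregation is just additions, which is why this step is so cheap compared to training under encryption.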
So it's really an example of how this kind of goal has, in my opinion, changed cryptography. Okay, I think I'm gonna skip the model stealing; that's differential privacy. I just wanna say that we're really not done at all, because all of these solutions are in honest-but-curious models. We're trusting people to follow the protocol. But why should we trust them? What about people who are trying to modify things so that they can later get qualified for a loan? I think it's a fundamental question, and the stakes are too high to pretend it doesn't matter. There are sort of three parts to it. How do you verify that everybody is doing the right thing during training? How do you make learning robust to adversarial inputs? There's beautiful work on distributed optimization saying: suppose some of the inputs are actually bad; make your optimization problem robust against those badly chosen inputs. And finally, how do you verify that the model has not been modified post-training? Okay, you've been extremely patient. My bottom line is this: this whole machine learning thing is fascinating for cryptography. It's an opportunity to use things we've developed for many years, but more importantly, it's an opportunity to develop new theory, both for crypto and for ML, so that they work well together. And finally, I wanna thank all these people whom I kind of tortured with questions on this topic. And if I didn't mention you, you know who you are, so thank you.