All right. So what's the motivation for this? Basically, as corporations and governments, both foreign and domestic, increasingly traffic in human data, and as algorithms reach ever more deeply into our lives in ways that, say, determine our views, not just of products but also of world events, and our relative eligibility for jobs, we see that the adversary is ubiquitous and is often Byzantine. And to reiterate Alex's point, this is a tremendous opportunity for crypto, and we really, really need crypto, and we need crypto on steroids, hence the title of the talk. As a scientific community, cryptographers, I think, are uniquely equipped to engage in the framing of these questions, just as Alex said, and also in addressing them.

Now, even the good guys can cause adverse, possibly unintentional, consequences. So for example, there are legitimate and well-intentioned data analysis tasks, such as analyzing over-the-counter drug purchases for early detection of an epidemic, or analyzing loan data for detection of systematic racial discrimination. These things can have serious privacy consequences for the data subjects.

So I'm going to start this talk with differential privacy. Some of you know it; maybe some of you don't. Differential privacy is a definition of privacy and a collection of algorithmic tools that are tailored for the statistical analysis of large data sets. And I'm going to weave it through the other subjects.

OK, so here's our general model for the talk. We have a data analyst, and there's, let's think of it as, a central repository of raw data. It has all sorts of very sensitive information in it and, of course, a lot of utility in it. And the data analyst will interact with it through some sort of mechanism. I just want to point out here that the analyst is interacting in an adaptive fashion. So let's say she asks a question Q1 and gets an answer A1; then, depending on that answer, she formulates question Q2 and gets an answer to that, A2, and so on and so forth. So it's adaptive. And I can think of the question as something simple, how many people like muffin tops, or I can think of it as the framing of a large study: carry out the following analysis on the data set and tell me the results. Conceptually, it really doesn't matter.

So the centralized model is the model, for example, that's being used for the 2020 Census's protection of confidentiality. The Census Bureau will be collecting a bunch of data and will be analyzing it. They have access to the raw data, they'll be publishing statistics, and they'll be using differential privacy for those publications.

There are other models of computation. For example, in the local model, the individuals roll privacy in themselves: they randomize, or more specifically apply a differentially private transformation to, their data and then send the data to a central source. So the central repository now has only randomized data. This is the model that's being used, for example, in Google's RAPPOR project, by Apple in iOS 10 and in macOS, and Microsoft is using it in Windows. And corporations seem to like it because, in the usual parlance, we have rolled privacy in; we've pushed the trust boundary out to the client. And that may reduce the responsibilities that the corporations have for the data that they collect, because privacy has already been rolled into it.

OK. Another model is the federated model, where you could imagine, let's say, a collection of a few hospitals.
Each hospital has raw data for its patients, and they can't transfer these data to other hospitals for confidentiality reasons. But they would like to cooperatively compute something, some kind of analysis of the data across the set of hospitals in an area. That's the federated model.

OK. So what is differentially private data analysis? What is the differential privacy guarantee? Roughly speaking, it says simply, in English, that the outcome of any analysis is essentially equally likely independent of whether any individual joins or refrains from joining the data set. In other words, we'll learn the same things from a data set whether or not I actually choose to have my data included in that data set.

So what this does is separate the harms that can come from the teachings of the data, those things that you learn when ML is working really, really well, from the harms that can come from participation. A harm that comes from participation would be something like somebody leaking your personal data. But a harm that comes from the teachings is different. So let's say that we do a data analysis and we learn that smoking causes cancer. And now, as a result, a smoker's insurance premiums rise. That smoker was hurt by the fact that the insurance companies learned that smoking causes cancer. But what differential privacy guarantees is that this smoker is equally likely to be hurt whether or not the smoker actually participates in the study, because the same thing, that smoking causes cancer, is learned independent of the participation of any individual or small number of individuals. So again, differential privacy separates the harms that come from the teachings from those that come from participation, and it protects against those that come from participation. And of course, learning that smoking causes cancer is the whole point of the medical analysis, and it encourages the smoker to stop smoking. So we want to encourage that.

Okay, so this idea that you learn the same things whether or not any individual opts in or opts out of the data set, that's a kind of stability guarantee. And for those who know something about machine learning, stability is something that is essential for generalizability: it says that we're not overfitting too heavily to the data set. So we usually think of privacy and utility as being at odds with one another; we have to maybe add some noise to get some privacy. But here's a sense in which privacy and real learning, which is what you really want, generalizability, are actually aligned.

So here's the formal definition, and I need to explain a few things. First of all, privacy is not binary; it's shaded, and the grade that it gets is called epsilon. Our algorithms are typically called M, and they're randomized. And we need the notion of a pair of adjacent data sets: two data sets that differ just in the data of one person. In other words, everybody in this room versus everyone in this room except for me; those are adjacent data sets. So an algorithm gives epsilon differential privacy if, for all pairs of adjacent data sets X and Y and every possible output event S, the probability that we observe S is essentially the same when the database is X as when it is the adjacent database Y. When epsilon is small, e to the epsilon is about one plus epsilon, so the probability of this event is not changing much. And the probability space is not over the choice of the data sets.
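For reference, the condition just described can be written out as follows; this is the standard rendering, with M the randomized algorithm and the probability taken over M's coins.

```latex
% epsilon-differential privacy, as described above
\Pr[\,M(X) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(Y) \in S\,]
\quad \text{for all adjacent } X, Y \text{ and all events } S \subseteq \mathrm{Range}(M).
```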
This is a worst-case notion: worst case over all pairs of adjacent data sets. The probabilities are over the coin flips of the algorithm, so over the things that are controlled by the good guys, that's us. And the parameter epsilon serves as an upper bound on our measure of privacy loss. The actual measure of privacy loss, as you'll see on a later slide, is built from the ratio of the probability that M(X) is in S to the probability that M(Y) is in S. And this is a bounded-ratio requirement.

All right. So the key properties of differential privacy, no matter how it's implemented, are, first of all, that it is future proof. It is resilient to any kind of post-processing or future auxiliary information that you can glean from some other source. In other words, once we have carried out a differentially private computation, even somebody who knows that the database is one of X or the adjacent Y still can't figure out which one it is, no matter how much post-processing they do later.

It also has an automatic group privacy consequence from the definition. So if you have an algorithm that is epsilon differentially private and you have a group of, say, size three or four, then the algorithm is automatically three- or four-epsilon differentially private for that group. So if you're happy taking a survey in terms of your own privacy and you're wondering how you feel about including the data of the rest of your family, this is what the guarantee tells you.

Finally, it composes gracefully and automatically. That is, we can understand how privacy loss develops as additional questions are asked of the system, or posed to the data set. That property gives us something that, to my knowledge, no other privacy protection mechanism gives us, which is programmability. We can create a couple of simple differentially private building blocks and start programming with them. You can create differentially private algorithms for much more complex analyses by programming: you put the primitives together in the right way, and we can understand the cumulative privacy loss. We can also control the cumulative privacy loss by playing with the parameters epsilon so that they add up, or compose, the way we want them to.

So, just an example of why this is a definition that goes a little bit beyond crypto. Imagine answering the following two questions using, let's say, homomorphic encryption or secure multi-party computation. The questions are: how many members of the House of Representatives are cooperating, and how many members of the House of Representatives other than the Speaker of the House are cooperating? (Oops, sorry, those should have been the same predicate.) So, assuming that they were both written as "cooperating," if you knew the exact answers to both of them, you would be able to determine whether or not the Speaker was cooperating. Differential privacy will ensure that the privacy of the Speaker is protected. This is a problem, the one I've got here in red, that exists even when all of your crypto and all of your security is working perfectly. So it's addressing a new thing.

Now, when we have a pair of adjacent databases X and Y and we're running our algorithm on, let's say, X, we can look at the privacy loss of a specific execution of M on X with respect to Y.
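To make that two-query problem concrete, here is a toy sketch with made-up data: exact answers to the two counts reveal the Speaker's bit by subtraction, while answers perturbed by the standard Laplace mechanism (each count has sensitivity one) do not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of the two-query problem: 435 made-up House members,
# one bit each for "is cooperating"; member 0 plays the Speaker.
cooperating = rng.integers(0, 2, size=435)

def exact_count(bits):
    return int(bits.sum())

# Exact answers to both queries reveal the Speaker's bit by subtraction.
q1 = exact_count(cooperating)        # all members
q2 = exact_count(cooperating[1:])    # everyone except the Speaker
print("Speaker's bit recovered exactly:", q1 - q2)

# The Laplace mechanism blunts this: a count query has sensitivity 1, so
# adding Laplace(1/eps) noise gives eps-differential privacy per query.
def noisy_count(bits, eps):
    return exact_count(bits) + rng.laplace(scale=1.0 / eps)

eps = 0.5
print("Noisy difference:",
      noisy_count(cooperating, eps) - noisy_count(cooperating[1:], eps))
```

The difference of the two noisy answers is dominated by the noise, so the Speaker's bit is no longer pinned down; by composition, answering both noisy queries costs 2 epsilon in total. With that picture in mind, back to the privacy loss of a single execution of M.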
So the privacy loss of this draw is just the log of that ratio: the probability that we observe the output C when the database is X, divided by the probability that we see it when the database is Y, and then we take the log. The privacy loss can be positive or negative, according to whether C is more likely under X or under Y, and it can be infinite if C never shows up when the data set is Y. And in the definition, the so-called pure differential privacy definition that I just gave you, we simply guarantee that the magnitude of this is always bounded by epsilon.

Now, privacy loss is actually a random variable, because we ran our randomized mechanism on the data set. And when we start looking at privacy loss as a random variable, it gives rise to relaxations of differential privacy and to things that permit even more powerful composition results; in other words, things that let us do a lot more computation before we have to turn off the data set and say that's it, no more access. In fact, it leads to variants of differential privacy whose accuracy matches the lower bounds coming from what's known as the fundamental law of information recovery, which basically says that overly accurate answers to too many questions are blatantly non-private. So we can dance right at that boundary between being overly accurate, and hence blatantly non-private, on the one hand, and being private on the other.

So here's one differentially private mechanism, and this is called the exponential mechanism. Now, a simple way of ensuring differential privacy, especially for real-valued functions, is to add some noise to the output. But there are some situations in which adding noise doesn't make sense. For example, consider an auction of digital goods: we can make any number of copies and we're auctioning them off. The algorithm that chooses the price takes as input bidder profiles (how many they're willing to buy at which price) and outputs a profit-maximizing price. Now, that's a perfectly well-defined real-valued function. But if you add a little bit of noise to the price, you risk taking the price over the point that some entire stratum of people is willing to buy at, dramatically reducing your revenue, your profit. So the exponential mechanism not only addresses this, it also provided the first collusion-resilient mechanism for this kind of auction.

More generally, it operates with a finite set of discrete outputs. In this particular case with the auction, the outputs might be prices and the discretization might be in terms of cents: how many cents are you going to charge for something? But sometimes you're doing other things in a privacy-preserving way; you might be choosing experts inside some learning algorithm, or you might be choosing strings that are heavy hitters, say the URLs that are used very frequently, things like that. So it operates with a finite set of discrete outputs, and it has a utility function, which says, for each of the possible outputs and any database, what the utility of that output is for that database. So if the database is a collection of bidding profiles and the given output is a price, the utility is just the profit at that price given this vector of bids. And Delta U is the maximum, over all adjacent pairs of data sets and all potential outputs, of the absolute value of the difference between the utility of that output for database X and the utility of that output for the adjacent database Y.
And the way the mechanism works, and it's almost given here in full, there's a slight detail that I'm glossing over, is that we output an element psi with probability proportional to the exponential of the utility of that element for this data set, times epsilon, divided by this bound Delta U on the change in utility. And the math is very simple: since the difference between the utilities u(X, psi) and u(Y, psi) is bounded in absolute value by Delta U, the ratio of the two probabilities is bounded by e to the epsilon. The part that I've glossed over is a normalization term; I said proportional rather than giving the exact probability, and that costs a factor of two. Okay, so this is an example of a concrete mechanism, and we'll see it later.

So there's a big algorithmic literature by now on how to do many, many different kinds of things in a differentially private fashion, and the goal is always to get as much accuracy as you possibly can for a given privacy cost. And I should mention, since we've been talking about gradient descent and stochastic gradient descent, that there's a lot of differentially private machine learning now, differentially private gradient descent and stochastic gradient descent. And there are also all sorts of things involving more classical statistics questions, like finite-sample confidence intervals, as well as the bread and butter of a lot of statistical work, counting queries, which also have a whole learning theory associated with them: the statistical queries learning model.

Okay, so differential privacy, we feel, tells us something fundamental about computing in a robust way. There are deep ties to robust statistics, but other kinds of robustness do arise. And I completely agree with Alex that there should be some application of these techniques specifically for adversarial learning, but I don't yet know how to do it. Anyway, so here we have two images; one of them is very lifelike and the other one is a little bit blurred, and the question was basically, which one is the truth?

So that brings us to an application of differential privacy to adaptive data analysis. Maybe some of you are familiar with the article "Why Most Published Research Findings Are False." There are many, many papers about this particular problem, and the issue is adaptivity. Adaptivity arises naturally: natural learning procedures like gradient descent will adaptively query the data. But more insidiously, let's say that we're in a situation where there's a very large corpus of data, and it's completely infeasible to go off and find another data sample that's comparable to this one. What you end up with is a situation where many groups of researchers study the same data set. They also read each other's studies and each other's papers, and they formulate their new studies based on what they've seen. So studies that are conducted by researchers who have read papers that use the same data set have to be considered adaptive as well.

Now, recall that the differential privacy property holds under adaptive composition. So the i-th question depends on the answers that were received to the first i-1 queries, and the whole thing is differentially private by composition. The query Q_i is chosen by post-processing of the answers A_1 up to A_{i-1}. But we have closure under post-processing, so Q_i is, to some extent, independent of the data; it depends only on the answers.
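Going back to the exponential mechanism for a moment, here is a minimal sketch of the sampling rule just described, applied to the pricing example with made-up bids and prices discretized in cents. This is only an illustration of the rule, not the collusion-resilient auction from the original paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bidder profiles: the maximum price each bidder will pay.
bids = np.array([0.99, 1.25, 1.25, 2.00, 3.50, 3.50, 5.00])
prices = np.arange(1, 501) / 100.0      # candidate prices: $0.01 .. $5.00, in cents

def revenue(price):
    """Utility u(X, price): revenue if we charge this price."""
    return price * np.sum(bids >= price)

# Sensitivity: adding or removing one bidder changes revenue by at most the
# largest candidate price, whatever the candidate price is.
delta_u = prices.max()

def exponential_mechanism(eps):
    utilities = np.array([revenue(p) for p in prices])
    # Subtracting the max before exponentiating is for numerical stability;
    # it does not change the normalized probabilities.
    scores = eps * utilities / (2 * delta_u)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return rng.choice(prices, p=probs)

print("exact revenue-maximizing price:", prices[np.argmax([revenue(p) for p in prices])])
print("price chosen with eps = 1.0:   ", exponential_mechanism(1.0))
```

With small epsilon the chosen price is noisier, but it still tends to land where the revenue is high, and no single bidder's presence changes any price's probability by more than a factor of e to the epsilon. Now, back to the adaptive queries.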
Then we ask it, and under composition the whole interaction, including the choice of Q_{i+1}, remains differentially private. So differential privacy addresses adaptivity: if the mechanism ensures differential privacy, then the choice of the i-th query cannot depend too much on the database, cannot reveal too much about the database. And so differential privacy, which I already noted at the beginning gives us a kind of one-shot generalizability, actually gives us generalization under composition.

So I wanna give a little bit of intuition about why this is the case. Fix a query, say, what fraction of the population is over six feet tall. All right, that's my study. I'm gonna go out and recruit a bunch of individuals using absolutely proper sampling techniques in order to try to answer this question. So a statistic is a quantity that is computed from a sample, using good sampling discipline, and what I expect is that almost all large data sets will give me an approximately correct reply. The way I'll phrase that is: most data sets are representative of the underlying population with respect to this query. This is why statistics as a field works. If this weren't true, statistics would be nonsense; we do statistics because we think they're telling us about the population as a whole.

Now, the way I described it, I framed my query, I gathered the data, and I know that my query is independent of the data because I framed it first. And then I apply the query to the data and I see the answer. Of course, I could have done these in the other order: I could collect the data first but not let the analyst see the collection of samples. The analyst still chooses the query independently of the data, because she hasn't seen anything. So again, the database will very, very likely be representative of the population as a whole.

Now, the reason I like to think about it in that order is, what do we have? The database is collected, the adversary doesn't know anything about it, she formulates the query, and the database is good for her query with very high probability. Now suppose that in the course of interaction with the data set, multiple queries and responses, the analyst in fact rather quickly found a query for which the training set was not representative. Then, in some concrete sense, she's learned something significant about the data set. She couldn't do that at the beginning, when she didn't know anything. So if in the course of interaction she managed to find some query such that this data set was not representative, she must have learned something significant. What that says, at least on a very intuitive level, is that if you preserve the privacy of the data set, then perhaps you will prevent overfitting. How am I doing on time? I have 20 minutes left? Okay.

So I'll say just a couple of words about how this argument goes, or one version of it. Suppose that the adversary asks her first question, gets an answer, and now her job is to formulate her second question. So we have a joint distribution on data sets and on queries, and in fact for any joint distribution (D, Phi) we can define what we call the max information: the least k for which, for all data sets little d in the support of big D and all queries little phi in the support of big Phi, the probability that the database is little d, given that the query is little phi, is at most 2 to the k times the probability that the database is little d.
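Written out, the quantity just described is the following (standard notation for the max information between the jointly distributed data set D and query Phi):

```latex
% max information between data set D and query \Phi, as described above
I_{\infty}(D;\Phi) \;=\; \min\Big\{ k \;:\;
  \Pr[\,D = d \mid \Phi = \varphi\,] \;\le\; 2^{k}\cdot\Pr[\,D = d\,]
  \ \text{ for all } d \in \mathrm{supp}(D),\ \varphi \in \mathrm{supp}(\Phi) \Big\}
```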
So again, D is the data set random variable, and Phi will be the query that's chosen as post-processing of the first answer. There's a fact, which is that this quantity is symmetric in its arguments. The first part of the argument is that differential privacy bounds this quantity, the max information, and this comes more or less from closure under post-processing. Differential privacy turns out not to be the only way of bounding max information, but it's a way that has a whole lot of machinery, a whole big artillery of algorithms, to use. And then the second part of the argument shows that bounding the max information, the other direction, ensures that the data set will remain representative for the query that's chosen; in other words, the query doesn't say much about the data set, generalization persists, and then you can repeat, using the composition property. So the two things that we're using are post-processing and composition.

So here's a use of this. You're all probably familiar with hold-out sets. You have your data set, you split it into two pieces, the training set for your algorithm and the hold-out set on which you verify your conclusions, and you learn on the training set. Suppose you learn something and you go and check it on your hold-out set. Maybe you'll find out that you were right, or maybe you'll find out that in your training you overfit to the training set. Either way, you can't repeat the use of this hold-out set. The reason that it's meaningful to test against a hold-out is that the hold-out set is basically fresh data samples; you're checking, did your conclusions hold for the population as a whole, as represented by the hold-out set? If, instead of just looking flat out at the hold-out set, you check against the hold-out via a differentially private mechanism, then future exploration of the data will not significantly depend on the hold-out set, and so the hold-out set stays fresh. What's going on here conceptually is, again, that we have our adversary and she's interacting with a data set; the data set she's interacting with is called the hold-out set, and the training set is hardwired into the brain of the adversary. And so the fact that differential privacy protects against adaptive analysis tells us that we can use the hold-out set multiple times.

Okay, so last weaving-through of this stuff: I'm gonna talk a little bit about algorithmic fairness. So we have a population, and our population is diverse: it has ethnic differences, religious differences, geographic, medical, and so on and so forth. And the concern was framed nicely in a 2010 article in the Wall Street Journal that gave this scenario. At the time the article was written, when you went to visit the website of a bank, the bank would receive detailed user information about you; you'd probably already been placed into one of a few dozen demographic bins, and what you see depends only on what bin you're in. That was the world in 2010 when this was written, but it's a very clear example. And the concern that was raised in the article is the illegal practice of steering minorities to credit card offerings with less desirable terms. It is an illegal practice. And so you want to be sure that that demographic bin isn't somehow or other a code word for race. Now, one thing that people might want to do is to hide sensitive information from the algorithm. And that doesn't work.
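Staying with the reusable hold-out idea for a moment, here is a simplified sketch of what "checking against the hold-out only through a differentially private mechanism" can look like. This is not the actual algorithm from the reusable-holdout papers; it just shows the analyst receiving Laplace-noised averages of bounded statistics, computed on hypothetical data, instead of the raw hold-out values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical hold-out set: 10,000 rows of 20 features.
n_holdout = 10_000
holdout = rng.normal(size=(n_holdout, 20))

def noisy_holdout_mean(query, eps):
    """query maps a row to a value in [0, 1]; the answer is its average over
    the hold-out plus Laplace noise. Each call is eps-differentially private,
    since the average of a [0, 1]-valued statistic has sensitivity 1/n."""
    values = np.array([query(row) for row in holdout])
    return values.mean() + rng.laplace(scale=1.0 / (eps * n_holdout))

# The analyst validates a candidate finding from the training set, e.g.
# "feature 3 is positive more than half the time", without ever touching
# the raw hold-out rows.
estimate = noisy_holdout_mean(lambda row: float(row[3] > 0), eps=0.1)
print("noisy hold-out estimate:", estimate)
```

Because every access to the hold-out goes through such a mechanism, the composition and post-processing arguments above apply, and the hold-out stays fresh across many adaptively chosen checks. Returning now to fairness, and to why hiding the sensitive attribute fails.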
It doesn't work in part because there are many redundant encodings of race and of other sensitive attributes. Also, culturally aware algorithms can be more accurate. For the rest of this section I'm gonna say that the population is split into two groups according to which herbs they like to use in their food: we have those who like to eat sage and those who prefer thyme, and so my sets are gonna be called S and T. And let's say that the sage eaters are the minority group. So you might have a situation in which, let's say, hearing voices is a common religious experience among the sage eaters and is diagnostic of schizophrenia among the thyme eaters. This is based on a real example. So if you were trying to diagnose, then having your algorithm understand the culture of the individual is very helpful.

The other thing to keep in mind is that we train these algorithms and classifiers on historical data, and of course we will then imbibe the prejudices of previous decisions that are present in the data; we'll bring those into the algorithm. So if you have historical discrimination against sage eaters, that will persist if you train only on historical data. And there's a general problem that there's no source of truth. Where is truth going to come from for you?

So, algorithmic fairness. The first thing that we need to do is to define what we mean by algorithmic fairness, and for the purposes of a classification algorithm there are different kinds. (What does that say, 15? Okay.) There are different kinds of fairness guarantees. A common class of guarantees are group guarantees. Group fairness properties are statistical requirements. For example, statistical parity requires, let's say we're trying to decide whom to admit to college, so we get yeses and nos, that the demographics of the people with a positive classification be the same as the demographics of the general population. If sage eaters are 25% of the population, then sage eaters should be 25% of the admitted group, and similarly for the rejected group. That's what statistical parity says. It's often meaningful in the breach: if the sage eaters are being accepted at a much, much smaller rate, that may say, hey, you should look at this and see whether something's going on that you don't like, something that's not right.

Unfortunately, it also permits targeting the wrong subgroup of S. So if you're running a spa and you do not want to see sage eaters at your spa, and you have to advertise proportionately to sage eaters, what you do is advertise to the sage eaters who can't afford to come to your spa and to the thyme eaters who can. So your spa remains segregated, but you seem to have satisfied statistical parity. It also permits fairness gerrymandering: you can have a situation where you're advertising proportionately to the sage eaters and the thyme eaters, and you're advertising proportionately to the coffee drinkers and the tea drinkers, but maybe the sage-eating coffee drinkers are really not being represented in this advertising group.

Okay, so, a different definition. Oh, and there's one other point, which is that I started with statistical parity, but there are lots of other group notions that make a lot of sense, for example similar false positive rates and similar false negative rates and things like that.
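As a concrete picture of what these group statistics are, here is a minimal sketch that audits a completely made-up classifier for statistical parity and for the false-positive and false-negative rates just mentioned, split by the two hypothetical groups S and T.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up labels and predictions, just to show what gets computed.
n = 10_000
group = rng.choice(["S", "T"], size=n, p=[0.25, 0.75])
y_true = rng.integers(0, 2, size=n)
y_pred = rng.integers(0, 2, size=n)

for g in ["S", "T"]:
    m = group == g
    positive_rate = y_pred[m].mean()                       # statistical parity compares these
    fpr = y_pred[m][y_true[m] == 0].mean()                 # false positive rate in the group
    fnr = (1 - y_pred[m])[y_true[m] == 1].mean()           # false negative rate in the group
    print(f"group {g}: positive rate {positive_rate:.3f}, FPR {fpr:.3f}, FNR {fnr:.3f}")
```

On real predictions, large gaps between the groups in any of these numbers are exactly the kind of signal the group notions are meant to flag.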
And very often, small collections of individually desirable properties cannot be achieved simultaneously unless the base rates are the same in the different populations, and that's the problem that underlies the famous controversy about recidivism prediction.

So another notion would be individual fairness, and what that says is that, for a given classification task, people who are similar with respect to this task should be classified similarly. So first of all, we need the right notion; we need access to a metric that says, for this particular classification task and two individuals, you and me, how similar or dissimilar they are. And we also have some sort of boundary conditions. The individual fairness requirement says that similar people have similar probability distributions on their outcomes. So here our classifiers will map individuals to probability distributions on outcomes, and we enforce a Lipschitz condition. So that's called individual fairness, now sometimes, Guy's fault, called metric fairness.

Just to put this back in context, you might notice a similarity with the definition of differential privacy. Before, we required that if databases were similar, then they would have similar distributions on outcomes. For individual fairness, we're requiring that, in this task-specific metric, similar people have similar probability distributions on outcomes. And in both cases it's enforced by some sort of Lipschitz constraint. So this gives the exciting possibility that maybe we can use differential privacy techniques to get fairness, and at least sometimes we can. There's a theorem that says that if we use the exponential mechanism of McSherry and Talwar, which I described earlier, we get individual fairness and small loss if the metric has bounded doubling dimension.

Just to give you a little idea about that: we first take our outputs to be the individuals themselves. So we're gonna map individuals to probability distributions on other individuals, and then we'll classify people after that. We're basically smearing an individual across her neighbors. The probability that an individual v is mapped to v prime is proportional, because we're using the exponential mechanism, to e to the minus the distance between them. So you're much more likely to get mapped to somebody who's close by than to somebody who's far away, and that's exactly where the fairness comes from: close neighbors will be mapped to similar distributions on individuals. And the small loss comes from the fact that, under suitable conditions, we expect to be mapped to a near neighbor and not to a far one, because far neighbors are much less likely under the exponential mechanism. That was work of Dwork, Hardt, Pitassi, Reingold, and Zemel in 2012.

There are various veins of algorithmic work. Standard optimization techniques operate when a metric is known. In bandit settings, people are looking at exploration-versus-exploitation techniques. And there's a collection of results, one set of which views fairness as an accuracy requirement and the other of which uses the metric fairness requirements. These results operate on large numbers of large, overlapping groups, and I'm gonna say a little bit more about this on the next slide. It's going to address those intersectional problems; our groups don't have to be disjoint, and we're going to get those coffee-drinking sage eaters as long as there are enough of them.
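Here is a minimal sketch of that smearing construction. The metric is a stand-in (Euclidean distance between made-up feature vectors), not a real task-specific similarity metric, and the exponent is simply e to the minus the distance, as in the description above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Map each individual to a distribution over individuals, with closer
# neighbors (under a stand-in metric d) more likely.
n = 200
features = rng.normal(size=(n, 5))
features[1] = features[0] + 0.01        # make individuals 0 and 1 very similar

def smear(v_index):
    d = np.linalg.norm(features - features[v_index], axis=1)   # d(v, v') for all v'
    weights = np.exp(-d)                                        # probability proportional to e^{-d}
    return weights / weights.sum()

# Similar individuals end up with nearly identical distributions over
# neighbors, which is the Lipschitz-style property behind the fairness claim.
p0, p1 = smear(0), smear(1)
print("total variation distance between their distributions:",
      0.5 * np.abs(p0 - p1).sum())
```

Individuals that are close under the metric get nearly identical distributions over neighbors, and classifying via those distributions is what carries the guarantee.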
Some of these works deal with calibration, which is a correctness condition, and some of them deal with a variant of the metric requirement that similar people are mapped similarly. Now, there's another vein of work in the machine learning community, and it involves learning what's called a fair representation, essentially a censored representation that removes sensitive information while trying to keep as much other information as possible, and then training on that censored representation. As described here, learning the fair representation is strictly a group notion of fairness, but there's no a priori reason why these techniques can't also be extended to have some metric properties, and I think that would be a very interesting line of research.

So at this point Guy should probably get up and finish the talk, because I'm talking about work of Hébert-Johnson, Kim, Reingold, and Rothblum. So, multi-calibration. First of all, multi-calibration is going to be a property of a predictor and a collection of sets, right? Now suppose individuals have, say, true probabilities of success, or of getting an illness, or of something, where the probabilities are taken over their internal randomness and perhaps the randomness of the environment when they interact with it. And assume that these probabilities are measured, let's say, in multiples of a tenth. So people can have a one-tenth probability of success, two tenths, three tenths, and so on.

Now we can think of these individuals as holding coins. If your probability of success is three tenths, then you're holding a coin that is gonna come up heads with probability three tenths. Now imagine that the coins are flipped. So somebody who before had a coin of one tenth is now holding an outcome, which is either zero or one. The coin got flipped and we now have outcomes; these are the true outcomes. They ran through life and this was their true outcome, okay? So I'm gonna call those the true labels.

So a calibration condition for a set S says the following. If we look at these ten slices, the people in S who initially were holding coins with probability one tenth of heads, two tenths of heads, three tenths of heads, and so on, each of those gives us a slice. If we look at a slice, and we look at what fraction of those people actually got heads when the coins were flipped, that fraction should be whatever their coin value was. So if this is the one-tenth slice, then a tenth of those people should in fact have positive outcomes. That's what calibration means for a single set, and now we're gonna do this simultaneously over very large numbers of intersecting sets. All of these sets have to be defined a priori, but, and I really like the complexity-theory version of this, you might think of them as the sets that can be recognized by small circuits.

All right, so what do we get? We want to build a predictor that is calibrated on every one of these sets. So we want it to be well calibrated on the sage eaters, well calibrated on the thyme eaters, well calibrated on the coffee-drinking sage eaters, and so on and so forth, and also on sets that may be obliquely correlated with these sensitive attributes or other sorts of attributes.
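To make the condition concrete, here is a minimal sketch that checks calibration on a single hypothetical set S, with predictions discretized to tenths as in the talk; the "coins", outcomes, predictor, and set membership are all made up.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up population: each person's "coin" bias (a multiple of one tenth),
# the realized outcome of flipping that coin, an imperfect predictor whose
# values are also multiples of one tenth, and membership in some set S.
n = 50_000
true_p = rng.integers(1, 10, size=n) / 10.0
outcome = rng.random(n) < true_p
predictor = np.round(true_p + rng.normal(0, 0.05, size=n), 1).clip(0.1, 0.9)
in_S = rng.random(n) < 0.25

# Calibration of the predictor on S: within each slice {people in S with
# prediction v}, the fraction of positive outcomes should be about v.
for v in np.arange(0.1, 1.0, 0.1):
    slice_mask = in_S & (np.abs(predictor - v) < 1e-6)
    if slice_mask.sum() > 0:
        print(f"slice {v:.1f}: empirical rate {outcome[slice_mask].mean():.3f} "
              f"on {slice_mask.sum()} people")

# Multicalibration asks for this to hold, up to slack, simultaneously for a
# large, possibly intersecting collection of such sets S.
```

This is the check that the algorithm described next repeatedly applies as it searches for poorly calibrated slices.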
So the algorithm begins with some kind of initial predictor that's going to give a number, one tenth, two tenths, three tenths, and so on, and some training data that already has zero-one labels. The algorithm starts with this predictor and then looks for a set that has a poorly calibrated slice. It's looking for a poorly calibrated slice as measured on the training data, which is why the sets have to be large. Then it modifies the predictor to do well on that particular slice, and this process is repeated. What happens here is that the slices are defined by a combination of the set and the predictor that you have in hand. So when you update your predictor and repeat this process, you're now adaptively defining your sets, and if you're not careful, you don't have anywhere near enough data to handle the number of queries that you're going to make in this adaptive setting. One solution would be to have vast quantities of data, but a better solution, and the solution taken in the work, is to access the data through differential privacy. That lets you reuse the data, because differential privacy is resilient to adaptive analysis.

Okay, so we're looking at some very big problems, right? We're looking at privacy, or at least private data analysis, and at adaptivity, which sort of underlies the entire scientific method, and we're looking at fairness. And I want to encourage you to look at really big problems. So here's one that I would like to get to sometime, but maybe you'll do it: affective computing and emotional manipulation. This is from a January 19th, 2015 issue of The New Yorker: computers are learning to read emotion, and the business world can't wait. Software can scan a conversation between a woman and a child and determine if the woman is a mother. This is very personal stuff. And the people who want to do this are not necessarily our friends. In particular, let's say my advertiser is not my friend. First of all, as we all know, advertising tries to create demand. We think of it as trying to create demand in a kind of positive way, but the advertiser absolutely doesn't have my interests at heart. So suppose I'm sad. Maybe my advertiser will suggest that I, say, buy some chocolate to alleviate my sadness. But maybe my sadness is totally appropriate, and the advertiser certainly doesn't care about that. Maybe I'm sad because somebody died; I should be sad, and I shouldn't compensate for this by eating chocolate. More insidiously, my advertiser may very well want to keep me sad, because then I'll keep buying and buying. So this raises the problem of using affective computing to exploit my emotional state for financial gain, and to manipulate my emotional state for continued financial gain.

So in conclusion, what I want to say is, first of all, work on the obvious questions. Make neural nets robust to adversarial examples; Alex was completely right, and I made this slide before hearing his talk. Obviously we should work on that. The differential privacy story isn't strong yet for social networks, and we're going to need a lot of privacy for social network interactions; social scientists are drooling at the thought of doing this on Facebook data, see the Social Science One project. We want to advance the state of the art in adaptive data analysis and improve the state of the art for fairness. On fairness under composition, I've been working with my student, Christina Ilvento.
The story is that things don't behave well under composition at all, and there are some algorithms for dealing with this, but we need a lot more. Also, things that I know Tal Rabin has been talking about for a long time: video can be manipulated, so cameras should be signing things; we want to prevent the rewriting of history by video manipulation. Use natural language processing to make job ads appeal across genders rather than just to one gender.

But don't only work on the obvious. Create a culture, and this requires a lot of talking to a lot of people outside of this room, a culture that will demand signed video, that will treat any video that isn't signed as completely suspect. Tackle fake news more broadly. Define and solve some aspect of the problem of manipulation via affective computing. And find a way of restoring the informational commons, or build a dissonance engine that provides well-reasoned and opposing points of view, like the best of an Aaron Sorkin episode of The West Wing. So that's it. Thank you.