Those of you who are from the University of Michigan know that we have two time zones in Michigan. At the University of Michigan, we have regular time, which the law school operates on, and we have Michigan time, which many other parts of the university operate on, and Michigan time starts 10 minutes after whatever the called time is. So I've been here now 15 years, and I am still puzzled by this phenomenon. I'm puzzled both that there is such a thing as Michigan time, and that the law school is not on it. Here to explain that to us is of course Sendhil Mullainathan, who is an expert on such puzzles. Sendhil is a dear friend. We have had a wonderful time working together, writing together. I'd say my most intellectually fun experience in the last 15 years was meeting quite often over two years with Sendhil and Eldar Shafir. I can't even remember how many hours we met. Many, many, many, many hours to write a seven-page paper. So it's not the most efficient paper I've ever written, but it's the most fun. Sendhil is the Robert C. Waggoner Professor of Economics at Harvard University. He's currently visiting at the University of Chicago. He has worked on anti-poverty issues, behavioral economics, and, with Eldar Shafir, the impact of poverty on mental bandwidth, which turned into a book called Scarcity: Why Having Too Little Means So Much. Those of you who haven't purchased the book should go buy it. Those of you who have already purchased the book will know that on the cover the size of the title is itself scarce, that is, it's quite small, which I thought was kind of a clever joke. He has measured problems of discrimination in the marketplace, shown that higher cigarette taxes make smokers happy, modeled how competition affects media bias, and his latest work, as you'll see, is on using machine learning to better understand human beings in all their forms. He's also a frequent contributor to the New York Times, and he usually shakes things up with his columns. I'm told he's about to shake things up the week after the election with an exciting new one where he plays the role of a constitutional scholar. Sendhil worked with others to found a nonprofit called ideas42, a behavioral R&D lab that does great work in the U.S. and globally. He also serves on the board of the MacArthur Foundation. He's worked at various times in government: when I was working at the Treasury Department, I asked Sendhil to come in and be our nudge in residence (nudge, not noodge), which he did admirably. He is also the recipient of a MacArthur genius award, has been designated a young global leader by the World Economic Forum and a top 100 thinker by Foreign Policy magazine, and is on Wired magazine's smart list of 50 people who will change the world. And Sendhil has other skills. He is great at inventing ridiculous ways to get fit involving large balls and bands and rocks and things, and he is an aficionado of an extremely good cup of espresso. So with that, please join me in welcoming Sendhil to the stage.

All right. Thank you, Michael. You know, it's funny that you were talking about time zones. I drove here from Chicago, and it was only yesterday at about three that a friend of mine pointed out to me, you do realize Michigan is in a different time zone. And I was trying to figure out why everyone in Michigan pretends to be on the East Coast. But we can move forward from that. So today I want to talk about big data and machine learning.
So I'll just dive right in. I want to talk a little bit about the promise of machine learning. But of course, commensurate with that, we should talk a little bit about the risks. And before we can do any of that, what I really want to do is get inside the thing, because it's very commonplace to talk about machine learning as if it were some black box with some universal intelligence. So I really want to spend a lot of the talk at the beginning on what it is. How does it work? What are the innards of it? It allows me to put up my current favorite funny slide, which probably a third of you have seen, but it'll be fun to see it another time. I'll really start with this and just talk about how it works. And you'll see how fundamentally it's related to big data; it's hard to really understand big data in today's era without understanding machine learning. And then from that I'll get to the other two things.

We'll start with what machine learning is. My own journey with it started when I was an undergraduate in computer science, actually. That was in 1993, when I graduated. At that moment, I think it's fair to say, there were a bunch of super hard problems. Just so hard that you really felt there was going to have to be some massive breakthrough to solve them. These problems were not like "build intelligent machines." They were like, wouldn't it be great if we could get an algorithm that could detect a face? It wasn't even that. I had a fellow student, a PhD student, who had intelligently dropped out of economics and gone into another field. And I asked him, years later, what do you work on? He said, oh, I work on nose detection. Okay? Nose detection. He said, no, no, no, sorry, you're not understanding. It's not just me. My entire lab works on nose detection. And it's not just the lab. It's a whole field in computational geometry, detecting noses. It's a hard problem, but one day we will figure out how to detect noses.

Now, it seems comical, but if you don't understand how difficult these problems were, you won't understand the breakthrough that came about and how big the change has been. Think about it. Now you can take a photo, and as you're taking the photo, not only is there the nose detector that Daniel really wanted, there's a face detector packed into your little phone. That's why you have the little red square around the face. It's crazy. It's so crazy that, for example, speech was considered just an unsolvable problem. We just didn't know how we were going to tackle it in 1993. Now you can pick up your iPhone (I meant to have my iPhone) and you can say "weather." The audio signal for weather is turned into a set of phonemes, which is converted to the word W-E-A-T-H-E-R and not the word W-H-E-T-H-E-R, which is then converted into a request, what is the weather here, which is then paired with your GPS to figure out where you are, which then goes to the internet, figures out the weather, and tells you the weather. That's insane. And yet most people respond to that by saying what? Oh, Siri, you're stupid. Someone put a magic trick in your pocket and your first response is, well, the rabbit's not that cute. I mean, yes, it's cute, but I've seen cuter rabbits. You just have to remember how amazing this is. And so here I was when I finished Scarcity with Eldar, and I had a lot of free time on my hands. And so I thought, I want to figure out what happened. What happened since the 1990s?
It's like a Yankee in Connecticut, what's the name of the thing, A Connecticut Yankee in King Arthur's Court. It's as if I fell asleep underneath a tree and suddenly something happened, and now I have this magic trick in my pocket. What the hell is going on? So what I wanted to do was just understand what constitutes machine intelligence. What makes it tick? How does it work? So let's just start with that. Let's start with that magic trick.

To do that, this is my favorite slide, and if you've already seen it, well, screw you, because I'm going to go through it anyway. This is a very simple act of machine intelligence. I've got a piece of text. For example, this is a product on Amazon called the Hutzler 571 Banana Slicer, and the pieces of text I'm interested in are the reviews that go with the Hutzler 571 Banana Slicer. You can go to an algorithm and say, here's a review: "I bought this in order to speed up cutting up a banana for my cereal. Any time I saved in that endeavor was spent cleaning this implement. It is not easy to clean. You have to scrub between every rung to thoroughly clean it." So the Hutzler 571 Banana Slicer has this review. How many people think this is a positive review? Raise your hand. How many people think it's a negative review? Raise your hand. There you go. That's correct. As a piece of human intelligence, you correctly identify this as a negative review. This is the problem of sentiment analysis: give somebody a piece of text, have them decide whether it is positive sentiment or negative sentiment.

I'll just go through these because I like these. "Great gift. Once I figured out I had to peel the banana before using it, it works much better." All right, raise your hands, how many people think this is a positive review? Yes, that's right. How many think it's a negative review? Oh, what's wrong with you back there? You know what? You should look for the Hutzler 572. Yes, positive review. We'll do a couple more. "There's no way to tell if this is a standard or metric banana slicer. Additional markings on it would help greatly." You can all see that's a negative review. And I do have one more; this is my favorite. "I tried the banana slicer and found it unacceptable. As shown in the picture, the slices curve from left to right. All of my bananas are bent the other way." So as you can see, this is a negative review.

What's amazing about this task, as with most tasks that artificial intelligence was trying to deal with, is that humans do it roughly perfectly. Nose detection: we're all really good nose detectors. You can put that on your CV if you'd like. Excellent nose detector. And what we need is a machine to do it. The AI approach, starting in the late 1960s, started with this obvious observation: of course this must be easy. We all do it perfectly. Even my grad student can do it. How hard could this be? So since we do it perfectly, let's figure out how we do it. Let's introspect, and then we'll program up whatever our introspection yields. So how would we do this for movie reviews? Well, let's just think about what words constitute a positive review. Let's take a review and make it into a vector. Once we've made it a vector, we'll introspect and say, what are the kind of words that I think are positive? What are the kind of words that I think are negative? And we'll count. Pretty complex algorithm: we'll just count.
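To make the introspect-and-count idea concrete, here is a minimal sketch in Python. The word lists and the example reviews are purely illustrative; they are not the lists from the paper discussed next.

```python
import re

# Hand-picked word lists: the "introspection" step. Purely illustrative.
POSITIVE = {"dazzling", "brilliant", "cool", "gripping", "moving"}
NEGATIVE = {"sucks", "cliched", "slow", "awful", "bad"}

def classify(review: str) -> str:
    """Count positive vs. negative words; that's the whole algorithm."""
    words = re.findall(r"[a-z']+", review.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return "no opinion"   # the common case: none of the listed words appear
    return "positive" if pos > neg else "negative"

print(classify("A dazzling, gripping film."))
print(classify("Slow, cliched, and just plain bad."))
```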
Here's a paper that did exactly that with 2,000 movie reviews, 1,000 of which were positive and 1,000 of which were negative. We'll call this approach trying to copy humans. In this paper, they took a bunch of positive words: dazzling, brilliant, cool, gripping, moving. Those are from Michael's teaching reviews. And then some negative words: sucks, cliched, slow, awful, bad. Those are from my teaching reviews. I always say the "cliched" one really hurt; I don't know what it means for a class to be cliched. But nevertheless, you have these positive words and negative words, and then you just go to the movie reviews and you count. How do I decide whether a movie review is positive or negative? How many of these? How many of those?

Turns out that this approach doesn't work very well. Given that the set is half positive, half negative, this approach gets about 60% right, which is not very far from flipping a coin. But even the 60% is an overstatement of the strength and power of this approach. In 17% of the reviews, one of these words appears. In 83% of the reviews, none of these words appears. So the vast majority of the time this algorithm has nothing to say, and when it does, it's barely right. Basically one of my grad students. So in that sense, you can see that this isn't going very well.

Now, your response to this is, that's because you only have 10 words. You just need more words. I would describe the entire history of artificial intelligence, starting in the late 60s all the way through the mid-90s, as "more words." It was a sequence of doubling down and tripling down on this approach, of saying, let's just keep going. We just haven't introspected far enough. We just haven't yet intuited what words work. And this approach, as you can see from Daniel's experience with nose detection, really stalled. And you have to appreciate how much it had stalled. The optimism in the late 60s was such that Marvin Minsky (this might be an urban legend, but it's the kind of urban legend nobody is incented to go look up) purportedly gave a grad student the problem of computer vision as a summer project. How hard could it be? In 1994, we were still trying to find those noses. Complicated problems like language were just off the map. Sentiment analysis, we just weren't getting anywhere. And yet, here we are with Siri in my pocket, a computer able to do all sorts of things. What happened? What was the breakthrough? What was the deep insight? In other words, what's the magic trick?

I'm going to do a few sort of throat-clearing exercises before I tell you the magic trick, partly to annoy you. The first thing I'll tell you is that there's an excellent Radiolab episode called Black Box. Do any of you listen to Radiolab? You don't watch, you listen to Radiolab. Only one of you? Wow. Go listen to Black Box. It's amazing. There are many amazing things in that episode; it's my favorite episode, actually. One of them is that there's a magic trick like this one, and then they bring Penn on, of Penn and Teller fame, and they ask Penn, well, how does this work? And he looks at it and he goes, oh, I know how it works. And they're like, tell us. He's like, are you sure you want to know?
They're like, what do you mean? That's why we brought you on. And he's like, but people don't get it. You tell them how a magic trick works and they're always disappointed. There's no actual magic; it's all wires and things. So point one: it's going to be a little deflating. Point two: it's going to seem obvious, but I am most assuredly confident it only seems obvious to you because you're wrong. We had the smartest people in this field working on this, some of the smartest minds on the planet, fully incented, between 1968 and 1998, and they couldn't figure out this obvious trick I'm about to show you. There's something revolutionary in this trick. It is hard for anyone to see it, but once you've seen it, you're like, yeah, it's obvious.

Here's the trick. Stop introspecting. Sure, you know how to do it. Who cares how you do it? Treat this, sentiment analysis, nose detection, face detection, treat all of it as a purely empirical exercise. How do you do sentiment analysis? I don't know; it's some alien beings doing it. I'm just going to collect some data. I'm going to get 2,000 reviews, 1,000 positive, 1,000 negative. And I'm going to pretend I don't speak this language. I'm going to pretend I'm not human. I have no idea what the hell is happening. I'm going to let the data tell me what makes a positive review and a negative review. Kind of a weird thing to do. Even though you know how to do it, you're going to delete all of that and simply turn it into a blind empirical exercise. And I cannot emphasize enough how blind it is. An example data set would be 2,000 reviews, 1,000 good, 1,000 bad.

The truth is, this little trick is what unlocked every piece of machine intelligence you see today. Well, that's not true, not every piece. There's a small fraction that was unlocked by just computing power. When we saw the success of chess, for example, that's not this. Chess is not a piece of intelligence in this sense. It's a fixed game tree. You're just looking forward through a game tree. It looks smart, but it's not. It's just a game tree. Everything else that you see is pretty much of this variety.

Now, for the most part the learning-not-programming approach doesn't produce any surprises. For example, in the movie review case, the most diagnostic positive words are love, superb, great. Okay, for sure. The most diagnostic negative words are bad, stupid, worse. Those do pretty well. But you do start to see some surprises. The word "still" turns out to be one of the most diagnostic words for a positive review. As in the sentence, "The acting was shitty. Still, I found myself enjoying it." Now, what's funny about the word still is that it has this weird dual nature. You all obviously know, at one level, that the word still must be positive and diagnostic. Why? You are the ones deciding whether a review is positive or negative, so when you see it in a sentence, you know it. Yet I don't think anybody in this room, if we had been playing the sentiment-analysis find-a-word exercise, would have said, yeah, I want to look for the word still. And that's one of the surprises of inducing, from the data, what makes for a positive or negative review. Let's do a few more. Another very diagnostic one, and I'm not going to tell you which direction, is a piece of punctuation: the question mark. Turns out, the question mark is highly diagnostic. How many people think that if a movie review has a question mark in it, it's a positive review?
Raise your hands. How many think it's a negative review? Ah, you're all correct. It is, in fact, a negative review, as in, "Who would like this piece of shit?" Let's do one more. How many people think the exclamation mark is diagnostic of a positive review? How many people think it's diagnostic of a negative review? Yeah, this looks like about 60-40: 60% say positive, 40% say negative. It seems like 60% of you need to get out on the internet more, because if the internet has taught me one thing, it's that people get very, very excited when they hate something. And the exclamation mark turns out to be very diagnostic of a negative review.

This sort of empiricism toward things that we know how to do, it would be like your kid asking you how to ride a bike and you saying, let's just get you on there, you try a lot of stuff, you keep falling down, I'm pretty sure it'll all work out for you. That's basically what we're doing here. It's a very blind empiricism. But it's a blind empiricism that works. Sentiment analyzers very quickly got up to about 95% accuracy. Today you can download packages that do off-the-shelf sentiment analysis for lots of kinds of text in the range of 97%, 98%; industrial-grade sentiment analyzers, which have gotten lots of data and have been finely tuned, can get into the 99% range. In fact, one of the interesting things about machine intelligence is that of all the big advances being made recently, the big advance that got us into the 90s was this insight. All the other advances you read about now, the specific things, whether it's a deep neural network or something else, those allow you to move from 96 to 99, or from 99 to 99.99. That matters, because a self-driving car at 96% is not a self-driving car, it's a self-crashing car. So the last 3.9%, the last 4%, matters. But the bulk of the work, what really kick-started it, is this empirical insight.

And really it's very simple: turn any intelligence task into an empirical learning task. Specify what is to be predicted and specify what you use to predict it. What variables? In fact, you could say this is just statistics, and I want you to remember that when we come back to the risks. At some deep level, a lot of your intuitions about machine intelligence will be sharpened if you understand that it is just statistics, and all the things that you know can go wrong with statistics can go wrong with machine intelligence. There's nothing special about it. It's like running a regression. There is something amazing about it; it's not a regression you would have known how to run. But the problems often start there. That's what's underneath most of the machine intelligence you see. So: input, prediction, outcome, feedback.

Now you can start to see why, at a conference on big data, I'm talking about machine intelligence. Because in one direction big data is essential. It is the data sets, the ability to have hundreds of thousands of labeled movie reviews, that allow us to build machine intelligence in the first place. In fact, there's a maxim which is roughly true: more data almost always beats a better algorithm. As long as your algorithm is good enough, more data will just allow it to learn. Some algorithms learn faster, some slower, but more data trumps everything. So that's the one direction in which big data fueled the rise of machine intelligence.
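As a rough sketch of what "specify what is to be predicted and what you use to predict it" looks like in code, here is a tiny sentiment learner, assuming scikit-learn. The handful of labeled reviews stands in for a real corpus of a thousand positive and a thousand negative reviews, and the nearest-neighbors classifier is just one of many learners you could plug in.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in for a real corpus of 1,000 positive / 1,000 negative labeled reviews.
reviews = [
    "love it, superb acting, great script",
    "a dazzling and moving film",
    "stupid plot, worse ending, just bad",
    "awful, slow, who would like this?",
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    CountVectorizer(binary=True),         # what you predict with: "is this word in the review?"
    KNeighborsClassifier(n_neighbors=1),  # what you predict: the label of the most similar review already seen
)
model.fit(reviews, labels)                # the learning step: no introspection anywhere
print(model.predict(["the acting was shaky, still I found myself enjoying it"]))
```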
And if you want an intuition for what's happening inside the black box, the movie reviews give you some sense of it. You as a human see a complex piece of text with all its complexity. The algorithm sees it through an almost filtered lens: is this word here? Is this word here? Is this pair of words here? Then, roughly, it goes to its million instances and says, what are other things like this that I have seen, and were they positive or negative? The richer the set of other things, the better. This is almost a parody of the activity, but it proves to be a useful parody. That is the nearest-neighbor heuristic: you've got a large data set, and when you see a new instance, you look for its nearest neighbors among the ones you've already seen. I found that very helpful.

And to me the really abiding insight here, when you look at the world around you and say, oh my god, machines are doing such intelligent things: you could walk away saying, wow, machines are now becoming super smart. But what it really ought to get you to do is to ask, what is smart? Many activities that I thought were super smart turn out to be amenable to this kind of approach, to this nearest-neighbors approach. We can talk at the end if you'd like, but there's also a bunch of other activities that you don't particularly associate with intelligence, or not that much, which are not at all amenable to this approach. This is really saying something about the nature of the problem more than about machines being on the verge of suddenly being able to do anything and everything. But I want you to keep this in mind, because you can already see where this could go wrong, and that's what I'm going to come back to at the end.

So fast forward. Here's an example of a task that is very amenable to this insight. This is a mail sorting room from the late 1800s; I think this is from Brooklyn. As you can imagine, there were people, and they looked at the mail and said, oh, this is going to this area, and they would turn around and put it in the box, and so on. It's an intelligent task. Such an intelligent task, if you think about it: reading was one of the highest skills known to man; until probably the 1800s, reading was just a very high-level skill. These people were engaging in high-level cognition. Flash forward, and now these people are not engaging in that high-level cognition; this machine is. This is how all your mail is sorted today. In fact, that's probably been the arc of machine intelligence: to move toward more and more automation. This is what the human mind does; let's try to automate it. Sentiment analysis, translation, dictation, face recognition, self-driving cars, all these things suddenly are automated. I want to call that the automation approach, and I want you to keep it in the back of your mind, because the automation approach is super powerful. What do I mean by automation? These are things where we have humans doing an excellent job, but why should humans be doing it? Let's get a machine to do it.

When I talk about what machine learning is, I think the first promise of machine learning is in fact in the automation world. It's that many high-cognition tasks are simpler and more rote than we thought. And much like we used big data to build machines, we can now use machines to build big data for us. What I mean by that is that lots and lots of digital stuff suddenly becomes data in a way that you probably don't think about.
I want you to think about big data not in the N sense, the sample-size sense, but in the qualitative differences that now arise. You can see we're very close: if machines can read language, then why isn't language data? Why not use it as an input into anything that you do? An example of this, the kind of thing that I could never have done as an empirical exercise before, without some form of machine intelligence, is to take all the Google searches that are out there. You can imagine starting to categorize Google searches and starting to do something with them. Here's one of my favorite examples; a grad student of mine found this. You can use Google Trends and look at searches for the phrase "iPhone slow." And for some reason, some mysterious reason, there are enormous peaks on certain dates. I wonder what those dates are. Anyone have any guesses? Christmas? Yes, I will spend my Christmas figuring out why my iPhone is slow. No, it's not Christmas. In fact, did anyone bring any tin foil with them? Because I think we may all need it. These are the dates of the releases of new iPhones. If you want to believe that it's in Apple's interest to slow down your iPhone, well, here's a bunch of people who, when the iPhone 3GS is released, are suddenly asking, why did my old iPhone become a brick? You can go try this. What's amazing is you can try this with things like "Samsung slow" or "Google phone slow," and you don't see that. Why is that? Well, Google doesn't sell the devices; Google just sells the operating system. Apple sells both the operating system and the device. Anyway, like I said, you need some tin foil for this. All of a sudden, this becomes data, data in a way that you hadn't anticipated.

People are starting to use this. For example, Hal Varian has a very nice paper where he's able to predict sales. I know we have people from Ford here. He's able to predict the sales of General Motors cars, Ford Motor Company cars, well before the official numbers come out, by just seeing how many people search for Ford Mustang, Ford whatever. Most people searching for it are searching because they're thinking about buying it. It turns out to be quite a leading indicator, and he has this very cool graph where he just overlays the two series.

Another source of new data that you never even think of as data: right now, every day, there are satellites crisscrossing the Earth, going around and around in many patterns, and they're taking satellite photos. If algorithms can look at images, why can't they turn those images into data for us? An example: if you look at North Korea's official national accounts, they show a 13-year run of growth, around 500% in total. They're not being ridiculous about it, and it's a very good 13-year run, but you might be a little skeptical. How would you measure your skepticism? Well, here's NASA's nighttime satellite photo of the peninsula. This is South Korea, and for some reason there's what looks like an ocean between South Korea and the mainland. It's actually tragic if you think about it. And ironically, 20 years ago it did not look like this; it was more lit up. Nighttime lights are a measure of economic activity. People have done this; there are some wonderful papers that have done it in developing countries, and you can get very detailed measures of economic activity that are much better than what surveys give you.
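As a sketch of the night-lights idea: once the image is just an array of numbers, a crude "economic activity" index is only a few lines. The file name, the region, and the brightness threshold here are all hypothetical; real studies use calibrated satellite radiance rather than a screenshot.

```python
import numpy as np
from PIL import Image

# Hypothetical grayscale nighttime image; the file and region are made up for illustration.
img = np.asarray(Image.open("peninsula_at_night.png").convert("L"), dtype=float)

region = img[200:800, 300:900]        # made-up bounding box around one country
lit = region > 30                     # pixels bright enough to count as "lights on"
luminosity_index = region[lit].sum()  # total brightness: a rough proxy for economic activity

print(f"share of pixels lit: {lit.mean():.1%}, luminosity index: {luminosity_index:,.0f}")
```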
And why only nighttime photos? Daytime photos give you, in agricultural areas, detailed looks at the crop cover. In fact, people are starting to use that to get a sense of the vegetative health index (this is NOAA data), and now we're at the point where we can predict crop yields in Michigan, Iowa, anywhere, well before the harvest, better than existing yield predictions. Satellites are really cheap now, the data are up there, and you can start doing this pretty much in real time.

Here's another one. This is a rainfall collector; there might even be one around here somewhere. People care about rainfall for agriculture, disease, and so on. These are expensive, you can only put them in certain parts of the world, and you can't put out that many of them. Here's a different rainfall collector. You might also know it as a cell phone tower. How is this a rainfall collector? Cell phone towers are constantly sending pings to each other, and when they do, there is some slight degradation of the signal. That degradation has to do with what's in the air, such as rain. And now people have used machine learning to figure out that just by looking at the ping degradation, you get a very, very accurate measure of rainfall, which means that overnight we have rainfall measures throughout the world, anywhere there's a cell phone tower.

All this is to say, once you understand that any kind of digital input can suddenly be turned into data, it means we have all sorts of new kinds of data. Data in the form of text. Data in the form of images. Data in the form of audio. When an entire 10-K form is filed, all the text in that form is data. When there's an earnings call, all the text in that call, all the cadence of the CEO's speech, that's data. All of it is now data. That's the other direction in which machine intelligence opens things up: suddenly recognizing that there are sources of data you would never even have imagined as an input into the way you proceed. There's digital exhaust, Twitter, Facebook. There's a great project which is taking all of the radio broadcasts in Uganda and literally turning them into a stream of language data to understand the political situation in Uganda, because now you know what every radio station is saying at every point in time. If you want to do a political analysis of Uganda, that's a lot of data to understand it with. So you can imagine that this automation approach is going to give us a very different window onto the world and unlock data of a kind that we really don't normally know how to use or think about. That's one avenue that I think is pretty exciting, and people are really working on it.
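To make one of these concrete, here is a minimal sketch of the cell-tower rainfall idea: fit a simple model mapping signal attenuation on tower-to-tower links to rainfall measured at the few links that happen to sit near gauges, then apply it everywhere else. All the numbers here are made up, and real work uses much richer models than a straight line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: links that happen to sit near rain gauges.
attenuation_db = np.array([[0.1], [0.4], [1.2], [2.5], [3.1], [4.8]])  # signal loss on the link
rainfall_mm_hr = np.array([0.0, 0.5, 2.0, 5.5, 7.0, 12.0])             # gauge measurement nearby

model = LinearRegression().fit(attenuation_db, rainfall_mm_hr)

# Every gauge-less link now becomes a rainfall sensor.
new_links_db = np.array([[0.8], [3.9]])
print(model.predict(new_links_db))  # estimated rainfall where there is no gauge at all
```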
There's another avenue, though, because there's something funny about automation. Implicit in it is the idea that we hope to match human performance. Here I have a satellite photo; I want you to tell me, for each pixel, how much light there is. I could have a human do that, but that gets too hard at the scale we need, so let's get an algorithm to try to do it. But why should we only try to match human performance? As Michael alluded to, I've worked on behavioral science, and one thing I've learned from behavioral science is that this model of human beings is not quite such a good model; arguably, this is a better model of human beings. And once you see this model, you ask, why are we trying to match these people? We should be trying to improve on these people. People aren't good at probabilities, they aren't good at inference, they aren't good at attributions. There are a lot of things we're pretty bad at. So let's try to beat human performance. Rather than automating, let's get into the world of prediction. That's what I want to do now: give you one project, which I'll go through in a bit of detail for about ten minutes, and then I'll zoom back out and tell you the bigger picture I think it describes.

This project has to do with what happens to you after you're arrested. There are about 12 million arrests in this country every year. The first thing that happens to you, within about 48 hours of arrest, is that a judge has to make a decision about what you will do between now and trial, between now and when your case is decided. Do you get to go home, or will you wait in jail? Now, this seems like a small decision, but it's a huge decision, because the typical wait in jail is about two to three months, and in some jurisdictions the average wait is nine to twelve months. Right now, at any moment in time, there are three-quarters of a million people in jail. Jail is where you're basically waiting to be tried. So most of these people, almost all of them, have not been found guilty of the thing they're charged with. This is just a place they're sitting around. And as I said, two to three months, sometimes nine to twelve months. So this is a very consequential decision.

What is the judge doing in making this decision? Well, this is true in every state in the U.S.: the judge is tasked with one specific goal. You have to figure out, does this person pose any flight risk, and any public safety risk? That's the only thing you can consider. So the judge is looking at the rap sheet, looking at the person (sorry, Mike, I don't mean to look at you; I don't want to bring back any bad experiences) and asking, what's my best guess of this person's flight risk? It should sound a lot like sentiment analysis. We have 12 million such data points every year. If an algorithm can look at a piece of text and make a guess whether it's positive or negative, it could look at a rap sheet and make a guess about public safety risk. This problem is interesting exactly because that's the only thing the judge is doing here. So let's do it.

We took data from a large city and trained an algorithm on about 750,000 data points to predict each person's public safety risk. Then we said, we have a predicted risk; let's see how it comports with what the judge did. So our first finding: here's predicted risk on the x-axis and the judge's release decisions on the y-axis. You find a large region in which there's a bunch of agreement. I'm going to say there's agreement here because there's low risk and high release rates. But there are some turbulent waters. This region I'm going to call a region of high disagreement. Why? Because you'll notice that these people are being released at very high rates, around 50%. The judge seems to think, on average, they're not that risky. But the algorithm is saying they're going to commit a crime at a 60% rate. A little hard to believe: the algorithm is saying this person is high risk, and the judge is saying, probably not, I'm going to release them at a pretty high rate. So let's find out what happens. Who's right?
In the next graph I show you the algorithm's prediction and reality, which unfortunately falls almost exactly along the 45-degree line. This is the group of people the judge is releasing at a 50% rate, and the algorithm is just a little short of 60%, actually around 58%, and reality comes in right about there. So the first observation is that it looks like there are some very, very high-risk people being released at very high rates; at least, the algorithm would have predicted them to be risky. And why are judges releasing them? A natural candidate explanation is that they didn't realize these people were risky. Of course, another explanation is, well, that's just the risk tolerance of judges: they see 60% and they say, I'm okay with 60%. So how do we differentiate misranking from judges simply setting a high threshold?

I want you to go through a thought experiment with me. Suppose you have two judges with similar caseloads, and judge A is very lenient and judge B is less lenient, so B is setting a lower threshold. I'm not going to decide what the threshold should be, but we can all agree that the additional people B jails should come in order of risk; there should be no misranking. In fact, nature provides us with this experiment, because there is a sort of pseudo-random assignment of cases to judges, and judges do vary by leniency. So in our data we're able to compare across judges. Here's the most lenient judge: I've colored the released set by risk and I've colored the jailed set by risk. Now, if you were to get more stringent, this is how you should get more stringent: you should jail these people, continue to jail these people, and continue to release these people. Here's what the next quintile of stringency actually does. They jail people who come from all over the distribution. In fact, this is quite a bit of misranking.

How extreme is the misranking? Let's not decide how many people we should jail. Let's just ask what would happen to crime rates if the additional jailings were generated this way rather than that way. So here I have the algorithm deciding, starting with the most lenient judge, jailing in order of predicted risk. The next most lenient judges in our sample jail 6.7% more people and achieve a crime rate that's 10% lower. The algorithm, for that same amount of additional jailing, would have achieved a 17% drop, so about a 70% larger drop. Alternatively, you could say: you were so happy with that crime drop, let's go the other way and figure out how many additional people the algorithm would need to jail to get the 10% crime drop you were trying to get, and the answer is about 3.5%. So if I were to summarize these numbers, I would say we could either jail about 52% fewer additional people for the same drop in crime, or cut crime by about 70% more for the same amount of jailing. These are enormous, first-order distortions in the rankings, and it's true throughout the distribution, across all the quintiles. We seem to be misranking quite a bit.

So the bottom line, and I'll skip this slide, is that exactly because this is not like face recognition, the judges don't do it perfectly, and the algorithm does a very good job relative to the judge at predicting crime. In any absolute sense this is a shitty machine learning task; it can't predict crime very well. If you turned this over to machine learning people, they'd say, why are you doing this? We can do face recognition at nearly 100%. This is 64%, not very good. But the judges are at 61%, and that 3% is worth a lot.
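Here is a minimal sketch, on synthetic data, of the "jail in order of predicted risk" comparison: train a risk model, rank a fresh cohort by predicted risk, release everyone below a chosen cutoff, and look at the implied crime rate. The real study has to deal with much harder problems this sketch ignores, most obviously that outcomes are only observed for people the judges actually released.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                       # stand-in for rap-sheet features
p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))  # hidden "true" risk, made up
reoffend = rng.binomial(1, p)                     # 1 = would commit a crime if released

model = GradientBoostingClassifier().fit(X[:4000], reoffend[:4000])
risk = model.predict_proba(X[4000:])[:, 1]        # predicted risk for a fresh cohort
y = reoffend[4000:]

jail_rate = 0.25                                  # pick any jailing rate you like
cutoff = np.quantile(risk, 1 - jail_rate)
released = risk < cutoff                          # release the lowest-risk people first
print(f"crime rate among the released: {y[released].mean():.3f} "
      f"(vs. {y.mean():.3f} if everyone were released)")
```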
That exercise has gotten me to realize that there are a lot of exercises like that, and that there's a whole market that I think should be opening up for decision aids. The judge is doing a complex task. How could the judge do a good job? How could the judge do as well as an algorithm that has access to 750,000 data points? Which loans to underwrite? Which insurance claim is fraud? How much house can I afford? These are all predictions that individuals are being asked to make. How long will I be unemployed? Think about it: someone becomes unemployed, and the first thing they need to know is the answer to that question. What do we do? We say, I'm sure you've got a good guess on that one. But we have all the data, like we do in the judge case, to say to them: by the way, here's our best guess of your distribution of unemployment durations, given your CV, given everything we know about you. That would be much better.

Let me give you another example, probably the most tragic example I can think of. Imagine you start your first day of community college and no one in your family has been to college before. You show up. It's kind of a nerve-wracking time. You don't know if you belong. And at registration you have to decide what math class to take. Now, it's a tough decision, because there's the remedial math class, and if you take that you might waste three months of your time doing something you already know, and it's not just three months like a well-off undergrad's three months: you're working. Wasted credits are one of the biggest sources of dropout. But maybe you take the regular math class, and if you take the regular math class, it's possible you do very badly. Sure, you were good at math, but you were good at math in this particular high school, and you don't know what that means; maybe it's a bad high school. You have no idea. You do badly there. Again, that's wasted credits, but it can also make you feel pretty bad. So how do you decide this thing? It's a pretty big problem. Well, there's a guidance counselor on the fifth floor. This is funny, because that same evening you go home and you're deciding what movie to watch, and the world's best data scientists are going to help you decide how to spend the next two hours of your time. But for the next three months of your time, well, there's that guidance counselor on the fifth floor.

I think there's an enormous missed opportunity to fundamentally improve decisions in the social domain, along the same lines where you're already seeing the market carve out improvements for itself. How much house can I afford? Well, there are lenders already asking that question, but they're not asking how much house you can afford comfortably. They're asking how much house you can afford to pay back so that they don't lose their shirts. That's not the question you're asking. You're asking, how much house can I afford? There's a lot of data now, and I think we have the ability to fundamentally provide decision aids for good. To me, between that and the new data it unlocks, that's one of the big promises of machine learning.

Of course, everything comes with risks, so let me just go through what I think the risks are, and then I'll stop. The first risk, and one must not forget this about the entire machine learning enterprise: the algorithm is just a dog with a bone. For example, in the judge case I gave, you could say, well, you know, the algorithm may be racist.
Of course it may be racist, because what does the algorithm predict? Whatever you tell it to predict. Not in the abstract sense of whatever you tell it to predict; in the very narrow sense of literally what you tell it to predict. I told it to predict rearrest risk. That's all it's going to predict. You can't say, well, it doesn't know anything about race, so it can't be racist; race is just variable x49. And this is very basic, because even if you don't give it race as an input, which we don't here, the algorithm could end up reconstructing race, not out of malice, not out of anything, but because it has no clue about anything except the one variable you gave it, and it's optimizing. If you wanted it to care about two things, you should have given it two variables. This is a very important problem, the dog-with-a-bone problem, so let me illustrate it with the race case.

In the race case, you'll notice the base rates: about 48% of the sample is Black, 33% Hispanic, about 82% overall minority. Judges release about 57% of African Americans. The algorithm, left to itself, actually, good news, is not that racist: it releases about 59%, a tiny bit more, and it jails about one percentage point fewer Hispanics. On net it doesn't change things at all. It could have, but it doesn't. In fact, though, I think this is a missed opportunity. It's a missed opportunity because if we told the algorithm our true preferences, which is, hey, I actually care about the crime thing I told you about, but I didn't tell you about the racial equity thing, and I care about that too, it turns out the algorithm can hit this out of the park. Here's the gain in overall crime it had: 24.68%. If we say, I want you to be exactly as racist or non-racist as the judge, 24.64% is the gain we get. So we can get the exact judge distribution. But why stick with that? Say I'm not comfortable with the judges' racial distribution; I want you to match the base rate. We can match the base rate exactly, which, you'll notice, means nine percentage points more African Americans being released, a huge effect, with almost exactly the same effect on crime. It's a dog with a bone, but it's very good with that bone. If you know what to tell it, it can actually help you in this dimension too. So here the bottom line is: we're reducing crime, reducing jailing, and dramatically reducing the number of African Americans in jail. Why? Because we trained the algorithm to treat that as another preference we have, at almost no cost.

But I can't overstate the importance of the dog-with-a-bone problem. This is probably where every social-science misapplication of machine learning comes from. I'll give an example. Many people will do something very close to bail: oh, I'm going to predict recidivism risk for purposes of sentencing. Okay, great. But you do realize that for a judge making a sentencing decision, recidivism is only one of the things they care about; they care about a ton of other things. The idea that you're going to inform that decision in a very meaningful way is a bit misleading. That's one of the things to keep in mind: these algorithms work well when there's a bone that really is going to help you make a decision.
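Here is a minimal sketch of what "giving the algorithm the second bone" can look like in the bail setting: instead of one global risk cutoff, choose a cutoff per group so that each group is released at the same target rate, which is one way of matching the base rate. The risk scores and groups here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
risk = rng.uniform(size=1000)              # predicted risk for each defendant
group = rng.choice(["A", "B"], size=1000)  # stand-in for a demographic group label

target_release_rate = 0.70                 # the extra preference we hand the algorithm

release = np.zeros(1000, dtype=bool)
for g in ["A", "B"]:
    members = group == g
    cutoff = np.quantile(risk[members], target_release_rate)
    release[members] = risk[members] <= cutoff   # release the lowest-risk 70% of each group

for g in ["A", "B"]:
    print(g, f"release rate: {release[group == g].mean():.2f}")
```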
Take personnel: I'm going to use this to hire employees. What are you going to train it on? Well, I want good employees. What's "good"? Well, we have performance reviews. But flip that question around: do you really believe that the employee who does really well on performance reviews is good in the only sense you care about in your firm? Nobody would agree with that. They'd say, yeah, there are some employees who do well on reviews but are missing the fourth or fifth thing we actually care about. And that fifth thing the algorithm has no way to deal with, because it can't talk to you; it can only see the variable you give it. So the dog-with-a-bone element is pretty important.

The second element that's pretty important is that machine intelligence is, in some sense, just statistics. Why do I say that? Because if I told you I was doing statistics, all these things would immediately come to mind: rare events and tail events, changing distributions, yes you trained on this data but not on that data. Somehow that gets a little bit lost once you call it machine intelligence. But the algorithm has no ability to see rare events that haven't appeared often in the data. It has exactly zero ability to train on data up to 2007 and then have anything to say about data from 2009 on, because that's a new data-generating process. This isn't a castigation of machine intelligence; it's just how statistics works. Statistics works when the data-generating process is constant; when it changes, we need more tools, and machine learning by itself isn't going to overcome that.

The final thing I want to talk about is perhaps the vaguest thing but possibly the most important. I'll call it interpretability risk. What I mean by interpretability risk is that people have this illusion, and that's why I wanted to go through the part at the beginning about how these algorithms are built, that somehow these algorithms are built by somebody who knows what's happening inside them. What did the person building the algorithm actually do? What they built was an update rule: when I get a new data point, this is how I will update. You can build a little machine that has an update rule, but after a million steps, nobody, not even the builder, knows where it has ended up. That's the nature of the beast we're in right now. We built them, but we don't understand them. And how could we? They embody a million data points' worth of knowledge. That would be like being handed a data set and saying, I really understand this data set. You don't understand this data set; it has a million data points. Same thing here.

Interpretability risk shows up in many places; I'll show you two and then I'll stop. In computer vision, people are starting to see interpretability risk. How do you see it? They have done these wonderful things where they've trained algorithms to recognize objects, and then they ask the algorithm, what do pictures of the object you claim to recognize look like? Sometimes it produces reasonable things: this is a cauldron, and this is a picture of something the algorithm mislabels as a cauldron. That's not so bad, right? But this stuff can get really weird. These images are mislabeled as a matchstick, a ping-pong ball, sunglasses, and you start to see that the algorithm isn't really seeing what you think it's seeing. It has just found some features of the visual image that could be sunglasses.
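To ground the "update rule" point, here is a minimal sketch of the kind of thing a builder actually writes: a perceptron-style update on made-up data. The builder specifies only the update; where the weights end up after a million examples is whatever the data made them, which is exactly why the finished model is hard to interpret.

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(3)                     # the entire "model": a weight vector

for _ in range(1_000_000):          # a stream of a million (features, label) examples
    x = rng.normal(size=3)
    y = 1 if x @ np.array([2.0, -1.0, 0.5]) > 0 else -1   # made-up hidden rule
    if y * (w @ x) <= 0:            # mistake (or on the boundary)? then the update rule fires:
        w += 0.1 * y * x            # nudge the weights toward this example

print(w)   # nobody typed these numbers in; they emerged from the data
```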
Perhaps the easiest way to see this interpretability risk is actually to do something with Siri that I want you to try; if I had my phone, I would do it. Remember how I said, isn't it amazing that Siri can tell you about the weather? If you want to understand interpretability risk and rare events, just pick up Siri and say, "Don't tell me the weather." Or, for example, you could say, "Don't tell me the score of the Cubs game." You can even explain to Siri why: don't tell me the score of the Cubs game, I have it TiVo'd, I want to watch it. First of all, Siri may ask back, what is TiVo, but that's another question. What do you think Siri will do? It will proceed to say that the Cubs won yesterday, tying up the series 1-1, and it will tell you the score. And if you say, don't tell me the weather, it will tell you the weather. What is happening? If you intuit from where I started, you know what's happening. Siri doesn't understand language. How many sentences does it have in its data set of people saying "do not tell me the weather"? Basically none at all. It goes to the nearest neighbor, and the nearest neighbor is somebody asking for the Cubs score, so it tells you the Cubs score. That's the essence of interpretability risk: we build these algorithms, we put an anthropomorphic bias on them, and we presume that they do what we would do in that circumstance. They don't, and as a result we're opening ourselves to a certain kind of risk through a blind spot we have about their intelligence.

So let me conclude. I hope by the end I've convinced you the risks are real, but to me the promise of using these techniques is huge, and our goal has to be to figure out how to quantify the risks and to overcome them. All right, thank you very much. If there are questions, I'm happy to take questions, but it's also five o'clock, so if you don't have questions I'm also happy to call it a day and we can chat after. I think the mic is coming; there's a question over there.

You talk about machine learning; how do you foresee ontologies working with machine learning algorithms?

This is going to be really embarrassing, but I'll need you to define ontology for me.

An ontology is about a specific domain: it defines what things are and what they are not, and it links those things using explicit relations. It's based on meaning, on semantics. You talked about just words, but the position of the words makes a difference, how the words fit together in a sentence.

I think the way I would say it is this. As a behavioral scientist, I've often felt there are a lot of problems with human intelligence, but in this work I've learned to marvel at some of its amazing parts. Just to illustrate: all the data we choose to give the algorithms are things that we chose to measure. The algorithm cannot see anything that we haven't either actively or passively measured. So in this judge example, okay, the algorithm is doing pretty well, but wouldn't it be nice to be able to ask it, what is it that you think you're seeing, so that we could then go and measure that? In some sense, that's the deep sense in which there is understanding locked in our heads, and unless the algorithm and we can talk, there's no way to get to that next stage. The meaning of a bunch of things will always be locked in our heads, because until we know to measure a thing, until somebody has a brilliant insight, in 19-whatever, Granovetter's paper, hey, we could measure social networks, we don't really measure it. Whatever it might be, measurement is such a central activity. Okay, one more question.

This is kind of a related question to that one. For this kind of task, how important is domain knowledge to be able to do a good job with these?

In terms of applying these machine intelligence tools, I have felt that nearly everything I've seen that hasn't spent a lot of time on the domain ends up being relatively useless.
Even the choosing of the bail example required a lot of expertise, not my expertise but my co-authors' expertise, to see that, yes, that's actually a place where prediction is super valuable. And from there on, every step of the way, it proves to be very, very important. Because right now these algorithms are so easy to run: once you have a data set, you can download a random forest package and just run it. The actual implementation is not nearly as hard as the posing of a good question, and then there's all the computer science that comes from being able to do better than the out-of-the-box package. Those two skills are super useful, but being able to pose the right question, which is a feature of the domain, proves super important. So let me stop there. Thank you very much.