Good morning, Madrid. I am delighted to be here, and I'm delighted to tell you all about staying safe with AI, and, I suppose, the alternative. Before I go further, I want to mention that besides serving as head of decision intelligence at Google, I also blog for your amusement, and I'm happy to say that not only do I blog in English, but there are now many translations of my work into Spanish. So if you're interested in any of that, go to the top of my Twitter page and you'll find the links there for the English and the Spanish; that's my handle up there. If you're wondering what that word means: it's the Latin opposite of data. Data is that which is given, whereas quaesita is that which is sought. The missing quantity, what we are looking for. And that has always seemed to me somewhat more relevant than data, that which is given, that which is just sort of lying around. It's what we want to do with it that seems the important thing, and that's going to be the theme here as well.

So let's dive in. This is going to be a series of nuggets, thinking tips that will help you stay safe if you are leading projects in an AI future or consuming AI products. So I suppose that's going to be everybody.

First things first. There's a lot of science fiction hype in this space, and that hype was necessary back in the days when AI sounded cool but you couldn't actually do anything with it. And the reason you couldn't is that it needs data, and you'll see why. You might be taking this for granted, but sometimes you need quite a lot of data to get things done with AI, and you also need the computing power, the hardware to move it all around. The terms have been around since the 50s and 60s, but the computing power wasn't there. And so a lot of this stuff had to be hyped up. There had to be an appeal to science fiction. Otherwise, how would all those PhDs doing their dissertations justify what they were working on and get grants?

The thing is, today, with the rise of cloud, anyone can rent a data center. That's pretty much what cloud lets you do, right? You get to borrow somebody else's computers and give them back when you don't want them anymore. So that makes any kind of big-data, large-computing-power task a try-before-you-buy proposition. Maybe it's better to rent a data center rather than build your own if you're not quite sure you want to be in the game. That's one of the steps that started democratizing all this. And today, it's real. We don't need the appeal to robots anymore. We don't need that fluff and nonsense. AI is actually a really useful tool, and now it's possible to use it.

But all that sci-fi hype makes these appeals to personhood, and I think of that as pretty dangerous. It sets the wrong expectations, and it also distracts you in many ways. It reminds me of a geology professor using a pet rock, a little rock with a face drawn on it, to teach geology. When it has a face on it, people pay better attention. But watch out: don't cross the line into teaching pet rock psychology instead of pet rock geology. AI is just another kind of pet rock. It's easy to build a little doll with a face drawn on it and plug AI systems into it. That doesn't make the thing anything like a person. It's just a tool. And in fact, what a lot of people don't realize is what precisely it's a tool for.
We've got to understand that before we can make use of it, and make use of it safely. So what is it actually a tool for? Before I tell you, let me remind you about something to do with tools. Here's a little bit of rhetoric you often hear: is AI better than human? Folks, any tool that is not better than human is a tool we don't use. A hammer is better than my hand at knocking nails into the wall; that's the point of that tool. A bucket is better than my hands, again, at holding water. A calculator is faster than me at multiplying six-digit numbers together. The point of our tools is to be better than us at something, and if a tool isn't, throw it away. That's how it works.

So what is this a tool for? Let me remind you what we want from computers. Computers are information transformers: you take inputs, you turn them into outputs, via a recipe, or, to use the fancy word for recipe, via a model. Now, where does that model, the thing that does the conversion from input to output, come from? In traditional programming, here's where it comes from. A developer, a software engineer, you know these developers, communicates with the universe in some way, yes? Thinks really hard about the problem, comes up with those instructions by thinking, by meditating, and then codes them up in a language the computer can understand. That, pretty much, is traditional software development. In other words, traditional software is about expressing your wishes with instructions.

Okay, so what's the difference with AI and machine learning? Simply this: instead of using instructions, you're going to use examples to make your wishes known. And then it is the AI algorithm's, or the machine learning algorithm's, job to find patterns in those examples and automatically make that recipe, or model, for you, so you don't have to think about it anymore. That's what it does. So what is it a tool for, my friends? It's a tool for writing code. And if it automates anybody's job, it automates software developers'. Now, software developers have the job of automating everything else, and that's the funny thing here: it automates some part of the software engineering process. But don't worry, it's not like software engineering is going to go out of style. You still need to do a lot of engineering to move all that data around, to get those algorithms to accept it, to build proper production-worthy systems out of the prototype recipes you get by finding patterns in examples. But it changes the way software development will work.

Now, why is this exciting? Well, think about the task of figuring out whether an image has a cat in it or not. Think about how you would automate that task. In comes a photograph; out must come the label: cat, not cat. If you had to come up with explicit instructions, what would they be? What would you tell the computer to do with each pixel? Do you even know how your own brain figures out whether an image has a cat in it? We've had eons of evolution, and our brains just do it. We have no idea how they do it. Are we going to write, like, look for ovals, and look for triangles for the ears? Is that really going to work? Wouldn't you rather say: here's a bunch of examples, you figure it out? And the beautiful thing is that we humans already have these two modes of communication with one another. Sometimes we prefer to give explicit instructions. Sometimes we prefer to say: hey, watch me do it, then you figure it out. So we already communicate in both of these ways.
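To make those two modes concrete, here's a minimal sketch in Python. The function, the toy numbers, and the choice of scikit-learn are mine, purely for illustration, not anything from the talk: the first part is the instructions mode, the second is the examples mode.

# Mode 1: express your wishes with instructions.
# A human thought hard and wrote the recipe down explicitly.
def is_positive(x):
    return x > 0

# Mode 2: express your wishes with examples.
# The algorithm finds the pattern and writes the recipe (the model) for you.
from sklearn.tree import DecisionTreeClassifier

examples = [[-2.0], [-1.0], [1.0], [2.0]]  # the inputs
labels = [False, False, True, True]        # the outputs we wish for
model = DecisionTreeClassifier().fit(examples, labels)

print(is_positive(3.0))            # True, from the hand-written recipe
print(model.predict([[3.0]])[0])   # True, from the learned recipe

Same wish, two ways of communicating it.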
But with traditional software, we could only talk to computers one way. Machine learning, AI, unlocks that second mode of communication. So it is, in fact, a revolution in communication: how we can talk to machines to write code. And what it also means is that you can automate the ineffable. It doesn't matter if you don't know how to do the task yourself, if you can't come up with the instructions. If you can come up with examples, data, then you might just be able to succeed at the task. And that is why you should be excited. This is a fundamental leap in human progress. This is about unlocking a whole new class of possibilities. Forget those robots; this is the real exciting stuff for business. We can suddenly start automating things we couldn't automate before. It's amazing. Software development has just taken a huge leap.

But now let's start building some of those nuggets for staying safe. And because we're all fast asleep today, I want to wake us up by having you participate in a task. So everyone, please shout for me. I want you to be my AI system, and we are going to do a cat/not-cat classification task. I'm going to call out the number of the image, and I want you to shout at me: cat, not cat. Can you do it? Can you do better than that? Yes. Okay, let's go. Number one: cat. Number two: not cat. Number three: cat. Number four: cat. Notice with number four, if you were looking for only two triangles, those three ears and three eyes would make a problem there. Number five: not cat. Number six... I'm sorry, what? What options did you have? Cat or not cat? And what did you say to me? Big cat is not an allowable option, my friends. Maybe cat is not an allowable option. You, the system, can only issue one of two outputs: cat or not cat. Huh. Is this a cat or not? I don't hear a consensus here.

You see the problem? This is not objective. The right answer here is whatever the decision maker needs the system to do. The right answer depends on the purpose for which the system was built. So, since we seem to be missing a decision maker, I will step into those big boots, and I will tell you that the purpose of this system is to be a pet recommendation system, and if it says cat, that animal had better be, in its typical adult form, safe to cuddle. Okay, let's try this again. Number six: not cat. Amazing. And if you're still saying cat, please take out more life insurance.

So I hope you see here that the objective is subjective. What the system is designed for, what its purpose is, depends on whoever made it, and what the right answers are depends on what those people need it to do. And so you can't expect to build a system without thinking carefully about the intended behavior, or just pick up someone else's AI system and expect it to work correctly for your needs. Because if I built this thing to say not cat whenever a tiger photo comes up, and you are instead using it for something else, where saying cat would be the right thing, my system is not going to work for you. And so it really matters that you think carefully at the beginning: what are we doing here? What's the purpose? In fact, the decision maker is the person with the most important role in AI. All this hype stuff makes you think that maybe it's the mathematician or the researcher or the engineer or the data scientist who's the most important. It's actually the people who decide on the purpose and decide how the system will be evaluated.
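Since the objective is subjective, the right answers themselves are a function of the decision maker's purpose. Here's a hypothetical Python sketch (the function and field names are mine, just to show the idea): the very same tiger gets a different correct label depending on what the system is for.

# Hypothetical sketch: same animal, two purposes, two different "right answers".
def label_for_pet_recommender(animal):
    # Purpose: recommend pets, so "cat" must mean safe to cuddle in adult form.
    return "cat" if animal["species"] == "house cat" else "not cat"

def label_for_feline_census(animal):
    # Different purpose: count all felines, tigers very much included.
    return "cat" if animal["family"] == "felidae" else "not cat"

tiger = {"species": "tiger", "family": "felidae"}
print(label_for_pet_recommender(tiger))  # not cat, and keep that life insurance modest
print(label_for_feline_census(tiger))    # cat

Borrow my labels for your purpose and you inherit my purpose along with them.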
And whenever you encounter any kind of system, whether it is AI or traditional software, to stay safe as a consumer you need to remember this too: the objective is subjective. The system was built for some purpose. If your purpose is not that purpose, the system might not work for you. So what was that purpose? What were the incentives? Always think about that stuff. Don't trust these things blindly.

All right, next up: decision makers. Would you prefer a worker who is reliable and does exactly what they're told? Or an unreliable worker? It doesn't feel like it sometimes. Which one is better? Which one is safer? What do you think? What do I hear from you? Talk to me. Reliable or unreliable, which is the better worker? All right, I'm hearing you say reliable. And for that, I'm going to say: watch out. Because the answer should be a strong "it depends." And what does it depend on? The skills and abilities of the decision maker. You see, if you have a brilliant, wise decision maker who's really good at setting goals, at setting objectives, then you want the dependable, reliable worker who does what they're told and tries to meet those goals. But what if you have an idiot who sets terrible goals? Then the unreliable worker is the one you want, because that's the person who's going to stop the stupidity from scaling.

And what I also want you to realize is that, relative to machines, humans are pretty unreliable. We've got a whole host of incentives. Sometimes we feel like sitting outside, playing in the sunshine, hanging out with our kids or on the beach. We are not single-mindedly following some objective. And because all of us have all these different incentives pulling in different directions, that puts a bit of a brake on silly objectives. Whereas with machines, you can clone the same incentives over millions of them, and they all just optimize toward that. So they're going to amplify something; let's hope they amplify intelligent decision-making rather than foolish decision-making. So again, the skill of the person in charge is pretty important here. What instruction, what objective, what goal was set here? Who set it, and with what skill? This really matters, especially with AI and machine learning, which scale everything up so quickly.

In fact, let me show you something. You want to know where the real danger of AI lies? Because in all this AI-ethics stuff and AI-danger stuff, there's a lot of rhetoric where you can literally take out the words machine learning and AI, replace them with the word technology, and it still holds. It's not AI-specific. So all that stuff about disrupting markets, changing how labor works, changing human relationships, societies, complex systems you can't get rid of: you don't need data and AI for that. So what's the thing that is specific to AI and machine learning? Here it comes. To automate a task the traditional way, let's say it took 10,000 lines of code. Some human had to have agonized over each little part of that instruction set, each little line in those 10,000 lines. Maybe not the same human, but some member of our species had to actually think about it, otherwise it couldn't be written. Now, with machine learning and AI, there are actually only two instructions: optimize this objective on that data set, go. Okay, now this is beautiful, because you can get tasks automated without thinking too much about it, right?
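Stripped of boilerplate, those two instructions really can look like two lines. A minimal sketch, with scikit-learn and its built-in digits data standing in for "that data set" (my stand-ins, not anything from the talk):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)                  # "...on that data set..."
model = LogisticRegression(max_iter=5000).fit(X, y)  # "optimize this objective, go"

Everything that matters is hiding inside those two lines: which objective, whose data.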
Before, you had to think carefully about what all those instructions were going to be to get the system to, say, separate cats from not-cats. Now you can say: my objective is accuracy, be as accurate as possible, and here's my folder with cat and not-cat photos, off you go. Now, in practice it looks like there are more lines of code than that; there are all these engineers sitting there sweating over their machine learning code. That's because the tools kind of suck today. If you've had a chance to play with things like TensorFlow 1.x, there was a lot of code writing there, but that's all dealing with the boilerplate, really. One day the tools will be so good that you'll be able to express yourself like this. In fact, in some spaces, like with AutoML, it kind of is like that already. Just two lines: those examples, this objective, off you go.

And it's beautiful, because you can think less and get tasks automated, which is wonderful for small tasks and personal projects, and tasks that are fun or artistic and don't have a lot of impact on the world. But for building large-scale systems that impact humanity, this is a thoughtlessness enabler, and thoughtlessness can be a hazard. To really ethically develop AI, you have to put as much effort into those two lines as you would have into the 10,000 lines of code. Not one line's worth each, but 5,000 lines' worth into each of them. And who does this? The decision maker. Not the engineers. And unfortunately, in today's projects, the decision makers are often absent. They think they can hire a bunch of nerds and tell them to sprinkle machine learning over the top of the business, off you go. But this is actually down to the leaders, the people who say why the systems exist in the first place. So to do this ethically and safely, you really have to honour those two lines when the system has the potential to touch a lot of lives.

And that is why nugget 4 is to think about this the way you might think about those magic lamp stories. In fact, AI and machine learning are a little bit like a proliferation of magic lamps. And in all those be-careful-what-you-wish-for stories, it's never the lamp or the genie that's dangerous. It is the unskilled wisher who doesn't know how to set wise objectives. And that is a skill we are all going to need to develop in an AI future.

Let me show you some humorous examples of this. First, some non-AI examples about saying what you mean. I enjoy this fellow; you can find him on Twitter. He's called James Fridman; his handle there is fjamie013. What he does is Photoshop fun: people submit photographs to him and he Photoshops them as requested. So here's a request. It says: can you please edit this picture so my brother and I wear the same t-shirt? Ta-da, job done. Here's another one: adjust this picture to make me look as if I am holding that cliff. Right? Now, James does this on purpose, because it's funny. Whereas your AI system is going to do it because you gave the examples foolishly; not on purpose. Here's an example. Some researchers at DeepMind wanted a system to learn how to navigate a terrain, right? Learn locomotion. And in setting the objectives, they failed to mention the arms. So they got this arm-swinging idiot, right?
Which is pretty funny, but you get this kind of behavior because you said the legs are important and the arms are not. This is the kind of thing you should worry about with AI systems. Who gave those instructions? Did they give them wisely? And did they give them in an ironclad way, so that the letter and the spirit of the objective are the same? They have to be the same with AI. We have to be really careful, and we have centuries of bad habits on this, where we use really sloppy language and just expect people to figure out what we mean. There's a special kind of boss out there, the kind whose employees say: I never give my boss what the boss asked for; I give them what I think they wanted. You're lucky your humans do that for you. Your technology will not. And so that kind of boss is a hazard for, again, tasks that are mission-critical and that matter at scale. You can't have a person like that in charge of these projects. You need people with the skills to be very careful about the objectives they set, to think about them carefully.

AI is not magic, and it can't read your mind. It won't give you what you thought you hoped you were asking for. It'll do one of two things. Either it'll give you nothing, and then it doesn't leave the attic, it doesn't pass testing, you never release it; or it'll give you exactly, precisely what you asked for. So develop the skill of thinking: if the system were as perverse as possible and were trying to mess with me, what's the worst thing it could do with the objective that I set? That's a skill you're going to have to develop, both as a consumer and as a leader, to think through objectives in this space.

I love this quote from AI researcher Alex Irpan. He's talking about his area, deep reinforcement learning, but this actually applies throughout all of machine learning: I've taken to imagining machine learning, AI, as a demon that's deliberately misinterpreting your reward, your objective. And that's actually a productive mindset to have. I completely agree. If we just expect that it'll work and it'll figure us out, we're setting bad expectations. This is just technology; it is not magic; it doesn't read your mind.

Oh, by the way, I hope you'll indulge me: in this talk I am using the terms machine learning and AI somewhat interchangeably. Technically, they are different, and as an ex-researcher I'm sympathetic to the fact that in academia people like to make a distinction between them. But I'm using them interchangeably because, quite frankly, I give up: industry uses them as synonyms, and that's a fight I'm not going to fight. And in fact, the real reason is that my new favorite definition of the difference between AI and machine learning comes from Mat Velloso. Here's the quote, I hope you'll like it: if it is written in Python, it's probably machine learning; if it is written in PowerPoint, it's probably AI. There you go, a little humor to wake you up.

All right. Now, because AI systems can't read your mind, how confident are you that you actually gave your instructions really well? You might have hoped you did. But what I want you to consider when you develop these systems is that maybe you didn't. Trust no one. Don't even trust yourself. What didn't you think of? What could go wrong? And do you have safety nets in place just in case it does? Always build safety nets. So here's an example.
So this fellow, BJ May, is another Twitter situation (I quite like Twitter, I guess), another Twitter situation that amused me. He said that his smart front door system, a smart locking system running through the Nest camera on his front door, recognizes faces, and if it doesn't recognize the face as someone who's supposed to be in the house, it automatically locks the door. And BJ May is complaining that he was locked out of his own house by the system, because it was protecting everyone in the house from Batman. Right. Okay. So here's the thing: this situation ended just fine. It's just a nice joke on Twitter, because the people who developed the system had thought about possibilities like this beforehand and had designed the system to have other ways to get into the house. You could put in a PIN code to override the thing in case this happens. If those engineers had never thought a mistake was possible until poor BJ May was locked out of his house, who knows, maybe poor BJ May would still be locked out of his house. You cannot think about this stuff after the mistake happens. You have to consider that mistakes are possible and develop safety nets alongside the system. Don't just trust it. Always have backup plans.

And so nugget 5 is: think like a site reliability engineer. Site reliability engineering is a discipline started at Google, and it's all about keeping large-scale systems safe and reliable in production. When I show site reliability engineers a slide that says "what happens if there's a mistake" and ask them, what's the typo on this slide, my friends? They all shout at me: not if, when. Correct: when there's a mistake. Mistakes will happen. In fact, if there is one single most dangerous thing that both producers and consumers of AI systems can do, it is to forget that mistakes are possible and to trust these things as 100% perfect. The goal with these systems is to pass a performance bar, and that bar might be really high. That performance bar might be what you'd call better-than-human performance. That is not the same thing as perfect, and you have to see that difference. Mistakes are possible. You need to remember that, always, on both sides, whether you're building or using these systems. And so make sure there are safety nets. Don't use systems without them.

Now, AI lets you automate the ineffable: things you can't state the instructions for, you can still get automated. Amazing. So how would we trust a system where you couldn't say the instructions because they were so complicated, where you couldn't even wrap your head around them? Are you expecting to read them and understand what they do, if they were too complicated for you to come up with in the first place? You shouldn't expect that. You should expect that probably the reason you couldn't automate it the traditional way, by just saying how to do the task, is that it is too complicated for our human memory. The wonderful thing about machines, though, is that their memory is much better than ours. See, if you ask me to start reading a spreadsheet and it's got a million things in it, by the time my eye has gone down to the millionth one, I've forgotten most of the ones before. Not so a computer. That's the advantage of the computer: it remembers all the examples, it remembers them in parallel, all together, all at once, and it can manipulate them in a way that I can't. And similarly with an instruction set that is a million pages long.
If I start reading that, I'm not going to make sense of it; I'm going to get so bored. Not because it's difficult in some intellectual way, but just because it's boring and there's a lot of it. AI models, AI instruction sets, recipes: they are long and they are boring. When you start reading them, you forget what you've read. That's the sense in which these things are black boxes. For any of these systems, you could open it up and start reading the model; it just wouldn't mean anything to you. It would be a whole lot of mathematical garbage. That's because the instructions are complicated. Now, you could say: I don't want to deal with anything that has complicated instruction sets. Okay. That means you are limiting the tasks you can automate to only the things a simple human mind like mine could wrap itself around. And, you know, we're not that impressive as a species. If I start reading the digits of pi, I'm going to start forgetting them after about the tenth one. Again, not so with a computer. We've got limited, short working memories. That's really what we are doing here with AI: we are taking advantage of systems that do have good memories, for pulling out patterns we can't see ourselves because we can't remember what we've read. If you stick with only interpretable things that you can understand, you're only going to be able to automate simple tasks, and you're not going to be able to go past the low-hanging fruit and make progress in those areas that are a little too much for us individually.

Okay, but how can you trust it then? You can't read it, so how do you trust it? Well, here's what I'll say to you. Don't trust AI. Don't trust humans. Don't trust yourself. See, I've got some trust issues going on here. Trust no one. Instead, force everything, human and machine, to earn your trust by testing it. Think about how you would figure out whether a human student knows calculus or not. You want a human student who can do calculus; what are you going to do? Are you going to take a scalpel, open up their brain, look around, and see how the brain implements the calculus, to figure out whether the student can do calculus? No, of course not. That's silly. Yet that seems to be what people want to do with these AI systems. Opening it up and reading the instruction set, reading the model: that is the equivalent of poking around in the student's brain with a scalpel. What are you going to do instead? What's the much better way, something you can do with students and with machines? Give them an exam. Test them properly. Curate the inputs. Make sure they are relevant to the sorts of tasks the student needs to perform later, when they're a mathematician. And so calculus at the high school level and calculus at the PhD level are going to be different things, with different textbooks and different exams. Then look at the output of the system, score that output, and if the score is good enough for your tastes, then, and only then, can you say the student or the system is qualified to do the job. And so the exam matters, and who sets the exam also matters. Whenever anything goes wrong with an AI system, one of the questions I ask immediately is: who tested it, how did they test it, and what did they test it on?

Let me show you an example of very silly testing. First off, we're going to train a system. We express our wishes with examples, so I'm going to teach it to categorize three different kinds of things.
My two cats, Huxley and Tesla, and banana: those are my three categories. And I'm going to give it more examples than just these; these are for illustration, and I want you to imagine there are many, many more examples in the training set. We feed the examples in, it finds patterns, it makes a recipe, and then we test it, and we see that, oh, it seems to have perfect performance. What went wrong here? Did anyone notice that the photographs in the test set were the same as the photographs in the training set? How do I know this isn't memorization? How do I know it can actually pick out the generalized differences between Huxley and Tesla and banana? From this, I can't. It is possible to beat this exam through pure memorization.

So what's a better way to do the testing? Give it new examples that it has never seen before. Then, and only then, can I assess whether it's generalizing, whether it's not doing this by remembering the answers. Just like a calculus student. Professors, the dumbest way to test your students is to put the same examples you did with them in class on the final exam. You can beat those by memorizing. You don't have to understand anything. You don't have to generalize beyond them. You can just memorize: this example had a red lorry in it, so the answer is 52.7. And then when that student goes out into the world, everything crashes and burns. Test on new data. You would be amazed how often in industry this one basic thing is not followed. It's on new data that you can actually find out whether the system is doing what you need it to do. It's by testing on new data that it earns your trust. Always use pristine data for testing.

This word overfitting: it's the curse word, the profanity, for AI people. We're always avoiding overfitting; everything is about avoiding overfitting. Think of overfitting, if you prefer simple language, as memorization: memorizing the noise in your data instead of the relevant patterns that also live beyond the training data you used. The little peculiarities of the homework problems, rather than the general stuff about calculus the student is supposed to learn. The way to beat that is to test with data you have not used for anything before. So nugget 8 follows directly from this, my friends: you have to make sure you set aside some data that was not used for anything before. And this is a habit a lot of teams simply don't have. They get so excited to start playing with their data that they take the whole lump of it and shove it all into the system. And then sometimes they don't have any left over, so they're like: okay, well, whoops, I guess I'm going to evaluate on the same data. And then, later on... So watch out. Always, always, always split your data.

Now let's suppose instead that on fresh data we had great performance: 100% accuracy. This means the system can separate my cat Tesla from my cat Huxley, yes? Careful. Don't jump to conclusions. As a recovering statistician, I have this job of being grumpy and pedantic all the time. So some of that pedanticity (that's not a word), some of that pedantic impulse that has stuck with me, is to be very clear about what things actually mean. So what does this mean? All it means is that on a test set of some particular size, on these examples, the system happened to perform the task well. I will never say that it means the system can, in general, recognize the difference between my two cats.
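Here's nugget 8 as a minimal sketch, again with scikit-learn and its toy digits data standing in for my cat photos (my stand-ins, not the talk's): set the exam questions aside before any training happens, and score only on those.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Set aside pristine data BEFORE training touches anything.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print(model.score(X_train, y_train))  # performance on memorizable material
print(model.score(X_test, y_test))    # the real exam: examples it has never seen

Only the second number tells you anything about trust.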
And yet the poetic language that marketers and salespeople especially like to use would say this is, in fact, a generalized Huxley-Tesla detection system. No, it isn't. Don't jump to conclusions. Look closer. It turns out that in the training set, every time we had Tesla, we also had a radiator in the background, and with Huxley, never a radiator. This system is trying to separate group one and group two, and it finds a convenient pattern: that grill in the back. And what happened with the test data? Same again, the same kind of data set taken the same way. This is not a Huxley-Tesla detection system. This is a radiator/not-radiator detection system. And that's not a problem if all the photographs I'm ever going to show this system are taken in this manner. If Tesla is always photographed with the radiator and Huxley never is, then for my business purposes this system will do in place of a Tesla-Huxley detection system. It will do the job, it will perform, and I can trust it to work in that environment. So, my friends, what environment have you built your system for? Make sure you think about that.

Remember, at its core, we are expressing our wishes with examples. We're supposed to be communicating with examples. When we pick dumb examples to communicate with, all kinds of things go wrong. So I tell Google engineers, when we're doing machine learning, that they should consider tattooing this sentence on themselves: the world represented by your data is the only world you can expect to succeed in. You will do well in that world, but watch out. What if your world changes? What if someone tries to use your system outside the world represented by your data? You're going to get failures. The prudent thing to do, then, is to trust it only there. Just because your student has learned how to do calculus doesn't mean they suddenly know how to do architecture. Watch out. Machine learning turns patterns into recipes; that's what's going on here: patterns in examples, in data. You put garbage examples in, what are you going to get out? Garbage. Express yourself wisely, thoughtfully.

And here is the moment where we get to the AI bias part of the talk. This whole conference is about AI and ethics and responsible AI, so we've got to talk about AI bias. When you think about data sets, I want you to think about them like textbooks that you are giving to your machines. And I want you to remember that a textbook is, in fact, a data set (it may or may not be in electronic format, but it is a data set), and a data set is a textbook: kind of the same thing. And like textbooks, data sets have human authors. A person wrote them. They're in fact a curated set of memories. You need to think about who that author was, and if that author was a horrible, biased, prejudiced individual, that might affect the nature of the textbook. And if your student has only this textbook to learn from, why should you be surprised if they pick up the biases of those authors?
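One way to open the textbook before teaching from it is to eyeball how background details line up with the labels. A hypothetical Python sketch (the tiny hand-made data set is mine, just to show the idea) of the kind of check that would catch my radiator problem:

from collections import Counter

# A hypothetical miniature "textbook": labels plus one background detail per photo.
dataset = [
    {"label": "tesla",  "radiator_in_background": True},
    {"label": "tesla",  "radiator_in_background": True},
    {"label": "huxley", "radiator_in_background": False},
    {"label": "huxley", "radiator_in_background": False},
]

# If a background detail predicts the label perfectly, you may have found your radiator.
counts = Counter((ex["label"], ex["radiator_in_background"]) for ex in dataset)
print(counts)  # every Tesla has a radiator, no Huxley does: a confound worth worrying about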
And unlike humans, these AI systems are so innocent. They don't get other inputs from elsewhere; they just learn the patterns that are in there, that's it. You give them bad patterns, you get bad results. Let's remind ourselves what this bias thing is. Bias has a lot of definitions. There's statistical bias, which is actually not the bias we're talking about here; statistical bias is results that are systematically off the mark. Algorithmic bias, the stuff we mean when we talk about bias and fairness in AI, occurs when a computer system reflects the implicit values of the humans who created it. If that's the definition (and that one's from Wikipedia), is it possible for a system not to be biased? Is there anyone whose past experience doesn't affect their implicit values in some way, and whose implicit values don't then drive what they do next? Have we as humanity stopped growing? Do we believe that what looks fair and good to us today will still look fair and good to people in 500 years' time? Maybe they'll think we were a bunch of barbarians.

Talking about the difficulties of creating unbiased systems, the semantic impossibility of it, does not give anybody a license to be a jerk. We should strive to do our best and create systems that reflect the best of us, and we should worry about who those authors are. But I wonder if it's fair to blame the authors of a textbook for a student learning silly things. Maybe it's a little better to blame the teachers. Because what kind of teacher gives a student a textbook that the teacher hasn't bothered to open, to read, and to think about whether the textbook was even appropriate in the first place? That is a terrible teacher. Of course textbooks have human authors; the teacher should check that it's an appropriate textbook, and think about whether a student learning from this textbook is going to turn into a student the world doesn't want. That's the teacher's responsibility. So open your textbooks, my friends. It's very easy with AI to say: my data lives in that data warehouse, in that database, in that spreadsheet, without ever opening the thing, without putting your eyes on it. Then, when it turns out to be full of garbage or bad patterns or prejudice, it is your fault as a teacher when your student learns those same things. Quit blaming AI for AI bias. Blame people. People make data. If you say bias comes from data, it comes from people.

And in fact, I'm going to attempt the Hemingway. Ernest Hemingway had a lovely short story in six words, so here is the whole problem of AI bias, and the solution to it, in six words: AI bias: inappropriate examples, never examined. That's where the problem comes from, and you solve it by making sure you examine them. And let me remind you: even this was a gentle example of bias. I was the person who picked these examples. I was the one who said this style of photograph was appropriate to the task. I might have created the data set because, for me, in my world, Tesla is always by the radiator. You should open it up, have a look, and have a think about what my incentives and biases might have been in creating it, and what might be inappropriate if you borrowed my data and used it for your own tasks.

Of course, here's going to be a little problem: what if the author of the textbook and the teacher are the same person? Those prejudices aren't going to get caught. I made it, so it seemed appropriate; I opened it up, still appropriate; off we go. I'm not going to find that there was anything bad until the student goes out into the world and starts doing bad things. Is there a way to prevent this kind of thing? Yes, there is: more authors. And don't just have one person checking the system; get a diversity of perspectives. The more eyeballs you get on that textbook, the more likely it is that someone's going to say: what on earth is that? This isn't good for this kind of task; it's going to negatively affect this whole group of users. Diversity is important. In fact, when it comes to machine learning and AI, diversity is not a nice-to-have, it's a must-have. Without it, if you make large-scale systems that affect the planet, you're going to get terrible results. You need different sorts of people bringing their diverse skills and perspectives to thinking about what exactly we are doing when we say: optimize this objective on that data set (or textbook), go.

And if you keep all these nuggets in mind, that is how you build a safer and brighter future. And I want you to see how human this all is. This stuff is fundamentally about the human side of AI, and when we talk about it as autonomous, as robots and all that, that is the part we're forgetting, and that's the part we cannot forget. So I hope that as you go forward, and as you build and consume these systems, you stay really aware of that human side. That's how we build a safer, brighter AI future together. Thank you very much.