Please welcome him.

Hello, everybody. I'd like to try and answer a question today: what is this machine learning thing, anyway? Because it seems like wherever you look, especially in the Python world, you're hearing about machine learning. It's winning at board games. It's driving cars. It's telling you whether a transaction on your website is likely to be fraudulent. But what is it?

A lot of the introductions to this subject take one of two routes. The first route looks a bit like this. Reading code examples without context can be helpful if you're trying to pick up a new library for a familiar task. But if you're trying to get into a whole new area of development, and you don't know what's going on behind the scenes, then this can be difficult to grok. So let's look at some stuff behind the scenes. That tends to look more like this. Again, in the right context, that's fine. If you're from a math background, this might be a language you speak, and you might be able to understand it. But if you're not from a math background, then this is just hiding some fairly simple ideas behind some unfamiliar and relatively complex notation.

So we're going to take the middle ground here. We won't really get into implementation; we'll cover the ideas behind the implementation, but without all of the math. The goal is that you'll be comfortable enough with the landmarks of the machine learning landscape that you can go away, explore further, and raise billions of dollars of VC money.

The first question we need to answer is: what can machine learning do for us? If we're going to understand what this thing is and how it works, we should understand what its goals are, what it's aiming for. Here's a programming problem I tried to solve recently. I needed to read some recipes from the web, and I was particularly interested in understanding the ingredients. I wanted to know three things about each ingredient: the quantity, the unit of measure used for that quantity, and the name of the ingredient. In the first two recipes I looked at, all of the ingredients were structured exactly like this: a number, then a word describing the unit of measure, and then the name of the ingredient. And I thought, this is a string processing problem. I'll just split on whitespace. It'll be fine. Then I looked at a third recipe and thought, oh, maybe I need a regular expression. And then I looked at another recipe and thought, maybe I need two regular expressions. I ended up with a list of things like this, and I had to throw up my hands and admit: I don't know what the rules are to parse these things.

Think about what you do when you're programming. You're writing a very specific list of instructions that the computer is going to follow. We use all these nice abstractions, like objects and functions, to make it seem like it's all high-level ideas, but really it's a linear list of things the computer is going to do, one after another. You're making decisions and writing them down, and the computer is just playing those decisions back. That's great if you know what you're doing. But with this list of ingredients, I didn't know what I was doing. I knew there was some relationship between this text and the information I wanted, but I couldn't describe that relationship in enough detail to write software to extract that information for me.
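[To make that failure concrete, here's a minimal sketch of the naive whitespace-splitting approach. The ingredient strings are invented examples for illustration, not the ones from the slides.]

```python
def parse_ingredient(text):
    # Naive assumption: every ingredient is "<quantity> <unit> <name>".
    quantity, unit, name = text.split(maxsplit=2)
    return {"quantity": float(quantity), "unit": unit, "name": name}

# Fine for the first couple of recipes:
parse_ingredient("2 cups flour")
# => {'quantity': 2.0, 'unit': 'cups', 'name': 'flour'}

# But real recipes break the assumption almost immediately:
parse_ingredient("1 1/2 tablespoons olive oil")  # silently treats "1/2" as the unit
parse_ingredient("Salt and pepper to taste")     # ValueError: "Salt" isn't a number
```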
So I was a bit stuck. What I needed was generalization. Generalization is the killer feature of machine learning. This is why it's useful and why it's interesting. It's the ability to take input that we didn't explicitly consider when we were designing our system and do something useful with it. There were too many different ways of writing a description of an ingredient for me to consider all of them, so I needed this property of generalization. I needed the program to be able to handle an ingredient description that I hadn't even considered when I was writing the program, and do the right thing.

That sounds really hard. I've just said that when we're writing programs, we're writing a very explicit list of instructions for a computer to follow. And now I'm saying that we want that computer, while following those explicit instructions, to come up with seemingly new ideas for itself, to generalize to things that we didn't personally consider while developing the software. That seems almost contradictory. But fortunately, you've probably built generalizing systems before. I'd be willing to bet that the majority of people in this room have built at least one. Maybe not with software, maybe not as a programmer, but at some point in your life. Remember that point in high school science class where your teacher said, "Pay attention, this is going to be useful later on"? Sorry, but they weren't lying. It was actually true. High school science experiments bear a striking similarity to machine learning systems. So we're going to do a high school science experiment together now, and then we're going to talk about how that relates to building a machine learning system, and hopefully make this whole machine learning thing seem a lot less magical.

In order to do this experiment, I've brought with me a piece of finely tuned, highly calibrated science equipment. I'm not a physicist, so I might be getting the terminology wrong, but I'm pretty sure this is what scientists call a tennis ball. One of the properties of this tennis ball is that if I drop it, it's going to bounce. What we'd like to do in this experiment is understand the relationship between the height of the drop and the height of the bounce. If I drop it from low down, it's a small bounce. If I drop it from high up, it's a large bounce. We'd like to know what that relationship is, so that if I'm going to drop it from here, we can give a reasonable guess of how high it's going to bounce when it comes back up.

So what's the first thing we do in a high school science experiment? It's empirical observation. We want to collect data. In this case, that involves just dropping a tennis ball a whole bunch of times, measuring the height we drop it from, measuring how high it bounces, and building a table of data. I'm going to put this down before it rolls off the stage. Once we've collected our data, we can take a look at it. Here's a scatterplot of tennis ball drop height versus bounce height. Along the horizontal axis, we have the height of the drop, and on the vertical axis, we have the height of the bounce. It seems like these points are arranged in roughly a straight line, which is nice. I was always taught in my high school science class that when things were arranged in a straight line, we should draw a line on the chart. What they didn't tell me was that what we were really doing there is building a mathematical model. That seems like pretty fancy language for drawing a line on a chart.
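[In code, that data-collection step might look something like this. The measurements below are invented for illustration, not the numbers from the chart in the talk.]

```python
import matplotlib.pyplot as plt

# Hypothetical measurements: (drop height, bounce height) pairs, in meters.
observations = [
    (0.5, 0.37), (1.0, 0.74), (1.5, 1.15),
    (2.0, 1.51), (2.5, 1.88), (3.0, 2.25),
]

drop_heights = [drop for drop, bounce in observations]
bounce_heights = [bounce for drop, bounce in observations]

plt.scatter(drop_heights, bounce_heights)
plt.xlabel("Drop height (m)")
plt.ylabel("Bounce height (m)")
plt.show()
```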
But let's draw the line, and let's explore it a little bit and talk about why I think this is a mathematical model. So here's a line. It's not a very good line; it's not very close to the data. But I can move it around. I can make it a little closer. What's this line doing for us? It kind of emphasizes that the points we observed are pretty much in a straight line, so it's easier to spot that that's happening, I guess. But the real value of this line is that it fills in the gaps in our data. We dropped the ball from 1 meter and 2 meters and 3 meters. We didn't drop it from 1 and a half. But the line passes through 1 and a half. The line has generalized to drop heights that we didn't consider when we were designing this experiment. We built a generalizing system, and it was maybe slightly easier than you were expecting when I showed you that list of ingredients.

So what is a line, when you want to think about it mathematically? Well, you need to know two things to draw a line. The first is a fixed point that the line passes through, and traditionally we use the point where it crosses the vertical axis. In terms of our experiment, that's the height the ball would bounce to if dropped from a height of 0 meters. If we move that around, we change the line; we're drawing different lines. But I suspect that the real value is going to be around 0. We can test that using science. If I drop the ball from 0 meters... yep, the bounce height was 0. Good to know. The other number we need to know to draw a line is its angle, what's called the gradient: the steepness of the slope. As I change that, we can draw a whole bunch of different lines. But in order for this to be a model of our data, we want a line that is close to the observations we made. What we want is for our model to agree with our observations. And if that's happening, if we've found a line that agrees with our observations and we know those two numbers, where it crosses the vertical axis and the angle, we can now start to make predictions.

So here's some Python code. At last, you're thinking; I thought this talk was going to be all high school science and no Python. Here's some Python code that can make predictions of bounce heights based on drop heights. We start with that intercept value, which is what the bounce height would be if the drop height was zero. Then we add the drop height multiplied by some number, the gradient, and that predicts our bounce height.

But I've kind of cheated here. I've said we've built a generalizing system, but I still used human judgment. I did it by eye. I just looked at the chart and went, oh yeah, that line seems fine; we'll use that one. How do we know that our observations fit with the model? I said they agreed because it looked like they did, which is fine if we're drawing a line on a chart by hand, but isn't really that helpful if we're trying to automate this and build it into a computer system. We can measure how well the model fits the observations. In this version of the chart, I've added these red lines, which measure the distance between what the model is predicting and the observation that we made. This line is pretty bad; those red bars are really big. And you see this error number at the bottom of the slide that's changing as I move the line around. What I've done there is taken the heights of those error bars, squared them to make sure they're all positive numbers, and then taken the average of them.
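[The prediction code described here amounts to a one-line function, and the squared-and-averaged error measure is what's usually called the mean squared error. Here's a sketch of both; the parameter values are illustrative, not the ones from the slides.]

```python
# Illustrative parameter values for the line, chosen by eye.
intercept = 0.0  # bounce height for a drop height of zero
gradient = 0.75  # extra bounce height per meter of drop height

def predict_bounce_height(drop_height):
    return intercept + gradient * drop_height

def mean_squared_error(intercept, gradient, observations):
    # The heights of the red error bars, squared and averaged.
    squared_errors = [
        ((intercept + gradient * drop) - bounce) ** 2
        for drop, bounce in observations
    ]
    return sum(squared_errors) / len(squared_errors)

predict_bounce_height(1.5)  # => 1.125 meters
```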
So now that we have a measure of how closely our model fits our observations, we can make sure we've got those two parameters right: that the intercept parameter and the gradient parameter are the ones that fit our data well. Doing it by eye, a lot of these lines look similar to me, but the error numbers tell me which one is best as I change the gradient.

When we talk about learning in machine learning, that often refers to learning the parameters of a model. The parameters of our line are the gradient and the intercept. The learning process involves starting somewhere random and taking small steps. We might say, what happens if we go this way? Oh, the error number gets larger, so we should go the other way. And we keep taking small steps until the error number starts going up again, at which point we realize we've gone too far, and we go back. This is one of the central ideas of machine learning: you can train a mathematical model by finding values for the parameters of that model so that the model agrees with the observations we've made in the real world.

So we've built a model, and we've checked that it fits quite closely with the observations we made. But I made another claim: I claimed that this model generalizes. How do we know? How do we know for sure that it generalizes? To figure that out, we need to go back to our scientific equipment and back to empirical observation. Our model was saying that if I drop this tennis ball from about a meter, which is probably about here-ish, it's going to bounce to about three quarters of a meter, probably about there. The model seems about right. If this were a real-world situation, we would test more carefully than that, and with many more examples than just the one. But we can test whether the model generalizes by making more observations, observations we didn't use to build the model in the first place, and finding out if the model really can produce good predictions.

So, high school science or machine learning: which was this? Well, when we're building a machine learning system, we need to collect examples. When I was building that recipe system, I needed to collect examples of ingredients. When we were building a model for the bounce height of a ball, we had to collect examples of how high a ball bounces. We have to choose a model. In the case of the bouncing tennis ball, choosing a model is quite easy: you have one input number and one output number, you can put them on a scatterplot, and you can see a straight line and go, oh, it's a straight line. That's great. I'll draw a straight line; that should be my model. In the real world, it can be somewhat harder. You're often dealing with many inputs, maybe even many outputs. So what can you do? With the recipe model, I did two things. Firstly, I leaned on other people's experience: there were people who had developed mathematical models, and software libraries for those mathematical models, which I could use. And secondly, you can compare different models. You can try a few different things, compare their performance, and see which one works for you. Then we train the model. We find the parameters that make the model's predictions fit with the observations we've made of the world. In the case of a straight line, again, there are two parameters: the slope and the intercept.
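[Here's a minimal sketch of that start-somewhere-and-take-small-steps training loop, reusing the observations list and mean_squared_error function from the earlier sketches. Real libraries use much smarter optimizers than this crude search, gradient descent and friends, but the idea is the same.]

```python
def train(observations, step=0.001, max_iterations=100_000):
    intercept, gradient = 0.0, 0.0  # start somewhere arbitrary
    for _ in range(max_iterations):
        error = mean_squared_error(intercept, gradient, observations)
        # Try a small step in each direction for each parameter.
        candidates = [
            (intercept + step, gradient), (intercept - step, gradient),
            (intercept, gradient + step), (intercept, gradient - step),
        ]
        improvements = [
            c for c in candidates
            if mean_squared_error(*c, observations) < error
        ]
        if not improvements:
            break  # no step makes the error smaller: we've converged
        # Keep the step that reduces the error the most.
        intercept, gradient = min(
            improvements,
            key=lambda c: mean_squared_error(*c, observations),
        )
    return intercept, gradient

intercept, gradient = train(observations)
# With the invented data above, this lands close to
# intercept ≈ 0.0 and gradient ≈ 0.75.
```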
In a modern neural network model, there might be many hundreds, maybe even many thousands, of parameters that the training algorithm has to find. But the process is very similar. The process of finding the parameters which minimize the difference between what our model predicts and what we've observed in the world is very similar even on a large-scale, complex model, even if you were training something to recognize faces.

It's important to test the model. We don't want to deploy something to production expecting it to generalize if we don't know that it's actually going to generalize. Normally, when you're building a machine learning system, in the initial data collection phase you will separate out some of your data. You'll put it off to one side and save it just for testing later. It's important that it isn't the data you used to build the model, because in that case you're just checking, again, that your model is close to your observations. You need to check that it generalizes using data that wasn't used to build it.

And finally, we can make predictions. We can predict how high a tennis ball is going to bounce. I don't know why you'd want to do that, but you now can. Or we could figure out which bits of a sentence map to the quantity, unit, and name of the ingredients in a recipe. I keep using this word "predictions", which seems to make more sense in the context of bouncing the tennis ball and maybe less sense in the context of picking the bits of a sentence that map to parts of an ingredient. I like the word prediction because it emphasizes that this is an educated guess coming from a model. You want to build a model that can guess well enough for the thing you're trying to do, but it isn't necessarily going to be 100% accurate all of the time.

Hopefully, this seems less scary than the first two slides I showed you: the mathematical notation, or the incomprehensible Python code with all the magic. And if I've encouraged you to explore machine learning further, then what's next? Where should you go? Well, there are two directions I could recommend. One, if you want to get more into implementation, there is a website called fast.ai. They have a large online course which is designed for developers. There isn't a lot of mathematical prerequisite; what they require is that you know some Python. The other thing I could recommend, if you want to get more into the mathematical side, more into the models and understanding what's going on behind them, is this book. Even if you don't have a whole ton of math, every time there's an equation in this book they give a worked example from real data, which helps you check your understanding of the equations.

And that's it. If you have any questions, you can contact me online. I'm georgebrock everywhere, from Twitter to GitHub to the IRC channel for this conference. You can find the slides at that link. My email address is george@thoughtbot.com. I should mention that thoughtbot is a consultancy, so if you want to talk to me about that, you can. And also, I have awesome robot stickers. Thanks very much.

Thank you, George.