Today I want to talk about machine learning, a very cool topic. It's getting used more and more these days. Part of the reason machine learning seems weird is that we have this idea of machines as rigid, unadaptable things, just blindly doing whatever they do. But the whole idea of learning is that you take in some data, some observations about the world, and try to find patterns in them that you can then exploit to do better next time. So it's the opposite of blindly doing the same thing.

There are a lot of different ways we can look at learning problems. Here are four of them, and you already know all of them. Supervised learning: you get some inputs, and a teacher, a supervisor, tells you what they are. Unsupervised learning: there's no teacher. You're just seeing stuff and saying, "Oh, this thing and this thing are pretty similar, I'm going to put them together. Oh, this other thing is very different, I'm going to put it in a separate category." Reinforcement learning: you're actually out doing things, not just looking but acting in the world. "I wonder what happens if I do this. Ow, I got a shock" (negative reinforcement), "or hey, that was tasty" (positive reinforcement), and you're trying to learn to do the things that get positive reinforcement. Temporal reinforcement learning is the same as reinforcement learning except there's a big time delay. It's like, "Oh, I shouldn't have eaten that mushroom yesterday." Games, like chess and checkers, are typical examples of temporal reinforcement learning tasks.

Given these problem formulations, there are lots of different learning algorithms: nearest neighbor, K-means clustering, and others. K-means clustering is an unsupervised learning algorithm; the other ones are mostly supervised. This is a current area of research, and there are lots of different algorithms here.
This is definitely not a solved problem. And there's a sense in which the whole idea of learning is not really a solvable problem: it depends on making assumptions about the world, that some pattern that used to hold will continue to hold. But regardless of which learning algorithm or which problem formulation you adopt, there are some things that come up over and over again. I want to talk about two of them here and try to do demos.

The first one is this idea of exploration versus exploitation. Trying new stuff is exploration; cashing in on the best thing you know so far is exploitation. There's an inherent trade-off between them.

All right, let's try an example. This is the chili monster, the alien robot chili monster. I think maybe it's a zombie too. The way it works is, you poke this guy in the eye and he gives you a cookie. So there: plus one, that's a good cookie, that's a payoff. And the question is, should we poke him in the red eye again, or should we go for green? I'm going to go for green. Green: okay, that was bad, minus one.

So here we are. We're now the learning creature. Do we want to just commit to red and do that all the time, or do we want to try green some more and see if it catches up? Instead of making up explanations immediately, we should lay back and collect data. We need to keep count of how many things went well and how many things went badly. One of the mistakes we can make as people is to only count one side. So we've got a little collector cup here that we can use; the way it works, we just put it here and it vacuums up whatever comes out. So far from the red side we've got four good ones and two bad ones, and the green side is paying off at three and two. They're both pretty good. Okay, so which one should we pick? Well, we've got this guy.
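A minimal sketch of the bookkeeping just described (the function and variable names are mine, not from the demo): tally wins and losses per eye, and look at the purely exploitative choice, the eye with the best empirical payoff rate so far.

```python
def empirical_rate(wins, losses):
    """Fraction of pokes that paid off; 0.5 when there is no data yet."""
    total = wins + losses
    return wins / total if total else 0.5

# Counts from the collector cups so far: (wins, losses) per eye.
counts = {"red": (4, 2), "green": (3, 2)}

rates = {eye: empirical_rate(w, l) for eye, (w, l) in counts.items()}
best = max(rates, key=rates.get)  # pure exploitation: take the best rate
print(rates)  # red is at 4/6, green at 3/5
print(best)
```

Always poking `best` would be pure exploitation. With this few samples, green could easily turn out to be the better eye, which is exactly why the creature has to keep exploring.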
We've got a learning creature that is connected to our cups, so he's aware of the data that he gets here. He doesn't have a mouse, so he throws water balloons, like that. So he threw one at green and got a negative payoff. And if he goes again... well, he got a negative payoff on that side too. So the learning algorithm for this guy has to decide which way he's going to go. Let's just let him run for a while.

So how does this guy work inside? It's pretty cool, actually. It's very simple, and it's an algorithm that is being used out in the world; we'll talk about that in a sec.

All right, so here's this graph. I'm not going to explain it in complete detail because it's a little bit involved. It's called a beta distribution, and the way it works is you feed in two numbers, the number of wins and the number of losses (I'm glossing over some details here), and it gives you back this curve. The green curve corresponds to the fact that we've got 17 wins and 13 losses, which is actually a little better than the red curve at 16 wins and 15 losses. But both curves are very broad, meaning we're not really sure how much either choice is worth at all. Here's a little demo: with no data at all the curve is completely flat, but as we try it and start to get wins, the curve gets sharper and sharper toward the wins.

This is better than just keeping the average number of times it won, because if we had three wins and three losses, that's 50-50, but if we had 300 wins and 300 losses, that's also 50-50, and we know a lot more about it when we have 300 wins and losses than when we have three. So you see, as we go from zero-zero it's completely flat; as we increase the wins and losses evenly, it stays at 50-50, but the distribution gets narrower and narrower, because we've got more and more data saying it's really likely to be 50-50.

So what this guy does is he draws a random number weighted by the green curve, and then
draws a random number weighted by the red curve, and whichever one comes out higher, he picks that one. At the moment the red and green curves overlap quite a bit, but as we get more data for these things, the distributions get narrower, and if they overlap less, it's going to pick one over the other pretty consistently.

Okay, I programmed this so that one of red and green pays off 48 percent of the time and the other pays off 52 percent of the time, but even I don't know which is which. This kind of problem, where you've got to do something out in the world and you have two goals (one is to get knowledge about what's happening in the world, that's why you explore, and the other is to actually make something good for yourself, that's the exploit), is a fundamental problem for learning algorithms. And this kind of learning problem is really not called the alien-robot-chili-monster problem; it's more often called the two-armed bandit problem, or the multi-armed bandit problem, where each action you've got is like a different slot machine and you're trying to figure out which is the best slot machine, if they're any different.

This problem comes up all over the internet. It comes up in advertising, where the actions are which ads to put on a web page, say, and the chili monster is you: the system is trying to get you to click on one of those ads. That's a plus one, and if you don't click on an ad, that's a minus one for all of them. The same thing happens on a video website: when you get to the end of a video, you get a little grid of other ones to watch, and those choices of which ones to put up there are based on this kind of algorithm.
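A sketch of the balloon-thrower's algorithm, which is usually called Thompson sampling (the function names and the particular payoff numbers here are my choices, not from the demo): keep a Beta(wins + 1, losses + 1) curve per eye, draw one random sample from each curve, and poke whichever eye's sample comes out highest. The same counts also show why 3-and-3 is different from 300-and-300: same mean, much smaller variance.

```python
import random

def beta_stats(wins, losses):
    """Mean and variance of the Beta(wins + 1, losses + 1) curve."""
    a, b = wins + 1, losses + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def thompson(payoff_probs, rounds, seed=0):
    """Play `rounds` pokes against eyes with the given (hidden) payoff
    probabilities; return how many times each eye was picked."""
    rng = random.Random(seed)
    n = len(payoff_probs)
    wins, losses, pulls = [0] * n, [0] * n, [0] * n
    for _ in range(rounds):
        # One draw per eye from its current beta curve...
        samples = [rng.betavariate(w + 1, l + 1)
                   for w, l in zip(wins, losses)]
        # ...and poke the eye whose draw came out highest.
        arm = samples.index(max(samples))
        pulls[arm] += 1
        if rng.random() < payoff_probs[arm]:
            wins[arm] += 1    # good cookie: plus one
        else:
            losses[arm] += 1  # bad cookie: minus one
    return pulls

# 3-3 and 300-300 both average 50-50, but the second curve is far narrower.
print(beta_stats(3, 3), beta_stats(300, 300))

# Red vs. green at 48% and 52%, as in the demo.
print(thompson([0.48, 0.52], rounds=5000))
```

As the counts grow, the curves narrow, the samples from the better eye come out on top more and more often, and the algorithm smoothly shifts from exploring to exploiting.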
It's trying to pick ones that, on the one hand, you'll click on and watch, but on the other hand, it's also trying to get knowledge about whether this particular video is one that people like.

Okay, so if the video site is showing me a set of videos I might want to watch, I'm going to make one choice and then I'm gone. It's not like I'm the chili monster sitting there getting treated over and over and over again. So really I'm just a teeny little piece of the chili monster, and all the other people who are looking at videos and getting presented with choice screens and so forth are all more parts of the chili monster. Which raises a question: I don't like the same videos that somebody else does, and if they show me something that someone else likes, it's not necessarily going to work that well for me. They might be able to pick the videos that everybody likes, but you could do better if you could recognize that I am similar to other, you know, geeky computer people, whatever I am, and somebody else is similar to some other group.

So what we can try to do is a categorization task: based on some knowledge we have, we want to make a distinction between one kind of thing and another. This ultimately leads into the question of knowledge representation: how do we divide up the things we see in the world into categories so that we can make predictions about them? Things in this category are going to tend to like these videos; things in that category are going to tend to like those videos.

Okay, so let's do an example of that. This is one of the earliest learning algorithms for making distinctions, and it's supervised learning now: we're going to get input, the teacher's going to tell us "this is this, this is that," and we have to learn it. In this case we're going to learn pictures. I went out and found on the web a bunch of news photographs that were used for face recognition research.
I took a dozen pictures of former President Bush, and then I looked in my camera and got a dozen pictures of cats. So we're going to try to learn to tell former president from cat.

The way it works is this: up here we've got our input, 250 by 250 individual pixels that make up a black-and-white picture. Each pixel can be zero, which is black, one, which is white, or anywhere in between, which is gray. Down here at the bottom we have another whole set of 250 by 250 (62,500) numbers, weights, that represent what we've learned.

If we start this thing up, the first thing we do is clear out the weights. This level of gray is a weight value of zero; if a weight goes negative it's drawn darker than that, and if it goes positive it's drawn brighter. The way this machine works is we put in a picture. All right, there's a picture of former President Bush. To evaluate it, we take each pixel in turn, all 250 by 250 of them, multiply it by the corresponding weight, and add them all up. If the sum is greater than zero, we say the category is "president," and if it's less than or equal to zero, we say the category is "cat."

Now, at the moment all the weights are zero, so no matter what the picture is, we're going to get something times zero plus something times zero plus something times zero, and the whole result is going to be zero. So the first prediction: the sum is less than or equal to zero (in this case, exactly zero), so the perceptron says "cat," and the supervisor says "wrong."

Now the perceptron learning algorithm says, oh, okay, you're saying this thing should have a sum greater than zero. So it goes through each of its weights: for all the inputs that are high, it makes the weight more positive, and for all the inputs that are low, it makes the weight more negative. As a result, the whole sum goes up. So now
we're modifying our brain to make this picture produce a more positive sum. It actually makes a kind of shadow of the input, because the brighter spots get more positive weights and the darker spots get more negative weights.

All right, now here's a cat. But since we just changed the weights to make the sum come out positive, it comes out greater than zero, so again it makes the wrong guess. Now we do the same learning, except now we want to make the sum smaller, less than or equal to zero. So it's like we memorize a negative of the picture in order to cancel it out, like that. So now we've got half of one picture in positive and half of the other picture in negative, and so on.

So we can go through: we've got a dozen of one category and a dozen of the other, and we go through them in a random order. Whenever we get the answer right ("okay, cat"), we don't change the weights; we don't have to learn, because we didn't make a mistake. But when we do make a mistake, we update the weights again. We go all the way through all 24 cases and see how well we did. If we got them all right, we're done; otherwise we shuffle them again and go back to the beginning.

So now we're at the end of the first pass, and we got half right, the same thing we would have gotten by flipping a coin or always saying "cat." But let's see what happens in the second pass. All right. Yes. Well, no, still making some mistakes. Nope. Yes, yes, yeah, uh-huh. Yes. Yes. It's starting to learn, and before long we categorize all the inputs correctly, and... oh, there, we got it. Okay, so, is that cool?
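A toy version of the learning loop just described, with tiny made-up "images" instead of 250-by-250 photos (the dataset and the function names are mine). Predict "president" (+1) when the weighted sum is greater than zero and "cat" (-1) otherwise; on a mistake, nudge each weight by the label times the corresponding input, which pushes the sum in the right direction.

```python
import random

def predict(w, x):
    """+1 ("president") if the weighted sum is > 0, else -1 ("cat")."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s > 0 else -1

def perceptron_train(samples, labels, epochs=100, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])    # start with all weights cleared to zero
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)         # go through the examples in random order
        mistakes = 0
        for i in order:
            if predict(w, samples[i]) != labels[i]:
                mistakes += 1
                # On a mistake, add label * pixel to every weight:
                # bright pixels pull the sum toward the correct sign,
                # leaving a shadow (or negative) of the picture in w.
                for j, xj in enumerate(samples[i]):
                    w[j] += labels[i] * xj
        if mistakes == 0:          # one clean pass through all cases: done
            break
    return w

# Four tiny 2-pixel "pictures": bright-left is +1, bright-right is -1.
samples = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [1, 1, -1, -1]
w = perceptron_train(samples, labels)
print([predict(w, x) for x in samples])  # matches labels after training
```

Because this little dataset is separable, the perceptron convergence theorem guarantees the loop stops after a bounded number of mistakes, which is exactly the 1950s result mentioned below: if there is any way for it to learn the training set, it will.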
We learned all these things. Now, one of the things that got people excited about the perceptron way back in the 50s, when it was first studied, was a proof that if there was any way for it to learn something, it would learn it. So in this case, it learned it, no problem. But we would like more than having it just memorize a bunch of pictures we showed it. We want it to learn the concept of former president versus cat, so that we could feed in new pictures, maybe pictures the supervisor hadn't even categorized, and get useful answers out of this thing. That's called how well the algorithm generalizes: not just getting the training cases right, not just teaching to the test, but being able to apply the knowledge to other cases.

So here's a different picture of a cat, one the perceptron never saw. What do you think, is it going to get it right? No. Another picture of a cat: got it right. 50-50. Another picture of the former president it never saw: no, sorry. Not so good.

Okay, the perceptron as we've used it here is not very good at generalization, because really, if we look at these weights, it's learning extremely specific little details about the particular pictures we happened to show it, and there's no reason to expect that to apply to other pictures. Even if we took one of the same pictures we trained on and just moved it a little bit, it would line up with different weights and probably get a different answer. So the knowledge representation problem is: how do you actually build algorithms that are like the perceptron?
They make categorizations, but better ones, categorizations that are more like the way we do it. One approach, one of the algorithms mentioned at the beginning, is deep belief networks: they've got a whole bunch of perceptrons, but instead of going straight from the input to the output, they've got layers of them, trying to get at more and more general, abstract things. A lot of this stuff is working pretty well now.

As we go forward, one of the things driving machine learning getting used now is the combination of two things: number one, computers are much more powerful than they used to be, and number two, there are tons and tons of data to learn from. There are lots of pictures, there are people typing stuff, there are people speaking. If we had many, many more pictures to make our distinction with, we'd have a better chance that our learning algorithm would sort out what's really important and what isn't, assuming our knowledge representation and the way we built the learning algorithm are even capable of representing the concepts we want to learn. The explore-exploit trade-off does not go away. And finally, knowledge representation goes to the core of what learning needs to produce in order to be successful, and even of how we see things, how we understand the world as a result of our experiences.

And you know, one thing the stupid little perceptron is really bad at is recognizing what it doesn't know. For example, if we give it something that's not a cat or a president, like this picture of my dad at that sky city years ago, what does it say? It says my dad's a cat. And here's a picture of a dentist's light that looks like a robot, so I took a picture of it. There you go.
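The shifted-picture brittleness mentioned above can be seen in a tiny self-contained sketch (the one-dimensional "picture" is made up, and dark pixels are written as -1 rather than 0 here so that misalignment actively hurts the score): a weight vector that has memorized one picture scores it perfectly, but the very same picture moved over by one pixel lines up with different weights and scores much lower.

```python
def weighted_sum(w, x):
    """The perceptron's evaluation: pixel times weight, summed up."""
    return sum(wi * xi for wi, xi in zip(w, x))

# A 1-D "picture": a bright bar (+1) in a dark (-1) field.
picture = [-1, -1, 1, 1, 1, -1, -1, -1]

# Weights that memorized this exact picture: a shadow of it,
# positive where it was bright, negative where it was dark.
weights = list(picture)

# The same picture moved one pixel over (wrapping around the edge).
shifted = picture[1:] + picture[:1]

print(weighted_sum(weights, picture))  # 8: every pixel lines up
print(weighted_sum(weights, shifted))  # 4: misalignment drops the score
```

Nothing about the weights captures "a bright bar, wherever it is"; they only capture "a bright bar at exactly these pixels," which is the knowledge representation problem in miniature.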