Okay, let's start on time. My name is Melvin. I'll tell you a little of the story of how we create intelligent programs that can play certain games better than people. It's one of the early successes of the nascent field of artificial intelligence. We have these benchmarks, which are just simple games, and we get computers to play them. Today we have progressed far beyond that, and we're still making progress on this particular problem of playing games, just more and more complex games. Okay, I'll skip this. One of the early, fairly famous, well-publicized events was Deep Blue versus Kasparov. You may not have seen it before; I was quite surprised, this is what Deep Blue looks like. It's actually not a very big machine. It was built by IBM in the 1990s. Then there were the famous matches, two of them: Deep Blue lost the first match, in 1996, then there was a rematch in 1997, which Deep Blue won. The story goes that Kasparov asked for a third match, but IBM said no, we're done, time to retire Deep Blue, because we won; keep the record, you know? These days you can play chess on your phone at the grandmaster level; a lot of the technology that Deep Blue pioneered has filtered down into apps you can run on your iPhone or Android phone. Okay, I'm hoping to explain some of the techniques that were used, just to give a little more depth to the story, and we'll see how that goes. Anyway, the central concept is something called a game tree. You can think of it this way. What you see here is tic-tac-toe, because it's a game everybody understands. And how do you build this? Well, it's a kind of inverted tree.
The top, the root, is the beginning of the game. And it reflects how we actually play, too: you consider, what would your opponent do? What is his first move? Say the cross player goes first; there are essentially three different ways to play the first move in tic-tac-toe. After that come the subsequent moves by the circle player, and so on. If you map out all the different ways, you get this structure, which we call the game tree. The beginning is where you are, and going down is moving into the future, looking at the turns coming up. Okay. Now imagine, for another very simple game, that we expand this tree all the way to where the game is over, maybe three or four more turns. We call this last row of positions the terminal positions, because the game is finished there. And when the game is over, we can assign a score. Say I'm the first player, in the standard terminology called the max player; say that's the player playing the cross. If it's a win for the cross player, we put a one, and if it's a loss for that player, we put a zero. That's how we can label the bottom, because by definition the game is over there. Now suppose I'm the player playing the circle, which we call the min player. What would I want to do? I want to avoid all the ones; I prefer the values that are zero, since I'm the circle. And if you see what we can do, you can actually propagate the values backwards. At the second level, I am the player playing the circle.
I prefer zero: zero means the cross player has lost and the circle player has won. Let's assume there's no draw, just to keep things easy; either the cross player wins or the circle player wins. So, to recall: zero means the cross player has lost and the circle player has won. If I'm the circle player at the second level, I prefer to go to the zero, so the value at my node is the smaller of the two, because smaller means good for me. That's also why it's called min: min means smaller is better for me. Then on this other side, I have no choice, because both moves lead to a loss, so I have to put a one. At the very top is the max player, who wants the bigger number, because bigger means better for him. So they alternate: one player wants big numbers, the other wants small numbers. It's okay if you don't fully understand this; it's a technical detail if you're interested. The concept is that if we knew all the values at the bottom, we could work backwards and figure out what I should do, the best move I can make. That's the overall concept; exactly how it works is not so important. This is what I call optimal play, because if we knew all the answers in the future, we could make the best possible move. The best possible move always exists; we may not be able to find it in practice, but conceptually this is how you would do it. The problem in practice is that there are so many future moves, millions upon millions, that you cannot practically compute the best possible move; for chess the number of positions is something like 10 to the 46, which is very, very large. So what do we do? There's a kind of trick that we do.
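The backward propagation of 0/1 values described above can be sketched in a few lines of Python. The tree here is a made-up toy example (nested lists for internal nodes, terminal scores at the leaves), not the talk's actual diagram:

```python
# A toy game tree: internal nodes are lists of children, leaves are
# terminal scores (1 = win for the max player, 0 = win for the min player).
def minimax(node, is_max):
    """Propagate terminal values back up the tree."""
    if not isinstance(node, list):      # terminal position: score is known
        return node
    values = [minimax(child, not is_max) for child in node]
    # The max player picks the largest value, the min player the smallest.
    return max(values) if is_max else min(values)

# Small example: the min player steers towards 0 where it can, but the
# right branch is forced to 1, so the max player at the root gets a win.
tree = [[0, 1], [1, 1]]
print(minimax(tree, is_max=True))  # prints 1
```

The min player's subtree `[0, 1]` evaluates to 0, the forced subtree `[1, 1]` to 1, and the max player at the root takes the better of the two.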
It's called a heuristic. We cannot expand all the way, because that's 10 to the 46 little circles; I can't even draw them on the screen. So I'll only expand maybe three levels: I look ahead a few steps, your turn, my turn, your turn, something like that. But these are not terminal positions, because the game is not over yet; I've only looked ahead three steps. Still, I can somehow give each one a number based on my feeling: a high number if I feel I'm going to win, a low number if I feel it's a bad position for me and I'm going to lose. Notice this is a bit ad hoc, because I haven't said how we compute this number; it's based on intuition. This looks nice, I'm going to win, I give it 0.7. So now I use the range from 0 to 1: before, it was exactly 0 or 1; now, because I don't know for sure, I put some number in between, and 0.9 means I'm very, very confident I'm going to win. Then we can do the same trick again and work backwards: at the second layer we choose the smaller number, because the second player prefers positions with numbers near 0, and we move back up to the top. Overall, that's how you look forward, use the information, and figure out what to do right now, at the top level. Okay, that's called the minimax algorithm, and it's roughly, in a very hand-wavy way, how Deep Blue works. Deep Blue does this, but it tries to be very smart about it, and it has very good hardware, so it can explore many, many of these circles, these nodes. That's Deep Blue in general.
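The depth-limited variant with an ad hoc evaluation can be sketched as follows; `children` and `evaluate` here are hypothetical stand-ins for a real game's move generator and for the intuition-based scoring the talk describes:

```python
def minimax_limited(state, depth, is_max, children, evaluate):
    """Depth-limited minimax: stop after `depth` plies and fall back on a
    heuristic evaluation between 0 and 1 instead of a true game result."""
    moves = children(state)
    if depth == 0 or not moves:          # cutoff or terminal position
        return evaluate(state)           # the ad hoc guess, e.g. 0.7
    values = [minimax_limited(m, depth - 1, not is_max, children, evaluate)
              for m in moves]
    return max(values) if is_max else min(values)

# Hypothetical toy game: states are integers, each move adds 1 or 2 until
# the state reaches 4, and the heuristic simply prefers larger numbers.
children = lambda s: [s + 1, s + 2] if s < 4 else []
evaluate = lambda s: min(s, 10) / 10
print(minimax_limited(0, 3, True, children, evaluate))  # prints 0.4
```

The same backward propagation runs as before; the only change is that leaves carry guessed values rather than known game results.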
And now, of course, there's an open-source version: if you're more technical and want to contribute to the advancement of chess AI, you can check out Stockfish, which I won't discuss further. Okay, so what's the problem with this technique? As I said, the numbers I assign at the bottom, the 0.7, the 0.1, are kind of ad hoc. You have to be able to judge whether you're in a winning or losing position, and it's not absolute, because the game is not over. And for some games, and here is where we come to the game of Go, that judgment is very hard. If you haven't seen Go before, you've probably seen it online; there was the big match this year, in March. In Go, if you look at a board like this, even the expert commentators often don't know who is going to win. They'll say, maybe he's going to win; it's not 100%. A slight change can completely change the whole game, so it's very hard to tell who is ahead and who is behind. That's what I mean by not knowing how to assign the numbers at the bottom: you can't produce that 0.7 accurately. The numbers are very poor, so the result at the top is also poor. That's why the old minimax methods don't really work well for Go, and why Deep Blue can't really play Go: it relies on this way of assigning all the numbers, which doesn't work there. Okay, but there are some programs that can play Go, and a whole community of people who build them. Rémi Coulom is one of them; he has a champion program called Crazy Stone, which you can now download on your phone. So what was the state of the art before AlphaGo?
AlphaGo put Go on the world map, but Go had been something of a holy grail of the AI community for a while. One of the big matches was in 2014, for example; they organize this yearly event. Rémi obviously was not playing himself; his program Crazy Stone was playing, and he was just making the moves for the machine against a professional player. And Crazy Stone actually won, but with something called a four-stone handicap, meaning it got four stones on the board first. Crazy Stone was playing black and started with four stones already placed, which is quite a big advantage; the pro player starts with nothing on the board while black effectively has four free stones. But Crazy Stone won at that point. And I think they asked Rémi, when do you think a program could win without any handicap, without the four stones? For him, even removing one stone and playing with a three-stone handicap was a big jump at that time, which was just two years ago. I'll skip this part because it's a little technical, but let me at least throw the name out there. The real breakthrough, in fact invented by Rémi himself back around 2006 or 2007, is something called Monte Carlo Tree Search, which is what AlphaGo uses, though I won't go into the details. Okay, AlphaGo. This is now the state of the art for playing Go. Why does it work? Hopefully I can give you a few details about how; I still have some time. This is the match; I think it's all on YouTube if you haven't seen it. Really interesting, especially the one game the human won, which was game four; Lee Sedol lost all the rest. But still interesting nonetheless.
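Since Monte Carlo Tree Search is only named in passing above, here is a minimal illustrative UCT version on tic-tac-toe. The four phases (selection, expansion, simulation, backpropagation) are the standard ones, but every name and detail here is this sketch's own; AlphaGo's real search is far more elaborate:

```python
import math, random

def legal_moves(board):
    return [i for i, c in enumerate(board) if c == "."]

def winner(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play(board, move, player):
    return board[:move] + player + board[move + 1:]

class Node:
    def __init__(self, board, player, parent=None):
        self.board, self.player = board, player   # `player` moves next
        self.parent, self.children = parent, {}
        self.visits, self.wins = 0, 0.0           # wins from parent's view
        self.untried = legal_moves(board)

def rollout(board, player):
    """Play random moves to the end; return the winner ('X', 'O' or None)."""
    while winner(board) is None and "." in board:
        board = play(board, random.choice(legal_moves(board)), player)
        player = "O" if player == "X" else "X"
    return winner(board)

def mcts(board, player, iters=3000):
    root = Node(board, player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = max(node.children.values(), key=lambda n:
                       n.wins / n.visits +
                       math.sqrt(2 * math.log(node.visits) / n.visits))
        # 2. Expansion: add one untried move, unless the game is over.
        if node.untried and winner(node.board) is None:
            move = node.untried.pop()
            nxt = "O" if node.player == "X" else "X"
            child = Node(play(node.board, move, node.player), nxt, node)
            node.children[move] = child
            node = child
        # 3. Simulation: random playout from here.
        result = rollout(node.board, node.player)
        # 4. Backpropagation: credit each node from its parent's viewpoint.
        while node is not None:
            node.visits += 1
            if node.parent is not None and result == node.parent.player:
                node.wins += 1
            elif result is None:
                node.wins += 0.5
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

random.seed(0)
# X can win immediately at square 2; the search concentrates visits there.
print(mcts("XX.OO....", "X"))
```

The key design choice, compared with minimax, is that no evaluation function is needed: random playouts stand in for the leaf values, which is exactly what made the method attractive for Go.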
And in fact, they have now published some new data: AlphaGo playing itself. Which they can do; just run AlphaGo on both sides, and one of them has to lose, because both sides are AlphaGo. So they released some new games after this match took place. This is Lee Sedol. He's not actually the reigning Go champion; the current strongest is a player from China. But he's considered to be the strongest player of the recent past; if you look at the last five or ten years, he's right up there. And I forget the name of the person on the left; he's someone from the AlphaGo team, making the moves for AlphaGo, of course. Okay, so how do I explain this? The big difference with AlphaGo is the incorporation of neural networks. That's the buzzword these days, I guess: deep learning. You've heard the buzzword; it's called deep learning. So let me try to explain how they do it; this is the last slide I have. People actually play Go online now; there are websites where you go and find people to play against. So they recorded all the positions from the games of highly ranked players. Those are called the human expert positions; you see them over here. You can also go online later yourself; it's all publicly available, there's no secret here. Okay. From this, you can try to predict: you can build something called a neural network, using deep learning, to predict how human players play. Given a particular board, and the board is rather like a picture, because it's black and white dots; you can sort of imagine it as a picture.
Like a small black-and-white picture, 19 by 19. Then you can do a kind of prediction: given this particular Go board, what is the next move a human player is going to make? Just try to predict. And you can of course measure how well you do: give it a new position and see whether it predicts the move an expert human would make. Those are the two networks at the top here. But then they realized this is still not enough data; you want more. So what you can do, after you manage to build a network that predicts how a human player would play, is use this network to play itself, which is this part here: self-play positions. The network can play itself really quickly, because human beings are slow; even if you download all the online games, it's still a fairly small data set. So once you've built a network that predicts human play quite well, you ask it to play both sides, using GPUs and a cluster and all that to make it fast. You can generate literally millions and millions of games. Human beings couldn't play that many games even throughout the history of Go; I'm not sure humanity has even recorded a million games. But this network can play itself millions of times easily, if you spend the money to run the hardware. So that's this part. From this, and this is actually quite fascinating, they can build something called the value network. This is the way to assign the numbers. I said assigning the numbers, like the 0.7 and 0.9, is very hard; but Google managed to solve that problem, which is quite fascinating.
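The self-play idea can be illustrated without any neural network: generate games by self-play (here, purely random play) and average the outcome seen from each position, with a plain dictionary standing in for the value network. This is a toy analogy on tic-tac-toe, not AlphaGo's actual training procedure:

```python
import random
from collections import defaultdict

def legal_moves(board):
    return [i for i, c in enumerate(board) if c == "."]

def winner(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_values(games=20000, seed=0):
    """Play `games` random self-play games; for every position seen, record
    how often the X player went on to win (draws count half)."""
    rng = random.Random(seed)
    wins, visits = defaultdict(float), defaultdict(int)
    for _ in range(games):
        board, player, seen = "." * 9, "X", []
        while winner(board) is None and "." in board:
            seen.append(board)
            m = rng.choice(legal_moves(board))
            board = board[:m] + player + board[m + 1:]
            player = "O" if player == "X" else "X"
        seen.append(board)
        score = {"X": 1.0, "O": 0.0, None: 0.5}[winner(board)]
        for pos in seen:
            visits[pos] += 1
            wins[pos] += score
    # The "value network" here is just a lookup of the averaged outcomes.
    return lambda pos: wins[pos] / visits[pos] if visits[pos] else 0.5

value = self_play_values()
# X's first-move advantage shows up as a value above 0.5 (around 0.65
# under random play) for the empty board.
print(round(value("." * 9), 2))
```

The real system generalizes to unseen positions via a deep network instead of a lookup table, but the training signal is the same: outcomes of huge numbers of self-play games.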
So they managed to build another network, based on these millions and millions of games, that assigns a number: 1 means black is going to win, 0 means white is going to win. Of course, it doesn't know exactly, so it outputs something like 0.9 or 0.3, because you can't know for sure. But they managed to build a network that, given a particular board, returns a number. It's called the value network; the value is that number, the 0.9 or 0.7. So there are two networks. One is called the policy network, which knows how to make the next move; in fact it suggests a few candidate moves, not just a single one, because sometimes there are a few moves that are all quite good. The other, the value network, predicts the score: given a particular position, is it near 1, meaning black is going to win, or near 0? And as I said, even for human beings this is very hard; people cannot tell who is going to win. But this network apparently can, which is in itself quite an amazing achievement. And they combined these with Monte Carlo Tree Search, which is what Rémi had been using all along. The two networks combined with that search method give this dramatic improvement in performance. But the tree search had been around since about 2006, so the new part is really these two networks they built. Okay, so now the last part: next steps. What is next? Go is done; it's not so interesting for human beings to play against people online anymore, because for all you know they could be consulting AlphaGo. But of course we're not all done. There are many games, games with cards, where you don't see what the opponent has.
Poker, I think, is the classic example, and some of the other card games too: there is secret information, I cannot see your hand and you cannot see mine. And this makes things very hard. Earlier on, in chess and checkers and Go, you can see everything, because it's all laid out in front of you; there are no secrets. Once you have secrets, it's very difficult, because you cannot predict the next move: I don't know what you have in your hand, so how do I know what you would play? It could be anything; there's no way to tell. So how do we solve that? It's still a very hard problem, it's not solved, but I can tell you a very simple trick; all these games are about tricks in some sense. One trick you can imagine, say in poker: you can count cards. You've seen an ace come out, and so on, so you roughly know the opponent can't have certain cards, because those have already appeared; they can't be in his hand. So what you can do is use a random version of his hand. You know it can't contain certain cards; among the cards that are left, say three aces are still unseen, maybe his hand has some of those, and you just generate a random possibility, what I call a random instance. For the things you don't know, the hidden information, you create a random version, do your planning against that random version, and keep repeating with different random versions. In the literature this is called sampling, but it just means that because we don't know what the hidden information is, we pick a random possibility.
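The random-instance trick, usually called determinization or sampling, can be sketched with a made-up one-card game where the highest card wins; all names and rules here are illustrative, not from any real poker solver:

```python
import random

def estimate_win_prob(my_card, seen_cards, samples=10000, seed=0):
    """Determinization sketch: the opponent's hidden card is unknown, so we
    repeatedly deal a random card consistent with what we have already seen
    and average the outcomes (toy 'high card wins' game, ranks 1-10)."""
    rng = random.Random(seed)
    # Cards we have seen, plus our own card, cannot be in the hidden hand.
    deck = [r for r in range(1, 11) if r not in seen_cards and r != my_card]
    wins = 0
    for _ in range(samples):
        opponent = rng.choice(deck)   # one random instance of the unknown
        wins += my_card > opponent
    return wins / samples

# Hypothetical example: we hold a 9 and have already seen the 10 come out,
# so only ranks 1..8 remain possible for the opponent, and we always win.
print(estimate_win_prob(9, seen_cards={10}))  # prints 1.0
```

Each sample turns the imperfect-information game into a perfect-information one that we know how to handle; averaging over many samples approximates reasoning about the true uncertainty.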
And this very simple trick turns out to work reasonably well, but not superbly. That's what we do now: a very simple idea that works okay, but it's not great, and poker is obviously still not solved. There's a great economic incentive to solve poker, as you can imagine: you could play online against someone while a program plays for you, just like how AlphaGo works, with all the fancy machinery running on Amazon and so on. Anyway, it's not solved, but those are interesting things to come in the future. Okay, let me end with this. If you're interested in further reading: I didn't talk about checkers, but checkers has a very interesting story, and none other than the original author of the champion program himself, Jonathan Schaeffer, wrote a fantastic book about it. Checkers is the hidden story that's rarely told; I didn't have the chance to tell it here, but you can read his book, a fascinating read. And then of course there's the Chess Programming Wiki, a good website for more detailed information if you want to look into it. Okay, that's all I have for you today. Questions? Ah, I'm on time. Okay. What is your background? My background is in computer science. For me, I think, the fascination is that you can have something that plays better than even a regular person; it's easy for a program to play chess better than me. Since you ask about the background: I was playing this game, actually, called Battle Chess, a very ancient game, and I couldn't beat the computer. I was so frustrated that I couldn't beat the computer; that's what took me down the rabbit hole, I guess.
It's like, how come this cheap little program you can buy is better at this game than you are? Yes? It's part of what you just said: you got frustrated, but the computer is never going to get frustrated, so it's a disadvantage for you that you get frustrated at all. Oh yeah, of course. Even in the Deep Blue match there's a very famous story that Deep Blue actually made a mistake because time was up; you play under a clock in an official match. In one of the Deep Blue games, time was running out, so Deep Blue had to make some move, but it hadn't finished thinking yet, and there was a bug: it made a random move. It had already calculated some reasonably good moves, but because time was up, the bug hit and it played a random move, out of left field, just something random. And Kasparov was deeply disturbed by this, because the move came out of the blue, with no relation to anything the machine had been doing before. He needed to go to the toilet, take a break and all that; he was very disturbed by that move, and he said so himself. Of course, at the time nobody knew it was a bug; the team had to go and debug it afterwards to figure out why. And Deep Blue won, actually, partly because Kasparov was so rattled after that move. So it goes to show, the psychology is also an important factor; when two players fight, it's also a mental battle, while the machine is all pure processing, churning hardware. Do you think machines or robots could take emotions into their processing? That's a very tough question. I don't know.
I don't think that's coming anytime soon. Emotions are a very complicated thing. Maybe like The Sims; if you've played The Sims, the Sims can get hungry, they can be angry, so maybe that's some kind of simulation of emotions. But real emotion, that's a different ballpark. Someone at the university is actually doing research right now into how robots can have different emotions. And in Japan there are robots that will actually copy the emotion of the owner, mirroring when the owner is sad. Yeah, I think first you want the robot to recognize what the human's emotion is; even just that first step, recognizing from the face or whatever what the owner feels. But for the robot itself to have its own emotions, that's a whole different ballpark. But also, maybe to criticize the whole idea of neural networks: they're a little bit opaque. It's a bunch of numbers, and we don't really understand them. As we said, it constructed this network that can predict the value of the Go board, which is beyond human ability already, but we don't really know how it works. It's just a black box that somehow works. By churning through a lot of data it somehow managed to achieve some kind of intelligence about the numbers, this is a 0.5, this is a 0.7, and so on. Is this what they call big data? In a sense, but it's data without any knowledge, in a way: the machine can do the task, but we don't learn how we as humans could do the task better. Somehow the machine can do it, so it's very much a black-box kind of thing. But not all big data is like that; some big data is more illuminating and gives us new insights. In this case, though, it doesn't really give us any new insight.
It just shows us that machines can do it better than humans, without telling us why that's the case. But that's also a big open area: to peek into the black box, look at all the numbers inside, and figure out where the intelligence actually is. There has been some work on that; asking the network to "dream" and produce images is a first attempt at looking into the black box itself. So yes, that work is about figuring out what's inside the neural networks. Although, the games we play form a continuum of complexity, from simple games to chess to something like Go. Yes. And actually, it seems there has been a step change in the approach people use: from enumerating all the possibilities to, once the space makes that no longer possible, defining heuristics to approximate things. Yeah. But also, I think the point is that all of this is still quite narrow: for each well-formed problem a human being has to go in and apply a few tricks so that the computer performs well on that narrow domain, as opposed to the computer finding the approach to the problem by itself, hands-off. I think that's quite an important distinction. Yes, you're right, this kind of AI is always very narrowly focused; you program it to do one thing. But to touch on that point, there is some movement in the direction you're talking about. There's a competition called the General Game Playing competition. It's not about one particular game: the rules of the game are given to the machine at the start of the game, and the human has no way to interfere. At such events the rules are described in a sort of computer language, how the moves are made and so on, and presented just before the game starts. No programming: here's how the game works, now play.
This is obviously much more challenging, because you don't know what the rules are before the game actually starts, so the machine must work everything out on the spot. And it should work on, supposedly, any game you can describe in this specialized game language. There is some work on that, which is perhaps also a move towards more general-purpose AI, because it can work on any game, not just a predefined one. For a specific game like chess, I can encode chess knowledge; for Go, I have to go out and download all the expert moves and so on, which is quite limiting, limited to Go only. I think there was a joke that if you just changed one of the rules, Lee Sedol could probably still play, but AlphaGo would completely lose. Which is true, actually: the human being is much more flexible. That's a good point, thanks for bringing it up. Okay, I think we're done. Thank you so much for your time.