In March 2016, a program named AlphaGo shocked the world when it beat Lee Sedol, one of the best Go players probably of all time, and the most dominant player of the last decade. Lee Sedol is on the left there. The program was represented by Dr. Aja Huang, one of the principal authors of the program. This achievement made major news all over the world; in Germany it made it into the most popular news show, and of course it had to be illustrated with robots and everything, because artificial intelligence is going to take over, right? It has been likened a lot to the legendary match of Garry Kasparov versus IBM's Deep Blue, which Deep Blue won, beating the reigning chess champion.

And you might wonder: that was almost 20 years earlier, so what took us so long with Go? Was it for lack of trying? I can tell you it was not for lack of trying. There have been tournaments held to beat humans at Go for a long, long time, dedicated researchers, everything. So why is Go so hard? To show you at what level we were back in 1998, a year after Kasparov was beaten: this is a Go game that an amateur player, not a professional, a strong amateur but still an amateur, won against a computer, and the computer had handicap stones. Those are 29 handicap stones; that is, the computer got to play 29 moves without its opponent ever replying. That many handicap stones are never used in tournaments because it's ridiculous, but the computer still lost, after we had already beaten chess.

Something that wasn't reported on as much is that already in October 2015 AlphaGo made its first big achievement: it beat a professional player, Fan Hui, the European champion, 5-0. He is "only" a second-dan professional, but still very strong. Those results were not revealed until January 2016, when the paper about AlphaGo was published in the journal Nature. The paper calls it a feat thought to be at least a decade away, and that was just about beating professional players at all. And that's true: I've been reading computer Go papers for a long, long time, and most people on the computer Go mailing list thought it was 10, 20, maybe 50 years away, maybe 5. And then after only half a year AlphaGo turned around and beat one of the best players there are. So it also improved enormously in that timeframe.

So this is what we're going to talk about today. I'm Toby, and I want to let you know something up front: I am not a professional expert. I don't do machine learning at work. I'm a mediocre Go player, and I have not written a strong Go engine; I'm not part of the AlphaGo team. I did write three engines myself. They are fun, but not very strong. Why am I telling you this? I'm a web developer, just like I guess most of you are. And when I see talks like this, I often tend to think: they know all this stuff, but I could never learn that. And I want to tell you: you can learn this stuff too. It's not easy, but you can get into it. I got into it back in 2009, and ever since I've been reading Go papers in my free time, essentially, and especially when I prepare talks like this one. So this is something that you can learn, and this talk has no prerequisites, so you can all learn and enjoy it the same.

In order to do this, we've got to talk about a couple of things. First, a bit about the game of Go. Then: why is the game of Go so hard? Why did it take us so long? Then we're going to talk about the Monte Carlo method, which was the most successful approach until AlphaGo came along. Then we have to talk about neural networks, because AlphaGo uses them.
I'll introduce neural networks; you don't need to know anything about them beforehand. And then finally we're going to talk about AlphaGo, how it revolutionized the field and brought it essentially 10 years forward. And in the end, when you think, what good does that do me? I'm not writing an AI or anything. What can I learn from this for my daily life? We're going to tackle that too.

So let's start with the game of Go. Go is played on a board; this is the small one. You place stones on the intersections, and this stone is now said to have four liberties: one on each neighboring intersection that is empty. If you place another stone next to it, they form a group together, and now they have six liberties. So if White starts playing stones around me: now I'm at three liberties, two liberties, now I'm at one liberty. And what happens when White plays here? White captures me. That means White gets a prisoner, which is one point for White, plus one territorial point. But of course Black isn't stupid: Black can stretch out and then have three liberties on its side again.

So what does this look like at the big scale? The general goal is to encircle more territory than your opponent; you sort of divide up a cake of territory. This is a real game, and you can see at first each player tries to get into the corners, because it's easiest to make territory there. Let's fast forward a bit. You can see Black laying claim to the lower right and the upper right; White maybe has the lower left, and so on and so forth. As more of the game develops, you can see them laying claim to certain territory. If you go forward once more, you can see that Black now essentially has the center under control, while White has most of the edges. And I want to point you to this spot here. This looks like White and Black are battling it out, but what's actually happening is that the black group there is already dead. It only has two liberties left, and there's no way that black group can get out of there. That's a very common thing in Go: you leave so-called dead groups on the board until you really need to capture them, because you need your moves elsewhere. And if we fast forward to the end of the game, you can see the black group has indeed been captured. So what do you do at the end? At the end, you start counting territory: you remove the dead stones, which also count as points, and territory counts as points. One more thing about this game: it was won by Black, which here was the computer, in a challenge match last October. The computer lost that challenge, but this was the only game the computer won.

So we just place stones on a board. Why can that be so damn hard? Let's first talk about Go versus chess, because European media outlets always say, oh, Go is like chess, and it's not. In chess you start with everything on the board and you lose pieces; you destroy things. In Go, stones are only ever added, and you build territory. Chess is all about killing the king, whereas Go is more about dividing up the territory and getting more of the cake for yourself. This difference also taught me something about being complex versus being complicated. Both games are very complex, which is why it's so hard for an AI to beat humans at them. But Go is not complicated: Go has 10 rules, maybe 11, something like that. It is a very, very simple game.
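To make the liberty rule concrete, here's a minimal sketch in Python. This is my illustration, not something from the talk; the board representation (a dict from coordinates to colors) and the helper names are just assumptions. A flood fill collects a stone's group and counts the empty neighboring intersections:

```python
def neighbors(x, y, size=19):
    """Yield the on-board orthogonal neighbors of an intersection."""
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny

def liberties(board, x, y, size=19):
    """Count the liberties of the group containing the stone at (x, y)."""
    color = board[(x, y)]
    group, libs, frontier = {(x, y)}, set(), [(x, y)]
    while frontier:
        cx, cy = frontier.pop()
        for n in neighbors(cx, cy, size):
            if n not in board:
                libs.add(n)                 # empty neighbor: a liberty
            elif board[n] == color and n not in group:
                group.add(n)                # same-colored stone joins the group
                frontier.append(n)
    return len(libs)                        # 0 means the group is captured

board = {(3, 3): 'b', (3, 4): 'b'}          # two connected black stones
print(liberties(board, 3, 3))               # 6, as in the example above
```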
The complexity stems from that simplicity, because you can do so many things. You're on this big 19 by 19 board; where the hell do you place your stones? You don't know. Chess, on the other hand, is more complicated: where does which piece move? There's castling and all that stuff. So chess is both complicated and complex. And about chess, there's a quote by the chess master Edward Lasker, who discovered Go later in his life: "While the Baroque rules of chess could only have been created by humans, the rules of Go are so elegant, organic and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play Go."

So let's talk about why Go is so hard. Our board is much, much larger. Almost every move is legal. And the average branching factor, that is, how many valid moves there are on average at every point in the game, is about 250 in Go, while in chess it's about 35. So that makes for a much wider search tree, and the tree also goes much deeper. The state space complexity is how many valid board positions exist for a game: in Go, the number of board positions is 10 to the power of 171, whereas chess is at 10 to the power of 47. And mind you, this is not the number of all games we can play, just the number of valid board positions. But what does that even mean? It's a ridiculously large number. The internet says that the number of atoms in the observable universe is estimated to be 10 to the power of 80. So if my math does not fail me, I could take each and every one of those atoms and form a pair with each and every one of those atoms: 10 to the power of 80 times 10 to the power of 80 leaves me with 10 to the power of 160 pairs of atoms, which would still be less than the number of valid board positions in the game of Go. It is a number that we cannot even really imagine.

So what else is there? Moves have very global impact. When I place a stone in the top right, it can have severe consequences for the bottom left, so you cannot just localize the problem. Also, traditionally, and this is how Deep Blue did it, chess is mostly played with alpha-beta search: you search the game tree to a certain depth, and then you need an evaluation function, some function that tells you, okay, I'm standing this much better on the board than my opponent, to tell you where to cut off the tree and what to play, basically. This sort of evaluation function was thought to be impossible to build for Go. Because if I give you a Go position like this, even most professional players could not really say who's ahead and who's not. There are concepts like influence, where you say, oh, Black has some influence there, some power; it's not really territory, but it could become territory in the future. It is very hard, and there are entire papers written about it being impossible. So Go programs still used alpha-beta search, until we entered the Monte Carlo method.
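To make that concrete, here is a minimal, game-agnostic sketch of alpha-beta search in Python. It's my illustration, not from the talk; moves, apply and evaluate are placeholder functions you would have to supply, and evaluate is exactly the ingredient that was thought impossible to build for Go:

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves, apply, evaluate):
    """Search to a fixed depth, then ask the evaluation function who is ahead."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)      # the crucial, Go-hostile ingredient
    if maximizing:
        value = float('-inf')
        for m in legal:
            value = max(value, alphabeta(apply(state, m), depth - 1,
                                         alpha, beta, False,
                                         moves, apply, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break               # beta cutoff: opponent avoids this line
        return value
    else:
        value = float('inf')
        for m in legal:
            value = min(value, alphabeta(apply(state, m), depth - 1,
                                         alpha, beta, True,
                                         moves, apply, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break               # alpha cutoff
        return value
```

With a branching factor of 35 and a decent evaluation function this works well for chess; with 250 legal moves and no evaluation function in sight, it never got anywhere in Go.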
And to illustrate that method, I have a question for you: what is the number pi? Anyone? Don't leave me hanging, you all know it. 3.14. Okay, but how do you know that? How do you determine pi? It's the circumference of a circle divided by its diameter. But what if I told you there's another way to determine it? I can take a unit square, draw a quarter of a circle inside it, and then throw random pins at it. Then I count how many pins landed inside the quarter circle versus the total number of pins, multiply that ratio by four, and I have an estimation, which you hopefully see, of the number pi. So I have a system in which I run truly random simulations to estimate a result.

This is what, in the Go world, is called the Monte Carlo revolution, which arrived in 2006: using these random simulations as a sort of approximation of the evaluation function that we have in chess. The Monte Carlo method is combined with tree search into Monte Carlo tree search. You have the selection phase: where do I want to play? You have the expansion phase: I create a new node because I played here. Then you have the simulation, where you do the so-called rollout, the random simulation. And then you propagate the result of that simulation back to the top.

Here is a more intricate example. The nodes are board states; along each edge a move is played; and the numbers you see are the wins and then the total visits at that particular node. So what happens? First I select a node, then I do the expansion: some new move is applied, a new board state comes out, and it starts with zero wins and zero visits. Then I run a random simulation from there. And at this point you're probably going to say: Toby, what is this? This is stupid, this is dumb. How can we ever win with random simulations? And the basic idea is this: if I stand better on the board than my opponent, then even though I play random moves for both of us all the time, my chances of winning the simulation should still be higher than when I stand worse than my opponent. That's the idea. And the good thing is, because we don't need to apply heuristics or anything, we can run thousands and thousands of these simulations per second to get a really good estimate in the end.

The other main critique is that this is not human-like at all. Humans don't play like that: when I play Go, I don't run random simulations in my head to the end of the game, count out the game, and then apply the result to a search tree. But I just want to remind people that how we fly these days is not how we tried to fly back then. We don't have any feathers, we don't have any flappy-flappy wings. If you look at a plane, it's a massive cage of steel with turbines that go whoosh. And if you look at helicopters, they're even more different. Sometimes we humans, with our machines, are good at different things than nature is, and we should maybe concentrate on building solutions that are suitable for a machine, not ones that imitate a human.

So let's say we won this simulated game. Then we do a backpropagation of the result up the tree: we won, we won, we won. But we also have to adjust for perspective, because at one level it's Black's move and at the next it's White's move, so we have to adjust for that. A win for White is a loss for Black, obviously.
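By the way, that pin-dropping experiment fits in a few lines of code. A minimal sketch, my illustration: drop random points in the unit square, count how many fall inside the quarter circle of radius 1, and multiply the ratio by four:

```python
import random

def estimate_pi(samples=1_000_000):
    """Monte Carlo estimate of pi from random points in the unit square."""
    inside = sum(1 for _ in range(samples)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

print(estimate_pi())   # something like 3.1418; more samples, better estimate
```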
And this brought me to one of the other really interesting topics I stumbled upon when I started digging into this: the multi-armed bandit problem. Basically, you're in a casino in Las Vegas, and you're trying to win lots of money. So you play at this one machine all the time, but at some point you start to wonder: maybe one of the other machines is better, maybe it has a better winning chance. Shouldn't I try another one, or is this one the best? This problem is mostly called exploitation versus exploration. You want to exploit the best node, the most winning move, and see: is this really good, do I really keep winning here? But at the same time you don't want to leave the other moves, or the other machines, hanging. Maybe you were just unlucky and lost three times in a row; you never know.

To do this, you use the Upper Confidence Bound applied to Trees, in short UCT. In one of its most basic forms, you take the winning percentage and add an exploration term, and that exploration term decays with the number of total visits at a node: the more visits, the smaller it gets. So at first it encourages exploration.

Here's an example from one of my own Go engines. On the first level, it decides to go with this node, which I can understand, because it has the highest winning percentage and there are already many visits. But then on the second level, and this is cut down, normally there are many more nodes there, it interestingly decides to go with the first node, not the node with 3 wins out of 3 visits, a 100% winning percentage, which is what I would have expected it to do. But no, it goes with this one, because it says: that one loss was just unlucky, let's try here again.

And basically, all that you need for Monte Carlo tree search is this: you need to know how to generate a valid random move, and you need to know whether the game is over and who has won. When I realized this, it was truly amazing, because it means we don't need any game-specific knowledge. I just need to tell the machine what a valid move is, when the game is over, and how to score the game. I don't need anything more, which is why Monte Carlo tree search is still the premier algorithm in general game playing, a competition where your program receives a serialized rule set for some previously unknown, fictional game and then has to play that game against other machines.

The algorithm is also anytime: at any point I can stop the program and ask it, what's the best move right now, and it will give me the best move it knows. And it is also lazy. What do I mean by lazy? It only cares about winning. We humans often care about winning by 20 points, or making bigger moves; it only cares about winning at all. Which is why many people were surprised in the game that I showed you before by the computer's last move, this black move straight in the middle. Almost no human would have played there, because while there was some danger, the center was generally assumed to be safe and sound. But the computer knew better than us in that particular instance, because it knew: I'm two and a half points ahead. If I play once in my own territory, I'm still one and a half points ahead, and I eliminate that really slim chance of losing that's still on the board. So it played the move. That's what's mostly called a bot move.

But of course we can enhance the algorithm with expert knowledge. Rollouts that depend on pure randomness are not that good, so all good engines to date use so-called heavy rollouts, where they use three-by-three patterns of where to answer, to make more realistic simulations to the end of the game.
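Going back to the selection formula for a second: in its standard UCT form (a sketch of the textbook formula, not the exact code of any particular engine), a node with some wins out of some visits, under a parent with parent_visits, is scored like this:

```python
import math

def uct(wins, visits, parent_visits, c=math.sqrt(2)):
    """Winning percentage plus an exploration bonus that decays with visits."""
    if visits == 0:
        return float('inf')              # always try unvisited nodes once
    exploitation = wins / visits         # the winning percentage
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

# During selection, descend to the child with the highest UCT value:
# best = max(children, key=lambda ch: uct(ch.wins, ch.visits, node.visits))
```

The constant c is the exploration factor; the square root of 2 is a common default. That decaying bonus is exactly why my engine retried the node with one loss: with few visits, the exploration term still dominates the winning percentage.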
With heavy rollouts it's still Monte Carlo tree search, but the rollout is no longer fully random. Talking about expert knowledge: we can get some pretty good expert knowledge from neural networks, and that is something that came up again rather recently in Go research, in December 2014, with the paper "Move Evaluation in Go Using Deep Convolutional Neural Networks". If you look at the authors of that paper, most notably Dr. Aja Huang and David Silver, they're also principal authors of the AlphaGo paper and of AlphaGo. I think this was the first paper published as part of the AlphaGo project, which started in early 2014, if I remember correctly. But, you know, what does that title even mean? Who here knows what a deep convolutional neural network is? Yeah, I didn't know either, you know? That's why we're talking about this stuff.

First, it says neural network. This is basically what a neural network is: you have an input, which in Go is mostly a 19 by 19 field with values associated with it; then you have hidden layers; and in the end you have an output layer that tells you, say, the probability of playing this move is this and this high, or the probability of winning is that high. Another good example for neural networks is recognition of handwritten digits: there you get the image as a 28 by 28 pixel input, and in the end you have 10 output neurons that tell you, this is the digit one, two, three, and so on, or rather, the probability that the network thinks it's that digit.

So what else do neural networks have? They have weights on the connections between the neurons, and each neuron has a threshold, or a bias, which basically decides: am I going to activate, am I going to give output to the next layer? To illustrate that: if we have one neuron like this, with inputs coming in like that with those weights associated, and these two incoming connections are activated, we have 5.2, which is bigger than the threshold of the neuron, so it fires, it passes a signal on to its output. Some neural networks are binary, they just work in zeros and ones, but most advanced neural networks use a sigmoid function or something else that goes from zero to one more smoothly, because otherwise you have that flip-switch effect. However, weights can also be negative: if the other input also fires, the neuron won't fire anymore, because now the sum is below the threshold. You can see the same thing in a whole network: these input neurons are activated, for instance because there's a stone here or because there's a black pixel there, and then the neurons behind them might fire or might not fire.

So, but how does a neural network learn? It learns through training. And what does training do? Training goes in and adjusts these parameters: the weights and the biases and all that. But how does it know how to adjust them? Through supervised learning, which means we need a good, clean data set of what we want the neural network to learn. We give it an input from the data set, and we also tell it what the expected output was, and then through backpropagation it can go back and adjust all these parameters, all these weights and biases, so that it gets closer to the function we actually want it to compute.
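Here's a minimal sketch of that single neuron in Python. This is my illustration; the weights and the threshold are made-up values chosen only to reproduce the 5.2 example from the slide:

```python
import math

def fires(inputs, weights, threshold):
    """The hard, binary 'flip switch' neuron: fire if the weighted sum exceeds the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return total > threshold

def sigmoid_activation(inputs, weights, bias):
    """The smoother variant: squash the weighted sum into (0, 1) with a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) - bias
    return 1 / (1 + math.exp(-total))

weights, threshold = [2.2, 3.0, -4.0], 5.0      # made-up numbers for illustration
print(fires([1, 1, 0], weights, threshold))     # True:  2.2 + 3.0 = 5.2 > 5.0
print(fires([1, 1, 1], weights, threshold))     # False: the negative weight pulls it to 1.2
```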
And what can this input be? It can be games of Go, for instance, and then the expected output depends on what we want to train the network for; this paper trained it to predict where the next move will be played. So the feedback would be: okay, you said A5, but it was actually B3, let's see if we can adjust the weights a bit.

How do you deal with the training data? First you've got to get a good data set that is clean and everything, and you split it up into training data and test data for verification. You run the training of the neural network on the training data, but later on you take the test data and verify: is this good, does this give me good predictions? Why that distinction? Well, you want to avoid overfitting. Overfitting is when the neural network basically just memorizes your training data, makes no higher-level abstraction above that data, always answers the way it would for the training data, and doesn't really learn anything useful for actual application.

So that was a neural network, first run through. What's a deep neural network? A deep neural network just has many, many layers. Well, I say many, many; "deep" starts at maybe three or four of these hidden layers, but this network had 12, and the final AlphaGo had 13. That's what's called deep. And what you generally say is that as we move along through the layers, the learning gets more abstract. In image recognition, which is one of the major applications of neural networks, maybe the first layer recognizes that there's a line here or something like that, and then the later layers recognize stop signs or things like that.

So what is a convolutional neural network? With a convolutional neural network, we have a local receptive field, in this case three by three, which takes the inputs from the layer before it and maps them to one field on the next layer, to one neuron, basically. The result is called a feature map. And then we have a stride, with which we move this local receptive field along the input. The interesting thing is that although we move the field around, all the local receptive fields of one feature map share the same weights and biases, so they are all trained to recognize the same feature: is there a line here, or something like that. With just one such feature map per layer, even 12 layers would be rather limited, because everything shares the same weights and biases. That is why we also go into the breadth, the width, of the network: for one layer we have multiple feature maps, and each can be trained to see a different feature. Why do all this with shared weights and biases? You want to minimize the number of parameters, because the more parameters you have, the longer it takes to train, and it already takes weeks to train.

So all in all, the network architecture from this paper looks sort of like this. You have lots of input features, which I'm going to talk about briefly in a moment. Then the network is 12 layers deep, with 64 to 192 filters on each of those layers, which is really, really big. And the output is two planes which give the move probability for every possible move: one for when it's White's move and one for when it's Black's move.
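A minimal sketch of that sliding local receptive field, assuming numpy and, to keep it short, no padding; my illustration, not the paper's code. One 3 by 3 filter moves over a 19 by 19 input with stride 1, and every position reuses the same nine weights, so the whole output is one feature map:

```python
import numpy as np

def feature_map(board, weights, bias):
    """Slide a 3x3 filter with shared weights over the input (stride 1, no padding)."""
    size = board.shape[0]                      # 19 for a full Go board
    out = np.zeros((size - 2, size - 2))
    for y in range(size - 2):
        for x in range(size - 2):
            patch = board[y:y + 3, x:x + 3]    # the local receptive field
            out[y, x] = np.sum(patch * weights) + bias
    return out                                 # same weights everywhere: one detector

board = np.random.rand(19, 19)
line_detector = np.array([[-1, -1, -1],        # a made-up filter for illustration
                          [ 0,  0,  0],
                          [ 1,  1,  1]])
print(feature_map(board, line_detector, 0.0).shape)   # (17, 17)
```

A layer with 64 to 192 filters simply computes 64 to 192 of these maps side by side, each with its own nine shared weights.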
So this neural network, this deep convolutional neural network, already had 2.3 million parameters and 630 million connections, and so they trained it for weeks. Talking about the input features: I would have assumed that the input is just, okay, here's the board, zero is empty, one is black, minus one is white. But no, because that makes it really hard to make weights and biases specific to a feature. So there are three feature maps as input just for the stone color: is there a white stone here, is there a black stone here, is this field empty? That way the network can work more cleanly with the plain zeros and ones it gets from there. And then there are other features, like: how many liberties does this group have, one, two, three, or four and more, for instance.

As I said, it was trained on game data to predict the next move, and it reached an accuracy of 55%, which really bested everything that came before it. On its own, it made for a fairly strong program: it mostly beat GnuGo, the program that still played with alpha-beta and minimax search, but it lost devastatingly once it played against the big Monte Carlo tree search bots. So while it was really interesting, I didn't think it alone could take us that far. The other thing they did in the paper is combine it with Monte Carlo tree search: in the selection phase, they initialized the probabilities with which a move is picked with the probabilities they got from the network, and the combined version was much better than just the neural network. And how did they do all this computation, when the CPU was already busy with the random simulations? They computed it asynchronously on six GPUs or something like that. Which is amazing to me, because you use all your RAM to store the tree, the CPU to do the simulations, and then even the graphics card heats up.

So now let's finally talk about AlphaGo. This paper was out, it was sort of the first AlphaGo paper, and it was interesting, and people were getting good results with it, but it was nowhere near a revolution. And I'll tell you why I wasn't too excited about it: I thought, if we just train on human expert positions, we can never surpass the humans. We can only try to imitate them. How would we ever surpass them? And what they did about that is really interesting. This diagram is directly from the AlphaGo paper, and you can see they still train from human expert positions. They create a faster network that has fewer parameters but is much quicker to evaluate, which is the rollout policy, which we'll talk about in a second. And they have the supervised learning policy network, which is basically like the one we just talked about, but with a slightly different architecture. But what they do then is the interesting part: they create a reinforcement learning policy network, which plays against itself all the time, and against older versions of itself, and through that self-play it gets better and better every time.
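Back to the input features for a second, because they're easy to make concrete. A minimal sketch, again assuming numpy; encoding empty/black/white as 0/1/2 is my own assumption for illustration. Instead of one board of -1/0/1, you hand the network separate binary planes per feature, here the three stone-color planes (real networks stack liberties and more on top):

```python
import numpy as np

def stone_color_planes(board):
    """board: 19x19 ints, 0 = empty, 1 = black, 2 = white; returns 3 binary planes."""
    black = (board == 1).astype(np.float32)
    white = (board == 2).astype(np.float32)
    empty = (board == 0).astype(np.float32)
    return np.stack([black, white, empty])   # shape (3, 19, 19)

board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = 1                              # a black stone
board[15, 15] = 2                            # a white stone
print(stone_color_planes(board).shape)       # (3, 19, 19)
```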
Dr. Aja Huang always gets asked whether AlphaGo sleeps over Christmas: AlphaGo doesn't sleep, AlphaGo is always training. And that is the first really good innovation: applying reinforcement learning there, learning from games of self-play, from "have I lost, have I won, how can I get better". The other great thing is the value network. With this network they built something that tells you the percentage with which you are currently winning or losing, which, as I told you before, was considered impossible. Many people always said it was impossible, but apparently it is possible, and that's what they achieved.

So how do they combine all of this with the tree search? They still go through the basic Monte Carlo tree search phases, but let's look at it in a bit more detail. First we have the selection, of course, and let's first look at a single node. In the paper they say the values are stored on the edges, but I'll put them on the nodes, because that's easier for me to visualize. The first thing a node stores is an action value, which we're going to talk about soon; it's basically the value from the Monte Carlo rollouts plus the value from the value network, combined with some factor between them. It stores a prior probability, which is the value it got from the supervised learning policy network, saying: okay, the probability of playing here is this and this big. And of course it stores the visit count.

So, back to the selection: it picks based on the action value plus a bonus (I wrote that wrong on the slide, sorry), and the bonus is the prior probability, but decaying: the more visits a node has, the less impact the prior probability has on that node being selected. We select the node with the highest value, and then we do the expansion. At the expansion, and like everything here this is done asynchronously, the values are initialized with the prior probabilities from the supervised learning policy network. And this is also really interesting: generally the reinforcement learning policy network is much stronger than the supervised learning policy network, but here it performed worse. In the paper they say they believe this is because the reinforcement learning converges towards one style of optimal play, whereas the move suggestions from the supervised learning network are more varied, spread over a wider range, so it gave them better results here.

With those probabilities we then select the node with the highest probability and send it off for evaluation, and we're not doing only one evaluation, we're doing two. The first evaluation is the Monte Carlo rollout, and this one is not random: it uses the rollout policy network, the so-called fast network. It was trained along with the other policy network, but it is much faster to evaluate, two or three microseconds as opposed to milliseconds, because it has fewer parameters and is less deep. It is consulted for every move of the rollout, so you get a really realistic rollout. The second evaluation consults the value network and asks: hey, does this look like a winning or a losing position to you? Then it combines those two values with a factor of 0.5, so the rollout and the value network are equals, in a sense. And then we back the result up: we do the backpropagation of that value up the tree and update the action values along the way.
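As a rough sketch of that selection and mixing, my simplification: the bonus in the actual paper also involves the parent's total visit count, but the shape is the same. Each node carries an action value Q, a prior probability P from the policy network, and a visit count N:

```python
def selection_score(q, prior, visits, c_puct=5.0):
    """Action value plus a prior-driven bonus that decays as visits accumulate."""
    return q + c_puct * prior / (1 + visits)

def leaf_value(value_net_estimate, rollout_result, lam=0.5):
    """Mix the value network's verdict with the fast-rollout result.

    lam = 0.5 makes the two evaluations equals, as described in the talk."""
    return (1 - lam) * value_net_estimate + lam * rollout_result

# During selection: descend to the child maximizing selection_score.
# At the leaf: back leaf_value up the tree, updating each node's Q and N.
```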
So what does this look like in terms of hardware? AlphaGo has a single-machine version, but the match version was distributed, with 1,202 CPUs: one master node holds the whole search tree and asynchronously hands out the work, including the asynchronous evaluation of the value network. And 176 GPUs, which is what's still written in the paper, but it's not what was used in the match version against Lee Sedol, because there they used tensor processing units. That's custom hardware designed by Google for machine learning; it sort of trades away the precision of graphics cards to do more evaluations at a faster speed. Dr. Aja Huang said those tensor processing units were a major speed boost for them, and a major playing strength boost. And that's the actual rack that the games against Lee Sedol were played on; you can see a little plaque on the side that essentially says, hey, we beat Lee Sedol.

According to Dr. Aja Huang, AlphaGo has three key strengths in which it mirrors human-style play. First, there's intuition, the human instinct, which it gets from the policy network, the thing that tells it where a move is likely to be played. Second, there's the reading capability, which it gets through the tree search. Reading is: I see, I play a move there, you play a move there, and so on; that's what you do in Go. If you're at my level, when I really played, you do that for maybe 10 moves ahead; if you're a really good player, you do it for 50 moves ahead, and AlphaGo can do it for quite some moves ahead. And the last one, according to Dr. Huang the most important and the key factor to AlphaGo's success, is the value network: the positional judgment that can tell it, am I winning right here or am I losing. All together this is much more natural than what we used to do with plain Monte Carlo tree search.

So let's talk a bit about the Lee Sedol match and about AlphaGo's style of play. The move that is marked right here came towards the end of game 2, and it's not a very good move considering the current situation on the board, because to most human minds it would be more profitable to block off White at the top right. So the style of AlphaGo is still lazy, to a certain extent: like Monte Carlo tree search, AlphaGo only cares about winning. Therefore people say it has a very conservative style of play, where it wins by a really, really tiny margin and otherwise just plays safe moves. The other thing is that when humans analyzed the play of AlphaGo for the first time, they said: oh, this move is bad, this move is bad, what does it achieve with that, I think AlphaGo is losing. But in the end AlphaGo won, and against Lee Sedol it won four games out of five, so it won the match 4-1. I think that's also because we humans are used to that specific way of playing that we learned and that everybody tells us is good, but AlphaGo breaks those conventions. That's actually also what happened to Fan Hui, the European champion who lost against it: he was really struck by that loss, and he started playing Go differently, he didn't adhere to the rules he had been taught before anymore, and he played better.

And about that laziness, one of the commentators, a very high-ranking professional dan player, said: "So when AlphaGo plays a slack-looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory." And that's mostly what it is. If we look at the confidence graphs, I've been told one of the graphs is the estimation of the winning percentage in game 2 from the value network, and the other one from the Monte
Carlo rollouts, but I might be wrong, because in this presentation Dr. Huang never said so explicitly. That move that we just looked at was played all the way over here, and you can see AlphaGo is pretty damn certain it's winning, depending on which graph you trust. So AlphaGo just played that move because: I'm winning anyway, so I'm just going to play here. Most human commentators said this move gave Lee Sedol sort of a fighting chance.

Let's briefly talk about game 4, the game that AlphaGo lost. We have this situation here, and this move by Lee Sedol was called the divine move by many. It's a very complicated situation. Lee Sedol set out to make it an all-or-nothing battle, because AlphaGo was so good at winning with a small margin: okay, then let's make it an all-or-nothing game. There are so many variations involved in there that I can't even start to comprehend that move. But the thing was: AlphaGo didn't see how great that move was. The move was played around here, and you can see that AlphaGo only really realizes a long time later, like, oh god, that was garbage, and then goes, oh, I've lost, damn it. This is also one of the vulnerabilities of all this machine learning stuff: even in game 5, AlphaGo misread a position where people said even a kyu player would have replied better. Sometimes these systems have surprising gaps in their knowledge and ability.

So, really quick, before we run out of time: what can we learn from all of this? For me, implementing those engines, especially the lightest one in Ruby: there's a huge difference between making X faster and doing less of X, because doing less of X is free. And especially in Ruby, where everything is slow, this is very valuable. I think too often we focus on making something faster instead of thinking about how we can do less of that computation altogether. Of course: benchmark everything. Your intuition about what is fast and what is not is often very, very wrong. And there's this difference between solving problems the human way, which I think is what we often try, and solving things the computer way. The computer is good at widely different things than you as a human are, and therefore the good solution for it might be different too.

Also, please don't blindly dismiss approaches as infeasible. Both the Monte Carlo method and neural networks had been applied to Go before their breakthroughs, and the papers all said: it's impossible, it's not good, we cannot do this. And in the end they were the driving factors behind beating humans at Go, which is really great. So whenever somebody says, somebody else has already tried that, don't reinvent the wheel, and so on: I mean, don't always ignore that, but be aware that with some new advances or something else you can still apply an old idea and do better.

Also, I think too often we focus on using just one approach, the one golden approach: we're going to do it this way. And that's great, but it can also be a combination of approaches. I think that was also sort of the gist of the Shopify talk this morning: we could have multi-tenant or single-tenant, but in the end they had pods, which was sort of a compromise between the two. And here, Monte Carlo and the neural networks were combined to achieve something greater. Why didn't they just stay with the networks and try to make only them better? They said: well, if we combine them with Monte Carlo, they seem to complement each other, they seem to be better, so they went that route.

Naturally, I also want to share with you the joy of creation. I have seldom had as much fun as when I was writing my Go engines, because you
know, you teach this little thing to play a game, and then you look at it and you play against it, and it's like: oh, it made that move, oh, stupid mistake. And then you make it better, gradually, and it's so much fun to just create something like this. And that's what I also want to achieve with this talk, for you: don't be afraid of AI, don't be afraid of machine learning. You can dig into these topics. I did, and you can too. Maybe write a little bot and then say: yeah, this is fun, this is cool. My bot, which to my shame I haven't worked on since last year, is called Rubykon; it's a basic Monte Carlo tree search bot, no neural networks. And then there's one of my biggest sources of inspiration, which was Michi: it's, I think, 300 or 400 lines of Python, and it implements most of the latest, up-to-date Monte Carlo tree search techniques. So, this was my talk. Thank you very much, I hope you could get something out of it.