Please welcome Tom. Good afternoon, and thank you very much for coming. My name is Tom Ron, and I will present joint work with my friend Niv Mizrahi, who couldn't attend. You can find both our slides and the code at this link — I apologize if it's not very clear, but it's on GitHub: niv-m, learning-chess. So what are we going to talk about today? Learning chess from data. While everyone wants to make a computer play chess smarter, we are a bit more modest: we just want to make the computer play chess.

So, what's on our mind? We want to know whether a computer can learn chess only by looking at data from chess games. Many questions can be asked in this domain; today we will focus on two of them. The first: given a board state, can we make a specific move — is it a legal move? The second: given a board state, is it a checkmate — has the game ended? Of course, if those are possible, then the sky is the limit: what else can we learn empirically about other systems — maybe some physics, and other things? I want to mention that this is a work in progress. We are still working on it; we have additional ideas, but I came here today to show you what we have done so far.

Okay, so let's start with what we know about chess. First, let me say that there is a constant tension between the features we allow ourselves to know going into the learning process and the things we want to learn. We know that there are two sides, two parties, who play the game. We know that a game can end with either one winner or a tie — never two winners or any other outcome. We know that the board is eight by eight and doesn't change through the game. We know that there are different pieces with different, unknown properties: how can each piece move? Can it capture other pieces? What happens to a piece when it gets captured? Maybe promotion for pawns, and so on.
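The two questions above can be posed as simple predicates. As a minimal sketch, here is how they look when answered by the rules engine of the python-chess package (the library the talk uses later) — this is the ground truth the learning process tries to recover from data alone:

```python
# The two questions of the talk, answered by python-chess's rules engine
# (pip install chess). The talk's goal is to recover these answers
# purely from game data, without coding the rules.
import chess

board = chess.Board()  # standard starting position

# Question 1: given a board state, is a specific move legal?
assert chess.Move.from_uci("e2e4") in board.legal_moves       # pawn push: legal
assert chess.Move.from_uci("e2e5") not in board.legal_moves   # three squares: illegal

# Question 2: given a board state, is it checkmate?
# Fool's mate, the fastest possible checkmate:
for san in ["f3", "e5", "g4", "Qh4"]:
    board.push_san(san)
print(board.is_checkmate())  # → True
```

The learner never calls `legal_moves` or `is_checkmate` during training; these serve only for labeling and evaluation.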
Okay, so the dataset we worked on is given in algebraic chess notation — if we have some time at the end I'll show you how it looks. The idea is that every square on the board is represented by a letter, a to h, and a number, 1 to 8, and a move is made from one square to another. Usually only the to-square is written, when there is only one piece that can make that move; if it is ambiguous, both the from-square and the to-square are written. We ignored the metadata in this set, such as player ranking, location, and so on. We had just over 100,000 games with full or partial descriptions — many games ended with neither a checkmate nor a tie, they just stopped in the middle — and a bit more than 8 million moves, distributed across the different piece types.

We used a Python library called chess (python-chess), which let us parse the algebraic notation, gave us the board status, and provided methods like "is this check?", "is this checkmate?", and so on — plus SciPy, matplotlib for plotting, and NumPy. We initially thought the data would be big enough to need MapReduce, and we built everything as if we were going to use MapReduce, but for now it was enough to run on a single machine — maybe we'll distribute it in the future.

So, the first thing we wanted to do, the first question I mentioned before: given a board, can we make a simple move? The most naive approach would be: have we seen this move before? By "this move" I mean the board state together with the move I want to make. If so — good, do it; if not, try again. But maybe there just isn't enough data, so I haven't seen the move, or maybe it's illegal and that's why I haven't seen it — we can't tell the difference. And it's efficient in neither running time nor memory.
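The naive memorization baseline just described might be sketched like this. The position key here is the FEN string provided by python-chess — that encoding choice, and the helper names, are my own illustration, not the authors' code:

```python
# Sketch of the naive baseline: memorize every (position, move) pair
# seen in the data, then answer "is it legal?" by exact lookup.
# Positions are keyed by their FEN string (an assumption of this
# sketch; any canonical board serialization would do).
from collections import defaultdict
import chess

seen = defaultdict(set)  # FEN position -> set of observed moves (UCI)

def observe_game(moves_san):
    """Record every (position, move) pair from one game's move list."""
    board = chess.Board()
    for san in moves_san:
        move = board.parse_san(san)
        seen[board.fen()].add(move.uci())
        board.push(move)

def seems_legal(board, move):
    """Naive answer: 'legal' iff this exact pair was seen before.
    Unseen-but-legal and unseen-because-illegal are indistinguishable."""
    return move.uci() in seen[board.fen()]

observe_game(["e4", "e5", "Nf3"])
start = chess.Board()
print(seems_legal(start, chess.Move.from_uci("e2e4")))  # True: observed
print(seems_legal(start, chess.Move.from_uci("d2d4")))  # False: legal, but never seen
```

The last line shows exactly the failure mode the talk points out: a perfectly legal move is rejected simply because the data never contained it.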
And of course there is no learning done here, so let's move to our second try. For each move made, we computed the difference between the to-square and the from-square, and we drew the diff histogram. For example, when a pawn moves two steps — which a pawn can do on its first move — the x difference is zero and the y difference is two. We made some adjustments between black and white so the diffs would be relative to the moving side. Now you can see those histograms. This one is for the pawn: a pawn can move either one step forward, two steps forward, or one step diagonally forward to either side. This is how the bishop moves; this is how the rook moves along ranks and files; this is how a knight moves — it's kind of nice; and the king — you can see that the king can move one step in any direction, plus castling to one of the sides.

The pros of this approach: it's very good for common moves, it gets better as the data grows, and it's fairly time- and memory-efficient — we can code all of this really simply. However, it doesn't take the board status into account, so if there are pieces in the way, I either can't answer the question or I answer it wrongly. It's a necessary condition, given enough data, but not a sufficient one.

So the next take on this idea was that for each move we looked not only at the move diff but also at the surroundings of the piece. You can see here, roughly, the idea: each neighboring square has one of three possible states — occupied, free, or out of the board (if we're standing at the edge of the board, some of the surrounding squares fall outside it). And these are some of the results we got by aggregating those histograms and doing some grouping on them. For example, for the queen: if the queen wants to move in this direction, at least two steps, then the squares above it and to the right must be free — which makes sense if you know the chess rules. Another thing about the queen: if she wants to move seven steps down and to the right, she is moving across the whole board, so she must be standing in the corner, and this square must be free. Cool. If the king castles, the square next to it must be free. The same goes for a pawn moving forward. And — surprise — nothing for the knight. Knowing the chess rules, we know that the knight can jump over pieces; not finding such a rule doesn't by itself prove anything — maybe there simply isn't enough data — but it's nice, knowing the rules of the system, to see that the knight can skip over pieces.

The pros of this approach: we keep it efficient — not too much data to store, and good runtime — and we take the surroundings into account. We can argue about whether the surrounding should be radius one, radius two, or more, but doing this also brings back the trade-off I mentioned before: we are injecting external knowledge about the game and the environment we're in. The main weakness is that we assume moves are independent of one another, and we can point to moves for which that's not true — castling, for example: a king cannot castle if it is in check or if it has moved before — and several more moves are constrained in this way. Okay, that's all we're going to discuss about moves today. We still have ideas to improve this, but we know it already gives roughly good results, though of course it generalizes less, as I mentioned before.

Okay, so now for learning checkmate. Here we ask: given a state of the board, is it a checkmate or not? We are not asking, if it is a checkmate, who won — black or white; we might ask that in the future. We used several datasets of up to 800k samples; we used 40% for training and 60% for testing, with a 50-50 split of true and false samples. Of course, the real-world frequency of checkmates is much lower, because you get one checkmate
at most per game, and sometimes none at all. We used an SVM classifier with a linear kernel; we probably won't stick with it in the future, although we got some nice results even with this naive classifier.

Now, a crash course on classification for people who don't come from this domain — a really quick one; I know I speak too fast, I apologize, we have a lot to cover today. We start with data, then we extract features — we'll talk about the features we used in a minute, but features can be counts, Booleans, categorical, many others, maybe combinations; maybe the features depend on one another. There are models suited to each problem, and then there is classification: some of the data is used for training, on some you predict. We used scikit-learn for this job, and it's very general — we were actually able to reuse code we had written before for a totally different task, just swapping in our feature extraction and pushing it into the classifier we had. Another nice property of scikit-learn: it's very easy to toggle between different classifiers, since they all expose fit and predict functions — so just play with it.

Okay, so here again we had a few versions. The first version used simple count features. What does that mean? First we counted the total number of pieces on the board; then how many white pieces and how many black pieces; and for each type of piece, how many pieces of that type — for example, 5 white pawns and 3 black pawns gives a total of 8 pawns — and we also counted them per color, 5 white pawns and 3 black pawns. This brought us to something a bit better than a monkey, with an accuracy of 70%: on the checkmate cases we were able to say in 80% of cases that this is a checkmate, but on non-checkmates we were right only 59% of the time, so we had quite a few misclassifications. Well, we want to be much better than a monkey, so we moved on to the next version:
using the previous features plus data about each piece's first-degree neighbors. In this version we didn't look outside the board — we excluded those squares, but we'll handle them in later versions. For each square around a piece we looked at whether it is empty, occupied by the same side as the piece we're looking at, or occupied by the other side, and we aggregated this data over all the pieces on the board, for each party. We also built some Boolean features on top of it — for example: are there more pieces around me from my side or from the other? is my surrounding mostly empty? — features like that. And we did get an improvement: we can see that detection of checkmates rises to 86%, and non-checkmates are now classified correctly 87% of the time — remember, previously it was 59%, so we are doing much better now.

The third version took the same idea but extended the radius to 2 and 3. That makes many more features; however, 300 features is not that many, and in later versions we could add more without trouble. But as I said before, it makes the model less general, because we assume something bigger about the game and the board. And indeed we got an improvement: accuracy is now 89.5%, and both classes improved. We could ask whether increasing the radius further — 4, 5, 6, up to 8 — would improve it. Personally I don't like that approach and don't want to do it, because we would be assuming even more about the board, the game, and the system as a whole; I would rather think about and suggest different features.

So, having this benchmark, let me talk about what we want to do in the future. Test different classifiers: here we used an SVM; maybe change the kernel, maybe try nearest neighbors, maybe use some deep learning — as a buzzword, I don't know. Another small change that I think would have an interesting effect: bringing the edges of the board into the different counts we're doing. Asking who the winner is, which I mentioned earlier — black or white: we could treat it as a multi-class
classification problem — white won, black won, or it's not a checkmate — or, given that it is a checkmate, as a binary question: black or white, who won? Asking whether a specific position is check, not necessarily checkmate. Complex move detection: somehow incorporating history, or maybe features that summarize what has happened so far — for example, counting how many times a specific piece has moved, or something else. Of course, as I mentioned, we want to reduce the external knowledge we rely on about the game. More efficient parsing: we used the chess package, which is nice, but in some cases we did something like bootstrapping — we took the data, pushed it through the chess package, and produced what we wanted; maybe we can skip this lap and do it ourselves. Scaling: classifying the 800,000 samples was really hard for our computers and for scikit-learn — it eventually finished, but it was hard — so maybe we need to think about distributing it, about using something like Shogun, which was mentioned here earlier; there are many tools we could consider. And, surprisingly, we have time for questions. Thank you for listening so far.

Q: I'm curious how long it took to go through all those iterations.
A: We were actually sprinting on this over the last two weeks, while working at the same time — I forgot to mention earlier that we both work at Taykey, an Israeli technology company — so we were doing this overnight, on top of our jobs.

Q: Have you analyzed whether there is a recurring pattern in the wrong estimations?
A: We have thought about it — we want to be able to say, okay, here is why there is a pattern. We haven't yet looked inside the black box to say, this is why we're wrong. We do plan to, because we want to go further, improve it, and make it general — and of course the big question is, when you apply this to other systems, can you learn physics just by looking at data, for example?

Q: Conventional chess engines are generally stateless — they
just get a position and make their estimations not based on learned state but on their own algorithms. Except for the opening phase, they don't keep any state — it's just a position, plus knowing what the pieces can do.
A: Right, and we're not trying to compete with that. We're not focusing on better strategy; we're focusing on lesson number one — this is how the pieces work.