 Hi everybody, I'm Adam, and this talk is called Beating Mastermind. I'm a group engineering lead and community lead at Braintree Payments. This code that I'm talking about here, the slides I'm going to show you, and my speaker notes will all be posted to GitHub after the talk. So this talk is about a game called Mastermind, but really it's about being comfortable with algorithms and being comfortable with taking a paper that describes an algorithm and turning it into code. And I'm going to talk a little bit about one specific algorithm called Minimax. So what is Mastermind? It's a two-player code-breaking board game. This picture you see on the screen is exactly what the one I had as a kid looked like. One player who is the code maker makes up a pattern of colored pegs and keeps it hidden. You can see that behind the screen at the bottom left, that picture. The other player is the code breaker, and they try and guess that code, getting feedback in the form of some little black and white pegs after each guess, so you can see a bunch of guesses there, sort of in the center of the screen, and black and white pegs next to them indicating how close to correct that guess is. And it's based off at least a 100-year-old pencil and paper game called Bulls and Cows. No one knows quite how old that game is, but dates to at least the late 1800s. So here's an example game. I'm going to walk you through it real quick. The secret pattern is off the bottom of the screen, hidden from us here. And it's, you know, of course, while you're playing the game, you wouldn't actually know what it was anyway. The guesses start from the top here, so in the very first guess. You can see there's no circles to the right of it, meaning it got no score. So I guessed four red pegs, none of those guesses were correct. So the second guess of four yellow pegs has two black circles to the right. So that means that two of the pegs in the correct answer are yellow pegs. 
But we don't have any information about where those pegs are yet, because we guessed yellow in every position, so any two of those could be correct. So then in the third guess, I switched to guessing two yellow and two green pegs. And as feedback, I get one black peg and one white peg. The black peg means that one of those is the correct color and in the correct position, whereas the white peg means that one of those is the correct color but in the wrong position. Since I already know that exactly two of the yellow pegs are right, I know both of those pegs have to be referring to the yellow pegs. So both the greens are wrong, one of the yellows is in the right position, and one of the yellows is in the wrong position. So in my fourth guess here, I switch from green to blue pegs and I move one of the two yellow ones. And now as my score, I get back three black pegs. So this means that three of these are correct and in the correct position. And again, we know that two of them represent the yellow pegs specifically. So both yellow pegs are now in the right position. And one of the blue ones is, but we don't know which one. All we know is that one of them is correct and one of them is wrong. So the next thing we do is take out one of the blue pegs and add an orange peg. So now we have two black pegs and one white peg. We know the black ones represent the two yellows. They're both correct color and correct position. And the white one means that the blue peg is the right color, but it's in the wrong spot. And that orange one doesn't get a peg. Therefore, we know that it must be wrong. So in my final guess, I guess yellow, purple, yellow, blue, and I get four black pegs as my score. And that means I was right. So hopefully that kind of gives you an idea of how the game plays out. You try and progressively get closer and closer to the real answer. Sometimes you take a step backwards.
You might get a worse score, but you get more information, which helps you get a better score the next time. So that screenshot I just showed is from Simon Tatham's Portable Puzzle Collection, which is a really amazing collection of logic puzzles and games that are either downloadable or playable online. You can just play them in JavaScript right in the browser. And seeing Mastermind on this page, where it's called Guess, I assume for trademark reasons, is what inspired this work and this talk. I really encourage everyone to check this site out. There's a lot of really fun games for programmers here. A few you might be familiar with are on the screen, and there are lots of other ones that are crazy and that I've never seen anywhere else. I originally wanted to show a few more pictures of the game from this site, because they're nice and colorful and easy to read. But I decided I'm going to show you a little bit more of the console of my implementation actually running, so you can get a sense of what kind of feedback I was getting as I wrote the code we're going to talk about here. So let's talk about the paper that I referenced in the talk's title. It was written by Donald Knuth and published in the Journal of Recreational Mathematics in 1976. Donald Knuth is known as the father of the analysis of algorithms. He created the TeX typesetting system, and he wrote The Art of Computer Programming, which is a sort of multi-volume tome on algorithmic programming. And in this paper, he gives a solution that will always win Mastermind in five moves or less. So the problem is that this is the form of the answer in the paper. It's just a giant lookup table called Figure 1. And this isn't even the whole thing. This is only the first half. So while the paper does include a brief description of how to use this lookup table, it doesn't really tell you anything about how or why it works.
Looking at this doesn't give me any better understanding of Mastermind and the structure of it. It is probably possible for a person to memorize this. I wouldn't want to try, it's two pages. But we're here to actually understand what's going on. We're here to analyze this. As programmers, we want to know how it works. So the paper basically has seven parts. The first part describes the game. The second part explains how to use that lookup table that I just showed you. And then the third part gives several examples. The fourth part is a little bit more interesting. It describes an important part of why the algorithm in the paper picked a certain guess in the previous example. So I'm going to just quote from the paper here, and I'm going to read it aloud to you. "Incidentally, the fourth move, 1462 in this example, is really a brilliant stroke. A crucial play if a win in five is going to be guaranteed. None of the seven code words which satisfy the first three patterns could successfully be used as the fourth test pattern. A code word which cannot possibly win in four is necessary here in order to win in five." So what this is saying is sometimes we're going to need to guess combinations of colors that we know are impossible in order to eliminate enough potential answers that it's possible to guarantee a win in only five moves. A code word which cannot possibly win in four is necessary in order to win in five. We have to pick an impossible answer to win in five moves. And that number there, 1462, is just another way of representing colors. So instead of having blue and red and green, we just have one, two, and three. So 1462 is just a representation of four pegs using the numbers one through six instead of the colors. So this is really interesting to me.
And it's something that I never did on purpose when I would play this game myself until after I read this paper: the idea that an impossible move can actually be more useful to you than one that could be the right answer. So this is the next part of the paper. And it's sort of the core of it, part five. And so I'm going to read this also. Please try and follow it. It's a little dense. "Figure one was found by choosing at every stage a test pattern that minimizes the maximum number of remaining possibilities over all 15 responses by the code maker. If this minimum can be achieved by a valid pattern making four black hits possible, a valid one should be used. Subject to this condition, the first such test pattern in numeric order was selected. Fortunately, this procedure guarantees a win in five moves." So let's break this down. There's 15 possible responses by the code maker. Those are the scores you could get. There's 15 different combinations of white and black pegs that make sense that you could get as an answer. And so you want to minimize the maximum number of remaining possibilities. That means choose the guess whose worst possible score still leaves you with as few candidate answers as possible. We don't want a guess where one of the scores we could get back doesn't help us much in making our next choice. We want to minimize the number of remaining possibilities. So if there's 16 different possibilities left and one of the scores could be valid for 15 of those answers, then you don't want to make that guess, because in the worst case that guess only gets you from 16 to 15. You want to pick a different guess where the very worst case score still reduces the number from 16 to, say, 8 possibilities remaining. So subject to that condition, we want to pick a valid pattern instead of an invalid pattern. So remember, we sometimes have to pick an invalid pattern in order to get on to the next step and win in five moves or less.
But we want to prefer a valid pattern when it's as good as or better than any invalid pattern. So that's how we're going to break our ties. If two different guesses are both going to leave us with scores that only reduce the state from 16 to 15, we want to pick the one that itself could be the right answer. Because then we have a chance to win in four moves instead of five. And then subject to that condition, that second condition, just use the first test pattern in numeric order. So prefer 1, 1, 1, 1 over 6, 3, 4, 5. And that's just a tiebreaker. We're representing all of our different color combinations as numbers. So it's a great way to break ties. So what now we need to do is figure out how do we actually implement the Minimax algorithm? How do we implement that idea just laid out in the paper? And the answer is just to Google it, like everyone here does a million times in their work. The fifth result is a Wikipedia article titled Minimax. And that's what we're going to go to here. But some people here maybe don't love using Wikipedia, don't worry. One of the top answers was also a stack overflow question. So you can always use that instead. That's generally my go-to. In this case, the Wikipedia article is just a little more helpful. But they're both going to get you to the same place. So this is really something that you should feel comfortable doing in your work. You should not be afraid to Google. Doesn't mean that you should do it blindly. But it is an important way to help you understand what you're reading. It's perfectly fine. Yeah, so I'm not going to go into the exact definition of Minimax. Actually, before I switch, I want to say one thing is that you can put practically almost any of the words from that section of the paper we just read into Google and get the right answer. If you put in Maximize Minimum, you get the right helpful page that you're looking for. 
If you put in Minimum Maximize Algorithm or Algorithm Minimize, all kinds of combinations of those words, they're all going to get you there. So it's not like I put in some magic selection of words here to get the right answer. This one happened to be what I thought to Google, but there's lots of different possibilities, all of which are going to get you to the same place. So I'm not going to go into the exact definition of Minimax. It's pretty much what I've just described. You want to minimize the maximum number of remaining choices. You can really just read the Wikipedia article if you want to know more. But I don't know about you. I'm trying to figure out how this works. I'm a programmer. I'm not sure I want to read through a ton of abstract mathematical definition for this. So luckily, I happen to see on the screen that section 2.2 of this article looks really interesting. It says pseudocode, which is really great. I'm a programmer. I'm going to understand pseudocode. So I'm actually just going to jump down there and see if it tells me what I need to know. So this is pretty short, and that's promising. Let's just walk through the whole thing. This is describing the Minimax algorithm, and that sounds like it's what we want. So starting from the function parameters, the function Minimax takes a node, a depth, and a Boolean saying whether or not you're the maximizing player. The node is just where we are in the game: what possibilities we've already eliminated, what score our last guess got, and what that guess was. The depth is how much further ahead in the game we want to look to see which guess is best. We kind of need to project forward in time to get a sense of the value of different guesses. And then we're going to talk a little bit more about that last parameter in a second.
So to understand the depth in this particular case, remember what it said in the paper, that we want to look at which guess is best based on the different possible responses you could get back, the different scores. So we're just looking one guess and one score ahead in time. So that's just a depth of two when we first call this function. So let's look at the second and third lines here. If the depth is zero, or if the node is a terminal node, then return the heuristic value of that node. So if we've already looked forward our guess and score in time, or we're at a terminal node, meaning we're at the end of the game, then we return. The terminal case will never be true here, because we can always look one move ahead, so we're never actually going to hit that condition. Then we want to return our heuristic value, which is what we read in the paper. First, it's the maximum number of remaining possibilities, which is the thing we want to minimize. Then, to break ties, it's whether or not the guess that you want to make is a valid answer, because we prefer valid ones, but we have to be willing to use an invalid one. And then finally, we use the numbers one to six instead of colors, and sort numerically to break any remaining ties. So that's our heuristic value. It's just the concatenation of those three things. So let's look at the middle part. So we have this conditional, if maximizing player then. So who is the maximizing player, and who is the minimizing player? We're the code breakers. We're trying to break this code. So we want to minimize the number of remaining possibilities so that we can get closer to an answer. That makes us the minimizing player. Whereas the code maker wants to maximize the number of remaining possibilities. They want us to have as many unknowns as possible so that we don't get to the answer any sooner. So if we're the maximizing player, we go through all of the different possible children of the node, all of the possible different states that could come out of the game state we're in now.
And for each one of those, we calculate the heuristic value by calling one level deeper into this function, where our depth is now zero. And then we calculate, overall, the maximum of those values. So you see some code here about starting from negative infinity and taking pairwise maxima between a value and recursively calling the function. But really what you do in Ruby or a lot of languages is you just make a big list and then you call max once to calculate what the maximum value is. So the pseudocode calls max many times on just two elements at a time, comparing them. When we write this code, we'll probably write one big max that compares all the values at once, just because it's easier. And then for the minimizing player, it's just the opposite. They start from positive infinity and take pairwise minimums between that existing value and the heuristic values that come back from recursing down into this function. So again, we're going to take the minimum once in Ruby, probably, instead of taking it 100 times on pairwise values, but it's the same thing. So the important part here is just that when you've recursed all the way into the function, you're returning back the value of a heuristic. And until then, you're just taking either the maximum or minimum heuristic value, depending on which player you're currently representing in the algorithm. So just to walk through it one more time, we're going to call the function once, with a depth of two. And we're going to be the minimizing player on the first call to the function. So we're going to take the min across all the recursive calls into the function, which in this case are made as the maximizing player and with a depth of one. And then as the maximizing player, in each one of those calls, we're going to take the maximum value from recursing a second time and calculating that heuristic and returning that value out.
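The pseudocode walked through above can be sketched in Ruby roughly as follows. This is a minimal sketch and not the code from the talk: the Node struct, its value and children fields, and the toy tree are hypothetical stand-ins for a game state. It takes one max or min over a list rather than comparing pairwise from infinity, and the leaf values reuse the "16 possibilities" example from the paper discussion.

```ruby
# Hypothetical game-tree node: `value` is the heuristic for that state,
# `children` are the states reachable from it.
Node = Struct.new(:value, :children) do
  def terminal?
    children.empty?
  end
end

def minimax(node, depth, maximizing_player)
  # Base case: we have looked far enough ahead, or the game is over.
  return node.value if depth.zero? || node.terminal?

  # Recurse one level deeper, switching players each time.
  values = node.children.map do |child|
    minimax(child, depth - 1, !maximizing_player)
  end
  maximizing_player ? values.max : values.min
end

# Toy example: 16 answers remain. Guess A might leave 15 or 1 of them,
# guess B always leaves 8.
leaf = ->(remaining) { Node.new(remaining, []) }
guess_a = Node.new(nil, [leaf.(15), leaf.(1)])
guess_b = Node.new(nil, [leaf.(8), leaf.(8)])
root = Node.new(nil, [guess_a, guess_b])

# The code breaker is the minimizing player and looks two levels ahead.
best_worst_case = minimax(root, 2, false)
puts best_worst_case
```

The outer call takes the minimum of each guess's worst case, so guess B wins here even though guess A could get lucky.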
So as the maximizing player, we're going to call that third time into the function, get the heuristics, return back out, return out our maximum to the minimizing player, and then the minimizing player is going to compare all those maxima together and take the overall minimum. So we're trying to get the minimum of our potential worst cases here. So at this point, we've kind of done the hard work. We've read the paper. We figured out where the algorithm is actually described in it. We've walked through a basic way that that algorithm works based on Wikipedia, and it's time to look at the code. So all this boils down to just something like 60 or 100 lines of actual code. Some of it I've made terse so it will fit well on slides. Some of it I've left a little more expanded. Don't assume that this is the best way to code all of this. It's just a way that was convenient for showing on the screen. That's not always the same as the best way to write code that's going to be read when you have a lot more screen real estate. So first, I'm going to show you the basic logic needed for Mastermind. So when we start a new game, which we'll do by calling the play method, we need to set up a few things. At the beginning, we haven't made any guesses yet. We need to pick a randomly generated correct answer that's going to then sort of be hidden from us behind a screen. So four times, we pick a random number between one and six to represent a color, and join those together. So then we have a string with something like 2134 in it that represents the correct answer. And then we make copies of the list of all possible scores and the list of all possible answers. I'll talk about that in a little more detail in a minute. And then this is the main body of our Mastermind game. This is the sort of structure of the game. While we haven't made 10 guesses, while there are still guesses left to make, we call our function that makes a guess.
And if the list of all possible answers includes that guess, meaning it's a valid guess, it's something that we're actually allowed to pick, it's like we didn't say whatever, cyan, magenta, some color that's not in there, then we increment the number of guesses that are made, calculate what the score would be for that guess. And if that score is four black pegs represented here by the letter B four times, then we print out what the correct answer was and how many guesses it took us to get there and we break out. So what you might notice here is this code doesn't actually handle losing. But our algorithm always wins in five moves or less, so we don't really have to worry about losing after 10, since we know the code is correct. But I promise that when I was originally writing this, I did handle the case where I didn't get to bbbb, and if you ever implement this yourself, you should absolutely do that until you're 100% positive your code is right, because you don't want it to just run and spit out no output and have no idea why. Cool. So the other important part of this function is calculating what the score is for a given combination of guess and answer. So I set up a few different variables to start with. I have a string ready to store the score. I have some lists ready to store the list of pegs that aren't correct in the guess and the list of answer pegs that while correct, if we knew them, don't match anything in our guess. And then I pair up those sets of pegs. So each peg in the guess, I pair up with the same peg in answer, so the first peg of each, the second peg of each, et cetera. And then for each pair of pegs, I compare them. And if they're the same, I add a black peg to my feedback. So right if all four of them are the same, you'll get four black pegs. If they're not all the same, you'll get some lesser score. And then I take any wrong pegs and I store them in those lists I set up for later, those arrays. 
And then finally, I go through the list of all the pegs from the guess that were wrong and see if there are any answer pegs that were in a different spot, but that actually match that color. And if there is a match, then I take that peg out from further consideration and I add a white peg to my answer. So if you picked blue in the first slot, but the blue is actually in the third, this will see that there was a blue on each side, give me a white peg in my score, and remove that peg from further consideration. And then finally, I just return out the score. So that's the basics of the game. We go through the loop, somebody makes a guess, we calculate the score, and if it's right, we finish. If it's not right, we keep going until we get up to 10 guesses. So before we move on to the specific algorithm that's given in the paper, I want to show a little bit what we're actually talking about, what this game is going to do. So luckily, we can write our own make guess function really easily that lets us guess from the command line and try out the structure of the program. And it's really simple. It basically just prints out the current game state. If you've already got a score for your last guess, it prints that out, otherwise just a blank line to start the game off. It prints how many guesses you have left, and then just gets some input from the user at the keyboard. So real quick, I'm going to switch over to mirroring my display so that you can all see what this game looks like. Is this big enough? Can you raise your hand if this isn't big enough for you to read? Cool. All right, so when the game starts, you have 10 guesses left, and you just make a guess. And remember, we're using numbers instead of colors. So I'm going to go ahead and guess 1111. OK, cool. So we actually got really lucky here. There's three ones in the answer. So I could probably figure out some more complicated way to guess and try and eliminate multiple numbers at a time.
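The scoring rules described above can be sketched in Ruby like this. It is a sketch under assumptions, not the code from the slides: the method name score and the variable names are my own, and codes are assumed to be equal-length strings of digits, with "B" for a black peg and "W" for a white peg.

```ruby
# Score a guess against an answer. Both are strings such as "2661".
# Returns black pegs first, then white pegs, e.g. "BBW".
def score(guess, answer)
  feedback = String.new
  wrong_guess_pegs = []       # guess pegs that missed their own slot
  unmatched_answer_pegs = []  # answer pegs not matched in place

  # First pass: pair up pegs by position and hand out black pegs.
  guess.chars.zip(answer.chars).each do |guess_peg, answer_peg|
    if guess_peg == answer_peg
      feedback << "B"
    else
      wrong_guess_pegs << guess_peg
      unmatched_answer_pegs << answer_peg
    end
  end

  # Second pass: a wrong guess peg earns a white peg if the same color is
  # still unmatched somewhere in the answer. Each answer peg is used once.
  wrong_guess_pegs.each do |peg|
    index = unmatched_answer_pegs.index(peg)
    next unless index

    unmatched_answer_pegs.delete_at(index)
    feedback << "W"
  end

  feedback
end

puts score("1111", "1112")
puts score("1222", "2661")
```

Removing each matched answer peg is what stops a guess like 1111 from collecting extra white pegs when the answer has only one 1.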
But I'm just going to keep this really simple. I'm going to guess 1112. OK, so we got it in two guesses. So that's not a great example. Yeah, I'm a genius. No, normally it doesn't go quite that easily. But at least you're starting to get an idea of how the game plays. Guess, score, guess, score. All right, so let's try this again. All right, so this time I know there's one 1. So if I do 1222, OK, so there's one 1 and one 2, but they're in the wrong places. That means the 2 has to be first, and the 1 is somewhere other than first. OK, so the 2 is now in the right place. The 1 is still in the wrong place. And again, the 1 is still in the wrong place. There's no threes, no fours. So 2551. OK, so there's also no five. So at this point, the 2 and the 1 are in the right place. And I'm hoping that the other ones are six, because otherwise I made a mistake. OK, cool. So in six guesses, we managed to figure out that the secret code was 2661. And I did that just through normal process of elimination. There's no tricks. I have to say that in general, we got very lucky there. Personally, if I'm playing using the best strategy I know, I always win in eight moves or less. This strategy should generally always win in 10. We got really lucky winning in two and six. But anyway, that gives you an idea of how the game plays and what the computer needs to do here in order to win in five. It needs to make more efficient guesses that reduce the problem space faster, because otherwise it'll get stuck winning in six, seven, eight like we just did. Cool. So I just have to switch back to my slides. Cool. So let's get into our algorithm. Let's dive into the algorithm given in the paper. So there's some information that we need to set up before we can actually use the Minimax algorithm. And it's actually a little slow. So what we're going to do is build it up once in the initialize method of our class. And then we're just going to reference it.
So if we want to play over and over and over again, like we just saw me do at the console, but obviously with the computer doing it, it's going to be much faster. It doesn't have to run this every time. We can store it all up at the beginning, and then just play in a loop, and it'll be fine. So first we set up a list of all possible answers, which is the Cartesian product of the numbers one to six, representing the different colors, with itself, repeated four times. So everything from 1111, 1112, 1113, all the way up to 6666. And then we build up a map between all the possible guesses and answers and the score that would be represented by that combination. So we take each possible answer combined with each possible guess, and those are, of course, the same list. You'd only make a guess if it was a theoretically possible answer. You may have eliminated it, but it could at one point theoretically have been true. And we calculate the score for them. So this isn't optimized. For every combination of guess and answer, there's an equivalent combination of answer and guess, so we're calculating that score twice. But we're only doing this once, when you first start up the program, so we don't care too much about the time. And then in the end, we want to convert that list, that array of all possible answers, into a set so that we can do fast lookups into that set rather than having to iterate through a really big array to find out whether something is in it or not. And we're going to need all this later so that we can look up scores rapidly and so that we can check whether answers are valid or not rapidly. So finally, the climax, the algorithm itself. We again are making a function called make guess. And we're going to start out just looking at the case where this isn't our first guess. So if guesses is greater than 0, if we have already made a guess. So for now, just forget about the first one.
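The setup described above can be sketched in Ruby as follows. It is a rough illustration, not the talk's actual initialize method: the helper names all_codes and build_score_table, the constant names, and the compact score helper are assumptions of mine. Building the full table means scoring about 1.7 million pairs, so the demo at the bottom builds a tiny two-color, two-peg table instead.

```ruby
require "set"

# Compact scoring helper (hypothetical name): "B" per exact match,
# then "W" per right color in the wrong slot.
def score(guess, answer)
  misses = guess.chars.zip(answer.chars).reject { |g, a| g == a }
  leftover = misses.map(&:last)
  whites = misses.map(&:first).count do |peg|
    index = leftover.index(peg)
    leftover.delete_at(index) if index
  end
  "B" * (guess.size - misses.size) + "W" * whites
end

# Cartesian product of the digits with themselves, one factor per peg,
# in numeric order: "1111", "1112", ..., "6666" for the standard game.
def all_codes(colors: 6, pegs: 4)
  digits = (1..colors).map(&:to_s)
  digits.product(*Array.new(pegs - 1) { digits }).map(&:join)
end

# Nested hash so that table[guess][answer] is the score for that pairing.
def build_score_table(codes)
  codes.each_with_object({}) do |guess, table|
    table[guess] = codes.each_with_object({}) do |answer, row|
      row[answer] = score(guess, answer)
    end
  end
end

ALL_ANSWERS = all_codes           # Array, keeps numeric order
ANSWER_SET = ALL_ANSWERS.to_set   # Set, for fast membership checks

small_table = build_score_table(all_codes(colors: 2, pegs: 2))
puts ALL_ANSWERS.size
puts small_table["12"]["21"]
```

Keying the table by guess first matches how it gets used later, where each guess is evaluated against every answer that is still possible.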
We're going to go through the list of all possible answers, which we copied at the beginning of our program from the list of all answers. And we're going to use the keep_if method, which is a set method that lets you filter out items from the set in place, without returning back an array, which is what would happen if you used the select method or something on a set. So we just go through all the possible answers, and an answer is still possible if the score we just got back from our previous guess matches the score you would get if that was the correct answer for the guess we just made. So we're just reducing the possible answers we think it could be every time by checking each answer against the score that we just got back. So this is really the meat of it here. This is a little dense, so I'm going to go through it just a couple lines at a time. If you don't follow every bit of it, don't worry about it. That's why the code is all going to be posted online. So we go through our possible scores hash. It's keyed by the guess, and then its values are a hash that maps each potential answer to the score that you would get there. And we filter this just like we did for our list of possible answers to exclude ones that don't match what we just found out from our last score. And then we store that back for later. Because every time we get more information, we want to make these data structures smaller and smaller so they're faster and faster to go through the next time, and so that we don't have to do the same thinking more than once. We can gain information every pass we make through. So this is basically our first call to the function for the minimizing player, the code breaker, if you think back to that pseudocode function that we were calling recursively. So for each group of scores by answer that we have now filtered to eliminate impossible things, we group those by the score itself. Meaning that we have a score like BBW.
And you group it into a list of them. So instead of having one long array with a bunch of the scores repeated a bunch of times, you want to just combine it so that you have one entry for each score and then a list that contains every occurrence of it. And then we're going to count the length of those lists. So what we're going to end up with is a list of numbers like 5, 17, 28. And those are going to be how many times each given score showed up in that list. So the number of different paths there are to getting any given score. And then we're going to take the maximum one. So here we've recursed, and here we are as the maximizing player now taking our maximum. We look at all the different scores and how many different ways we could get to each one. And we look at which one is the worst. Which one has the most possible routes to that score? And that's our worst case, because if we get that score, we gain the least additional information. And so we're in the worst possible condition. So we take that max and then we check, is the guess that we're currently looking at possible or impossible? And remember, we want to prefer possible guesses, things that could be the right answer, to ones that aren't, but we need to consider both. And then we calculate our heuristic value, which is the worst case number of possibilities, whether or not the guess is possible, and then the numeric value of the guess itself. And yeah, we've now calculated our heuristic. We've taken a maximum, and we're ready to take a minimum now on our outermost call to the function. So remember the general structure of this. If guesses is greater than zero, if we're not on our first guess, we map over all the different possibilities and calculate our heuristics. And then we want to take the minimum of all our heuristic values because we want to minimize our maximum.
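The filtering and selection steps described above can be sketched in Ruby along these lines. This is an assumption-laden sketch, not the talk's implementation: the names prune! and make_guess and the compact score helper are mine, scores are computed on the fly rather than read from a precomputed table, and validity is encoded as 0 (possible) or 1 (impossible) because Ruby cannot compare true with false when sorting arrays.

```ruby
require "set"

# Compact scoring helper (hypothetical name): blacks first, then whites.
def score(guess, answer)
  misses = guess.chars.zip(answer.chars).reject { |g, a| g == a }
  leftover = misses.map(&:last)
  whites = misses.map(&:first).count do |peg|
    index = leftover.index(peg)
    leftover.delete_at(index) if index
  end
  "B" * (guess.size - misses.size) + "W" * whites
end

ALL_ANSWERS = %w[1 2 3 4 5 6].repeated_permutation(4).map(&:join)

# Keep only the answers that would have produced the score we just saw.
def prune!(possible, last_guess, last_score)
  possible.keep_if { |answer| score(last_guess, answer) == last_score }
end

# Pick the guess with the smallest worst case. `possible` must be a
# non-empty Set of answers still consistent with all feedback so far.
def make_guess(possible, candidates = ALL_ANSWERS)
  heuristics = candidates.map do |guess|
    # Maximizing player: the score bucket that leaves the most answers.
    worst_case = possible.group_by { |answer| score(guess, answer) }
                         .values.map(&:size).max
    # Ties go to guesses that could be the answer, then to numeric order.
    [worst_case, possible.include?(guess) ? 0 : 1, guess]
  end
  # Minimizing player: arrays compare element by element.
  heuristics.min.last
end

possible = ALL_ANSWERS.to_set
prune!(possible, "1111", "")   # no pegs at all: the answer contains no 1s
puts possible.size
puts make_guess(Set["1111", "2222"])
```

Putting the guess last in the heuristic array is what lets a single min handle both tiebreakers and still hand back the guess itself.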
We want to pick the guess that's going to leave us with the best worst case, that in the absolute worst case is still going to eliminate as many possibilities as possible. And then we just take the last item from that heuristic. Because remember, the heuristic was a number, and then whether or not the guess was possible, and then the value of the guess itself. So we're just pulling the guess itself back out of the heuristic by taking the last one. And finally, if this is the very first guess we're making, we just guess 1, 1, 2, 2. And this is kind of explained in the paper, but it's also basically brute force. If you try all the potential different first moves, the only one that ends up with you always winning in five moves or less is two of one color and two of another. So this was just determined through exhaustively running through all possible first moves and finding out that this one works the best. If you try 1, 1, 2, 3, or some other combination, you end up sometimes taking six moves to win. So this is the best one, because it's the best one. There's not really a ton of logic there. And this is something that if you play this a bunch as like a teenager or something, or as an adult, certainly, you'll figure this out for yourself that this is the best way to start. So that's it. If you just call that play method, the game will play itself, and I'm going to switch back a little bit and show you what running the program looks like. So if I run interactive mastermind, it'll let you play it manually. Or if you just run mastermind, and I didn't add any typos at the last second, it'll spin for a while. And this is building up those data structures that you need, all the scores for every combination of guess and answer, the list of all possible answers. It's setting those up at the beginning. And then once it gets those set up, it'll start to play itself automatically over and over. Just ran the play method in a loop. And that should start any second now. 
Yeah, and then it just plays itself. And as you see, this algorithm, it very often wins in five moves, because that's what it's optimized for. It's not optimized to win in as little as possible. It's optimized for the worst case to be no worse than five. Because remember, the algorithm we picked minimizes the maximum value. Oh, wow. So anyway, there's a typo in my program. But I was making some quick changes right before, so I knew it was possible. But in general, it wins in five moves or less. Luckily, it's not taking 10 anyway, right? That would be really bad, if my program would crash. Yeah, so with the bug fixed, this code will all be available afterwards. And I am glad to take questions.
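The play loop described earlier in the talk, including the advice to handle losing explicitly, might look something like this in Ruby. It is a hedged sketch rather than the code that will be posted: the play signature, the guesser callable, and the scripted guessers are hypothetical. Unlike the version described in the talk, which only counts a guess when it is a valid code, this one raises on an invalid guess.

```ruby
MAX_GUESSES = 10
COLORS = %w[1 2 3 4 5 6].freeze

# Compact scoring helper (hypothetical name): blacks first, then whites.
def score(guess, answer)
  misses = guess.chars.zip(answer.chars).reject { |g, a| g == a }
  leftover = misses.map(&:last)
  whites = misses.map(&:first).count do |peg|
    index = leftover.index(peg)
    leftover.delete_at(index) if index
  end
  "B" * (guess.size - misses.size) + "W" * whites
end

# `guesser` is any callable taking (last_guess, last_score) and returning
# a four-digit code string. Returns the number of guesses used, or nil
# when the guesses run out, so a loss never passes silently.
def play(guesser, answer: Array.new(4) { COLORS.sample }.join)
  guesses = 0
  last_guess = last_score = nil

  while guesses < MAX_GUESSES
    last_guess = guesser.call(last_guess, last_score)
    raise ArgumentError, "invalid guess" unless last_guess.match?(/\A[1-6]{4}\z/)

    guesses += 1
    last_score = score(last_guess, answer)
    return guesses if last_score == "BBBB"
  end

  nil
end

# Replay a shortened version of the second console game from the demo.
script = %w[1111 1222 2551 2661].each
scripted = ->(_last_guess, _last_score) { script.next }
result = play(scripted, answer: "2661")
puts result

# A guesser that never learns loses after ten guesses.
stubborn = ->(_last_guess, _last_score) { "1111" }
puts play(stubborn, answer: "2661").inspect
```

Passing the guesser in as a callable keeps the loop the same whether a person is typing at the console or the minimax strategy is choosing.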