 Dyna, ddodd yn gweithio i mi ar y cyfnodd. Dwi'n meddwl i'r rhubi, o'r cyfrifodau rhaid o'r cyfrifodau rhubi, o'r cyfrifodau rhubi o rhaid o'r cyfrifodau rhubi. Dyna dwi'n meddwl i mi, mae'r ddaid, ddodd, o'r ddorothi. Rwyf wedi'i ddweud y Bryton yng Nghymru o'r south cof, ddodd yn ddweud, mae'r cyfrifodau rhaid o'r cyfrifodau. I'm a bit of a maths nerd. I studied it in my spare time for a few years with the Open University. I previously helped to organise the branch of code bar. If you don't know what code bar is, you should definitely check them out. They do some great work for underrepresentation in tech, and I currently work for Happy Bear software, a fabulous Rails consultancy. They helped get me here, so thanks Happy Bear. Now to today. Today I'm going to be taking down a bit of a rabbit hole I went down recently, which is completely unrelated to WebDiv and possibly even a little unrelated to Ruby, but trust me, it's going to be fascinating. We're going to be looking at rock, paper, scissors. Rock, paper, scissors is a simple game played between two people, but just as a formality, let's briefly explain the rules. Each player simultaneously chooses one of three moves. These moves are rock, paper and scissors. If both players choose the same move it's a draw, otherwise we have rock, beats scissors, scissors beats paper and paper beats rock, but I've bloody no idea how paper manages to beat rock. In Malaysia they have a far more sensible version where they have water instead of paper and bird instead of scissors. Speaking of other versions, here's my personal favourite, Bear Hunter Ninja. The earliest Japanese version of the game, which was imported directly from China, has instead a frog, a slug and a snake. Going up a gear, there's also rock, paper, scissors, lizard, spock, which I'm sure you've seen Sheldon play on Big Bang Theory, but he never mentions this version, rock, paper, scissors, fire, water, air, sponge or rock, paper, scissors, fire, water, air, sponge, gun, human or rock, paper, scissors, fire, water, air, sponge, gun, human, devil, wolf and then you can add dragon, snake, lightning, tree and then there's 25, but that's all nothing compared to rock, paper, scissors, 101. Pretty ridiculous. But there are some great moves in it, one of my favourite is the monkey move. He beats a lot of other moves by either flinging poop, climbing on them, scaring them or ripping them up, but of course monkey doesn't always win. Monkey is enraged by math, which I'm rather fond of, not just because I like maths but makes me look kind of street. And another good one, which actually involves movement, is money, but you should be careful to remember that these versions of the game are outlawed within the official rock, paper, scissors society due to their unnecessary complexity. Yes, there's a world rock, paper, scissors society and this is their shiny new website, but I'm not sure how seriously you should take them. Even lizards are using rock, paper, scissors, but for them it's personal. This is the male side blotched lizard. Now these lizards don't actually play rock, paper, scissors, that would be ridiculous, but rock, paper, scissors does model their mating behaviour well. The male lizard comes in one of three colours, orange, blue and yellow, and these colours map to the move that they play. So the orange lizard is greedy and ultra dominant and is the largest of the three and the orange lizard establishes large territories with loads of females. Whereas the blue male is a little bit smaller, he's quite content, a little bit of territory, one single female, but because of this the orange is much larger and can beat blue and take over his territory in a single female. Now the yellow is sly. He's actually the colouring of a sexually mature female so they have no territory but sneak into oranges and take their territory mate with the females because the orange is too much territory to defend at once. So we have orange beats blue, blue beats yellow as blue only has to defend one female and we have yellow beats orange as it can sneak into the territory. But going back to rock, paper, scissors, you may be thinking how much can there be to rock, paper, scissors. There's only one way to play and that way is random. It's the optimal move. Well I would say there's more to it than that and there are actually a couple great techniques on ways to win given to us by the lovely World Rock Paper Scissors Society and their lovely World Rock Paper Scissors Society guide. One such move is called the crystal ball. The aim is to confuse your opponent by declaring what move you will play next, ignoring the fact you don't actually know what they'll play. So you hope they get confused and won't play the move, you say that they will. So you say you're going to play paper aren't you, then they won't play paper but one of rock or scissors making rock a safe bet to play. A more difficult strategy for the seasoned World Paper Scissors player is tattoo play. The aim is to distract with subtle clues to influence your opponent such as wearing a t-shirt that says I'm going to play paper or a belt with the word scissors across it. Some may even go so far as to make body modifications with ROC and K8 tattooed across their knuckles. Of course bots are unable to employ these strategies. What techniques could a bot use? Well there's this bot which has a 100% winning rate. In one millisecond it can recognise the position and move of the opponent's hand and immediately play a counter move. But although impressive I would say this is clearly cheating. No we will be looking at techniques to decide what the opponent will play before they play it. So let's think about the problem we're tackling. In terms of gameplay rock paper scissors can be thought of as a scaled down version of poker. In poker you're unaware of your opponent's move the moment you're making yours meaning you have to model your opponent. Are they frequent bluffers? Do they have a tick when they bluff? Not knowing your opponent's move when you're making yours is known as imperfect information and this is also a feature of rock paper scissors and it is this imperfect information that makes rock paper scissors so interesting to model. In contrast games such as chess are known as perfect information games. Each player can see all of the pieces on the board at all times. Furthermore in rock paper scissors there is only one winner and one loser and this is known as zero sum. Zero sum games are called such as the outcome of each game is zero. So let's say we have a win is plus one and a lose is minus one and a draw is zero. We have one winner one loser one plus minus one is zero and two draws is zero. So in contrast where participants can all gain or all suffer together are known as non-zero sum. Now let's start looking at competitions and strategies to crush our opponents. In 1999 and again a year later a rock paper scissors bot competition entitled the International Roe Shambo Programme Competition was run by Dars Billings. He invited people to enter a bot to compete in a round robbing competition. Each bout between two bots consisted of 1,000 rounds of rock paper scissors. They had an open round where every match result counted and they had a best of the best where the greatest number of wins of a bout gave the winner of that bout. To encourage people to create bots that actively tried to beat each other they added dummy bots into the mix. Now not all the players were optimal and this changed everything. This meant that to beat these dummy bots you had to you could and had to and could deed to detect patterns of the dummy bots play and employ an appropriate counter strategy. TLDR if we have some not so good bots we can have some really awesome ones to beat them. So I came powder was the winner of this competition and we're going to step through the evolution of that bot. So first let's look at the most well known strategy. This is random bot. So FYI this is what all our examples will look like. Each bot has a state containing our past moves and our opponents past moves and we have a next turn method containing which will return our next chosen move. We can see here that we are randomly returning one of R, P or S after rock, P for paper and S for scissors. So how does this bot fair against other bots? This is the always rock bot and I think any one of us could be this bot. However the random bot would still only win around 50% of the time. So random is often considered optimum as an optimum strategy as it is impossible to gain advantage over a truly random opponent. Equally however random won't be able to exploit opponents that do use strategies. Quite clearly random is far from optimal. Not losing is not the same as winning. So let's look how we would beat this bot. It's the frequency counting bot and it is a bot that is actively trying to beat other bots. It's using frequency analysis to determine which move to play. So here we increment our counter with their last move, we take the move with the highest count and we return our counter move. Against the always rock bot only the rock count in our hash would increase. This means this bot would beat it every time. In fact this bot would be any opponent that chose rock paper and scissors with different statistical frequencies. This bot doesn't choose rock paper and scissors with different statistical frequencies. This is our fixed string bot and it cycles through a set string here of moves. This string is the statistically balanced string which is a fancy way of saying the same number of R's, P's and S's. So we have our current position starting at minus one and we increase our current position wrapping at the end of the string and we get our move by taking the value of our current position in the string. So how might we generate our balanced strings or R, P's and S's? If we did it ourselves it would either take ages or it would be full of detectable patterns. This is where De Bruyne could come in which was used in the official tournament. De Bruyne is a cyclic sequence of order n on a size k alphabet in which every possible string of length n on A occurs exactly once as a substring. Simply put it's a sequence consisting of a set number of different elements in which every possible n length combination of these elements occurs. So looking at the string we just used we know we have three different elements. We have R, P and S and we are saying we want all substrings of length two. We can represent this like so where A is our set and B represents both the length of our set and the length of our substring. So in this string there's every possible two length combination of our set. R, R, P, SP and even SR wrapping round at the end. Now De Bruyne strings can be made of any set you'd like. So for example we could have the set zero one. So we have the set length two and let's say we want all substrings of length three and there are actually only two distinct strings where one is actually the reverse of the other. Now let's say we have a set one two three four and we want say all substrings of length four as well. Well that would actually give us this string. The lengths of the string are the length of the subset to the power of the substring of length of the substring. So note though these strings are actually optimally short. They are as short as they can be while still containing all the substrings. So going back to our RPS set let's say now we want all substrings of length six. We get this so we can see how quickly the length of these strings increases. You can generate De Bruyne strings quite easily even that super long one. This would allow you to use several or such as that super long one to make them quite hard to detect. So this is the code I use to generate De Bruyne strings. I translated the Python code taken off the Wikipedia page into Ruby code. Now using a fixed string is a pretty good starting strategy and now we have a strategy to beat the frequency counting bot. But how do we go about beating these balance bots such as our fixed string bot? We need a new method. We could employ direct history. Direct history is a pretty solid strategy for beating an opponent. We can store their move history and look for patterns. And let's say here is our opponent's last 30 moves where this is their latest move and this is their first move. Now let's say we take a window of length 10. So we take the opponent's last 10 moves and we search their history for a match. We find it once giving us a count of one. We also get their following move. This would be the move we'd want to beat and we can see it's P. We could also look at a window size of nine which gives us a single match with the same following move to beat and we could keep decreasing our window size and checking for matches. Once we get to four we actually have two matches and we can see that we have the possible following move P or the possible following move R. And we get this same situation with a window size of three. So I put all of this in one of my favourite things, a table, and this now gives you several strategies. You could either A, take the matches of the longest length that give you this row so you'd be trying to count to P. You could take the matches of the highest count which would give you these three rows and you'd either count to P or R. Or you could take all the window sizes and take the highest frequency following move giving this column. Now there's a bunch of ways you could implement direct history but let's look at a most basic example first. So here's my pretty basic implementation. It's dirt simple and doesn't work for the first seven moves but it is a basic working example implementing dynamic history with a window size of three. We first append their latest move onto our string of moves. Then here we have some rejects to find matches of length three that match the end of the string. It also captures the following character to the match so we know what move we'd like to counter. We use this rejects to search for our opponent's history, we take their head and capture and we return our counter move. Let's look close about that rejects. So to understand this sequence it's probably a good idea to start at the end. So the dollar character means the end of the line and the backslash one represents a previously captured group. In this case the first capture group so we're looking from here backwards. And this is our first capture group. Capture groups are in round brackets. We're capturing any character of RP or S exactly three times and we know from the last slide we also want this to match at the end of the string so we can see we have the match R, R, R. Next again we are capturing any one of RP or S and this is the part where capturing our opponent's following move the one we want to counter which we can see is S here. And lastly we say let there be any one of RP or S zero or more times and now you might be able to see why seven is the maximum this would work with. Well we've got our window size of three so we want the last three moves plus we want the three to match it and then we want the following move so all that makes seven. So let's look at some examples implementing some of the other direct history strategies. Let's start with the biggest window size i.e. the longest match. This code implements biggest window size. We look through window sizes from our largest to our smallest so from here five down to two and we use our rejects we had previously to search for a match. If we find one we break out the loop and return their following move we're not interested in any smaller window sizes. We also call a fallback method which could return whatever you'd like but it's just in case we don't find any matches. There was also the greatest number of matches and this is where taking the match that occurs the most. It's similar to our previous bot getting the biggest window size but instead we get the last X number of moves and use this to scan the rest of their history. We count how many matches and take the following move so once we've looked through all the window sizes we counter the move for the greatest number of matches. Again we call our fallback method if we find we don't find any and then there was the frequency of following move and this takes every match of all window sizes and groups them by the following move. We then counter the following move that occurs the most. This code starts by setting up a hash of all the following moves starting with zero and the count is increased within the loop where we are again looping from our biggest window size to our smallest and it uses regex to find a match and again captures the following move and it uses the following move to add one to the matching move in the hash. After capturing all matches it takes the highest occurring following move and returns the counter move. So we have a selection of strategies now and we could make our bot play all of them but with only one move that we can play how do we choose which strategy? Well we can make it play them all and keep a tally of which is doing best over time but let's not worry about what for now. What we really need to worry about is our opponents guessing what we're doing and countering against it i.e opponent modelling aka mind games. Let's say we choose one of those strategies let's call it strategy A to play and let's say that strategy A tells us to play rock. Now let's assume our opponent is quite smart suppose our opponent knows the strategy we're using and thinks we're using it so they will play a counter to the move our strategy will tell us to play they will counter with strategy B and play paper. Now we think they may know our strategy and counter it in other words we think that they know that we have this strategy A that is telling us to play rock so we counter their counter and call bluff with strategy A prime so we play scissors now they may also think that we may bluff them in other words they think that we know that they know that we have this strategy A so they counter our bluff A prime with B prime and they play rock however breaking this down to the last two bluff moves we think that opponent has counter bluffed our bluff in other words we think that they know that we know that they though we have this strategy A so we double bluff with strategy A double prime and play paper but surely our opponent may counter double bluff our double bluff in other words they think that we know that they know that we know that they know that we have this strategy A so they may use strategy B double prime calling that causing them to play scissors now some of you might be a slightly ahead of me and be thinking aha next will be the triple bluff but that would be pointless as that gives us the same strategy as the straight move for all the talk about mind games these are just simple rotations if our straight no bluff move returns rock our bluff is scissors and our double bluff is paper and each of these bluff strategies can be combined with one of our previous strategies to make a meta strategy so meta strategy is a strategy and our rotation of one of our moves and each one of these cells is a meta strategy that we could apply now we have all the strategies we have before multiplied by our bluff strategies we need another strategy to decide which meta strategy to employ what we can do is store all the results of every one of the meta strategies so for each meta strategy that wins we add one to its score several meta strategies will win at the same time and when it loses we subtract one from its score the meta strategy that is doing the best at any given point is the strategy we choose this is our strategy to choose a meta strategy our meta meta strategy now let's jump to much further down the line we've played hundreds of rounds now and we've got a winning meta strategy here that we are playing and it continues to do well but what if our opponent suddenly changes strategy just for moving one from the value would take a long time for the numbers to change and for us to change our meta strategy an alternative would be to reduce by a percentage of the value eg 10 so the values could decay here we can see the value is quite quickly reduced and we can adapt to any new strategies our opponent uses and we are now at the point where our original meta strategy is much lower than our new winning meta strategy and it didn't take that many rounds to adapt we now understand ways in which we can manage our table and select our winning strategy we might also want our table to take into account amounts of past moves so we could use different lengths of move history when applying each strategy our rows for example we can run each of our strategies over the last five rounds over the last 50 rounds and over the last 100 rounds all at the same time then we can use the general meta strategy to figure out which one does best so in our new table here we have the past 10 rounds for every strategy and the past 100 rounds for every strategy we then have the same situation we have before giving scores as they win and lose but we are applying our strategies twice with the past 10 and over the past 100 and this could allow us to adapt to new strategies more quickly for example if our opponent changes strategies we could be evaluating their past 100 moves and their past 10 moves but our evaluation of their past 10 moves is less influenced by their previously used strategy allowing these meta strategies to do really well on top of this we can apply the same variation history of past moves to each of the meta strategies so we're here we have our table duplicated one table covering the past 1000 moves rounds and one covering the past 500 rounds then we can use the meta meta strategy as before to decide which meta strategy to use this is how i came powder by dan edinall won the 1999 world rock papers is that international bot competition now i say i told you it's going to be fascinating and as dars billing stated simple rules do not preclude a deep strategic complexity and as a final note i've built these strategies meta strategies and the meta meta strategy into a slack bot you can play the slack bot and see how it reacts so here it starts learning my move and then when i change move it adapts and starts beating me again i built this mostly for fun but also as an example of these strategies working so you too can have this in your slack bot if you go to the above address you'll find a add to slack button so you too can have this in your slack and the second link is the github link here prs welcome of course i'll share these slides later on twitter as well thank you