Hello. Today I'm here to talk about understanding strategy and deception using game theory. A little bit about me: my name is Juno. I work as a red team consultant and also do security research. I moved to Texas in 2018 from Alaska and since then have become really involved with the local hacker community. I currently run DC 214, Dallas's DEF CON group, and I've been very involved in the Dallas Hackers Association, which is where I actually got my start speaking about game theory. Outside of the hacker community, I also volunteer and teach through Girls Who Code and Black Girls Code in the local schools. I've got my bachelor's degree in computer science and economics, and I'm getting my master's degree in cybersecurity from NYU. My contact information is up on the screen if you have any questions or comments or just want to know more.

First, a little bit of a roadmap. We're going to start by talking about what game theory is and how it has been used previously. From there, we're going to look at the prisoner's dilemma, one of the most popular game theory examples; we'll see how to find a simple equilibrium, and then look at a repeated game, where the players repeat the same interaction more than once. Next, we'll look at brinksmanship, where drastic action can force your opponent's strategy to change, and then at a signaling game, where players have the opportunity to trade information that may or may not be true, using a very simple attack-defense scenario; we'll look at finding an equilibrium there as well.

There are a couple of definitions to keep in mind. The first is game, the term I'm going to use for the interaction we're modeling, and the second is players, the term we'll use for the ones doing the interacting, whether or not they're human. Rationality, as defined by economists, is a little bit different from the everyday meaning: it's defined as doing what is best for oneself 100% of the time. That's quite a different definition than I'd say most of us use, but we need it as a rule in order to explain why players choose the highest payoffs in our models moving forward.

So we're going to start with one of the most common game theory examples, the prisoner's dilemma. You're probably familiar with it; it shows up in the pilot episode of half a dozen cop shows and in intro psych and philosophy textbooks. But if you're not: two individuals are arrested after committing a crime. They're separated and offered the same plea deal, which gives them two choices. The first choice we'll call defect: confess, and you will go free while only your partner is charged. The other we'll call cooperate: stay silent and say nothing. However, if both players choose to defect, both will be charged. If both cooperate without communicating, both will get a lighter sentence. Now, obviously that's the best option, but is that what really happens?

Here I have a payoff matrix for our two prisoners, Alice and Bob. Both have the option to cooperate or defect, and the numerical values I've assigned as payoffs just represent the amount of good to each individual player. For example, if Alice chooses to defect while Bob chooses to cooperate, she gets the highest payoff of two, the best possible outcome for Alice: getting off free. Bob gets the payoff of negative one, the worst possible outcome for him: the maximum sentence.
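To make that matrix concrete, here is a minimal Python sketch of it; this is my own illustration rather than something from the slides. The mutual payoffs of 1 (both cooperate) and 0 (both defect) are the same values the repeated-game math uses in a moment.

```python
# The prisoner's dilemma payoff matrix as (Alice, Bob) tuples.
payoffs = {
    ("cooperate", "cooperate"): (1, 1),    # both get the lighter sentence
    ("cooperate", "defect"):    (-1, 2),   # Alice takes the maximum sentence
    ("defect",    "cooperate"): (2, -1),   # Alice goes free
    ("defect",    "defect"):    (0, 0),    # both are charged
}
actions = ("cooperate", "defect")

def best_response(bobs_choice):
    """Alice's best action given what she believes Bob will do."""
    return max(actions, key=lambda a: payoffs[(a, bobs_choice)][0])

for belief in actions:
    print(f"If Alice believes Bob will {belief}, she should {best_response(belief)}")
# Both beliefs point to "defect", which previews where this is headed.
```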
Now, because Alice and Bob are separated, they're essentially making their decisions independently of each other; this is a simultaneous game. All each player has to go off of is what they believe their opponent will choose to do. If Alice believes that Bob will choose to cooperate, she'll choose to defect for that higher payoff. Similarly, if she believes Bob will choose to defect, she'll choose to defect as well, again for the higher payoff. And because Bob is in the exact same situation and has the exact same payoffs, he'll make the same choices: defecting if he believes Alice is going to cooperate, and defecting if he believes she'll defect. This leaves us with the logical outcome of the game, or the Nash equilibrium, where both players defect. Obviously, this isn't the best outcome of the game for either player. If both players choose to cooperate, we have what's called a Pareto optimal outcome, an outcome that's better for both players.

However, real life doesn't happen in a vacuum. Nothing in my previous model addressed any of the potential consequences of these decisions outside of the direct interaction. So we can start to make the model more realistic by looking at a repeated game, where Alice and Bob end up in the same situation over and over and have to choose whether to cooperate or defect each and every time. When the same interaction happens repeatedly, players can begin to use a punishment strategy to motivate each other to make the choices they want. There are many types of punishment strategy, but we're going to look at the grim trigger punishment strategy: cooperate until betrayed, then defect forever after that. Obviously, this leaves the player open to betrayal at the beginning, but there's retribution if the player is betrayed.

One thing we have to keep in mind when we're looking at the math for this, though, is the discount factor, which is a measure of patience. Essentially, future payoff is worth less than present payoff: $10 now is worth $10, while $10 two days in the future is worth $10 times D, where D is a number from zero to one that represents the decrease in payoff.

So if you're in this scenario and you believe your opponent is using a grim trigger punishment strategy, you know that as long as you haven't defected, your opponent will cooperate with you. The payoff to cooperate is (1 - D) × (1 + 1·D + 1·D² + …): the payoff of one you get for cooperating now, plus one for cooperating next round discounted by D, and so on out into the future. This simplifies down to just one. If you instead choose to defect, your payoff is (1 - D) × (2 + 0·D + 0·D² + …): the payoff of two you get for defecting right now, plus zero, the best possible payoff you can get after that now that your opponent will defect every single time. This simplifies down to 2 - 2D. When we put these together, we find that the payoff to cooperate is greater than the payoff to defect as long as the discount factor is greater than one half.
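Here is a small sketch, again my own rather than the slides', that checks this algebra numerically by truncating the infinite payoff streams after a thousand rounds:

```python
# Normalized discounted payoff: (1 - d) * sum of d^t * payoff_t.
def discounted(stream, d):
    return (1 - d) * sum(payoff * d**t for t, payoff in enumerate(stream))

for d in (0.25, 0.5, 0.75):
    cooperate = discounted([1] * 1000, d)        # 1, 1, 1, ... simplifies to 1
    defect    = discounted([2] + [0] * 999, d)   # 2, 0, 0, ... simplifies to 2 - 2d
    print(f"d = {d}: cooperate = {cooperate:.3f}, defect = {defect:.3f}")
# Cooperating pays more exactly when 1 > 2 - 2d, that is, when d > 1/2.
```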
So for cooperation to have a higher payoff than defecting, the future payoff must be worth at least half of the current payoff. However, if you do choose to betray your opponent, you know that they will play defect indefinitely. Then your payoff to cooperate is (1 - D) × (-1 + 0·D + 0·D² + …): the payoff of negative one you get for cooperating when you know your opponent is going to defect, plus zero, the best possible payoff you can get in every round after that. This simplifies down to D - 1. The payoff to defect is (1 - D) × (0 + 0·D + …), the payoff of zero you get for defecting along with your opponent out into the future, which is just zero. So the payoff to cooperate in this scenario is only greater than the payoff to defect when D is greater than one, which isn't possible, because the discount factor is between zero and one. If you believe your opponent is using a grim trigger strategy, once you have defected it is better to continue to defect.

Next we're going to talk about brinksmanship, which is defined as the practice of trying to achieve an advantageous outcome by pushing dangerous events to the brink of active conflict. Here we have another scenario between Alice and Bob, where they're playing a game of chicken. Each player has the option to swerve or to stay. If both players swerve, they tie. If one player swerves while the other stays, the one that stays wins the game of chicken. However, if both players stay and neither of them swerves, they crash. Like in our last model, we can translate this into a payoff matrix, essentially ranking the outcomes for each player.

Like our previous game, this is a simultaneous game, and all our players can base their decisions on is what they believe their opponent will do. If Alice believes that Bob will choose to swerve, she'll choose to stay to win the game for that highest payoff of two. If she believes that Bob will choose to stay, she'll choose to swerve to avoid the crash, for that higher payoff of zero. Bob is in the same situation and has the same payoffs, choosing to stay if he believes Alice will swerve and to swerve if he believes she'll stay. So we have two Nash equilibria: one where Alice stays and Bob swerves, and one where Alice swerves and Bob stays.

Now, what if they had different motivations? What if, instead of Bob, Alice's opponent was, say, a stuntman in a stunt car who could take a crash without as much damage as she could? In this scenario, our stuntman would choose to stay whether he believed Alice would swerve or stay. That makes our Nash equilibrium the outcome where Alice chooses to swerve and the stuntman chooses to stay, instead of the two equilibria we saw with equally matched opponents like Alice and Bob.
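Here is the chicken matrix in the same brute-force style as before. The talk only pins down two of the four payoffs, 2 for winning and 0 for swerving against a stayer, so the tie value of 1 and the crash value of negative 5 below are my own assumptions; any values with the same ranking produce the same two equilibria. And if you raise one player's crash payoff above their swerve payoff, as with our stuntman, staying becomes dominant for that player and only one equilibrium survives.

```python
from itertools import product

actions = ("swerve", "stay")
payoffs = {
    ("swerve", "swerve"): (1, 1),    # tie (assumed value)
    ("swerve", "stay"):   (0, 2),    # Bob wins
    ("stay",   "swerve"): (2, 0),    # Alice wins
    ("stay",   "stay"):   (-5, -5),  # crash (assumed value)
}

def is_nash(a, b):
    """Neither player can gain by unilaterally changing their action."""
    alice, bob = payoffs[(a, b)]
    return (all(payoffs[(a2, b)][0] <= alice for a2 in actions) and
            all(payoffs[(a, b2)][1] <= bob for b2 in actions))

print([profile for profile in product(actions, actions) if is_nash(*profile)])
# -> [('swerve', 'stay'), ('stay', 'swerve')]
```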
Going back to the Alice and Bob example, here we have a decision tree that models the same payoff matrix we showed for them. Alice has the choice to stay or swerve, and then Bob has the choice to stay or swerve. Because this is a simultaneous game, Bob doesn't actually know whether Alice has chosen to stay or swerve; he doesn't know which node on our decision tree he's at, and the dotted line between our two Bob nodes denotes that lack of information. Brinksmanship is essentially the act of removing one of these options.

So let's say, as Alice is driving, she calls out: hey, I cut my brakes, I can't stop. What Alice essentially does in that scenario is remove half of the tree. She says: I cannot swerve, I'm going to stay, and you have to make your choice based on that. And so then Bob knows exactly where he is; he has to choose to stay or swerve based on his knowledge that Alice will stay. That's brinksmanship: removing half of the decision tree with a dangerous action to force your opponent into making a decision based on that.

Next, we're going to look at a very simplified attack-defense scenario between two players, an attacker and a defender. The defender has the choice to monitor their systems or not, and the attacker has the choice to attack a system they come across or to ignore it and move on to the next thing. If an attacker believes a defender is monitoring a system, they're going to pass on attacking it; if they believe the system is not monitored, they're going to choose to attack. Likewise, if a defender believes their systems will be attacked, they'll choose to monitor, but if they believe they won't be attacked, they won't monitor. So unlike in our previous games, we don't have a pure-strategy Nash equilibrium, a single logical outcome to the game.

To start to find an equilibrium, we're going to look at P, the likelihood the defender chooses to monitor their systems in the first place. For an attacker, the expected value of attacking something they come across is equal to P, the likelihood the system is monitored, multiplied by negative 10, the payoff for attacking a monitored system, plus 1 minus P, the likelihood the system is not monitored, multiplied by 10, the payoff for attacking an unmonitored system. So the expected value of attacking is -10P + 10(1 - P), and the expected value of passing is always 0. When we set those equal to each other, we get P equal to one half. This means that when it's 50-50 whether the system is monitored, the attacker is forced into indifference about whether or not to attack. We can see this with the graph I have here: the expected value to the attacker goes down as the likelihood that the system is being monitored goes up. By changing P, defenders can actually force attackers into indifference, or into making a different decision.
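Here is that calculation as a quick sympy sketch, my own, using the payoff numbers from the talk:

```python
# Find the monitoring probability P that makes the attacker indifferent
# between attacking (payoff -10 if monitored, +10 if not) and passing (0).
from sympy import symbols, Eq, solve

P = symbols("P")
ev_attack = P * (-10) + (1 - P) * 10
print(solve(Eq(ev_attack, 0), P))  # -> [1/2]
```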
Now, what happens when one player can lie? We've seen how information can change the outcome of a game, but how do we actually change that information? Using a signaling game, we can look at what happens when a player signals with information that may or may not be true in order to force their opponent into a different outcome. For our signaling game, we're going to look at the same game with two players, attackers and defenders, but this time our defenders are either high or low quality, and attackers want to attack low quality defenders. Any defender can signal that it's a high quality defender and its systems are secure, but signaling comes at a cost, and this cost is higher for low quality defenders. I should talk for a minute about what a signal actually looks like, because there is a big difference between a credible signal and one that's not. For example, if a company tweets "we're unhackable," we're going to see them go down in probably the next three hours. But a lot of the signals we see are things like buying overpriced security systems that never actually get properly implemented, stuff like that. This model is a bit abstract, but it's good to keep those real-world examples in mind. One thing to keep in mind is that the attacker does not know the capabilities of the defender it comes across; this makes this a game with incomplete information.

Here I have another decision tree, for our signaling game. The first move is a move by nature, determining whether any given defender is high or low quality. We're going to say high quality defenders appear one third of the time (about 33%), and low quality defenders, who can't monitor their systems, appear two thirds of the time (about 67%). Next is the choice by the individual defender to signal that they're a high quality defender or not, and of course that signal comes at a cost. Finally, there's the choice by the attacker to attack or pass. The dotted line between the attacker nodes denotes that the attacker does not know which part of the tree it's on: the attacker knows only that it has seen a signal, not whether the defender is high or low quality.

First we're going to look for a separating equilibrium, where different types of players do different things. We'll say that if a defender is high quality, they'll choose to signal, and if a defender is low quality, they'll choose not to signal. In this scenario, if an attacker sees a signal, they'll choose to pass, knowing that high quality defenders signal 100% of the time; if they see no signal, they'll choose to attack, knowing that only low quality defenders send no signal. Now, this equilibrium actually fails: our low quality defenders are choosing not to signal and are getting attacked for that payoff of negative 10, but if attackers are passing on 100% of signals, it's worth it for low quality defenders to pay to signal for a payoff of 7 (10 minus their signaling cost of 3) instead. And so this separating equilibrium doesn't work.

Next we're going to look for a pooling equilibrium, where all defenders choose to signal. If all defenders are signaling, attackers will attack some fraction of the time based on the probability the signal is coming from a high or low quality defender. The probability of a signal coming from a high quality defender is one third, the rate at which they appear in nature, and the probability of a signal coming from a low quality defender is two thirds, again the rate at which they appear in nature. The payoff for attacking a high quality defender is negative 10, and the payoff for attacking a low quality defender is 10. So the expected value to an attacker of attacking, given that they've seen a signal, is the probability the signal came from a high quality defender multiplied by negative 10, plus the probability the signal came from a low quality defender multiplied by 10, which works out to three and a third. The expected value of passing is always zero, so the expected value of attacking is always greater than the expected value of passing.
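Here is that pooling calculation numerically, again my own sketch with the talk's numbers:

```python
# Expected value of attacking on a signal when every defender signals,
# so a signal carries no information beyond nature's base rates.
from fractions import Fraction

p_high = Fraction(1, 3)   # chance the signaler is high quality
p_low  = Fraction(2, 3)   # chance the signaler is low quality
print(p_high * (-10) + p_low * 10)   # -> 10/3, versus 0 for passing
```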
So attackers will attack 100% of the time. And if attackers attack every signal, signaling no longer protects anyone; it only costs the defenders, so this pooling equilibrium fails as well.

Next we're going to look for a semi-separating equilibrium: if a defender is high quality, they'll signal, and if a defender is low quality, they'll signal X% of the time. Attackers, given that they've seen a signal, will attack Y% of the time, and given that they've seen no signal, will attack 100% of the time. Here I have the equations for the probability that a signal comes from a high quality defender and the probability that it comes from a low quality defender, which we'll need later: by Bayes' rule they are (1/3) / (1/3 + (2/3)X) and (2/3)X / (1/3 + (2/3)X), though I won't get into the derivation right now.

First we're going to find Y, the probability the attacker will attack a defender given that they've seen a signal. To find that, we look at the expected value to a low quality defender. The expected value to a low quality defender of signaling is Y, the probability they get attacked, multiplied by negative 13, the payoff of sending a signal and getting attacked, plus 1 minus Y, the likelihood they're not attacked, multiplied by 7, the payoff of sending a signal and getting passed on. We compare that to the expected value of not sending a signal, which is the payoff of getting attacked, negative 10. When we set those equal to each other, Y comes out to 17 over 20, meaning attackers who see a signal will attack 17 times out of 20. What this ends up looking like is that the expected value of signaling decreases as Y increases, and when Y is equal to 17 over 20, low quality defenders are indifferent about whether or not they signal.

From there, we do the same thing to find X, the probability a low quality defender will signal that they're high quality. That's based on the expected value to an attacker of attacking, which is the probability the signal is coming from a high quality defender multiplied by the payoff of attacking a high quality defender, plus the probability the signal is coming from a low quality defender multiplied by the payoff of attacking a low quality defender. The expected value of passing is always 0. When we set those equal and solve for X, X comes out to one half. So in this scenario, half of low quality defenders will signal that they can monitor their systems. Again, we can graph the expected value to the attacker: as X, the likelihood that a low quality defender chooses to signal, increases, the expected value of attacking increases along with it, and when X is equal to one half, attackers are forced into indifference. So when we run through this game with these numbers, we find a semi-separating equilibrium where half of the low quality defenders signal that they're high quality, and attackers who see a signal attack 17 times out of 20.
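Both of those numbers fall out of the two indifference conditions; here is a sympy sketch, my own rather than the slides', that solves them with the payoffs above:

```python
from sympy import symbols, Eq, solve, Rational

X, Y = symbols("X Y", positive=True)

# Low quality defender: signaling (attacked with probability Y) must tie
# with not signaling (always attacked, payoff -10).
ev_signal = Y * (-13) + (1 - Y) * 7
print(solve(Eq(ev_signal, -10), Y))   # -> [17/20]

# Attacker, after a signal: Bayes over who sent it. High quality types
# always signal; low quality types signal with probability X.
seen = Rational(1, 3) + Rational(2, 3) * X        # total chance of a signal
p_high = Rational(1, 3) / seen
p_low  = (Rational(2, 3) * X) / seen
print(solve(Eq(p_high * (-10) + p_low * 10, 0), X))  # -> [1/2]
```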
So where do we go from here? I hope to have given you a good demonstration of how we can break a conflict down into steps, and how, by going through those steps and determining the payoffs along the way, we can start to model the outcomes using game theory. We can also watch how strategies interact with and change each other. I want you to think about where you see strategies like these being used on a regular basis. What do you think you can start to apply this to?

Now, obviously, there are some limitations. I'd say one of the biggest limitations of game theory is determining payoffs and motivations in a realistic fashion. The numbers I used in these examples were just that: examples, chosen to make the math come out relatively nicely. When determining payoffs, I often try to look at it as ranking the outcomes for each player and then assigning values based on the net positive or negative to them, but there are a lot of different approaches, and it's often hard with more complex games, where there may be motivations you weren't able to account for. And as these games get more and more complex, they of course become harder to model, especially without the use of a computer. But we've been using game theory to model conflict for decades, and I believe that when it's applied to cybersecurity, we can create complex and robust models of threat behavior.

Anyways, my name is Juno. Thank you so much. My contact information is below; please feel free to reach out with questions. I also have some of my sources and further reading up there. Thank you so much.