A convicted criminal once tried to escape jail by threatening the guards with some painted fruit, claiming that it was a grenade, but it was pretty clearly just a piece of fruit. I have some bad news. Your notorious life of crime has finally caught up with you, and you've been arrested for a robbery, along with another crook you were working with. The cops don't have enough evidence to convict either of you for the robbery itself. They can only prove you're implicated enough to give you a year of jail time, which isn't ideal, but it could be a lot worse. Of course, it could be a lot better too.

The cops come to you while you're waiting for the trial and offer you a deal. If you testify and throw your erstwhile confederate under the bus, you go free. They'll serve three years, but for you, no jail time at all. You just walk. Sounds like a good deal. And then it hits you: the cops are offering the same deal to your former partner, trying to get them to rat you out. If you both talk, then you're both going down for this, serving two years each. What do you do? Do you keep your yap shut and risk a three-year sentence if they snitch on you? Or do you happily turn them in, in the hopes that you might get to go free?

This is the infamous prisoner's dilemma, one of the foundations of game theory. Importantly, the whole set dressing of being arrested and cutting deals with the cops is just a convenient way of framing the real problem, which is how to net the best possible outcome for a player of this game. Floatier, more abstract principles like honor among thieves or snitches get stitches don't apply here. We only really care about the numbers in the payoff matrix, and which decision leads to the best possible result. What makes it a dilemma is that following the principles of game theory leads to a paradoxical conclusion.
If you look at the options and their relative values, in every scenario snitching is better than not snitching. Either you serve no jail time instead of one year, or you only serve two years instead of three. No matter what, if each player is trying to get out as quickly as possible, they have to conclude that snitching is the better option, which means that they both end up serving two years instead of one. By picking the best options, they get worse outcomes.

The prisoner's dilemma is of particular interest to game theorists because it mirrors many real-life situations where cooperation would be best for everyone, but seemingly rational self-interest leads to inferior results. Election reform, nuclear weapon stockpiling, overfishing, doping in professional sports: a healthy number of the issues our society struggles with are examples of this same payoff matrix, and they suffer similar consequences.

But there's an interesting evolution of the problem. The prisoner's dilemma, as originally stated, is a one-shot game. You make your choice, you get your result, game over. But that's very rarely how the real world works. We often find ourselves choosing between the same options over and over. Our lives are much more like an iterated prisoner's dilemma. Make a choice, get a result, try again. And again. It's the same game, of course, but played thousands of times with many different individuals, each of whom might have their own favorite strategy. It's not so easy to give a straight answer about what option you should pick in this scenario. The results aren't just three years or one year. They include all future rounds, where your partner can use your past decisions to inform their future behavior. In fact, the game is now so complicated that it becomes much more difficult to mathematically demonstrate which strategy is best, in what context, and against whom.
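To see why snitching dominates, here's a minimal sketch of the payoff matrix described above, in years of jail time (lower is better). The action names and the `best_response` helper are my own labels, not anything from a standard library:

```python
# Hypothetical encoding of the dilemma from the story above.
# Key: (my_choice, partner_choice) -> (my_years, partner_years); lower is better.
SENTENCE = {
    ("stay_quiet", "stay_quiet"): (1, 1),  # both keep quiet: one year each
    ("stay_quiet", "snitch"):     (3, 0),  # I stay quiet, partner snitches
    ("snitch",     "stay_quiet"): (0, 3),  # I snitch, partner stays quiet
    ("snitch",     "snitch"):     (2, 2),  # both snitch: two years each
}

def best_response(partner_choice):
    """Whichever of my choices minimizes my own jail time,
    holding the partner's choice fixed."""
    return min(("stay_quiet", "snitch"),
               key=lambda mine: SENTENCE[(mine, partner_choice)][0])

# Snitching is the best response to either partner choice...
assert best_response("stay_quiet") == "snitch"
assert best_response("snitch") == "snitch"
# ...yet mutual snitching (2 years each) is worse than mutual silence (1 year each).
assert SENTENCE[("snitch", "snitch")][0] > SENTENCE[("stay_quiet", "stay_quiet")][0]
```

That last pair of facts is the whole paradox: the individually dominant choice produces the collectively worse outcome.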
Repeated interactions with ambiguously-minded people who might be cooperative and helpful, or who might just try to screw you over. That sounds a lot like life. It sure would be nice to know how to get the best results from such a game.

Robert Axelrod, political scientist and one of those legendary computer nerds from the 60s, came up with a relatively straightforward way to figure that out. With the Cold War looming over everything, the iterated prisoner's dilemma was on everyone's mind, for obvious reasons. So Axelrod organized a little competition among game theorists and programmers to figure out the best possible strategy empirically, by simply running submitted programs against each other in a battle royale. Whoever's program has the most points at the end gets a neat plaque memorializing their victory. What could be simpler?

The strategies that were submitted included just about everything you can imagine. A program called Lucifer was a single-minded snitch, never cooperating, while Jesus was exactly the opposite, happily turning the other cheek and refusing to retaliate if it got screwed over. The Grudger strategy sounded a lot like the Cold War policy of mutually assured destruction: cooperate unless your partner tries to pull something over on you, then tattle on them forever, no matter what they do. The Pavlov program, meanwhile, switched its choice every time it lost. There were also much more sophisticated algorithms, but the one that ended up winning was a relatively simple one: Tit for Tat, which starts off by cooperating, then just does whatever its opponent did on their last turn. If it went up against Lucifer, it would only take one round for it to figure out to never trust the snitch. If it went up against Jesus, both algorithms kept their mouths shut and got out in just a year. Against almost every other submission, Tit for Tat did reasonably well, making it the clear champion. But there was a problem.
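A toy round-robin in the spirit of Axelrod's tournament can be sketched in a few lines. The strategy implementations below are simplified readings of the descriptions above, and the scoring uses the conventional points scale (3 for mutual cooperation, 1 for mutual defection, 5/0 when one player defects on a cooperator), where higher is better; the match length and the inclusion of self-play are my own assumptions, and with such a tiny field the standings don't exactly reproduce Axelrod's results (here Tit for Tat merely ties Grudger for the top):

```python
import itertools

# Conventional iterated-dilemma payoffs: (my_points, their_points), higher is better.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def lucifer(my_hist, their_hist):      # always defect
    return "D"

def jesus(my_hist, their_hist):        # always cooperate
    return "C"

def grudger(my_hist, their_hist):      # cooperate until crossed, then defect forever
    return "D" if "D" in their_hist else "C"

def pavlov(my_hist, their_hist):       # win-stay, lose-shift: switch after a bad round
    if not my_hist:
        return "C"
    if PAYOFF[(my_hist[-1], their_hist[-1])][0] >= 3:
        return my_hist[-1]
    return "C" if my_hist[-1] == "D" else "D"

def tit_for_tat(my_hist, their_hist):  # cooperate first, then mirror the opponent
    return their_hist[-1] if their_hist else "C"

def play_match(strat_a, strat_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a); hist_b.append(move_b)
        score_a += pay_a; score_b += pay_b
    return score_a, score_b

strategies = {"Lucifer": lucifer, "Jesus": jesus, "Grudger": grudger,
              "Pavlov": pavlov, "TitForTat": tit_for_tat}
totals = {name: 0 for name in strategies}
for (na, a), (nb, b) in itertools.combinations_with_replacement(strategies.items(), 2):
    sa, sb = play_match(a, b)
    totals[na] += sa
    if na != nb:
        totals[nb] += sb

print(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Even in this cartoon version, the shape of the result survives: the unconditional snitch racks up points against pushovers but finishes behind the retaliators-who-cooperate.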
What happens if Tit for Tat plays a slightly less trusting version of itself? The aggressive Tit for Tat strategy starts off by tattling, then mirrors its opponent's decisions. If Tit for Tat runs into this minutely different algorithm, all hell breaks loose. Quite literally, because they both turn into Lucifer, caught in an endless cycle of retaliation against each other. Not great.

In order to get the best possible results in every case, even against itself, the algorithm needed one slight modification: the ability to cut other algorithms some slack. Generous Tit for Tat behaved like the standard strategy, but one time out of ten it threw all caution to the wind and kept quiet, just giving its opponent the benefit of the doubt. In Axelrod's arena, this approach was the clear winner. Now, if two Tit for Tats happened to get off on the wrong foot, there would be an occasional opportunity to rebuild some trust and start cooperating again. They'd end up losing a little ground against programs like Lucifer, but they more than made up for it by finding other Tit for Tats to cooperate with, walking away from the competition with more points than any other algorithm, and the victory plaque.

Axelrod saw a few lessons in the success of Tit for Tat, including in its relative simplicity. Many of the more ambitious programs tried to learn their opponent's strategy and exploit it, but this frequently ended in disaster as their own moves became unpredictable and subject to noise. While Tit for Tat chugged away doing its very obvious thing, these clever programs would sometimes end up sending mixed messages about what they were going to do next, often shooting themselves in the foot and losing tons of points. Axelrod summarized the results of the competition with a few pithy guidelines describing the majority of the most successful algorithms. Be nice, meaning always start off by cooperating. Be provocable, meaning react in kind to other players who try to mess with you.
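The retaliation spiral, and the generous fix, are easy to reproduce. This is a sketch under my own assumptions: the suspicious variant simply opens by defecting, and the generous variant forgives a defection 10% of the time (the one-in-ten figure from the text). Payoffs again use the conventional 3/1/5/0 points scale, higher is better:

```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(their_hist):
    return their_hist[-1] if their_hist else "C"

def suspicious_tft(their_hist):        # opens by defecting, then mirrors
    return their_hist[-1] if their_hist else "D"

def generous_tft(their_hist, rng, forgiveness=0.1):
    if not their_hist or their_hist[-1] == "C":
        return "C"
    # One time in ten, give a defector the benefit of the doubt.
    return "C" if rng.random() < forgiveness else "D"

def duel(move_a, move_b, rounds=1000):
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = move_a(hb), move_b(ha)   # each sees only the opponent's history
        pa, pb = PAYOFF[(ma, mb)]
        ha.append(ma); hb.append(mb)
        sa += pa; sb += pb
    return sa, sb

rng = random.Random(0)
spiral = duel(tit_for_tat, suspicious_tft)
healed = duel(lambda h: generous_tft(h, rng), suspicious_tft)
print("plain TFT vs suspicious TFT:", spiral)
print("generous TFT vs suspicious TFT:", healed)
```

In the first duel the two mirrors lock into an endless cycle of alternating retaliation and both score well below the mutual-cooperation rate; in the second, the first forgiven defection snaps the pair into cooperation for the rest of the match.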
Don't be envious, because the algorithms that focused on maximizing their own score did better than the ones that worried about outpacing their opponents. And finally, don't be too clever, suggesting that clearly signaling what you were going to do was worth more in the end than complex and potentially confusing patterns of behavior. If you're like me and constantly searching for validation for your cooperative and trusting nature, those sound like pretty good takeaways. But that's not the end of the story.

In 2012, physicist and computer scientist William Press was having trouble with a simple computer program that he'd written to test some unrelated theory about the prisoner's dilemma. The program kept crashing when it reached certain values. He noticed a pattern in the values that caused crashes, and realized that his code had a built-in assumption: that each player's strategy had some effect on their own final score. That sounds like a perfectly reasonable assumption, but it turns out that it's not always true. With the assistance of fellow physicist-mathematician Freeman Dyson, yes, the same dude who came up with Dyson spheres, Press realized that he'd discovered a set of strategies that don't really control the player's own score at all; instead, they pin down a relationship with the opposing player's score, which was why his program kept crashing. For example, if you play by cooperating according to a particular set of probabilities, governed by the previous round's results, you'll end up with twice the number of points as the other player, measured relative to the mutual-snitching payoff. The strategy itself doesn't really have any effect on your final score. It could be 100 points, it could be 0 points, depending on what your opponent does. It only guarantees that you'll end up with roughly twice their score.

These strategies, called zero-determinant strategies, or ZD strategies, have a few different possible outcomes depending on what sort of opponent they're up against. Against a naive algorithm like Jesus or Grudger, they can essentially set the other player's final score.
Like, I don't know what I'm going to end up with, that's your call, but I can guarantee that you're only going to get 35 points. Against sophisticated algorithms that try to learn and adapt to maximize their own scores, ZD strategies have the option to extort points from the other player. After all, if they're trying to max out their payoffs and you say, okay, but I'm going to end up with double your score, what choice do they have? They can either hand the ZD player a ton of points or punish themselves and walk away with nothing.

Interestingly, Tit for Tat turns out to be a border case of the ZD algorithm space. It's the most generous member of the zero-determinant family, one which doesn't attempt to extort any points and ties its opponent's score one-to-one to its own. Depending on the number and variety of opponents, Generous Tit for Tat still wins, but in shorter competitions against a few similar algorithms, greedier types of ZD strategies can pull ahead. When ZD strategies are pitted against each other, an interesting sort of equilibrium develops. Just like in the game of Tit for Tat against its aggressive twin, both players find their scores largely dependent on their partner, and the negotiation looks familiar. Each player says to the other, look, I'll cooperate when you do, but every time you decrease my final score, I'll do the same to you. Sure enough, when the dust clears, ZD1 and ZD2 will usually walk away with the same number of points.

Generous Tit for Tat is still the king of the iterated prisoner's dilemma, but despite its slight edge in longer games, supposedly due to its deep wisdom, it's entirely possible to beat it with other zero-determinant strategies in the short run. If you notice that your partner is trying to adapt and maximize their own score, it's even possible to use ZD strategies to exploit that. I mean, being nice and not too clever can get you a long way, but it seems that sometimes it pays to be very, very clever. I mean, Freeman Dyson clever.
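An extortionate ZD strategy is surprisingly small to write down. The sketch below assumes the conventional points payoffs (R, S, T, P) = (3, 0, 5, 1); the four cooperation probabilities come from plugging an extortion factor of 2 into the Press–Dyson construction, which enforces (my score − P) = 2 × (their score − P) regardless of what the opponent does, the surplus relation behind the "twice your score" claim above. The cooperative opening state is my own assumption:

```python
import random

R, S, T, P = 3, 0, 5, 1
# Probability that the extortioner cooperates, given the previous round's
# outcome (my_move, their_move). Derived from the Press-Dyson ZD construction
# with extortion factor 2 for these payoffs.
ZD_EXTORT_2 = {("C", "C"): 7/9, ("C", "D"): 0.0, ("D", "C"): 2/3, ("D", "D"): 0.0}
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}

def run(opponent_move, rounds=100_000, seed=0):
    """Play the ZD extortioner against an opponent for many rounds;
    return both players' average per-round scores."""
    rng = random.Random(seed)
    my_last, their_last = "C", "C"   # assume a cooperative opening state
    my_total = their_total = 0
    for _ in range(rounds):
        mine = "C" if rng.random() < ZD_EXTORT_2[(my_last, their_last)] else "D"
        theirs = opponent_move(their_last, my_last)
        pay_me, pay_them = PAYOFF[(mine, theirs)]
        my_total += pay_me; their_total += pay_them
        my_last, their_last = mine, theirs
    return my_total / rounds, their_total / rounds

# Against an unconditional cooperator like Jesus, the surplus relation holds:
sx, sy = run(lambda my_last, opp_last: "C")
print(sx, sy)  # my surplus over P comes out to about twice the opponent's
```

Note what the extortioner does not control: its own absolute score. Swap in a different opponent and both averages move, but the two-to-one ratio of surpluses over P is pinned in place by the probabilities alone.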
Are there decent life lessons in the success of generous tit for tat and other zero-determinant strategies in the iterated prisoner's dilemma? Or is it just a neat bit of trivia for mathematicians and game theorists? Please leave a comment below and let me know what you think. Thank you very much for watching. Don't forget to blah, blah, subscribe, blah, share, and don't stop thunking.