The good old Prisoner’s Dilemma.

You and a friend have committed a crime and have been caught. You are held in separate cells and each offered the same deal, but you are not allowed to communicate with your partner, and you will not learn what they have decided until you have made your own decision.


So there are two players, 1 and 2. Strategy C is cooperate; D is defect.
There are four outcomes, with payoffs listed as (Player 1, Player 2):
CC: 5, 5
CD: -10, 10
DC: 10, -10
DD: -5, -5

The Nash Equilibrium is DD. This is the worst collective outcome: both players would be better off under CC.

As Nash proved, every finite game has at least one equilibrium (possibly in mixed strategies). The equilibrium is not always the optimal outcome, as is the case with the Prisoner’s Dilemma.

We see that mutual cooperation (CC) provides the largest joint payoff. But D is the dominant strategy: if your partner cooperates, defecting pays 10 instead of 5, and if your partner defects, defecting loses 5 instead of 10. Whatever the other player does, defecting pays more, so neither player has an incentive to unilaterally change strategies. D is also each player’s minimax (loss-minimizing) strategy, so DD is the rational outcome.
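That dominance is easy to check directly. Here is a minimal sketch in Python, using the payoff matrix above (the dictionary layout is just one way to encode it):

```python
# Payoffs from the matrix above, keyed by (Player 1's move, Player 2's move);
# each value is (Player 1's payoff, Player 2's payoff).
PAYOFF = {('C', 'C'): (5, 5), ('C', 'D'): (-10, 10),
          ('D', 'C'): (10, -10), ('D', 'D'): (-5, -5)}

# For each move Player 2 might make, compare Player 1's payoff from C and D.
for opponent in ('C', 'D'):
    c_pay = PAYOFF[('C', opponent)][0]
    d_pay = PAYOFF[('D', opponent)][0]
    best = 'D' if d_pay > c_pay else 'C'
    print(f"vs {opponent}: C -> {c_pay}, D -> {d_pay}, best reply = {best}")

# Output:
#   vs C: C -> 5, D -> 10, best reply = D
#   vs D: C -> -10, D -> -5, best reply = D
# D is the better reply either way, so D strictly dominates C.
```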

Negotiations do not help. Any agreement is non-binding and unenforceable, so both players fall back to the DD strategy even though they know they would both gain more from CC.

What if there were a finite number of repeated games? Could the players learn to cooperate? No.

Take 100 repeated games and work backwards. One might think both players would learn to cooperate out of mutual self-interest until the last game.

What happens in the last game? There is no future cooperation to protect, so both players defect in game 100 to maximize their payoff on that move; with nothing left to lose, defection is the stronger incentive. Since both know that game 100 will end in mutual defection no matter what, game 99 is effectively the last game. By the same logic as game 100, they both defect in game 99. Then again in game 98… all the way back to game 1. Every game ends in DD: all-defection is the Nash Equilibrium of the finitely repeated game.
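Here is a minimal backward-induction sketch of that argument, using the payoffs above (illustrative code, not a general game solver):

```python
# Payoff to a player for (my move, opponent's move).
PAYOFF = {('C', 'C'): 5, ('C', 'D'): -10, ('D', 'C'): 10, ('D', 'D'): -5}

# Work backwards from game 100. In each game, the continuation value is
# fixed (everyone knows all later games end in DD), so each player just
# maximizes the current game's payoff, and D is the better reply to
# either move the opponent might make.
total = 0
for game in range(100, 0, -1):
    for opponent in ('C', 'D'):
        assert PAYOFF[('D', opponent)] > PAYOFF[('C', opponent)]
    total += PAYOFF[('D', 'D')]   # both defect, so each scores -5

print(total)   # -500: each player's payoff over 100 games of all-DD
```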

If there are two PD games, what strategies are there?
There are eight strategies player 1 can pursue:
1) 1st move, choose C. 2nd move, choose C.
2) 1st move, choose C. 2nd move, copy Player 2’s first move (Tit for Tat).
3) 1st move, choose C. 2nd move, do the opposite of Player 2’s first move.
4) 1st move, choose C. 2nd move, choose D.
5) 1st move, choose D. 2nd move, choose C.
6) 1st move, choose D. 2nd move, copy Player 2’s first move.
7) 1st move, choose D. 2nd move, do the opposite of Player 2’s first move.
8) 1st move, choose D. 2nd move, choose D.

This gives 64 possible strategy pairings, which can be placed on an 8x8 game matrix with 16 distinct two-turn outcomes. Four pairings produce mutual cooperation on both turns, with payoffs of 10, 10, and four produce mutual defection on both turns, with payoffs of -10, -10. The rest are mixes of CD and DC moves. The best result for Player 1 is 20, -20: defecting on both moves while Player 2 cooperates on both.

If you look at the strategies, each one has a counterstrategy. Strategy 2 (Tit for Tat) seems like a popular approach: it starts by naively cooperating and is willing to punish defections. Strategy 4 counters Tit for Tat by bluffing. Both cooperate in turn one, and Tit for Tat takes Strategy 4’s opening cooperation as a sign that it will cooperate again. On turn two, Strategy 4 stabs Tit for Tat in the back and wins (see the sketch below).
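Those counts, and the bluff just described, can be checked by brute force. A minimal sketch, assuming the same payoff matrix as before (the strategy encoding is mine, chosen to match the numbered list above):

```python
from itertools import product

PAYOFF = {('C', 'C'): (5, 5), ('C', 'D'): (-10, 10),
          ('D', 'C'): (10, -10), ('D', 'D'): (-5, -5)}

# Each strategy is (first move, second-move rule); 'copy' and 'flip'
# act on the opponent's first move. The order matches the list above.
STRATEGIES = [('C', 'C'), ('C', 'copy'), ('C', 'flip'), ('C', 'D'),
              ('D', 'C'), ('D', 'copy'), ('D', 'flip'), ('D', 'D')]

def second(rule, opp_first):
    return {'C': 'C', 'D': 'D', 'copy': opp_first,
            'flip': 'D' if opp_first == 'C' else 'C'}[rule]

results = {}
for i, j in product(range(8), repeat=2):
    f1, r1 = STRATEGIES[i]
    f2, r2 = STRATEGIES[j]
    moves = [(f1, f2), (second(r1, f2), second(r2, f1))]
    results[(i + 1, j + 1)] = (sum(PAYOFF[m][0] for m in moves),
                               sum(PAYOFF[m][1] for m in moves))

print(len(results))                                          # 64 pairings
print(sum(1 for v in results.values() if v == (10, 10)))     # 4 mutual-CC results
print(sum(1 for v in results.values() if v == (-10, -10)))   # 4 mutual-DD results
print(results[(8, 1)])   # (20, -20): all-D vs all-C, the best case for Player 1
print(results[(4, 2)])   # (15, -5): Strategy 4's bluff beats Tit for Tat
```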

In a two-turn game, Strategy 8 (Defect, Defect) will dominate, even though Strategy 1 (CC) is optimal. What happens over an infinite series of games without an end point? Can two players learn to cooperate?

For infinite iterations of the PD game, cooperation is theoretically possible.

Instead of counting individual games, we can index moves by time. Let t be the current turn of play; then t-3, t-2, t-1, t+1, t+2, t+3, and so on label the moves around it. Pick any turn and you can study the pattern of moves preceding and following it. Genetic algorithms can search this space and evolve candidate strategies for an infinite series of games.
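As one illustration of that idea, here is a hedged sketch of a genetic algorithm evolving "memory-one" strategies, which choose each move based on the previous round’s outcome. Every parameter here (population size, mutation rate, round count) is an arbitrary choice for the sketch, not anything from the literature:

```python
import random

# Payoff to a player for (my move, opponent's move).
PAYOFF = {('C', 'C'): 5, ('C', 'D'): -10, ('D', 'C'): 10, ('D', 'D'): -5}

# A genome is five cooperation probabilities: the opening move, then the
# move after each previous-round outcome CC, CD, DC, DD (my move first).
SLOT = {('C', 'C'): 1, ('C', 'D'): 2, ('D', 'C'): 3, ('D', 'D'): 4}

def move(p):
    return 'C' if random.random() < p else 'D'

def play(g1, g2, rounds=50):
    """Average per-round payoffs of two genomes in an iterated PD."""
    s1 = s2 = 0
    m1, m2 = move(g1[0]), move(g2[0])
    for _ in range(rounds):
        s1 += PAYOFF[(m1, m2)]
        s2 += PAYOFF[(m2, m1)]
        m1, m2 = move(g1[SLOT[(m1, m2)]]), move(g2[SLOT[(m2, m1)]])
    return s1 / rounds, s2 / rounds

def evolve(size=20, generations=200, sigma=0.1):
    pop = [[random.random() for _ in range(5)] for _ in range(size)]
    for _ in range(generations):
        # Round-robin fitness: total score against every other genome.
        scores = [0.0] * size
        for i in range(size):
            for j in range(i + 1, size):
                a, b = play(pop[i], pop[j])
                scores[i] += a
                scores[j] += b
        # Keep the top half, refill with mutated copies of survivors.
        ranked = [g for _, g in sorted(zip(scores, pop), reverse=True)]
        pop = ranked[:size // 2]
        while len(pop) < size:
            child = random.choice(ranked[:size // 2])[:]
            k = random.randrange(5)
            child[k] = min(1.0, max(0.0, child[k] + random.gauss(0, sigma)))
            pop.append(child)
    return pop[0]

print(evolve())   # the fittest genome found under these toy settings
```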

Robert Axelrod famously hosted Iterated Prisoner’s Dilemma tournaments, where programmers create computer players for a round-robin competition. Tit for Tat was the original winner; later tournaments have produced new winning strategies, but Tit for Tat has always been among the mix of good ones.
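If you want to run this kind of round-robin yourself, the open-source Axelrod-Python library packages many of the classic strategies. A minimal sketch, assuming that library’s API as I understand it (Tournament, play, ranked_names):

```python
# pip install axelrod
import axelrod as axl

players = [axl.TitForTat(), axl.Cooperator(), axl.Defector(), axl.Grudger()]
tournament = axl.Tournament(players, turns=200, repetitions=5)
results = tournament.play()
print(results.ranked_names)   # strategies ordered by average score
```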

In 2004, a team from the University of Southampton beat Tit for Tat and the other strategies.

Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These, Jennings explained, were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume “master and slave” roles — one would sacrifice itself so the other could win repeatedly.

If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the non-Southampton player. The result is that Southampton had the top three performers — but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team.

For finite games, the only way to break the dilemma is to change the mechanism: install a punishment for defection so that cooperating becomes more attractive than defecting.
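For example, with the payoffs above, subtracting an external penalty greater than 5 from every defection is enough to flip the dominant strategy. A quick check (the penalty value is illustrative):

```python
PENALTY = 6   # any value > 5 works for this particular matrix

# Player 1's payoff for (my move, opponent's move), with the penalty
# applied whenever Player 1 defects.
def payoff(me, opp):
    base = {('C', 'C'): 5, ('C', 'D'): -10,
            ('D', 'C'): 10, ('D', 'D'): -5}[(me, opp)]
    return base - PENALTY if me == 'D' else base

for opponent in ('C', 'D'):
    best = max(('C', 'D'), key=lambda me: payoff(me, opponent))
    print(f"vs {opponent}: best reply = {best}")   # prints C both times
```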
