### A Paradox of Rationality and Cooperation

A walrus and a carpenter were walking down a beach on a sunny winter’s day. The sand was still frosty in places, but the frost was melting in the sun. The water was calm, and waves lapped gently on the beach.

“Isn’t it beautiful?” said the walrus, after they had been walking along in silence for a while.

“Yes, it is,” said the carpenter. “I’m glad you suggested this walk.”

“Well, I wanted to share this beautiful day with you,” said the walrus. “And I have a paradox to share with you as well.”

“Oh no!” said the carpenter, laughing, “This was a trap!”

“Well, maybe a little,” admitted the walrus, “but I really think you’ll enjoy this paradox.”

“Okay,” said the carpenter. “Hit me with it.”

“You know what the prisoner’s dilemma is, don’t you?”

“Yes, it’s an important concept in game theory. It’s an example of a game that isn’t zero sum.”

“Right. We can forget about the original idea of two prisoners, and just define it as a simple game with two possible moves: cooperate or defect. There are two players, and they simultaneously choose one of those moves, and then the payoffs are determined. The payoff matrix looks like this…”

He stopped talking and drew a table on the sand.

| A's Move | B's Move | A's Score | B's Score |
|---|---|---|---|
| Cooperate | Cooperate | 1 | 1 |
| Cooperate | Defect | -2 | 2 |
| Defect | Cooperate | 2 | -2 |
| Defect | Defect | -1 | -1 |

“If both players cooperate, they each receive one dollar. If both defect, they each lose one dollar. If one cooperates and the other defects, the cooperator loses two dollars, and the defector gains two dollars.”

“I remember it,” said the carpenter. “The point is that it always pays to defect, because that improves your payoff regardless of what the other player does. In both cases, it improves your payoff by one dollar.”

“Exactly,” said the walrus. “Your best strategy is to defect, and so we expect both players to defect.”

“In which case,” said the carpenter, “they will both lose one dollar. If both cooperated, they would both gain a dollar, but both have an incentive to defect, and so they will end up losing money. It’s a little tragedy.”
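As an aside, the carpenter’s dominance claim can be checked mechanically against the table in the sand. The following is a minimal Python sketch; the names `PAYOFFS` and `gain_from_defecting` are illustrative choices, not anything from the dialogue.

```python
# Payoffs from the table: (A's move, B's move) -> (A's score, B's score).
# "C" stands for cooperate, "D" for defect.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
    ("D", "D"): (-1, -1),
}


def gain_from_defecting(other_move):
    """How much player A gains by defecting instead of cooperating,
    holding the other player's move fixed."""
    return PAYOFFS[("D", other_move)][0] - PAYOFFS[("C", other_move)][0]


gains = {other: gain_from_defecting(other) for other in ("C", "D")}
print(gains)                 # {'C': 1, 'D': 1}
print(PAYOFFS[("D", "D")])   # (-1, -1)
```

Defecting is worth one extra dollar whichever move the other player makes, and yet mutual defection leaves both players a dollar poorer, which is the little tragedy the carpenter describes.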

“Right,” said the walrus. “The prisoner’s dilemma is a very important concept. It explains why it is often difficult to create cooperation, even when it would benefit both sides.”

“Is that the paradox you wanted to share with me?” asked the carpenter.

“Oh no,” said the walrus. “That’s just the background. I assumed that you knew about the prisoner’s dilemma. I just wanted to refresh your memory.”

“So, what is the paradox?” asked the carpenter.

“We’ll get there eventually,” said the walrus, “but there’s one more bit of background knowledge that we have to review. Are you familiar with the iterated prisoner’s dilemma?”

“Yes,” said the carpenter, “It’s when you have multiple games of the prisoner’s dilemma in a row. By the way, let’s call them ‘PD games’ from now on. So, you have multiple PD games in a row with the same player, and in that scenario cooperation can emerge.”

“Right,” said the walrus. “In a series of PD games with the same player, you can signal your intent to cooperate by cooperating. You may do worse on one iteration, but if the other player reciprocates, then you both win in the long run. The best-known strategy is ‘tit-for-tat’, in which you cooperate on the first move and after that cooperate or defect based on the other player’s previous move. If he cooperated, you cooperate. If he defected, you defect. Assuming that he is reasonably intelligent, he will recognize your strategy and adopt the same strategy. The result is that both players will cooperate and win, instead of defecting and losing. That is how social order emerges from selfish individuals.”
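The walrus’s description of tit-for-tat can be tried out in a small simulation. This is only a sketch under stated assumptions: it follows the usual convention that tit-for-tat opens by cooperating, and the function names (`tit_for_tat`, `always_defect`, `play`) are hypothetical helpers invented for illustration.

```python
# Payoffs from the table: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
    ("D", "D"): (-1, -1),
}


def tit_for_tat(own_history, other_history):
    # Assumption: cooperate on the first move, then copy the
    # other player's previous move.
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play `rounds` PD games in a row and return the total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either history is updated,
        # so the choice is simultaneous.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 100))      # (100, 100)
print(play(always_defect, always_defect, 100))  # (-100, -100)
print(play(tit_for_tat, always_defect, 100))    # (-101, -97)
```

Two tit-for-tat players cooperate throughout, while against a constant defector tit-for-tat loses only the first exchange and then defects in kind.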

“So the iterated prisoner’s dilemma is also part of the background to this paradox?”

“Yes, we have to understand the iterated PD game and how it allows for cooperation to emerge. Now that we’ve established that, I can explain the paradox.”

“I’m all ears,” said the carpenter.

“Suppose that you are playing an iterated PD game with another person, and you know that it will consist of precisely 100 PD games in a row.”

“Okay. In that case, I would think tit-for-tat is the best strategy, at least until the last couple of moves.”

“Would you? Perhaps it is, but on my bookshelf at home I have a book written by a Harvard professor that says otherwise.”

“Well, I wouldn’t take a professor from Harvard *too* seriously,” said the carpenter. “What does he say?”

“According to the Harvard professor, the rational strategy is to defect on every iteration. And his argument seems pretty solid.”

“What is his argument?”

“On the last move, it is rational to defect, because it is just a single PD game. Do you agree?”

“Yes, that makes sense. On the last move both players will defect.”

“Right, so you can expect the other player to defect on the last move. It follows that you should defect on the second-to-last move, because if he’s going to defect on the last move anyway, there’s no point cooperating on the second-to-last move.”

“That makes sense,” said the carpenter. “The reason for cooperating is to get him to cooperate in the future. If you know he will defect on the next move, you should defect on the current move.”

“Exactly. And if your opponent is reasonably intelligent, you expect him to figure this out as well, and defect on the second-to-last move too. It follows that you should defect on the third-to-last move.”

“I see where this is going,” said the carpenter.

“I’m sure you do,” said the walrus. “We know that it is rational to defect on the last move. We also know that it is rational to defect on move X if the other player will defect on move (X + 1). Assuming that the other player is rational, it is rational to defect on move X if it is rational to defect on move (X + 1). By mathematical induction, we can conclude that it is rational to defect on every move.”
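The professor’s induction can be written out as a loop that walks backward from the last move. The sketch below is illustrative only: it hard-codes the argument’s key assumption, namely that play on later moves is already fixed and does not depend on what happens now, so the value of the future enters only as a constant. The names `backward_induction` and `continuation` are hypothetical.

```python
# Payoffs from the table: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
    ("D", "D"): (-1, -1),
}
MOVES = ("C", "D")


def backward_induction(rounds):
    """Return ({move number: chosen move}, total payoff per player)
    under the professor's reasoning."""
    plan = {}
    continuation = 0  # payoff from all later moves, taken as already fixed
    for move_number in range(rounds, 0, -1):
        # Best reply to each possible opposing move on this move.
        # `continuation` is the same constant in every comparison,
        # so only this move's payoff decides the choice.
        best_replies = {
            theirs: max(MOVES, key=lambda mine: PAYOFFS[(mine, theirs)][0] + continuation)
            for theirs in MOVES
        }
        # In the PD the best reply is the same against either move.
        assert len(set(best_replies.values())) == 1
        move = best_replies["C"]
        plan[move_number] = move
        # The other player is assumed to reason identically.
        continuation += PAYOFFS[(move, move)][0]
    return plan, continuation


plan, total = backward_induction(100)
print(set(plan.values()), total)  # {'D'} -100
```

The loop reaches “defect everywhere” only because `continuation` never reacts to the current move; that is the premise the induction rests on.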

“Yes, I understand. I guess I was wrong. It is rational to defect on every move. Is that the paradox?”

“Not exactly,” said the walrus. “I don’t think it is rational to defect on every move. The paradox is: what is wrong with the professor’s argument?”

“I don’t see anything wrong with it,” said the carpenter.

“Suppose that we are playing the game, and it’s the first move. You have reasoned out that the rational strategy is to defect on every move, and so you defect on the first move. You naturally expect me to do the same. But to your surprise I cooperate on the first move. Now what do you do?”

“Huh,” said the carpenter. “Then the argument goes out the window. If you cooperate on the first move, then it is no longer rational for me to defect on every move. I guess I would cooperate on the next move, and hope that you’re following a tit-for-tat strategy.”

“Right,” said the walrus. “By cooperating on the first move, I can signal that I am not going to follow the supposedly rational strategy of defecting on every move, and that frees you up to do something else too. I can change the rationality of the strategy by refusing to adopt it.”

“So, in a sense, it’s rational to be irrational, to signal your irrationality to the other player, so you can break out of a vicious cycle.”

“That’s one way to think about it.”

“But what happens at the end of the game? At what point do you defect? We know that we’re both going to defect on the last move, and probably on the second-to-last move, and so on. The logic of the professor’s argument still applies.”

“I don’t think you can define the ‘right’ move on which to start defecting. If X was the right move to defect on, then (X - 1) would be the right move. It’s undecidable.”

“I guess that means the result won’t be a Nash equilibrium, unless both players defect on all moves. If they are cooperating for a while, and one defects first, then the other could have done better by defecting on the previous move.”

“Right, and that’s an interesting point. Normally, we think of a Nash equilibrium as predictive. In a single PD, (defect, defect) is a Nash equilibrium, and it is predictive. In an iterated PD with an unknown number of moves, (tit-for-tat, tit-for-tat) is a Nash equilibrium. In the iterated PD with a known number of moves, (defect-always, defect-always) is a Nash equilibrium, but it isn’t stable. If either player cooperates, the symmetry is broken and it collapses. So maybe Nash equilibria are not really that predictive.”
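The equilibrium claims for a game of known length can be checked by brute force on a short game. The sketch below assumes a hypothetical five-move game (small enough to enumerate) and the usual convention that tit-for-tat opens by cooperating. Against a deterministic opponent any strategy boils down to a fixed sequence of moves, so trying every sequence covers every possible reply. All function names are illustrative.

```python
from itertools import product

# Payoffs from the table: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
    ("D", "D"): (-1, -1),
}


def score_against_tit_for_tat(my_moves):
    total = 0
    their_move = "C"  # assumption: tit-for-tat opens by cooperating
    for mine in my_moves:
        total += PAYOFFS[(mine, their_move)][0]
        their_move = mine  # tit-for-tat copies my move on the next game
    return total


def score_against_always_defect(my_moves):
    return sum(PAYOFFS[(mine, "D")][0] for mine in my_moves)


def best_sequences(score, rounds):
    """Return every top-scoring move sequence and the score it earns."""
    candidates = list(product("CD", repeat=rounds))
    top = max(score(c) for c in candidates)
    return ["".join(c) for c in candidates if score(c) == top], top


print(best_sequences(score_against_tit_for_tat, 5))    # (['CCCCD'], 6)
print(best_sequences(score_against_always_defect, 5))  # (['DDDDD'], -5)
```

Against tit-for-tat the best reply is to cooperate until the final move and then defect, so a pair of tit-for-tat players is not an equilibrium when the length is known. Against a constant defector nothing beats defecting every time, which is why the all-defect pair is one.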

“Hmm…yes,” said the carpenter. “We expect games to ‘fall’ into Nash equilibria, but in this case it falls out of one. It’s like a hilltop instead of a valley.”

“Exactly,” said the walrus.

“You’ve given me a lot to think about,” said the carpenter.

“I knew you would enjoy this paradox.”

They lapsed into silence and continued walking down the beach. Overhead, a seagull soared in the blue sky.

It's rational to cooperate.

Using your numbers, co-op/co-op x100 gives 100 points

Co-op/co-op x99 + co-op/defect gives 101/97

Co-op/co-op x98 + co-op/defect + defect/defect gives 99/95

Co-op/co-op x97 + co-op/defect + defect/defect x2 gives 97/93

96 co-ops give 95/91, so defecting at 96 always has worse rewards than co-operating until the end

Clearly it isn't rational to cooperate until the end, because you'll always do better by defecting on the last move. I don't know what your sums mean -- there's no point summing anything here.
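For anyone who wants to recompute totals of this kind from the payoff table, here is a minimal sketch. It assumes a 100-game match in which both players cooperate for a while, one player then defects alone for a single game, and both defect for the rest; the function name `totals` and its parameters are hypothetical.

```python
# Payoffs from the table: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2),
    ("D", "D"): (-1, -1),
}


def totals(coop_games, games=100):
    """Return (first defector's total, other player's total)."""
    defector = other = coop_games * PAYOFFS[("C", "C")][0]
    if coop_games < games:
        # One game in which only one player defects.
        defector += PAYOFFS[("D", "C")][0]
        other += PAYOFFS[("D", "C")][1]
        # Every remaining game is mutual defection.
        mutual = games - coop_games - 1
        defector += mutual * PAYOFFS[("D", "D")][0]
        other += mutual * PAYOFFS[("D", "D")][1]
    return defector, other


for n in (100, 99, 98, 97, 96):
    print(n, totals(n))
# 100 (100, 100)
# 99 (101, 97)
# 98 (99, 95)
# 97 (97, 93)
# 96 (95, 91)
```

Under these assumptions the first defector tops 100 only by waiting until the final game.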
