### A Paradox of Rationality and Cooperation

A walrus and a carpenter were walking down a beach on a sunny winter’s day. The sand was still frosty in places, but the frost was melting in the sun. The water was calm, and waves lapped gently on the beach.

“Isn’t it beautiful?” said the walrus, after they had been walking along in silence for a while.

“Yes, it is,” said the carpenter. “I’m glad you suggested this walk.”

“Well, I wanted to share this beautiful day with you,” said the walrus. “And I have a paradox to share with you as well.”

“Oh no!” said the carpenter, laughing. “This was a trap!”

“Well, maybe a little,” admitted the walrus, “but I really think you’ll enjoy this paradox.”

“Okay,” said the carpenter. “Hit me with it.”

“You know what the prisoner’s dilemma is, don’t you?”

“Yes, it’s an important concept in game theory. It’s an example of a game that isn’t zero-sum.”

“Right. We can forget about the original idea of two prisoners, and just define it as a simple game with two possible moves: cooperate or defect. There are two players, and they simultaneously choose one of those moves, and then the payoffs are determined. The payoff matrix looks like this…”

He stopped talking and drew a table on the sand.

A’s Move | B’s Move | A’s Score | B’s Score |
---|---|---|---|
Cooperate | Cooperate | 1 | 1 |
Cooperate | Defect | −2 | 2 |
Defect | Cooperate | 2 | −2 |
Defect | Defect | −1 | −1 |

“If both players cooperate, they each receive one dollar. If both defect, they each lose one dollar. If one cooperates and the other defects, the cooperator loses two dollars, and the defector gains two dollars.”
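The payoff matrix can also be written down as a small lookup table. Here is a minimal sketch in Python; the move names and the function name are my own labels, not from the dialogue:

```python
# Payoffs for the one-shot prisoner's dilemma described above.
# Each entry maps (A's move, B's move) to (A's score, B's score).
PAYOFFS = {
    ("cooperate", "cooperate"): (1, 1),
    ("cooperate", "defect"):    (-2, 2),
    ("defect", "cooperate"):    (2, -2),
    ("defect", "defect"):       (-1, -1),
}

def play(a_move, b_move):
    """Return (A's score, B's score) for a single PD game."""
    return PAYOFFS[(a_move, b_move)]
```

For example, `play("defect", "cooperate")` returns `(2, -2)`: the defector gains two dollars and the cooperator loses two.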

“I remember it,” said the carpenter. “The point is that it always pays to defect, because it improves your payoff regardless of what the other player does. In both cases, it improves your payoff by one dollar.”

“Exactly,” said the walrus. “Your best strategy is to defect, and so we expect both players to defect.”

“In which case,” said the carpenter, “they will both lose one dollar. If both cooperated, they would both gain a dollar, but both have an incentive to defect, and so they will end up losing money. It’s a little tragedy.”

“Right,” said the walrus. “The prisoner’s dilemma is a very important concept. It explains why it is often difficult to establish cooperation, even when it would benefit both sides.”

“Is that the paradox you wanted to share with me?” asked the carpenter.

“No, that’s just the background,” said the walrus. “I assumed that you knew about the prisoner’s dilemma. I just wanted to refresh your memory.”

“So, what is the paradox?” asked the carpenter.

“We’ll get there eventually,” said the walrus, “but there’s one more bit of background knowledge that we need to review. Are you familiar with the iterated prisoner’s dilemma?”

“Yes,” said the carpenter. “It’s when you have multiple games of the prisoner’s dilemma in a row. By the way, let’s call them ‘PD games’ from now on. So, you have multiple PD games in a row with the same player. In that scenario, cooperation can emerge.”

“Right,” said the walrus. “In a series of PD games with the same player, you can signal your intent to cooperate by cooperating. You may do worse on one iteration, but if the other player reciprocates, then you both win in the long run. One of the most successful strategies is ‘tit-for-tat’, in which you cooperate or defect based on the other player’s previous move. If he cooperated, you cooperate. If he defected, you defect. Assuming that he is reasonably intelligent, he will recognize your strategy and adopt the same strategy. The result is that both players will cooperate and win, instead of defecting and losing. That is how social order emerges from selfish individuals.”
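The dynamic the walrus describes can be simulated in a few lines. This is a sketch using the payoff numbers above; the function names (`tit_for_tat`, `iterate`) are invented for illustration:

```python
# Payoffs from the matrix above: (player A's score, player B's score).
PAYOFFS = {
    ("C", "C"): (1, 1), ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2), ("D", "D"): (-1, -1),
}

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]

def iterate(strategy_a, strategy_b, rounds):
    """Play `rounds` PD games and return total (A, B) scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_b)   # each strategy sees the opponent's history
        b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFFS[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Two tit-for-tat players earn the mutual-cooperation payoff on every round: `iterate(tit_for_tat, tit_for_tat, 100)` returns `(100, 100)`. Against an unconditional defector, tit-for-tat loses two dollars on the first round and then matches defection: `iterate(tit_for_tat, lambda h: "D", 100)` returns `(-101, -97)`.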

“So the iterated prisoner’s dilemma is also part of the background to this paradox?”

“Yes. We need to understand the iterated PD game and how it allows for cooperation to emerge. Now that we’ve established that, I can explain the paradox.”

“I’m all ears,” said the carpenter.

“Suppose that you are playing an iterated PD game with another person, and you know that it will consist of precisely 100 PD games in a row.”

“Okay. In that case, I would think tit-for-tat is the best strategy, at least until the last couple of moves.”

“Would you? Perhaps it is, but on my bookshelf at home I have a book written by a Harvard professor that says otherwise.”

“Well, I wouldn’t take a professor from Harvard *too* seriously,” said the carpenter. “What does he say?”

“According to the Harvard professor, the rational strategy is to defect on every iteration. His argument seems pretty solid.”

“What is his argument?”

“On the last move, it is rational to defect, because it is just a single PD game. Do you agree?”

“Yes, I agree. On the last move, both players will defect.”

“So, you expect the other player to defect on the last move. It follows that you should defect on the second-to-last move.”

“That makes sense,” said the carpenter. “The reason for cooperating is to get him to cooperate in the future. If you know that he will defect on the next move regardless of what you do, then you should defect on the current move.”

“Exactly. And if your opponent is reasonably intelligent, you expect him to figure this out as well, and defect on the second-to-last move too. It follows that you should defect on the third-to-last move.”

“I see where this is going,” said the carpenter.

“I’m sure that you do,” said the walrus. “We know that it is rational to defect on the last move. We also know that it is rational to defect on move X if the other player will defect on move X + 1. Assuming that the other player is rational, it is rational to defect on move X if it is rational to defect on move X + 1. By mathematical induction, we can conclude that it is rational to defect on every move.”
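The key step of this argument can be checked against the payoff matrix: if the opponent’s future behavior is fixed, your current move only affects the current payoff, and defecting improves it by exactly one dollar either way. A sketch (the helper name is my own):

```python
# Payoffs from the walrus's matrix: (your score, opponent's score).
PAYOFFS = {
    ("C", "C"): (1, 1), ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2), ("D", "D"): (-1, -1),
}

def defection_gain(opponent_move):
    """How much defecting improves your own payoff over cooperating,
    holding the opponent's move in this round fixed."""
    return PAYOFFS[("D", opponent_move)][0] - PAYOFFS[("C", opponent_move)][0]

# Defection dominates: the gain is +1 whether the opponent
# cooperates or defects this round.
gains = {m: defection_gain(m) for m in ("C", "D")}
```

`gains` comes out as `{"C": 1, "D": 1}`, which is the one-dollar improvement the carpenter noted earlier.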

“I see. I guess I was wrong. It is rational to defect on every move. Is that the paradox?”

“Not exactly,” said the walrus. “I *don’t* believe that it is rational to defect on every move. The paradox is: what is wrong with the professor’s argument?”

“I don’t see anything wrong with it,” said the carpenter.

“Suppose that we are playing the game, and it’s the first move. You have reasoned out that the rational strategy is to defect on every move, and so you defect on the first move. You naturally expect me to do the same. But, to your surprise, I cooperate on the first move. What do you do now?”

“Huh,” said the carpenter. “Then the argument goes out the window. If you cooperate on the first move, then it is no longer rational for me to defect on every move. I guess I would cooperate on the next move, and hope that you’re following a tit-for-tat strategy.”

“Right,” said the walrus. “By cooperating on the first move, I can signal that I am not going to follow the supposedly rational strategy of defecting on every move. That frees you up to do something else too. I can change the rationality of the strategy by refusing to adopt it.”

“So, in a sense, it is rational to be irrational, to signal your irrationality to the other player, so you can break out of a vicious cycle.”

“That’s one way to think about it.”

“But what happens at the end of the game? At what point do you defect? We know that we’re both going to defect on the last move, and probably on the second-to-last move, and so on. The logic of the professor’s argument still applies.”

“You can’t define the ‘right’ move on which to start defecting. If X were the right move to defect on, then X − 1 would be the right move too. It’s undecidable.”

“So, there’s no uniquely rational solution,” said the carpenter. “And the result won’t be a Nash equilibrium, unless both players defect on all moves. If they are cooperating for a while, and one defects first, then the other could have done better by defecting on the previous move.”

“Right, and that’s an interesting point,” said the walrus. “Normally, we think of a Nash equilibrium as predictive. In the single PD game, (defect, defect) is a Nash equilibrium, and it is predictive. In the iterated PD game with an unknown number of moves, (tit-for-tat, tit-for-tat) is a Nash equilibrium. In the iterated PD game with a known number of moves, (defect-always, defect-always) is a Nash equilibrium, but it isn’t stable. If either player cooperates, the symmetry is broken, and it collapses. So, maybe Nash equilibria are not really that predictive.”
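The equilibrium claim for the one-shot game can be verified by brute force. A sketch, again using the payoffs from the matrix; `is_nash` simply checks that neither player has a profitable unilateral deviation:

```python
# Payoffs from the matrix above: (A's score, B's score).
PAYOFFS = {
    ("C", "C"): (1, 1), ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2), ("D", "D"): (-1, -1),
}
MOVES = ("C", "D")

def is_nash(a, b):
    """True if neither player can gain by changing only their own move."""
    best_a = max(PAYOFFS[(m, b)][0] for m in MOVES)
    best_b = max(PAYOFFS[(a, m)][1] for m in MOVES)
    return PAYOFFS[(a, b)] == (best_a, best_b)

equilibria = [(a, b) for a in MOVES for b in MOVES if is_nash(a, b)]
```

The only equilibrium found is `("D", "D")`, mutual defection. Whether that equilibrium actually predicts play in the iterated game is, as the walrus says, another matter.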

“Hmm…yes,” said the carpenter. “We expect games to ‘fall into’ Nash equilibria. But in this case, it falls out of one. The equilibrium exists, but it is not stable.”

“Exactly!” said the walrus.

“You’ve given me a lot to think about,” said the carpenter.

“I knew that you would enjoy this paradox,” said the walrus.

They lapsed into silence, and continued walking down the beach. Overhead, a seagull soared in the blue sky.

It's rational to cooperate.

Using your numbers, cooperate/cooperate x100 gives 100 points each.

Cooperate/cooperate x99 + cooperate/defect gives 101/97.

Cooperate/cooperate x98 + cooperate/defect + defect/defect gives 99/95.

Cooperate/cooperate x97 + cooperate/defect + defect/defect x2 gives 97/93.

96 cooperations give 95/91. So defecting before the last move always earns less than cooperating until the end.
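These totals can be rechecked with a short script, assuming the second player follows tit-for-tat and the first cooperates until some round k (the function name is my own):

```python
# Payoffs from the dialogue's matrix: (your score, opponent's score).
PAYOFFS = {
    ("C", "C"): (1, 1), ("C", "D"): (-2, 2),
    ("D", "C"): (2, -2), ("D", "D"): (-1, -1),
}

def defect_from(k, rounds=100):
    """Total (you, opponent) scores over `rounds` PD games when you
    cooperate on rounds 1..k-1 and defect from round k on, while the
    opponent plays tit-for-tat (cooperate first, then copy your last move)."""
    your_last = None
    you = opp = 0
    for r in range(1, rounds + 1):
        your_move = "C" if r < k else "D"
        opp_move = "C" if your_last is None else your_last
        pay_you, pay_opp = PAYOFFS[(your_move, opp_move)]
        you += pay_you
        opp += pay_opp
        your_last = your_move
    return you, opp
```

`defect_from(100)` gives `(101, 97)` and `defect_from(99)` gives `(99, 95)`: defecting only on the last move beats full cooperation's `(100, 100)`, but each earlier first defection costs the defector two more dollars.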

Clearly it isn't rational to cooperate until the end, because you'll always do better by defecting on the last move. I don't know what your sums mean -- there's no point summing anything here.
