
On this page

  • A brief introduction to game theory
  • Single Interaction
  • Finitely Repeated Games
  • Infinitely Repeated Games
  • Back to Hell
  • The Supernatural Punishment Hypothesis

Hell is an Engineering Patch for the Prisoner's Dilemma

February 7, 2026 · Anton Morgunov
TL;DR

Hell is described as eternal boiling because to stabilize cooperation in an infinitely repeated game (life), you need a "grim trigger" strategy with a punishment severe enough to outweigh any possible gain from defection, regardless of the player's discount factor. This post re-derives the Supernatural Punishment Hypothesis from basic game theory concepts.

The other day I watched a video on political jokes (anekdot: in Russian, an anecdote is not just a casual story with a point like in English, it's a structured joke with a punchline, usually political or social satire; basically a cultural meme) in the Soviet Union, and one of the jokes was:

Hitler is boiling in a cauldron of tar in hell, up to his neck. He looks over and sees Beria in a pot nearby, but the tar only comes up to his waist. Hitler shouts: "Hey! Why are you only in waist-deep?" Beria grins: "I'm standing on Stalin's shoulders." (So now you can imagine my surprise when I read the advice that you should put an anecdote into your college personal statement!)

This got me wondering: man, it's interesting that the concept of hell is seemingly overspecified. It's not just something abstractly bad; no, everyone has heard the constant-boiling part at least once. Why would you need that? It's almost as if you want to make sure the listener understands it's a permanent punishment, not just something you can adapt to.

But wait a second, that sounds almost like an engineering patch to reality implemented to enable a grim trigger cooperation strategy.

A brief introduction to game theory

Single Interaction

But before I can unpack that, let's introduce game theory: a mathematical framework for studying interactions between rational actors. Rationality is defined as acting in accordance with one's beliefs, i.e. it's a much weaker criterion than what would be implied in common parlance (but also an arguably way more useful one; it's also a reason why criticizing economic models with "lmao, you assume people are rational" is pointless).

The simplest and most famous example of a game (an interaction) that can be modeled with game theory is the Prisoner's dilemma. Imagine two people, A and B, held in separate solitary confinement cells (for the same crime) with no ability to communicate. There's not enough evidence to convict them on the actual charge, so if both stay silent, they'll be convicted on a lesser charge and will have to serve, say, a year. However, each is approached separately by a prosecutor and offered the chance to testify against the other. If A testifies against B (and B stays silent), A walks free immediately and B serves 6 years. If both testify, each serves 5 years. Each outcome can be succinctly recorded as:

|              | A testifies | A silent |
|--------------|-------------|----------|
| B testifies  | (-5, -5)    | (-6, 0)  |
| B silent     | (0, -6)     | (-1, -1) |

where $(\pi_A, \pi_B)$ records the utility (units of goodness: the whole idea of econ is that every agent has a function which defines what, and in what quantities, makes the agent happy, and the agent's actions are directed towards optimizing that function; that function is called the utility function) of players A and B.

It's clear that the collectively best outcome for A and B is for both to stay silent. However, if A knows that B will stay silent, A can increase its utility ($0 > -1$) by deviating and testifying. But because B can model the thinking process of A and recognize A has an incentive to deviate (and thus will deviate), B is strictly better off deviating as well ($-5 > -6$). Therefore, in the end, both testify.

Both testifying is a stable equilibrium because neither party can do better by unilaterally deviating from that strategy: if B testifies, A is strictly better off testifying than staying silent. Game theory is basically a framework that helps you determine when an outcome is stable (called a Nash equilibrium) for more complicated game setups.

The fundamental behavior of the Prisoner's dilemma will not be changed if we add some constant to all outcomes, so let's perform a small rebranding and replace testifying and staying silent with more abstract notions of defection and cooperation.

|              | A defects | A cooperates |
|--------------|-----------|--------------|
| B defects    | (1, 1)    | (0, 6)       |
| B cooperates | (6, 0)    | (5, 5)       |

Hopefully it's now much easier to empathize with A or B and recognize that $(5, 5)$ is the desired outcome that is unattainable because it's not a stable equilibrium. In a single game, cooperation is impossible because every party has an incentive to cheat.
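If you'd like to see this mechanically, here's a minimal sketch (the payoff dictionary and helper are my own, matching the table above) that enumerates every strategy pair and checks which ones survive unilateral deviation:

```python
from itertools import product

# payoff[(a, b)] = (utility of A, utility of B); "D" = defect, "C" = cooperate
payoff = {
    ("D", "D"): (1, 1),
    ("D", "C"): (6, 0),
    ("C", "D"): (0, 6),
    ("C", "C"): (5, 5),
}

def is_nash(a, b):
    """A strategy pair is a Nash equilibrium if neither player can
    improve their own payoff by unilaterally switching."""
    ua, ub = payoff[(a, b)]
    best_a = all(ua >= payoff[(a2, b)][0] for a2 in "DC")
    best_b = all(ub >= payoff[(a, b2)][1] for b2 in "DC")
    return best_a and best_b

for a, b in product("DC", repeat=2):
    print(a, b, payoff[(a, b)], "<- Nash equilibrium" if is_nash(a, b) else "")
```

Only (D, D) gets flagged, even though (C, C) pays both players more.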

Finitely Repeated Games

What if we play the defect/cooperate game thousands of times? You might think it should change the calculus; after all, it should be better to cooperate if that could unlock the other player's cooperation and everyone getting 5s instead of 1s. Unfortunately, you can strictly prove that the optimal strategy in a finitely repeated game is derived by backwards induction.

Let's say the game ends at time $T$. Because there'll be no more games to play, the only stable outcome is the same as in a single-game interaction: both parties defecting and getting $(1, 1)$. At game $T-1$ the total payoff/utility table (for this game and the game in the future) looks like:

|              | A defects  | A cooperates |
|--------------|------------|--------------|
| B defects    | (1+1, 1+1) | (0+1, 6+1)   |
| B cooperates | (6+1, 0+1) | (5+1, 5+1)   |

basically, every outcome gets an extra 1 because nothing you do in this round will affect the optimal strategy in the next round. So at $T-1$, both A and B are strictly better off defecting. And you can see how this thinking process gets repeated all the way back to $t = 0$, the very first interaction, in which cooperation is not possible because every agent has an incentive to deviate.
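A toy sketch of that unraveling (my own illustration, same payoffs as above): solve the last round, then feed its value back as a constant continuation payoff for the round before it, and repeat.

```python
payoff = {("D", "D"): (1, 1), ("D", "C"): (6, 0),
          ("C", "D"): (0, 6), ("C", "C"): (5, 5)}

def stage_equilibrium(cont_a, cont_b):
    """Equilibrium of one round when every cell also earns a fixed
    continuation value; adding constants never changes best responses."""
    for a, b in payoff:
        a_best = all(payoff[(a, b)][0] + cont_a >= payoff[(a2, b)][0] + cont_a
                     for a2 in "DC")
        b_best = all(payoff[(a, b)][1] + cont_b >= payoff[(a, b2)][1] + cont_b
                     for b2 in "DC")
        if a_best and b_best:
            return a, b

cont_a = cont_b = 0
for _ in range(1000):              # solve rounds T, T-1, ..., 1
    a, b = stage_equilibrium(cont_a, cont_b)
    cont_a += payoff[(a, b)][0]    # value of the remaining game for A
    cont_b += payoff[(a, b)][1]
print(a, b, cont_a, cont_b)        # -> D D 1000 1000: defection all the way down
```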

Infinitely Repeated Games

The situation only changes if you extend the horizon to infinity, in which case a grim trigger strategy becomes a stable equilibrium: both parties agree to cooperate until either party defects, after which the other party will always punish the defector by defecting.

First, let's calculate the total payoff if both parties cooperate. Each party gets 5 today, tomorrow, after $t$ rounds, etc. The sum would diverge (explode) unless we introduce a discount factor $\delta < 1$: 5 tomorrow is only worth $5\delta$ in today's units (this is most easily understood in terms of money: a dollar tomorrow is worth less than a dollar today, because you can put today's dollar into treasuries and get more than a dollar tomorrow; but the discount factor could also model impatience or an assumed probability of the game ending):

$$PV = 5 + 5\delta + 5\delta^2 + 5\delta^3 + \ldots = \frac{5}{1-\delta}$$
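A quick numeric sanity check of that closed form ($\delta = 0.9$ is an arbitrary choice for illustration):

```python
delta = 0.9                                              # discount factor
pv_truncated = sum(5 * delta**t for t in range(10_000))  # long but finite sum
pv_closed = 5 / (1 - delta)                              # geometric series formula
print(pv_truncated, pv_closed)                           # both ~= 50.0
```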

Similarly, if both parties defect, their payoff will be $1/(1-\delta)$. As a result, the decision matrix looks like this:

|              | A defects | A cooperates |
|--------------|-----------|--------------|
| B defects    | $\left(1 + \frac{\delta}{1-\delta},\ 1 + \frac{\delta}{1-\delta}\right)$ | $\left(0 + \frac{\delta}{1-\delta},\ 6 + \frac{\delta}{1-\delta}\right)$ |
| B cooperates | $\left(6 + \frac{\delta}{1-\delta},\ 0 + \frac{\delta}{1-\delta}\right)$ | $\left(5 + \frac{5\delta}{1-\delta},\ 5 + \frac{5\delta}{1-\delta}\right)$ |

where the first term is the payoff for the current game and the second term is the expected value of all future games in today's value (which is why there's an extra $\delta$). The strategy in future games is predetermined by the outcome of game 1 by construction (if either party defects, everyone will keep defecting).

Player A will have an incentive to defect if and only if:

$$6 + \frac{\delta}{1-\delta} \geq 5 + \frac{5\delta}{1-\delta} \implies 1 \geq \frac{4\delta}{1-\delta} \implies \delta \leq 0.2$$

in other words, only if he's so impatient as to discount all future earnings by 80%. Otherwise, cooperation is strictly preferable.
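Here is a small sketch of that comparison (variable names are mine; payoffs are from the matrix above), evaluating both sides of the inequality across discount factors:

```python
def defect_value(delta):
    """Grab 6 now, then get 1 per round forever under mutual defection."""
    return 6 + delta * 1 / (1 - delta)

def cooperate_value(delta):
    """Get 5 per round forever under sustained cooperation."""
    return 5 / (1 - delta)

for delta in (0.1, 0.2, 0.3, 0.5, 0.9):
    d, c = defect_value(delta), cooperate_value(delta)
    print(f"delta={delta}: defect={d:.2f}, cooperate={c:.2f} -> "
          + ("defect" if d > c else "cooperate"))
# defection only wins below delta = 0.2; at exactly 0.2 the two are equal
```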

Intuitively, you basically have to threaten the other party with eternal punishment if you want to remove an incentive to cheat and unlock what is arguably a better off outcome for everyone.

Note

One might think: if we want the result to be strictly (mathematically) correct, shouldn't we consider what happens if, say, I cooperate for a few rounds and then defect, or make some other complicated attempt to get more than 5 by defecting? Turns out no, because of the one-shot deviation principle.
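You can check this numerically, too: under grim trigger, cooperating for $k$ rounds and then defecting is just the immediate-defection comparison discounted by $\delta^k$, so it never beats permanent cooperation when $\delta > 0.2$. A quick sketch under the same payoffs (the horizon cutoff just approximates the infinite sum):

```python
delta = 0.5  # comfortably above the 0.2 threshold

def deviate_at(k, horizon=10_000):
    """PV of cooperating for k rounds, defecting once, then eating
    the grim-trigger punishment (mutual defection) ever after."""
    pv = sum(5 * delta**t for t in range(k))               # cooperation phase
    pv += 6 * delta**k                                     # the one-shot grab
    pv += sum(1 * delta**t for t in range(k + 1, horizon)) # punishment phase
    return pv

always_cooperate = 5 / (1 - delta)                         # = 10.0
for k in (0, 1, 5, 50):
    print(k, round(deviate_at(k), 3), "<", always_cooperate)
```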

Back to Hell

With the game theory of repeated interactions in hand, the description of hell starts to look remarkably like what you'd want to envision to ensure all parties consider themselves part of an infinitely repeated game, and thus enable cooperation as a stable equilibrium.

Surprisingly, the analogy doesn't stop there. The one-shot deviation principle only works under the assumption of perfect information: the other player knows if you defect. How do you ensure that in real life? Well, you add omniscience to the punisher.

One might notice that the conclusion of the infinitely repeated game depends on the actual values of how much you gain by unilaterally defecting (in the example above, you can gain 1) and how much you are punished (effectively $-4$ per round). If the gain were larger, defection would be optimal even for very patient actors. How do you solve that issue? You maximize the punishment as much as possible; ideally you slap on -float('inf'), which is ... well, permanent boiling.
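To make the $-\infty$ point concrete, here's the same derivation as before but with a generic punishment-phase payoff $P$ in place of 1 (this generalization is mine). Defection is deterred whenever:

$$6 + \frac{\delta P}{1-\delta} \leq 5 + \frac{5\delta}{1-\delta} \iff 1 \leq \frac{(5-P)\,\delta}{1-\delta} \iff \delta \geq \frac{1}{6-P}$$

For $P = 1$ this recovers the $\delta \geq 0.2$ threshold; as $P \to -\infty$, the critical discount factor drops to zero, so cooperation is sustained for any positive $\delta$, no matter how impatient the players are.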

Game theory has a very annoying result called the Folk theorem (called that way because many researchers intuitively felt it to be true long before it was proven mathematically), which is basically a monkey wrench thrown into any attempt to model what's going to happen.

Formally, if one defines $\mu_i$ as the minmax payoff, i.e. the minimal possible value you can attain if you always anticipate the other party doing everything possible to hurt you, the following is true:

Definition

The Folk Theorem: let $v$ be some payoff vector within the boundaries $V$ of what's possible, such that the payoff $v_i$ of every player is at least $\mu_i$. Then there exists $\bar{\delta} \in (0, 1)$ such that for every $\delta > \bar{\delta}$ there exists a subgame perfect Nash equilibrium in which the average payoff of each player $i$ is $v_i$.

In other words, you can construct a strategy in which one party exploits the other most of the time but cooperates just often enough that the exploited side averages, say, 1.1, and for that side it'll be mathematically superior to suck up (and get 1.1) over having pride and defecting constantly (and getting 1).
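A quick sketch of one such lopsided arrangement (the 78/22 split is my own illustration, chosen to land on 1.1): mix exploitation rounds $(6, 0)$ with cooperation rounds $(5, 5)$ so the weaker side still averages just above its minmax payoff of 1.

```python
# A periodic schedule: 78% of rounds A exploits B, 22% both cooperate.
# B averages 1.1 > 1 (its minmax), so by the Folk theorem a patient
# enough B can be held to this in a subgame perfect equilibrium.
schedule = [(6, 0)] * 78 + [(5, 5)] * 22
avg_a = sum(pa for pa, _ in schedule) / len(schedule)
avg_b = sum(pb for _, pb in schedule) / len(schedule)
print(avg_a, avg_b)   # -> 5.78 1.1
```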

So if you were to ask a game theorist "make a prediction, what's going to happen?", they can use the Folk theorem to say "idk, man", "anything goes": could be total cooperation, could be constant defection, could be you giving me your money 99% of the time.

So if there are infinite mathematically stable equilibria, how could humans possibly coordinate on a single strategy? It'd probably require a single rulebook, almost like a ... scripture.

The Supernatural Punishment Hypothesis

In the discussion above, we basically rederived the supernatural punishment hypothesis from first principles. It was first articulated by Johnson and Krüger in 2004:

during our history, the fear of supernatural punishment (whether real or not) deterred defectors and may therefore represent an adaptive trait favoured by natural selection. Supernatural beliefs are universal among human societies, commonly connected to taboos for public goods, so it seems plausible that they serve an important function in fostering cooperation.

Shariff and Norenzayan later showed that belief in punishing gods made study participants less likely to cheat on a math exam (Mean Gods Make Good People: Different Views of God Predict Cheating Behavior). The sample size is low (N=61), but the only question you should ask yourself with social science is: does it make sense? Does it make sense that a fear of supernatural punishment is a prerequisite for building large, high-trust cooperative societies? It's kinda obvious once you refer to the game theory math above.

There is also a 2015 study, Broad supernatural punishment but not moralizing high gods precede the evolution of political complexity in Austronesia, which indicates that it doesn't have to be a fully formed monotheistic religion, though again, the math doesn't care where the -float('inf') comes from.
