This is a follow-up to some interesting discussion in the comments on my previous post, involving a coin-flipping probability puzzle, and a comparison of Bayesian and frequentist approaches to “solving” it. For completeness, here is the original problem:
You have once again been captured by pirates, who threaten to make you walk the plank unless you can correctly predict the outcome of an experiment. The pirates show you a single gold doubloon that, when flipped, has some fixed but unknown probability of coming up heads. The coin is then flipped 7 times, of which you observe 5 to be heads and 2 to be tails. At this point, you must bet your life on whether or not, in two subsequent flips of the coin, both will come up heads. If you predict correctly, you go free; if not, you walk the plank. Which outcome would you choose?
A typical puzzle-solver would (rightly) point out that necessary information is missing: we cannot determine the optimal action without knowing how the coin (and thus its bias) was selected. Instead of providing that information, I stirred up the Bayesian vs. frequentist debate by showing how each might reason without it and arrive at different conclusions.
One of the reasons that I like this problem is that the “Bayesian vs. frequentist” perspective is a bit of a ruse. The frequentist in the original post computes the maximum likelihood estimate of the probability of the coin coming up heads… and makes a betting decision based on that estimate. The Bayesian performs a slightly more complex calculation, involving updating a prior beta distribution using the observed flips, doing some calculus… but then makes a similar “threshold” betting decision based on that calculation.
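To make the two calculations concrete, here is a minimal Python sketch of each decision rule for $k$ heads in $n$ flips. It assumes the Bayesian starts from a uniform Beta(1, 1) prior (the prior is not restated here), and the function names `mle_bet` and `bayes_bet` are illustrative only. Under that prior the posterior is Beta$(k+1, n-k+1)$, and the posterior probability of two more heads is $E[p^2] = (k+1)(k+2)/((n+2)(n+3))$.

```python
from fractions import Fraction


def mle_bet(k: int, n: int) -> int:
    """Plug-in rule: estimate p by the MLE k/n, then bet for two heads (1)
    iff the estimated P(two heads) = p_hat^2 exceeds 1/2."""
    p_hat = Fraction(k, n)
    return 1 if p_hat ** 2 > Fraction(1, 2) else 0


def bayes_bet(k: int, n: int, a: int = 1, b: int = 1) -> int:
    """Posterior rule under an ASSUMED Beta(a, b) prior (uniform by default).

    The posterior is Beta(A, B) with A = a + k, B = b + n - k, and the
    posterior probability of two more heads is E[p^2] = A(A+1) / ((A+B)(A+B+1)).
    """
    A, B = a + k, b + n - k
    p_two_heads = Fraction(A * (A + 1), (A + B) * (A + B + 1))
    return 1 if p_two_heads > Fraction(1, 2) else 0


print(mle_bet(5, 7), bayes_bet(5, 7))  # 1 0
```

With 5 heads in 7 flips, the plug-in estimate gives $(5/7)^2 = 25/49 > 1/2$, while the posterior gives $42/90 = 7/15 < 1/2$, so the two rules place opposite bets.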
The key observation is that any deterministic betting strategy whatsoever, whether wearing a frequentist hat, a Bayesian hat, or a clown hat, may be specified as a function

$$f : \{0, 1, \ldots, n\} \to \{0, 1\}$$

mapping the number of heads $k$ observed in $n$ total flips (here $n = 7$) to 1, indicating a bet for two subsequent heads, or 0, indicating a bet against. Neither the underlying statistical philosophy nor the complexity of implementing this function matters; all that matters is the output.
Actually, we can simplify things even further if we only consider “monotonic” strategies of the form “bet for two heads if $m$ or more heads are observed, otherwise bet against.” That is,

$$f_m(k) = u(k - m),$$

where $u$ is the unit step function, with $u(x) = 1$ for $x \geq 0$ and $u(x) = 0$ otherwise.
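A minimal sketch of the threshold family in Python, with hypothetical helper names `u`, `f`, and `threshold_of`. The last helper searches for the $m$ that reproduces a given monotonic rule on every $k \in \{0, \ldots, 7\}$; the Bayesian rule here again assumes a uniform Beta(1, 1) prior.

```python
from fractions import Fraction

N = 7  # total observed flips


def u(x: int) -> int:
    """Unit step with u(0) = 1, so f_m bets for two heads exactly when k >= m."""
    return 1 if x >= 0 else 0


def f(m: int, k: int) -> int:
    """Monotonic strategy f_m(k) = u(k - m): 1 = bet for two heads, 0 = against."""
    return u(k - m)


def threshold_of(rule, n: int = N) -> int:
    """Return the m in 0..n+1 with rule(k) == f(m, k) for every k in 0..n."""
    for m in range(n + 2):
        if all(rule(k) == f(m, k) for k in range(n + 1)):
            return m
    raise ValueError("rule is not a monotonic threshold strategy")


def mle_rule(k: int) -> int:
    """Plug-in MLE rule: bet for two heads iff (k/N)^2 > 1/2."""
    return int(Fraction(k, N) ** 2 > Fraction(1, 2))


def bayes_rule(k: int) -> int:
    """Bayesian rule, ASSUMING a uniform prior: posterior E[p^2] > 1/2."""
    return int(Fraction((k + 1) * (k + 2), (N + 2) * (N + 3)) > Fraction(1, 2))


print(threshold_of(mle_rule), threshold_of(bayes_rule))  # 5 6
```

Note that $m$ runs from 0 (always bet for two heads) to 8 (never bet for them), which is where the nine monotonic strategies come from.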
As mendel points out in the comments on the previous post, the frequentist MLE strategy is equivalent to $f_5$ (i.e., bet on two heads with “5 or more” observed heads), and the Bayesian strategy is equivalent to $f_6$ (“6 or more”). We can compare these strategies, along with the seven other monotonic strategies ($m = 0, 1, 2, 3, 4, 7, 8$), by computing their probability of success as a function of the unknown probability $p$ of heads for each single coin flip. That is, the probability of surviving the game with strategy $f_m$ is

$$P(\text{survive} \mid p, f_m) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \left[ f_m(k)\, p^2 + \bigl(1 - f_m(k)\bigr)\bigl(1 - p^2\bigr) \right].$$
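The survival probability above translates directly into code. This is a sketch with a hypothetical function name, `survival_probability`; it sums, over every possible count $k$ of observed heads, the binomial probability of seeing $k$ heads times the chance that the bet $f_m(k)$ then wins.

```python
from math import comb


def survival_probability(p: float, m: int, n: int = 7) -> float:
    """P(survive | p, f_m) for the threshold strategy f_m(k) = u(k - m)."""
    total = 0.0
    for k in range(n + 1):
        p_k = comb(n, k) * p**k * (1 - p) ** (n - k)  # P(k heads in n flips)
        bet_for = k >= m                               # f_m(k) == 1
        p_win = p**2 if bet_for else 1 - p**2          # next two flips both heads, or not
        total += p_k * p_win
    return total


# Compare the MLE (m = 5) and Bayesian (m = 6) thresholds on a fair coin.
print(survival_probability(0.5, 5), survival_probability(0.5, 6))
```

For example, at $p = 1/2$ this gives $163/256 \approx 0.637$ for $m = 5$ and $23/32 \approx 0.719$ for $m = 6$; both fall short of $1 - p^2 = 0.75$, the value for the never-bet-for strategy $m = 8$.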
The following figure shows the results for all nine strategies:
The MLE strategy (green) and Bayesian strategy (blue) are certainly contenders for the best reasonable approach. However, neither of these, nor any other single strategy, dominates all others for every possible value of the unknown probability $p$ of heads in a single coin flip. In other words, whether the Bayesian or the frequentist has a better chance of survival truly does depend on the information that we are explicitly not given.
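Because $f_5$ and $f_6$ disagree only when exactly $k = 5$ heads are observed, the difference in their survival probabilities reduces to $\binom{7}{5} p^5 (1-p)^2 (2p^2 - 1)$, which is negative for $p < 1/\sqrt{2}$ and positive above it. A minimal Python sketch (again with hypothetical function names) that checks this numerically by bisecting on the difference:

```python
from math import comb, sqrt

N = 7  # observed flips


def survival_probability(p: float, m: int, n: int = N) -> float:
    """P(survive | p, f_m), summing over k ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (p**2 if k >= m else 1 - p**2)
        for k in range(n + 1)
    )


def mle_minus_bayes(p: float) -> float:
    """Survival advantage of the MLE threshold (m = 5) over the Bayesian one (m = 6)."""
    return survival_probability(p, 5) - survival_probability(p, 6)


def crossover(lo: float = 0.5, hi: float = 0.9, iters: int = 60) -> float:
    """Bisect for the p in (lo, hi) where both thresholds survive equally often.

    Assumes mle_minus_bayes(lo) < 0 < mle_minus_bayes(hi).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mle_minus_bayes(mid) < 0:
            lo = mid  # Bayesian threshold still better at mid
        else:
            hi = mid  # MLE threshold at least as good at mid
    return (lo + hi) / 2


print(mle_minus_bayes(0.6) < 0, mle_minus_bayes(0.8) > 0)  # True True
print(crossover(), 1 / sqrt(2))
```

So, under this model, the MLE threshold fares better when the coin's probability of heads exceeds $1/\sqrt{2} \approx 0.707$, and the Bayesian threshold fares better below it.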