This is a follow-up to some interesting discussion in the comments on my previous post, involving a coin-flipping probability puzzle, and a comparison of Bayesian and frequentist approaches to “solving” it. For completeness, here is the original problem:

You have once again been captured by pirates, who threaten to make you walk the plank unless you can correctly predict the outcome of an experiment. The pirates show you a single gold doubloon that, when flipped, has some fixed but unknown probability of coming up heads. The coin is then flipped 7 times, of which you observe 5 to be heads and 2 to be tails. At this point, you must now bet your life on whether or not, in *two subsequent* flips of the coin, *both* will come up heads. If you predict correctly, you go free; if not, you walk the plank. Which outcome would you choose?

A typical puzzle-solver would (rightly) point out that necessary information is missing; we cannot determine the optimal action without knowing how the coin (and thus its bias) was selected. Instead of providing that information, I stirred the Bayesian vs. frequentist debate by showing how each might reason *without* that information, and come up with differing conclusions.

One of the reasons that I like this problem is that the “Bayesian vs. frequentist” perspective is a bit of a ruse. The frequentist in the original post computes the maximum likelihood estimate of the probability of the coin coming up heads… and makes a betting decision based on that estimate. The Bayesian performs a slightly more complex calculation, involving updating a prior beta distribution using the observed flips, doing some calculus… but then makes a similar “threshold” betting decision based on that calculation.

The key observation is that *any deterministic betting strategy whatsoever*, whether wearing a frequentist hat, a Bayesian hat, or a clown hat, may be specified as a function

f : {0, 1, …, 7} → {0, 1}

mapping the number of heads h observed in the n=7 total flips to 1, indicating a bet *for* two subsequent heads, or 0, indicating a bet against. Neither the underlying statistical philosophy nor the complexity of implementation of this function matters; all that matters is the output.

Actually, we can simplify things even further if we only consider “monotonic” strategies of the form “bet *for* two heads if k or more heads are observed, otherwise bet against.” That is,

f_k(h) = u(h − k)

where u(·) is the unit step function.

As mendel points out in the comments on the previous post, the frequentist MLE strategy is equivalent to f_5 (i.e., bet on two heads with “5 or more” observed heads), and the Bayesian strategy is equivalent to f_6 (“6 or more”). We can compare these strategies, along with the seven other monotonic strategies, by computing the probability of their success as a function of the *unknown* probability p of heads for each single coin flip. That is, the probability of surviving the game with strategy f_k is

P(win | f_k, p) = Σ_{h=0}^{7} C(7, h) p^h (1 − p)^{7−h} [f_k(h) p² + (1 − f_k(h))(1 − p²)]
The following figure shows the results for all nine strategies:

The MLE strategy (green) and Bayesian strategy (blue) are certainly contenders for the best reasonable approach. However, neither of these, nor any other single strategy, *dominates* all others for *all* possible values of the unknown probability of heads in a single coin flip. In other words, whether the Bayesian or frequentist has a better chance of survival truly does depend on the information that we are explicitly *not* given.
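For anyone who wants to reproduce these curves, here is a quick Python sketch of the computation above (the function name p_win is mine; k=8 serves as the “never bet for two heads” strategy):

```python
from math import comb, sqrt

def p_win(k, p, n=7):
    """Survival probability of strategy f_k: bet *for* two heads iff
    at least k of the n observed flips came up heads."""
    total = 0.0
    for h in range(n + 1):
        b = comb(n, h) * p**h * (1 - p)**(n - h)   # P(h heads in n flips)
        bet_for = 1 if h >= k else 0               # f_k(h)
        total += b * (bet_for * p**2 + (1 - bet_for) * (1 - p**2))
    return total

# All nine monotonic strategies (k = 0 .. 8) succeed with probability
# exactly 1/2 at p = 1/sqrt(2), where "two heads" and "not two heads"
# are equally likely -- this is where all the curves in the figure meet.
p0 = 1 / sqrt(2)
print([round(p_win(k, p0), 6) for k in range(9)])

# Neither the MLE strategy f_5 nor the Bayesian f_6 dominates the other:
print(p_win(5, 0.6), p_win(6, 0.6))   # f_6 does better here
print(p_win(5, 0.9), p_win(6, 0.9))   # f_5 does better here
```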

Am I correct in reading this chart as bet against two heads when p is below .7 (and some change) and for two heads otherwise?

The players do not know p, that’s the whole point.

For a given p, the pirate makes n=7 throws, and communicates the results to the players. The players then bet according to the strategies listed on the right, depending on the number of heads that has been communicated to them.

This number and the probability of their eventual survival obviously depend on p.

The MLE strategy is “bet against two heads if h/n < 0.7071”. h/n is unlikely to be equal to p.

Right. The threshold is actually p = 1/sqrt(2), where all of the curves meet, since then two heads (probability p²) and not two heads (probability 1 − p²) are equally likely, so it doesn’t matter which way you bet.

But as mendel points out, if we knew the value of p, we wouldn’t need the observation of prior coin flips to make our decision. Since we *don’t* know p, we must choose one of the f_k strategies, not knowing “where on the x-axis” we actually are in the plot above.

Succinctly summarized.

A nice bit about the graph: it’s immediately obvious that for a certain choice of p, this is a fair game: when the chance for two heads is exactly 50%, the actual bet becomes irrelevant.

The captive might want not to optimize his average survival probability, but rather to ensure that his worst-case survival rate does not fall below 50%. This requires a mixed strategy: if 5 heads are showing, bet against 2 heads with probability (approximately) p_mix = 0.51225. (I was too lazy to do the analysis and just did it numerically.)

If you let p_mix = 1, you get the k=6 case, and setting it to 0 gets k=5, so we could just define the above mixed strategy to have k = 5.51225.

Does k/n vary with n, i.e. does it matter for this strategy if we get shown more or fewer than 7 throws? It seems likely that it does, because 5.51225/7 = 0.78746 > 1/sqrt(2), and I’m expecting k/n to approach 1/sqrt(2) as n goes to infinity, because then we’d know p precisely and should bet accordingly.

What is the best mixed strategy for n=1 or even n=0?

I knew it would be a good idea to explicitly emphasize *deterministic* betting strategy :), because you’re right, this is not really a problem in statistics, but in game theory. How did you estimate the Nash equilibrium mixed strategy? Because my method was not very efficient at all: I just “discretized” the pirates’ choice of p (I think I stopped at around dp=0.0001), and solved the corresponding two-player zero-sum game (with a 9×10001 payoff matrix). The resulting strategy agrees with your result, picking f_6 with probability about 0.512, and f_5 otherwise.

I think you are correct that this optimal strategy depends on the number of observations in a complex (but asymptotically well-behaved) way, although I am basing this on just a single additional data point: for n=24 (where the problem is “interesting” with 17 heads and 7 tails), the optimal mixed strategy is to “act like a Bayesian” with probability about 0.537, and like a frequentist otherwise.
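There is also a shortcut for general n: *assuming* (as the plots suggest, though I have not proven it) that the worst case of the optimal mixed strategy always sits exactly at p = 1/sqrt(2), the equilibrium condition reduces to requiring that the mixture bet *for* two heads with total probability exactly 1/2 at that bias. A sketch of that shortcut (the function name and the single-minimum assumption are mine):

```python
from math import comb, sqrt

def mixed_threshold(n):
    """Effective threshold k + q of the worst-case-optimal mixed strategy
    for n observed flips: play f_{k+1} with probability q, else f_k.
    Assumes the minimax curve bottoms out at p0 = 1/sqrt(2), so the
    mixture must bet *for* two heads with total probability exactly 1/2
    when p = p0."""
    p0 = 1 / sqrt(2)
    b = [comb(n, h) * p0**h * (1 - p0)**(n - h) for h in range(n + 1)]
    s = 0.0   # tail probability P(at least k heads) at p0
    for k in range(n, -1, -1):
        s += b[k]
        if s >= 0.5:
            return k + (s - 0.5) / b[k]   # q = (S_k - 1/2) / B_k
    return 0.0

print(mixed_threshold(7))    # ~5.5123: f_6 with probability ~0.512
print(mixed_threshold(19))   # ~14.0065
print(mixed_threshold(24))   # ~17.5374: "Bayesian" with probability ~0.537
```

For n = 0 this returns 0.5, i.e. the mixed strategy degenerates to flipping a fair coin, and the ratio mixed_threshold(n)/n does appear to creep down toward 1/sqrt(2) as n grows.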

My quick’n’dirty method: I compute P(win) for p from 0.707 to 0.708 in increments of 10^-7 (I started with a larger interval and larger increments) and grab the p for which P(win) is minimal. Since your plot shows nice, continuous graphs for P(win), I assume there is only one minimum, and for it to be 0.5, the position of the minimum has to be exactly at sqrt(0.5). By hand, I “numerically approximate” (i.e. trial and error) to narrow p_mix down.
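This scan is easy to replicate at coarser resolution; here is a minimal Python sketch (the grid sizes and the function name p_win_mixed are mine):

```python
from math import comb, sqrt

def p_win_mixed(q, p, n=7):
    """Survival probability against a coin of bias p when we bet on two
    heads with 6 or more observed heads, bet against with 4 or fewer,
    and with exactly 5 heads bet against with probability q (p_mix)."""
    total = 0.0
    for h in range(n + 1):
        b = comb(n, h) * p**h * (1 - p)**(n - h)
        bet_for = 1.0 if h >= 6 else ((1.0 - q) if h == 5 else 0.0)
        total += b * (bet_for * p**2 + (1.0 - bet_for) * (1.0 - p**2))
    return total

# Worst case over p for each candidate q, keeping the q whose worst
# case is largest (coarse grids; a finer scan sharpens the estimate).
ps = [i / 500 for i in range(250, 476)]   # p in [0.5, 0.95]
best = max(range(0, 201), key=lambda j: min(p_win_mixed(j / 200, p) for p in ps))
best_q = best / 200
print(best_q)   # close to 0.512
```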

17.537 / 24 = 0.7307083; that fits nicely.

Hmm, maybe I should modify my program to approximate properly, and then vary n, and get a plot. Or actually do the derivatives and compute the minimum.

Via a circuitous route, I’ve gone back to your original motivation, “when pressed to actually take an action, when money is on the table, everyone becomes a Bayesian”, and I believe now that most people would simply flip a coin. You might assume that is because they’re not up to the mathematics involved, but I’ve come up with an argument for this strategy that sounds rational.

I have been thinking today about why this problem of chance can be improved by adding more chance to it (p_mix). It feels like a paradox to me: my chance of survival depends on my choice, for which I have two alternatives, and the roll of the dice.

My immediate concern is not my survival probability (which only has meaning if we play this game a fair number of times, says the “frequentist”), but my wish to choose “right” in my singular situation.

But for each of the choices I have, there is a p where that choice is bad, and I don’t know what kind of p situation I’m in. Given that I’m in a situation with a fixed but unknown p, my choice, no matter by what strategy I arrive at it, can be good or bad.

So if I want to minimize the chances of choosing badly, I can flip a coin: then the probability of making a bad choice is guaranteed to not exceed 50%, no matter what p is.

The rest is luck.

Right now, I distrust this line of reasoning, but I can’t spot the catch.

You are right that you can minimize the chance of losing (namely, to 1/2) by flipping a (fair) coin to decide whether to bet for/against two heads. However, with this strategy your *maximum* probability of winning is *also* 1/2. We can improve on this, without sacrificing the 50% worst-case chance of survival, using the more complex mixed strategy that we have discussed.

A summary of the “reasonable” strategies discussed so far can be seen here, for n=7. Note that the mixed strategy (bet *for* with 6 or more heads with probability 0.512, or more precisely (5 − sqrt(2))/7; otherwise bet *for* with 5 or more) shown by the red curve *dominates* the “flip a coin” strategy shown by the black curve; we can always do at least as well as the coin flip.

Yes, I know that.

My train of thought goes like this: assume a pirate with p(heads)=0.75 (unknown to me). To have a >50% chance of survival, I have to bet on heads. If I flip a coin now, I will do that in 50% of the cases, assuring an even 50% chance of survival.

At this point, the mixed strategy assures me of a chance of survival that is greater than 50%.

Then the pirate flips the coin 7 times, and it comes up 4 heads and 3 tails (there was a chance of approximately 1/6 of that happening). This is the point where the “flip a coin” strategy still gives me a 50% survival chance, but the mixed strategy has me doomed (well, if you define “doomed” as “less than a 44% chance of survival”).

It’s always difficult to reason statistically about singular decisions. If this was a game, and player P(irate) may choose a p, knowing the strategy of C(aptive), and then a few rounds would be played, then C should always choose a strategy that has a 50% worst case; this forces P to select p=sqrt(0.5); this brings the game to 50-50; and since this is the best C can do, given that P plays optimally, C might as well abandon strategy and flip a coin, with the same result.

Some mixed-strategy values of k are very close to integers, e.g. k=14.0066 for n=19. Are my numbers wrong ( http://ideone.com/KyMBV0 ), or do we need to consider strategies that are mixed/randomized for more than one case?

I’ve analyzed n=0 this morning.

m := probability of betting on two heads

p := probability of random Bernoulli process

P(win)=(2m-1)p²+(1-m)

P(win|p=0)=1-m

P(win|p=1)=m

There’s always one extreme point at p=0, and that is a maximum or minimum depending on m, obviously.

P(win|m=0.5)=0.5

=> flipping a coin optimizes worst-case performance.
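This n = 0 algebra is easy to check numerically; a quick sketch (the function names are mine):

```python
# Sanity check of the n = 0 analysis above: bet on two heads with
# probability m against a coin of bias p.
def p_win_n0(m, p):
    return m * p**2 + (1 - m) * (1 - p**2)   # direct computation

def p_win_quoted(m, p):
    return (2 * m - 1) * p**2 + (1 - m)      # the formula above

# The two agree, and m = 1/2 gives P(win) = 1/2 regardless of p:
print(p_win_n0(0.3, 0.8), p_win_quoted(0.3, 0.8))
print([round(p_win_n0(0.5, p / 10), 6) for p in range(11)])
```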

If p is uniformly distributed, always betting against 2 heads is probably what the Bayesian would do? But then the pirate would know that, and always choose a p “on the outside”? And the Bayesian would account for that, and adjust his prior accordingly? And then it comes down to outguessing each other, and then we’re going to flip a coin again to decide whether the pirate has chosen p greater than sqrt(0.5) or smaller.

Also, one might argue that if the pirate uses a random process with p < 0.5, he'll ask you to bet on two tails coming up, so the uniform prior should really only go from p=0.5 to 1?

The interesting question is this: when I have no information about the random process, getting my worst-case survival chance up to 50% is as simple as flipping a coin. When the pirate gives me more information (such as a run of 7 trials with 5 coming up heads), it requires a complicated mixed strategy to achieve this.

The setup is really simple: assume the pirate has already flipped 9 similar coins, shows me the first 7 coins, and I am to guess at the remaining two coins. What is the frame for the recommendation that statistics makes?

"If you flip a coin, your chance of survival is 50%". Because my coin flip either coincides with the 2 hidden coins or not. But is that true? Once I have decided what to bet on, doesn't my survival chance depend on how the coins fell?

"If you follow the k= 5.51225 mixed strategy, your chance of survival is also at least 50%, but it may be higher than that." Is that true, given that the two coins are already flipped? (Does it change anything if they are not? The future could be pre-ordained.) It is not true, because for any specific instance, the mixed strategy will counsel betting one way or the other with a non-50% probability, and that could be the wrong counsel.

The first recommendation is indifferent to what the coins do, and the second can't be. With the decision, the 9-coin case collapses all probability, and loss or win become certainty.

So what are we saying, then? "If you flip a coin, our universe splits into two alternate universes, and you will survive in half of them." "If you follow a mixed strategy, the universe splits in alternate universes on each pirate coin flip and your mixed strategy decision, and you will survive in at least half of these universes and possibly more." This is equivalent to saying, "The chance that you will find yourself in a universe where you survive is P(win)."

If we go a priori, we have split our universe again: say, for any p there's an equal number of universes in which the pirate has chosen this p. Then we have split up into alternate universes at this point, and the statistical assertion concerns me *before* this split. The problem is, is that even of value to me, given that I'm already past this split point, down one of the branches of the "trousers of time" (Pratchett)? I just don't know which one. So should I be altruistic and adopt a strategy that will help more of my alternates survive, even if it disadvantages me? Or should I be considerate and give every one of my alternates (possibly myself!) a fighting chance, no matter which alternate universe they're in?

Clearly, I'm still confused about this.

The other question is, how do "Bayesians" approach the question of worst-case behaviour? Do they have tools for that?

@mendel: When you say that the mixed strategy’s chance of survival *changes* from “at least 50%” to “less than 44%,” that requires knowledge of the coin’s bias that the player doesn’t actually have. For example, consider exactly the same situation from another player’s perspective: he initially plans to execute his mixed strategy, confident in at least a 50% chance of survival. He then also observes 4 heads and 3 tails… but in this case, the pirate’s coin has P(heads)=0.25 (also unknown to the player). If the player sticks with his initial strategy, we “know” that he has a 94% chance of survival… unless he changes his mind and flips a fair coin, *dropping* his chances to 50%.

(Of course, that particular observation is even less likely, only about 5%. But the *player* doesn’t know that.)
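These figures can all be checked directly; a quick sketch (using the fact that with 4 observed heads, both f_5 and f_6, and hence the mixed strategy, bet against two heads):

```python
from math import comb

# With 4 of 7 observed heads the strategy bets *against* two heads, so
# the survival probability is 1 - p^2; the chance of the observation
# itself is C(7,4) p^4 (1-p)^3.
for p in (0.25, 0.75):
    observe = comb(7, 4) * p**4 * (1 - p)**3   # chance of seeing 4H, 3T
    survive = 1 - p**2                          # betting against two heads
    print(p, round(observe, 3), round(survive, 4))
```

For p = 0.25 this gives a ~5% chance of the observation and 93.75% survival; for p = 0.75, a ~1/6 chance of the observation and 43.75% survival.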

Put another way, imagine that we aren’t playing for our lives, but instead have a chance to play this game *repeatedly*. (I’m wearing my frequentist hat now. :)) If player A uses the randomized mixed strategy every time, while player B repeatedly changes his mind at the last minute and reverts to the “safe” strategy of flipping his own fair coin, then in the long run player A will perform at least as well as B, *even* if the pirate uses a *different* coin each time. Indeed, A will perform *strictly better* than B unless the coin *always* has the critical bias p=1/sqrt(2).
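This long-run comparison can also be verified numerically; a sketch, assuming player A uses the exact mixing probability (5 − sqrt(2))/7 discussed above:

```python
from math import comb, sqrt

Q = (5 - sqrt(2)) / 7   # probability of playing f_6 rather than f_5

def survival_rate(p, n=7):
    """Long-run fraction of games player A's mixed strategy wins against
    a pirate whose coin has bias p."""
    total = 0.0
    for h in range(n + 1):
        b = comb(n, h) * p**h * (1 - p)**(n - h)
        bet_for = 1.0 if h >= 6 else ((1.0 - Q) if h == 5 else 0.0)
        total += b * (bet_for * p**2 + (1.0 - bet_for) * (1.0 - p**2))
    return total

# Player B's own fair coin wins exactly half the time for every p;
# player A matches that only at the critical bias and beats it
# everywhere else.
worst = min(survival_rate(i / 1000) for i in range(1001))
print(worst)               # ~0.5, approached only near p = 1/sqrt(2)
print(survival_rate(0.6))  # strictly better than 1/2
```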

“Some mixed strategy k are very close to the integer values, e.g. k=14.0066 for n=19. Are my numbers wrong ( http://ideone.com/KyMBV0 ), or do we need to consider strategies that are mixed/randomized for more than one case?”

Checking against my calculation of these same values, they look correct to me. For example, the case n=19 yields a mixed strategy of choosing f_15 with probability (122 − 85*sqrt(2))*7/1938.

I agree, the statistical implications are much clearer when we’re thinking about playing the game repeatedly.

When I read, “that requires knowledge of the coin’s bias that the player doesn’t actually have”, it became clear to me that the “chance of survival” isn’t objective in this case if it depends on the knowledge a subject has: I think what changes when crucial information is revealed is not the chance itself, but rather the player’s prediction of it.

Player belief is, of course, what the Bayesians are concerned with.

The AI people are also concerned not with making predictions for large numbers (e.g. playing a game repeatedly), but rather with enabling AIs to make reasonable decisions in singular situations under uncertainty. They’ve been using “fuzzy logic” for half a century, and the Wikipedia article on that has a section “Comparison to probability”. (I’m wondering if there are de Finetti texts online.)

So the player is uncertain about his survival, and his prediction for survival expresses that uncertainty in a way. But what expresses how certain we are of these predictions? Much like in descriptive statistics, where we need not only the mean but also the variance, some measure of how much information we’re still missing could be helpful in deciding what to do: either act on the prediction, or expend effort on gathering more information first.