Analysis of adversarial binary search game

Introduction

Alice and Bob are at it again, this time playing a game called Binary Search. Alice begins by secretly selecting an integer key from 1 to n=100, then Bob makes a series of integer guesses, to each of which Alice responds indicating whether the guess is correct, or higher than her selected key value, or lower. Bob must pay Alice one dollar for each guess (including the final correct guess). How much should Alice be willing to pay to play this game with Bob?

This post is motivated by a recent blog post by John Graham-Cumming, discussed on Hacker News, about a similar game apparently used by Steve Ballmer in Microsoft job interviews. In the linked clip, Ballmer plays the role of Alice, and offers $6 to play the game. You’re Bob interviewing for a job: should you take the bet? Ballmer suggests that the answer is no, since as he puts it, “I [Alice/Steve] can pick numbers specifically that are hard for you [Bob] to get.”

As many have noted in the subsequent discussion, this comment certainly seems to allow for Alice/Steve picking numbers adversarially, that is, not necessarily uniformly at random. But presumably Bob is then allowed similar leeway, not being forced to execute textbook binary search that always guesses the midpoint of the interval of remaining possible numbers. So who has the greater strategic advantage? I’m not sure. I think this is a hard problem, as I’ll try to make the case below.

Solving smaller problems

I wrote some Mathematica code (on GitHub) to compute exact optimal strategies for Alice and Bob for smaller versions of this problem, up to n=13. Alice’s pure strategies are simply the set of n possible secret keys. Bob is more interesting; we can identify each possible pure search strategy as an in-order binary tree with n vertices. The following function enumerates these (it’s always a nice surprise when my favorite sequence turns up in another unexpected place):

(* List all possible search strategies with key space[n]. *)
strategies[n_] := strategies[n] = Module[
   {guess},
   If[n === 0, {{}},
    Join @@ Table[Join @@ Outer[
        {guess, #1, #2} &,
        strategies[guess - 1],
        guess + strategies[n - guess],
        1
        ],
      {guess, 1, n}
      ]]]

Armed with Alice’s and Bob’s pure strategies, we can compute the matrix of payoffs resulting from each possible strategy match-up, then solve the corresponding linear programming problem to compute the expected value of the game and the Nash equilibrium mixed strategies for each player.

Results and conjectures

The results for n up to 13 exhibit a consistent pattern suggesting that an optimal strategy for Alice is to pick numbers “almost uniformly” at random… but twice as likely to pick a number on the edge. More precisely, pick 1 or n each with probability 2/(n+2), and pick any other number with probability 1/(n+2). I think it is likely that this is an optimal strategy for Alice in the n=100 case as well. (However, it’s interesting to observe that the presumed “Ballmer strategy” of picking uniformly from the subset of numbers for which the normal binary search would yield the maximum number of guesses \lfloor\log_2 n\rfloor + 1 has the same expected payoff.)

Bob’s strategy, on the other hand, is harder to describe in simple terms. Let’s consider n=13 for a specific example. Of the 742,900 possible pure search strategies, an optimal mixed strategy for Bob is to select from just the following baker’s dozen with the indicated probabilities, where each search strategy is specified as the pre-order traversal of the corresponding binary tree:

Prob.      Strategy
------------------------------------------------
0.381579   7  3  1  2  5  4  6 11  9  8 10 13 12
0.131579   8  4  2  1  3  6  5  7 12 10  9 11 13
0.131579   6  2  1  4  3  5 10  8  7  9 12 11 13
0.0625     9  4  1  2  3  6  5  8  7 12 10 11 13
0.060855   5  2  1  4  3 10  8  6  7  9 13 12 11
0.055921   9  4  2  1  3  6  5  8  7 12 10 11 13
0.055921   8  3  1  2  5  4  6  7 12 10  9 11 13
0.055921   5  2  1  4  3 10  8  6  7  9 12 11 13
0.055099   6  2  1  4  3  5 11  9  8  7 10 13 12
0.003289   9  5  2  1  4  3  7  6  8 12 10 11 13
0.003289   5  2  1  4  3  9  7  6  8 12 10 11 13
0.001645   5  2  1  4  3 10  7  6  9  8 13 12 11
0.000822   7  2  1  4  3  5  6 11  9  8 10 13 12

In other words, roughly 38% of the time Bob should conduct the usual even-split binary search, guessing the midpoint 7 to start. However, sometimes Bob might start with an initial guess as low as 5 or as high as 9.

In the original game where n=100, I think it’s certainly the case that Bob should similarly be willing to start with guesses north of 51 or south of 50, but it’s unclear how extreme those departures might be.

Finally, what is Alice/Steve’s expected return in this game? Let’s set things up similarly to the Ballmer interview version of the problem, and suppose that Alice pays \lfloor\log_2 n\rfloor dollars to play, and wins back a dollar for each of Bob’s guesses. Then the following figure shows Alice/Steve’s expected return as a function of the number n of possible keys to pick from.

My suspicion is that this sawtooth behavior persists, continuing to straddle zero expected return, so that it seems difficult to predict whether the particular game at n=100 has Alice in the red or the black, when both she and Bob use truly optimal strategies. This simple brute force computational approach certainly won’t work: the matrix of payoffs has 100 rows and roughly 10^57 columns.

Edit 2024-09-09: I was referred to Konstantin Gukov‘s very interesting article about this game, where they compute an explicit strategy for the n=100 case that proves that Alice/Ballmer’s expected return is indeed negative in that case. They do this by restricting Bob the Guesser’s set of available binary search strategies, from all C(2n,n)/(n+1) \approx 10^{57} possible strategies to just 586 carefully chosen ones, so that it is feasible to solve the corresponding linear programming problem. Since Alice’s expected return is negative against this handcuffed version of Bob, it must also be against a truly optimal guesser.

Gukov’s selection of subset of strategies is indeed well-chosen. I ran Gukov’s Python code to generate their search strategies for all n up to 100, then computed the resulting expected return in each case. (Recall that we are assuming that Alice pays \lfloor\log_2 n\rfloor dollars to play, so that we can compare performance across a range of key space sizes.) The results are shown in the figure below, with Ballmer’s return against Gukov’s restricted guesser shown in red, and the performance against an optimal guesser shown in blue (essentially a reproduction of the earlier figure above).

This gives more evidence for our earlier speculation about how the advantage varies with n, with a clearer picture of the pattern: it seems that Ballmer would have been safest by offering to play the game choosing numbers from 1 to 127– or 1 to 63, or 1 to 31, etc.– instead of 1 to 100. Powers of two are the worst for Ballmer, but as the key space increases from there, it eventually becomes advantageous again… until reaching the next power of two.

Robot simulator updated to VPython 7.6.5

Years ago, I wrote a Python module using the VPython 3D graphics library to simulate a three-wheeled robot modeled after the Parallax Scribbler, that could execute Logo-like “turtle” motion commands. It worked really well as an educational tool for programming students, since using VPython’s graphics meant that you could manipulate the robot even from the interpreter prompt, a line at a time, and rotate and zoom the 3D view even while the robot is moving. No window event loop, no multithreading to think about.

I soon added a stall sensor and proximity sensors for detecting and navigating around obstacles, such as the walls of a maze as shown in the screenshot below.

from vturtle import *
robot = Robot(obstacles=maze())

In the intervening 14-ish years, VPython has undergone a significant refactoring. I’ve recently updated the robot simulator code accordingly, so that it now works in the currently latest VPython version 7.6.5. The code is on GitHub.

Another whodunit logic puzzle

Once again, you are a detective investigating a robbery, with a cast of suspects having made the following statements:

  • Alice says, “Bob could claim that I did it.”
  • Bob says, “Charlie could claim that I did it.”
  • Charlie says, “David could claim that I did it.”
  • David says, “Zoe could claim that I did it.”
  • Zoe says, “At least one of us is innocent.”

You do not know which, nor even how many, of the suspects were involved in the crime. However, you do know that every guilty suspect is lying, and every innocent suspect is telling the truth. Which suspect or suspects committed the crime?

Problem 2: What if instead Zoe had said, “At least two of us are innocent”?

(As usual, I like these both as math problems and as programming problems for students. The added wrinkle here is the need to encode the somewhat “meta” nature of most of the statements, talking about the feasibility of someone else’s hypothetical assertions.)

Strongly solving Gobblet Gobblers using retrograde analysis

Introduction

Despite my nephews and nieces growing older, it seems like I continue to encounter mathematically interesting children’s games. I recently played Gobblet Gobblers, a deceptively complex variant of Tic Tac Toe marketed for “ages 5 and up.”

The rules are simple:

  • Like Tic Tac Toe, the game is played on a 3×3 board, with the winner being the first player to get 3 of their pieces in a row, column, or diagonal.
  • Each player starts with 6 pieces: 2 pieces in each of 3 sizes. Each piece has a hollow dome shape, so that a smaller piece may be covered by a larger piece played on top of it.
  • At each player’s turn, they may either play a new piece, or move one of their pieces already on the board (possibly uncovering a smaller piece underneath).
  • Whether playing a new piece or moving one already played, the piece may be played/moved either to an empty square, or over a smaller piece (of either player’s) already on the board.

The neat wrinkle is being able to move pieces around without having to add a new piece to the board with each move. This not only increases the game’s state space complexity– there are 341,024,631 possible states up to board and player symmetry– but also complicates a straightforward minimax search, which must deal with the possibilities of repeated game states, and game tree paths hundreds of moves deep even without ever repeating a game state.

This post describes my approach to computing optimal strategy using retrograde analysis— I wasn’t familiar with this term from the chess world, I would have just called it dynamic programming with some extra sauce. The C++ implementation is on GitHub, with which you can play the game, show optimal moves for either player at any point in the game, and “rewind” to experiment with alternate playouts.

Playing the game and its variants

When you run the program, you are first asked for three parameters specifying the supported rule variations:

  • The number of piece sizes (1, 2, or 3)
  • The number of pieces of each size (1 to 9)
  • A 0-1 Boolean indicating whether pieces may be moved after initially played.

For example, (3,2,1) corresponds to Gobblet, while (1,5,0) corresponds to plain old Tic Tac Toe, as shown below:

Enter rules (num_sizes, num_per_size, allow_move): 1 5 0
Searching... found 765 states.
Solving... solved 614 win/loss states.
| |
| |
0| 1| 2
------|------|------
| |
| |
3| 4| 5
------|------|------
| |
| |
6| 7| 8
Player 1, enter move (-size | start, end), or (0, 0) for best move, or (-1, -1) to undo move: -1 4

Solving Tic Tac Toe is trivial and quick, and most other rule variants take at most a couple of minutes to “think” before beginning the game. But Gobblet (3,2,1) is by far the most complex variant, taking about 25 minutes on my laptop to compute optimal strategy… but that’s a one-time cost, and includes saving the strategy to disk, so that you can quickly reload it to play again later without having to sit through the calculation again.

A move is specified by a pair of integers, indicating the piece to play-or-move and where to move it, with non-negative values 0 through 8 indicating a square on the board as labeled above. A negative value (-1, -2, or -3) indicates playing a new piece of the corresponding negated size. Thus, the example move (-1, 4) above indicates Player 1 (X) starting the game by playing in the center square.

Hidden pieces

Although technically a game of perfect information, in practice there is a memory component: as the rules warn, “The first player to align 3 pieces in a row wins, so when you wish to move a piece try to remember what is under it!” The displayed board preserves this practical difficulty, only showing the topmost piece in each square, indicating its owner (X or O) and size (1, 2, or 3). In the example below, O responds to X’s opening move by covering the smaller X1 with their “medium” sized piece O2:

Enter rules (num_sizes, num_per_size, allow_move): 3 2 1
Loading from gobblet_3_2_1.dat
| |
| |
0| 1| 2
------|------|------
| |
| |
3| 4| 5
------|------|------
| |
| |
6| 7| 8
Player 1, enter move (-size | start, end), or (0, 0) for best move, or (-1, -1) to undo move: -1 4
| |
| |
0| 1| 2
------|------|------
| |
| X1 |
3| 4| 5
------|------|------
| |
| |
6| 7| 8
Player 2, enter move (-size | start, end), or (0, 0) for best move, or (-1, -1) to undo move: -2 4
| |
| |
0| 1| 2
------|------|------
| |
| O2 |
3| 4| 5
------|------|------
| |
| |
6| 7| 8

Optimal strategy

Gobblet is a win for the first player in 13 moves (plies), meaning that optimal strategy does indeed involve moving some pieces around the board. X’s first move can be anything but a medium piece (“X2”); the only caveat is that X3 to an edge, or X1 to the center or corner, allows the second player O to delay losing for 15 moves instead of 13.

(There is a Racket version of the game, with a much better user interface, that appears to implement optimal first player strategy in its “smart mode,” but seemingly only for its specific choice of principal variation, beginning with X3 to the center. For example, manually playing X1 to the center instead, then using this implementation’s optimal strategy for the second player– enter 0 0 for the move at any point to see a best move– against subsequent Racket auto-play moves for the first player, results in a win for O.)

If we prohibit moving pieces already played– that is, rule variant (3,2,0), with the next largest space complexity of 148,599,441 states– then the game is a draw, where X’s first move must be either X3 to the center or X1 anywhere.

The only other rule variants that are not a draw are (2,3,0) and (2,3,1)– that is, still six pieces per player, but in two sizes of three pieces each instead of the other way around– both of which are also first player wins, in 9 and 11 moves, respectively.

Game state bitboards

This was a really fun programming exercise. If there is any moral to this story, I think it is to remember that the standard library was written for everyone’s problems, not for your problem.

We use a std::uint64_t to represent each game state as a 54-bit bitboard: each of the 3×3=9 squares requires 6 bits, 2 bits for each piece size (332211), indicating whether there is a piece of that size in the corresponding square:

  • 00 indicates no piece of that size.
  • 01 indicates a piece of the player to move.
  • 10 indicates the opponent’s piece.

Gobblet is symmetric, allowing us to reduce the size of the state space by “swapping seats” after each move, so that the state is always interpreted with respect to the player to move next. We can further reduce the number of states by roughly another factor of 8 with the usual trick of rotating and reflecting the board, and using the minimum value among all resulting states.

I think I got nerd-sniped here a bit: rotating the board was awkward using this bitboard encoding, and renumbering the squares to make rotations simpler just made reflections even worse. I finally settled instead on alternating between two reflections– one about the middle row, the other about the off-diagonal– chosen to minimize the number of “shift-and-mask” bitwise operations in each.

As a first ding on the C++ standard library, I found it interesting that the final implementation, shown below, is different at all, let alone noticeably faster, than what I initially started with using the clearer min_s = std::min(min_s, s):

State canonical(State s)
{
    State min_s = s;
    s = flipud(s);        min_s = s < min_s ? s : min_s;
    s = antitranspose(s); min_s = s < min_s ? s : min_s;
    s = flipud(s);        min_s = s < min_s ? s : min_s;
    s = antitranspose(s); min_s = s < min_s ? s : min_s;
    s = flipud(s);        min_s = s < min_s ? s : min_s;
    s = antitranspose(s); min_s = s < min_s ? s : min_s;
    s = flipud(s);        min_s = s < min_s ? s : min_s;
    return min_s;
}

Chris Wellons’s MSI hash map

Retrograde analysis involves mapping every possible game state to its corresponding win/loss value (more on this below). To do this, I started out with a std::map (or std::unordered_map if you like), starting development with smaller versions of the game that executed more quickly, such as rule variant (2,2,1), with only 252,238 states, then on to variant (2,3,0), with 1,964,786 states.

But with this approach, available memory was dwindling fast. At the time, before I knew Gobblet’s state space complexity exactly, a reasonable back-of-the-envelope estimate was

\frac{1}{8}(\sum\limits_{a=0}^2 \sum\limits_{b=0}^2 {9 \choose a}{9-a \choose b})^3

or well over 300 million states. I just couldn’t afford the overhead of red-black tree pointers, node colors, or hash bucket linked list pointers in the standard library’s map implementations.

Enter the “mask, step, index” (MSI) hash map described by Chris Wellons [1]. This worked like a charm, being not just lean but fast. Here is the entire implementation, in place of std::map, squeezing all 2.5 GB worth of game states into a 4 GB array:

const int HASH_EXP = 29;
const std::size_t HASH_MASK = (1ull << HASH_EXP) - 1;
const State STATE_EMPTY = 0x3; // 0x0 is the (valid) initial board state
const State STATE_MASK = (1ull << 54) - 1;
std::vector<State> hash_map(HASH_MASK + 1, STATE_EMPTY);

// Return pointer to hash map entry for given game state.
State* lookup(State s)
{
    std::uint64_t h = hash(s);
    std::size_t step = (h >> (64 - HASH_EXP)) | 1;
    for (std::size_t i = h;;)
    {
        i = (i + step) & HASH_MASK;
        if (hash_map[i] == STATE_EMPTY || (hash_map[i] & STATE_MASK) == s)
        {
            return &hash_map[i];
        }
    }
}

That’s it. Or almost– we also need the hash(s) function used to compute the starting index and step size when skipping past collisions. I ended up using SplitMix64, which I was expecting to be an overly big hammer, but turned out to result in significantly faster execution than all of the simpler hashes that I tried. I wonder if this is due to the rather heavy loading (we’re using over 63.5% of the allocated space)?

At this point, we really have just a hash table. We need a hash map, from the 54-bit state as the key, to the win-or-lose value of the game played optimally from that point. This is where the STATE_MASK above comes in: we will pack the not-yet-computed value for the 54-bit state key into the remaining upper 10 bits of the std::uint64_t.

Retrograde analysis

Which brings us, finally, to the algorithm for computing the values of all possible game states, implicitly also computing the optimal strategy from any state (by choosing the move that leads to the highest value state). To do this, we map each game state to an ordered pair (value, moves), where value indicates +1 for a win, -1 for a loss, or 0 for a draw for the player to move, assuming optimal play from that point; and moves indicates the “depth to win,” or number of moves (plies) from that point to the end of the game assuming optimal play.

We “solve” for these values using retrograde analysis, starting with the terminal win or loss states that are initialized with their endgame value and moves=0, and working backward from there, identifying immediately previous game states and propagating solved win/loss values, incrementing moves accordingly.

In the process, I ended up using a slightly weird encoding of (value, moves) into the upper 10 bits of the std::uint64_t for each game state. Consider the function to compute the best move from a given game state:

Move best_move(State s)
{
    Move best{};
    State max_next = 0;
    for (auto& m : get_moves(s))
    {
        State next = *lookup(canonical(swap(move(s, m))));
        if (next > max_next)
        {
            max_next = next;
            best = m;
        }
    }
    return best;
}

In other words, optimal strategy is simply to choose the move leading to the state with the largest possible std::uint64_t value, which is in turn dominated by the (value, moves) pair encoded in the upper 10 bits. The value is in the highest 2 bits, and the moves are in the lower 8 bits… but negated (and offset by one) using two’s complement for losses. That is:

  • 0100000000 indicates (+1, 0), an immediate win for the player to move.
  • 0100000001 indicates (+1, 1), the player to move can win in 1 move.
  • 0100000010 indicates (+1, 2), the player to move can win in 2 moves.
  • 10######## indicates (0, #), game is a draw.
  • 1111111101 indicates (-1, 2), the player to move will lose in 2 moves.
  • 1111111110 indicates (-1, 1), the player to move will lose in 1 move.
  • 1111111111 indicates (-1, 0), an immediate loss for the player to move.

If we are computing the best move by choosing from among these values as next possible states, the “player to move” is our opponent. Thus, the ideal move would lead to the largest value with high order bits equal to 1111111111, an immediate loss for our opponent– or barring that, a loss for our opponent in the fewest possible moves. If we can’t guarantee a win, then we want to force a draw. And if we can’t force a draw, then our opponent has a winning strategy, but we want to delay the win for as many moves as possible.

Reference:

  1. Wellons, C., The quick and practical “MSI” hash table [HTML]

Another coin flip puzzle

Alice and Bob are playing a game. They flip a fair coin 100 times, noting the resulting sequence of 100 heads (H) or tails (T). For each HH in the sequence, Alice scores a point; for each HT, Bob scores a point. For example, given the sequence THHHT, Alice scores 2 points and Bob scores 1 point. Whoever scores the most points wins the game. Who is most likely to win?

I was referred to this problem, which I suspect originally came from this recent tweet. I reeallly like this problem… but not just for the usual reasons of involving probability and having a very unintuitive solution. (Alice’s expected total score is exactly equal to Bob’s expected score, so maybe it’s most likely a tie? On the other hand, Alice’s maximum score is 99, since her pattern can overlap with itself, while Bob can score at most 50 points, so maybe Alice is most likely to win?)

I also like this problem because I didn’t even recognize it– as a special case of a problem I had not only seen before, but discussed here before.

Because I didn’t recognize it, I ended up solving it in a different way, this time using generating functions. It’s a nice obligatory follow-on puzzle to do this in a one-liner in Mathematica or Wolfram Alpha, but it’s not too bad in Python either using SymPy:

from sympy import symbols, expand
n = 100
x = symbols('x')
h, t = 0, 1
for k in range(n):
    h, t = expand(h * x + t), expand(h / x + t)
g = h + t
alice, bob, tie = (sum(g.coeff(x, k) for k in range(1, n)),
                   sum(g.coeff(x, k) for k in range(-n, 0)),
                   g.coeff(x, 0))

This counts the number of equally probable outcomes where Alice wins, Bob wins, or they tie.

Inverse Coupon Collector’s Problem

This is motivated by a recent blog post by John D. Cook [2] discussing the problem of estimating the number s of words in an author’s vocabulary, given an observation of the number k of distinct words in the author’s work of length n words.

The blog post references a paper by Brainerd [1], suggesting that it gives an estimate of s as the root of the following equation:

\sum\limits_{j=0}^{k-1}\frac{s}{s-j}=n

However, although Brainerd discusses several different estimation methods, I was unable to find any mention of this particular one, either explicitly or in derivation.

At any rate, the motivation for this post is simply to note that this is essentially the “inverse” of the coupon collector’s problem, discussed several times here before, which is usually stated in terms of expectation: if each purchased box of cereal contains one of s types of coupon, what is the expected number n of boxes we must purchase to obtain all s types of coupon?

This expected value is the above sum evaluated at k=s. More generally, even when k<s, the above sum is the expected number of boxes we must purchase to first observe k distinct coupon types. This suggests a nicely intuitive, although not rigorous, argument for using the above equation to solve the “inverse” problem: if we have observed k distinct coupon types among n purchased coupons, then a reasonable estimate for the unknown s is that value for which our number of observations n is the expected number of coupons needed to first observe k distinct types.

It’s interesting that this actually works. Sort of, anyway. That is, Dawkins [3] solves this inverse problem by computing the maximum likelihood estimate for s, which turns out to be one of the integer values on either side of the (generally non-integral) root of the equation above.

References:

  1. Brainerd, B., On the Relation between Types and Tokens in Literary Text, Journal of Applied Probability, 9(3) September 1972, p. 507-518 [JSTOR]
  2. Cook, John D., Estimating an author’s vocabulary [HTML]
  3. Dawkins, B., Siobhan’s Problem: The Coupon Collector Revisited, The American Statistician, 45(1) February 1991, p. 76-82 [JSTOR]

Lemonade Stand

This article was discussed on Hacker News.

Introduction

This was a fun nostalgia trip. But it ended up being an attempt to collect and preserve some retro computing history as well… and I also learned– or forgot and remembered?– another interesting quirk of that programming environment from nearly 45 years ago.

In 1981, I had a friend down the street who had an Apple II+. Later that year, my parents hired a tutor for programming lessons. I feel like I owe much of the subsequent course of my life to that time with “Mrs. Cathy,” who had several computers in her home: I remember an Apple II+, a VIC-20, and an IBM PC.

But it wasn’t until 1983 that we got our own home computer, an Apple //e. One of the programs that came with the computer, Lemonade Stand, is shown in the screenshot below.

The game is pretty simple: each morning, based on the weather forecast for the day, you decide how many glasses of lemonade and how many advertising signs to make, and how much to charge for each glass. These decisions and the day’s weather affect demand, you sell some number of glasses, rinse and repeat.

It’s about as fun as it sounds… but I remember being intrigued by the prospect of “reverse engineering” the game. The Applesoft BASIC source code contains the formula for computing the demand and the resulting profit as a function of the player’s inputs and the weather. We can, in principle, “solve” for the inputs that would maximize the expected profit. This post is motivated by my failed attempt to do this.

History

All of the code discussed here, both old and new, is available on GitHub. Let’s start with the old code: lemonade.bas is the Applesoft BASIC source by Charlie Kellner, from the .dsk image of my original 5.25″ diskette. I believe this is the earliest Apple version of the game– for completeness I’ve also included source extracts of two subsequent released versions (lemonade_v4.bas and lemonade_v5.bas), but for our purpose here their behavior is identical, with cosmetic updates to the graphics and sound by Drew Lynch, Bruce Tognazzini, and Jim Gerber.

For comparison, I’ve also extracted the source for selll.bas (as in “sell lemonade”), from the Minnesota Educational Computing Consortium (MECC, of Oregon Trail fame) Elementary Volume 3 disk A704. Although appearing later in 1980, I think this version is somewhat closer to the actual original version of the game written by Bob Jamison in 1973 for the UNIVAC mainframe. That original 1973 source seems to be lost to the mists of antiquity… but I think selll.bas retains more of its parentage than the Kellner version, which removed some helpful comments explaining the terms in the demand function, mistranslated some variable names and GOTO line numbering resulting in unreachable code, etc.

Maximizing profit

Now for the new code: lemonade_strategy.py implements the profit-maximization described above. It’s pure brute force: given the objective function below,

def expected_profit(glasses, signs, price, cost, r1=1, p_storm=0):
    n1 = 54 - 2.4 * price if price < 10 else 3000 / (price * price)
    demand = min(int(r1 * n1 * (2 - math.exp(-0.5 * signs))), glasses)
    revenue = demand * price
    expense = glasses * cost + signs * 15
    return ((1 - p_storm) * revenue - expense, revenue - expense,
            glasses, signs, price)

where r1 and p_storm are functions of the weather forecast, we simply evaluate the expected profit for all feasible inputs and find the maximum, with the wrinkle that the feasible region– how many glasses and signs we can possibly make– also depends on our current assets:

max(expected_profit(glasses, signs, price, cost, r1, p_storm)
    for glasses in range(min(1000, assets // cost) + 1)
    for signs in range(min(50, (assets - cost * glasses) // 15) + 1)
    for price in range(101))

I ended up in this rabbit hole in the usual way, after discussion with a student. My objective wasn’t really to “solve” this decades-old game, but just to give an explicit example of how much slower computers were then: this same brute-force approach in Applesoft BASIC, with the Apple //e’s 1 MHz processor, took nearly 4 hours to compute optimal strategy just for the first sunny day in Lemonsville. That’s over 200,000 times slower than the above Python code, which runs in a fraction of a second on my decade-old 2.6 GHz laptop– that is, with a clock speed “only” 2,600 times faster.

Changing the weather

You can play an Apple version of the game in a browser at the Internet Archive, linked from the game’s Wikipedia page. I did just that, with my Python script running alongside to compute optimal strategy… and it didn’t work.

The first day’s weather was “hot and dry.” With $2.00 and each glass of lemonade costing 2 cents, the optimal strategy should have been to make 74 glasses, 3 advertising signs, and charge 12 cents per glass, for an expected profit of $6.95. But when I entered these values in the game, I only sold 37 glasses, for a lousy $2.51 profit.

What was going on? It turns out that there are many different copies of Lemonade Stand out there, both at the Internet Archive as well as in various Apple II disk image archives… but the particular one linked from Wikipedia is unique among all of these in that it contains an important modification to the source code appearing in none of the other copies, highlighted below:

400  REM   WEATHER REPORT
410 SC = RND (1)
420 IF SC < .4 THEN SC = 2: GOTO 460
430 IF SC < .7 THEN SC = 10: GOTO 460
440 SC = 7
460 REM IF D<3 THEN SC=2

(You can play the original unmodified version at the Internet Archive here.)

I don’t know where these changes came from. And I believe these are indeed changes from the original release (i.e., not the other way around). But just these three edits were enough to keep my optimal strategy calculation from producing the correct result. This was strange at first glance, since the above section of code merely computes the randomly generated weather forecast for the day, which is an input to the subsequent profit calculation. In the above modified version, the weather forecast is sunny (SC=2) 40% of the time, cloudy (SC=10) 30% of the time, and hot and dry (SC=7) 30% of the time. The original threshold values are 0.6 and 0.8, corresponding to probabilities (0.6, 0.2, 0.2) of (sunny, cloudy, hot and dry).

The REM in line 460 is a “remark” comment, effectively disabling the original behavior of forcing sunny forecasts for the first two days (when D=1 and D=2) of the game. But again, why should this matter? This is all input to the profit calculation, which remains identical to the original, so what is causing the difference in behavior?

Undeclared variables

To make it easier to poke (!) around this problem, I wrote lemonade.py, which is my attempt at a shot-for-shot remake of the original lemonade.bas in Python, minus the graphics and sound– that is, just a canonical mode text interface, but an otherwise line-for-line direct translation with identical behavior.

That translation process was an interesting exercise, almost a logic puzzle, converting the unstructured GOTO-based flow control of Applesoft BASIC into structured Python– I was surprised at the end of it all to find that only a single defined function and extra temporary Boolean variable were needed, despite all of the GOSUBs.

That translation helped me to understand the cause of the different behavior I was seeing in the original. The problem is in the following section of code, with the offending lines highlighted:

700  REM   AFTER 2 DAYS THINGS CAN HAPPEN
710 IF D > 2 THEN 2000
...
2000 REM RANDOM EVENTS
2010 IF SC = 10 THEN 2110
2030 IF SC = 7 THEN 2410
...
2410 X4 = 1
2430 PRINT "A HEAT WAVE IS PREDICTED FOR TODAY!"
2440 R1 = 2
2450 GOTO 805

The variable R1 is an input to the profit function. But although the code change in line 460 above allows the first day’s displayed forecast to be hot and dry– or anything else other than sunny– line 710 still prevents actually setting R1 to reflect that non-sunny forecast when computing profit.

The effect is that, for the first two days, the weather forecast might indicate “hot and dry,” or “cloudy,” etc., but it’s really still sunny for those first two days, just like the original version.

All of which leads to the actual motivation for this post: when I initially tried to reproduce the effect of the three changed lines in the Python version, I got a “NameError: name 'D' is not defined“, due to commenting out line 460 in the BASIC version above: the variable D is never declared or assigned a value prior to line 460.

Python can’t handle this… but apparently Applesoft BASIC can. Variables do not need to be declared before use: when evaluating an expression containing a variable not previously assigned, the value of that variable defaults to zero (or the empty string for string variables).

This behavior is actually depended on in a few other places in the game as well. I admit that I think this is new to me– in all of those early days of learning programming, I don’t remember ever being aware of nor taking advantage of this “feature.” And it’s not a feature of the original Dartmouth BASIC. And I couldn’t find documentation of it anywhere in the books, manuals, and magazines that I have… but I did find it written down in the original “Blue Book” Applesoft Reference Manual. From page 9: “Another important fact is that if a variable is encountered in a formula before it is assigned a value, it is automatically assigned the value zero. Zero is then substituted as the value of the variable in the particular formula.

Exact analysis of Dreidel game

Introduction

A few weeks ago I read an interesting article by Hillel Wayne [6], discussed on Hacker News, describing analysis of Dreidel, a game traditionally played during the Jewish holiday of Hanukkah. Each player starts with the same number of tokens– often foil-wrapped chocolate coins– and antes one coin each into the center pot. Players take turns spinning the dreidel, with one of four equally likely outcomes:

  • Nun (“nothing”): player does nothing, turn passes to next player on the left.
  • Gimel (“everything”): player takes all of the coins in the pot. With the pot now empty, everyone, including the current player, antes one coin back into the pot.
  • Hei (“half”): player takes half of the coins in the pot. If there are an odd number of coins in the pot, the player rounds up, taking the larger half. If this empties the pot, i.e. if only one coin was in the pot, everyone– again including the current player– must ante one coin back into the pot.
  • Shin (“put”): player puts one coin into the pot.

A player is out of the game if they cannot put a coin into the pot when required. Play continues until one player remains, who wins all of the coins.

The motivation for this post is to try to improve upon past analyses of Dreidel, which involve various limitations, simplifying assumptions, and/or approximations in their solution; and to present exact probabilities and expected values for the game, that raise an interesting conjecture that appears to be new. The Python code is available on GitHub.

Another token-passing game

What initially caught my attention to this problem was how much it has in common with the Left-Center-Right dice game discussed in the previous post. Both Dreidel and Left-Center-Right are “games” with no strategy, with all “moves” dictated solely by the random outcome of a spin/roll. Both games involve passing tokens (coins) around among the players and a center pot, with the game ending with one player winning all of the coins.

We can thus apply essentially the same method to analyzing Dreidel, solving a sequence of linear systems of equations, indexing matrix rows and columns by ranking corresponding weak compositions of the coins in play… but with one modification. In Left-Center-Right, the number of players remains fixed throughout the game, but the number of coins in circulation is always decreasing; once a coin enters the pot, it stays there. In Dreidel, the opposite is true: all of the coins remain in circulation throughout the game, entering and leaving the pot, but the number of players decreases as players go “out” when they are unable to put a coin into the pot. Thus, in Dreidel the sequence of linear systems is indexed by the changing number of players, instead of by number of coins.

(As an aside, since I’m always on the lookout for nice exercises for students in mathematics or computer science: show that this same row/column indexing approach used in Left-Center-Right isn’t “wasteful” when applied to Dreidel. That is, show that every distribution of coins among the players, with a positive number of coins in the pot, is a reachable state of the game.)

Rule variations

Although this analysis assumes the specific rules of the game as described above, there are several variations supported in the Python code. For example, the article [6] that motivated this post seems unique in specifying min_coins=1, requiring players to always have a positive minimum number of coins, going out immediately if they ever have zero coins. We instead follow the convention assumed in all of the other references that min_coins=0: a player is only out when they can’t pay.

Another variation deals with taking half the pot when there are an odd number of coins. Robinson and Vijay [4] specify half_odd=0, rounding down, with the player taking the smaller half of the pot; but we follow the elsewhere-convention of half_odd=1, rounding up.

Results and conjecture

References [1] and [4] observe that Dreidel takes a long time to play, with a number of spins that varies (at most) quadratically as the number of players. The exact expected length of the game is shown in the figure below.

Expected number of turns (i.e., dreidel spins) and hours to complete a game, vs. number of coins each player starts with.

Assuming 8 seconds per spin as in [1], a four-player game would take almost 2 hours on average to complete if each player starts with 10 coins, and nearly 4.5 hours if each player starts with 15 coins.

What I found most interesting about Dreidel, however, is how the advantage varies with player position. All of the references below suggest that the first player to spin has an advantage (although based on simulation and/or various simplifying approximations), and that things get worse as we move around the table, with the last player to spin having the least probability of winning.

But if we compute these probabilities exactly, and present results as the expected return, or the average number of coins “gained” relative to initial wagered coins at the start of the game, we see the following interesting behavior:

Expected net coins won vs. number of starting coins wagered, for the starting player (blue) and for the last player (red).

I’ve sacrificed completeness to try to preserve clarity, only showing the first player’s return in blue, and the last player’s in red; the other players’ returns vary monotonically between these two extremes.

This figure suggests that a player’s expected return seems to very quickly approach a non-zero limit as the starting number of coins increases, that depends on the position of the player around the table. For example, in a game with four players, the first player to spin can expect to net about half a coin per game, no matter whether the players start with 2 coins each or 15, while the last player can expect to lose about half a coin per game.

References:

  1. Blatt, B., Dreidel: The classic Hannukah game is painfully slow. It’s time to speed it up, Slate, December 2014 [HTML]
  2. Feinerman, R., An Ancient Unfair Game, The American Mathematical Monthly, 83(8) October 1976, p. 623-625 [JSTOR]
  3. Lucas, S., Who wins when playing Dreidel, 2nd Mathematics Of Various Entertaining Subjects (MOVES) Conference, Museum of Mathematics, New York, August 2015 [PDF]
  4. Robinson, T. and Vijay, S., Dreidel lasts O(N^2) spins, Advances in Applied Mathematics, 36(1) January 2006, p. 85-94 [arXiv]
  5. Trachtenberg, F., The Game of Dreidel Made Fair, The College Mathematics Journal, 27(4) September 1996, p. 278-281 [JSTOR]
  6. Wayne, H., I formally modeled Dreidel for no good reason [HTML]

Analysis of Left-Center-Right dice game

Introduction

This game was surprisingly fun to play. Each player starts with three tokens– buttons or coins if you’re playing with kids, or dollar bills or even higher denominations if the adults want to make it more exciting. On each player’s turn, they roll one die for each of their tokens (up to a maximum of three dice rolled):

  • If a 1 is rolled, pass the token to the player’s left.
  • If a 2 is rolled, put the token in the “pot” in the center of the table.
  • If a 3 is rolled, pass the token to the player’s right.
  • If a 4, 5, or 6 is rolled, keep the token.

If a player has three or more tokens on their turn, only three dice are rolled. If a player has just one or two tokens, then they roll one or two dice, respectively. Most importantly, if a player has no tokens, they are not out of the game, since they might later receive tokens during a neighboring player’s turn; play simply passes to the next player with at least one token.

Play continues until only one player remains with any tokens. That player wins the game, collecting all of the tokens or bills or whatever, including those in the center pot.

I say “surprisingly fun” because, as was quickly observed during play, this is a “game” of pure chance. There is no skill involved; there are no decisions to be made. All of the token-passing game mechanics are merely a more time-consuming– but fun– way of randomly selecting which player will eventually win the game.

But that random selection isn’t quite uniform. Someone has to be the first player, the first to start risking loss of tokens, while the last player at the table gets to wait and watch. It’s an interesting problem to compute exactly how much effect playing order has on probability of winning.

Analysis results

The following table shows the exact probabilities (to given precision) of winning for each player, along with the expected number of turns to determine a winner, for games from two to seven players. (A turn is a roll of the dice, so that we do not count a turn for skipping a player without any tokens.) The Python code is available on GitHub.

# Players
234567
E(# turns)5.8098917.5158928.9454239.7020150.0633660.20762
Player
P(win)0.382070.306840.239260.193710.162120.139181
0.617930.328020.243180.193810.160990.137602
0.365140.255350.199940.164400.139523
0.262220.206320.169400.143174
0.206230.172620.146815
0.170470.148256
0.145467
Expected length of game and player win probabilities for each of up to 7 players, with lowest probability for a given number of players shown in red, highest probability in green.

The table layout is the same as this presentation of a Monte Carlo simulation analysis of the game. However, because those simulation results are only estimates, the order statistics indicated by red (lowest probability) and green (highest) are different in some places. For example, with five players, the first player, not the second, has the lowest probability of winning, and the next-to-last player has the greatest advantage.

These are close shaves– certainly close enough not to matter in a game played with children. But if this were a game played for money, then even these small differences in advantage are comparable to, for example, the magnitude of difference between an edge for the house or the player depending on minor variations in the rules for single-deck blackjack.

A sequence of linear systems

There were a few interesting implementation details. As discussed here before, we can compute the above probabilities (or expected values) as the solution of a system of linear equations. There is a variable x_{\mathbf{s},j} for each possible game state (\mathbf{s},j), where \mathbf{s} is a k-tuple indicating the number of tokens held by each of k players, and j \in \{0, 1, 2, \ldots, k-1\} is an integer indicating which player is to roll for the next turn.

Each variable x_{\mathbf{s},j} represents the probability of the first player winning the game (or the expected number of additional turns to complete the game), conditioned on the corresponding current game state (\mathbf{s}, j). That’s a lot of variables: k{4k \choose k} variables, which for k=7 players is already over 8 million variables.

But we don’t have to solve the entire system at once. The number of tokens in play never increases, so we can solve the very small subsystem involving only the game states with just one token in play, then use those cached results to construct and solve the subsystem of game states with two tokens in play, etc.

Ranking and unranking game states

To compute the resulting sparse matrix representing each of these linear systems, it is convenient to perfectly hash the game states to matrix row and column indices. In each game state, the tuple \mathbf{s} is a weak composition of the total number t \leq 3k of tokens remaining among the players (i.e., not in the center pot). Hashing weak compositions of t with k parts reduces to hashing k-1-subsets of \{1, 2, \ldots, t+k-1\}.

In most of the literature this process of perfect hashing from an object– in this case, a subset, or combination– to a consecutive integer index and back is referred to as ranking and unranking. We’ve done this before with permutations; for combinations, ranking is pretty straightforward:

def rank_subset(S, n=None):
    """Return colex rank of k-subset S of {0..n-1}."""
    return sum(math.comb(c, j + 1) for j, c in enumerate(S))

Note that this simple combinadic formula ranks subsets in colexicographic order. Lexicographic order take a bit more work, but in many applications such as this one, we just need an ordering, we don’t really care which one.

Unranking, or converting an integer index back to the corresponding subset, is the interesting subproblem that really motivated this post. Following is the basic guess-and-check algorithm:

def unrank_subset(r, n, k):
    """Return k-subset S of {0..n-1} with colex rank r."""
    S = [0] * k
    while k > 0:
        n = n - 1
        offset = math.comb(n, k)
        if r >= offset:
            r = r - offset
            k = k - 1
            S[k] = n
    return S

Those binomial coefficient calculations can be expensive, depending on the implementation, caching, etc. The following version implements the same basic algorithm, but takes advantage of the fact that the binomial coefficients needed at each iteration are “adjacent,” and thus differ by a simple factor:

def unrank_subset(r, n, k):
    """Return k-subset S of {0..n-1} with colex rank r."""
    S = [0] * k
    offset = math.comb(n, k)
    while k > 0:
        # Decrease n and update offset to comb(n, k).
        offset = offset * (n - k) // n
        n = n - 1
        if r >= offset:
            r = r - offset
            k = k - 1
            S[k] = n
            if k < n:
                offset = offset * (k + 1) // (n - k)
    return S

Finally, the following binary search version is generally worse than either of the above implementations when amortized over unranking of all possible subsets (as is needed in this game analysis), but it’s safer, so to speak, in that it still works in a reasonable amount of time even for astronomically large universal sets, i.e., when k \ll n.

def unrank_subset(r, n, k):
    """Return k-subset S of {0..n-1} with colex rank r."""
    S = [0] * k
    while k > 0:
        # Use binary search to decrease n until r >= comb(n, k).
        lower = k - 1
        while lower < n:
            mid = (lower + n + 1) // 2
            if r < math.comb(mid, k):
                n = mid - 1
            else:
                lower = mid
        r = r - math.comb(n, k)
        k = k - 1
        S[k] = n
    return S

But after all this, it’s worth acknowledging that none of this is strictly necessary for this particular problem. There is a space-time tradeoff to weigh here, where we could skip the ranking and unranking altogether, and simply pre-compute a dictionary mapping game states to array indices, via enumerate(itertools.combinations(...)) or any other convenient means of generating all states… that in this problem we have to store anyway to cache the probabilities (or expected values).

Counting Olympic medal podiums

From Rosen’s Discrete Mathematics and Its Applications textbook [1]:

Section 6.3 Exercise 47: There are six runners in the 100-yard dash. How many ways are there for three medals to be awarded if ties are possible? (The runner or runners who finish with the fastest time receive gold medals, the runner or runners who finish with exactly one runner ahead receive silver medals, and the runner or runners who finish with exactly two runners ahead receive bronze medals.)

I think it’s an interesting pedagogical question whether this is a “good problem” or not, at least in an academic setting. I think it could be a good problem… but like so many problems in combinatorics, this one generated quite a bit of confusion– justifiably, in my opinion– solely due to imprecise wording. Try it out for yourself: is the answer 260? Or 450? Or 873?

Before tackling this problem in more detail, I think it’s helpful to back up just a couple of exercises in the same section of the textbook, for a simpler and less ambiguous variant:

Section 6.3 Exercise 45: How many ways are there for a horse race with three horses to finish if ties are possible? [Note: Two or three horses may tie.]

With just three horses, an explicit case analysis isn’t too bad, and is perhaps the intended approach for students in an introductory course, without generating functions or other more heavyweight tools. But students are familiar with recurrence relations at this point, which yields a pretty elegant solution that also generalizes to compute the number b_n of race outcomes for any number n of horses:

b_n=\sum\limits_{k=1}^n {n \choose k}b_{n-k}, b_0=1

Coming back now to awarding medals to runners: although the parenthetical explanation of the rules above seems reasonably clear, the confusion arises in the possible conflict with the statement of the question itself. That is, are we counting only “podiums” where (a) exactly three medals are awarded, despite the rules allowing for more than three runners to receive medals in some cases of ties? Or are we perhaps counting podiums where (b) all three types of medal (gold, silver, and bronze) are awarded, possibly more than one of each type (although per the stated rules this would only ever involve multiple bronze medals, behind exactly one gold and one silver)?

Or are we simply counting podiums where (c) three or more medals are awarded, of three or fewer types, accounting for ties as explained in the subsequent description of the rules? This turns out to be the intended problem, with a modestly complex case analysis described in this top math.stackexchange.com answer… but trying to generalize to more than n=6 runners, or worse, to more than m=3 medal types (i.e., steps on the podium), seems daunting.

A recurrence relation can simplify the solution here as well, with a_{n,m} counting the number of such podiums:

a_{n,m}=\sum\limits_{k=1}^n {n \choose k}a_{n-k,m-k}

where a_{n,m}=1 if n \leq 0 or m \leq 0… or answering the other variants with the same recurrence, but with different initial conditions.

Finally, part of the motivation for this post was my lack of knowledge of how this has worked in actual events. Without the clarification in the problem statement, I thought that all runners with the fastest time would receive gold medals, and all runners with the second fastest time would receive silver, and all runners with the third fastest time would receive bronze. That hasn’t been the case at least in the Olympics, although there have been some isolated exceptions.

Reference:

  1. Rosen, K. H. (2019). Discrete Mathematics and Its Applications (8th ed.). New York, NY: McGraw-Hill. ISBN-13: 978-1-259-67651-2