NCAA bracket scoring systems


During the 2015 NCAA men’s basketball tournament, I won our office pool by (1) picking then-undefeated Kentucky to lose– although earlier than their actual Final Four loss to Wisconsin– and (2) picking Duke to win the championship game.  It was a come-from-behind victory for my bracket, moving from 14th place to 7th to 1st… over the span of the last three games in the 63-game tournament.

But should I have won?  Our pool used the common bracket scoring system of assigning:

  • 1 point for each correct pick in the first round of 64 teams,
  • 2 points for each correct pick in the second round of 32 teams,
  • 4 points for each correct pick in the third round of 16 teams,
  • 8 points for each correct pick in the fourth round of 8 teams,
  • 16 points for each correct pick in the two Final Four games,
  • 32 points for correctly picking the champion.

This “doubling” system has several reasonable mathematical motivations.  For example, each round of games is potentially worth the same number of points (32).  Also, assuming all of the teams are evenly matched– or equivalently, assuming that you make all of your picks by flipping a fair coin– then the expected number of points scored decreases by exactly half with each round.

But teams aren’t evenly matched, and you don’t make your picks by flipping coins.  Intuitively, then, it seems like this doubling system might be over-weighting the importance of later rounds, and perhaps a better system involves less extreme increases in points per game from one round to the next.  One of the more amusing common suggestions is a progression based on the Fibonacci sequence, with games in each round worth 2, 3, 5, 8, 13, and 21 points, respectively.  My goal in this post is to describe a means of more accurately evaluating and comparing these and other bracket scoring systems.

Probability model for tournament games

First, we need a way to model the probability of correctly picking any particular game.  A reasonably simple starting point is to assume that all games are independent, with each outcome’s probability depending only on the teams’ seeds.  More precisely, let P be a 16×16 matrix with entries

p_{i,j} = 1 - p_{j,i} = \frac{1}{2} + k(s_i - s_j)

indicating the probability that seed i beats seed j, where s_i is some measure of “strength” of seed i (decreasing in i), and k is a scaling factor that effectively determines the range of resulting probabilities.  For example, if k=0, then every game is a coin flip; at the other extreme, if k=1/(2(s_1 - s_{16})), then a 16th seed has zero probability of a first-round upset against a 1st seed.  For this discussion, k will be chosen so that

p_{1,16} = \frac{124+1}{124+2} = \frac{125}{126}

based on the observation that, in 124 match-ups over the last 31 years of the current tournament format, a 1st seed has so far never lost to a 16th seed.  This probability is the expected value of the corresponding beta distribution.

I used a simple version of this model a year ago to estimate the probability of picking a “perfect bracket,” that is, picking all 63 games correctly, using a linear strength function:

s_i = -i

so that p_{i,j} depends only on the difference between seeds.  Even this very simple model isn’t too bad, as shown in the following updated figure, with the linear prediction model in red, and the last 31 years of historical data shown in blue, with corresponding 95% confidence intervals in black.  As the often very wide confidence intervals suggest, 31 years is still not much data; for example, there have been only 7 match-ups between seeds differing by 10: 1st vs. 11th are split 3-3, and a single 2nd seed won over a 12th.

Probability of winning as a function of seed difference: point estimate (blue), 95% confidence interval (black), and linear prediction model (red).

Probability of winning as a function of seed difference: point estimate (blue), 95% confidence interval (black), and linear prediction model (red).

As usual, it turns out that this was not a new idea; Schwertman et. al. (see References at the end of this post) considered this same model back in 1991, as well as another non-linear strength function that turns out to be a better historical fit:

s_i = \Phi^{-1}(1 - \frac{4i}{n})

where \Phi^{-1} is the quantile function of the normal distribution, and n=351 is the total number of Division I men’s basketball teams.  The idea is that the “strengths” of all teams are normally distributed, with the 64 teams in the tournament comprising the “strongest” teams in the upper tail of this distribution.  I will use this strength function for the remainder of this discussion.

Computing probabilities of correct picks

Given whatever matrix P of probabilities we choose, we can use it to compute the resulting distribution of the seed winning any particular game in the tournament.  If \mathbf{a} and \mathbf{b} are 16-element column vectors with a_i (b_i) indicating the probability that the home (visiting) team in a particular game is seeded i, then the distribution of the seed winning that game is given by

\mathbf{a}\circ(P\mathbf{b}) + \mathbf{b}\circ(P\mathbf{a})

where \circ is the element-wise Hadamard product.  In the first round, each \mathbf{a} and \mathbf{b} is a basis vector.  Note that including both terms in the summation is really just a computational convenience, at least within a region, since for a given seed, only one of the two terms’ corresponding components will be non-zero.

By applying this formula iteratively for each game in each successive round, we can eventually compute the probability of each seed winning each game in the tournament.  For example, the following Python code computes the distribution of the winner of any one of the four regional championships (among 16 teams each):

import numpy as np

def bracket_seeds(num_teams):
    """Seed given number of teams in single-elimination tournament."""
    seeds = np.array([1])
    while len(seeds) < num_teams:
        seeds = np.array([seeds,
                          2 * len(seeds) + 1 - seeds]).transpose().flatten()
    return seeds

def dist_game(dist_a, dist_b, P):
    """Compute distribution of winner of a vs. b with probability model P."""
    return dist_a * ( + dist_b * (

def dist_region(P):
    """Compute distribution of regional champion with probability model P."""
    games = np.eye(16)[bracket_seeds(16) - 1]
    for round in range(4):
        games = [dist_game(games[k], games[k + 1], P)
                 for k in range(0, len(games), 2)]
    return games

The resulting predicted probabilities are shown in the following figure in red– using the normal quantile strength function above– compared with the actual frequencies in blue.

Winner of regional championship: actual frequency (blue) and predicted probability (red).

Winner of regional championship: actual frequency (blue) and predicted probability (red).

Bracket scoring systems

Now that we have a means of computing the probability of any particular team winning any particular game, we can evaluate a completed bracket by computing the expected number of correct picks in each round.  For example, suppose that our bracket simply picks the favorite (i.e., the higher seed) to win every game.  Then the expected number of correct picks will be:

  • 23.156 of 32 games in the first round,
  • 9.847 of 16 games in the second round,
  • 4.292 of 8 games in the third round,
  • 1.792 of 4 games in the fourth round regional championships,
  • 0.540 of 2 games in the Final Four,
  • 0.156 of the final championship game.

At this point, we can compare various bracket scoring systems by comparing the expected number of points scored in each round using those systems.  For example, the following table shows the expected points per round for the two systems mentioned so far: the doubling system (1, 2, 4, 8, 16, 32) and the Fibonacci system (2, 3, 5, 8, 13, 21), normalized to 1 point per first-round game.

Round Doubling Fibonacci/2
1 23.156 23.156
2 19.694 14.771
3 17.168 10.730
4 14.334 7.167
Final Four 8.636 3.508
Championship 4.978 1.633

Which of these or any other systems is “best” depends on what kind of pool you want.  With the doubling system (or even greater progressions), you can have an “exciting,” horse race-ish pool, with lead changes and multiple entries having a chance of winning throughout all six rounds.  With the Fibonacci system (or even more gradual progressions), you can have a pool that rewards research and accurate prediction of early-round upsets… but such a pool may be effectively over well before the Final Four.


  1. Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1) February 1991, p. 35-38 [PDF]

Appendix: Historical data

The following matrices contain the record of all wins and losses, by round and seed match-up, for the 31 tournaments in the current format from 1985 through 2015.  First, the following 16×16 matrix indicates the number of regional games– that is, in the first through fourth rounds– in which seed i beat seed j.  Note that the round in which each game was played is also implicitly determined by the seed matchup (e.g., 1 vs. 16 is in the first round, etc.).

   0  21  13  32  30   6   4  51  56   4   3  19   4   0   0 124
  21   0  23   2   0  23  53   2   0  26  12   1   0   0 117   0
   8  14   0   2   2  38   7   1   1   9  25   0   0 104   1   0
  15   4   3   0  36   2   2   3   2   2   0  21  99   0   0   0
   7   3   1  30   0   1   0   0   1   1   0  80  11   0   0   0
   2   6  28   1   0   0   3   0   0   4  81   0   0  13   0   0
   0  20   5   2   0   3   0   0   0  76   0   0   0   1   2   0
  12   3   0   5   2   1   1   0  63   0   0   0   1   0   0   0
   5   1   0   0   1   0   0  61   0   0   0   0   1   0   0   0
   0  18   4   0   0   2  48   0   0   0   0   0   0   1   4   0
   3   1  13   0   0  43   3   0   0   2   0   0   0   5   0   0
   0   0   0  12  44   0   0   1   0   0   0   0   8   0   0   0
   0   0   0  25   3   0   0   0   0   0   0   3   0   0   0   0
   0   0  20   0   0   2   0   0   0   0   0   0   0   0   0   0
   0   7   0   0   0   0   1   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

The following matrix, in the same format, is for (fifth round) Final Four games:

  12   6   2   5   1   0   1   1   1   0   0   0   0   0   0   0
   4   2   3   1   0   1   0   0   0   0   1   0   0   0   0   0
   4   2   0   2   0   0   0   0   0   0   1   0   0   0   0   0
   1   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0
   0   1   0   0   1   0   0   1   0   0   0   0   0   0   0   0
   0   1   0   1   0   0   0   0   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   2   0   0   0   0   0   0   0   0   1   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

And finally for championship games:

   6   6   1   2   3   1   0   0   0   0   0   0   0   0   0   0
   1   0   3   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   2   1   0   0   0   0   1   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0


Posted in Uncategorized | Leave a comment

Hangman, Scrabble, and Google Books


I think a computer version of the paper-and-pencil game Hangman is a great programming project for students.  Actually, “Evil” Hangman is even better as a more advanced project: instead of the computer selecting a random dictionary word as usual at the start of the game– and remaining bound to that initial selection throughout the game– it merely selects a word length.  Then in response to each player guess, it “scores” the guess in a way that maximizes the number of possible dictionary words that are still consistent with the clues provided so far.  This is “evil” in that it is arguably cheating… but in a way that is indistinguishable from fair play from the perspective of the human guesser.

In any case, a necessary step in such a project is identifying a list of words to use as a dictionary.  A Scrabble dictionary seemed like a reasonable starting point… but considering playable words like zyzzyvas, I thought it might be useful to trim the dictionary down to a shorter list of “easier” words in more common use.  But how to determine which words are more common than others?  We can do that, with help from Google Books.

Word Lists

I started with the OTCWL2, the second edition of the Official Tournament and Club Word List used for Scrabble in the United States and Canada.  This is the latest “official” reference accessible in electronic form, containing 178,691 words of 2 to 15 letters.

The OSPD4 is the fourth edition of the Official Scrabble Players Dictionary, a less rigorously edited publication containing a subset of the words in the OTCWL2 with at most 8 letters (with a small tail of some longer inflections), and excluding words that are considered offensive or otherwise inappropriate for school and family games.  The figure below shows the distribution of word lengths for both of these word lists.

Distribution of word length in the OTCWL2 and OSPD4 Scrabble dictionaries.

Distribution of word length in the OTCWL2 and OSPD4 Scrabble dictionaries.

To sort these lists of words by frequency of occurrence in natural language, I used the Google Books Ngrams data set (English Version 20120701), containing a list of numbers of occurrences of “1-grams”– roughly, whitespace-delimited tokens– in the more than 8 million books scanned by Google.  I aggregated these counts for all “words” containing only the case-insensitive alphabetic characters A-Z, with no other special characters, but with no other restriction on word length or frequency of occurrence.  (Peter Norvig did a very similar data reduction analysis recently, but (1) he discarded rarely-occurring but still “playable” words, and (2) I cannot reproduce his data; both I and another commenter observe identical word counts that are, strangely, roughly half what his analysis indicates.  I could use more eyes on this if someone wants to take a third look.)

The result is nearly 5 million distinct words, each from 1 to 133 letters in length, occurring over 378 billion times throughout the corpus.  The following figure shows the distribution of lengths of distinct words, truncated to the same 15-letter maximum for comparison with the figure above.

Distribution of lengths of distinct 1-grams in the Google Books corpus.

Distribution of lengths of distinct 1-grams in the Google Books corpus.

This distribution has a shape similar to that of the OTCWL2, with comparable average lengths of 8.86713 and 8.61226 letters for Scrabble words and 1-grams, respectively.  However, shorter words are more common in actual use: if we weight each 1-gram by its number of occurrences in the corpus, the resulting distribution shown below is skewed downward significantly, with an average length of only 4.84264 letters.

Distribution of length of 1-grams, weighted by number of occurrences.

Distribution of length of 1-grams, weighted by number of occurrences.

Word Frequencies

Armed with frequencies of occurrence of all 1-grams in the Google Books corpus, we can now map Scrabble dictionary words to their corresponding frequencies, and sort.  The following table lists the top 150 most frequently occurring words, which are common to both OTCWL2 and OSPD4.  (See the usual location here to download the complete lists.)

the      26548583149
of       15482969531
and      11315969857
to        9673642739
in        8445476198
is        4192081707
that      4000373255
for       3272628244
it        2870028984
as        2850302594
was       2751347404
with      2591390604
be        2409421528
by        2351529575
on        2297245630
not       2261359962
he        2055218371
this      1913024043
are       1850210733
or        1833848095
his       1805683788
from      1734598284
at        1706718058
which     1570109342
but       1396171439
have      1388716420
an        1363116521
had       1308007510
they      1231059987
you       1168867586
were      1135241275
their     1076488415
one       1074487479
all       1031379950
we        1028643905
can        832705210
her        816638617
has        816070042
there      811848203
been       809520620
if         782488523
more       773623990
when       759867879
will       742569455
would      736404476
who        732299236
so         724571145
no         700307542
she        696346928
other      691591710
its        684717118
may        659022046
these      652892753
what       611266211
them       599817013
than       592508982
some       591157776
him        590383891
time       583894803
into       572112004
only       567615117
do         558298911
such       550034776
my         541136893
new        536629127
about      535438145
out        524101460
also       519307893
two        517040036
any        491821918
up         487701411
first      461575987
could      443444946
our        437481406
then       436395729
most       430481826
see        423246243
me         409783516
should     407122235
after      401110215
said       378997761
your       375384324
very       373437241
between    372032027
made       368828362
many       366336877
over       357758573
like       354924476
those      345833904
did        344906403
now        339668198
even       338753297
well       337707194
where      335821693
must       331496692
people     329358030
through    323545720
how        315034600
same       312263785
work       310897825
being      308096733
because    306160671
man        304508612
life       303632580
before     302583542
each       300640666
much       300630798
under      292804847
years      281623235
way        279891710
great      278574742
state      276687623
both       271888302
use        267896038
upon       266216636
own        262133879
used       262032076
however    259603813
us         258486246
part       257902754
good       254969414
world      252934367
make       251401534
three      243893653
while      241720369
long       238665543
day        235502808
without    232791602
just       227081780
men        227046739
general    225306151
during     224086601
another    222347954
little     221427582
found      218308716
here       216154769
system     214324790
does       213337247
de         212139309
down       211277897
number     210832441
case       208128805
against    205222260
might      204780135
still      204131780
back       203273772
right      202859545
know       202770927
place      200698028
every      196653070

You may have noticed that the single-letter words a and I are missing, since Scrabble words always contain at least two letters.  After these two, which are 6th and 19th, respectively, the next most frequently occurring 1-grams that are not Scrabble-playable are (ignoring other single letters): American, York, IILondon, British, England, non, America, William, France, and vol.

Although Scrabble OTCWL2 dictionary words make up just about 3.3% of all 1-grams, they are not exclusively “common” words.  The following figure shows, among a given quantile x of most frequently occurring 1-grams in the corpus, the fraction y of the Scrabble dictionary appearing among those 1-grams.

Cumulative distribution of Scrabble-playable words among 1-grams (sorted by frequency).

Cumulative distribution of Scrabble-playable words among 1-grams (sorted by frequency).

If Scrabble words were the most common 1-grams, the red curve for example would increase linearly from (0,0) to (0.033,1) and clamp there.  The curves do increase sharply at the beginning, indicating that most of the dictionary is indeed among the most frequently occurring 1-grams.  But since the curves instead remain strictly increasing throughout, this indicates that there are Scrabble words of any desirable rarity: unpuckers, for example, is one of the more amusing of the 26 playable words that occurs the minimum of just 40 times in the entire corpus.

But 40 isn’t really the minimum; there are Scrabble words that do not appear even once as a 1-gram.  As indicated by the x=1 intercepts in the figure above, about 7% (12,867 to be exact) of the OTCWL2 words do not appear anywhere in the Google data set: zyzzyvas mentioned earlier is one such word.

Hangman Strategy

A final thought: as a programming project, Hangman is even more challenging with the computer acting as the guesser.  This DataGenetics blog post contains an interesting analysis of incremental improvements in strategy for the guesser in Hangman.  For example, the author rightly points out the distinction between guessing letters based on their frequency of occurrence in natural English text, and the better strategy of guessing based on the number of remaining possible words in the dictionary that contain a candidate letter.

However, this latter “greedy” strategy is still not optimal, even assuming uniformly random “non-evil” selection of the unknown secret word.  Rather than just maximizing the probability of a successful next guess, the player really wants to maximize the overall probability of winning— that is, of not exceeding the allowed total number of incorrect guesses.

For a realistic dictionary size, this is a much harder problem to solve.  But we can see that this is indeed strictly better than the greedy strategy with the following minimal counterexample: consider the dictionary with just four 2-letter words {am, an, at, me}, where the player is allowed zero incorrect guesses.  Using the greedy strategy described in the DataGenetics post, the probability of winning is 1/4; but an optimal strategy wins with probability 1/2.


Posted in Uncategorized | Leave a comment

Time management in distributed simulation


Much of my work involves discrete event simulation, where a simulation advances, in discrete jumps, from one “interesting” point in time (called an event) to the next.  Managing execution of such a simulation is pretty straightforward when everything happens in a single thread.  But what if multiple processes, each representing just one component of a larger simulation, want to play nicely together?  How do we control the advance of each process’s “logical time,” so that events are always handled in time order, with no one lagging behind or jumping too far ahead?  In particular– and the question that motivated this post– what is lookahead, and why does my simulation need to know about it?

My goal in this post is to describe how this works at an introductory level: the protocol for communicating events and time advances, and the constraints imposed by that protocol.  But just as writing a compiler or interpreter can improve understanding of programming languages, I think actually implementing this protocol can help to understand why those constraints exist.  And rather than stick to pseudo-code as in most of the relevant literature, I wanted to write something that actually runs.  The end result is less than 90 lines of Python code, included below, which may be used either as a sandbox for single-step experimenting at the interpreter prompt, or to coordinate an actual distributed simulation:

"""Conservative time management for distributed simulation."""

import collections
import heapq
from functools import reduce

federations = collections.defaultdict(dict)
epsilon_time = 1e-9

class Federate:
    """Manage communication and time constraint/regulation for a federate."""

    def __init__(self):
        """Create an "orphan" federate not joined to any federation."""
        self.federation = None

    def join(self, federate_name, federation_name, lookahead):
        """Join federation as time-constrained/regulating federate."""
        assert(self.federation is None)
        assert(lookahead > epsilon_time)
        federation = federations[federation_name]
        assert(not federate_name in federation) = federate_name
        self.federation_name = federation_name
        self.federation = federation
        time = reduce(max, [fed.time for fed in federation.values()], 0)
        self.time = max(time - lookahead + epsilon_time, 0)
        self.requested = self.time
        self.lookahead = lookahead = []
        self.event_tag = 0
        federation[federate_name] = self

    def resign(self):
        """Resign from federation."""
        if self.federation:
        self.federation = None

    def send(self, event, time):
        """Send future event in timestamp order."""
        assert(time >= self.requested + self.lookahead)
        for fed in self.federation.values():
            if not fed is self:
                               (time,, self.event_tag, event))
        self.event_tag = self.event_tag + 1

    def request(self, time):
        """Request advance to given future time."""
        assert(time > self.time)
        assert(self.requested == self.time)
        self.requested = time

    def receive(self, event, time):
        """Called when timestamp order event is received via send()."""
        print('{}.receive(event={}, time={})'.format(, event, time))

    def grant(self, time):
        """Called when next event request() is granted."""
        print('{}.grant(time={})'.format(, time))

    def next_grant_time(self):
        time = self.requested
        if time > self.time and
            time = min(time,[0][0])
        return time

    def advance(self):
        galt = reduce(min,
                      [fed.next_grant_time() + fed.lookahead
                       for fed in self.federation.values() if not fed is self],
        if self.next_grant_time() < galt:
            while and[0][0] <= self.requested:
                self.requested, source, tag, event = heapq.heappop(
                self.receive(event, self.requested)
            self.time = self.requested

    def push(self):
        for fed in self.federation.values():
            if fed.requested > fed.time:

From one to many

Before jumping into distributed simulation, though, let’s start simple with just a single process, and consider what an event-based simulation framework might look like:

import heapq
import collections

class Simulation:
    def __init__(self):
        """Create an "empty" simulation."""
        self.time = float('-inf') = []
        self.event_tag = 0
        self.listeners = collections.defaultdict(list)

    def publish(self, event, time):
        """Insert event into the local queue."""
        assert(time >= self.time)
                       (time, self.event_tag, event))
        self.event_tag = self.event_tag + 1

    def subscribe(self, event_type, listener):
        """Register listener callback for events of the given type."""

    def run(self):
        """Run simulation."""
            self.time, tag, event = heapq.heappop(
            for listener in self.listeners[event.type]:
                listener(self, event, time)

The idea is to maintain a priority queue of future events, sorted by time.  As the simulation is run(), we “advance” time by iteratively popping the next time-stamped event from the queue, and notifying any listeners that subscribe() to events of that type, who may in turn publish() (i.e., insert) additional future events into the queue.

There are several things to note here:

First, your discrete event simulation may not look like this at all.  It could be process-based, or a simple batch for-loop, or whatever.  That’s okay– this is just a convenient generic starting point, so that once we start interacting with other components in a distributed simulation, we can see how to modify the publish() and run() methods in a natural way.

Second, at this point the only real constraint is that we cannot publish events in the past (see the assertion in line 14).  That is, while handling an event that occurs at a given time t, a listener is free to publish new events that occur at any time greater than or even equal to t.  We will have to restrict this freedom somewhat when participating in a distributed simulation.

Finally, because multiple events can have the same time stamp, repeatability is a concern.  We want to ensure that “ties” are broken in a deterministic, repeatable way, independent of the particular implementation of the underlying priority queue.  To do this, note that elements of the queue are actually tuples of the form (time, tag, event), with the natural lexicographic ordering.  The tag is just a monotonically increasing counter that provides such a tie-breaker.  It will eventually be useful to add a source element to tuples like this, indicating the string name of the simulation generating the event… but only once we start receiving events from other external simulation components (with different names).

Federation of federates

In what follows, I will try to use terminology consistent with the High-Level Architecture (HLA), a rather ridiculously generic name conveying little information about what it actually is: an IEEE-standardized definition of services for managing a distributed simulation via a centralized run-time infrastructure (RTI).  But although this will be in HLA-speak, it’s worth noting that the ideas presented here are applicable to distributed simulation in general, not just to HLA in particular.

An HLA federation is a distributed simulation consisting of a collection of simulation components called federates, interacting via the RTI.  The RTI acts as both the mailman and timekeeper for the federation, coordinating the communication of events between federates and the granting of requests by each federate to advance their clocks forward in (simulated) logical time.

Simulation federates interacting via centralized RTI.  (Image created by Arichnad 2009, retrieved from

Simulation federates interacting via centralized RTI. (Public domain image created by Arichnad 2009, retrieved from

Note that in the figure above, the federates never communicate directly with each other.  Everything goes through the RTI: we will assume that each federate communicates only with the RTI via its own reliable, in-order channel (e.g., a TCP socket).  This simplifies the use of the code provided here, since we can leave out the multi-threaded synchronization details of the actual inter-process communication, and instead focus on the single “main” thread in which the RTI responds to the already-serialized sequence of messages received from various federates.  (However, it is an exercise for the reader–or another post– to show that this centralization is not strictly necessary.)

Properties and states of a federate

To manage a federation of n federates from the RTI, let’s maintain the following properties of each federate:

  • t_i is the current logical time of federate i.  Note that at any given point in wall clock time, different federates may in general have different logical times.
  • r_i \geq t_i is the requested time to which federate i has requested advance.  A federate may be in one of two states: if r_i > t_i, then federate i is in the Time Advancing state.  If r_i = t_i, then federate i is in the Time Granted state.
  • \Delta_i > 0 is the lookahead for federate i; more on this shortly.
  • Q_i is the priority queue of future time-stamped events, received by the RTI from other federates, to be delivered to federate i.

The message protocol

A federate may send four types of messages to the RTI: it can (1) join a federation, (2) exit or “resign” from a federation, (3) request a time advance, or (4) send an event to other federates:

  • join(federate, federation, lookahead) sent by a simulation component indicates a request to join a named federation (or create it if it does not yet exist) as a federate with the given name and positive lookahead.  The name of the federate will be used to deterministically order identically time-stamped events as mentioned above.  The name of the federation may be used to allow a single RTI process to service multiple federations (with distinct names) simultaneously.  A joining federate is effectively in the Time Advancing state, and must wait until it receives an initial grant() message (see below) to initialize its logical time.
  • resign() sent by a federate indicates an exit from the currently joined federation.
  • request(t) sent by federate i indicates a request to advance to future logical time t > t_i.  Federate i must currently be in the Time Granted state (i.e., r_i = t_i), and upon sending this message the federate transitions to the Time Advancing state (i.e., r_i is updated to the given future time t).  Note that the federate’s current logical time t_i remains unchanged; the federate must wait until it receives a grant() message in response from the RTI (see below).
  • send(e, t) sent by federate i indicates that event e should be sent to all other federates in the federation with time stamp t \geq r_i + \Delta_i.  (The event message is not immediately delivered, however, but is instead inserted into the RTI’s central queues Q_j for all other federates j.)  A federate is free to send these event messages in either the Time Granted or Time Advancing states; however, note that all such “externally published” events must have strictly future time stamps, as specified by the positive lookahead \Delta_i.

In the other direction, the RTI may send just two types of messages back to a federate… and will only do so when a federate is in the Time Advancing state:

  • receive(e, t) sent to federate i indicates that external event e should be effectively inserted into the federate’s local event queue with time stamp t, where t_i < t \leq r_i.  The federate may receive zero or more additional receive() event messages, all with the same time stamp, followed by:
  • grant(t) sent to federate i indicates that the federate has transitioned from the Time Advancing to the Time Granted state, with its new logical time t_i— and requested time r_i— updated to the given time t \leq r_i.  Note that the granted time may be less than the originally requested time… but this will only be the case if one or more external events are received between the request() and the grant(), in which case the granted time will be equal to the time stamp on the received event(s).

The punch line is that, in return for federates adhering to this protocol, the RTI “promises” to maintain the following relatively simple invariant throughout execution of a federation:

A federate will only receive external events with strictly future time stamps.  That is, at all times, a federate has already encountered all possible external events with time stamps up to and including its current logical time.

The Python code above implements the RTI’s role in this protocol.  Call the Federate class methods join(), resign(), send(), and request() to “send” the corresponding messages to the RTI, and the receive() and grant() methods are callbacks indicating the corresponding response messages.


Coming back now to the framework for a single simulation component, we can see how to modify the event loop to participate in a distributed simulation.  Before we process the next event in our local queue, we first “peek” at its time stamp, and request advance to that logical time.  While waiting for the time advance grant, we insert any externally received events into our local queue– possibly ahead of the event time to which we initially tried to advance.

def run(self):
    """Run simulation."""

        # Peek at next event time stamp in local queue.
        next_time, tag, event =[0]

        # Send request() message, wait for grant().
        self.time, tag, event = heapq.heappop(
        for listener in self.listeners[event.type]:
            listener(self, event, time)

def receive(self, event, time):
    """Insert external event into the local queue."""
    self.publish(event, time)

Note that this event loop is actually more “relaxed” than it could be.  That is, when we request() a time advance, we simply block until we get the grant(), storing any intervening receive() events into our local queue… without immediately processing any of them.  Thus, we only ever send() external events when we are in the Time Granted state, although the time management protocol supports sending when in the Time Advancing state as well.

I have obviously simplified things quite a bit here.  For example, in the actual HLA interface, there are complementary notions of time constraint (blocking for grant() callbacks) and time regulation (observing the lookahead restriction when sending events), that can be independently enabled or disabled.  A federate’s lookahead can be modified at run time.  And other time advance services are available beyond the one discussed here (typically referred to as the “Next Message Request” service), etc.  But hopefully this still serves as a useful starting point.

And I have not even mentioned what I actually find to be the most interesting aspect of this problem: a proof of correctness.  That is, how do we know that this implementation actually preserves the “Thou shalt receive only future events” invariant promised above?  In particular, why is strictly positive lookahead critical to that proof of correctness, and what goes wrong when we try to support zero lookahead?  And how can we “decentralize” the algorithm without changing the basic protocol from the perspective of the individual federates?  These are all interesting questions, maybe subject matter for a later post.


  1. 1516.1-2010 – IEEE Standard for Modeling and Simulation (M&S) High Level Architecture (HLA)– Federate Interface Specification (Section 8). [HTML]
  2. Fujimoto, R. M., Parallel and Distributed Simulation Systems. New York: John Wiley and Sons, 2000 (Chapter 3).
  3. Lamport, L., Time, Clocks and the Ordering of Events in a Distributed System, Communications of the ACM21(7) July 1978, p. 558-565 [HTML]
  4. Misra, J., Distributed Discrete-Event Simulation, Computing Surveys, 18(1) March 1986, p. 39-65 [PDF]


Posted in Uncategorized | Leave a comment

Update: Chutes and Ladders is long, but not *that* long

It occurred to me, as I was failing to pay attention in a class this past week, that I neglected an important detail in my recent post analyzing the expected number of turns to complete the game Chutes and Ladders: namely, that one generally does not play the game alone.

Recall from the previous post that we can express the expected value x_i of the number of die rolls (or spins of the spinner) needed for a player to reach the final square 100, starting from square i, “recursively” as

x_i = 1 + \frac{1}{6} \sum_{j=1}^6 x_{f(i,j)}, i < 100

x_{100} = 0

where f(i,j) is the number of the square reached from square i by rolling j (thus encoding the configuration of chutes and ladders on the board).  Solving this system of 100 equations yields the value x_0 of about 39.5984 turns on average for our hypothetical player to finish the game.

However, it is a bit misleading to stop there, since the expected total number of turns in a game with multiple players does not simply scale directly as the number of players.  That is, for example, in a game with two players, we should not expect nearly 80 total rolls of the die, 40 for each player.  As simulation confirms, the game typically ends much more quickly than that, with an actual average of about 52.5188 total rolls, or only about 26 turns for each player.

(Why is this?  This is a good example of the common situation where we can gain insight by considering “extreme” aspects or versions of the problem.  In this case, first note that the shortest possible path from the start to the final square 100 is just seven moves.  It is very unlikely that any single player will actually take this short route, with a probability of only 73/46656, or about 1 in 640.  But now suppose that instead of just one or even two players, there are one million players.  It is now a near certainty that some one of those million players will happen to win the lottery and take that short route… and so we should expect that the average number of total turns should be “only” about 7 million, instead of 40 million as the earlier post might suggest.)

We can still compute this two-player expected value exactly, using the same approach as before… but it starts to get expensive, because instead of just 100 equations in 100 unknowns, we now need 100^2 or 10,000 equations, to keep track of the possible positions of both players:

x_{i,j} = 1 + \frac{1}{6} \sum_{k=1}^6 x_{j,f(i,k)}, i,j < 100

x_{i,100} = 0, i < 100

Solving yields the desired value x_{0,0} \approx 52.5188.


Posted in Uncategorized | 1 Comment

Analysis of Chutes and Ladders


A friend of mine has been playing the board game Chutes and Ladders recently with his son.  The game is pretty simple: starting off of the board (effectively on “square zero”), each player in turn rolls a die (actually, spins a spinner) and– if possible– moves forward the indicated number of squares in serpentine fashion on the board shown below:

Chutes and Ladders board layout.  In some variants, the chute from 48 to 26 is from 47 to 26.

Chutes and Ladders board layout. In some variants, the chute from square 48 to 26 starts at square 47 instead.

If a player lands on the tail of an arrow, he continues along the arrow to the indicated square, ending his turn there.  (“Chutes” are arrows leading backward, and “ladders” are arrows leading forward.)  The first player to land exactly on square 100 wins the game.

My friend emailed me with the question, “What is the expected number of turns before the game is over?”  I think this is a nice problem for students, since it yields to the usual two-pronged attack of (1) computer simulation and (2) “cocktail napkin” derivation of an exact solution.

This is certainly not a new problem.  See the references at the end of this post for just a few past analyses of the game as a Markov chain.  My goal here is to provide an intuitive derivation of the exact expected length of the game as a solution to a system of linear equations, while staying away from the more sophisticated machinery of Markov chains, fundamental matrices, etc., that younger students may not be familiar with.

Unbounded vs. infinite

Before getting to the exact solution, though, it is worth addressing a comment in the referenced DataGenetics blog post about Monte Carlo simulation of the game:

“Whilst the chances of a game running and running are essentially negligible (constantly landing on snakes [i.e., chutes] and going around in circles), there is a theoretical chance that the main game loop could be executing forever. This is bad, and will lock-up your code. Any smart developer implementing an algorithm like this would program a counter that is incremented on each dice roll. This counter would be checked, and if it exceeds a defined threshold, the loop should be exited.”

I strongly disagree with this.  The valid concern is that there is no fixed bound on how many turns a particular simulated game might take; that is, for any chosen positive integer r, there is a positive probability that the game will require at least r turns, due to repeatedly falling backward down chutes.  However, the probability is zero that the game will execute forever; we can be certain that the game will always eventually terminate.  There is a difference between unbounded and infinite execution time.  In fact, such explicit “safety clamping” of the allowed number of moves implicitly changes the answer, modifying the resulting probability distribution (and thus the expected value) of the number of moves, albeit by an admittedly small amount.

There are situations where games can take an infinitely long time to finish– Chutes and Ladders just isn’t one of them.  For example, in this multi-round card game, everything is fine as long as p > 1/2.  And even when p = 1/2, the game is still guaranteed to finish, although the expected number of rounds is infinite.  But when p < 1/2, there is a positive probability that the game never finishes, and indeed continues “forever.”

Recursive expected values

To see how to analyze Chutes and Ladders, first consider the following simpler problem: how many times should you expect to flip a fair coin until it first comes up heads?  Let x be this desired expected value.  Then considering the two possible outcomes of the first flip, either:

  • It comes up heads, in which case we are done after a single flip; or
  • It comes up tails… in which case we are right back where we started, and so should expect to need x additional flips (plus the one we just “spent”).

That is,

x = \frac{1}{2}(1) + \frac{1}{2}(1 + x)

Solving yields x=2.  More generally, if P(success) is p, the expected number of trials until the first success is 1/p.


We can apply this same idea to Chutes and Ladders.  Let x_i be the expected number of turns needed to finish the game (i.e., reach square 100), starting from square i, where 0 \leq i \leq 100 (so that x_0 is the value we really want).  Then

x_i = 1 + \frac{1}{6}\sum_{j=1}^6 x_{f(i,j)}, i < 100

x_{100} = 0

where f(i,j) is the number of the square reached from square i by rolling j— which is usually just equal to i+j, but this function also “encodes” the configuration of chutes and ladders on the board, as well as the requirement of landing exactly on square 100 to end the game.

So we have a system of 100 linear equations in 100 unknowns, which we can now talk the computer into solving for us (in Python):

import numpy as np

n, m = 100, 6
chutes = {1: 38, 4: 14, 9: 31, 16: 6, 21: 42, 28: 84, 36: 44,
          48: 26, 49: 11, 51: 67, 56: 53, 62: 19, 64: 60, 71: 91,
          80: 100, 87: 24, 93: 73, 95: 75, 98: 78}

A = np.eye(n)
b = np.ones(n)
for i in range(n):
    for j in range(1, m + 1):
        k = i if i + j > n else chutes.get(i + j, i + j)
        if k < n:
            A[i][k] -= 1.0 / m
x = np.linalg.solve(A, b)


The following figure shows all of the resulting values x_i.  For example, the expected number of turns to complete the game from the start is x_0 = 39.5984.

Expected number of (additional) turns to finish game starting from the given square.

Expected number of (additional) turns to finish game starting from the given square.

Note that we can easily analyze the “house rule” used in the DataGenetics post– allowing a player to win even when “overshooting” square 100– simply by changing the k=i in line 12 to k=n.  The result is a modest reduction in game length, to about 36.1931 turns.  (Edit: However, this is not the whole story.  See this subsequent post addressing the fact that the game involves multiple players.)


  1. Althoen, S. C., King, L., and Schilling, K., How Long Is a Game of Snakes and Ladders?  The Mathematical Gazette, 77(478) March 1993, p. 71-76 [JSTOR]
  2. DataGenetics blog, Mathematical Analysis of Chutes and Ladders [HTML]
  3. Hochman, M., Chutes and Ladders [PDF]
Posted in Uncategorized | 5 Comments

Searching for chocolates

The following problem occurred to me, with Valentine’s Day approaching next week:

A couple of years ago, I bought a box of chocolates for Valentine’s Day.  The box contained n chocolates, indistinguishable from the outside, but the inside of each containing one of n different fillings (e.g., cherry, caramel, etc.).  On the cover of the box was a map, with labels indicating the type of each chocolate in the corresponding position.  Using the map, it was easy for me to find my favorite type of chocolate: I just find it on the map, pick it out of the box, and eat that one single chocolate.

Last year, I bought another box of chocolates.  It was the same box, containing the same n different types of chocolate… but the map was no longer printed on the box cover.  So to find my single favorite type of chocolate, I was forced to eat chocolates one at a time until I found the one I was looking for.  With no information about which chocolate was which, I expected to have to eat (n+1)/2 chocolates on average to find my favorite, by simply picking one after the other at random.

This year, I have bought another box of chocolates.  Again, it is the same box, with the same n types of chocolate, and this time the map is printed on the box cover… but as a puzzle, my wife secretly opened the box and rearranged all of the chocolates, so that all of the labels on the map are incorrect.  Now what is the optimal strategy for finding my favorite type of chocolate, and what is the corresponding (minimized) expected number of chocolates that I have to eat?

Posted in Uncategorized | 4 Comments

One-card draw poker


Let’s play another simple card game: after a $1 ante, you (the dealer) shuffle a standard 52-card poker deck, and deal one card face down to each of us.  After looking at our respective cards, we each have the option either to “stay” and keep our card, or to “switch,” discarding and drawing a new card from the top of the deck.  The player with the highest-ranked card (ace low through king high) wins the pot, where a tie splits the pot.  What is the optimal strategy and expected value of playing this game?

(Note that although all of the games discussed here are playable– and are even more interesting and complex– with three or more players, I am focusing on just two players here.)

One-card Guts

There are two interesting variations on these basic rules.  First, suppose that we declare our choice of whether to stay or switch simultaneously, as in the game Guts: each player takes a chip or bottle cap under the table, either places it in a closed fist (to stay) or not (to switch), then holds the closed fist over the table.  All players simultaneously open their hands palm-down to indicate their choice.

This is the simpler version of the game.  With two players, it can be shown that the optimal strategy for both players is to stay if they are dealt a nine or higher.  That the game is fair (i.e., the expected value is zero) is clear from symmetry; no one player is “preferred” or distinguished in any way by the rules of the game.

One-card draw poker

Now suppose instead that we play more like a typical draw poker game: after the initial deal, each player in turn decides whether to discard and draw a new card.  Remember that you are the dealer, so you get to make your decision after I have made mine.  Can you exploit this advantage?

This is the version of the game that motivated this post.  I came up with this while trying to find examples of games for a student, games that were “small” enough to be tractable to analyze and experiment with via simulation, but still with interesting strategic complexity.  It’s worth noting that the name “One-Card Poker” is also used to refer to a similar “stud” form of the game, where the card play is simpler (no discarding), but the betting rounds are as in normal poker, resulting in a game that is much more complex to analyze.

(One final note: use of an actual 52-card deck complicates the analysis slightly, in a mostly non-interesting way.  I find it convenient to present the game using an “infinite deck,” where the probability distribution of card ranks remaining in the deck does not change as cards are dealt.)


Posted in Uncategorized | Leave a comment