The following figure shows the progression of COVID-19 over the last seven weeks or so, as measured by cumulative *confirmed cases*. I restricted attention to the eight countries currently having at least 2000 confirmed cases.

Although it’s interesting to try to interpret this view of past history, I think it’s difficult to use it to predict even the near future. Note how similar the exponential growth is (the figure is on a logarithmic scale) for pretty much everyone except China and South Korea, which appear to have taken the most drastic, and apparently successful, measures to contain the virus. Compared with Italy, which is now struggling with hospital capacity, we here in the United States appear to be on our way to very similar numbers of cases in about 11 days, assuming recent growth continues.

Except that this is potentially misleading, for several reasons. On the pessimistic side, this figure only shows *confirmed positive tests*— the United States might *already* have (and in my opinion, almost certainly does have) many more people with the virus, given how little testing has been done so far.

On the other hand, the United States is a larger country than Italy, with roughly five times the population. The following figure attempts to account for this, showing the cumulative number of confirmed cases *per million in population* (population data obtained here).

Importantly, this has no effect on the “slope,” i.e., the exponential rate of growth of cases. It merely delays the same end result– this figure suggests that it might take two and a half weeks, instead of a week and a half… but we’re still headed where Italy is now.

I think an actual prediction of this sort is difficult to make confidently, though. Many interesting dials have been turned, even if only in the past few days. Human behavior has changed, with some significant steps taken on both large scales and small. Whether the eventual effects will be no more disastrous than waiting for the next truck to deliver more toilet paper, I’m not sure. The next two weeks or so will be interesting.

What makes dice fair? Intuitively, when we roll a fair die with $n$ sides, we expect each of its $n$ possible outcomes to have the same probability of occurring. What shapes have this property?

I think this is an interesting problem, in part because it *seems* like there should be an elegant mathematical solution, but some unpleasantly complicated physics gets in the way. For example, even a standard six-sided die can be vulnerable to manipulation by a skilled cheat. The difficulty is that the fairness of a particular shape of die may depend on assumptions about *how* the die is rolled– what is the probability distribution from which we randomly draw the die’s initial position, velocity, and angular momentum? What surface does the die land on– can it slide and/or bounce, and if so, with what coefficients of friction and restitution, etc.?

(For this discussion, we will focus on dice that are convex polyhedra, having flat polygonal faces with no holes or indentations. There are other interesting possibilities, though. For example, consider flipping a cylinder as a “three-sided coin,” which may land on heads, tails, or on its curved “edge.”)

**Symmetry**

One way to get around this dependence on physics modeling assumptions is to appeal to *symmetry*: if there is *any* shape of die that could possibly be considered fair, then a sufficient (but perhaps not necessary) condition for fairness is *face-transitivity*, meaning that the group of symmetries of the die should act transitively on its faces. That is, given any pair of faces $f_1$ and $f_2$, there must be a rigid transformation of the die into itself that maps $f_1$ to $f_2$.

To see why this is the criterion that we want, suppose that Charlie is a skilled-but-myopic cheat with the ability to roll the die biased toward any face $f_1$ that he desires. However, he can only see the resting position and shape of the rolled die, not the labels on its faces. So, after the roll, but *before the outcome is announced*, an objective third party, Oscar, selects a face $f_2$ uniformly at random, and has an opportunity to secretly rotate the rolled die, *preserving its original location*, to show the randomly selected face $f_2$ instead of the face $f_1$ that Charlie intended to roll.

Intuitively, if the die is fair, then Oscar should be able to prevent Charlie from having undue advantage, without Charlie being aware that the die was moved. No matter what face Charlie tries to roll, and no matter what truly random face Oscar selects, it should be possible to rotate one to the other without changing the “space taken up” by the die.

**Rotations vs. reflections**

A convex polyhedron that is face-transitive is called an isohedron. Wolfram’s *MathWorld* page (as well as a great Numberphile video with Persi Diaconis) describes 30 types of isohedra, noting that they “make fair dice” … but face-transitivity is a property that requires some qualification. These 30 types of dice all have the property that their *full* symmetry group acts transitively on their faces, where by “full” we mean that not just rotations but also reflections– think turning the die “inside out”– are allowed.

Since we can’t expect Oscar to secretly turn a physical die inside out, if we more realistically constrain the allowed transformations to include rotations only, then we can ask whether this *proper* symmetry *sub*group also acts transitively on the faces of a die. It turns out that there are six types of isohedra– including two infinite classes of dipyramids– where this is not the case, i.e., the *proper* symmetry group does *not* act transitively on the faces. Instead, these “less fair” dice each have two distinct orbits of faces, where it is impossible to rotate a face from one orbit into a face from another orbit while preserving the overall space taken up by the die.

The figure below shows all 30 isohedra, with the six less fair dice shown with their two orbits of faces in red and green.
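As a small illustration of the rotations-only test, here is a minimal sketch for the simplest case, the ordinary cube. It is not the code from the GitHub repository: the function names are made up for this example, and faces are identified by their outward normals, which is enough for a cube but not for isohedra in general.

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 integer matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cube_symmetries(proper_only):
    """Signed permutation matrices: the 48 symmetries of the cube,
    or only the 24 rotations (determinant +1) if proper_only is True."""
    result = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[r] if c == perm[r] else 0 for c in range(3)]
                 for r in range(3)]
            if not proper_only or det3(m) == 1:
                result.append(m)
    return result

def apply(m, v):
    """Apply matrix m to vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def face_orbits(faces, group):
    """Partition faces (outward normals) into orbits under the group."""
    orbits, seen = [], set()
    for f in faces:
        if f not in seen:
            orbit = {apply(g, f) for g in group}
            orbits.append(orbit)
            seen |= orbit
    return orbits

faces = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rotations = cube_symmetries(proper_only=True)
print(len(rotations), len(face_orbits(faces, rotations)))
```

A single orbit means that rotations alone already act transitively on the faces; a "less fair" die in the sense above would show two orbits when restricted to rotations.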

**Implementation notes**

Models of these isohedra and Python code to compute their symmetry groups and face orbits are available on GitHub. I started with the models on the *MathWorld* page, but modified them to provide exact coordinates for all vertices, scaled to have unit minimum edge length. Computing the symmetries mapping one face to another is very similar to the telescope registration problem described here, although there are some interesting additional wrinkles due to working in limited floating-point precision.

Years ago I wrote an article about solving the 2x2x2 Rubik’s cube, using an efficient representation of the scrambled states of the cube, and using VPython as a means of visualizing the cube and rotations of its faces. The primary motivation at the time was to describe the use of a linear-time algorithm for converting *permutations* of the cubies into consecutive integer array indices… but I ignored the arguably more complex details of representing the *orientations* of the cubies. The objective here is to describe that representation in more detail, with some additional code (available on GitHub) to help visualize what’s going on.

**Cubies and cubicles**

To begin, we establish notation for how to “hold” the cube and apply moves by rotating its faces. The figure below shows the solved cube, with its eight cubies labeled 0 through 7, and three rotation axes that we will use as shorthand to refer to each possible move.

Note that cubie zero is not visible in the back; we hold the cube by this cubie zero so that it remains fixed, and apply moves by rotating any of the three faces not involving cubie zero about the axes shown. For example, pressing **x** in the GUI applies move **x** to rotate the red face counterclockwise about the *x*-axis, so that– from the solved state– cubie 1 moves to where cubie 3 was, cubie 3 moves to where cubie 7 was, etc. (Press capital **X** to rotate the face clockwise; similarly for **y**, **Y**, **z**, and **Z**.)

Having labeled the *cubies*, which move around as we rotate faces of the cube, let’s also assign labels 0 through 7 to the corresponding *cubicles*, or the locations that remain fixed relative to the axes even as we permute the cubies “within” the cubicles. Specifically, for each *i* from 0 to 7, cubicle *i* is the location of cubie *i* in the solved state.

Since cubie zero never moves, we can represent an arbitrary cube state with a permutation $\pi \in S_7$ of the labels 1 through 7, where $\pi(i)$ is the label on the cubicle containing cubie $i$, or equivalently, $\pi^{-1}(i)$ is the label on the cubie in cubicle $i$. We can represent each of the six moves in the same way (by its action on the solved state), so that applying move $\mu$ to state $\pi$ yields the new cube state $\mu\pi$ (that is, apply $\pi$ first, then $\mu$).

In the Python implementation, we represent a permutation $\pi$ as an array `p`, with `p[i]` = $\pi^{-1}(i)$, the label on the cubie in cubicle $i$ (arrays are indexed starting with zero), so that using Numpy’s array indexing operator, applying move `m` to state `p` yields the new cube state `p[m]`.
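To make the array indexing concrete, here is a minimal sketch. It assumes that `p[i]` holds the label of the cubie currently in cubicle `i`. The first two entries of the cycle for move **x** come from the description above (cubie 1 moves to cubicle 3, cubie 3 moves to cubicle 7); the remaining entries (7 to 5, 5 to 1) are my assumption about what "etc." covers, and `apply_move` is a name chosen only for this example.

```python
import numpy as np

solved = np.arange(8)  # cubie i sits in cubicle i

# Move x as the state it produces from the solved cube:
# cubicle 3 now holds cubie 1, cubicle 7 holds cubie 3 (from the text);
# cubicle 5 holds cubie 7 and cubicle 1 holds cubie 5 (assumed).
x = np.array([0, 5, 2, 1, 4, 7, 6, 3])

def apply_move(p, m):
    """Apply move m to state p: cubicle i receives whatever was in cubicle m[i]."""
    return p[m]

state = apply_move(solved, x)
print(state)

# Four quarter turns of the same face restore the solved state.
s = solved
for _ in range(4):
    s = apply_move(s, x)
print(np.array_equal(s, solved))
```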

**Tags on cubicles and cubies**

Now that we have a means of representing the permuted positions of the cubies, we need to handle their orientations as well. A particular cubie in a particular cubicle may be in any of three different orientations, differing by rotations of 120 degrees about the diagonal through the cubie’s “outer” corner. We need a way to represent the orientations of all cubies in a given cube state, as well as a way to transform this representation corresponding to each possible move.

In preparation for doing this, let’s begin again with the cube in the solved state, and for each *cubicle*, we select exactly one of its three faces and mark it with a *cubicle tag*. Having done so, we subsequently mark each *cubie* as well with a *cubie tag* on exactly one face… namely, the same face as the corresponding cubicle tag.

Recall that the cubicles– and thus the cubicle tags– remain fixed in space and do not move, but the cubie tags “follow” their corresponding cubies as we apply moves that rotate the cubies from one cubicle to another.

We have some freedom here: this selection of a face per cubicle to tag is arbitrary, and any of the possible such selections will work. The figure below shows the convention used in the Python implementation, with the cubicle tags applied to the opposite orange and red faces orthogonal to the (red) *x*-axis:

In `rubik2_gui.py`, set `draw_axes` and `draw_labels` to `True` to experiment with this. The cubicle tags are in black, and remain fixed relative to the rotation axes. The cubie tags are in gray, and move with the cubies to which they are attached.

**Orientation of cubies**

We are now ready to encode the orientations of the cubies. For a given cube state, let’s consider a single cubie: that cubie has a cubie tag, and the cubicle in which it currently resides has a cubicle tag. We encode the orientation of the cubie as an integer 0, 1, or 2, indicating the number of 120-degree clockwise rotations of the cubie needed to align the cubie tag with the cubicle tag.

For example, in the figure above showing a particular scrambled cube state, the cubie with cubie tag 2 (in gray, on the orange face) in cubicle 7 has orientation 2; cubie 6 in cubicle 3 has orientation 0, since both of its tags are on the same orange face; and cubie 4 in cubicle 6 has orientation 1 (since the black cubicle tag must be on the face that we can’t see). Also, note that since cubie zero (not shown) never moves, its orientation is always 0.

We now have everything we need to encode an arbitrary cube state as an ordered pair $(\pi, q)$, where $\pi$ encodes the permuted positions of the cubies as described earlier, and $q \in \mathbb{Z}_3^7$ is a vector of integers encoding the orientations, with $q_i$ indicating the orientation of the cubie in cubicle $i$.

**Applying moves**

But even better, this encoding not only makes it easy to represent a cube *state*, it also makes it easy to apply cube *moves*, i.e., face rotations. Given an arbitrary cube state $(\pi, q)$, and one of the six moves represented by the state $(\mu, r)$ that results from applying the move to the *solved* state, we can show that the result of applying move $(\mu, r)$ to state $(\pi, q)$ yields the new state $(\mu\pi, q^{\mu} + r)$, where the group action $q^{\mu}$ permutes the coordinates of $q$ (with the same Numpy array indexing implementation described earlier)… and the vector addition is simply element-wise mod 3.
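The update rule can be sketched in a few lines. This is an illustration under assumptions rather than the repository code: arrays have length 8 (including the fixed cubie zero) for indexing convenience, the cycle for move **x** beyond its first two entries is assumed, and the state `(p, q)` below is a hypothetical example that has not been checked for reachability. With the cubicle tags on the faces orthogonal to the *x*-axis, a turn about that axis keeps every tag on such a face, so the twist vector for **x** is taken to be all zeros.

```python
import numpy as np

def apply_move(state, move):
    """Apply move (m, r) to state (p, q).

    p[i] is the cubie in cubicle i and q[i] is its orientation;
    m and r describe the state the move produces from the solved cube.
    """
    p, q = state
    m, r = move
    return p[m], (q[m] + r) % 3

# Move x: permutation (partly assumed) and zero twist (see lead-in).
x = (np.array([0, 5, 2, 1, 4, 7, 6, 3]), np.zeros(8, dtype=int))

# Hypothetical state, used only to show the mechanics.
p = np.array([0, 3, 1, 2, 4, 6, 5, 7])
q = np.array([0, 1, 2, 0, 0, 2, 1, 0])  # entries sum to 6, a multiple of 3

new_p, new_q = apply_move((p, q), x)
print(new_p)
print(new_q, new_q.sum() % 3)
```

Each orientation simply travels with its cubie to the new cubicle and picks up the twist that the move contributes there, so the sum of the orientations changes only by the sum of the entries of the move's twist vector.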

To convince ourselves that the modular addition of cubie orientations works, first note that in a face rotation, if a cubie’s *position* does not change, then neither does its orientation, and so adding zero to the orientation encoding has no effect as desired. For a cubie that *does* move, it suffices to focus on a single rotation, say a counterclockwise quarter turn about the *x*-axis, and a single “source” and “destination” cubicle, say from cubicle 3 to cubicle 7. (To see this, note that clockwise and half turns can be expressed as compositions of counterclockwise turns, and for any other rotation axis and source/destination cubicles, we can rotate the entire cube to match the “cubie in cubicle 3 rotated by **x** to cubicle 7” geometry.)

Then there are effectively 3×3=9 cases to consider, one for each possible selection of faces where we could have placed cubicle tags on the source (3 choices) and destination (3 choices) cubicles. For each such pair of choices, we can verify– by brute force enumeration if needed– that the counterclockwise arrangement of the (0,1,2) possible orientation codes for a *cubie* in the source cubicle is preserved as it is rotated into the destination cubicle.

**The Fundamental Theorem of Cubology**

A couple of final notes: first, the above description of the representation of a cube state as an ordered pair $(\pi, q)$ suggests that there are $7! \cdot 3^7 = 11{,}022{,}480$ possible cube states. This isn’t quite true; we have overcounted by a factor of 3, due to the following invariant that is part of what is commonly referred to as the “fundamental theorem of cubology”: for any valid cube state, the sum of the integers in the orientation encoding is congruent to zero mod 3. (This can be verified for a particular encoding by first noting that the solved state has all entries of $q$ equal to zero, and that for each possible move the sum of the entries in its orientation vector is also congruent to zero mod 3.) Thus, there are only $7! \cdot 3^6 = 3{,}674{,}160$ distinct possible cube states, where in implementation we can, for example, discard the last entry from our orientation encoding since its value is determined by the other six.

Second, the ideas presented here are also applicable to the original 3x3x3 Rubik’s cube. A cube state represents the 8 corner cubies with a permutation in $S_8$ and an orientation vector in $\mathbb{Z}_3^8$, and the 12 edge cubies with a permutation in $S_{12}$ and an orientation vector in $\mathbb{Z}_2^{12}$, with similar parity constraints on each.

The dice game craps is played by repeatedly rolling two six-sided dice, and making various wagers on the outcomes of the sequence of rolls. The details of the wagers don’t concern us here; instead, let’s consider just one particular example scenario involving a common wager called the “pass line bet”:

Suppose that we have just rolled a 4 (which, by the way, occurs with probability 3/36 on any single roll). Having thus established 4 as “the point,” our objective now is to roll a 4 *again*, rolling repeatedly if necessary… but if we ever roll a 7 (which, by the way, occurs with probability 6/36 on any single roll), then we lose. If and once we roll a 4, we win.

To summarize: we are trying to roll a 4 (with probability 3/36). If we roll anything else except a 7 (where “rolling anything else except 7” has probability 27/36), we continue rolling.

So, here is a puzzle, let’s call it **Problem 1:** how long does it take on average to *win*? More precisely, what is the expected length of a *winning* sequence of rolls, i.e., where we never roll a 7?

**Thoughts**

This problem is not new, nor is it even particularly sophisticated. But it is very similar to a problem that circulated a few years ago, and that generated some interesting discussion. Here is that earlier problem, let’s call it **Problem Zero:**

Roll a single fair six-sided die repeatedly until you roll a 6. What is the expected number of rolls required, given that we observe that all rolls are *even*?

My motivation for this post is two-fold. First, this is a sort of pedagogical thought experiment. Problem Zero has already been shown in the wild to be dangerously non-intuitive. Problem 1 is *the same problem*— that is, it is essentially equivalent to Problem Zero, just dressed up with different values for the single-trial probabilities. But is Problem 1 inherently easier, less “tricky?” And if so, why? Is it because the numbers are different, or is it that the problem is cast in a more concrete setting as a casino game, etc.?

I don’t know if these problems are actually equally “tricky” or not. But at least in the case of the known trickiness of Problem Zero, I have a theory. Both of these problems are like many others that I have enjoyed discussing here in the past, in that they are *mathematical* problems… but of a sort that a student may approach not just with pencil and paper, but also with a computer, even if only as an initial exploratory tool.

Which brings me to my theory, based on observation of past debate of Problem Zero. We can begin tackling this problem with pencil and paper, or by writing a simulation… and I suspect that, in this case, starting with a simulation makes it much *harder* to come up with a (the?) *wrong* answer.
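In that spirit, here is a minimal simulation sketch for Problem 1. The function name, trial count, and seed are arbitrary choices for illustration. It plays out many point-of-4 sequences, discards the ones that end with a 7, and averages the lengths of the ones that end with a 4.

```python
import random

def winning_lengths(trials, point=4, seed=1):
    """Lengths of the sequences that end by rolling the point (never a 7)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        rolls = 0
        while True:
            rolls += 1
            total = rng.randint(1, 6) + rng.randint(1, 6)
            if total == point:
                lengths.append(rolls)  # we won; record how long it took
                break
            if total == 7:
                break  # we lost; this sequence is discarded
    return lengths

trials = 100_000
lengths = winning_lengths(trials)
print("fraction of sequences won:", len(lengths) / trials)
print("average length of a win:  ", sum(lengths) / len(lengths))
```

The key design choice is that the conditioning happens by rejection: losing sequences are simulated in full and then thrown away, rather than being prevented from occurring in the first place.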

Riffle shuffle a deck of cards once. What is the probability that the original *top* card ends up on the *bottom* of the shuffled deck (or vice versa)? This is very unlikely… but suppose that we shuffle again… and again, etc. How many shuffles on average are required until we first see the original top card moved to the bottom of the deck? The motivation for this post is to capture my notes on this problem, which turns out to have a very nice solution.

**Approach**

To begin, we use the Gilbert-Shannon-Reeds (GSR) model of a riffle shuffle, that captures– in a realistic but mathematically tractable way– the random imperfections in how most non-expert humans shuffle cards: by cutting the deck roughly in half, then interleaving the cards back together from the two halves, with some clumping.

There are several different characterizations of the GSR model, all yielding the same probability distribution of possible permutations of the cards in the deck. For our purpose, the most convenient way to model a single random GSR shuffle of a deck with $n$ cards is as a uniformly random string of $n$ bits (imagine flipping a fair coin $n$ times). The *number* of zero bits indicates how many cards to cut from the top of the deck, and the *positions* of the zero bits indicate how those top cards are interleaved with the cards from the bottom part of the deck (represented by the one bits). The following Python code illustrates how this works:

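    import numpy as np

    def riffle_shuffle(cards, rng=np.random.default_rng()):
        # One random bit per card: zeros mark where the top part of the cut lands.
        bits = rng.integers(0, 2, len(cards))
        # The stable sort sends the top cards to the zero positions, in order;
        # the outer argsort inverts that map to list the card at each position.
        return [cards[k] for k in np.argsort(np.argsort(bits, kind='stable'))]

    print(riffle_shuffle(range(52)))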

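This representation of a riffle shuffle as a bit string makes it easy to calculate probabilities of various types of shuffles. For example, of all $2^n$ possible equally likely GSR shuffles, how many move the card from position $i$ down to position $j$, where $i < j$ (numbering positions from 1 at the top of the deck)? This number is equal to the number of bit strings with at least $i$ zeros (since the card we want to move *down* must be in the *top* half of the cut), and exactly $j-i$ ones before the $i$-th zero. In other words, the first $j-1$ bits contain $i-1$ zeros and $j-i$ ones, bit $j$ is the $i$-th zero, and the remaining $n-j$ bits are unconstrained. That is,

$$\binom{j-1}{i-1} 2^{n-j}$$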

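It’s an exercise for the reader to show that we can extend this approach to also count shuffles that move a card *up* in the deck ($i > j$), or that *fix* a card ($i = j$) so that it remains in its original position, with the result being a transition matrix $P$ given by

$$P_{i,j} = \frac{1}{2^n} \begin{cases} \binom{j-1}{i-1} 2^{n-j} & \text{if } i < j \\ \binom{n-j}{n-i} 2^{j-1} & \text{if } i > j \\ 2^{i-1} + 2^{n-i} & \text{if } i = j \end{cases}$$

where $P_{i,j}$ indicates the probability that a single GSR shuffle will move the card in position $i$ to position $j$ (with positions numbered from 1 at the top of a deck of $n$ cards).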

At this point, we have what we need: consider a Markov chain with $n$ states corresponding to the current position of a particular distinguished card in the deck. If this distinguished card starts in position, say, $i=1$ (i.e., the top card of the deck), then our initial state distribution is the $i$-th basis (row) vector $e_i$, and we can track the distribution of possible future positions of the card after $k$ shuffles as $e_i P^k$.

But we originally asked for the *average* number of shuffles needed to move the card initially at location $i$ to location $j$. To calculate this, let $Q$ be the matrix obtained from the transition matrix $P$ by deleting row $j$ and column $j$, and compute the fundamental matrix $N = (I - Q)^{-1}$. Then the expected number of shuffles until the target card starting in position $i$ first reaches position $j$ is the sum of the values in the $i$-th row of $N$ (or the $(i-1)$-st row if $i > j$).
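The calculation just described can be sketched with exact rational arithmetic. The closed-form matrix entries in `transition_matrix` are my own reading of the bit-string counting argument (positions numbered from 1), so treat them as an assumption. Rather than inverting $I - Q$, the sketch solves the equivalent linear system $(I - Q)t = \mathbf{1}$, whose solution gives the row sums of the fundamental matrix. Function names are chosen for this example, and `start` must differ from `target`.

```python
from fractions import Fraction
from math import comb

def transition_matrix(n):
    """P[i-1][j-1] = probability that one GSR shuffle moves position i to j."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i < j:    # card is in the top part and moves down
                count = comb(j - 1, i - 1) * 2 ** (n - j)
            elif i > j:  # card is in the bottom part and moves up
                count = comb(n - j, n - i) * 2 ** (j - 1)
            else:        # card stays put, from either part of the cut
                count = 2 ** (i - 1) + 2 ** (n - i)
            P[i - 1][j - 1] = Fraction(count, 2 ** n)
    return P

def expected_shuffles(n, start, target):
    """Expected number of shuffles for the card at start to first reach target."""
    P = transition_matrix(n)
    keep = [k for k in range(n) if k != target - 1]
    # Augmented system (I - Q) t = 1, solved by Gauss-Jordan elimination.
    A = [[(1 if r == c else 0) - P[r][c] for c in keep] + [Fraction(1)]
         for r in keep]
    size = len(keep)
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv = 1 / A[col][col]
        A[col] = [v * inv for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return A[keep.index(start - 1)][size]

print(expected_shuffles(52, 1, 52))
```

Using `Fraction` throughout avoids any floating-point doubt about whether a result is exactly an integer.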

**Solution**

The answer is interesting and perhaps surprising. First: it typically takes a *long* time for the top card to reach the bottom of the deck, over 100 shuffles on average. Second: this average is actually *exactly* 104 shuffles! More generally, given a deck with $n$ cards, the expected number of GSR shuffles to move the top card to the bottom of the deck appears to be exactly $2n$. This suggests once again that there may be a nice intuitive argument for this result that I don’t yet see.

Finally, and perhaps most surprising: it takes about that same hundred-or-so shuffles on average to move *any* card to the bottom of the deck!

In the above figure, the *x*-axis indicates the starting position of the card that we want to track. The *y*-axis indicates the expected number of shuffles needed to move that card to the bottom of the deck. For example, even the next-to-bottom card of the deck takes about 102.6 shuffles on average to reach the bottom of the deck.

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as *the* sock matching problem in the literature; see the references below.

My athletic socks, on the other hand, are essentially identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the other “differently-handed” socks. In this case, it’s a nice problem to show that

and thus

But what I found most interesting about this problem is that appears to be very well approximated by , with an error that I conjecture is always less than 1/2, and approaches zero in the limit as grows large. I don’t see how to prove this, though.

**References:**

Whether being on a streak yourself, or trying to defend a player on a streak, on the court this certainly *feels* like a real phenomenon. But is the hot hand a real effect, or just another example of our human tendency to see patterns in randomness?

A famous 1985 paper (Gilovich, Vallone, and Tversky, reference below) argued the latter, analyzing the proportion of successful shots immediately following a streak of three made shots in various settings (NBA field goals, free throws, and a controlled experiment with college players). Not finding any significant increase in proportion of “streak-extending” shots made, the apparent conclusion would be that a past streak has no effect on current success.

But that’s where this puzzle comes in: even if basketball shots are truly *iid* with success probability $p$, we should *expect* a negative bias in the proportion of shots made following a streak, at least compared to the intuitively expected proportion $p$. Miller and Sanjurjo argue that the *absence* of this bias in the 1985 data suggests that the hot hand is *not* just “a cognitive illusion.”

Both papers are interesting reads. In presenting the problem here as a gambling wager, I simplified things somewhat down to a “win, lose, or push” outcome (i.e., were there more streak-extending successes than failures, or fewer), since the resulting exact expected return can be computed more efficiently than the expected *proportion* of successes following a streak:

Given $n$ remaining trials (basketball shots, coin flips, whatever) with success probability $p$, noting outcomes following streaks of length $m$, and winning (losing) the overall wager if the number of streak-extending successes is greater (less) than the number of streak-ending failures, the expected return is , computed recursively via

where, using the setup in the previous post where we flip $n=100$ fair coins with probability $p=1/2$ of heads, looking for heads extending streaks of length $m=4$, the expected return is about -0.150825.
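One way to compute such an expected return exactly is a memoized recursion over three quantities: the number of flips remaining, the current streak length (capped at $m$), and the running difference between noted successes and noted failures. The sketch below is my own formulation under those assumptions, not necessarily the recursion used in the post, and the function and parameter names are made up.

```python
from fractions import Fraction
from functools import lru_cache

def expected_return(n, m, p):
    """Exact expected value of the wager: +1 if more noted successes than
    failures, -1 if fewer, 0 if tied, over n trials with streak length m."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def value(remaining, streak, diff):
        if remaining == 0:
            return (diff > 0) - (diff < 0)  # sign of the final tally
        # The next outcome is noted only if we are currently on a streak.
        noted = 1 if streak >= m else 0
        success = value(remaining - 1, min(streak + 1, m), diff + noted)
        failure = value(remaining - 1, 0, diff - noted)
        return p * success + q * failure

    return value(n, 0, 0)

# Tiny case that can be checked by listing all 8 sequences of 3 flips.
print(expected_return(3, 1, Fraction(1, 2)))

# A larger case, assuming n=100 flips, streak length m=4, and a fair coin.
print(float(expected_return(100, 4, Fraction(1, 2))))
```

Even the three-flip case already shows the negative bias: of the eight equally likely sequences, two end with more noted heads than tails and three end with more noted tails than heads.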

**References:**

- Gilovich, T., Vallone, R., and Tversky, A., The Hot Hand in Basketball: On the Misperception of Random Sequences, *Cognitive Psychology*, **17**(3) July 1985, p. 295-314 [PDF]
- Miller, Joshua B. and Sanjurjo, Adam, Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers, *Econometrica*, **86**(6) November 2018, p. 2019-2047 [arXiv]

“If after tossing four heads in a row, the next coin toss also came up heads, it would complete a run of five successive heads. Since the probability of a run of five successive heads is [only] 1/32, a person might believe that the next flip would be more likely to come up tails rather than heads again. This is incorrect and is an example of the gambler’s fallacy.”

That is, having observed a streak of four heads in a row, we are actually just as likely to observe heads *again* on the subsequent fifth flip as we are to observe tails. Similarly, even after betting on red at the roulette wheel and losing four times in a row, we should still expect to win a fifth such bet on red the same stubborn 18/38 of the time (assuming a typical double-zero American wheel).

Right?

So, here is what I think is an interesting puzzle: let’s play a game. I will flip a fair coin $n=100$ times, and prior to each flip, if you have observed a current streak of $m=4$ or more consecutive heads, then make a note of the outcome of the subsequent flip. After all 100 coin flips, tally the noted “streak-following” flips: if there are more heads than tails, then I pay you one dollar. If there are more tails than heads, then you pay me one dollar. (If there are an equal number of heads and tails, then we push.)

If the gambler’s fallacy is indeed a fallacy, then shouldn’t this be a fair bet, i.e., with net zero expected return? But I claim that I have a significant advantage in this game, taking more than 15 cents from you on average every time we play! Following a streak of heads, we expect to observe a much larger proportion of “trend-correcting” tails than “streak-extending” heads.

And there is nothing special or tricky about this particular setup. Try this experiment with a different number $n$ of coin flips, or a longer or shorter “target” streak length $m$, or even a roulette-like bias on the coin flip probability. Or instead of focusing only on streaks of consecutive *heads* (i.e., ignoring streaks of tails), look for streaks of *either* $m$ or more heads or $m$ or more tails, and note whether the subsequent flip is *different*. The effect persists: on average, we observe a larger-than-expected proportion of outcomes that *tend to end the streak*.

We often need “random” numbers to simulate effects that are *practically non-deterministic*, such as measurement noise, reaction times of human operators, etc. However, we also need to be able to reproduce experimental results– for example, if we run hundreds of Monte Carlo iterations of a simulation, and something weird happens in iteration #137, whether a bug or an interesting new behavior, we need to be able to go back and reproduce that specific weirdness, exactly and repeatably.

Pseudo-random number generators are designed to meet these two seemingly conflicting requirements: generate one or more sequences of numbers having the *statistically representative appearance* of randomness, but with a mechanism for exactly reproducing any particular such sequence repeatably.

The motivation for this post is to describe the behavior of pseudo-random number generation in several common modeling and simulation environments: Python, MATLAB, and C++. In particular, given a sequence of random numbers generated in *one* of these environments, how difficult is it to reproduce that same sequence in *another* environment?

**Mersenne Twister**

In many programming languages, including Python, MATLAB, and C++, the default, available-out-of-the-box random number generator is the Mersenne Twister. Unfortunately, despite its ubiquity, this generator is not always implemented in exactly the same way, which can make reproducing simulated randomness across languages tricky. Let’s look at each of these three languages in turn.

(As an aside, it’s worth noting that the Mersenne Twister has an interesting reputation. Although it’s certainly not optimal, it’s also not fundamentally broken, but lately it seems to be fashionable to sniff at it like it is, in favor of more recently developed alternatives. Those alternatives are sometimes justified by qualitative arguments or even quantitative metrics that are, in my opinion, often of little *practical* consequence in many actual applications. But that is another much longer post.)

**Python**

Python is arguably the easiest environment in which to reproduce the behavior of the original reference C implementation of the Mersenne Twister. Python’s built-in `random.random()`, as well as Numpy’s legacy `numpy.random.random()`, both use the same reference `genrand_res53` method for generating a double-precision random number uniformly distributed in the interval [0,1):

- Draw two 32-bit unsigned integer words $(a, b)$.
- Concatenate the high 27 bits of $a$ with the high 26 bits of $b$.
- Divide the resulting 53-bit integer by $2^{53}$, yielding a value in the range $[0, 1)$.

So given the same generator state, both the built-in and Numpy implementations yield the same sequence of double-precision values as the reference implementation.
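The three steps above can be sketched directly in Python. This relies on a CPython implementation detail that I am assuming here: `random.getrandbits(32)` consumes exactly one 32-bit word of Mersenne Twister output. The helper name `res53` and the seed value are arbitrary.

```python
import random

def res53(a, b):
    """Combine two 32-bit words into a double in [0, 1), genrand_res53 style."""
    high = a >> 5  # keep the high 27 bits of the first word
    low = b >> 6   # keep the high 26 bits of the second word
    return (high * 2**26 + low) / 2**53

random.seed(12345)
a = random.getrandbits(32)
b = random.getrandbits(32)

random.seed(12345)  # rewind to the same generator state
x = random.random()

print(res53(a, b) == x)
```

Because the combined integer is less than $2^{53}$, the division is exact in double precision, so the comparison can be an exact equality rather than a tolerance check.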

Seeding is slightly messier, though. The reference C implementation provides two methods for seeding: `init_genrand` takes a single 32-bit word as a seed, while `init_by_array` takes an array of words of arbitrary length. Numpy’s `numpy.random.seed(s)` behaves like `init_genrand` when the provided seed is a single 32-bit word. However, the built-in `random.seed(s)` *always* uses the more general `init_by_array`, even when the provided seed is just a single word… which yields a different generator state than the corresponding call to `init_genrand`.

**MATLAB**

Although MATLAB provides several different generators, the Mersenne Twister has been the default since 2005. Its seeding method `rng(s)` is “almost” exactly like the reference `init_genrand(s)`, and thus like `numpy.random.seed(s)`, accepting a single 32-bit word as a seed… except that `s=0` is special, being shorthand for the “default” seed value `s=5489`.

The `rand()` function is also “almost” identical to the reference `genrand_res53` described above… except that it returns random values in the *open* interval (0,1) instead of the half-open interval [0,1). There is an explicit rejection-and-resample if zero is generated. It’s an interesting question whether this has ever been observed in the wild: it’s confirmed *not* to occur anywhere in the first random draws from any of the sequences indexed by the $2^{32}$ possible single-word seeds.

**C++**

Both Visual Studio and GCC implementations of the Mersenne Twister in `std::mt19937` (which is also the `std::default_random_engine` in both cases) allow for the same single-word seeding as the reference `init_genrand` (and `numpy.random.seed`, and MATLAB’s `rng` excepting the special zero seed behavior mentioned above).

However, the C++ approach to generating random values from `std::uniform_real_distribution<double>` is different from the reference implementation used by Python and MATLAB:

- Draw two 32-bit unsigned integer words $(a, b)$.
- Concatenate all 32 bits of $b$ with all 32 bits of $a$.
- Divide the resulting 64-bit integer by $2^{64}$.

By using all 64 bits, this has the effect of allowing generation of more than just $2^{53}$ equally-separated possible values, retaining more bits of precision for values in the interval (0,1/2) where some leading bits are zero.

Another important difference here from the reference implementation is that the two random words are reversed: the *first* 32-bit word forms the *least* significant 32 bits of the fraction, and the second word forms the most significant bits. I am not sure if there is any *theoretically* justified reason for this difference. One possible *practical* justification, though, might be to prevent confusion when comparing with the reference implementation: even for the same seed, these random double-precision values “look” nothing at all like those from the reference implementation. Without the reversal, this sequence would be within the eyeball norm of the reference, matching in the most significant 27 or more bits, but not yielding *exactly* the same sequence.
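The following Python sketch contrasts the three conversions discussed here. It models the arithmetic only and is not the library source; `cpp_style` and `unreversed` are hypothetical names, and Python’s correctly rounded integer division stands in for the C++ floating-point steps.

```python
def genrand_res53(a, b):
    """Reference conversion: top 27 bits of a, then top 26 bits of b."""
    return ((a >> 5) * 67108864 + (b >> 6)) / 9007199254740992


def cpp_style(first, second):
    """First word drawn fills the LOW 32 bits, second word the HIGH 32 bits."""
    return ((second << 32) | first) / 2**64


def unreversed(first, second):
    """Hypothetical variant without the reversal, for comparison only."""
    return ((first << 32) | second) / 2**64
```

For example, `cpp_style(1, 0)` is $2^{-64}$, far finer than the $2^{-53}$ spacing of the reference values, while `unreversed(a, b)` always lands within about $2^{-27}$ of `genrand_res53(a, b)`. Note that in this sketch the very largest inputs can round up to exactly 1.0.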

Some role-playing games involve a “roll and keep” dice mechanic, where you roll some number of dice, but only “keep” a specified number of them with the largest values, where the result of the roll is the sum of the kept dice. For example, rolling five dice and keeping the largest three (sometimes denoted 5d6k3) could yield a score between 3 and 18, inclusive. What is the probability of each possible score?

The motivation for this post is (1) to capture my notes on the general solution to this problem, and (2) to describe some potential optimizations in implementation to handle very large instances of the problem, such as this Hacker Rank version of the problem, where we may be rolling as many as 10,000 dice.

**The solution**

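To start, let’s simplify the problem by just rolling and keeping all of the dice without discarding any, since the order statistics are what make this problem complicated. Given $n$ dice each with $d$ sides, the number of equally likely ways to roll a score $s$ is given by

$$\mathrm{rolls}(s, n, d) = \sum_{k=0}^{\lfloor (s-n)/d \rfloor} (-1)^k \binom{n}{k} \binom{s - kd - 1}{n - 1}$$

so that the probability of rolling a score $s$ is $\mathrm{rolls}(s, n, d) / d^n$.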
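As a quick sanity check of this count (a worked instance, assuming the inclusion-exclusion form that the `rolls` function later in this post computes): with three six-sided dice and a score of 10, only the $k=0$ and $k=1$ terms appear, giving

$$\binom{3}{0}\binom{9}{2} - \binom{3}{1}\binom{3}{2} = 36 - 9 = 27$$

so the probability of rolling a 10 is $27/6^3 = 1/8$.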

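If we now consider keeping only $m$ of the $n$ dice, then we can group the desired outcomes with score $s$ by:

- the smallest value $a$ among the $m$ largest dice that we keep,
- the number $b$ of kept dice that are strictly greater than $a$, and
- the number $c$ of discarded dice that are strictly less than $a$.

This yields the following summation for the number of ways to roll a score $s$:

$$\sum_{a=1}^{d} \sum_{b=0}^{m-1} \sum_{c=0}^{n-m} \binom{n}{b,\, c,\, n-b-c} (a-1)^c \, \mathrm{rolls}(s - am,\, b,\, d - a)$$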

**Implementation details**

At this point, we’re done… except that if the number of dice rolled is large, then a straightforward implementation of this nested summation will be pretty slow, involving calculation of a large number of *very* large multinomial coefficients.
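For reference, a straightforward version might look like the following sketch, which is fine for small cases and can be checked against brute-force enumeration. The names `rolls_keep_direct` and `rolls_keep_brute` are introduced here for illustration, and `math.comb` assumes Python 3.8 or later.

```python
from collections import Counter
from itertools import product
from math import comb, factorial


def rolls(s, n, d):
    """Ways to roll sum s with n d-sided dice (inclusion-exclusion)."""
    if n == 0:
        return 1 if s == 0 else 0
    if d == 0 or s < n:
        return 0
    return sum((-1) ** k * comb(n, k) * comb(s - k * d - 1, n - 1)
               for k in range((s - n) // d + 1))


def rolls_keep_direct(s, n, d, m):
    """Ways to roll n d-sided dice so that the m largest sum to s."""
    total = 0
    for a in range(1, d + 1):               # smallest kept value
        for b in range(m):                  # kept dice strictly greater than a
            for c in range(n - m + 1):      # discarded dice strictly less than a
                multinomial = factorial(n) // (
                    factorial(b) * factorial(c) * factorial(n - b - c))
                total += (multinomial * (a - 1) ** c
                          * rolls(s - a * m, b, d - a))
    return total


def rolls_keep_brute(n, d, m):
    """Tally the kept sum over all d**n outcomes (tiny n only)."""
    return Counter(sum(sorted(roll)[-m:])
                   for roll in product(range(1, d + 1), repeat=n))
```

For the 5d6k3 example, `rolls_keep_direct(18, 5, 6, 3)` counts the 276 outcomes with at least three sixes, and the counts over all scores from 3 to 18 add up to $6^5 = 7776$.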

My first thought was to speed up calculation of those very large coefficients via their prime factorization using Legendre’s formula. The paper by Goetgheluck linked at the end of this post describes a more optimized approach, but the following simple Python implementation is already significantly faster than the usual iterative algorithm for sufficiently large inputs (the implementation of `primes(n)` is left as an exercise for the reader, or a future post):

    def binomial(n, k):
        """Large binomial coefficient n choose k."""
        if k < 0 or k > n:
            return 0
        c = 1
        for p in primes(n):
            c = c * p ** (v_fact(n, p) - v_fact(k, p) - v_fact(n - k, p))
        return c

    def v_fact(n, p):
        """Largest power of prime p dividing n factorial."""
        v = 0
        while n > 0:
            n = n // p
            v = v + n
        return v
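For anyone who wants to run the code above right away, one possible stand-in for `primes(n)` is a plain sieve of Eratosthenes. This is just a minimal sketch, assuming that `primes(n)` is meant to return every prime up to and including `n`; it is not necessarily the implementation the post had in mind.

```python
from math import isqrt


def primes(n):
    """Return a list of all primes p <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting from p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]
```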

(It’s worth noting that the Hacker Rank problem actually only asks for the number of ways to roll a given total *modulo a prime*, suggesting that a similar but even more effective optimization using Lucas’s theorem is probably intended.)
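For the curious, a binomial coefficient modulo a prime via Lucas’s theorem might be sketched as follows. The name `binomial_mod` is introduced here, and this only pays off when the prime `p` is small relative to `n`, since each base-`p` digit pair is still evaluated directly with `math.comb`.

```python
from math import comb


def binomial_mod(n, k, p):
    """n choose k modulo a prime p, via Lucas's theorem."""
    if k < 0 or k > n:
        return 0
    result = 1
    while n > 0 or k > 0:
        # Peel off the least significant base-p digits of n and k.
        n, n_digit = divmod(n, p)
        k, k_digit = divmod(k, p)
        # comb() is zero when k_digit > n_digit, as the theorem requires.
        result = result * comb(n_digit, k_digit) % p
    return result
```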

But we can do much better than this, by observing that in the “usual” iterative algorithm, all of the intermediate products inside the loop are also binomial coefficients… and we need all of them for this problem. That is, we can rewrite the summation as

$$\sum_{a=1}^{d} \sum_{b=0}^{m-1} \binom{n}{b} \, \mathrm{rolls}(s - am,\, b,\, d - a) \sum_{c=0}^{n-m} \binom{n-b}{c} (a-1)^c$$

and then “embed” the usual iterative calculation of the binomial coefficients inside the nested summations. The following implementation eliminates all but a single explicit calculation of a binomial coefficient:

    def rolls_keep(s, n, d, m):
        """Ways to roll sum s with n d-sided dice keeping m largest."""
        result = 0
        for a in range(1, d + 1):
            choose_b = 1
            for b in range(0, m):
                sum_c = 0
                choose_c = 1
                pow_c = 1
                for c in range(0, n - m + 1):
                    sum_c = sum_c + choose_c * pow_c
                    choose_c = choose_c * (n - b - c) // (c + 1)
                    pow_c = pow_c * (a - 1)
                result = result + choose_b * rolls(s - a * m, b, d - a) * sum_c
                choose_b = choose_b * (n - b) // (b + 1)
        return result

    def rolls(s, n, d):
        """Ways to roll sum s with n d-sided dice."""
        if s == 0 and n == 0:
            return 1
        if d == 0:
            return 0
        result = 0
        choose_k = 1
        for k in range(0, (s - n) // d + 1):
            result = result + (-1) ** k * choose_k * binomial(s - k * d - 1, n - 1)
            choose_k = choose_k * (n - k) // (k + 1)
        return result

**Reference:**

- Goetgheluck, P., Computing Binomial Coefficients, *The American Mathematical Monthly*, **94**(4), April 1987, p. 360-365 [JSTOR]