This is motivated by a recent interesting post on the DataGenetics blog about various algorithms for randomly shuffling a list of elements, e.g. a deck of playing cards. There is a similar article from 2007 at Coding Horror, and I am sure there are others as well. The gist is that (1) a naive first attempt at implementing a shuffling algorithm is likely to be biased, yielding a non-uniform distribution of possible arrangements; but (2) fortunately, there is an equally simple algorithm that solves the problem efficiently and exactly, known as the Fisher-Yates shuffle.
This is well-trodden ground; my intent here is to expand on that basic story a bit, by digging into some of the mathematics relating these ideas about computer shuffling algorithms to the way we humans shuffle cards, with the result that some of these “broken” naive shuffling algorithms are potentially fixable.
First, how do humans shuffle cards? There are many different methods, but I will focus here on the riffle shuffle, where the deck is split into two halves, which are then interleaved together. The randomness arises from the imperfection in:
- The cut of the deck into two not necessarily equal “halves.” This can be modeled by cutting at a position selected randomly from a binomial distribution.
- The interleaving of the two halves, where “clumps” of cards from a single half stay together. This can be modeled by letting cards fall into a single pile alternately from the two halves, with the probability of the next card to fall being proportional to the current number of cards remaining in the two halves.
Of course, it is not sufficient to perform this shuffle just once. For one, there are only possible permutations resulting from a single riffle shuffle of a standard 52-card poker deck (why?). This is much less than the total possible arrangements. So all arrangements are not even possible with a single shuffle, let alone equally likely.
However, we can fix this problem by simple repetition– that is, just shuffle multiple times. With more and more shuffles, eventually all arrangements become possible, and with each additional shuffle, the probability distribution of arrangements more closely approximates a uniform distribution. Granted, it is an interesting question how many such shuffles are enough. But in practice it turns out that seven is usually “good enough,” and in principle we can keep shuffling to get as close as desired to a uniform distribution.
So why does this work? We started with a limited set of possible permutations corresponding to a single shuffle, and by repeatedly applying a randomly chosen permutation from that limited set, we were able to arbitrarily closely approximate a uniform distribution over all possible permutations. Under what conditions can we do this? My goal in this post is to answer this question.
On a computer, shuffling can be done more efficiently. The following Python code implements the Fisher-Yates shuffle of a list of elements, in-place, using a linear number of swaps, with each permutation being exactly equally likely:
import random def shuffle(deck): """Fisher-Yates shuffle.""" for i in range(len(deck) - 1, 0, -1): j = random.randrange(i + 1) deck[i], deck[j] = deck[j], deck[i]
(Note that I am ignoring the important issue of integrity of the underlying random number generator, state space size, etc. For this discussion, let us assume that we can generate random integers uniformly distributed in any given range.)
Compare with the following “naive” algorithm, which is simple to describe– for each position in the deck, swap the card in that position with a randomly chosen card:
import random def shuffle(deck): """Naive shuffling algorithm.""" for i in range(len(deck)): j = random.randrange(range(len(deck))) deck[i], deck[j] = deck[j], deck[i]
The problem with this algorithm, as discussed in the earlier linked blog posts, is that it is biased. That is, the resulting distribution of possible permutations is not uniform; some permutations end up being more likely than others, as shown in the following figure.
Each bar represents one of the 5!=120 possible permutations of a 5-card deck, arranged lexicographically from the identity permutation on the left to the “reverse” permutation on the right. The height of each bar is proportional to the corresponding probability. The grid line indicates what the probability of each permutation should be (1/120) when the deck is properly shuffled.
Condition 1: Reachability
Another nice visualization of this bias is a heat map, with each color in an n-by-n matrix indicating the probability that a card in position ends up in position . To properly shuffle the deck, this probability should be for all . The DataGenetics post (near the end, I can’t link directly to the image) as well as Mike Bostock both provide good examples of this.
However, although uniformity of card positions is necessary, it is not sufficient to properly shuffle the deck. To see why, consider the “shuffle” consisting of simply cutting the deck (and completing the cut) at a uniformly random position. Each card is equally likely to end up in any given position, so the corresponding heat map would not show any bias. But there are only possible cyclic permutations that result, and repetition doesn’t help: we can’t “reach” any additional permutations by repeated cuts.
This “reachability” is the first of two important criteria for a proper shuffling method: the set of permutations from which we can choose for a single shuffle should generate all possible permutations. That is, it must be possible by repeated shuffles to eventually realize every possible permutation of cards in the deck.
So just cutting the deck fails this condition. But going back now to the biased naive algorithm above, it does generate all possible permutations… and so it turns out that we can “fix” the bias by repeated applications of shuffle(deck), yielding closer and closer approximations of a perfectly uniform distribution of possible deck arrangements, as shown below:
Condition 2: Spanning cosets
We were able to “fix” the naive shuffling algorithm by applying it repeatedly… but we got slightly lucky in doing so, because this isn’t always possible. To see why, consider the following even simpler shuffle, consisting of swapping a single randomly selected pair of cards:
import random def shuffle(deck): """Swap a random pair of cards.""" i, j = random.sample(range(len(deck)), 2) deck[i], deck[j] = deck[j], deck[i]
This obviously requires repeated application, since a single “shuffle” only changes the positions of two cards. But this simple shuffle does satisfy the reachability condition: every permutation may be represented as a product of transpositions, so with enough random swaps, we can eventually realize every possible permutation. So at first glance, it certainly seems like this algorithm should work just fine… if rather slowly.
But there is a problem. Let’s again consider the small, manageable case of just cards, and look at the probability distribution over the possible permutations as it evolves with each additional call to shuffle(deck)— that is, with each additional random swap of a pair of cards:
At the start, with 0 swaps, all of the probability is concentrated at the identity permutation at the far left. With more and more swaps, the probability “spreads out” among the rest of the possible permutations.
Can you see the problem? To make it more vivid, let’s re-arrange the bars in a different order:
With each successive random swap, the probability distribution does become more and more uniform… but it alternates back and forth between two halves of the set of possible arrangements. The problem is that permutations have parity. Although a given permutation can be represented as a product of transpositions in many different ways, the parity (even-ness or odd-ness) of the number of such transpositions is invariant. Before we begin shuffling, all of the probability is concentrated at the identity permutation, which is even (since it involves zero transpositions). After the first call to shuffle(deck), the randomly chosen transposition yields an odd permutation (in the right half of the figure); after the second transposition, the resulting product is even; after the third, it jumps back to odd, etc.
This periodic behavior happens because all of the permutations in the generating set– that is, all single transpositions– are odd. We would have a similar problem if all of the permutations in the generating set were even, except that instead of jumping back and forth as shown above, the distribution would always be confined to the even half of the permutations.
More complex periodic behavior is possible as well. In general, the second condition needed for a proper shuffle is that the generating set of single shuffles must not all lie in some coset of some subgroup of the symmetry group of all possible permutations. In the random swap algorithm above, this condition is violated because the set of all single transpositions lies within the single coset of the alternating group.
To summarize, a biased or incomplete shuffle isn’t necessarily hopeless. It is possible to arbitrarily closely approximate a uniform distribution of permutations of elements of a list by repeated shuffling, as long as (1) every permutation is “reachable,” and (2) the generating set of permutations from a single shuffle do not all lie in the same coset of some subgroup.