**Introduction**

This is motivated by a recent interesting post on the DataGenetics blog about various algorithms for randomly shuffling a list of elements, e.g. a deck of playing cards. There is a similar article from 2007 at Coding Horror, and I am sure there are others as well. The gist is that (1) a naive first attempt at implementing a shuffling algorithm is likely to be biased, yielding a non-uniform distribution of possible arrangements; but (2) fortunately, there is an equally simple algorithm that solves the problem efficiently and exactly, known as the Fisher-Yates shuffle.

This is well-trodden ground; my intent here is to expand on that basic story a bit, by digging into some of the mathematics relating these ideas about *computer* shuffling algorithms to the way we *humans* shuffle cards, with the result that some of these “broken” naive shuffling algorithms are potentially fixable.

**Repeated shuffling**

First, how do humans shuffle cards? There are many different methods, but I will focus here on the riffle shuffle, where the deck is split into two halves, which are then interleaved together. The randomness arises from the imperfection in:

- The cut of the deck into two not necessarily equal “halves.” This can be modeled by cutting at a position selected randomly from a binomial distribution.
- The interleaving of the two halves, where “clumps” of cards from a single half stay together. This can be modeled by letting cards fall into a single pile alternately from the two halves, with the probability of the next card to fall being proportional to the current number of cards remaining in the two halves.

Of course, it is not sufficient to perform this shuffle just once. For one, there are only possible permutations resulting from a single riffle shuffle of a standard 52-card poker deck (why?). This is much less than the total possible arrangements. So all arrangements are not even *possible* with a single shuffle, let alone equally likely.

However, we can fix this problem by simple repetition– that is, just shuffle multiple times. With more and more shuffles, eventually all arrangements become *possible*, and with each additional shuffle, the probability distribution of arrangements more closely approximates a uniform distribution. Granted, it is an interesting question how many such shuffles are enough. But in practice it turns out that seven is usually “good enough,” and in principle we can keep shuffling to get as close as desired to a uniform distribution.

So *why* does this work? We started with a limited set of possible permutations corresponding to a single shuffle, and by repeatedly applying a randomly chosen permutation from that limited set, we were able to arbitrarily closely approximate a uniform distribution over all possible permutations. Under what conditions can we do this? My goal in this post is to answer this question.

**Computer algorithms**

On a computer, shuffling can be done more efficiently. The following Python code implements the Fisher-Yates shuffle of a list of elements, in-place, using a linear number of swaps, with each permutation being *exactly* equally likely:

import random def shuffle(deck): """Fisher-Yates shuffle.""" for i in range(len(deck) - 1, 0, -1): j = random.randrange(i + 1) deck[i], deck[j] = deck[j], deck[i]

(Note that I am ignoring the important issue of integrity of the underlying random number generator, state space size, etc. For this discussion, let us assume that we can generate random integers uniformly distributed in any given range.)

Compare with the following “naive” algorithm, which is simple to describe– for each position in the deck, swap the card in that position with a randomly chosen card:

import random def shuffle(deck): """Naive shuffling algorithm.""" for i in range(len(deck)): j = random.randrange(range(len(deck))) deck[i], deck[j] = deck[j], deck[i]

The problem with this algorithm, as discussed in the earlier linked blog posts, is that it is biased. That is, the resulting distribution of possible permutations is not uniform; some permutations end up being more likely than others, as shown in the following figure.

Each bar represents one of the 5!=120 possible permutations of a 5-card deck, arranged lexicographically from the identity permutation on the left to the “reverse” permutation on the right. The height of each bar is proportional to the corresponding probability. The grid line indicates what the probability of each permutation *should* be (1/120) when the deck is properly shuffled.

**Condition 1: Reachability**

Another nice visualization of this bias is a heat map, with each color in an *n*-by-*n* matrix indicating the probability that a card in position ends up in position . To properly shuffle the deck, this probability should be for all . The DataGenetics post (near the end, I can’t link directly to the image) as well as Mike Bostock both provide good examples of this.

However, although uniformity of card positions is *necessary*, it is not *sufficient* to properly shuffle the deck. To see why, consider the “shuffle” consisting of simply cutting the deck (and completing the cut) at a uniformly random position. Each card is equally likely to end up in any given position, so the corresponding heat map would not show any bias. But there are only possible cyclic permutations that result, and repetition doesn’t help: we can’t “reach” any additional permutations by repeated cuts.

This “reachability” is the first of two important criteria for a proper shuffling method: the set of permutations from which we can choose for a single shuffle should generate all possible permutations. That is, it must be possible by *repeated* shuffles to eventually realize every possible permutation of cards in the deck.

So just cutting the deck fails this condition. But going back now to the biased naive algorithm above, it *does* generate all possible permutations… and so it turns out that we can “fix” the bias by repeated applications of *shuffle(deck)*, yielding closer and closer approximations of a perfectly uniform distribution of possible deck arrangements, as shown below:

**Condition 2: Spanning cosets**

We were able to “fix” the naive shuffling algorithm by applying it repeatedly… but we got slightly lucky in doing so, because this isn’t always possible. To see why, consider the following even simpler shuffle, consisting of swapping a single randomly selected pair of cards:

import random def shuffle(deck): """Swap a random pair of cards.""" i, j = random.sample(range(len(deck)), 2) deck[i], deck[j] = deck[j], deck[i]

This obviously requires repeated application, since a single “shuffle” only changes the positions of two cards. But this simple shuffle *does* satisfy the reachability condition: every permutation may be represented as a product of transpositions, so with enough random swaps, we can eventually realize every possible permutation. So at first glance, it certainly *seems* like this algorithm should work just fine… if rather slowly.

But there is a problem. Let’s again consider the small, manageable case of just cards, and look at the probability distribution over the possible permutations as it evolves with each additional call to *shuffle(deck)*— that is, with each additional random swap of a pair of cards:

At the start, with 0 swaps, all of the probability is concentrated at the identity permutation at the far left. With more and more swaps, the probability “spreads out” among the rest of the possible permutations.

Can you see the problem? To make it more vivid, let’s re-arrange the bars in a different order:

With each successive random swap, the probability distribution *does* become more and more uniform… but it alternates back and forth between two halves of the set of possible arrangements. The problem is that permutations have parity. Although a given permutation can be represented as a product of transpositions in many different ways, the *parity* (even-ness or odd-ness) of the *number* of such transpositions is invariant. Before we begin shuffling, all of the probability is concentrated at the identity permutation, which is even (since it involves zero transpositions). After the first call to *shuffle(deck)*, the randomly chosen transposition yields an odd permutation (in the right half of the figure); after the second transposition, the resulting product is even; after the third, it jumps back to odd, etc.

This periodic behavior happens because all of the permutations in the generating set– that is, all single transpositions– are odd. We would have a similar problem if all of the permutations in the generating set were *even*, except that instead of jumping back and forth as shown above, the distribution would *always* be confined to the even half of the permutations.

More complex periodic behavior is possible as well. In general, the second condition needed for a proper shuffle is that the generating set of single shuffles must not all lie in some coset of some subgroup of the symmetry group of all possible permutations. In the random swap algorithm above, this condition is violated because the set of all single transpositions lies within the single coset of the alternating group.

To summarize, a biased or incomplete shuffle isn’t necessarily hopeless. It is possible to arbitrarily closely approximate a uniform distribution of permutations of elements of a list by *repeated* shuffling, as long as (1) every permutation is “reachable,” and (2) the generating set of permutations from a single shuffle do not all lie in the same coset of some subgroup.

Those animations were quite helpful in following along, especially the last one.

Thanks! I wasn’t sure whether this was the best way to show what’s happening, particularly after reading this interesting post. I think I’m not quite so against animation as that, though, and in this case I think it works because the *transitions* between frames are arguably more important than the content in any *single* frame.

“With more and more shuffles, eventually all arrangements become possible,…”

Actually, with a riffle shuffle, the cards at the top of the deck tend to stay near the top, and the cards at the bottom tend to stay at the bottom.

Picture the card at the top of the deck, there is roughly a 50/50 chance it will still be the top card after each shuffle. Even if it drops to second or third, it is still near the top. Not very random.

A better method involves shuffles alternating with cuts that are significantly away from the half-way point.

You’re right that a *single* riffle shuffle is “not very random,” in that it does not produce a uniform distribution on the set of all 52! possible arrangements. But that’s sort of the whole point of the post: *repeating* a given shuffle, even one chosen from a relatively small generating set, will do the trick.

And the interleaving of cuts actually does very little to help. That is, if instead of just a riffle, we consider one shuffle to be a riffle+cut, then the behavior of the total variation distance as a function of the *number* of such shuffles isn’t significantly different. See here and here for more details, including a card trick whose success is unaffected by a bunch of additional cuts in between riffle shuffles.

Pingback: Proofs and theorems without words | Possibly Wrong

Pingback: Coupling from the past | Possibly Wrong

Pingback: Seven riffle shuffles is not enough– except when it is | Possibly Wrong

Pingback: The Fisher-Yates shuffle is backward | Possibly Wrong