The Alias Method and Double Precision

As an undergraduate in computer science, and even in graduate school, I learned many elegant algorithms for solving common problems: the kind of algorithms that you might think every programmer should know.

But a few weeks ago, I was reminded of just how relatively empty my toolbox is, as I learned about an algorithm that solves such a common problem, with such a simple-yet-brilliant insight, and with such unexpected efficiency, that it made me think, “How have I never heard of this before?”

The problem is simple to state: given a uniform random number generator and a discrete probability distribution with a finite set of n possible outcomes, find an efficient method for sampling from that distribution.  An example of this situation has been discussed here before, involving simulation of a lottery with multiple payoffs.  As in that example, the first approach that comes to mind is to simply pre-compute the cumulative distribution (using indices of the outcomes) and do a linear search to find the interval containing the sampled uniform random value.
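As a concrete sketch of that first approach (in Python rather than MATLAB, purely for illustration; the function names here are my own, not from the original discussion):

```python
# A sketch of the linear-search (inverse CDF) approach; names are my own.
import itertools
import random

def make_cdf(p):
    """Pre-compute the cumulative distribution of the outcome weights p."""
    return list(itertools.accumulate(p))

def sample(cdf):
    """Draw one outcome index by linear search over the CDF: O(n) per sample."""
    u = random.random() * cdf[-1]   # scale by the total in case of rounding
    for i, c in enumerate(cdf):
        if u < c:
            return i
    return len(cdf) - 1             # guard against floating-point edge cases

cdf = make_cdf([0.2, 0.5, 0.3])
outcome = sample(cdf)               # 0, 1, or 2, with probabilities 0.2, 0.5, 0.3
```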

This method requires O(n) time to execute, which may not sound like a big deal, particularly since each step is just an array lookup and arithmetic comparison.  But it turns out that there is an algorithm called the Alias Method that, after an O(n) pre-computation, can generate each sample in O(1) time!

There is already an excellent write-up by Keith Schwarz that describes the problem and various solutions in great detail, leading up to the Alias Method and the cool geometric insight into how the algorithm works.  (As a brief teaser, instead of selecting a random point on an interval of the real line, extend to two dimensions and imagine throwing a dart at a rectangular dart board.  The challenge is to associate every point in the rectangle with an outcome.)

Edit: Motivated by discussion in the comments, following is a MATLAB script implementing the alias method, including a speed comparison with the basic “roulette wheel” inverse CDF method.  Two different situations are evaluated: (1) generating many random samples at once, where MATLAB’s vectorization of built-in functions makes the inverse CDF method faster until about n = 10^4; and (2) generating random samples one at a time, where the alias method wins by orders of magnitude even for small n.

% Create test probability distribution.
p = rand(1, 100);
p = p / sum(p);

% Initialize roulette wheel method: pre-compute bin edges {c}.
c = min([0, cumsum(p)], 1);
c(end) = 1;  % guard against rounding error: the last edge must be exactly 1

% Initialize Vose alias method: pre-compute {n, prob, alias}.
n = numel(p);
prob = nan(1,n);
alias = nan(1,n);
p = p * (n / sum(p));
small = nan(1,n);
large = nan(1,n);
ns = 0;
nl = 0;
for j = 1:n
    if p(j) > 1
        nl = nl + 1;
        large(nl) = j;
    else
        ns = ns + 1;
        small(ns) = j;
    end
end
% Pair each small entry with a large one; the large entry fills the rest
% of the small entry's column and donates the corresponding excess.
while ns ~= 0 && nl ~= 0
    j = small(ns);
    ns = ns - 1;
    k = large(nl);
    nl = nl - 1;
    prob(j) = p(j);
    alias(j) = k;
    p(k) = p(k) + p(j) - 1;
    if p(k) > 1
        nl = nl + 1;
        large(nl) = k;
    else
        ns = ns + 1;
        small(ns) = k;
    end
end
while ns ~= 0
    prob(small(ns)) = 1;
    ns = ns - 1;
end
while nl ~= 0
    prob(large(nl)) = 1;
    nl = nl - 1;
end

% Measure vectorized execution time of both methods.
samples = 1e6;

disp('Vectorized roulette wheel method:');
tic;
[~, r] = histc(rand(1,samples), c);
toc

disp('Vectorized Vose alias method:');
tic;
u = n * rand(1,samples);
j = floor(u);                   % column index in {0, ..., n-1}
r = alias(j + 1);               % default to each column's alias outcome
idx = (u - j <= prob(j + 1));   % biased coin flip on the fractional part
r(idx) = j(idx) + 1;            % keep the column's own outcome where it wins
toc

% Measure single-sample execution time of both methods.
samples = 1e6;

disp('Roulette wheel method:');
tic;
for i = 1:samples
    [~, r] = histc(rand(), c);
end
toc

disp('Vose alias method:');
tic;
for i = 1:samples
    u = n * rand();
    j = floor(u);
    if u - j <= prob(j + 1)
        r = j + 1;
    else
        r = alias(j + 1);
    end
end
toc
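For readers without MATLAB, the same table construction and O(1) sampling can be sketched in Python (a rough port of the script above, not the original code; names are mine):

```python
# A Python sketch of Vose's alias method; a port of the MATLAB script above.
import random

def build_alias_table(p):
    """Vose's O(n) construction: returns (prob, alias) tables of length n."""
    n = len(p)
    scale = n / sum(p)
    p = [x * scale for x in p]          # scale weights so they average to 1
    prob = [0.0] * n
    alias = [0] * n
    small = [j for j, x in enumerate(p) if x <= 1.0]
    large = [j for j, x in enumerate(p) if x > 1.0]
    while small and large:
        j = small.pop()
        k = large.pop()
        prob[j] = p[j]
        alias[j] = k
        p[k] += p[j] - 1.0              # the large entry donates 1 - p[j]
        (large if p[k] > 1.0 else small).append(k)
    for j in small + large:             # leftovers are (numerically) exactly 1
        prob[j] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """Draw one outcome in O(1): pick a column, then flip its biased coin."""
    n = len(prob)
    u = n * random.random()
    j = int(u)
    return j if u - j <= prob[j] else alias[j]
```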

One final interesting note that came out of investigating this algorithm: Wolfram’s Mathematica, a tool known for its ability to perform not just extended precision but arbitrary precision calculations, has a rather strange bug when confined to “simple” double precision.  One of the steps in the alias method and many other algorithms involves generating a uniform random integer in the set \{0, 1, ..., n-1\}.  A common approach is to compute \lfloor u n\rfloor, where u is uniformly distributed in the half-open interval [0,1).  When doing this with the limited precision of IEEE-754 floating point arithmetic, it is natural to ask whether the product can round up all the way to n, yielding n instead of n-1, for a value of u sufficiently close to 1.

Since the IEEE-754 standard guarantees that elementary arithmetic operations are correctly rounded (the computed result is the true result rounded to the nearest representable value), it is not difficult to show that there is no such danger for any n \leq 2^{53}, the largest of the consecutive integers exactly representable in double precision.
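A quick sanity check in Python (my own illustration, assuming correctly rounded IEEE-754 double arithmetic) confirms that the floor never reaches n, even for the largest double strictly less than 1:

```python
# Check that floor(u * n) cannot reach n, even for the largest double u < 1.
# My own sanity check, not from the original post.
import math

u_max = 1.0 - 2.0 ** -53              # largest double strictly less than 1

for n in (2, 3, 2049, 10 ** 6, 2 ** 52, 2 ** 53):
    product = u_max * n               # correctly rounded per IEEE-754
    assert product < n
    assert math.floor(product) == n - 1

print(2049 * u_max)                   # 2048.9999999999995, as it should be
```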

However, Mathematica fails to get this right for many values of n.  The smallest integer where the error occurs is 2049:

N[2049] * N[1 - 2^-53] // InputForm

The second term in the product is the largest double precision value less than 1.  Evaluating this expression yields 2049, not 2048.9999999999995 as expected (at least on the Windows versions of Mathematica 7 and 8).  It is not clear to me what is happening here, since this should presumably be a hardware operation, and is handled correctly in C++, Java, Python, etc.  Anyway, you have been warned.

Reference: Vose, Michael D., A Linear Algorithm for Generating Random Numbers With a Given Distribution, IEEE Transactions on Software Engineering, 17(9) (September 1991), 972-975.