## Expectation Isn’t Everything

The Boston Globe reported last week on an elderly couple who were making (and spending) huge amounts of money on the Massachusetts Cash WinFall state lottery.  As is my usual habit, I started poking around in the mathematics involved, and, also as usual, found more interesting effects than I expected.

The game is pretty simple: for each \$2 ticket, you select 6 numbers from 1 to 46 (without replacement).  The parimutuel semi-weekly jackpot for matching all 6 numbers varies from \$500,000 to about \$2.5 million, with additional fixed payouts of \$2, \$5, \$150, and \$4,000 for matching 2 to 5 numbers, respectively.  As is usually the case, this game has negative expected return, with state lottery management taking about \$1.10 from each \$2 ticket sold.

What makes this game interesting is that when the jackpot reaches about \$2 million without a winner– which happens several times a year– the excess jackpot is “rolled down” into larger fixed payouts for matching fewer than all 6 numbers.  This last happened on the 14 July drawing, with payouts increased to \$26, \$802, \$19,507, and \$2,392,699 for matching 3 through 6 numbers, respectively.  The result is a game with positive expected return for the ticket buyer… even if you never expect to win the jackpot!

But as Steve Jacobs used to say (or still says, for all I know), “Expectation isn’t everything.”  You still have a lousy 2% probability of actually making any money at all with just one ticket.  Expected value is only realized in the long run, considering the average result of a large number of repeated trials.

So the couple in the Globe article simply realized the long run, and actually executed that large number of repeated trials, buying about \$614,000 worth of tickets.  This strategy not only has a nice expected return of about \$184,000, but it is extremely low risk; the probability of making money jumps from 2% with one ticket to over 98% with 307,000 tickets.  The couple have already won nearly \$1 million this year alone.

What got my attention and motivated this post was the following paragraph from the article:

> Mark Kon, a professor of math and statistics at Boston University, calculated that a bettor buying even \$10,000 worth of tickets would run a significant risk of losing more than they won during the July rolldown week. But someone who invested \$100,000 in Cash WinFall tickets had a 72 percent chance of winning. Bettors like the Selbees, who spent at least \$500,000 on the game, had almost no risk of losing money, Kon said.

The interesting question is: how would one calculate that 72% probability figure?  As far as I can tell, this problem does not have a nice analytical solution.  The “big hammer” approach is to estimate it by simply simulating the drawing many times.  But I wondered whether there might be a more efficient way, either analytically or via a simpler form of estimation.

My first idea was to replace the actual game with a more tractable one: consider a lottery where each \$2 ticket yields only two possible outcomes: either it wins a payout $q$ with probability $p$, or it loses with probability $1-p$.  We can compute the necessary values of $p$ and $q$ so that the single ticket outcome has the same (positive) expected value and variance as the actual game… but let us also conservatively assume that we never win the jackpot, so that the distribution is not so skewed.

In this simplified game, each ticket wins \$4,510.95 with probability about 0.00052.  The expected return on a single ticket is the same as the actual (jackpot-less) game, about \$.34, and the variance is the same as well.  The actual distribution is different, of course, but we should see the same general behavior as we buy more and more tickets: the overall expected return will increase linearly, and more importantly, the probability of making money will also increase, from the lowly 0.00052 with one ticket, approaching 1 for a sufficiently large number of tickets.

Right?

Well, not exactly.  The problem turned out to be more interesting than that.  Yes, we can efficiently compute the probability of winning money from $n$ tickets in our simplified game (I will leave it to the interested reader to work out the details).  But I was surprised to find that that probability is not monotonic as a function of $n$, as the following plot shows.

*Probability of winning vs. number of tickets.*

The red curve corresponds to the actual lottery, and was generated via simulation; see the source code at the end of this post.  The behavior is what I expected: the more tickets you buy, the higher the probability that you win money.

The blue curve, corresponding to the simplified game, is more interesting.  First, just fixing the first and second moments of the distribution did indeed approximate the real behavior as well as I had hoped.  We can efficiently estimate the probabilities in the article: even buying \$10,000 worth of tickets still only wins money about half the time; buying \$100,000 wins about 75% of the time; and buying \$614,000 wins money about 97% of the time.

But the unexpected and interesting behavior is the jaggedness of the blue curve.  There are many large jumps, at times nearly 20%, where the probability of an overall win decreases with the purchase of a single additional ticket.

ObPuzzle: what is going on here?

For reference, following is the source code for simulating the actual game, with commented parameters for the simplified game as well.

```
#include <iostream>
#include <random>

int main()
{
    const int num_samples = 10000;

    // The simplified single-payout game.
    //const int num_outcomes = 2;
    //const double probability[] = {0.9994806467243879, 1.};
    //const double payoff[] = {-2, 4508.953581737101};

    // The actual game during the 14 July rolldown week.
    const int num_outcomes = 6;
    const double probability[] = {0.8312777261949867, 0.9776294385532591,
        0.9987251808751723, 0.999974270881075, 0.9999998932401705, 1.};
    const double payoff[] = {-2, 0, 24, 800, 19505, 2392697};

    // Standard Mersenne twister, substituted here for the original
    // math::Random so that the code is self-contained.
    std::mt19937 rng;
    std::uniform_real_distribution<double> uniform(0, 1);

    // Evaluate buying increasing numbers of tickets.
    for (int num_tickets = 25000; num_tickets <= 300000; num_tickets += 25000)
    {
        int count = 0;
        for (int i = 0; i < num_samples; ++i)
        {
            // Buy and cash in tickets.
            double win = 0;
            for (int j = 0; j < num_tickets; ++j)
            {
                double p = uniform(rng);
                for (int k = 0; k < num_outcomes; ++k)
                {
                    if (p <= probability[k])
                    {
                        win += payoff[k];
                        break;
                    }
                }
            }

            // Record whether we make or lose money.
            if (win >= 0)
            {
                ++count;
            }
        }

        // Display probability of winning (i.e., not losing money).
        std::cout << num_tickets << "\t" <<
            static_cast<double>(count) / num_samples << std::endl;
    }
}
```


### 5 Responses to Expectation Isn’t Everything

1. Harold Bond says:

Thanks for posting the article; it was certainly a great read!

2. Let me call your probabilities “thresholds” instead. Let them be $t_i$ ($i = 1, 2, \ldots, m$, with $m$ outcomes) and set $t_0 = 0$. So, for your simple game: $(t_0, t_1, t_2) = (0, 0.9994806467243879, 1.0)$.

Define $p_i = t_i - t_{i-1}$. So, for your simple game: $(p_1, p_2) = (0.9994806467243879, 0.0005193532756121)$.

Define $a_i$ to be the payoffs. So, for your simple game: $(a_1, a_2) = (-2, 4508.953581737101)$.

Let $c_i$ be the number of tickets we got in bucket $i$ for a given attempt. So, if I bought 1 ticket in your simple game, then I have $(c_1, c_2) = (1, 0)$ or $(0, 1)$. If I bought 2 tickets, I have $(c_1, c_2) = (2, 0)$, $(1, 1)$, or $(0, 2)$.

The probability of making money with $n$ tickets is the sum, over all combinations of non-negative $c_i$ with $\sum_i c_i = n$ and $\sum_i c_i a_i > 0$, of $\binom{n}{c_1, \ldots, c_m} \prod_i p_i^{c_i}$, where $\binom{n}{c_1, \ldots, c_m} = n! / \prod_i c_i!$ is the multinomial coefficient. So, for your simple game, this is the sum of $(n!/(c_1! c_2!)) \, p_1^{c_1} p_2^{c_2}$ over all non-negative $c_1, c_2$ with $c_1 + c_2 = n$ and $c_1 a_1 + c_2 a_2 > 0$. (And since $c_1 + c_2 = n$ and $c_1 a_1 + (n - c_1) a_2$ is monotonic in $c_1$, you can just start summing from $c_1 = 0$ up to $n$ and stop when you reach a term violating the payoff condition.)

By my calculations then, for the simple game, you have more than a 50% chance of winning money with 1335 tickets or more.

The calculations become very slow when trying to do this with more than two outcomes. You’d need to take more advantage of the ranges where the sum of the c_i*a_i is positive to try to simplify the sums so you’re not summing over all of the partitions of n into six parts. (I believe this would be n(n-1)(n-2)(n-3)(n-4)/16 checks to see if the sum of the c_i*a_i is positive and then all of the corresponding factorials and exponents from there.)

• Thanks for working this out; this is a good verification that I haven’t gotten completely off in the weeds. Here is another argument for why I suspect that a “nice” solution is probably not achievable: let $g(x)$ be the probability generating function for the outcome of a single ticket; we can let all payoffs be integers, so that, for example, for the simple game $g(x) = p_1 x^{-2} + p_2 x^{4509}$ (using your notation).

Then the pgf for $n$ tickets is simply $g(x)^n$. This is a handy approach in a lot of situations; for example, the expected return is the derivative evaluated at $x = 1$. But in this case, we need the sum of coefficients of the non-negative powers of $x$, which is, well, not “nice.” At least in the simplified game, this reduces to summing over a few binomial coefficients, but as you point out, when $g(x)$ has more terms, it’s less clear how to optimize the bookkeeping.

Finally, one minor correction: note, as mentioned in the post, that the probability of winning is not monotonic vs. the number of tickets. You’re right that the probability creeps over 0.5 at 1335 tickets, and it increases to about 0.69 at 2255 tickets… but then drops to less than 0.33 at 2256 tickets! (This happens in steps of about $a_2/|a_1|$ tickets, as each step forces us to pick up one additional term in the cdf of the binomial distribution.)