This past Monday marked the start of a 10-week promotion where you can buy bottles of Mountain Dew, each with a label showing one of the 50 United States. Collect all 50 state labels, and win $100 (in the form of a prepaid Visa card).
How many bottles of Mountain Dew should you expect to have to buy to win one of these $100 gift cards? The motivation for this post is to discuss this promotion as an instance of the classic coupon collector’s problem… but with a couple of interesting wrinkles, one of which appears to make this promotion vulnerable to exploitation by a determined participant, with an opportunity to net a significant positive return with almost no risk.
Group drawings: buying six-packs
The first wrinkle is to suppose that we plan to buy our bottles of Mountain Dew in six-packs, instead of one bottle at a time. Buying in bulk reduces the price per bottle: an individual 16.9-ounce bottle is easily a dollar or more, while I can find six-packs in my neighborhood for about $3.48, or 58 cents per bottle. I will assume this price for the remainder of this discussion.
So how many six-packs should we expect to have to buy to collect a complete set of 50 state labels? More generally, let be the number of bottles in each pack, where each bottle has a label chosen independently and uniformly from possible states, and let be the random variable indicating the number of packs we must buy until we first collect at least one of every state label. What is the distribution of ?
Stadje refers to this as the coupon collector’s problem “with group drawings” (see references at the end of this post). However, he addresses a slightly different variant, where the labels in a six-pack must be distinct, that is, sampled without replacement from the 50 possible labels (similar to the NFL drug testing procedure discussed in a recent post). In our case, each individual bottle’s label is an independent sample, so that a six-pack may contain duplicates. Inspection of a few six-packs at my local grocery store verified that this is indeed the more appropriate model for this problem.
The probability that we have collected all state labels after buying packs is given by
It’s a nice exercise to show that, from this, we can derive the expected number of packs as
which evaluated for and yields an average of about 37.91 six-packs, or about $131.92. So if we buy six-packs until we collect all 50 state labels, thus winning $100, we should expect to lose about $31.92 on average as a result of our venture, with a probability of only about 0.16 that we manage to net any positive return. Even worse, there is no upper bound on how much we might lose in that remaining 84% case. We will fix this shortly.
The Double Dixie cup problem: decreasing risk
The much more interesting wrinkle is that the terms of the promotion allow a single participant to win more than once: up to five times during the 10-week period, collecting a total of $500 worth of gift cards. This helps us a lot, because although we expect to need to buy about 38 six-packs to collect the first set of 50 state labels, we typically need less than half as many additional packs to collect the second and subsequent complete sets of labels.
Newman refers to this as the “double Dixie cup problem” (see references below). Let’s extend the notation described above, and define the random variable to indicate the number of packs that we must buy to collect complete sets of state labels. Then the cumulative distribution for is given by
Now, let’s suppose that we start buying six-packs, trying to collect complete sets of bottle labels, thus winning $500. At $3.48 per six-pack, we will lose money only if we have to buy more than 143 six-packs.
Evaluating yields a probability of less than 0.008 that we will lose money in our search for five complete sets of labels. We can bound the possible amount lost by stopping at 144 six-packs, no matter what, and collecting $100 gift cards for however many complete sets of bottle labels we have managed to collect at that point. The figure below shows the resulting distribution of possible net returns, with the expected return of $167.53 shown in red.
A 99.2679% chance of making money, with an expected return of $167.53, sounds like a pretty good wager. But even this may not be the best we could possibly do. Although there is certainly no advantage to buying any more than 144 six-packs in total, it may be advantageous to stop before that point, even without having collected all five complete sets of bottle labels, if the expected additional gain from continuing to buy packs is negative. Such an optimal stopping strategy would likely be very complex, but it would be interesting to investigate how much the above approach could be improved.
Finally, I have already sunk about $300 into a bunch of Skittles that I didn’t eat; I have no plans to invest in another experiment buying gallons of Mountain Dew that I won’t drink. But there is a good reason why I’m more reluctant in this case: all of the above analysis assumes that each of the 50 states are equally likely to appear on any bottle. Perhaps they truly are uniformly distributed– the promotion is intended to “celebrate what makes every state epic,” after all, so any state that is more scarce than it should be would feel justifiably slighted.
We made the same uniformity assumption in the recent Skittles experiment, which turned out to be mostly correct. But in that case, any departure from uniformity would only have helped us, making it more likely to find an identical pair of packs. In this case, however, a non-uniform distribution of state labels makes it harder to collect complete sets.