## Giant Yahtzee

In the game of Yahtzee, players roll five standard dice, then repeatedly re-roll subsets of the dice, trying to obtain various scoring combinations, the most valuable of which is a “Yahtzee,” or five of a kind, i.e., all five dice showing the same value.

If we strip off the complexities of the multiple players, limited number of re-rolls, and various other scoring combinations (e.g., straights, full houses, etc.), there is a nice mathematical puzzle buried underneath:

Roll $n$ dice each with $d=6$ sides, and repeatedly re-roll any subset of the dice– you can “keep” any or none of your previous rolls, and you can re-roll dice you have previously kept– until all dice show the same value (e.g., all 1s, or all 2s, etc.). Using an optimal strategy, what is the (minimum) expected number of rolls required? In particular, can we solve this problem for “Giant Yahztee,” where we are playing with, say, $n=100$ dice?

Edit 2020-10-05: Following are my notes on this problem. Given that we (re)roll $r$ of the dice– setting aside the remaining $s=n-r$ already identical dice– let the random variable $X_r$ indicate the resulting new number of identical dice. The distribution of $X_r$ is given by

$P(X_r \leq t) = \frac{1}{d^r}[\frac{x^r}{r!}] \left(\sum\limits_{k=0}^t \frac{x^k}{k!}\right)^{d-1} \left(\sum\limits_{k=0}^{t-n+r} \frac{x^k}{k!}\right)$

$P(x_r = t) = P(X_r \leq t) - P(X_r \leq t-1)$

so that the transition matrix $P$ for the absorbing Markov chain with state space ${0, 1, 2, \ldots, n}$ indicating the current number of identical dice has entries

$P_{s,t} = P(X_{n-s}=t), 0 \leq s,t \leq n$

which we can use to compute the desired expected number of rolls. See the comments for a nice closed form solution for the cumulative distribution function for the number of rolls when $n=5$.

This entry was posted in Uncategorized. Bookmark the permalink.

### 9 Responses to Giant Yahtzee

1. This isn’t the answer you seek but it’s related. The link is to a tweet with an image because it’s unclear whether we can use Latex in the comments. https://twitter.com/iconjack/status/697903289220747264?s=20

• Right– defining $p(n)$ to be the formula provided in the link, the expected value desired here is $\sum_{n \geq 0} (1-p(n))$, which we can (tediously) evaluate to be 191283/17248. (You can embed inline LaTeX by flanking with $latex and $.)

(Note that this implicitly assumes that an optimal strategy minimizing the expected number of rolls it “monotonic” in the sense that the keep-and-reroll strategy for a *fixed* number of rolls n can be “extended” to a still-optimal strategy for any larger fixed number of rolls.)

• Yes, I got the same answer for average number of rolls required to Yahtzee with 5 dice: 191283/17248 ≈ 11.09. The only hitch in the case of 5 dice is the situation where you’re keeping 2 of something and, when you roll the other 3 dice, they all match. In this case you have to switch horses in midstream.

2. @iconjack, I have edited the post to describe my solution approach. Computing the distribution of the “new” number of identical dice is essentially a variant of the birthday problem, which makes me wonder whether this can be simplified further– maybe not, e.g. see the recurrence relation on MathWorld’s birthday problem page.

• iconjack says:

Your solution is extremely nice. Really shows off the power of generating functions. Have you gone on to compute the average number of rolls needed to yahtzee with 100 dice?

• Yep– the expected number of rolls is about 28.59474513327532. Interestingly, this seems to be very well approximated by $\log_{1/(1 - 1/d)}n + f(d)$, where there is a reasonable albeit hand-wavey argument for the logarithm, but I’m less sure about how to intuitively explain the $f(d)$, which seems to be about $f(d) \approx d/2$.

3. iconjack says:

One angle of attack for this problem, at least for the approximation, could be as follows. First note that the Yahtzee rule that allows what I called “switching horses in midstream” has little value when n, d are not small. Then realize the problem is equivalent to the “order statistics” problem of finding E max(X_i) where X_i is geometric with p = 1/d. Maybe this is a solved problem?

• iconjack says:

This is what I found for E max(X_i) where X_i is geometric with p = 1/d, the value mentioned in the comment above:
$$\sum_{k=0}^{\infty} k \, \left( (1- (1-\frac{1}{d})^k)^n - (1-(1-\frac{1}{d})^{k-1} \right)^n$$
For n=100 and d=6,
28.5947 is the real answer
28.2585 is what your log formula gives
28.9518 is what the infinite sum in this comment gives
Remember, this last value is for when you’re not allowed to switch horses. I’ve found empirically that when n is largish, the ability to switch only gains you about half a roll.

• Nice! This figure shows the resulting error in these two approximations as a function of n, with each curve corresponding to d in {2,4,6,8,10,12}. This suggests that my eyeball-norm guess at a +d/2 correction is inaccurate, although I’m not sure how to fix it.

This site uses Akismet to reduce spam. Learn how your comment data is processed.