# Secret Santa puzzle

This problem was inspired by a recent /r/dailyprogrammer exercise on Reddit.  It’s admittedly a little late for a holiday-themed puzzle, but then so was the exercise:

You and your large extended family participate in a Secret Santa gift exchange, where everyone is randomly (and secretly) assigned another person for whom to buy a gift.  To determine the gift assignments, everyone writes their name on a slip of paper and puts the slip into a hat.  Then each person in turn draws a slip from the hat and buys a gift for the person whose name is on the slip.

One obvious requirement is that no one should draw their own name.  However, there is an additional restriction: no one should be assigned to buy a gift for anyone in their immediate family (husbands cannot buy for their wives, parents cannot buy for their children, etc.).  Roughly, you cannot buy for anyone who came in the same car with you.

Instead of drawing slips from a hat, the /r/dailyprogrammer exercise involved writing a program to generate the gift assignments for a group of 30 people made up of immediate families of sizes {1, 1, 2, 1, 2, 4, 3, 1, 1, 2, 1, 2, 1, 1, 3, 2, 2}.  I saw several different approaches in the submitted solutions.  Some programs were actually deterministic, forgetting that the assignments should be random.  Other programs constructed an assignment iteratively or recursively, essentially modeling the sequential draw-from-a-hat method… but sometimes got “stuck” near the end in situations discussed here before, where all remaining unassigned persons are in the same family.  (And even when these programs don’t get stuck, not all of the resulting assignments are equally likely.)
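To see how the sequential approach can get stuck, here is a minimal Python sketch of a naive draw-from-a-hat program (the names `family_labels` and `sequential_draw` are our own, not taken from any submitted solution). Each giver in turn picks uniformly at random among the remaining slips outside their own family; if every remaining slip belongs to the giver's family, the draw is stuck and the function returns `None`.

```python
import random

def family_labels(sizes):
    """Family index of each person 0..n-1, families numbered in list order."""
    return [fam for fam, size in enumerate(sizes) for _ in range(size)]

def sequential_draw(sizes, rng=random):
    """Naive sequential draw: returns assignment[giver] = recipient, or None if stuck."""
    fam = family_labels(sizes)
    hat = list(range(len(fam)))  # slips still in the hat
    assignment = []
    for giver in range(len(fam)):
        allowed = [r for r in hat if fam[r] != fam[giver]]
        if not allowed:
            return None  # every remaining slip is in this giver's own family
        recipient = rng.choice(allowed)
        hat.remove(recipient)
        assignment.append(recipient)
    return assignment

print(sequential_draw([1, 1, 2]))
```

For example, with family sizes [1, 1, 2], if the first person draws the second person's name, the last two people (a family of two) always end up stuck with slips from their own family.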

Finally, what I thought were the cleanest solutions used rejection sampling to eliminate these problems: keep generating random unrestricted permutations until finding one that satisfies all of the intra-family buying restrictions.
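A minimal sketch of the rejection-sampling approach in Python, assuming people are numbered 0 to 29 and grouped into families in the order of the size list above (`rejection_sample` and `family_labels` are illustrative names). Because everyone belongs to their own family, the single same-family check also rules out drawing one's own name.

```python
import random

# Family sizes from the /r/dailyprogrammer exercise (30 people in total).
FAMILY_SIZES = [1, 1, 2, 1, 2, 4, 3, 1, 1, 2, 1, 2, 1, 1, 3, 2, 2]

def family_labels(sizes):
    """Family index of each person 0..n-1, families numbered in list order."""
    return [fam for fam, size in enumerate(sizes) for _ in range(size)]

def rejection_sample(sizes, rng=random):
    """Shuffle until no one is assigned to a member of their own family.

    Returns (assignment, attempts), where assignment[giver] is the recipient.
    Never returns if no valid assignment exists.
    """
    fam = family_labels(sizes)
    n = len(fam)
    attempts = 0
    while True:
        attempts += 1
        assignment = list(range(n))
        rng.shuffle(assignment)  # uniformly random, unrestricted permutation
        # Everyone is in their own family, so this also rules out self-draws.
        if all(fam[giver] != fam[assignment[giver]] for giver in range(n)):
            return assignment, attempts

assignment, attempts = rejection_sample(FAMILY_SIZES)
print(f"valid assignment found after {attempts} attempt(s)")
```

Since each shuffle is a uniformly random permutation, the accepted assignment is uniformly distributed over all valid assignments, avoiding the bias of the sequential approach. Note that the loop never terminates if no valid assignment exists, e.g., if one family makes up more than half of the group.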

Problem: Suppose that your family does the same thing.  If at any point in the drawing process anyone draws their own name or the name of anyone in their immediate family, everyone puts their slips back into the hat, and the process is restarted from the beginning.  On average, how long will this take?  That is, what is the expected number of times the drawing process must be started before it successfully completes?
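One way to compute the answer, sketched under the assumption that your family has the same 30-person structure as the exercise: a single attempt succeeds exactly when the complete sequence of draws is a valid assignment, so stopping early at the first bad draw does not change its success probability $p$. Attempts are independent, so the number of starts is geometric with mean $1/p$. The number of valid permutations can be counted by inclusion-exclusion, where $r_k$ is the number of ways to choose $k$ forbidden (giver, recipient) pairs with no two sharing a giver or a recipient:

$$
\#\text{valid} = \sum_{k=0}^{30} (-1)^k \, r_k \, (30-k)!
$$

Because the forbidden pairs form disjoint $n_i \times n_i$ blocks, one per family of size $n_i$, the polynomial $\sum_k r_k x^k$ is the product over families of $\sum_j \binom{n_i}{j}^2 j!\, x^j$ (the rook polynomial of each block). The Python below (helper names are our own) uses exact integer arithmetic:

```python
from fractions import Fraction
from math import comb, factorial

# Family sizes from the /r/dailyprogrammer exercise (30 people in total).
FAMILY_SIZES = [1, 1, 2, 1, 2, 4, 3, 1, 1, 2, 1, 2, 1, 1, 3, 2, 2]

def rook_polynomial(n):
    """r_j = C(n, j)^2 * j!: ways to place j non-attacking rooks on an n x n block."""
    return [comb(n, j) ** 2 * factorial(j) for j in range(n + 1)]

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def count_valid(sizes):
    """Permutations with no one assigned within their own family (inclusion-exclusion)."""
    total = sum(sizes)
    rook = [1]
    for n in sizes:
        rook = poly_mul(rook, rook_polynomial(n))  # blocks share no rows or columns
    return sum((-1) ** k * r_k * factorial(total - k) for k, r_k in enumerate(rook))

def expected_starts(sizes):
    """Expected number of starts 1/p, with p = valid permutations / all permutations."""
    return Fraction(factorial(sum(sizes)), count_valid(sizes))

print(float(expected_starts(FAMILY_SIZES)))
```

As a sanity check, `count_valid([1, 1, 1, 1])` returns 9, the number of derangements of four people.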

# Proofreading as “mark and recapture”

Last week I saw two different articles (at Futility Closet and DataGenetics) about exactly the same topic: suppose that two reviewers each proofread the same document.  The first reviewer finds $A=10$ errors, and the second finds $B=12$ errors, of which $C=8$ are in common, i.e., 8 of those 12 errors had already been found by the first reviewer.  Can we estimate how many additional errors remain in the document that were missed by both reviewers?

Both articles essentially reproduce the argument given by Pólya (see reference below) that a reasonable estimate for the total number of errors (both found and missed) is given by the following simple formula:

$\hat{N} = \frac{A B}{C} = 15$

This is a “mark-and-recapture” estimation method similar to that used, for example, to estimate the number of fish in a lake.  Intuitively, the first reviewer identifies and “marks” a fraction $A/N$ of the errors in the document (where $N$ is unknown), and this fraction should approximately equal the fraction $C/B$ of errors found by the second reviewer that were already marked.  Solving $A/N \approx C/B$ for $N$ gives the estimate above.

However, neither article points out just how inaccurate this method of estimation can be, nor the fact that better alternatives are available.  For example, continuing with the example above as originally presented in the DataGenetics article, let us assume for the moment that

1. There really are a total of 15 errors in the document being reviewed.
2. The first reviewer really does find each error independently with probability 10/15=2/3.
3. The second reviewer really does find each error independently with probability 12/15=4/5.

Note that this example is arguably somewhat contrived to be “nice,” since the actual number $C=8$ of errors observed in common by both reviewers happens to equal the expected number $15 \cdot \frac{2}{3} \cdot \frac{4}{5} = 8$ of such errors.  This need not be the case: with this model of reviewer accuracy, the number of common errors may be as large as $N=15$… or as small as zero, in which case our estimator breaks down entirely.

Even if we condition on $C>0$ to rule out this unlikely difficulty (essentially asking both reviewers to start over, forgetting any errors they may have already found, whenever they find no errors in common), there is still significant variance in the possible estimates that may result, as shown in the following figure.

Figure: Distribution of the estimate $\hat{N}$ of the number of errors ($N=15$, $p_1=2/3$, $p_2=4/5$).

(This is not a sample from a simulation; we can calculate this distribution exactly.)  The mean of the estimate, shown in red, is approximately 15.17, which is pretty good.  However, we can do better (only slightly better in this already-fortunate case, but a lot better in other cases) using a slightly different estimator due to Chapman:

$\hat{N} = \frac{(A+1)(B+1)}{C+1} - 1$

This estimate has several advantages over the Lincoln-Petersen estimator $AB/C$ given earlier.  It has less bias and less variance, particularly in situations like this where the “population” of errors is relatively small.  Also, it still works even when $C=0$, i.e., when no errors are found in common by both reviewers.
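The figure's distribution, and the comparison with Chapman's estimator, can be computed exactly under the three modeling assumptions above by enumerating every way the $N=15$ errors can split into found-by-both, found-by-first-only, found-by-second-only, and found-by-neither. Here is a Python sketch (function names are our own) using exact fractions, with the Lincoln-Petersen estimate conditioned on $C>0$ as in the figure:

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial

def lincoln_petersen(a, b, c):
    """N-hat = AB/C; undefined when C = 0 (raises ZeroDivisionError)."""
    return Fraction(a * b, c)

def chapman(a, b, c):
    """N-hat = (A+1)(B+1)/(C+1) - 1; still defined when C = 0."""
    return Fraction((a + 1) * (b + 1), c + 1) - 1

def estimate_distributions(n, p1, p2):
    """Exact distributions of both estimates, assuming n errors, each found
    independently by reviewer 1 with prob. p1 and by reviewer 2 with prob. p2."""
    # Probability that one error is found by: both, only 1, only 2, neither.
    cells = (p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2))
    lp, ch = defaultdict(Fraction), defaultdict(Fraction)
    for both in range(n + 1):
        for only1 in range(n + 1 - both):
            for only2 in range(n + 1 - both - only1):
                counts = (both, only1, only2, n - both - only1 - only2)
                prob = Fraction(factorial(n))  # multinomial probability
                for k, q in zip(counts, cells):
                    prob = prob * q ** k / factorial(k)
                a, b, c = both + only1, both + only2, both
                ch[chapman(a, b, c)] += prob
                if c > 0:
                    lp[lincoln_petersen(a, b, c)] += prob
    # Condition the Lincoln-Petersen estimate on C > 0 by renormalizing.
    p_pos = sum(lp.values())
    return {x: pr / p_pos for x, pr in lp.items()}, dict(ch)

def mean(dist):
    return sum(x * pr for x, pr in dist.items())

print(lincoln_petersen(10, 12, 8), chapman(10, 12, 8), chapman(10, 12, 0))
lp, ch = estimate_distributions(15, Fraction(2, 3), Fraction(4, 5))
print(float(mean(lp)), float(mean(ch)))
```

The first printed line applies both estimators to the reviewers' counts above: 15 for $AB/C$, $134/9 \approx 14.89$ for Chapman, and 142 for Chapman if no errors had been found in common. The second line prints the mean of each estimate; the first should agree with the value of approximately 15.17 quoted above.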

Having said that, it’s not clear how useful either method really is in this particular context, given how widely the resulting estimate may vary from the true value.  These estimation methods work much better when at least one of the two sample sizes $A$ and $B$ is pre-determined (e.g., first catch exactly 100 fish, mark them, then catch 100 fish again), and only $C$ varies randomly.

Reference:

- Pólya, G., “Probabilities in Proofreading,” American Mathematical Monthly, 83(1), January 1976, p. 42.