The clueless student: solution

This is to capture notes on my progress toward a solution to the “matching quiz” problem presented in the previous post, which can be re-stated more concisely as follows: what is the optimal strategy for choosing quiz answers specified as a function f : [n] \rightarrow [n], not necessarily a bijection, that maximizes the probability of agreeing with a randomly chosen “solution” bijection in at least m questions?

A strategy is characterized by a partition \lambda of n, with the parts corresponding to the sizes of the classes of the equivalence kernel of the chosen function f.  The number of parts in the partition indicates the number of distinct quiz answers.  For example,

1 + 1 + ... + 1 = n

corresponds to a bijection, and the single-part partition n=n corresponds to a constant function, i.e., writing the same answer for all n questions.  Note that any function with the same partition “signature” has the same probability of success.

So, given a guessing strategy specified as a partition with k parts:

\lambda = \lambda_1 + \lambda_2 + ... + \lambda_k = n

what is the probability of getting at least m answers correct?  First, we can compute the probability p(S) that exactly some subset S \subseteq [k] of distinct answers is correct using inclusion-exclusion:

p(S) = \frac{1}{n!} \sum\limits_{T \supseteq S} (-1)^{|T|-|S|}(n-|T|)! \prod\limits_{i \in T} \lambda_i

And so the overall probability of at least m correct answers is

P(success) = \sum\limits_{|S| \geq m} p(S)

We can collapse this sum of sums by observing that a particular subset T \subseteq [k] in the inner summation appears {|T| \choose |S|} times in the outer summation for a given size |S|.  Using the identity

\sum\limits_{s=m}^t (-1)^{t-s} {t \choose s} = (-1)^{t-m} \frac{m}{t} {t \choose m}

we can compute the overall probability of at least m correct answers as

P(success) = \frac{1}{n!} \sum\limits_{|T| \geq m} (-1)^{|T|-m} \frac{m}{|T|} {|T| \choose m} (n-|T|)! \prod\limits_{i \in T} \lambda_i

At this point, I’m stuck; I don’t see a better approach than simply evaluating all possible strategies (i.e., generating all integer partitions of n) for small values of 1 \leq m \leq n, and looking for patterns.  Following is the conjectured optimal strategy which holds for n \leq 16:

  • If m=1, then as pointed out in the previous post, write down the same answer n times, guaranteeing exactly one correct answer.
  • For m=2, write one answer \lfloor n/2 \rfloor times, and another (different) answer \lceil n/2 \rceil times.
  • For m=n-1, select an “almost-permutation,” corresponding to the integer partition 1+1+...+1+2=n, so that exactly one answer gets duplicated (and thus exactly one answer is missing).
  • Otherwise, select a random permutation.

Actually, we can describe this strategy more efficiently as follows: (a) select m distinct answers, and distribute them as evenly as possible among the n questions… unless 2<m<n-1, in which case (b) just select a random permutation.  I think (a) is what intuition would suggest is always the optimal approach– it was my wife’s immediate response when I described the problem to her– but I don’t have a good explanation for the qualification (b).


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s