A harder birthday problem

It is a well-known non-intuitive result that in a group of n=23 people– conveniently the size of a classroom of students– the probability is at least 1/2 that k=2 or more of them share a birthday.  This is a nice problem for several reasons:

  1. Its solution involves looking at the problem in a non-obvious way, in this case by considering the complementary event that all birthdays are distinct.
  2. Once the approach in (1) is understood, computing the answer is relatively easy: the probability is 1-(365)_{n}/365^n, where (x)_{n} is the falling factorial.
  3. The answer is surprising.  When I ask students, “How many people are needed for the probability of a shared birthday to exceed 1/2?”, guesses as high as 180 are common.
  4. Picking on the not-quite-realistic assumption of all d=365 birthdays being equally likely actually helps; that is, with a non-uniform distribution of birthdays, the probability of coincidence is higher.

But what about larger k?  For example, suppose that while surveying your students’ birthdays in preparation for this problem, you find that three of them share a birthday?  What is the probability of this happening?

The Wikipedia page on the birthday problem doesn’t address this generalization at all.  There are several blog and forum posts addressing the k=3 case specifically, by grouping the prohibited cases according to the number of pairs of people with shared birthdays.  This is generalized to arbitrary k \geq 2 on the Wolfram MathWorld page; the resulting recurrence relation is pretty complex, but it’s a nice exercise to prove that it works.

Probability that at least (2,3,4,5) people share a birthday, vs. group size.

Probability that at least (2,3,4,5) people share a birthday, vs. group size.

The motivation for this post is to describe what I think is a relatively simpler solution, for arbitrary k, including Python source code to perform the calculation.  Let’s fix the number of equally likely possible birthdays d=365, and the desired number k of people sharing a birthday, and define the function

G(x)=(1+x+\frac{x^2}{2} \ldots + \frac{x^{k-1}}{(k-1)!})^d

Then G(x) is the exponential generating function for the number of “prohibited” assignments of birthdays to n people where no more than k-1 share a birthday.  That is, the number of such prohibited assignments is n! times the coefficient of x^n in G(x).

(When working through why this works, it’s interesting how often it can be helpful to transform the problem into a different context.  For example, in this case, we are also counting the number of length-n strings over an alphabet of d characters, where no character appears more than k-1 times.)

The rest of the calculation follows in the usual manner: divide by the total number of possible assignments d^n to get the complementary probability, then subtract from 1.  The following Python code performs this calculation, either exactly– using the fractions module, which can take a while– or in double precision, which is much faster.

import numpy as np
from numpy.polynomial import polynomial as P
import fractions
import operator

def p(k, n, d=365, exact=False):
    f = fractions.Fraction if exact else operator.truediv
    q = P.polypow(np.array(
        [f(1, np.math.factorial(j)) for j in range(k)], dtype=object),
        d)[n]
    for j in range(1, n + 1):
        q = q * f(j, d)
    return 1 - q

 

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s