A harder birthday problem

It is a well-known non-intuitive result that in a group of n=23 people– conveniently the size of a classroom of students– the probability is at least 1/2 that k=2 or more of them share a birthday.  This is a nice problem for several reasons:

  1. Its solution involves looking at the problem in a non-obvious way, in this case by considering the complementary event that all birthdays are distinct.
  2. Once the approach in (1) is understood, computing the answer is relatively easy: the probability is 1-(365)_{n}/365^n, where (x)_{n} is the falling factorial.
  3. The answer is surprising.  When I ask students, “How many people are needed for the probability of a shared birthday to exceed 1/2?”, guesses as high as 180 are common.
  4. Picking on the not-quite-realistic assumption of all d=365 birthdays being equally likely actually helps; that is, with a non-uniform distribution of birthdays, the probability of coincidence is higher.

But what about larger k?  For example, suppose that while surveying your students’ birthdays in preparation for this problem, you find that three of them share a birthday?  What is the probability of this happening?

The Wikipedia page on the birthday problem doesn’t address this generalization at all.  There are several blog and forum posts addressing the k=3 case specifically, by grouping the prohibited cases according to the number of pairs of people with shared birthdays.  This is generalized to arbitrary k \geq 2 on the Wolfram MathWorld page; the resulting recurrence relation is pretty complex, but it’s a nice exercise to prove that it works.

Probability that at least (2,3,4,5) people share a birthday, vs. group size.

Probability that at least (2,3,4,5) people share a birthday, vs. group size.

The motivation for this post is to describe what I think is a relatively simpler solution, for arbitrary k, including Python source code to perform the calculation.  Let’s fix the number of equally likely possible birthdays d=365, and the desired number k of people sharing a birthday, and define the function

G(x)=(1+x+\frac{x^2}{2} \ldots + \frac{x^{k-1}}{(k-1)!})^d

Then G(x) is the exponential generating function for the number of “prohibited” assignments of birthdays to n people where no more than k-1 share a birthday.  That is, the number of such prohibited assignments is n! times the coefficient of x^n in G(x).

(When working through why this works, it’s interesting how often it can be helpful to transform the problem into a different context.  For example, in this case, we are also counting the number of length-n strings over an alphabet of d characters, where no character appears more than k-1 times.)

The rest of the calculation follows in the usual manner: divide by the total number of possible assignments d^n to get the complementary probability, then subtract from 1.  The following Python code performs this calculation, either exactly– using the fractions module, which can take a while– or in double precision, which is much faster.

import numpy as np
from numpy.polynomial import polynomial as P
import fractions
import operator
def p(k, n, d=365, exact=False):
    f = fractions.Fraction if exact else operator.truediv
    q = 0 if n > d * (k - 1) else P.polypow(np.array(
        [f(1, np.math.factorial(j)) for j in range(k)], dtype=object),
    for j in range(1, n + 1):
        q = q * f(j, d)
    return 1 - q
This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to A harder birthday problem

  1. Nice python modules. I don’t know this python modules fractions and operator.
    Anyway you have a error:

    >>> p(1,100)
    Traceback (most recent call last):
    File “”, line 1, in
    File “”, line 6, in p
    IndexError: index 100 is out of bounds for axis 0 with size 1
    >>> p(2,100)

    • Note that k=1 doesn’t really yield a useful calculation, since we’re asking for the probability that “1 or more” of n people have the same birthday. In that case, the generating function described in the post is just the constant G(x)=1.

      More generally, we could add the edge case check for n>(k-1)*d, in which case p(k,n)=1 by the pigeonhole principle, but the simple array indexing in the example code won’t work to extract the corresponding generating function coefficient (which is zero).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.