It is a well-known non-intuitive result that in a group of people– conveniently the size of a classroom of students– the probability is at least 1/2 that or more of them share a birthday. This is a nice problem for several reasons:
- Its solution involves looking at the problem in a non-obvious way, in this case by considering the complementary event that all birthdays are distinct.
- Once the approach in (1) is understood, computing the answer is relatively easy: the probability is , where is the falling factorial.
- The answer is surprising. When I ask students, “How many people are needed for the probability of a shared birthday to exceed 1/2?”, guesses as high as 180 are common.
- Picking on the not-quite-realistic assumption of all birthdays being equally likely actually helps; that is, with a non-uniform distribution of birthdays, the probability of coincidence is higher.
But what about larger ? For example, suppose that while surveying your students’ birthdays in preparation for this problem, you find that three of them share a birthday? What is the probability of this happening?
The Wikipedia page on the birthday problem doesn’t address this generalization at all. There are several blog and forum posts addressing the case specifically, by grouping the prohibited cases according to the number of pairs of people with shared birthdays. This is generalized to arbitrary on the Wolfram MathWorld page; the resulting recurrence relation is pretty complex, but it’s a nice exercise to prove that it works.
The motivation for this post is to describe what I think is a relatively simpler solution, for arbitrary , including Python source code to perform the calculation. Let’s fix the number of equally likely possible birthdays , and the desired number of people sharing a birthday, and define the function
Then is the exponential generating function for the number of “prohibited” assignments of birthdays to people where no more than share a birthday. That is, the number of such prohibited assignments is times the coefficient of in .
(When working through why this works, it’s interesting how often it can be helpful to transform the problem into a different context. For example, in this case, we are also counting the number of length- strings over an alphabet of characters, where no character appears more than times.)
The rest of the calculation follows in the usual manner: divide by the total number of possible assignments to get the complementary probability, then subtract from 1. The following Python code performs this calculation, either exactly– using the
fractions module, which can take a while– or in double precision, which is much faster.
import numpy as np from numpy.polynomial import polynomial as P import fractions import operator def p(k, n, d=365, exact=False): f = fractions.Fraction if exact else operator.truediv q = 0 if n > d * (k - 1) else P.polypow(np.array( [f(1, np.math.factorial(j)) for j in range(k)], dtype=object), d)[n] for j in range(1, n + 1): q = q * f(j, d) return 1 - q