It is a well-known non-intuitive result that in a group of people– conveniently the size of a classroom of students– the probability is at least 1/2 that or more of them share a birthday. This is a nice problem for several reasons:

- Its solution involves looking at the problem in a non-obvious way, in this case by considering the
*complementary*event that all birthdays are*distinct*. - Once the approach in (1) is understood, computing the answer is relatively easy: the probability is , where is the falling factorial.
- The answer is surprising. When I ask students, “How many people are needed for the probability of a shared birthday to exceed 1/2?”, guesses as high as 180 are common.
- Picking on the not-quite-realistic assumption of all birthdays being equally likely actually
*helps*; that is, with a non-uniform distribution of birthdays, the probability of coincidence is*higher*.

But what about larger ? For example, suppose that while surveying your students’ birthdays in preparation for this problem, you find that *three* of them share a birthday? What is the probability of this happening?

The Wikipedia page on the birthday problem doesn’t address this generalization at all. There are several blog and forum posts addressing the case specifically, by grouping the prohibited cases according to the number of *pairs* of people with shared birthdays. This is generalized to arbitrary on the Wolfram MathWorld page; the resulting recurrence relation is pretty complex, but it’s a nice exercise to prove that it works.

The motivation for this post is to describe what I think is a relatively simpler solution, for arbitrary , including Python source code to perform the calculation. Let’s fix the number of equally likely possible birthdays , and the desired number of people sharing a birthday, and define the function

Then is the exponential generating function for the number of “prohibited” assignments of birthdays to people where *no more* than share a birthday. That is, the number of such prohibited assignments is times the coefficient of in .

(When working through *why* this works, it’s interesting how often it can be helpful to transform the problem into a different context. For example, in this case, we are also counting the number of length- strings over an alphabet of characters, where no character appears more than times.)

The rest of the calculation follows in the usual manner: divide by the *total* number of possible assignments to get the complementary probability, then subtract from 1. The following Python code performs this calculation, either exactly– using the `fractions`

module, which can take a while– or in double precision, which is much faster.

import numpy as np from numpy.polynomial import polynomial as P import fractions import operator def p(k, n, d=365, exact=False): f = fractions.Fraction if exact else operator.truediv q = P.polypow(np.array( [f(1, np.math.factorial(j)) for j in range(k)], dtype=object), d)[n] for j in range(1, n + 1): q = q * f(j, d) return 1 - q

Nice python modules. I don’t know this python modules fractions and operator.

Anyway you have a error:

>>> p(1,100)

Traceback (most recent call last):

File “”, line 1, in

File “”, line 6, in p

IndexError: index 100 is out of bounds for axis 0 with size 1

>>> p(2,100)

0.9999996927510721

Note that k=1 doesn’t really yield a useful calculation, since we’re asking for the probability that “1 or more” of n people have the same birthday. In that case, the generating function described in the post is just the constant G(x)=1.

More generally, we could add the edge case check for n>(k-1)*d, in which case p(k,n)=1 by the pigeonhole principle, but the simple array indexing in the example code won’t work to extract the corresponding generating function coefficient (which is zero).