Someone at work this past week asked me about the problem of computing the probability that, in a group of people, some pair of persons share the same birthday. I am not sure what prompted the question, despite the interesting coincidence that in fact he and I share the same birthday.
The problem is a commonly used example of the failure of our intuition when dealing with probabilities in general and probabilities of “coincidences” in particular. A survey of university students asked for an estimate of the number of people required for the probability of a shared birthday to exceed 1/2 (see here). The median response was 385 people. Think about that– more than half of those surveyed guessed a number of people that is guaranteed to yield a shared birthday.
So what is the correct answer? I think this problem makes for a great experiment for students, since it can be approached in several ways. First, the mathematics: the desired probability is 1 minus the probability that all birthdays are distinct. Assuming all birthdays are equally likely (we’ll address this assumption shortly), we have
Next, the programming: students can write a function to compute , which requires both iteration and some thought about order of operations. It will not do to compute the numerator and denominator explicitly and then divide, at least in most languages that lack arbitrary-precision arithmetic.
A plot showing versus , and thus the answer to the problem, can be seen here. (<soapbox> I find it interesting and amusing when I hear comments about Wikipedia not being a useful source of information. Such comments seem to be based on content that is almost always political or religious. What does one expect in those cases? That is simply not the type of content for which the “anyone can edit” approach is useful. Particularly for technical information, I think Wikipedia contains a lot of useful starting points… that still require a brain in the head of the viewer to verify or extend the results found there.</soapbox>)
Finally, the experiments: the answer to the birthday problem may be sufficiently non-intuitive that students may not trust the mathematics. A classroom of 25-30 students, or better yet, collection of data from multiple classrooms, provides a handy experimental sample to corroborate the analysis. This is also a good time to come back to the uniformity assumption that all birthdays are equally likely. Students can do Monte Carlo analysis, writing programs to generate “classrooms” of data, possibly with non-uniform distribution of birthdays… and observe for themselves that such non-uniformity actually increases the likelihood of shared birthdays.
I was motivated to write this post by the Wikipedia article mentioned earlier. One small section describes a variant of this question that I think is more directly applicable to the cryptographic “birthday attack,” so named for its relation to this problem. Suppose that a movie theater is offering a free ticket to a new movie to the first person at the box office with the same birthday as someone earlier in line. Where should you stand in line? What is the expected position in line to win the free ticket?
This is also a nice problem for students, particularly because its exact solution does not require evaluation of any of the actual probabilities! In fact, the answer may be obtained from no more than the root of a quadratic. I won’t spoil it for those wanting to work out the solution, but I will point out that this is a good example, as mentioned above, where Wikipedia gets it wrong (at least at time of this writing). Unless I misunderstand the statement, the solution does not involve maximizing the difference between probabilities for consecutive positions in line.