“Cold hit” DNA profiling

I recently read the book Math on Trial: How Numbers Get Used and Abused in the Courtroom, by the mother-daughter team of Leila Schneps and Coralie Colmez.  It’s an interesting, easy, and short read, just a little over 200 pages.  Each chapter is a mostly self-contained description of a legal case in which mathematics was involved in one way or another… and in particular, in which mathematics was potentially misused or misunderstood.

For example, one of the chapters discusses the murder of Meredith Kercher, which has new relevance given Amanda Knox’s recent re-emergence into the media spotlight as the comedy of the Italian court system re-tries her case.  The authors seem, in my opinion, to clearly think that both Knox and her then-boyfriend, Raffaele Sollecito, are guilty, and much of the discussion focuses on the mathematical benefit of repeating a DNA test with a probabilistic outcome.  It will be interesting to see whether this “second test” is indeed conducted, and if so, what the result will be.

But my focus here is on another chapter in the book, which discusses the murder of Diana Sylvester in 1972 and the case against John Puckett, who was identified in 2003, over three decades later, by “cold hit” DNA testing of sperm from Diana’s body.  A “cold hit” means that the DNA was compared with all of the samples in a large database of “thousands of known California criminals,” without any prior suspicion of any particular person in that database in relation to this particular crime.  Puckett came up as a match, and was eventually convicted of first-degree murder and sentenced to life in prison.

The interesting mathematical question is, what is the probability that Puckett was in fact innocent of the crime, and was instead merely an unfortunate lottery winner?  That is, even if no one in the searched database committed the crime in question, what is the probability that someone in the database would be identified as a match by chance?

To make the problem slightly more disturbing, imagine a not-so-distant future where, instead of a database confined to known criminals in California, we expand the database to all of the 300+ million mostly law-abiding people in the United States.  Now what is the probability of such a “cold hit” occurring due to chance, as opposed to guilt?
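One rough way to put numbers on both questions is to assume, as a simplification, that each of the N profiles in the database matches the crime-scene sample by chance independently and with the same probability p, so that the probability of at least one chance match is 1 - (1 - p)^N.  The sketch below uses that model; the function name and the values of p and N are illustrative assumptions of mine, not figures from the Puckett case.

```python
import math


def prob_chance_hit(p, n):
    """Probability that at least one of n innocent profiles matches by chance.

    Assumes every profile matches independently with the same probability p.
    """
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n * math.log1p(-p))


# Hypothetical random match probability, chosen only for illustration.
P_MATCH = 1e-6

# Hypothetical database sizes, from a small offender database up to
# roughly the population of the United States.
for n in (10_000, 300_000, 300_000_000):
    print(f"{n:>11,} profiles: {prob_chance_hit(P_MATCH, n):.4f}")
```

Under these assumptions the chance of a coincidental hit grows roughly like N times p while that product is small, and approaches certainty once N is much larger than 1/p.  Note that this only addresses the second question above, the chance that someone in the database matches by coincidence; it is not by itself the probability that a particular matched person is innocent.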

This case has generated a lot of controversy and debate.  For example, mathematician Keith Devlin has written several MAA columns questioning the practice of cold hit DNA profiling (see references below), citing a 2005 study of an Arizona criminal database containing just 65,493 entries, in which what seems a surprisingly large number of pairs of entries were found to match at nine or more gene loci:

“A study of the Arizona CODIS database carried out in 2005 showed that approximately 1 in every 228 profiles in the database matched another profile in the database at nine or more loci, that approximately 1 in every 1,489 profiles matched at 10 loci, 1 in 16,374 profiles matched at 11 loci, and 1 in 32,747 matched at 12 loci.”

In my opinion, this is a rather misleading, or at least confusing, way of presenting what was actually found in the study.  In Math on Trial, the authors also mention this study, and I think they do a better job of describing the matches that were found, and of attempting to explain them.  (Although I admit that I disagree with some of their conclusions and arguments.)  Essentially, the heart of the issue is the Birthday Problem, and whether such random matches in large databases are a symptom of a dangerously flawed law enforcement practice, or whether they are an expected (although perhaps unintuitive) but irrelevant mathematical anomaly.
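To see why the Birthday Problem is relevant, it helps to count pairs rather than profiles.  The sketch below takes the database size and the “1 in k profiles” rates from the quoted passage, converts each rate into an approximate number of profiles, and compares that with the number of distinct pairs of profiles that could be compared with each other.  The variable names are mine, and the conversion is just arithmetic on the quoted figures, not data taken from the study itself.

```python
from math import comb

# Database size as quoted in the passage above.
N_PROFILES = 65_493

# "1 in k profiles" rates from the quoted passage, keyed by loci matched.
one_in = {"9 or more": 228, "10": 1_489, "11": 16_374, "12": 32_747}

# Every profile can be compared with every other profile.
n_pairs = comb(N_PROFILES, 2)
print(f"distinct pairs of profiles: {n_pairs:,}")

# Turn each quoted rate back into an approximate count of profiles.
for loci, k in one_in.items():
    approx_profiles = N_PROFILES / k
    print(f"{loci:>9} loci: about {approx_profiles:.0f} profiles")
```

With over two billion pairs available, a small number of matching pairs can turn up even when a match between any one specific pair is very unlikely.  A rate of “1 in 32,747 profiles” at 12 loci, for instance, works out to roughly two profiles, which would be a single matching pair.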


  1. Schneps, L. and Colmez, C., Math on Trial: How Numbers Get Used and Abused in the Courtroom, New York: Basic Books, 2013.
  2. Devlin, K., Devlin’s Angle, “Damned lies” (blog post), October 2006. [HTML]
  3. Devlin, K., Devlin’s Angle, “DNA math and the end of innocence” (blog post), January 2007. [HTML]
  4. Devlin, K., Devlin’s Angle, “Bad Math, Bad Thinking: the BMI and DNA Identification Revisited” (blog post), February 2011. [HTML]