(Cool word, huh? 18 points not counting a double or triple word score.)
I was reading USA Today a couple of weeks ago… not on purpose, really. We were on vacation, and it was delivered to our hotel room every morning. So I couldn’t not read it. Anyway, there was a teaser line for the sports section referring to the baseball statistic BABIP. After some research, I found that BABIP stands for “batting average on balls in play.” We’ll get to what that means shortly. First, though, baseball is already arguably the most mathematical (OK, at least statistical) of all sports. That baseball analysts would continue to develop more useful metrics is not surprising. What was amusing to learn is that such analysis has a fancy name that I had not heard before, sabermetrics, which is derived from the acronym for the Society for American Baseball Research (SABR).
The idea is that some common existing statistics such as batting average may be intuitive and easy to calculate, but not necessarily very good indicators of a player’s contribution to his team’s winning of games. Sabermetrics seems to be an attempt to bring some slightly more sophisticated mathematics into the analysis.
Batting average on balls in play (BABIP) is one interesting and relatively simple example. The formula for BABIP is (H – HR) / (AB – K – HR + SF), where H is hits, AB is at bats, HR is home runs, K is strikeouts, and SF is sacrifice flies. My clumsy description of this is: how frequently does the batter get a hit in those situations where the fielding team has an opportunity to screw up?
More precisely, the denominator in the formula counts plate appearances where the batter puts the ball in play (i.e., doesn’t strike out, walk, get hit by a pitch, etc.) and the fielding team has a chance to make an error. That excludes home runs, but includes sacrifice flies, which technically are not counted as at bats. The numerator counts the subset of those situations that result in a hit.
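For the programmers in the audience, the formula above is a one-liner. Here is a minimal Python sketch; the function name babip and the season line in the example are made up for illustration, not taken from any real player.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf  # the denominator described above
    if balls_in_play <= 0:
        raise ValueError("no balls in play, so BABIP is undefined")
    return (h - hr) / balls_in_play


# Hypothetical season line (invented numbers, for illustration only):
# 150 hits, 20 home runs, 500 at bats, 100 strikeouts, 5 sacrifice flies.
print(round(babip(h=150, hr=20, ab=500, k=100, sf=5), 3))  # prints 0.338
```

That is 130 non-homer hits out of 385 balls in play, a bit above the .300 or so usually quoted as typical.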
This statistic and the literature discussing it are interesting in that BABIP seems to be intended to identify streaks of “good luck” and “bad luck” rather than to compare performance relative to other batters. That is, a BABIP of about .300 is considered “normal,” invariant with respect to the particular batter being evaluated, and significant deviations from this in either direction suggest not a better or worse batter, but a batter who is “lucky” or “unlucky” and should expect a compensating reversal of fortune.
As another example of a “sabermetric” baseball statistic, I read an AMS article recently titled “Baseball and Markov Chains: Power Hitting and Power Series,” by John P. D’Angelo. The article describes the notion of Markov runs as a measure of a batter’s performance. Suppose that the batter of interest is the only player on the team, with “clones” of himself for every plate appearance, and that he bats randomly based on his statistics (i.e., he strikes out some fraction of the time, is walked some fraction of the time, or gets a single, or a double, etc.). How many runs will his “team” score in a game? I recommend reading the article; it has not just interesting and readable mathematics, but some cool anecdotal results for the baseball fan as well.
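To get a feel for the "team of clones" idea, here is a rough Monte Carlo sketch in Python. To be clear, this is not the method from the article, which works with the Markov chain analytically; it just plays out random games and averages the runs. Everything specific here is my own assumption: the function names, the nine-inning game, the rule that every runner advances exactly as many bases as the batter on a hit, the rule that a walk only forces runners along, and the made-up outcome probabilities in the example.

```python
import random

# Bases gained by the batter (and, by assumption, every runner) on each hit.
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def simulate_game(probs, rng, innings=9):
    """Runs scored in one game by a lineup of identical batters.

    probs maps each outcome ("out", "walk", "single", "double", "triple",
    "homer") to its probability per plate appearance.
    """
    if probs.get("out", 0) <= 0:
        raise ValueError("the batter must make outs, or the game never ends")
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    runs = 0
    for _ in range(innings):
        outs = 0
        runners = set()  # occupied bases, numbered 1 to 3
        while outs < 3:
            result = rng.choices(outcomes, weights=weights)[0]
            if result == "out":
                outs += 1
            elif result == "walk":
                # Only forced runners move, so a walk fills the lowest
                # empty base, or scores a run if the bases are loaded.
                if 1 not in runners:
                    runners.add(1)
                elif 2 not in runners:
                    runners.add(2)
                elif 3 not in runners:
                    runners.add(3)
                else:
                    runs += 1
            else:
                n = HIT_BASES[result]
                # Simplifying assumption: everyone advances exactly n bases.
                moved = {r + n for r in runners} | {n}
                runs += sum(1 for r in moved if r >= 4)
                runners = {r for r in moved if r < 4}
    return runs


def average_runs(probs, games=2000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(probs, rng) for _ in range(games)) / games


# Hypothetical batter; these probabilities are invented and sum to 1.
batter = {"out": 0.68, "walk": 0.08, "single": 0.15,
          "double": 0.05, "triple": 0.01, "homer": 0.03}
print(average_runs(batter))
```

Different base-running rules (say, runners scoring from second on a single) would change the numbers, so treat the output as a toy estimate rather than the Markov runs figure defined in the article.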