CAPTCHAs (and reCAPTCHAs)

I learned about an interesting application of CAPTCHAs this past week.  Good timing, too, since CAPTCHAs make a nice follow-on to last week’s post about the Turing Test and interactive proof systems.

First, what is a CAPTCHA?  Last week, I described the Turing Test as a thought experiment.  A CAPTCHA is a practical realization of the Turing Test– with a slight twist– that is used thousands of times per day.  You have seen them online, even if you have not heard of them: when you log in to some email accounts, or buy concert tickets, or do your online banking, you are sometimes confronted with a small, slightly distorted image of some text, usually just a word or two, and are asked to type in the words that you see.  Wikipedia has some images of past and current examples of CAPTCHAs.

CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.”  As its name suggests, a CAPTCHA is basically a Turing Test, designed to ensure that it is in fact you, a human being, buying those concert tickets, and not a “bot,” or computer program trying to buy a large block of tickets to re-sell later.

The idea is that the test should be something that a human can respond to relatively easily and quickly, but that a computer (currently) cannot.  As described above, this test is usually of the form “read the word(s) in this distorted image.”  Humans are pretty good at doing this, while the current state of the art in image processing is not very good at “reading” images that have been distorted in some ways.

The “twist” mentioned earlier refers to the fact that in this case, it is a computer (the web server) that is performing the test, trying to identify another computer “acting” like a human.  Because a computer is performing the test, the test must be automated; that is, the web server must be able to generate many different instances of the test automatically, to present to each of the many humans– and bots– accessing the site.

OK, so far so good.  The cool new thing that I learned was how CAPTCHAs are being used not only to protect web applications, but also to digitize books… at the same time.  Check out the reCAPTCHA web site; this is a particular flavor of CAPTCHA that seems to be becoming the de facto standard.  It uses not just one but two words in the image.  One word is the actual pass/fail challenge; the other word comes from a book that is being digitized or “read” by character recognition software.  This second word was “confusing” in some way, and could not be read correctly by the software.  (The fact that character recognition algorithms can typically “know” when they can’t read something correctly is an interesting topic in its own right.)

The CAPTCHA uses the human test subject to help the reading process along.  If the human (or bot) fails the “real” test, then the submission is rejected, fulfilling the main purpose of the CAPTCHA.  If he/she/it gets it right, though, it is assumed that he/she/it probably read the other word correctly as well, and this response is used to resolve the confusion.
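To make the mechanics concrete, here is a minimal Python sketch of that two-word grading logic.  Everything in it (the function name, the idea of tallying “votes” for the unknown word) is my own invention for illustration; the real reCAPTCHA service surely involves more machinery, such as requiring agreement among several users before accepting a transcription.

```python
# Minimal sketch of reCAPTCHA-style grading.  All names here are
# hypothetical, invented for illustration; this is not the actual
# reCAPTCHA implementation.

def grade_response(known_word, ocr_word, votes, response):
    """Check the control word; if it passes, record a 'vote' for the
    human's reading of the unknown (OCR-confusing) word."""
    try:
        typed_known, typed_unknown = response.split()
    except ValueError:
        return False  # wrong number of words: fail the challenge

    # The pass/fail decision rests only on the word we already know.
    if typed_known.lower() != known_word.lower():
        return False

    # Passing the control word suggests a human; trust (but tally)
    # their reading of the unknown word.
    votes.setdefault(ocr_word, {})
    votes[ocr_word][typed_unknown] = votes[ocr_word].get(typed_unknown, 0) + 1
    return True

# Example: the server knows "overlooks"; "scan_0042" stands in for a
# scanned word the OCR software could not read.
votes = {}
grade_response("overlooks", "scan_0042", votes, "overlooks morning")
grade_response("overlooks", "scan_0042", votes, "overlooks morning")
print(votes)  # {'scan_0042': {'morning': 2}} -- agreement digitizes the word
```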

Turing’s Law of Robotics?

“Any sufficiently advanced algorithm is indistinguishable from human intelligence.”

Alan Turing began his 1950 paper, “Computing Machinery and Intelligence,” with the proposal “to consider the question, ‘Can machines think?’”  He went on to describe what is now commonly called the Turing Test as a way to characterize and identify artificial intelligence.  With apologies to Arthur C. Clarke, I think the statement above is a concise way to express the motivation for and structure of the test.

To set the stage for the discussion, suppose that you are confronted with a robot that looks, walks, talks, and acts just like a human.  Is the robot thinking?  Does it have intelligence?  Does it have the same sense of “me” that you or I have?  Does it have a soul?

(I find these questions interesting in part because people often have very reflexive responses to them.  If a robot isn’t human, then it might be able to play a good game of chess, or even perhaps fly an airplane, but it can’t possibly feel love, or pain, or anger… can it?)

One imagines that, to answer these questions, philosophers, scientists, and especially mathematicians would attempt to provide very precise, measurable, and most likely very complex and technical criteria that must be satisfied.  Turing’s proposed test is surprisingly simple, however.  It boils down to a challenge: hold a conversation with the robot and another real human subject… and try to determine which is which.  If you guess incorrectly, then the robot has demonstrated behavior sufficient to be called “intelligent.”

(This is both more and less complex than the test as it is usually described.  My description is more complex in that the “conversation” is usually restricted to text communication using a keyboard and monitor; it is less complex in that, in practice, multiple conversations and “guesses” are needed to reach a conclusion with any statistical certainty.  As a thought experiment, I think the former restriction is unnecessary, and the latter is a technical point we can safely ignore for this discussion.)

This post was motivated by a 2006 paper by Stuart Shieber titled “Does the Turing Test Demonstrate Intelligence or Not?”  It’s very short, just four pages, and does a very good job of summarizing not only the Turing Test itself, but also arguments from both sides of the fence regarding the test’s sufficiency as a means of demonstrating artificial intelligence.  It also has some interesting mathematics (that’s how I stumbled on the paper in the first place), which if you are interested is described in more detail in Shieber’s longer follow-on paper, “The Turing Test as Interactive Proof.”

The papers make for interesting reading, but I think the critical point is made well upstream of the more sophisticated mathematics.  When trying to argue whether a particular mass of flesh or of wires is– or is not– thinking, or even feeling, I think the test is as simple as Turing suggested: if you can’t tell the difference, then what is the difference?

Skyboxes and Xboxes

My most recent project started out pretty simply.  I bought a wireless Xbox 360 controller for Windows.  The “for Windows” part was why I bought it; I don’t have an Xbox 360.  But Microsoft provides what they call the XInput interface to the controller, making it very easy for someone off the street to write programs for the PC that accept input from the controller.
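To give a flavor of just how simple, here is a rough Python sketch of polling the controller through XInput using ctypes.  The structure layouts follow Microsoft’s documented XINPUT_GAMEPAD and XINPUT_STATE types; the DLL name varies by Windows version, and this is an illustration, not the project’s actual code.

```python
# Rough sketch: polling an Xbox 360 controller via XInput from Python.
# Struct layouts follow Microsoft's documented XINPUT_GAMEPAD and
# XINPUT_STATE; the DLL name varies by Windows version.
import ctypes
from ctypes import wintypes

class XINPUT_GAMEPAD(ctypes.Structure):
    _fields_ = [("wButtons",      wintypes.WORD),
                ("bLeftTrigger",  ctypes.c_ubyte),
                ("bRightTrigger", ctypes.c_ubyte),
                ("sThumbLX",      ctypes.c_short),
                ("sThumbLY",      ctypes.c_short),
                ("sThumbRX",      ctypes.c_short),
                ("sThumbRY",      ctypes.c_short)]

class XINPUT_STATE(ctypes.Structure):
    _fields_ = [("dwPacketNumber", wintypes.DWORD),
                ("Gamepad",        XINPUT_GAMEPAD)]

xinput = ctypes.windll.xinput9_1_0  # or xinput1_3 / xinput1_4

def poll(user_index=0):
    """Return the current gamepad state, or None if not connected."""
    state = XINPUT_STATE()
    if xinput.XInputGetState(user_index, ctypes.byref(state)) == 0:
        return state.Gamepad  # 0 == ERROR_SUCCESS
    return None

pad = poll()
if pad is not None:
    print("left stick:", pad.sThumbLX, pad.sThumbLY)
```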

This sounded like fun, and the interface itself turned out to be very simple… but then I needed an application, something to actually do with the controller.  This didn’t take too much thought.  I remembered what was then, and even now, almost 20 years later, still is one of the best video games for the PC: Star Wars: X-Wing.  It was actually a pretty straightforward space flight simulator, but with great gameplay, story line, sound effects, and music.

As usual, I am not really interested in game development as much as I am in the interesting mathematical and programming problems that live behind the scenes, so to speak.  But I thought it would be interesting to write a simple space flight simulator similar to X-Wing, using the Xbox controller for the joystick input.

You can download the result here, which includes instructions, source code, and Windows executables.  But also as usual, this should compile on *nix platforms as well, wherever FreeGLUT/OpenGL is supported.  (The original motivation for the project was to use the Windows-specific XInput interface to get at all 6 axes and 14 or so buttons on the controller.  But ever a fan of portability, I ended up sticking with GLUT’s more generic joystick interface, which should work with other controllers on other platforms.  I also included keyboard controls to play around without a joystick.)
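For reference, the generic GLUT joystick interface amounts to registering a single callback.  Here is a minimal sketch in PyOpenGL flavor; the project itself compiles against FreeGLUT, so this just illustrates the same interface in a different language:

```python
# Minimal sketch of GLUT's generic joystick interface (PyOpenGL flavor).
# The actual project uses FreeGLUT directly; this illustrates the API.
from OpenGL.GLUT import (glutInit, glutCreateWindow, glutJoystickFunc,
                         glutDisplayFunc, glutMainLoop)

def on_joystick(button_mask, x, y, z):
    # x, y, z are the first three axes, scaled to roughly [-1000, 1000];
    # button_mask carries one bit per button.
    print("axes:", x, y, z, "buttons:", bin(button_mask))

glutInit()
glutCreateWindow(b"joystick test")
glutDisplayFunc(lambda: None)      # GLUT requires a display callback
glutJoystickFunc(on_joystick, 16)  # poll roughly every 16 ms
glutMainLoop()
```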

This all started with a desire to experiment with simple joystick input.  But it soon turned into an experiment with a neat 3D graphics technique that I had not heard of before, the “skybox.”  The idea is very simple, as most good ideas are.  If you want to render an image from the cockpit of a spaceship, then in addition to the other nearby spaceships, or obstacles, or whatever else must be drawn in the field of view, adding distant stars helps a lot to provide a sense of rotational motion.

One approach to doing this would be to include hundreds of individual distant star “objects” in the scene.  This is not very efficient, since it requires performing the coordinate transformations from world to camera coordinates for every star, for every frame of animation.

A skybox is a simpler, much more efficient, and just as effective solution.  It involves texture mapping, or rendering a flat 2D bitmap image onto the polygonal surface of a 3D object (think of pasting a decal onto a model).  Graphics cards can perform texture mapping very efficiently, even if the 2D image is very complex.

A skybox is just a texture image of a star field, mapped onto the inside faces of a cube.  Instead of transforming and drawing hundreds of individual stars every frame, we draw hundreds of stars on six flat black images ahead of time.  Then every frame, we just texture map the six images onto the inside of a cube that translates– but does not rotate– around with the spaceship.  We draw the star field before drawing any other “nearby” objects, so the stars will be appropriately occluded.
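The per-frame logic is short enough to sketch.  Below is one common way to do it, written with PyOpenGL for readability; details such as disabling depth writes while the cube is drawn are typical practice that I am assuming here, not necessarily exactly what the download does.

```python
# One common way to render a skybox each frame (fixed-function OpenGL
# via PyOpenGL for readability; the project's actual code may differ).
from OpenGL.GL import *

# For each of the 6 cube faces, 4 corner vertices at "radius" 1.
FACES = [
    [(-1,-1,-1), ( 1,-1,-1), ( 1, 1,-1), (-1, 1,-1)],  # -z
    [( 1,-1, 1), (-1,-1, 1), (-1, 1, 1), ( 1, 1, 1)],  # +z
    [(-1,-1, 1), (-1,-1,-1), (-1, 1,-1), (-1, 1, 1)],  # -x
    [( 1,-1,-1), ( 1,-1, 1), ( 1, 1, 1), ( 1, 1,-1)],  # +x
    [(-1, 1,-1), ( 1, 1,-1), ( 1, 1, 1), (-1, 1, 1)],  # +y
    [(-1,-1, 1), ( 1,-1, 1), ( 1,-1,-1), (-1,-1,-1)],  # -y
]
UVS = [(0, 0), (1, 0), (1, 1), (0, 1)]

def draw_skybox(camera_pos, textures):
    """Draw six star-field textures on the inside of a cube centered on
    the camera: translate with the ship, but do not rotate with it."""
    glPushMatrix()
    glTranslatef(*camera_pos)   # cancel the view translation only
    glDepthMask(GL_FALSE)       # stars never occlude anything...
    glEnable(GL_TEXTURE_2D)
    for texture, corners in zip(textures, FACES):
        glBindTexture(GL_TEXTURE_2D, texture)
        glBegin(GL_QUADS)
        for (u, v), (x, y, z) in zip(UVS, corners):
            glTexCoord2f(u, v)
            glVertex3f(x, y, z)
        glEnd()
    glDisable(GL_TEXTURE_2D)
    glDepthMask(GL_TRUE)
    glPopMatrix()
    # ...and since it is drawn first, everything drawn later covers it.
```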

A couple of final notes about the “bad guy” TIE fighters in the simulator.  First, I chose TIE fighters because (a) I had Star Wars in mind the whole time, and (b) a TIE fighter works well as a simple wireframe model, particularly the wing solar panels.  Second, the “autopilot” for the computer-flown fighters was mostly an afterthought, but is an interesting mathematical problem in itself.  I made up some extremely simple rules that led to reasonable, if somewhat drunken, flight characteristics.
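To suggest just how little is needed, here is a sketch of one such simple rule (my own guess at a rule of the same general sort, not the exact rules in the download): each frame, turn the fighter a clamped amount toward its target.

```python
# A guessed-at autopilot rule of the same general sort: each frame,
# turn a clamped amount toward the target.  This is NOT the actual
# rule set from the simulator download, just an illustration.
import math

def steer_toward(pos, forward, target, max_turn=0.03):
    """Return a new unit forward vector turned at most `max_turn`
    radians toward the target position."""
    # Unit vector from us to the target.
    to_target = [t - p for t, p in zip(target, pos)]
    d = math.sqrt(sum(c * c for c in to_target)) or 1.0
    to_target = [c / d for c in to_target]

    # Angle between current heading and the target direction.
    dot = max(-1.0, min(1.0, sum(f * t for f, t in zip(forward, to_target))))
    angle = math.acos(dot)
    if angle < 1e-6:
        return forward  # already on course

    # Turn only part of the way there: blend and renormalize.  The
    # small errors this introduces are what make the flight "drunken."
    frac = min(1.0, max_turn / angle)
    blended = [f + frac * (t - f) for f, t in zip(forward, to_target)]
    n = math.sqrt(sum(c * c for c in blended)) or 1.0
    return [c / n for c in blended]
```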

Remembering Martin Gardner

Martin Gardner died last month, on 22 May 2010.  He was 95 years old, and in the words of Elwyn Berlekamp, John H. Conway, and Richard Guy, he “brought more math to more millions than anyone else.”  I know he had a lot of influence on me when I was growing up, and having read the last few weeks’ worth of blogs, columns, and articles, it appears many others felt the same.  Gardner was also, in my opinion, a first-rate scientist and arguably the original skeptic, with a very interesting and thoughtful perspective on the relationship between reason and religion.

But most importantly, he loved mathematics.  He particularly loved mathematics “for fun”– what no one but other mathematicians could manage to refer to with a straight face as “recreational mathematics.”  This includes the mathematics involved in playing games, magic tricks, gambling… and mathematical puzzles in particular.

For someone who enjoys mathematical puzzles, and has enjoyed them for some time, finding new and interesting ones can be difficult.  (E.g., Alice and Bob are locked in a room with three doors and two light switches on the wall.  They are each wearing a hat that is red or blue.  What color is the bear?  That should cover most of the classics.)  A new puzzle, or even an interesting variant of an old puzzle, is a wonderful thing.

I am not a very eloquent writer, so I will not attempt to eulogize Martin Gardner here.  Instead, I want to share something that I found this past week while reading what others had to say about him: another interesting puzzle that I have not seen before.  It would certainly be in the spirit of Mr. Gardner’s influence if just a reader or two found it interesting as well.

The problem comes from an MAA column from just last year, dedicated to Martin Gardner for his 95th birthday.  It caught my eye because it reminded me of one of my favorite puzzles, a card trick with some very interesting mathematics in it.  Before getting to the new puzzle, let me describe the old one:

A magician asks a spectator to choose five cards from a standard 52-card poker deck.  The magician takes the chosen cards, then hands four of them, one at a time, back to the spectator, who announces each card in turn to the magician’s blindfolded assistant.  After hearing the four cards, the assistant immediately announces the fifth chosen card.  How do they do it?

The trick is not terribly difficult to learn, but is a remarkable effect when executed well.  More importantly, the trick and its generalizations are a great vehicle for many different mathematical ideas, from the pigeonhole principle to saturating matchings in graphs.  (For the mathematically inclined, consider the magician handing back to the spectator k of m cards selected from a deck of n, and the assistant naming the remaining m-k cards.  For what values of (n,m,k) can the trick be done?)
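The pigeonhole part of that question is easy to check by machine: the assistant hears an ordered sequence of k distinct cards, and each such sequence can point to at most one m-card hand, so we need C(n,m) ≤ n(n−1)···(n−k+1).  Here is a quick sketch of that counting test (necessity only; showing the condition can actually be achieved is where the matching theory comes in):

```python
# Pigeonhole check for the generalized trick: the k cards handed back,
# heard in order, must be able to encode the full m-card hand, so we
# need C(n, m) <= n * (n-1) * ... * (n-k+1).  (A necessary condition;
# sufficiency is where Hall's theorem and matchings come in.)
from math import comb, perm

def trick_possible(n, m, k):
    return comb(n, m) <= perm(n, k)

print(trick_possible(52, 5, 4))   # True:  the classic trick
print(trick_possible(124, 5, 4))  # True:  124 saturates the bound exactly
print(trick_possible(125, 5, 4))  # False: one card too many
```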

Now on to the new puzzle.  A magician hands a spectator a standard 52-card poker deck and asks him to select five cards.  The spectator hands the five cards to the magician, who inspects them and places them all face up in a row on a table.  The magician then asks the spectator to select one of the five cards, remove it, and replace it with another card selected from the remainder of the deck… with the only requirement being that the selected card and its replacement be different colors (i.e., replace red with black or vice versa).

At this point the spectator is asked to call the magician’s assistant on the phone, and to read aloud the five cards as they appear on the table, in his own words and with no communication from the magician.  After hearing the five cards, the assistant is able to name which of the five cards is the replacement.  How does he do it?

Hmmm…

Reference: Colm Mulcahy, “Poker-Faced Over the Phone,” the MAA column where I found the description of the puzzle.  I am tacking this at the end of the post because I am pretty sure that it includes a solution to the problem, and I don’t want to spoil it for readers.

Sabermetrics

(Cool word, huh?  18 points not counting a double or triple word score.)

I was reading USA Today a couple of weeks ago… not on purpose, really.  We were on vacation, and it was delivered to our hotel room every morning.  So I couldn’t not read it.  Anyway, there was a teaser line for the sports section referring to the baseball statistic BABIP.  After some research, I found that BABIP stands for “batting average on balls in play.”  We’ll get to what that means shortly.  First, though, baseball is already arguably the most mathematical– OK, at least statistical– of all sports.  That baseball analysts would continue to develop more useful metrics is not surprising.  What was amusing to learn is that such analysis has a fancy name that I had not heard before, sabermetrics, derived from SABR, the acronym for the Society for American Baseball Research.

The idea is that some common existing statistics such as batting average may be intuitive and easy to calculate, but not necessarily very good indicators of a player’s contribution to his team’s winning of games.  Sabermetrics seems to be an attempt to bring some slightly more sophisticated mathematics into the analysis.

Batting average on balls in play (BABIP) is one interesting and relatively simple example.  The formula for BABIP is (H – HR) / (AB – K – HR + SF), where H is hits, AB is at bats, HR is home runs, K is strikeouts, and SF is sacrifice flies.  My clumsy description of this is: how frequently does the batter get a hit in those situations where the fielding team has an opportunity to screw up?

More precisely, the denominator in the formula counts plate appearances where the batter puts the bat on the ball (i.e., doesn’t strike out, walk, get hit by a pitch, etc.) and the fielding team has a chance to make an error; this excludes home runs, but includes sacrifice flies, which technically are not considered at bats.  The numerator counts the subset of those situations that result in a hit.
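The formula is simple enough to compute directly; here is a quick sketch, with a made-up stat line just for illustration:

```python
# BABIP = (H - HR) / (AB - K - HR + SF): of the plate appearances where
# the ball was put in play and the fielders could matter, how many
# became hits?
def babip(h, hr, ab, k, sf):
    return (h - hr) / (ab - k - hr + sf)

# Made-up season line, just for illustration: 150 hits, 20 home runs,
# 500 at bats, 100 strikeouts, 5 sacrifice flies.
print(round(babip(h=150, hr=20, ab=500, k=100, sf=5), 3))  # 0.338
```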

This statistic and the literature discussing it are interesting in that BABIP seems to be intended to identify streaks of “good luck” and “bad luck” rather than to compare performance relative to other batters.  That is, a BABIP of about .300 is considered “normal,” invariant with respect to the particular batter being evaluated, and significant deviations from this in either direction suggest not a better or worse batter, but a batter that is “lucky” or “unlucky” and should expect a compensating reversal of fortune.

As another example of a “sabermetric” baseball statistic, I read an AMS article recently titled “Baseball and Markov Chains: Power Hitting and Power Series,” by John P. D’Angelo.  The article describes the notion of Markov runs as a measure of a batter’s performance.  Suppose that the batter of interest is the only player on the team, with “clones” of himself for every plate appearance, and that he bats randomly based on his statistics (i.e., he strikes out some fraction of the time, is walked some fraction of the time, or gets a single, or a double, etc.).  How many runs will his “team” score in a game?  I recommend reading the article; it contains not just interesting and readable mathematics, but some cool anecdotal results for the baseball fan as well.
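The article computes expected runs exactly using Markov chains; as a quick and dirty alternative, here is a Monte Carlo sketch of the same thought experiment.  The outcome probabilities and the base-running rule (every runner advances as many bases as the batter) are my own crude inventions for illustration, not the article’s model.

```python
# Monte Carlo version of the "team of clones" thought experiment: one
# batter's outcome probabilities, crude base-running (each runner
# advances as many bases as the batter), 9 innings of 3 outs each.
# All probabilities below are made up for illustration.
import random

OUTCOMES = [("out", 0.68), ("walk", 0.08), ("single", 0.16),
            ("double", 0.05), ("triple", 0.005), ("homer", 0.025)]

def play_inning(probs):
    outs, runs, bases = 0, 0, [False, False, False]  # 1st, 2nd, 3rd
    while outs < 3:
        r, outcome = random.random(), None
        for name, p in probs:
            r -= p
            if r < 0:
                outcome = name
                break
        if outcome is None:      # guard against float rounding
            outcome = "out"
        if outcome == "out":
            outs += 1
        elif outcome == "walk":  # force advances only
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = {"single": 1, "double": 2, "triple": 3, "homer": 4}[outcome]
            for _ in range(n):       # everyone moves up n bases
                runs += bases.pop()  # runner on 3rd scores
                bases.insert(0, False)
            if n < 4:
                bases[n - 1] = True  # batter stops at base n
            else:
                runs += 1            # batter scores on a homer
    return runs

games = 10000
avg = sum(sum(play_inning(OUTCOMES) for _ in range(9))
          for _ in range(games)) / games
print("average runs per game:", round(avg, 2))
```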

Miracles happen every day

… as long as you don’t specify beforehand what miracle you might be interested in observing.

A very good friend of mine, nomasir, had an interesting post recently on his blog, referring to a British news article about a grandfather, father, and son who were all three born on the same day of the year.  The article suggested that this was a very unlikely occurrence.

Just how unlikely this was became the matter of debate in the blog post… and that discussion reminded me of an interesting article by Persi Diaconis, “Statistical Problems in ESP Research” (Science, July 14, 1978, 201(4351):131-136).  Diaconis has contributed a lot of interesting mathematics, with applications ranging from magic tricks to card shuffling to coin flipping to, in this case, evaluating– and usually debunking– claims of ESP (extra-sensory perception).

I had read the article before, but in the process of searching for it, I also stumbled across something I had not seen before: a series of letters in response to the article by several apparently less-than-impressed researchers in the field, along with a response from Diaconis.  I recommend both the article and the letters; it’s an interesting exchange, short and readable even for the non-mathematician.

My point in this post deals with what Diaconis refers to in the article as the “problem of multiple end points.”  (The relevant section of the article involves the experimental subject referred to as B.D.)  In the context of the coincidental birthdays mentioned above, we can calculate the (small) probability of that particular event… but is that probability really interesting or useful after the event has occurred, when we didn’t “bet on” the event before it occurred?

Suppose that, instead of three generations of men all having the same birthday, they had all died on the same day of the year.  Or maybe it was a grandmother instead of a grandfather, or daughter instead of a son, etc.  Or maybe the three birthdays were all within just one or two days of each other.  These would all still be “interesting” occurrences.  Now– what is the probability that some one (or more) of these various “interesting” events might occur?
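To put rough numbers on the distinction, here is a back-of-the-envelope sketch, under admittedly crude assumptions: treat each variant coincidence as an independent event, each about as likely as the original three-way birthday match, and pick an invented count of 50 such variants.

```python
# Back-of-the-envelope illustration of "multiple end points."
# Crude assumptions: independent events, each with probability about
# (1/365)^2, the chance that two relatives match a third's birthday.
p_one = (1 / 365) ** 2
print(f"any prespecified coincidence: {p_one:.7f}")  # about 1 in 133,000

# Suppose there are, say, 50 equally "interesting" variants (same death
# day, grandmother instead of grandfather, birthdays within a day or
# two, ...).  The count 50 is invented purely for illustration.
variants = 50
p_some = 1 - (1 - p_one) ** variants
print(f"some coincidence of some sort: {p_some:.6f}")  # about 50x likelier
```

And that is before multiplying by the number of families in the world about whom such a story could have been written.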

Quoting Diaconis again, “the odds against a coincidence of some sort are dramatically less than those against any prespecified particular one of them.”

I am reminded of the story of David Wilkerson, a man who claimed to have been inspired to make hundreds of peanut butter sandwiches on the night of 10 September 2001, which were then fed to rescue workers and victims after the terrorist attacks, the idea being that such inspiration was somehow a prediction of the attacks.  (This claim has since been discredited, but that’s not the point, so let’s suppose for the sake of argument that it actually happened.)  As one news story at the time described it, “What were [the sandwiches] for?  Who would eat them?  That part wasn’t clear [my emphasis]…”

Now suppose that tonight you stay up all night making peanut butter sandwiches.  What is the probability that you will not find some good use for them, even if it’s not in response to an unnatural disaster with global impact?