## Probabilistic Tic-Tac-Toe solution

This is a follow-up to the previous post about a Hollywood Squares-style game of probabilistic Tic-Tac-Toe, where a selected square is marked with the player’s symbol (say, X) with some fixed probability $p$, but with the opponent’s symbol (O) with probability $1-p$.  When $p=1$ this reduces to the standard game of Tic-Tac-Toe, in which neither player has an advantage; that is, optimal strategy by both players results in a draw.  But what if $p<1$?

Without any additional rules, analysis of this simplest form of the game is pretty straightforward.  The green curve in the figure below shows the first player’s advantage as a function of $p$ (assuming the loser pays the winner a dollar).

First player’s advantage vs. probability p of “success” of any given move.

Recall that the parameter $p$ effectively models the level of difficulty of the trivia questions asked in the television versions of the game.  Interestingly, hard questions seem to be significantly more beneficial for the second player than easy questions are for the first player.

As mentioned last week, the television games prevent ties with the additional rule that a player can win by either getting three in a row, or being the first to get any five squares.  The black curve above shows the effect of this incremental modification, which behaves about like we would expect.

Finally, the most interesting additional rule is that a player can’t “lose” the game by missing a question; he or she must “win” by answering correctly.  If a miss would result in a win for the opponent, the selected square remains empty and the player simply loses his turn.  In this case, we must be careful to handle potentially repeating game states: for example, suppose that the board looks like this:

On either player’s turn, he or she must answer correctly to win; a miss is simply a lost turn.

If player X misses on his turn, and player O also misses on her turn, then we are right back where we started.  We can eliminate this “infinite recursion” by expressing the value of the board for a given player in terms of itself, and solving:

$v = p + (1-p)(-p + (1-p)v)$

$v = \frac{p}{2-p}$

The blue and red curves show the effect of this rule, with and without the “first to five” tie-breaker rule; the blue curve represents the television version of the game.  The first player has an advantage for all but the most difficult questions, and has a huge advantage for relatively easy questions– it would be interesting to see an estimate of the “actual” value of $p$ based on the historical fraction of questions answered correctly in episodes of the show.

This entry was posted in Uncategorized. Bookmark the permalink.

This site uses Akismet to reduce spam. Learn how your comment data is processed.