# Probabilistic Tic-Tac-Toe solution

This is a follow-up to the previous post about a Hollywood Squares-style game of probabilistic Tic-Tac-Toe, where a selected square is marked with the player’s symbol (say, X) with some fixed probability $p$, but with the opponent’s symbol (O) with probability $1-p$.  When $p=1$ this reduces to the standard game of Tic-Tac-Toe, in which neither player has an advantage; that is, optimal strategy by both players results in a draw.  But what if $p<1$?

Without any additional rules, analysis of this simplest form of the game is pretty straightforward.  The green curve in the figure below shows the first player’s advantage as a function of $p$ (assuming the loser pays the winner a dollar).

First player’s advantage vs. probability p of “success” of any given move.

Recall that the parameter $p$ effectively models the level of difficulty of the trivia questions asked in the television versions of the game.  Interestingly, hard questions seem to be significantly more beneficial for the second player than easy questions are for the first player.

As mentioned last week, the television games prevent ties with the additional rule that a player can win by either getting three in a row, or being the first to get any five squares.  The black curve above shows the effect of this incremental modification, which behaves about like we would expect.

Finally, the most interesting additional rule is that a player can’t “lose” the game by missing a question; he or she must “win” by answering correctly.  If a miss would result in a win for the opponent, the selected square remains empty and the player simply loses his turn.  In this case, we must be careful to handle potentially repeating game states: for example, suppose that the board looks like this:

On either player’s turn, he or she must answer correctly to win; a miss is simply a lost turn.

If player X misses on his turn, and player O also misses on her turn, then we are right back where we started.  We can eliminate this “infinite recursion” by expressing the value of the board for a given player in terms of itself, and solving:

$v = p + (1-p)(-p + (1-p)v)$

$v = \frac{p}{2-p}$

The blue and red curves show the effect of this rule, with and without the “first to five” tie-breaker rule; the blue curve represents the television version of the game.  The first player has an advantage for all but the most difficult questions, and has a huge advantage for relatively easy questions– it would be interesting to see an estimate of the “actual” value of $p$ based on the historical fraction of questions answered correctly in episodes of the show.

# Probabilistic Tic-Tac-Toe

This past week, I was presented with what I thought was a very interesting problem: what is the optimal strategy in the game Tic-Tac-Toe?

Okay, so maybe that’s not actually a very interesting problem.  Although I think it can be a great first “game AI” programming exercise for students, it is well known that the game is “fair” in the sense that optimal play by both players results in a draw.

But motivated by the recent revival of the British television game show Celebrity Squares (which is in turn based on the original American show Hollywood Squares), let’s incrementally add a few twists to Tic-Tac-Toe that do make things more interesting:

Probabilistic moves

First, make moves “probabilistically.”  That is, when a player chooses a square in which to place his marker (say X), then with probability $p$ the move is actually successful, but with probability $1-p$ he “loses” the square and his opponent places her marker (O) in that square instead.  This models the situation on the television game show, where a player must essentially know the correct answer to a trivia question to successfully place his mark in a chosen square.  By fixing $p$ we are effectively assuming that the two players are evenly matched, and that all questions are of equal difficulty, with larger values of $p$ corresponding to easier questions (so that when $p=1$ the game reduces to the original deterministic Tic-Tac-Toe).

How fair is this game?  That is, as a function of $p$, what is the optimal strategy and resulting expected value for each player (assuming the loser pays the winner a dollar)?

Forbidden “stolen” wins

An additional rule in the television game show is that a player can only win while “in control” of the board.  That is, for example, if player X answers a question incorrectly, then an O is placed in the square… unless the result would be an immediate win for player O, in which case the square remains unmarked, and player X merely loses his turn.

This rule is interesting not only because of the effect it has on the fairness of the game, but also because it introduces a tricky wrinkle in computing strategies and expected values.  The problem is that it is now possible for game states to repeat; for example, if both players can win on their next move with the same square, and player X answers incorrectly, followed by player O also answering incorrectly, then we are right back where we started.

First to five squares

Finally, note that so far it has still been possible for a game to end in a draw.  To prevent the uncomfortable awkward silence that I imagine resulting from a tie at the end of a game show (Jeopardy! being an exception), there is one additional rule that guarantees an outright winner: a player can win by either getting three in a row, or by being the first to mark any five squares.

(For completeness, we can also consider a slight variation of this rule that was in effect in the early episodes of the American show, where a player could only win with five or more squares once the board was full– or equivalently, only once his opponent had no potential three in a row opportunities.)

Taking all of these rules together– or considering various combinations of them– what is the resulting optimal strategy and expected value for this game?  I’m not sure yet… but it seems like a fun problem to try to solve.