This past week, I was presented with what I thought was a very interesting problem: what is the optimal strategy in the game Tic-Tac-Toe?
Okay, so maybe that’s not actually a very interesting problem. Although I think it can be a great first “game AI” programming exercise for students, it is well known that the game is “fair” in the sense that optimal play by both players results in a draw.
But motivated by the recent revival of the British television game show Celebrity Squares (which is in turn based on the original American show Hollywood Squares), let’s incrementally add a few twists to Tic-Tac-Toe that do make things more interesting:
First, make moves “probabilistically.” That is, when a player chooses a square in which to place his marker (say X), then with probability the move is actually successful, but with probability he “loses” the square and his opponent places her marker (O) in that square instead. This models the situation on the television game show, where a player must essentially know the correct answer to a trivia question to successfully place his mark in a chosen square. By fixing we are effectively assuming that the two players are evenly matched, and that all questions are of equal difficulty, with larger values of corresponding to easier questions (so that when the game reduces to the original deterministic Tic-Tac-Toe).
How fair is this game? That is, as a function of , what is the optimal strategy and resulting expected value for each player (assuming the loser pays the winner a dollar)?
Forbidden “stolen” wins
An additional rule in the television game show is that a player can only win while “in control” of the board. That is, for example, if player X answers a question incorrectly, then an O is placed in the square… unless the result would be an immediate win for player O, in which case the square remains unmarked, and player X merely loses his turn.
This rule is interesting not only because of the effect it has on the fairness of the game, but also because it introduces a tricky wrinkle in computing strategies and expected values. The problem is that it is now possible for game states to repeat; for example, if both players can win on their next move with the same square, and player X answers incorrectly, followed by player O also answering incorrectly, then we are right back where we started.
First to five squares
Finally, note that so far it has still been possible for a game to end in a draw. To prevent the uncomfortable awkward silence that I imagine resulting from a tie at the end of a game show (Jeopardy! being an exception), there is one additional rule that guarantees an outright winner: a player can win by either getting three in a row, or by being the first to mark any five squares.
(For completeness, we can also consider a slight variation of this rule that was in effect in the early episodes of the American show, where a player could only win with five or more squares once the board was full– or equivalently, only once his opponent had no potential three in a row opportunities.)
Taking all of these rules together– or considering various combinations of them– what is the resulting optimal strategy and expected value for this game? I’m not sure yet… but it seems like a fun problem to try to solve.