Probabilistic Tic-Tac-Toe

This past week, I was presented with what I thought was a very interesting problem: what is the optimal strategy in the game Tic-Tac-Toe?

Okay, so maybe that’s not actually a very interesting problem.  Although I think it can be a great first “game AI” programming exercise for students, it is well known that the game is “fair” in the sense that optimal play by both players results in a draw.

But motivated by the recent revival of the British television game show Celebrity Squares (which is in turn based on the original American show Hollywood Squares), let’s incrementally add a few twists to Tic-Tac-Toe that do make things more interesting:

Probabilistic moves

First, make moves “probabilistically.”  That is, when a player chooses a square in which to place his marker (say X), then with probability $p$ the move is actually successful, but with probability $1-p$ he “loses” the square and his opponent places her marker (O) in that square instead.  This models the situation on the television game show, where a player must essentially know the correct answer to a trivia question to successfully place his mark in a chosen square.  By fixing $p$ we are effectively assuming that the two players are evenly matched, and that all questions are of equal difficulty, with larger values of $p$ corresponding to easier questions (so that when $p=1$ the game reduces to the original deterministic Tic-Tac-Toe).

How fair is this game?  That is, as a function of $p$, what is the optimal strategy and resulting expected value for each player (assuming the loser pays the winner a dollar)?
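
For the first variant alone (probabilistic moves only, so game states never repeat), the expected value is a straightforward memoized recursion over board states.  Following is a sketch, with hypothetical names, assuming turns alternate regardless of whether the question is answered correctly, and that the opponent's mark may complete an immediate win (no "stolen win" rule yet):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def solve(p):
    """Expected payoff to X (+1 win, -1 loss, 0 draw) under optimal play,
    where a chosen square is won with probability p and lost to the
    opponent with probability 1 - p."""
    @lru_cache(maxsize=None)
    def value(board, mover):
        w = winner(board)
        if w:
            return 1.0 if w == 'X' else -1.0
        if '.' not in board:
            return 0.0
        other = 'O' if mover == 'X' else 'X'
        best = None
        for i, s in enumerate(board):
            if s != '.':
                continue
            good = board[:i] + mover + board[i + 1:]   # answered correctly
            bad = board[:i] + other + board[i + 1:]    # square goes to opponent
            ev = p * value(good, other) + (1 - p) * value(bad, other)
            if best is None or (ev > best if mover == 'X' else ev < best):
                best = ev
        return best
    return value('.' * 9, 'X')

print(solve(1.0))   # p = 1 recovers ordinary tic-tac-toe (a draw)
```

The state space is small enough (at most $3^9$ boards) that the cache makes this essentially instantaneous, and sweeping $p$ over $[0, 1]$ gives the fairness curve for this simplest variant.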

Forbidden “stolen” wins

An additional rule in the television game show is that a player can only win while “in control” of the board.  That is, for example, if player X answers a question incorrectly, then an O is placed in the square… unless the result would be an immediate win for player O, in which case the square remains unmarked, and player X merely loses his turn.

This rule is interesting not only because of the effect it has on the fairness of the game, but also because it introduces a tricky wrinkle in computing strategies and expected values.  The problem is that it is now possible for game states to repeat; for example, if both players can win on their next move with the same square, and player X answers incorrectly, followed by player O also answering incorrectly, then we are right back where we started.

First to five squares

Finally, note that so far it has still been possible for a game to end in a draw.  To prevent the uncomfortable awkward silence that I imagine resulting from a tie at the end of a game show (Jeopardy! being an exception), there is one additional rule that guarantees an outright winner: a player can win by either getting three in a row, or by being the first to mark any five squares.

(For completeness, we can also consider a slight variation of this rule that was in effect in the early episodes of the American show, where a player could only win with five or more squares once the board was full– or equivalently, only once his opponent had no potential three in a row opportunities.)

Taking all of these rules together– or considering various combinations of them– what is the resulting optimal strategy and expected value for this game?  I’m not sure yet… but it seems like a fun problem to try to solve.

Solving a corn maze

Every fall for the last several years, my wife and I have walked through a corn maze with our nephews.  It’s a lot of fun; the kids get a paper copy of the maze to help them navigate, and there are a series of numbered checkpoints to find, each with a themed quiz question to answer.  (I thought this was a pretty good idea, giving a clear sense of continuing progress, rather than just wandering through a maze of twisty little passages, all alike.)

But what if you don’t have a map of the maze?  There are many algorithms for “solving” a maze, with various requirements on the structure of the maze and the assumed ability of the solver to “record” information about parts of the maze already explored.  Some only work on a grid, while others, such as wall-following, are only guaranteed to work if the maze is simply connected– or in graph-theoretic terms, the graph representing the rooms (vertices) and passages (edges) of the maze must not contain any cycles.

Suppose instead that you find yourself in a maze of rooms and connecting passages like the one shown below, whose corresponding graph is neither a grid, nor acyclic.

A maze with 20 rooms and 30 connecting passages.

In fact, the maze need not even be planar (think catacombs with passages running over/under each other), and it may contain loops (passages that lead you back to the room you just left) or multiple passages between the same pair of rooms.  The only requirement is that each passage has two ends– imagine a door at each end leading into a room… although as in the case of a loop, both doors may lead into the same room.

Now imagine exploring the maze by following these rules:

1. If you are in a room with an unmarked door (as you will be at the start), pick one, mark an X on it, and traverse the corresponding passage.
2. When first entering a room with all doors unmarked, mark a Y on the door of entry.
3. If you are in a room with all doors marked, exit via the door marked Y if there is one, otherwise stop.

This algorithm, due to Tarry (see reference below), will visit every room in the (connected) maze, traversing every passage exactly twice, once in each direction… ending in the same room where you started!  Proving this is a nice problem involving induction.  And the doors marked with a Y have the nice property of essentially marking a path from any room back to the starting point.  (As an interesting side effect, suppose that we modify rule 3 slightly, so that before exiting a door marked Y, we remove or erase all of the marks on the doors in that room.  The result is that, at the end, we have still visited every room in the maze, but erased any trace that we were there!)
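
The three rules translate almost line for line into code.  The following sketch uses a hypothetical representation in which each passage is a pair of rooms, and each "door" is identified by a (passage index, end) pair; loops and parallel passages are allowed:

```python
def tarry(passages, start):
    """Tarry's maze exploration.  passages is a list of room pairs; a door
    (i, end) opens into room passages[i][end].  Returns the sequence of
    rooms visited, which ends back at start."""
    doors = {}                       # room -> list of doors opening into it
    for i, (a, b) in enumerate(passages):
        doors.setdefault(a, []).append((i, 0))
        doors.setdefault(b, []).append((i, 1))
    mark = {}                        # door -> 'X' or 'Y'
    room, path = start, [start]
    while True:
        unmarked = [d for d in doors[room] if d not in mark]
        if unmarked:                 # rule 1: take any unmarked door
            door = unmarked[0]
            mark[door] = 'X'
        else:                        # rule 3: all doors marked
            ys = [d for d in doors[room] if mark[d] == 'Y']
            if not ys:
                return path          # no Y door: stop (back at start)
            door = ys[0]             # exit via the Y door
        i, end = door                # traverse the passage to its far door
        room = passages[i][1 - end]
        if all(d not in mark for d in doors[room]):
            mark[(i, 1 - end)] = 'Y' # rule 2: first entry into a fresh room
        path.append(room)
```

For example, on a small maze with a cycle, a loop, and a parallel passage, such as `tarry([(0, 1), (1, 2), (2, 0), (2, 2), (0, 1)], 0)`, the returned path visits all three rooms, traverses each of the five passages exactly twice, and ends back at room 0.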

Tarry’s algorithm has a lot in common with Trémaux’s algorithm, although I think this version is more explicitly “local,” in the sense that I can imagine actually executing Tarry’s algorithm in, say, a corn maze, with appropriate markings on the “doors” at intersections, as opposed to needing to mark the passages between intersections.

References:

1. G. Tarry, Le problème des labyrinthes, Nouv. Ann. Math., 14 (1895), 187-190

Projectile motion puzzle solution

This post captures my notes on the following problem from the last post:

Problem: Suppose that you are standing on the outer bank of a moat surrounding a castle, and you wish to secretly deliver a message, attached to a large rock, to your spy inside the castle.  A wall h=11 meters high surrounds the castle, which is in turn surrounded by the moat which is d=19 meters wide.  At what angle should you throw the rock in order to have the best chance of clearing the wall?

Solution: A projectile thrown from the origin with initial speed $v$ at an angle $\theta$ has a trajectory described by

$x(t) = v t \cos \theta$

$y(t) = v t \sin \theta - \frac{1}{2} g t^2$

Setting $x(t)=d$ and $y(t)=h$ and solving for $v$ and $t$, we find the following expression for the required initial speed as a function of the throwing angle:

$v(\theta)^2 = \frac{g d^2}{2 \cos^2 \theta (d \tan \theta - h)}$

We can minimize this required initial speed by maximizing the denominator.  Differentiating and setting equal to zero yields

$\tan 2\theta = -\frac{d}{h}$

Keeping in mind that the throwing angle is between 0 and 90 degrees, we can rewrite this expression as

$\theta = \frac{\phi + \pi/2}{2}$

where $\phi$ is the angle of the line of sight to the top of the wall.  The geometric interpretation of this result is: to have the best chance of clearing the wall, throw at an angle halfway between the line of sight to the top of the wall and the vertical.  In the original problem, this optimal throwing angle is about 60.03 degrees.
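
A minimal sketch of this computation (the function name is mine), using the angle relation above together with the earlier expression for $v(\theta)^2$:

```python
import math

def optimal_throw(d, h, g=9.80665):
    """Minimum-speed throw to clear a wall of height h at horizontal
    distance d, ignoring air resistance.  Returns (angle in radians,
    speed in meters/second)."""
    phi = math.atan2(h, d)              # line of sight to the top of the wall
    theta = (phi + math.pi / 2) / 2     # halfway between sight line and vertical
    v = math.sqrt(g * d ** 2 /
                  (2 * math.cos(theta) ** 2 * (d * math.tan(theta) - h)))
    return theta, v

theta, v = optimal_throw(19, 11)        # the moat-and-castle problem
print(math.degrees(theta))              # about 60.03 degrees
```

Running the same function on the Fenway numbers, `optimal_throw(420 * 0.3048, 17 * 0.3048)`, reproduces the (vacuum) optimal angle of about 46.2 degrees and minimum speed of about 80.9 miles per hour.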

Recall, however, that this analysis assumes negligible effects of air resistance.  This is a safe assumption for a relatively heavy rock at human-throwable speeds, but not for a baseball.  If we apply this same analysis to Jackie Bradley, Jr.’s throw from home plate over the center field wall, 420 feet away and 17 feet high, we get an optimal throwing angle of about 46.2 degrees, with a minimum initial speed of only about 80.9 miles per hour, which is not terribly difficult to achieve.

To more accurately model the trajectory of a thrown (or batted) baseball, we must incorporate the effects of not only drag, which slows the ball down, but also the Magnus force, which “lifts” the ball causing it to “rise” and/or curve.  The references below describe several interesting experiments tracking thrown and hit baseballs, including some useful approximation formulas for accurately modeling these trajectories, yielding my estimate from the last post that Bradley must have thrown the ball at over 105 miles per hour, with an optimal throwing angle of just a little over 30 degrees.

Since this is the sort of problem that I think is excellent for students to attack via computer simulation, following are the relevant formulas and constants collected in one place for them to use:

The acceleration of a thrown or batted baseball due to gravity, drag, and lift is given by

$a = g + a_D + a_L$

$a_D = -\frac{1}{2m}\rho v^2 A C_D \hat{v}$

$a_L = \frac{1}{2m}\rho v^2 A C_L (\hat{\omega} \times \hat{v})$

where:

• Acceleration due to gravity $|g| = 9.80665$ meters/second^2
• Mass $m = 5.125$ ounces (the midpoint of the regulation tolerance)
• Air density $\rho = 1.225$ kilograms/meter^3 (sea level on a standard day)
• $v$ and $\omega$ are the magnitudes of velocity and angular velocity, respectively, with $\hat{v}$ and $\hat{\omega}$ being corresponding unit vectors in the same direction
• $A = \pi r^2$ is the cross-sectional area of the baseball, where the circumference $2\pi r = 9.125$ inches (the midpoint of the regulation tolerance)
• Coefficient of drag $C_D = 0.35$ (see Reference 3)
• Coefficient of lift $C_L = S/(0.4+2.32S)$, where $S = r\omega/v$ is the “spin parameter” (see Reference 2)
• Backspin on a typical fastball $\omega = 1500$ rpm (or 2000 rpm for a typical batted ball; see Reference 1)
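
As one possible starting point for such a simulation, the following sketch integrates these equations with a simple fixed-step Euler method, assuming pure backspin so that the motion stays in a vertical plane (the helper names are mine, not from the references; the constants above are converted to SI units):

```python
import math

G = 9.80665                              # gravity, m/s^2
M = 5.125 * 0.0283495                    # mass: 5.125 oz in kg
R = 9.125 * 0.0254 / (2 * math.pi)       # radius from 9.125 in circumference, m
A = math.pi * R ** 2                     # cross-sectional area, m^2
RHO = 1.225                              # air density, kg/m^3
CD = 0.35                                # drag coefficient

def simulate(v0_mph, theta_deg, omega_rpm=1500.0, dt=1e-3):
    """Integrate a 2-D trajectory (x downrange, y up) with drag and Magnus
    lift, assuming pure backspin, until the ball returns to y = 0."""
    v0 = v0_mph * 0.44704
    omega = omega_rpm * 2 * math.pi / 60
    vx = v0 * math.cos(math.radians(theta_deg))
    vy = v0 * math.sin(math.radians(theta_deg))
    x, y, pts = 0.0, 0.0, [(0.0, 0.0)]
    while y >= 0.0:
        v = math.hypot(vx, vy)
        S = R * omega / v                    # spin parameter
        CL = S / (0.4 + 2.32 * S)            # lift coefficient
        k = RHO * v * A / (2 * M)            # common factor (one power of v)
        ax = -k * (CD * vx + CL * vy)        # drag opposes velocity; lift is
        ay = -k * (CD * vy - CL * vx) - G    # perpendicular to it (backspin)
        vx, vy = vx + ax * dt, vy + ay * dt
        x, y = x + vx * dt, y + vy * dt
        pts.append((x, y))
    return pts

def height_at(pts, x_target):
    """Interpolated height when the ball passes x_target (None if it lands short)."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x_target <= x1:
            return y0 + (x_target - x0) / (x1 - x0) * (y1 - y0)
    return None
```

With this in hand, a student can search over initial speed and angle for the slowest throw with `height_at(simulate(v, theta), 420 * 0.3048)` exceeding the 17-foot wall height.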

References:

1. A. Nathan, The effect of spin on the flight of a baseball, American Journal of Physics, 76:2 (February 2008), 119-124 [PDF]

2. A. Nathan, What New Technologies Are Teaching Us About the Game of Baseball, October 2012 [PDF]

3. D. Kagan and A. Nathan, Simplified Models for the Drag Coefficient of a Pitched Baseball, The Physics Teacher, 52 (May 2014), 278-280 [PDF]

Projectile motion puzzle

This problem is inspired by Jackie Bradley, Jr., outfielder for the Boston Red Sox, who last week in warm-up threw a baseball from near home plate over the 17-foot-high wall in deep center field, 420 feet away.  (Here is a video clip of the throw.)

It’s a pretty amazing throw… but just how amazing is it?  That is, how hard would you have to throw a baseball to clear a 17-foot wall 420 feet away?

This is an interesting question in its own right, with the usual appeal of encouraging both pen-and-paper as well as computer simulation for a solution.  I’ll get to the answer shortly– but while working on it, I encountered an interesting relationship between some of the variables that has a nice geometric interpretation, which I think is best illustrated with the following slightly different version of the problem:

Problem: Suppose that you are standing on the outer bank of a moat surrounding a castle, and you wish to secretly deliver a message, attached to a large rock, to your spy inside the castle.  A wall h=11 meters high surrounds the castle, which is in turn surrounded by the moat which is d=19 meters wide.  At what angle should you throw the rock in order to have the best chance of clearing the wall?

At what angle should you throw an object to clear a wall 19 meters away and 11 meters high?

The intent of the large rock is to allow us to ignore the relatively negligible effects of air resistance, thus preventing the calculus problem from becoming a differential equations problem.

We can’t afford to do that with a baseball, though.  Coming back to the original problem at Fenway Park, there are two important atmospheric effects to consider.  First, air resistance significantly increases the speed at which Bradley must have thrown the ball to clear the outfield wall.  But second, the Magnus force resulting from backspin on the ball (also responsible for curve balls and surprisingly hard-to-catch pop-ups) actually makes the ball travel farther, thus decreasing the required speed compared with a ball thrown with no backspin.

Accounting for both of these effects, by my calculations (which I can share if there is interest), Bradley would have had to throw the ball at over 105 miles per hour, at an angle of a little over 30 degrees.

Using a watch– or a stick– as a compass

A couple of years ago, I wrote about a commonly cited method of direction-finding using an analog watch and the sun.  Briefly, if you hold your watch face horizontally with the hour hand pointing toward the sun, then the ray halfway between the hour hand and 12 noon points approximately true south.  (This is for locations in the northern hemisphere; there is a slightly different version that works in the southern hemisphere.)

The punch line was that the method can be extremely inaccurate, with errors potentially exceeding 80 degrees depending on the location, month, and time of day.  I provided a couple of figures, each for a different “extreme” location in the United States, showing the range of error in estimated direction over the course of an entire year.

Unfortunately, I ended on that essentially negative note, without considering any potentially more accurate methods as an alternative.  This post is an attempt to remedy that.  In recent discussion in the comments, Steve H. suggested analysis of the use of the “shadow-stick” method: place a stick vertically in the ground, and mark the location on the ground of the tip of the stick’s shadow at two (or more) different times.  The line containing these points will be roughly west-to-east.

Illustration of the shadow-stick method of direction-finding. With a stick placed vertically in the ground, the tip of the stick’s shadow moves roughly from west to east.

As the following analysis shows, this shadow-stick method of direction-finding is indeed generally more accurate than the watch method… most of the time, anyway.  But even when it is better, it can still be bad.  It turns out that both methods are plagued with some problems, with the not-so-surprising conclusion that if you need to find your way home, there is a tradeoff to be made between accuracy and convenience.

One of the problems with my original presentation was condensing the behavior of the watch method over an entire year into a single plot (in this case, at Lake of the Woods in Minnesota, at a northern latitude where the watch method’s accuracy is best).  This clearly shows the performance envelope, i.e. the maximum possible error over the whole year, but it hides the important trending behavior within each day, and how that daily trend changes very gradually over the year.  We can see this more clearly with an animation: the following shows the same daily behavior of error in estimated direction using the watch method (in blue), but also the shadow-stick method (in red), over the course of this year.

Accuracy of the watch method (blue) and shadow-stick method (red) of direction-finding, over the course of the year 2014 in Lake of the Woods, Minnesota. The shadow-stick method is more accurate 40.6% of the time.

For reference, following are links to a couple of other animations showing the same comparison at other locations.

• Florida Keys (a southern extreme, where the watch method performs poorly, included in the original earlier post)

There are several things to note here.  First, this is an example where the shadow-stick method can actually perform significantly worse than the watch method.  Its worst-case behavior is near the solstices in June and December, with errors exceeding 30 degrees near sunrise and sunset.  This worst-case error increases with latitude, which is the opposite of how the watch method behaves, as shown by the Florida Keys example above.

However, note the symmetry in the error curve for the shadow-stick method.  It always passes from an extreme in the morning, to zero around noon, to the other extreme in the evening.  We can exploit this symmetry… if we are willing to wait around a while.  That is, we could improve our accuracy by making a direction measurement some time in the morning before noon, then making another measurement at the same time after noon, and using the average of the two as our final estimate.  (A slightly easier common refinement of the shadow-stick method is to (1) mark the tip of the shadow sometime in the morning, then (2) mark the shadow again later in the afternoon when the shadow is the same length.  The basic idea is the same in either case.)

Finally, this issue of the length of time between measurements is likely an important consideration in the field.  A benefit of the watch method is that you get a result immediately; look at the sun, look at your watch, and you’re off.  The shadow-stick method, on the other hand, requires a pair of measurements, with some waiting time in between.  How long are you willing to wait for more accuracy?

Interestingly, the benefit of that additional waiting time isn’t linear– that is, all of the data shown here assumes just 15 minutes between marking the stick’s shadow.  Waiting longer can certainly reduce the effect of measurement error (i.e., the problem of using cylindrical sticks and spherical pebbles, etc., instead of mathematical line segments and points) by providing a longer baseline… but the inherent accuracy of the method only improves significantly when the two measurement times span apparent noon, as in the refinement above, which could take hours.

To wrap up, I still do not see a way to condense this information into a reasonably simple, easy-to-remember, expedient method for finding direction in the field without a compass.  The regular, symmetric behavior of the error in the shadow-stick method suggests that we could possibly devise an “immediate” method of eliminating most of that error, by considering the extent and sense of the error as a function of the season, and a “scale factor” as a function of the time until/since noon… but that starts to sound like anything but “simple and easy-to-remember.”

The Price Is Right puzzle solution

This is just a quick follow-up to capture my notes on an approach to solving the following problem from the last post:

Problem: Suppose you and $n$ other players are playing the following game.  Each player in turn spins a wheel as many times as desired, with each spin adding to the player’s score a random value uniformly distributed between 0 and 1.  However, if a player’s cumulative score ever exceeds 1, she immediately loses her turn (and the game).

After all players have taken a turn, the player with the highest score (not exceeding 1) wins the game.  You get to go first– what should your strategy be, and what are your chances of winning?

Solution: Suppose that our strategy is to continue to spin until our score exceeds some target value $t$.  The first key observation is that the probability that we do so “safely,” i.e. reach a total score greater than $t$ without exceeding 1, is

$q(t) = e^t(1-t)$

To see this, partition the successful outcomes by the number $k$ of initial spins with total score at most $t$, with the final $(k+1)$-st spin landing the total score in the interval $(t, 1]$.  The probability of such an outcome is

$\frac{t^k}{k!}(1-t)$

(which may be shown by induction, or see also here and here), and the result follows from the Taylor series expansion for $e^t$.
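
This formula is also easy to check by simulation; following is a sketch (with a hypothetical helper name) that plays the spin-until-past-$t$ strategy many times and compares the empirical success rate with $e^t(1-t)$:

```python
import math, random

def q_sim(t, trials=200_000, seed=1):
    """Monte Carlo estimate of the probability of reaching a total score in
    (t, 1] by adding Uniform(0, 1) spins, stopping once the total exceeds t."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        total = 0.0
        while total <= t:          # keep spinning until past the target
            total += rng.random()
        if total <= 1.0:           # success iff we did not bust
            wins += 1
    return wins / trials

print(q_sim(0.5), math.exp(0.5) * (1 - 0.5))   # the two should agree closely
```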

At this point, we can express the overall probability of winning using strategy $t$ as

$p_n(t) = q(t) \int_t^1 \frac{1}{1-t} (1 - q(s))^n ds$

Intuitively, we must:

1. Safely reach a score in the interval $(t, 1]$, with probability $q(t)$; and
2. For each such score $s$, reached with uniform density $1/(1-t)$, each of the remaining $n$ players must fail to beat our score.

The optimal strategy consists of choosing a target score $t$ maximizing $p_n(t)$.  Unfortunately, this does not have a closed form; however, after differentiating and some manipulation, the desired target score $t$ can be shown to be the root in $[0, 1]$ of the following equation, which has a nice interpretation:

$\int_t^1 (1 - q(s))^n ds = (1 - q(t))^n$

The idea is that we are choosing a target score $t$ where the (left-hand side) probability of winning by spinning one more time exactly equals the (right-hand side) probability of winning by stopping with the current score $t$.

Handing the problem over to the computer, the following figure shows the optimal target score and corresponding probability of winning for the first few values of $n$.

Optimal target score (red) and corresponding probability of winning (blue), vs. number of additional players.

One final note: how does this translate into an optimal strategy for the players after the first?  At any point in the game, the current player has some number $n$ of players following him.  His optimal strategy is to target the maximum of the best score so far from the previous players, and the critical score computed above.

[Edit: Following is Python code using mpmath that implements the equations above.]

import mpmath

def q(t):
    # Probability of safely reaching a total score in (t, 1].
    return mpmath.exp(t) * (1 - t)

def p_stop(t, n):
    # Probability that all n remaining players fail to beat a score of t.
    return (1 - q(t)) ** n

def p_spin(t, n):
    # Probability of winning by spinning once more from a score of t.
    return mpmath.quad(lambda s: p_stop(s, n), [t, 1])

def t_opt(n):
    # Optimal target score: where stopping and spinning are equally good.
    return mpmath.findroot(
        lambda t: p_spin(t, n) - p_stop(t, n), [0, 1], solver='ridder')

def p_win(t, n):
    # Overall probability of winning with target score t against n players.
    return q(t) / (1 - t) * p_spin(t, n)

if __name__ == '__main__':
    for n in range(11):
        t = float(t_opt(n))
        p = float(p_win(t, n))
        print('{:4} {:.9f} {:.9f}'.format(n, t, p))


The Price Is Right… sort of

After a couple of recent conversations about the dice game puzzle proposed a few months ago, I spent some time experimenting with the following game that has a vaguely similar “feel,” and that also has the usual desirable feature of being approachable via either computer simulation or pencil and paper.

Suppose you and n other players are playing the following game.  Each player in turn spins a wheel as many times as desired, with each spin adding to the player’s score a random value uniformly distributed between 0 and 1.  However, if a player’s cumulative score ever exceeds 1, she immediately loses her turn (and the game).

After all players have taken a turn, the player with the highest score (not exceeding 1) wins the game.  You get to go first– what should your strategy be, and what are your chances of winning?

The obvious similarity to the dice game Pig is in the “jeopardy”-type challenge of balancing the risk of losing everything– in this case, by “busting,” or exceeding a total score of 1– with the benefit of further increasing your score, and thus decreasing the other players’ chances of beating that score.

I like this “continuous” version of the problem, for a couple of reasons.  First, it’s trickier to attack with a computer, resisting a straightforward dynamic programming approach.  But at the same time, I think we still need the computer, despite some nice pencil-and-paper mathematics involved in the solution.

We can construct an equally interesting discrete version of the game, though, as well: instead of each spin of the wheel yielding a random real value between 0 and 1, suppose that each spin yields a random integer between 1 and m (say, 20), inclusive, where each player’s total score must not exceed m.  The first player who reaches the maximum score not exceeding m wins the game.
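
As a warm-up on this discrete version, the analogue of the "safe passage" probability $e^t(1-t)$ from the solution post above has a short exact computation.  Following is a sketch (hypothetical helper, using exact rationals); a pleasant simplification is that from any running total $j \leq t$, exactly $m - t$ of the $m$ spin outcomes end the turn with a total in $(t, m]$:

```python
from fractions import Fraction

def q_discrete(t, m):
    """Probability of safely passing a target total t (ending in (t, m])
    when each spin is uniform on 1..m -- the discrete analogue of q(t)."""
    # reach[j] = probability the running total ever equals j, for j <= t
    reach = [Fraction(0)] * (t + 1)
    reach[0] = Fraction(1)
    for j in range(1, t + 1):
        reach[j] = sum(reach[max(0, j - m):j]) / m
    # From each total j <= t, m - t of the m outcomes land in (t, m].
    return Fraction(m - t, m) * sum(reach)
```

For example, `q_discrete(10, 20)` is about 0.8144, already close to the continuous value $e^{0.5}(1 - 0.5) \approx 0.8244$.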

This version of the game with n=3 and m=20 is very similar to the “Showcase Showdown” on the television game show The Price Is Right, where three players each get up to two spins of a wheel partitioned into dollar amounts from $0.05 to $1.00, in steps of $0.05.  The television game has been analyzed before (see here, for example), but as a computational problem I like this version better, since it eliminates both the limit on the number of spins, as well as the potential for ties.