## Expected length of a soccer penalty shootout

The U.S. women’s soccer team recently beat the Netherlands in a penalty shootout, soccer’s version of an overtime tie-breaker. Teams alternate turns attempting a penalty kick, resulting in either a scored point or a block/miss, and after each team has had five attempts, the most points wins… with two twists:

1. The shootout ends as soon as the outcome is guaranteed, so that the shootout may require fewer than ten kicks in total. This happened in the U.S. vs. Netherlands game: the U.S. blocked two of the first four kicks from the Netherlands, and so after making all of their first four shots, the U.S. victory was assured after just eight total shots.
2. If the score is still tied after each team has had five shots, the shootout continues until “sudden death,” with teams alternating penalty kicks until one team scores and the other misses in the same “round.” This could go on for a while: the current world record for the longest shootout in a first class match is 48 shots (the ten initial shots, plus 19 more rounds of sudden death) in the 2005 Namibian Cup.

A recent Riddler column at FiveThirtyEight asked how long this takes on average. That is, assuming that each penalty kick by either team scores independently with the same probability $p$, what is the expected total number of shots taken?

This post is mostly just to capture my notes on this problem, adding to my “library” of potentially interesting future exercises for students. Following is a Python implementation that uses the symbolic mathematics library SymPy (bundled with SciPy) to recursively compute the desired expected value as a function of $p$:

```from functools import lru_cache
import sympy as sym

@lru_cache(maxsize=None)
def shots(score, us, them, p):
"""E[# shots] given net score and remaining shots for us/them."""
if us + them == 0:
return 0 if score != 0 else 2 / (2 * p * (1 - p))
if score + us < 0 or score - them > 0:
return 0
return (1 + p * shots(-score - 1, them, us - 1, p) +
(1 - p) * shots(-score, them, us - 1, p))

print(shots(0, 5, 5, sym.Symbol('p')).simplify())
```

The early stopping condition in line 9 is simple to describe: we stop if the next team to shoot can’t possibly win, or can’t possibly lose.

I like this problem because there are several nice subproblems buried inside. For example, with a bit of work, we could consolidate the `us` and `them` function arguments– indicating the number of possible remaining shots for the next team to kick and the defending team, respectively– into a single argument indicating the total number of shots remaining before sudden death. (It’s a fair question whether this is really any clearer or more readable, though; I think the above implementation makes it easier to see how the kicking and defending teams switch places in the recursion.)

Also, as noted in the Riddler column, it’s interesting, and a nice exercise to prove, that the expected number of shots is symmetric in $p$ and $1-p$, as shown in the figure below:

In other words, for example, if teams score penalty kicks with probability 0.8, we should expect shootouts to take just as long on average as if teams score with probability 0.2; and shootouts are shortest when each kick is a fair coin flip.

Finally, instead of just the expected number of shots, consider computing the entire probability distribution. Or even just counting outcomes (that are not in general equally likely): for example, how many different outcomes of the first $n=5$ “guaranteed” rounds (i.e., the first ten shots) are there that result in a tie leading to sudden death? This number is ${2n \choose n}$; it’s a nice problem to find a counting argument to show this (e.g., ten misses is one such outcome, while ten scored points is another, etc.).

This entry was posted in Uncategorized. Bookmark the permalink.

This site uses Akismet to reduce spam. Learn how your comment data is processed.