This is a follow-up to the previous post that briefly mentioned the game of Hangman, in the context of developing a dictionary of words sorted based on their frequency of occurrence in the recently updated Google Books Ngram dataset.
My Python implementation of the game is on GitHub, where you are the guesser playing against the computer acting as the hangman. But this hangman is “evil:” that is, the computer doesn’t play “honestly,” selecting a secret word at the start of the game, and keeping responses to your guesses consistent with that initial selection throughout the game. Instead, the computer does not start with a secret word at all, merely a word length, and scores guesses using a greedy algorithm that maximizes the number of possible words in the dictionary that remain consistent with all scored guesses so far.
(Aside: this version is slightly more evil than the original, modified to break ties by minimizing the number of correct letters in each response, so that it will score a “miss” if possible, subject to the original constraint of maximizing the number of possible remaining words.)
It seems intuitive that Evil Hangman should be a much harder game to play than the normal honest version. But how much harder is it really? A recent paper by Barbay and Subercaseaux  shows that this simple children’s game hides a lot of complexity: determining a strategy that guarantees at least a specified number of missed guesses is coNP-hard, and the standard “homework assignment” greedy algorithm can be arbitrarily worse than optimal for appropriately crafted dictionaries… but what about “real” dictionaries? How much harder is Evil Hangman in practice with a typical English-language dictionary?
Modeling the guesser
To automate the analysis, we need to implement a model of a player’s sequence of guesses in each game. When I play Hangman, my initial guesses tend to be hard-coded, always starting with ETAOIN SHRDLU, until a first correct guess, only after which my strategy depends on the situation, the length of the word, which letters were correct and where, etc.
I do this knowing that it’s not optimal; there is a great Data Genetics blog post  that describes the problem with this strategy, and a better solution. Rather than prioritizing guesses based on frequency of occurrence in language (where even ETAOINSHRDLU isn’t quite right– a more accurate sequence is approximately ETAOINSRHLDCU), a better approach is instead to guess letters that occur in the largest number of remaining possible words in the dictionary.
The following table shows what this strategy looks like, indicating the sequence of guesses you should make for each possible word length… at least until you score the first correct guess, after which the subsequent strategy is too complex to fit in a simple table, since it depends on not only the word length, but which letter was correct and where, etc.
|Word length||Guesses until first correct|
Note that this table differs slightly from the Data Genetics article, since the strategy depends on the dictionary. Here and throughout I am using the current North American (NWL2018) Scrabble dictionary of 192,111 words.
The following figure shows a comparison of performance of this “optimal” guessing strategy against various hangman strategies.
There is a lot to unpack here. First, we consider a range of dictionary sizes on the x-axis, ranging from just the most commonly occurring words to the entire NWL2018 Scrabble dictionary. Second, each curve corresponds to a specific word length. This is perhaps the most interesting and unintuitive observation: the shortest words, with just 2-5 letters, are consistently the hardest ones to guess.
The leftmost plot shows the “normal” game, where an honest hangman selects a fixed secret word uniformly at random, with the y-axis indicating the expected number of misses (conditioned on that word length) before guessing the word.
The middle plot shows the performance against Evil Hangman. Interestingly, short words are not only harder to guess, they are much harder, with words of 3-5 letters requiring nearly twice as many misses as words with 6 letters. What are these “hardest” words? Recall that Evil Hangman does not select a specific word to guess; but when finally forced into a corner by this “optimal” guessing strategy, examples of the eventual lone remaining valid dictionary word are (from most to least common): hip, rib, tung, buzz, puff, jig, pugh, fil, jib, wuz, cuz, yuck, and guck.
However, as discussed in the paper motivating this post, this “greedily” Evil Hangman does not necessarily yield a maximum number of failed guesses. The rightmost plot shows the worst case performance against a normal honest hangman, i.e., where the secret word is selected to maximize the number of misses against this “optimal” guessing strategy. It is again perhaps unintuitive that merely crafting a particular selected word– but a fixed word– can yield so many more misses than the Evil Hangman, who actively avoids giving out information.
This behavior can be more clearly seen with a simpler example. Consider a dictionary with just five words: from, have, that, this, with. We can verify that our guessing strategy will win with zero misses against Evil Hangman, by choosing H as the first letter; but an honest hangman can guarantee that first guess will miss, by selecting the word from.
So what are these hardest words that an honest hangman could select from the actual Scrabble dictionary? Some of the more recognizable ones are junked (12 misses), jazzed and faffed (13 misses each)… and it only gets weirder from there; to require more than the 16 failed guesses squeezed out by the Evil Hangman, one could select words like: hin, rill, zin, zill, zax, yill, yins, zills, yills. I don’t recognize any of these, and I imagine that selecting one of them as a hangman secret word would probably start a fight.
Of course, these examples highlight one of a couple of problems with this analysis: first, the strategy for both the guesser and the hangman, even in the evil case, are completely deterministic, and thus predictably repeatable. For example, having observed that the “optimal” guesser requires 12 misses to eventually guess the word pop, as hangman we could select pop again and again, and score 12 misses every single time.
Second, we aren’t even really playing the right game. In Hangman, the guesser does not necessarily continue until finally guessing the entire word. Instead, a maximum number of failed guesses is designated ahead of time, and the guesser loses as soon as that maximum is reached. Rather than minimizing the number of misses– which we haven’t even been guaranteeing anyway, hence my quotes around “optimal” strategy– the real objective is to maximize the probability of winning, i.e., of not reaching the maximum number of misses.
This is a very hard problem, with a mixed (that is, randomized) strategy Nash equilibrium that would be incredibly complex to even specify, let alone actually compute. Jon McLoone presents an interesting Wolfram blog post  that attempts to address these issues, albeit in a similarly simplified and not necessarily optimal way. (Many of the above words, including buzz and its inflected forms, jazzed, faffed, etc., also appear high on his list of “best” words for a hangman to choose.)
- Barbay, J. and Subercaseaux, B., The Computational Complexity of Evil Hangman, arXiv:2003.10000
- Berry, N., A Better Strategy for Hangman, Data Genetics (blog), https://datagenetics.com/blog/april12012/
- McLoone, J., 25 Best Hangman Words, Wolfram News, Views, & Insights (blog), https://blog.wolfram.com/2010/08/13/25-best-hangman-words/