Analysis of the game of HIPE

Introduction

Can you think of a word in which the letters HIPE appear consecutively?  What about a word containing HQ?  This “game” is described in a not-so-relevant chapter of Peter Winkler’s wonderful book Mathematical Mind-Benders, where he provides many more examples of what he calls HIPEs: a challenge of a short sequence of letters, a response to which must be a word containing that sequence, consecutively with no intervening letters.  For example, BV has many solutions, one of which is, well, oBVious.

It turns out that I am really bad at playing this game.  My wife, on the other hand, is pretty good at it.  As it happens, I also lag behind her quite a bit when we work together on crossword puzzles, acrostics, anagrams, etc.; that is, in many situations where it is helpful to consider words as made up of letters.  After browsing more examples of HIPEs in Winkler’s book, I wondered what makes a given HIPE easy or difficult.  I will describe my attempt at “automating” the generation of difficult HIPEs… but mostly I want to share what I found to be a fascinating anecdote from Winkler’s discussion of the game.

Word Lists

My idea was pretty simple: difficult HIPEs are most likely those sequences of letters that (1) occur very rarely in the dictionary, but (2) occur sufficiently often in natural language to be recognizable.  To compute these metrics, I used:

  1. A word list consisting of the union of (a) the ENABLE2k word list (updated from the previous ENABLE1), and (b) the 2014 edition (the third and latest) of the Scrabble Official Tournament and Club Word List.
  2. The Google Books Ngrams data set (English Version 20120701) to map each word in my dictionary to a corresponding frequency of occurrence (details of methodology described in this previous post).
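
As a rough sketch of how these two metrics might be computed (not necessarily the code behind the figures below), the following Python assumes a hypothetical `word_freq` dictionary mapping each word to its corpus frequency.  It also assumes that “occurrence in the dictionary” means the number of distinct words containing the sequence, and that “occurrence in natural language” means the summed frequency of those words; both interpretations, and all of the names, are mine.

```python
from collections import Counter


def ngrams_in(word, n):
    """Return the set of distinct length-n substrings of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def hipe_metrics(word_freq, n=2):
    """Map each n-letter sequence to (usage, word_count).

    word_freq: dict of word -> frequency of occurrence in a corpus
    (a hypothetical input format, assumed for this sketch).
    usage is the summed frequency of all words containing the sequence;
    word_count is the number of distinct words containing it.
    """
    usage = Counter()
    word_count = Counter()
    for word, freq in word_freq.items():
        for gram in ngrams_in(word.upper(), n):
            usage[gram] += freq
            word_count[gram] += 1
    return {gram: (usage[gram], word_count[gram]) for gram in word_count}


# Toy frequencies, made up for illustration only.
toy = {"obvious": 50, "subvert": 5, "archipelago": 3}
print(hipe_metrics(toy, n=2)["BV"])    # (55, 2)
print(hipe_metrics(toy, n=4)["HIPE"])  # (3, 1)
```

Using a set of substrings per word means a sequence repeated within one word is counted only once for that word; counting every occurrence instead would be an equally reasonable choice.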

As usual, you can download the aggregated frequency data and component word lists here.

2- and 3-letter HIPEs

Let’s focus on just two letters for now.  The following figure shows all possible 2-letter HIPEs, arranged by frequency of occurrence in the Google Books dataset on the x-axis and frequency of occurrence in the word list on the y-axis, with the example HIPEs from Winkler’s chapter shown in red.  Note that both axes are on a logarithmic scale to better “spread out” the infrequent HIPEs that we are interested in.

All digraphs with corresponding x=frequency of occurrence in the Google Books dataset and y=frequency of occurrence in the ENABLE+Scrabble word list. Winkler’s example HIPEs are shown in red.

As expected, the digraphs near the top of the figure aren’t very interesting, while Winkler’s examples are clustered near the bottom of the figure… although I don’t see much horizontal organization.  To get a better view, let’s zoom in on the lower-left corner of the figure, while still including all of Winkler’s example HIPEs (in red):

A zoomed-in view of the 2-letter HIPEs no more common than Winkler’s examples.
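
The “zoom” itself amounts to a simple filter.  Here is a minimal sketch, assuming a hypothetical `metrics` dictionary mapping each sequence to an (x, y) pair of (usage frequency, word-list count), and assuming that “no more common than Winkler’s examples” means at or below the largest example value on each axis.  The numbers are placeholders, not real data.

```python
def zoom_to_examples(metrics, examples):
    """Keep sequences no more common than the most common example on each axis."""
    max_usage = max(metrics[e][0] for e in examples)
    max_count = max(metrics[e][1] for e in examples)
    return {hipe: (usage, count)
            for hipe, (usage, count) in metrics.items()
            if usage <= max_usage and count <= max_count}


# Placeholder values, made up for illustration only: (usage, word count).
toy_metrics = {
    "TH": (1000000, 9000),
    "BV": (2000, 40),
    "HQ": (10, 3),
    "DK": (30, 5),
    "QI": (8000, 2),
}
print(sorted(zoom_to_examples(toy_metrics, ["BV", "HQ"])))  # ['BV', 'DK', 'HQ']
```

In this toy data, QI is rare in the word list but falls outside the box because its usage exceeds that of every example.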

Unfortunately, closer inspection of this figure is a little disappointing: there are certainly “interesting” additional HIPEs in there (DK and GP, for example), but no clear separation between them and other unacceptably weird ones like QI, MK, etc.

We can do the same thing for 3-letter HIPEs, but things get messy quickly; there are just too many possible HIPEs with valid solutions, even if we again “zoom in” on just the bounding box of Winkler’s examples:

Similarly zoomed-in view of rare/difficult 3-letter HIPEs.

There are quite a few interesting HIPEs even in that very bottom row in the figure.  Following are some of my favorites, which appear at most twice in the entire word list: BSM, CEV, CTW, CYI, FCO, IKY, KGA, LFC, UCY, UIU, WDF, XEU, XII.
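
For anyone who wants to hunt for their own favorites, here is a minimal sketch of the search, assuming that “appears at most twice” means at most two distinct words in the list contain the sequence.  The function names and the toy word list are made up for illustration; a real run would load the full word list instead.

```python
from collections import defaultdict


def solutions_by_hipe(words, n=3):
    """Map each n-letter sequence to the sorted list of words containing it."""
    index = defaultdict(set)
    for word in words:
        w = word.lower()
        for i in range(len(w) - n + 1):
            index[w[i:i + n].upper()].add(w)
    return {hipe: sorted(ws) for hipe, ws in index.items()}


def rare_hipes(words, n=3, max_solutions=2):
    """Keep only the sequences with at most max_solutions solutions."""
    return {hipe: ws
            for hipe, ws in solutions_by_hipe(words, n).items()
            if len(ws) <= max_solutions}


# Toy word list, for illustration only.
toy_words = ["obvious", "obviate", "subvert", "archipelago"]
print(solutions_by_hipe(toy_words, n=2)["BV"])  # ['obviate', 'obvious', 'subvert']
print(rare_hipes(toy_words, n=4, max_solutions=1)["HIPE"])  # ['archipelago']
```

The same index doubles as a “solver”: looking up a HIPE returns every word in the list that answers it.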

Conclusion

Finally, back to Winkler’s discussion of what makes HIPEs easy or difficult.  This is where things get really interesting.  He points out that, for most people, it is the “kinetic sense” of producing a word with our mouths that dominates our sense of “knowing” the word.  Not its definition, not what it looks like on paper or how it sounds, but the association with the physical act of expressing the word.  If this is really true, then he suggests that perhaps deaf people, “especially those who customarily communicate in a sign language,” might play HIPE better than others:

Resolved to test this hypothesis, I introduced HIPE to a group of hearing-impaired employees at a government agency, who sat together and conversed in ASL daily at lunch.  They found the game completely trivial; as fast as I wrote HIPEs on napkins, they wrote solutions around them.  To them it was a mystery why anyone would think HIPE was any kind of challenge.

This certainly sounds like a significant effect, enough that I wonder whether more rigorous study has been done.

References:

  • Winkler, P., Mathematical Mind-Benders. Wellesley: A K Peters, Ltd., 2007