A different sock matching problem

When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are n pairs of socks, 2n socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let U be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of U?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all n pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as the sock matching problem in the literature; see the references below.

My athletic socks, on the other hand, are essentially n identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the n other “differently-handed” socks. In this case, it’s a nice problem to show that

P(U \geq u) = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}

and thus

E[U] = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{u=1}^n \sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}

But what I found most interesting about this problem is that E[U] appears to be very well approximated by \sqrt{\frac{3}{2}n}, with an error that I conjecture is always less than 1/2, and approaches zero in the limit as n grows large. I don’t see how to prove this, though.

[Edit 2021-02-20: It turns out that the above conjecture is incorrect; thanks to Michael Earnest, who in the comments shows that the expected value is in fact \sqrt{\pi n}\ln{2} - \frac{1}{2} + O (n^{-1/2}), referencing an interesting paper by de Bruijn, Knuth, and S. O. Rice. Interestingly, that constant multiplier in the \sqrt{n} term is approximately 1.22857, slightly larger than my incorrect guess of \sqrt{3/2} which is about 1.22474. Thanks, Michael!]

References:

  1. Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, Involve, 7(5) 2014, p. 691-697. [PDF]
  2. Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
  3. OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences [A225177]

The hot hand

The wager described in the previous post was motivated by an interesting recent paper (Miller and Sanjurjo, reference below) discussing the hot hand, or the perception in some sports, particularly basketball, that a player with a streak of successful shots (“He’s heating up!”) has an increased probability of making a subsequent shot and continuing the streak (“He’s on fire!”).

Whether being on a streak yourself, or trying to defend a player on a streak, on the court this certainly feels like a real phenomenon. But is the hot hand a real effect, or just another example of our human tendency to see patterns in randomness?

A famous 1985 paper (Gilovich, Vallone, and Tversky, reference below) argued the latter, analyzing the proportion of successful shots immediately following a streak of three made shots in various settings (NBA field goals, free throws, and a controlled experiment with college players). Not finding any significant increase in proportion of “streak-extending” shots made, the apparent conclusion would be that a past streak has no effect on current success.

But that’s where this puzzle comes in: even if basketball shots are truly iid with success probability p, we should expect a negative bias in the proportion of shots made following a streak, at least compared to the intuitively expected proportion p. Miller and Sanjurjo argue that the absence of this bias in the 1985 data suggests that the hot hand is not just “a cognitive illusion.”

Both papers are interesting reads. In presenting the problem here as a gambling wager, I simplified things somewhat down to a “win, lose, or push” outcome (i.e., were there more streak-extending successes than failures, or fewer), since the resulting exact expected return can be computed more efficiently than the expected proportion of successes following a streak:

Given n remaining trials (basketball shots, coin flips, whatever) with success probability p, noting outcomes following streaks of length s, and winning (losing) the overall wager if the number of streak-extending successes is greater (less) than the number of streak-ending failures, the expected return is v(n,0,0), computed recursively via

v(0,r,w) = sgn(w)

v(n,s,w) = p v(n-1,s,w+1)+(1-p)v(n-1,0,w-1)

v(n,r,w) = p v(n-1,r+1,w)+(1-p)v(n-1,0,w)

where, using the setup in the previous post where we flip n=100 fair coins with probability p=1/2 of heads, looking for heads extending streaks of length s=4, the expected return v(100,0,0) is about -0.150825.

References:

  1. Gilovich, T., Vallone, R., and Tversky, A., The Hot Hand in Basketball: On the Misperception of Random Sequences, Cognitive Psychology, 17(3) July 1985, p. 295-314 [PDF]
  2. Miller, Joshua B. and Sanjurjo, Adam, Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers, Econometrica, 86(6) November 2018, p. 2019-2047 [arXiv]

Gambler’s fallacy fallacy?

The gambler’s fallacy is the belief that roulette wheels, dice, etc., have “memory,” so that, for example, having observed an unlikely streak of losses, the probability that the next outcome will be a win has increased, as a correction toward the expected long-term trend. The Wikipedia page provides a good example involving repeatedly flipping a fair coin:

“If after tossing four heads in a row, the next coin toss also came up heads, it would complete a run of five successive heads. Since the probability of a run of five successive heads is [only] 1/32, a person might believe that the next flip would be more likely to come up tails rather than heads again. This is incorrect and is an example of the gambler’s fallacy.”

That is, having observed a streak of four heads in a row, we are actually just as likely to observe heads again on the subsequent fifth flip as we are to observe tails. Similarly, even after betting on red at the roulette wheel and losing four times in a row, we should still expect to win a fifth such bet on red the same stubborn 18/38 of the time (assuming a typical double-zero American wheel).

Right?

So, here is what I think is an interesting puzzle: let’s play a game. I will flip a fair coin n=100 times, and prior to each flip, if you have observed a current streak of s=4 or more consecutive heads, then make a note of the outcome of the subsequent flip. After all 100 coin flips, tally the noted “streak-following” flips: if there are more heads than tails, then I pay you one dollar. If there are more tails than heads, then you pay me one dollar. (If there are an equal number of heads and tails, then we push.)

If the gambler’s fallacy is indeed a fallacy, then shouldn’t this be a fair bet, i.e., with net zero expected return? But I claim that I have a significant advantage in this game, taking more than 15 cents from you on average every time we play! Following a streak of heads, we expect to observe a much larger proportion of “trend-correcting” tails than “streak-extending” heads.

And there is nothing special or tricky about this particular setup. Try this experiment with a different number n of coin flips, or a longer or shorter “target” streak length s, or even a roulette-like bias on the coin flip probability. Or instead of focusing only on streaks of consecutive heads (i.e., ignoring streaks of tails), look for streaks of either s or more heads or s or more tails, and note whether the subsequent flip is different. The effect persists: on average, we observe a larger-than-expected proportion of outcomes that tend to end the streak.