## A different sock matching problem

When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are $n$ pairs of socks, $2n$ socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let $U$ be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of $U$?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all $n$ pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as the sock matching problem in the literature; see the references below.

My athletic socks, on the other hand, are essentially $n$ identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the $n$ other “differently-handed” socks. In this case, it’s a nice problem to show that $P(U \geq u) = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}$

and thus $E[U] = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{u=1}^n \sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}$

But what I found most interesting about this problem is that $E[U]$ appears to be very well approximated by $\sqrt{\frac{3}{2}n}$, with an error that I conjecture is always less than 1/2, and approaches zero in the limit as $n$ grows large. I don’t see how to prove this, though.

[Edit 2021-02-20: It turns out that the above conjecture is incorrect; thanks to Michael Earnest, who in the comments shows that the expected value is in fact $\sqrt{\pi n}\ln{2} - \frac{1}{2} + O (n^{-1/2})$, referencing an interesting paper by de Bruijn, Knuth, and S. O. Rice. Interestingly, that constant multiplier in the $\sqrt{n}$ term is approximately 1.22857, slightly larger than my incorrect guess of $\sqrt{3/2}$ which is about 1.22474. Thanks, Michael!]

References:

1. Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, Involve, 7(5) 2014, p. 691-697. [PDF]
2. Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
3. OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences [A225177]
This entry was posted in Uncategorized. Bookmark the permalink.

### 15 Responses to A different sock matching problem

1. Wilmington says:

Folding laundry must be quite interesting at your house!
I don’t know how to derive your distribution functions (hopefully you’ll show how later), but I did mock it up in Excel and could calculate up to 16 pairs of socks. No idea how to prove it.

For the conjecture, I’d put it slightly differently: It looks like E[U] approaches SQRT(3u/2) – 1/2 with an error in the limit of zero and the largest error of 1 of 1/2 – SQRT(3/2) occurring at n = 1.

Here are the Excel functions if anyone want to play with them:

Option Explicit

Private Function PrSpaceGreaterOrEqualGuts(u As Long, n As Long) As Long
Dim k As Long
Dim s As Long: s = 0
Dim c As Long

If u = 0 Then
PrSpaceGreaterOrEqualGuts = 1
Exit Function
End If

Dim limit As Long: limit = Int(n / u)

For k = 1 To limit
c = WorksheetFunction.Combin(n + n, n – k * u)
If k Mod 2 = 1 Then
s = s + c
Else
s = s – c
End If
Next
PrSpaceGreaterOrEqualGuts = s
End Function

Private Function NormalizationFactor(n As Long) As Double
NormalizationFactor = 2# / WorksheetFunction.Combin(n + n, n)
End Function

Public Function PrSpaceGreaterOrEqual(u As Long, n As Long) As Double
PrSpaceGreaterOrEqual = NormalizationFactor(n) * PrSpaceGreaterOrEqualGuts(u, n)
End Function

Public Function E_SockSpace(n As Long) As Double
Dim u As Long
Dim s As Long: s = 0
For u = 1 To n
s = s + PrSpaceGreaterOrEqualGuts(u, n)
Next
E_SockSpace = NormalizationFactor(n) * s
End Function

• wilmington says:

typo: largest error of 1/2 – SQRT(3/2) occurring at n = 1. This is approximately 0.275

• wilmington says:

another typo: It looks like E[U] approaches SQRT(3n/2) – 1/2 with an error in the limit of zero and the largest error of 1/2 – SQRT(3/2) occurring at n = 1. The error at 1 is approximately 0.275.

• possiblywrong says:

Your code looks right, but I may misunderstand your description of the approximation error. If we use sqrt(3n/2)-1/2 as the approximation, then its error at n=1 (where E[U]=1) is sqrt(3/2)-3/2, or about -0.275. I’m guessing this is just a typo, since although your expression sqrt(3/2)-1/2 is off by 1, the decimal figure looks right (mod the sign)?

At any rate, although you are correct that offsetting the approximation by 1/2 like this does indeed reduce the magnitude of its error for small n, the resulting trend in the magnitude of that error is *increasing* for large n. See the plot linked in the comment replying to Chris.

2. Chris Wellons says:

Passes a Monte Carlo sniff test, though your estimate very consistently overestimates:

• possiblywrong says:

Right; see this plot showing the error between the approximation and the exact expected value. It appears to approach zero from above… or at least that’s my guess.

• Chris Wellons says:

Oh, duh, that makes sense. Now I understand what you meant by “approaches zero in the limit”. (Plus I wasn’t running enough simulations to see the pattern.)

3. possiblywrong says:

Sorry, that was confusing; I have edited the last paragraph of the post to try to clarify.

Re the simulation results, since we have an exact expression for the cumulative distribution function (e.g., contrast with the recent Skittles experiment, where we can only efficiently-but-accurately approximate the expected number of packs before first duplicate, but computing the distribution requires resorting to Monte Carlo estimation), we can compute the variance exactly as well. This plot shows the exact expected value in blue as before, with 2-sigma (roughly 95% containment) bounds for 65536 samples in red (the sample size that you used), and for one million samples in gray.

4. Wilmington says:

Ah! A hump for low n, then a downward trend. As I mentioned, my Excel code only worked up to n=16 or so. I was lulled by the initial trend.
What have you tried so far to prove it?
Suppose we convert it to the continuous domain? I don’t know how to finish it, but here’s my attempt to use the gamma function: https://imgur.com/a/atWQbrf
Would it be possible to apply L’Hopital’s rule?

In any case, if you can prove it, it would be tombstone worthy, (though maybe hard to express pithily) especially since it involves the numbers 3 and 2 like Archimedes had inscribed on his tombstone: https://en.wikipedia.org/wiki/On_the_Sphere_and_Cylinder

5. Michael Earnest says:

This is a super cool problem! After seeing this, I asked on math stack exchange to see if anyone had a proof of the $\sqrt{\frac32 n}$ result, but no dice so far.

One thing I figured out is that $P(U

which can be found by applying a roots-of-unity filter to evaluate the sums of evenly spaced binomial coefficients. Since this is a summation of $O(u)$ terms, I think this is bit faster to compute when $u\ll \sqrt{n}$. Using this, I computed $E[U]$ up to $n = 47,000$. It seems that $\sqrt{\frac32n}-E[U]$ becomes negative at around $n = 17,000$, and continues to decrease thereafter. Perhaps it oscillates, or perhaps it approaches some constant.

Finally, a related problem to this is finding the expected value of the maximum height of a randomly chosen Dyck path. This is discussed in , where they show this expected value is $\sqrt{\pi n}-\frac12-O(n^{-1/2}\log n)$. I think that similar method can solve this problem, but I do not have the skills to replicate the work of Knuth and de Bruijn!

1. Bruijn, de, N. G., Knuth, D. E., & Rice, S. O. (1972). The average height of planted plane trees. In R. C. Read (Ed.), Graph Theory and Computing (pp. 15-22). Academic Press Inc. https://pure.tue.nl/ws/files/2382241/597601.pdf

• possiblywrong says:

Very interesting! I like the transformation to reduce the number of terms involved in computing $P(U. (Note that I edited the RHS of the formula in your comment, where it looked like you were using $m$ in place of the parameter $u$.)
Thanks for the pointer to the paper, I will take a look… although I won't be surprised if it's over my head. I struggle with asymptotics, where the discrete of combinatorics meets the continuous of analysis, so to speak :), and I'm much less comfortable with the latter.

6. Michael Earnest says:

Okay, I finally dove in to that paper and figured it out. It turns out that $\sqrt{3/2 n}$ is not the best approximation! Instead, $E[U] = {\ln 4\over 2}\sqrt{\pi n} - \frac12 + O(n^{-1/2})$

Note that $\sqrt{3/2}\approx 1.2247$, while ${ln 4\over 2} \sqrt{\pi}\approx 1.22857$, so you would have to use pretty large $n$ to distinguish the two with data alone.

Here is code my code I used to confirm the approximation: https://repl.it/@mearnest/Expected-Deviation-Lattice-Path#main.py

7. Michael Earnest says:

aaand, I just realized that ${\ln 4 \over 2}$ simplifies to $\ln 2$.

• possiblywrong says:

Very interesting, great work! I have edited the post to refer to your comments and point out this result. Thanks!

This site uses Akismet to reduce spam. Learn how your comment data is processed.