## A different sock matching problem

When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are $n$ pairs of socks, $2n$ socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let $U$ be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of $U$?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all $n$ pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as the sock matching problem in the literature; see the references below.

My athletic socks, on the other hand, are essentially $n$ identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the $n$ other “differently-handed” socks. In this case, it’s a nice problem to show that $P(U \geq u) = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}$

and thus $E[U] = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{u=1}^n \sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}$

But what I found most interesting about this problem is that $E[U]$ appears to be very well approximated by $\sqrt{\frac{3}{2}n}$, with an error that I conjecture is always less than 1/2, and approaches zero in the limit as $n$ grows large. I don’t see how to prove this, though.

References:

1. Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, Involve, 7(5) 2014, p. 691-697. [PDF]
2. Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
3. OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences [A225177]
This entry was posted in Uncategorized. Bookmark the permalink.

### 9 Responses to A different sock matching problem

1. Wilmington says:

Folding laundry must be quite interesting at your house!
I don’t know how to derive your distribution functions (hopefully you’ll show how later), but I did mock it up in Excel and could calculate up to 16 pairs of socks. No idea how to prove it.

For the conjecture, I’d put it slightly differently: It looks like E[U] approaches SQRT(3u/2) – 1/2 with an error in the limit of zero and the largest error of 1 of 1/2 – SQRT(3/2) occurring at n = 1.

Here are the Excel functions if anyone want to play with them:

Option Explicit

Private Function PrSpaceGreaterOrEqualGuts(u As Long, n As Long) As Long
Dim k As Long
Dim s As Long: s = 0
Dim c As Long

If u = 0 Then
PrSpaceGreaterOrEqualGuts = 1
Exit Function
End If

Dim limit As Long: limit = Int(n / u)

For k = 1 To limit
c = WorksheetFunction.Combin(n + n, n – k * u)
If k Mod 2 = 1 Then
s = s + c
Else
s = s – c
End If
Next
PrSpaceGreaterOrEqualGuts = s
End Function

Private Function NormalizationFactor(n As Long) As Double
NormalizationFactor = 2# / WorksheetFunction.Combin(n + n, n)
End Function

Public Function PrSpaceGreaterOrEqual(u As Long, n As Long) As Double
PrSpaceGreaterOrEqual = NormalizationFactor(n) * PrSpaceGreaterOrEqualGuts(u, n)
End Function

Public Function E_SockSpace(n As Long) As Double
Dim u As Long
Dim s As Long: s = 0
For u = 1 To n
s = s + PrSpaceGreaterOrEqualGuts(u, n)
Next
E_SockSpace = NormalizationFactor(n) * s
End Function

• wilmington says:

typo: largest error of 1/2 – SQRT(3/2) occurring at n = 1. This is approximately 0.275

• wilmington says:

another typo: It looks like E[U] approaches SQRT(3n/2) – 1/2 with an error in the limit of zero and the largest error of 1/2 – SQRT(3/2) occurring at n = 1. The error at 1 is approximately 0.275.

• possiblywrong says:

Your code looks right, but I may misunderstand your description of the approximation error. If we use sqrt(3n/2)-1/2 as the approximation, then its error at n=1 (where E[U]=1) is sqrt(3/2)-3/2, or about -0.275. I’m guessing this is just a typo, since although your expression sqrt(3/2)-1/2 is off by 1, the decimal figure looks right (mod the sign)?

At any rate, although you are correct that offsetting the approximation by 1/2 like this does indeed reduce the magnitude of its error for small n, the resulting trend in the magnitude of that error is *increasing* for large n. See the plot linked in the comment replying to Chris.

2. Chris Wellons says:

Passes a Monte Carlo sniff test, though your estimate very consistently overestimates:

• possiblywrong says:

Right; see this plot showing the error between the approximation and the exact expected value. It appears to approach zero from above… or at least that’s my guess.

• Chris Wellons says:

Oh, duh, that makes sense. Now I understand what you meant by “approaches zero in the limit”. (Plus I wasn’t running enough simulations to see the pattern.)

3. possiblywrong says:

Sorry, that was confusing; I have edited the last paragraph of the post to try to clarify.

Re the simulation results, since we have an exact expression for the cumulative distribution function (e.g., contrast with the recent Skittles experiment, where we can only efficiently-but-accurately approximate the expected number of packs before first duplicate, but computing the distribution requires resorting to Monte Carlo estimation), we can compute the variance exactly as well. This plot shows the exact expected value in blue as before, with 2-sigma (roughly 95% containment) bounds for 65536 samples in red (the sample size that you used), and for one million samples in gray.

4. Wilmington says:

Ah! A hump for low n, then a downward trend. As I mentioned, my Excel code only worked up to n=16 or so. I was lulled by the initial trend.
What have you tried so far to prove it?
Suppose we convert it to the continuous domain? I don’t know how to finish it, but here’s my attempt to use the gamma function: https://imgur.com/a/atWQbrf
Would it be possible to apply L’Hopital’s rule?

In any case, if you can prove it, it would be tombstone worthy, (though maybe hard to express pithily) especially since it involves the numbers 3 and 2 like Archimedes had inscribed on his tombstone: https://en.wikipedia.org/wiki/On_the_Sphere_and_Cylinder

This site uses Akismet to reduce spam. Learn how your comment data is processed.