When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are $n$ pairs of socks, $2n$ socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let $U$ be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of $U$?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as *the* sock matching problem in the literature; see the references below.
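To make the distinct-pairs process concrete, here is a minimal simulation sketch in Python (the function name and setup are my own illustration, not code from the post):

```python
import random

def max_unmatched_distinct(n, rng=random):
    """Simulate one basket with n distinct pairs of socks; return the
    maximum number of unmatched socks set aside during the process."""
    socks = [pair for pair in range(n) for _ in range(2)]  # two socks per pair
    rng.shuffle(socks)
    aside = set()  # pair labels currently set aside, unmatched
    peak = 0
    for pair in socks:
        if pair in aside:
            aside.remove(pair)  # fold the matching pair, put it away
        else:
            aside.add(pair)     # set the unmatched sock aside
            peak = max(peak, len(aside))
    return peak

# Example: Monte Carlo estimate of E[U] for n = 10 distinct pairs.
rng = random.Random(17)
samples = [max_unmatched_distinct(10, rng) for _ in range(10000)]
print(sum(samples) / len(samples))
```

Each draw either matches a set-aside sock (shrinking the pile) or joins it, so tracking the pile's peak size is exactly the quantity $U$ described above.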

My athletic socks, on the other hand, are essentially identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the other “differently-handed” socks. In this case, it’s a nice problem to show that

$$P(U \geq u) = \frac{2}{\binom{2n}{n}} \sum_{k=1}^{\lfloor n/u \rfloor} (-1)^{k+1} \binom{2n}{n-ku}$$

and thus

$$E[U] = \frac{2}{\binom{2n}{n}} \sum_{u=1}^{n} \sum_{k=1}^{\lfloor n/u \rfloor} (-1)^{k+1} \binom{2n}{n-ku}$$

But what I found most interesting about this problem is that $E[U]$ appears to be very well approximated by $\sqrt{3n/2}$, with an error that I conjecture is always less than 1/2, and that approaches zero in the limit as $n$ grows large. I don’t see how to prove this, though.
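As a sanity check, the tail probabilities and expected value can be computed exactly with a short Python sketch (function names are mine; the alternating binomial sum mirrors the VBA code in the comments below):

```python
from fractions import Fraction
from math import comb, sqrt

def pr_at_least(u, n):
    """Exact P(U >= u) for n identical (left/right) pairs:
    2/C(2n,n) * sum_k (-1)^(k+1) * C(2n, n - k*u)."""
    total = sum((-1) ** (k + 1) * comb(2 * n, n - k * u)
                for k in range(1, n // u + 1))
    return Fraction(2 * total, comb(2 * n, n))

def expected_max_unmatched(n):
    """E[U] = sum over u >= 1 of P(U >= u), as an exact rational."""
    return sum(pr_at_least(u, n) for u in range(1, n + 1))

# Compare the exact E[U] with the conjectured approximation sqrt(3n/2).
for n in (1, 2, 4, 8, 16):
    e = expected_max_unmatched(n)
    print(n, float(e), sqrt(1.5 * n), float(e) - sqrt(1.5 * n))
```

Using `fractions.Fraction` keeps everything exact, so there is no floating-point concern even for large $n$.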

**References:**

- Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, *Involve*, **7**(5), 2014, pp. 691–697. [PDF]
- Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
- OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences. [A225177]


Folding laundry must be quite interesting at your house!

I don’t know how to derive your distribution functions (hopefully you’ll show how later), but I did mock it up in Excel and could calculate up to 16 pairs of socks. No idea how to prove it.

For the conjecture, I’d put it slightly differently: It looks like E[U] approaches SQRT(3u/2) – 1/2 with an error in the limit of zero and the largest error of 1 of 1/2 – SQRT(3/2) occurring at n = 1.

Here are the Excel functions if anyone wants to play with them:

```vb
Option Explicit

' Unnormalized P(U >= u): alternating sum of binomial coefficients,
' to be scaled by 2 / C(2n, n). Uses Double to avoid the Long overflow
' in C(2n, n) that limited the original version to about n = 16.
Private Function PrSpaceGreaterOrEqualGuts(u As Long, n As Long) As Double
    Dim k As Long
    Dim s As Double: s = 0
    Dim c As Double

    If u = 0 Then
        ' P(U >= 0) = 1, so return C(2n, n) / 2 to cancel the normalization.
        PrSpaceGreaterOrEqualGuts = WorksheetFunction.Combin(n + n, n) / 2
        Exit Function
    End If

    Dim limit As Long: limit = Int(n / u)
    For k = 1 To limit
        c = WorksheetFunction.Combin(n + n, n - k * u)
        If k Mod 2 = 1 Then
            s = s + c
        Else
            s = s - c
        End If
    Next

    PrSpaceGreaterOrEqualGuts = s
End Function

Private Function NormalizationFactor(n As Long) As Double
    NormalizationFactor = 2# / WorksheetFunction.Combin(n + n, n)
End Function

' P(U >= u) for n pairs of socks.
Public Function PrSpaceGreaterOrEqual(u As Long, n As Long) As Double
    PrSpaceGreaterOrEqual = NormalizationFactor(n) * PrSpaceGreaterOrEqualGuts(u, n)
End Function

' E[U] = sum over u >= 1 of P(U >= u).
Public Function E_SockSpace(n As Long) As Double
    Dim u As Long
    Dim s As Double: s = 0
    For u = 1 To n
        s = s + PrSpaceGreaterOrEqualGuts(u, n)
    Next
    E_SockSpace = NormalizationFactor(n) * s
End Function
```

typo: largest error of 1/2 – SQRT(3/2) occurring at n = 1. This is approximately 0.275

another typo: It looks like E[U] approaches SQRT(3n/2) – 1/2 with an error in the limit of zero and the largest error of 1/2 – SQRT(3/2) occurring at n = 1. The error at 1 is approximately 0.275.

Your code looks right, but I may misunderstand your description of the approximation error. If we use sqrt(3n/2)-1/2 as the approximation, then its error at n=1 (where E[U]=1) is sqrt(3/2)-3/2, or about -0.275. I’m guessing this is just a typo, since although your expression sqrt(3/2)-1/2 is off by 1, the decimal figure looks right (mod the sign)?

At any rate, although you are correct that offsetting the approximation by 1/2 like this does indeed reduce the magnitude of its error for small n, the resulting trend in the magnitude of that error is *increasing* for large n. See the plot linked in the comment replying to Chris.

Passes a Monte Carlo sniff test, though your estimate very consistently overestimates:

Code: https://pastebin.com/i32Z3E7g

Output: https://pastebin.com/HZbeX5xW
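For readers without the pastebin links handy, a minimal Monte Carlo along the same lines (my own sketch, not the linked code) for the identical left/right pairs problem:

```python
import random

def max_unmatched_lr(n, rng):
    """One basket of n identical pairs: n 'left' and n 'right' socks.
    A drawn sock matches any set-aside sock of opposite handedness, so
    the unmatched pile is all one-handed: its size is the absolute value
    of the running left-minus-right count."""
    socks = [1] * n + [-1] * n  # +1 = left, -1 = right
    rng.shuffle(socks)
    count = 0
    peak = 0
    for s in socks:
        count += s
        peak = max(peak, abs(count))
    return peak

# Example: estimate E[U] for n = 16 pairs.
rng = random.Random(12345)
trials = 20000
estimate = sum(max_unmatched_lr(16, rng) for _ in range(trials)) / trials
print(estimate)  # compare with the exact E[U] and with sqrt(3*16/2)
```

The reduction of the process to the running $\pm 1$ count is what makes the exact lattice-path analysis possible.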

Right; see this plot showing the error between the approximation and the exact expected value. It appears to approach zero from above… or at least that’s my guess.

Oh, duh, that makes sense. Now I understand what you meant by “approaches zero in the limit”. (Plus I wasn’t running enough simulations to see the pattern.)

Sorry, that was confusing; I have edited the last paragraph of the post to try to clarify.

Re the simulation results: since we have an exact expression for the cumulative distribution function (contrast with the recent Skittles experiment, where we can efficiently and accurately approximate the expected number of packs before the first duplicate, but computing the distribution requires resorting to Monte Carlo estimation), we can compute the variance exactly as well. This plot shows the exact expected value in blue as before, with 2-sigma (roughly 95% containment) bounds for 65536 samples in red (the sample size that you used), and for one million samples in gray.
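One way to carry out that exact variance computation, sketched in Python (function names are mine; the tail-sum identity $E[U^2] = \sum_{u \ge 1} (2u-1)\,P(U \ge u)$ is standard for nonnegative integer variables):

```python
from fractions import Fraction
from math import comb

def pr_at_least(u, n):
    """Exact P(U >= u) for n identical (left/right) pairs."""
    total = sum((-1) ** (k + 1) * comb(2 * n, n - k * u)
                for k in range(1, n // u + 1))
    return Fraction(2 * total, comb(2 * n, n))

def variance_max_unmatched(n):
    """Var[U] via E[U] = sum of tails and E[U^2] = sum of (2u-1)*tails."""
    tails = [pr_at_least(u, n) for u in range(1, n + 1)]
    mean = sum(tails)
    second = sum((2 * u - 1) * t for u, t in enumerate(tails, start=1))
    return second - mean ** 2

print(variance_max_unmatched(2))  # exact rational variance for n = 2
```

With the exact variance in hand, the 2-sigma Monte Carlo containment bounds in the plot follow from the usual standard-error formula, sigma/sqrt(samples).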

Ah! A hump for low n, then a downward trend. As I mentioned, my Excel code only worked up to n=16 or so. I was lulled by the initial trend.

What have you tried so far to prove it?

Suppose we convert it to the continuous domain? I don’t know how to finish it, but here’s my attempt to use the gamma function: https://imgur.com/a/atWQbrf
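For what it’s worth, here is one standard heuristic step in that continuous-limit direction (a sketch, not a proof). For large $n$ and $j$ not too large,

$$\frac{\binom{2n}{n-j}}{\binom{2n}{n}} = \prod_{i=1}^{j} \frac{n-i+1}{n+i} \approx e^{-j^2/n},$$

so the exact tail formula becomes

$$P(U \geq u) \approx 2 \sum_{k=1}^{\infty} (-1)^{k+1} e^{-k^2 u^2 / n},$$

which is the tail of the Kolmogorov distribution evaluated at $x = u/\sqrt{2n}$. That is, $U/\sqrt{2n}$ should converge in distribution to the maximum absolute value of a Brownian bridge, whose mean is $\sqrt{\pi/2}\,\ln 2$. If this heuristic is right, it would give $E[U] \approx \ln 2\,\sqrt{\pi n} \approx 1.2286\sqrt{n}$, intriguingly close to, though not exactly equal to, $\sqrt{3n/2} \approx 1.2247\sqrt{n}$.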

Would it be possible to apply L’Hopital’s rule?

In any case, if you can prove it, it would be tombstone-worthy (though maybe hard to express pithily), especially since it involves the numbers 3 and 2, like the sphere-and-cylinder ratio Archimedes had inscribed on his tombstone: https://en.wikipedia.org/wiki/On_the_Sphere_and_Cylinder