When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are $n$ pairs of socks, $2n$ socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let $U$ be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of $U$?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as *the* sock matching problem in the literature; see the references below.
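This distinct-pairs version is easy to simulate; here is a quick Python sketch (my own code, not from the references; the function name is made up) of one pass through the basket:

```python
import random

def max_unmatched(n, rng=random):
    """Draw 2n socks (n distinct pairs) in random order; return the
    maximum number of unmatched socks set aside at any point."""
    socks = [pair for pair in range(n) for _ in range(2)]
    rng.shuffle(socks)
    aside = set()  # pair labels of socks currently set aside
    worst = 0
    for pair in socks:
        if pair in aside:
            aside.remove(pair)  # mate found: fold and put away
        else:
            aside.add(pair)     # no mate yet: set it aside
        worst = max(worst, len(aside))
    return worst
```

Averaging many calls to `max_unmatched(n)` estimates the expected maximum for the distinct-pairs problem.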

My athletic socks, on the other hand, are essentially identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the other “differently-handed” socks. In this case, it’s a nice problem to show that

$$P(U \geq u) = \frac{2}{\binom{2n}{n}} \sum_{k=1}^{\lfloor n/u \rfloor} (-1)^{k+1} \binom{2n}{n-ku}$$

and thus

$$E[U] = \frac{2}{\binom{2n}{n}} \sum_{u=1}^{n} \sum_{k=1}^{\lfloor n/u \rfloor} (-1)^{k+1} \binom{2n}{n-ku}$$

But what I found most interesting about this problem is that $E[U]$ appears to be very well approximated by $\sqrt{3n/2}$, with an error that I conjecture is always less than 1/2, and approaches zero in the limit as $n$ grows large. I don’t see how to prove this, though.
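A direct translation of these formulas into Python (my own sketch; names are mine), using exact rational arithmetic so that large $n$ poses no difficulty:

```python
from fractions import Fraction
from math import comb

def pr_at_least(u, n):
    """P(U >= u) for n identical left/right pairs, via the alternating
    sum 2/C(2n,n) * sum over k of (-1)^(k+1) * C(2n, n - k*u)."""
    if u == 0:
        return Fraction(1)
    s = sum((-1) ** (k + 1) * comb(2 * n, n - k * u)
            for k in range(1, n // u + 1))
    return Fraction(2 * s, comb(2 * n, n))

def expected_max(n):
    """E[U] = sum over u >= 1 of P(U >= u)."""
    return sum(pr_at_least(u, n) for u in range(1, n + 1))
```

For example, `expected_max(2)` returns `Fraction(4, 3)`, which matches a hand count of the six equally likely arrangements of two pairs.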

[*Edit 2021-02-20*: It turns out that the above conjecture is incorrect; thanks to Michael Earnest, who in the comments shows that the expected value is in fact $\sqrt{\pi n}\,\ln 2 + O(1)$, referencing an interesting paper by de Bruijn, Knuth, and S. O. Rice. Interestingly, that constant multiplier $\sqrt{\pi}\,\ln 2$ in the $\sqrt{n}$ term is approximately 1.22857, slightly larger than my incorrect guess of $\sqrt{3/2}$, which is about 1.22474. Thanks, Michael!]

**References:**

- Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, *Involve*, **7**(5) 2014, pp. 691-697. [PDF]
- Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
- OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences, [A225177].


Folding laundry must be quite interesting at your house!

I don’t know how to derive your distribution functions (hopefully you’ll show how later), but I did mock it up in Excel and could calculate up to 16 pairs of socks. No idea how to prove it.

For the conjecture, I’d put it slightly differently: It looks like E[U] approaches SQRT(3u/2) – 1/2 with an error in the limit of zero and the largest error of 1 of 1/2 – SQRT(3/2) occurring at n = 1.

Here are the Excel VBA functions if anyone wants to play with them:

Option Explicit

Private Function PrSpaceGreaterOrEqualGuts(u As Long, n As Long) As Long
    Dim k As Long
    Dim s As Long: s = 0
    Dim c As Long
    If u = 0 Then
        PrSpaceGreaterOrEqualGuts = 1
        Exit Function
    End If
    Dim limit As Long: limit = Int(n / u)
    For k = 1 To limit
        c = WorksheetFunction.Combin(n + n, n - k * u)
        If k Mod 2 = 1 Then
            s = s + c
        Else
            s = s - c
        End If
    Next
    PrSpaceGreaterOrEqualGuts = s
End Function

Private Function NormalizationFactor(n As Long) As Double
    NormalizationFactor = 2# / WorksheetFunction.Combin(n + n, n)
End Function

Public Function PrSpaceGreaterOrEqual(u As Long, n As Long) As Double
    PrSpaceGreaterOrEqual = NormalizationFactor(n) * PrSpaceGreaterOrEqualGuts(u, n)
End Function

Public Function E_SockSpace(n As Long) As Double
    Dim u As Long
    Dim s As Long: s = 0
    For u = 1 To n
        s = s + PrSpaceGreaterOrEqualGuts(u, n)
    Next
    E_SockSpace = NormalizationFactor(n) * s
End Function

typo: largest error of 1/2 – SQRT(3/2) occurring at n = 1. This is approximately 0.275

another typo: It looks like E[U] approaches SQRT(3n/2) – 1/2 with an error in the limit of zero and the largest error of 1/2 – SQRT(3/2) occurring at n = 1. The error at 1 is approximately 0.275.

Your code looks right, but I may misunderstand your description of the approximation error. If we use sqrt(3n/2)-1/2 as the approximation, then its error at n=1 (where E[U]=1) is sqrt(3/2)-3/2, or about -0.275. I’m guessing this is just a typo, since although your expression sqrt(3/2)-1/2 is off by 1, the decimal figure looks right (mod the sign)?

At any rate, although you are correct that offsetting the approximation by 1/2 like this does indeed reduce the magnitude of its error for small n, the resulting trend in the magnitude of that error is *increasing* for large n. See the plot linked in the comment replying to Chris.

Passes a Monte Carlo sniff test, though your estimate very consistently overestimates:

Code: https://pastebin.com/i32Z3E7g

Output: https://pastebin.com/HZbeX5xW

Right; see this plot showing the error between the approximation and the exact expected value. It appears to approach zero from above… or at least that’s my guess.

Oh, duh, that makes sense. Now I understand what you meant by “approaches zero in the limit”. (Plus I wasn’t running enough simulations to see the pattern.)

Sorry, that was confusing; I have edited the last paragraph of the post to try to clarify.

Re the simulation results, since we have an exact expression for the cumulative distribution function (e.g., contrast with the recent Skittles experiment, where we can only efficiently-but-accurately approximate the expected number of packs before first duplicate, but computing the distribution requires resorting to Monte Carlo estimation), we can compute the variance exactly as well. This plot shows the exact expected value in blue as before, with 2-sigma (roughly 95% containment) bounds for 65536 samples in red (the sample size that you used), and for one million samples in gray.
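Concretely, that variance computation can be sketched in Python as follows (my own code, not from the original post; it assumes the alternating-sum formula for the tail probabilities $P(U \geq u)$, and uses the identity $E[U^2] = \sum_{u \geq 1} (2u-1) P(U \geq u)$ for a nonnegative integer random variable):

```python
from fractions import Fraction
from math import comb

def tail(u, n):
    """P(U >= u) from the alternating binomial sum."""
    s = sum((-1) ** (k + 1) * comb(2 * n, n - k * u)
            for k in range(1, n // u + 1))
    return Fraction(2 * s, comb(2 * n, n))

def mean_and_variance(n):
    """Exact E[U] and Var(U), using E[U] = sum of P(U >= u) and
    E[U^2] = sum of (2u - 1) * P(U >= u)."""
    tails = [tail(u, n) for u in range(1, n + 1)]
    m = sum(tails)
    m2 = sum((2 * u - 1) * t for u, t in zip(range(1, n + 1), tails))
    return m, m2 - m * m
```

The 2-sigma bounds for $k$ samples are then `m +/- 2*sqrt(var/k)`.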

Ah! A hump for low n, then a downward trend. As I mentioned, my Excel code only worked up to n=16 or so. I was lulled by the initial trend.

What have you tried so far to prove it?

Suppose we convert it to the continuous domain? I don’t know how to finish it, but here’s my attempt to use the gamma function: https://imgur.com/a/atWQbrf

Would it be possible to apply L’Hopital’s rule?

In any case, if you can prove it, it would be tombstone worthy (though maybe hard to express pithily), especially since it involves the numbers 3 and 2 like Archimedes had inscribed on his tombstone: https://en.wikipedia.org/wiki/On_the_Sphere_and_Cylinder

This is a super cool problem! After seeing this, I asked on math stack exchange to see if anyone had a proof of the result, but no dice so far.

One thing I figured out is that

$$P(U \geq u) = 1 - \frac{4^n}{u\binom{2n}{n}} \sum_{\substack{j \text{ odd} \\ 1 \leq j < 2u}} \cos^{2n}\!\left(\frac{\pi j}{2u}\right)$$

which can be found by applying a roots-of-unity filter to evaluate the sums of evenly spaced binomial coefficients. Since this is a summation of $O(u)$ terms rather than $O(n/u)$, I think this is a bit faster to compute for small $u$, and it avoids the huge binomial coefficients. Using this, I computed $E[U]$ for much larger $n$. It seems that the error $\sqrt{3n/2} - E[U]$ eventually becomes negative, and continues to decrease thereafter. Perhaps it oscillates, or perhaps it approaches some constant.
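Here is a quick numerical cross-check (Python, my own sketch) that the roots-of-unity form, $1 - \frac{4^n}{u\binom{2n}{n}}\sum_{\text{odd } j < 2u} \cos^{2n}(\pi j/2u)$, agrees with the alternating binomial sum from the post:

```python
from math import comb, cos, pi

def tail_alternating(u, n):
    """P(U >= u) via the alternating sum of binomial coefficients."""
    s = sum((-1) ** (k + 1) * comb(2 * n, n - k * u)
            for k in range(1, n // u + 1))
    return 2 * s / comb(2 * n, n)

def tail_cosine(u, n):
    """P(U >= u) = 1 - (4^n / (u C(2n,n))) * sum of cos^(2n)(pi j / 2u)
    over odd j in [1, 2u), an O(u)-term sum from the roots-of-unity filter."""
    s = sum(cos(pi * j / (2 * u)) ** (2 * n) for j in range(1, 2 * u, 2))
    return 1 - 4 ** n * s / (u * comb(2 * n, n))
```

Comparing `tail_alternating(u, n)` against `tail_cosine(u, n)` over, say, all $1 \leq u \leq n \leq 25$ shows agreement to within floating-point error.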

Finally, a related problem to this is finding the expected value of the maximum height of a randomly chosen Dyck path. This is discussed in [1], where they show this expected value is $\sqrt{\pi n} + O(1)$. I think that a similar method can solve this problem, but I do not have the skills to replicate the work of Knuth and de Bruijn!

1. de Bruijn, N. G., Knuth, D. E., & Rice, S. O. (1972). The average height of planted plane trees. In R. C. Read (Ed.), Graph Theory and Computing (pp. 15-22). Academic Press Inc. https://pure.tue.nl/ws/files/2382241/597601.pdf
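For what it's worth, the expected height itself is easy to compute exactly for small $n$ with a capped-walk dynamic program (a quick Python sketch of mine, not from the paper):

```python
from fractions import Fraction
from math import comb

def dyck_count_height_at_most(n, h):
    """Count Dyck paths of semilength n that never exceed height h."""
    ways = [0] * (h + 1)
    ways[0] = 1
    for _ in range(2 * n):
        new = [0] * (h + 1)
        for y, w in enumerate(ways):
            if w:
                if y + 1 <= h:
                    new[y + 1] += w  # up step
                if y >= 1:
                    new[y - 1] += w  # down step
        ways = new
    return ways[0]

def mean_dyck_height(n):
    """E[height] = sum over h >= 1 of P(height >= h), computed exactly."""
    catalan = comb(2 * n, n) // (n + 1)
    return sum(1 - Fraction(dyck_count_height_at_most(n, h - 1), catalan)
               for h in range(1, n + 1))
```

Comparing `float(mean_dyck_height(n))` against `sqrt(pi * n)` for moderate $n$ shows the leading-order trend.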

Very interesting! I like the transformation to reduce the number of terms involved in computing $P(U \geq u)$. (Note that I edited the RHS of the formula in your comment, where it looked like another symbol was being used in place of the parameter $u$.)

Thanks for the pointer to the paper, I will take a look… although I won't be surprised if it's over my head. I struggle with asymptotics, where the discrete of combinatorics meets the continuous of analysis, so to speak :), and I'm much less comfortable with the latter.


Okay, I finally dove in to that paper and figured it out. It turns out that $\sqrt{3n/2}$ is not the best approximation! Instead,

$$E[U] = \sqrt{\pi n}\,\ln 2 + O(1)$$

Note that $\sqrt{\pi}\,\ln 2 \approx 1.22857$, while $\sqrt{3/2} \approx 1.22474$, so you would have to use pretty large $n$ to distinguish the two with data alone.

Here is the code I used to confirm the approximation: https://repl.it/@mearnest/Expected-Deviation-Lattice-Path#main.py

aaand, I just realized that the constant factor simplifies to $\sqrt{\pi}\,\ln 2$.

Very interesting, great work! I have edited the post to refer to your comments and point out this result. Thanks!