A different sock matching problem

When I take a load of laundry from the dryer, there are socks mixed in with all of the other clothes. Suppose that there are n pairs of socks, 2n socks in total, that I match and fold by randomly drawing one sock at a time from the basket. If the sock matches one that I have already drawn, I fold the matching pair and put it away in the sock drawer. Otherwise, I set the unmatched sock aside, anticipating matching it later. How much space does this take? That is, let U be the maximum number of unmatched socks set aside at any point during this process. What is the distribution of U?

There are at least a couple of different possible problems here, depending on what constitutes a matching pair of socks. Arguably the most natural setup is that all n pairs are distinct (e.g., each pair of my dress socks is a different color), so that each individual sock has exactly one mate. This is what has been described as the sock matching problem in the literature; see the references below.
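For concreteness, the distinct-pairs process is easy to simulate; here is a quick Monte Carlo sketch in Python (the pair labels and function names are just for illustration):

```python
import random

def max_unmatched(n, rng=random):
    """One load of n distinct pairs: each sock matches exactly one mate.
    Track which pairs currently have a lone sock set aside; U is the
    maximum size of that set-aside pile during the process."""
    socks = [p for p in range(n) for _ in (0, 1)]  # two socks per pair id
    rng.shuffle(socks)
    aside = set()
    u_max = 0
    for p in socks:
        if p in aside:
            aside.remove(p)  # found its mate: fold and put away
        else:
            aside.add(p)     # set aside, to be matched later
            u_max = max(u_max, len(aside))
    return u_max

# Monte Carlo estimate of E[U] for n = 10 distinct pairs
rng = random.Random(0)
print(sum(max_unmatched(10, rng) for _ in range(20000)) / 20000)
```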

My athletic socks, on the other hand, are essentially n identical pairs, with each individual sock being distinguished only by a “left” or “right” label stitched into it, so that each sock may be matched with any of the n other “differently-handed” socks. In this case, it’s a nice problem to show that

P(U \geq u) = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}

and thus

E[U] = \frac{1}{{2n \choose n}} \cdot 2\sum\limits_{u=1}^n \sum\limits_{k=1}^{\lfloor n/u \rfloor} (-1)^{k-1}{2n \choose n-ku}

But what I found most interesting about this problem is that E[U] appears to be very well approximated by \sqrt{\frac{3}{2}n}, with an error that I conjecture is always less than 1/2, and approaches zero in the limit as n grows large. I don’t see how to prove this, though.
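The formulas above are straightforward to evaluate exactly; here is a short Python sketch (function names are mine) that computes E[U] via the tail sums and compares it with sqrt(3n/2):

```python
from math import comb, sqrt

def p_ge(u, n):
    """P(U >= u) for the left/right variant, per the formula above."""
    s = sum((-1) ** (k - 1) * comb(2 * n, n - k * u) for k in range(1, n // u + 1))
    return 2 * s / comb(2 * n, n)

def expected_U(n):
    """E[U] = sum of the tail probabilities P(U >= u) for u = 1..n."""
    return sum(p_ge(u, n) for u in range(1, n + 1))

for n in (1, 10, 100):
    print(n, expected_U(n), sqrt(1.5 * n))
```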

[Edit 2021-02-20: It turns out that the above conjecture is incorrect; thanks to Michael Earnest, who in the comments shows that the expected value is in fact \sqrt{\pi n}\ln{2} - \frac{1}{2} + O(n^{-1/2}), referencing an interesting paper by de Bruijn, Knuth, and Rice. Interestingly, the constant multiplier on the \sqrt{n} term is approximately 1.22857, slightly larger than my incorrect guess of \sqrt{3/2}, which is about 1.22474. Thanks, Michael!]


  1. Gilliand, S., Johnson, C., Rush, S., and Wood, D., The sock matching problem, Involve, 7(5), 2014, pp. 691–697. [PDF]
  2. Pantic, B. and Bodroza-Pantic, O., A brief overview of the sock matching problem. [arXiv]
  3. OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences. [A225177]

15 Responses to A different sock matching problem

  1. Wilmington says:

    Folding laundry must be quite interesting at your house!
    I don’t know how to derive your distribution functions (hopefully you’ll show how later), but I did mock it up in Excel and could calculate up to 16 pairs of socks. No idea how to prove it.

    For the conjecture, I’d put it slightly differently: It looks like E[U] approaches SQRT(3u/2) – 1/2 with an error in the limit of zero and the largest error of 1 of 1/2 – SQRT(3/2) occurring at n = 1.

    Here are the Excel functions if anyone wants to play with them:

    Option Explicit

    ' Unnormalized tail sum: C(2n,n)/2 * P(U >= u). Note that Long
    ' arithmetic overflows around n = 17, since C(34,17) > 2^31.
    Private Function PrSpaceGreaterOrEqualGuts(u As Long, n As Long) As Long
        Dim k As Long
        Dim s As Long: s = 0
        Dim c As Long

        If u = 0 Then
            ' P(U >= 0) = 1, so the unnormalized value is C(2n,n)/2
            PrSpaceGreaterOrEqualGuts = WorksheetFunction.Combin(n + n, n) / 2
            Exit Function
        End If

        Dim limit As Long: limit = Int(n / u)

        For k = 1 To limit
            c = WorksheetFunction.Combin(n + n, n - k * u)
            If k Mod 2 = 1 Then
                s = s + c
            Else
                s = s - c
            End If
        Next k
        PrSpaceGreaterOrEqualGuts = s
    End Function

    Private Function NormalizationFactor(n As Long) As Double
        NormalizationFactor = 2# / WorksheetFunction.Combin(n + n, n)
    End Function

    Public Function PrSpaceGreaterOrEqual(u As Long, n As Long) As Double
        PrSpaceGreaterOrEqual = NormalizationFactor(n) * PrSpaceGreaterOrEqualGuts(u, n)
    End Function

    Public Function E_SockSpace(n As Long) As Double
        Dim u As Long
        Dim s As Long: s = 0
        For u = 1 To n
            s = s + PrSpaceGreaterOrEqualGuts(u, n)
        Next u
        E_SockSpace = NormalizationFactor(n) * s
    End Function

    • wilmington says:

      typo: largest error of 1/2 – SQRT(3/2) occurring at n = 1. This is approximately 0.275

      • wilmington says:

        another typo: It looks like E[U] approaches SQRT(3n/2) – 1/2 with an error in the limit of zero and the largest error of 1/2 – SQRT(3/2) occurring at n = 1. The error at 1 is approximately 0.275.

    • Your code looks right, but I may misunderstand your description of the approximation error. If we use sqrt(3n/2)-1/2 as the approximation, then its error at n=1 (where E[U]=1) is sqrt(3/2)-3/2, or about -0.275. I’m guessing this is just a typo, since although your expression sqrt(3/2)-1/2 is off by 1, the decimal figure looks right (mod the sign)?

      At any rate, although you are correct that offsetting the approximation by 1/2 like this does indeed reduce the magnitude of its error for small n, the resulting trend in the magnitude of that error is *increasing* for large n. See the plot linked in the comment replying to Chris.

  2. Passes a Monte Carlo sniff test, though your estimate very consistently overestimates:

    Code: https://pastebin.com/i32Z3E7g
    Output: https://pastebin.com/HZbeX5xW

  3. Sorry, that was confusing; I have edited the last paragraph of the post to try to clarify.

    Re the simulation results, since we have an exact expression for the cumulative distribution function (contrast with the recent Skittles experiment, where we can efficiently and accurately approximate the expected number of packs before the first duplicate, but computing the distribution requires resorting to Monte Carlo estimation), we can compute the variance exactly as well. This plot shows the exact expected value in blue as before, with 2-sigma (roughly 95% containment) bounds in red for 65536 samples (the sample size that you used), and in gray for one million samples.
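    For example, both moments follow from the tail-sum identities E[U] = sum_u P(U >= u) and E[U^2] = sum_u (2u-1) P(U >= u) for a nonnegative integer random variable; a Python sketch (function names are mine):

    ```python
    from math import comb

    def p_ge(u, n):
        """P(U >= u) from the formula in the post."""
        s = sum((-1) ** (k - 1) * comb(2 * n, n - k * u) for k in range(1, n // u + 1))
        return 2 * s / comb(2 * n, n)

    def mean_and_variance(n):
        """E[U] = sum_u P(U >= u) and E[U^2] = sum_u (2u - 1) P(U >= u);
        the variance is then E[U^2] - E[U]^2, all computed exactly."""
        tails = [p_ge(u, n) for u in range(1, n + 1)]
        mean = sum(tails)
        second_moment = sum((2 * u - 1) * t for u, t in zip(range(1, n + 1), tails))
        return mean, second_moment - mean ** 2

    print(mean_and_variance(16))
    ```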

  4. Wilmington says:

    Ah! A hump for low n, then a downward trend. As I mentioned, my Excel code only worked up to n=16 or so. I was lulled by the initial trend.
    What have you tried so far to prove it?
    Suppose we convert it to the continuous domain? I don’t know how to finish it, but here’s my attempt to use the gamma function: https://imgur.com/a/atWQbrf
    Would it be possible to apply L’Hopital’s rule?

    In any case, if you can prove it, it would be tombstone-worthy (though maybe hard to express pithily), especially since it involves the numbers 3 and 2, like Archimedes had inscribed on his tombstone: https://en.wikipedia.org/wiki/On_the_Sphere_and_Cylinder

  5. Michael Earnest says:

    This is a super cool problem! After seeing this, I asked on math stack exchange to see if anyone had a proof of the \sqrt{\frac32 n} result, but no dice so far.

    One thing I figured out is that

    P(U<u) = \frac1{\binom{2n}n}\cdot\frac2u\cdot 2^{2n}\sum_{k=1}^{\lfloor u/2\rfloor} \bigg[\cos\left(\frac{(2k-1)\pi}{2u}\right)\bigg]^{2n}

    which can be found by applying a roots-of-unity filter to evaluate the sums of evenly spaced binomial coefficients. Since this is a summation of O(u) terms, I think it is a bit faster to compute when u\ll \sqrt{n}. Using this, I computed E[U] up to n = 47,000. It seems that \sqrt{\frac32 n}-E[U] becomes negative at around n = 17,000, and continues to decrease thereafter. Perhaps it oscillates, or perhaps it approaches some constant.
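    One way to sanity-check this closed form is to compare it numerically against the alternating binomial sum from the post; a Python sketch (function names are illustrative):

    ```python
    from math import comb, cos, pi

    def p_lt_cosine(u, n):
        """P(U < u) via the roots-of-unity filter: a sum of only
        O(u) cosine powers instead of O(n/u) binomial coefficients."""
        s = sum(cos((2 * k - 1) * pi / (2 * u)) ** (2 * n) for k in range(1, u // 2 + 1))
        return (2 / u) * 4 ** n * s / comb(2 * n, n)

    def p_lt_binomial(u, n):
        """P(U < u) = 1 - P(U >= u), using the alternating sum from the post."""
        s = sum((-1) ** (k - 1) * comb(2 * n, n - k * u) for k in range(1, n // u + 1))
        return 1 - 2 * s / comb(2 * n, n)

    for n in (5, 20):
        for u in range(2, n + 1):
            assert abs(p_lt_cosine(u, n) - p_lt_binomial(u, n)) < 1e-9
    print("cosine and binomial forms agree")
    ```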

    Finally, a related problem is finding the expected value of the maximum height of a randomly chosen Dyck path. This is discussed in [1], where they show this expected value is \sqrt{\pi n}-\frac12-O(n^{-1/2}\log n). I think a similar method can solve this problem, but I do not have the skills to replicate the work of Knuth and de Bruijn!

    1. de Bruijn, N. G., Knuth, D. E., and Rice, S. O. (1972). The average height of planted plane trees. In R. C. Read (Ed.), Graph Theory and Computing (pp. 15–22). Academic Press. https://pure.tue.nl/ws/files/2382241/597601.pdf

    • Very interesting! I like the transformation to reduce the number of terms involved in computing P(U<u). (Note that I edited the RHS of the formula in your comment, where it looked like you were using m in place of the parameter u.)
      Thanks for the pointer to the paper, I will take a look… although I won't be surprised if it's over my head. I struggle with asymptotics, where the discrete of combinatorics meets the continuous of analysis, so to speak :), and I'm much less comfortable with the latter.

  6. Pingback: Expected maximum deviation of lattice path ~ Mathematics ~ mathubs.com

  7. Michael Earnest says:

    Okay, I finally dove into that paper and figured it out. It turns out that \sqrt{\frac32 n} is not the best approximation! Instead,

    E[U] = {\ln 4\over 2}\sqrt{\pi n} - \frac12 + O(n^{-1/2})

    Note that \sqrt{3/2}\approx 1.2247, while {\ln 4\over 2}\sqrt{\pi}\approx 1.22857, so you would have to use pretty large n to distinguish the two with data alone.

    Here is the code I used to confirm the approximation: https://repl.it/@mearnest/Expected-Deviation-Lattice-Path#main.py

  8. Michael Earnest says:

    aaand, I just realized that {\ln 4 \over 2} simplifies to \ln 2.
