**Introduction**

This is a follow-up to a post from earlier this year discussing the likelihood of encountering two identical packs of Skittles, that is, two packs having exactly the same number of candies of each flavor. Under some reasonable assumptions, it was estimated that we should expect to have to inspect “only about 400-500 packs” on average until encountering a first duplicate. This is interesting, because as described in that earlier post, there are *millions* of different possible packs– or even if we discount those that are much less likely to occur (like, say, a pack of nothing but red Skittles), then there are still hundreds of thousands of different “likely” packs that we might expect to encounter.

So, on 12 January of this year, I started buying boxes of packs of Skittles. This past week, “only” 82 days, 13 boxes, 468 packs, and 27,740 individual Skittles later, I found the following identical 2.17-ounce packs:

**Test procedure**

I purchased all of the 2.17-ounce packs of Skittles for this experiment from Amazon in boxes of 36 packs each. From 12 January through 4 April, I worked my way through 13 boxes, for a total of 468 packs, at the approximate rate of six packs per day. This was enough to feel like I was making progress each day, but not enough to become annoying or risk clerical errors. For each six-pack recording session, I did the following:

1. Take a pack from the box, open it, and empty and sort the contents onto a blank sheet of paper.
2. Take a photo of the contents of the pack.
3. Record, with pen and paper, the number of Skittles of each color in the pack (more on this later).
4. Empty the Skittles into a bowl.
5. Repeat steps 1-4; after six packs, save and review the photos, recording the color counts to file, verifying against the paper record from step 3, and checking for duplication of a previously recorded pack.
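The duplicate check in the last step can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the experiment: the function name `find_first_duplicate` and the color ordering are assumptions, and each pack is represented simply as a tuple of per-color counts.

```python
# Hypothetical sketch: the color order below is an assumption; any fixed order works.
COLORS = ("red", "orange", "yellow", "green", "purple")

def find_first_duplicate(packs):
    """Return (earlier, later) 1-based pack numbers of the first repeat, or None."""
    seen = {}  # maps a tuple of color counts to the pack number where it first appeared
    for number, counts in enumerate(packs, start=1):
        key = tuple(counts)
        if key in seen:
            return seen[key], number
        seen[key] = number
    return None

# Made-up counts, for illustration only (not from the real data set).
example_packs = [
    (12, 11, 13, 12, 11),
    (10, 14, 12, 13, 11),
    (12, 11, 13, 12, 11),
]
print(find_first_duplicate(example_packs))  # (1, 3)
```

Using the full tuple of counts as a dictionary key means each new pack is compared against all earlier packs in a single lookup.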

The photos captured all of the contents of each pack, including any small flakes and chips of flavored coating that were easy to disregard… but also larger “chunks” of misshapen paste that were often only partially coated or not at all, that required some criteria up front to determine whether or how to count. For this experiment, my threshold for counting a chunk was answering “Yes” to all three of (a) is it greater than half the size of a “normal” Skittle, (b) is it completely coated with a single clearly identifiable flavor color, and (c) is it not gross, that is, would I be willing to eat it? Any “No” answer resulted in recording that pack as containing “uncounted” material, such as the pack shown below.

The entire data set is available here as well as on GitHub. The following figure shows the photos of all 468 packs (the originals are 1024×768 pixels each), with the found pair of identical packs circled in red.

**But… why?**

So, what’s the point? Why bother with nearly three months of effort to collect this data? One easy answer is that I simply found it interesting. But I think a better answer is that this seemed like a great opportunity to demonstrate the *predictive power* of mathematics. A few months ago, we did some calculations on a cocktail napkin, so to speak, *predicting* that we should be able to find a pair of identical packs of Skittles with a reasonably– and perhaps surprisingly– small amount of effort. Actually seeing that effort through to the finish line can be a vivid demonstration for students of this predictive power of what might otherwise be viewed as “merely abstract” and not concretely useful mathematics.

(As an aside, I think the fact that this *particular* concrete application happens to be recreational, or even downright frivolous, is beside the point. For one thing, recreational mathematics is fun. But perhaps more importantly, there are useful, non-recreational, “real-world” applications *of the same underlying mathematics*. Cryptography is one such example application; this experiment is really just a birthday attack in slightly more complicated form.)

**Assumptions and predictions**

For completeness, let’s review the approach discussed in the previous post for estimating the number of packs we need to inspect to find a duplicate. We assume that the color of each individual Skittle is independently and uniformly distributed among the $d=5$ possible flavors (strawberry, orange, lemon, green apple, and grape). We further assume that the total number $n$ of Skittles in a pack is independently distributed with density $f(n)$, where we guessed at $f$ based on similar past studies.

We use generating functions to compute the probability $p$ that two *particular* randomly selected packs of Skittles would be identical:

$p = \sum_{n} f(n)^2 \, p(n,d)$

where

$p(n,d) = \frac{(n!)^2}{d^{2n}} \, [x^{2n}] \left( \sum_{k \geq 0} \frac{x^{2k}}{(k!)^2} \right)^d$

Given this, a reasonable approximation of the expected number of packs we need to inspect until encountering a first duplicate is $\sqrt{\pi/(2p)}$, or about 400-500 packs depending on our assumption for the pack size density $f$.
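As a rough sketch of this calculation, the following Python computes the probability that two packs of the same size are identical, by expanding the generating function with exact rational arithmetic. The function names are made up for this illustration, the pack size density in the example is purely hypothetical (it is not the density assumed in the earlier post), and the final step uses the standard birthday-problem approximation $\sqrt{\pi/(2p)}$ for the expected wait until a first match.

```python
from fractions import Fraction
from math import factorial, pi, sqrt

def p_identical(n, d):
    """Probability that two packs of n Skittles, each with d equally likely colors, match."""
    # One factor of the generating function in y = x^2: sum over k of y^k / (k!)^2.
    base = [Fraction(1, factorial(k) ** 2) for k in range(n + 1)]
    poly = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(d):  # raise the series to the d-th power, truncated at degree n
        new = [Fraction(0)] * (n + 1)
        for i, a in enumerate(poly):
            for k in range(n + 1 - i):
                new[i + k] += a * base[k]
        poly = new
    return poly[n] * factorial(n) ** 2 / Fraction(d) ** (2 * n)

def expected_packs(density, d):
    """Approximate expected number of packs until a first duplicate.

    density maps pack size n to its probability f(n).
    """
    p = sum(f * f * p_identical(n, d) for n, f in density.items())
    return sqrt(pi / (2 * float(p)))

# Hypothetical pack size density, for illustration only.
assumed_density = {59: Fraction(1, 4), 60: Fraction(1, 2), 61: Fraction(1, 4)}
print(expected_packs(assumed_density, d=5))
```

A quick sanity check on small cases: `p_identical(1, 2)` is 1/2 (two single-Skittle packs match when the colors agree), and `p_identical(2, 2)` is 3/8.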

**Observations**

The most common and controversial question asked about Skittles seems to be whether all five flavors are indeed uniformly distributed, or whether some flavors are more common than others. The following figure shows the distribution observed in this sample of 468 packs.

Somewhat unfortunately, this data set potentially adds fuel to the frequent accusation that the yellow Skittles dominate. However, I leave it to an interested reader to consider and analyze whether this departure from uniformity is significant.
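For a reader who wants a starting point, one simple option is a chi-square goodness-of-fit test against a uniform distribution. The sketch below uses only the standard library; the function names are invented for this example, and the counts shown are placeholders rather than the observed data. Note the assumption baked into this test: it treats every individual Skittle as an independent draw.

```python
from math import exp

def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def p_value_4dof(stat):
    """Upper-tail probability for a chi-square distribution with 4 degrees of freedom.

    With five colors there are 4 degrees of freedom, and the tail has the
    closed form exp(-x/2) * (1 + x/2).
    """
    return exp(-stat / 2) * (1 + stat / 2)

# Placeholder counts, for illustration only; substitute the per-color totals from the data set.
hypothetical_counts = [10, 20, 30, 20, 20]
stat = chi_square_uniform(hypothetical_counts)
print(stat, p_value_4dof(stat))
```

A small p-value would indicate that the observed departure from uniformity is unlikely under the independent, uniform model.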

How accurate was our prior assumed distribution for the total number of Skittles in each pack? The following figure shows the observed distribution from this sample of 468 packs, with the mean of 59.2735 Skittles per pack shown in red.

Although our prior assumed average of 60 Skittles per pack was reasonable, there is strong evidence against our assumption of independence from one pack to the next, as shown in the following figure. The *x*-axis indicates the pack number from 1 to 468, and the *y*-axis indicates the number of Skittles in the pack, either total (in black) or of each individual color. The vertical grid lines show the grouping of 36 packs per box.

The colored curves at bottom really just indicate the frequency and extent of outliers for the individual flavors; for example, we can see that every pack contained at least 2 and at most 24 Skittles of each color. The most interesting aspect of this figure, though, is the *consecutive* spikes in *total* number of Skittles shown by the black curve, with the minimum of 45 Skittles in pack #291 immediately followed by the maximum of 73 Skittles in pack #292. (See this past analysis of a single box of 36 packs that shows similar behavior.) This suggests that the dispenser that fills each pack targets an amortized rate of weight or perhaps volume, and that it got jammed somehow, resulting in an underfilled pack, and then in getting “unjammed” *overfilled* the subsequent pack.

This is admittedly just speculation; note, for example, that the 36 packs in each box are relatively free to shift around, and I made only a modest effort to pull packs from each box in a consistent “top to bottom, front to back” order as I recorded them. So although each *group* of 36 packs in this data set definitely comes from the same box, the *order* of packs within each group of 36 does not necessarily correspond to the order in which the packs were filled at the factory.

At any rate, if the objective of this experiment were to obtain a representative “truly random” sample of packs of Skittles, then the above behavior suggests that buying these 36-pack boxes in bulk is probably not recommended.

**Stopping rule**

Finally, one additional caveat: fortunately the primary objective of this experiment was *not* to obtain a “truly random” sample, but only to confirm the predicted “ease” with which we could find a pair of identical packs of Skittles. However, suppose that we *did* want to use this data set as a “truly random” sample… and further suppose that we could eliminate the practical imperfections suggested above, so that each pack was indeed a theoretically perfect, independent random sample.

Then even in this clean room thought experiment, we *still* have a problem: by stopping our sampling procedure upon encountering a duplicate, we have biased the distribution of possible resulting sample data sets! This can perhaps be most clearly seen with a simpler setup that allows an analytical solution: suppose that each pack contains just $n=2$ Skittles, and each individual Skittle is independently equally likely to be one of just $d=2$ possible colors, red or green. If we collect any *fixed* number of sample packs, then we should expect to observe an “all-red” pack with two red Skittles exactly 1/4 of the time. But if we instead collect sample packs until we observe a first duplicate, and *then* count the fraction that are all red, the expected value of this fraction is slightly less than 1/4 (181/768, to be exact). That is, by stopping with a duplicate, we are less likely to even get a chance to observe the more rare all-red (or all-green) packs.
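The value 181/768 can be checked by brute-force enumeration. The following sketch walks every possible sequence of packs up to the first duplicate, using exact fractions. It assumes two equally likely colors (so a pack is described by its number of reds) and that the duplicate pack itself is counted as part of the sample; the function name is made up for this example.

```python
from fractions import Fraction
from math import comb

def expected_all_red_fraction(n=2):
    """Exact expected fraction of all-red packs when sampling stops at the first duplicate."""
    # Probability of each pack type, indexed by its number of red Skittles.
    types = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

    def recurse(seen, reds, length, prob):
        total = Fraction(0)
        for pack, p in types.items():
            new_reds = reds + (pack == n)  # count this pack if it is all red
            if pack in seen:
                # Duplicate found: sampling stops, and this pack is part of the sample.
                total += prob * p * Fraction(new_reds, length + 1)
            else:
                total += recurse(seen | {pack}, new_reds, length + 1, prob * p)
        return total

    return recurse(frozenset(), 0, 0, Fraction(1))

print(expected_all_red_fraction())  # 181/768
```

With only three distinct pack types (two reds, one of each, two greens), sampling always stops within four packs, so the enumeration is tiny.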

It’s an interesting problem to quantify the extent of this effect (which I suspect is vanishingly small) with actual packs of Skittles, where the numbers of candies are larger, and the probabilities of those “extreme” compositions such as all reds are so small as to be effectively zero.

Interesting, I didn’t expect you’d actually do the experiment. I notice your steps don’t include *eating* the Skittles. Consuming six packs each day would be excessive, at least by yourself!

How big are all the images? That could make for an interesting image processing dataset, especially since it includes all the Skittle counts: train on counting the Skittles in half the set, and then test against the other half.

Yeah, I learned from this experiment that I don’t actually like Skittles, which is probably good, so a lot of Skittles were bagged and handed off to relatives. It’s also good that I didn’t try this with M&Ms instead 🙂 (which, interestingly, are *not* packaged with a uniform distribution of colors).

The automated counting idea is interesting. The original images are 1024×768 PNGs, which take up about 300 MB. I converted to JPG to bring it down to less than 30 MB, and added these to the GitHub repo.

Thanks! I posted your repository to r/datasets: https://redd.it/ban9gr

GOOD GRIEF! Your images are 300MB?!? That’s CRAZY huge. These could EASILY be dropped to literally a few tens of kilobytes and retain all the “info” you need, while still keeping a large format….

This was awesome! Have you tried a hypothesis test, calculating the probability that one box (assumed to be a random sample) could represent the Skittles population?

You should have at least 4 bags of skittles left. You should open them, too. What if you get a second identical bag?!?!

I did; the full dataset for all 13 boxes, for a total of 468 packs, is in the linked GitHub repo. The only identical pair was found at pack #464.

That’s too bad. Sometimes statistics gives really, really odd answers with small data sets, and it would be funny if that happened here.

(Description of data set as small is tongue-in-cheek.)

This is really neat! Thanks for sharing!

Really interesting! Can I ask why you chose to arrange the Skittles in two columns to photograph? I feel like having one long column would be easier to compare, or some 4xN grid easier to quickly count.

For a visual count/comparison, you are probably correct. However, my main objective was sorting speed, and I found that I could sort colors more quickly moving two Skittles at a time into place from the unsorted “pile.”

You should try this with European skittles, we have totally different flavours – EMG. Grape is swapped with Blackcurrant.

Incredible! Have you considered a Patreon page, or a donation button? I think people would donate for mathematical experiments like this one.

Thanks! I always enjoy reading your blog. Yeah, this was one of my more expensive diversions :). I hadn’t considered this, but will have to investigate.

This inspired me to write a program using OpenCV to count Skittles: https://github.com/tlrobinson/skittles.py

It’s not perfect but it’s a start.

Very cool! This is something that I spent some time on at the outset, thinking that it would allow me to skip the minutes spent manually sorting the Skittles– just dump them haphazardly on the paper, take a picture, and move on. But my initial attempts weren’t perfect, and I realized that I wanted to sort anyway so that, once a duplicate was found, it could be more easily visually “matched.” The sorting was the time-consuming part; the counting and recording was actually pretty quick.

If you look at https://www.youtube.com/watch?v=e3DDHVWGnRc&start=72 you’ll see a few factory shots of color-specific skittle pipelines which are feeding into whatever they use to mix them up. Interesting bit is that each container has a few stray yellows clearly visible. Unless that’s a wild coincidence it looks like they do indeed create more yellows than anything else… but why?

Looking at the video (thanks!) I notice that the packages are loose before the boxing stage and therefore won’t necessarily stay in “filled” order. So the argument about packing jams (one few, followed by one many) causing the adjacent anomaly you note is likely wrong.

An interesting question is, therefore, how likely are you to see an adjacent spike (as you observed) in a >randomly< ordered collection of skittles bags? This is another case where you could look at the mathematics to estimate the likelihood of it occurring by chance.

@Zteve, this is a good point. I assume you are referring to the shots at around 0:48 and/or 1:03? This certainly suggests that there is no “preserved” ordering within each box.

If we assume that the two outlier packs were *filled* consecutively, but then randomly arranged within the same box (and even landing in the same box is not certain based on this footage), the probability of extracting them consecutively is 2×35!/36!=1/18. Not terribly unlikely… but still suspicious, given that the same thing happened here.

You are nuts!

I put up a blog post as the “interested reader to consider and analyze whether this (yellow) departure from uniformity is significant.” Short answer: yellows are not over-filled, but greens are under-filled.

http://blog.agafamily.com/?p=331

Reblogged this on Preston Byrne and commented:

The things some people are curious about.

Why didn’t you use the power of mathematics and image recognition code to simplify your process?

https://blog.algolia.com/how-we-handled-color-identification/

This is so cool. Weird question – I’d love to make a poster of the photo with all the skittle distributions. Would you be up to share a high res version of the original file? (Happy to discuss terms so you felt comfortable – and make you one as well!) Feel free to email me.

Interesting idea– I’ve made some large (36″x36″) prints from stuff discussed here before that required reasonably high quality printing to preserve small detail (here and here, for example). This is probably another example; I can help “re-arrange” the images if desired depending on what size/shape poster you want, will email to discuss details.

Here’s an interesting follow-up math experiment. Given that…

• Skittles are 74% processed sugar by weight,

• each day of experiment added (61.5g/pack x 468 packs / 82 days x 74%) 260 g of sugar per day to the experimenter’s diet

• each day of experiment added (4 calories/1 gram x 260 g) 1040 calories from sugar

• for every additional 150 calories of sugar available per day per person, diabetes levels rose 1 percent ( https://www.medicalnewstoday.com/articles/317246.php )

the experimenter’s diabetes risk increased ~7% from Skittles. So how many experimenter-days would have to pass before we would expect at least 1 diabetic experimenter?

John,

> each day of experiment added (61.5g/pack x 468 packs / 82 days x 74%) 260 g of sugar per day to the experimenter’s diet

Nope, the experimenter didn’t eat the skittles.

Fascinating and I’m so glad you didn’t make yourself eat the Skittles. Worst candy ever. Okay maybe in the top 20.

Hey! Do you have any interest in presenting your findings?

When I was a kid my Uncle Jeff swore he found a pack of skittles without any red ones, so he wrote to Skittles and they sent him a giant sack of JUST red skittles and it was the best day of my kid life.

I’ve always wondered what the likelihood is that he *actually* found a bag without a certain color? If there’s a will to find out, there’s a way!

The probability that a single randomly selected 2.17-ounce pack of Skittles would have zero red candies is approximately 1 in 500,000; however, the probability that *some* color (e.g., orange) would be missing is about 1 in 100,000. Certainly unlikely… but assuming that roughly 50,000 new packs are manufactured *every day*, it’s a near certainty that *some* packs will be missing at least one color.

I’ve often asked the question, “If you open a pack of Skittles (or M&Ms) and it doesn’t have at least one of each flavor, do you feel cheated?”. As you point out, the odds of that happening with a full-size pack are very low, but are much higher with “fun size” packs.

This experiment is truly amazing! Who did all this work?

Hey,

Slightly odd request, but would you be prepared to share the hi res images of all the packs as I think the composite photo you have made (without the red circles) would make a great poster, just for me for fun. Turn science into art… 🤪

Chris

Sure, no problem; you’re not the first to ask for this. The 1024×768 images of each pack are here. Note that in the single image of all of the packs in the article, the numbered packs are arranged in columns, not rows.

Just created a visualization from your data (posted it on my twitter _nbquinn), thanks so much for the article and dataset!

Very cool post!

One small detail: Wouldn’t it be more clear (unless ofc I missed some easier way to derive it) to write p(n,d) as

It took me a while to get why the function is correct, maybe it isn’t obvious to some others either.

(I edited your comment to add the “latex” tag inside the dollar sign to format the LaTeX.)

You’re right that this is an equivalent representation; more strongly, the generating functions themselves are the same, and thus the effective multipliers are the same as well. It’s a fair question which is clearer, arguably depending on how one interprets the underlying counting problem. For example, in the original form, consider counting ordered sequences of 2n Skittles, with the n Skittles in the first half of the sequence coming from the first pack, and the second half of the sequence from the (identical) second pack. Then the denominator clearly enumerates the sample space with size d^(2n), and the g.f. coefficient is the number of such sequences corresponding to identical packs, with each term in a sum in the g.f. corresponding to arranging k Skittles of a given color in each half/pack of the sequence.
