Who shuffled these? A visual and mathematical introduction to shuffling cards
Note: This blog was created as an entry to the Summer of Math Exposition 3 by 3Blue1Brown which is a part initiative - part competition to increase the number and quality of resources on the internet related to Maths topics.
Edit after SoME3 results. You can check out here other interesting entries and math explainers.
Motivation
We'll focus on a Ticket to Ride deck (boardgame where we pay with colored cards the cost to build roads of different size) because it is while playing this game that I wondered if pile shuffling is enough. Also, there is a second reason, which will become apparent at the middle of the explanation: for an English deck of cards you have 4 kinds but you have a lot of complicated relationships between cards of same or different kind: full house, straight, flush, four of a kind, etc.
- First tests. We'll apply riffle and pile shuffling to a sorted deck, and define what a used deck means.
- Measuring randomness. We will define what to look for and what to measure.
- Finally some results. We will apply a method of shuffling a large number of times and analyze some interesting comparisons.
First tests
Let’s begin by testing a few shuffles and looking at the resulting deck. We will represent a deck as a series of stripes of color as shown in some of the images below and for the first third of this blog we'll start with a sorted deck, but then we will study what happens with an used deck.
We’ll start with a sorted deck of Ticket to Ride which contains 110 cards in total (8 colors with 12 cards each, and 14 multicolored). For ease of visualization, we won’t represent a card with edges, just its color. We see that doing just 1 riffle shuffle is not enough, it’s obvious we need to keep shuffling. Remember, each stripe of color represents 1 card.
The difference is subtle, but it’s there. So if we do multiple riffles shuffles, we won’t be keeping tabs on that. This will add some variability to multiple uses of the same method to shuffle. See Appendix A1 for more details.
Then, multiple uses of the same method means starting with a sorted deck, applying the method (1 riffle shuffle, 7 riffle shuffles, etc.) and getting a shuffled deck as a result.
If we compare these different attempts at shuffling a sorted deck first we see that they’re different from each other. That’s consistent with the variability we already mentioned (left-right). However, some of these shuffled decks don’t look right. What do you see? I can see some patterns in #1 (blue-green-yellow-orange-red) and #3 (green-black-orange).
Takeaway 1. Any method of shuffling could encounter some patterns. That doesn’t make it a bad method necessarily.
Other methods
Then, how do we know if the method we’re using is good enough? What does good enough mean in shuffling? We’ll answer these, but first let’s look at the other method we’re testing: pile shuffling.
Lastly, if we compare multiples uses of the riffle method with multiples uses of the piles method, they don’t look that different. You could argue that some patterns are present, but we can’t say for sure that this will happen all the time.
- Patterns. We are seeing patterns which are sequences of colors that appears more than once. For instance, in ‘pile multiple’ the sequence red-yellow-green-blue appears three times.
- Initial deck. Up until now, we’ve used our shuffling methods to sorted decks. When in reality, we will be shuffling used decks or decks at end-of-the-game. We need to improve on that before reaching any conclusion.
-
Repetition. Let’s
say you have a 100 used decks. For each one, you apply the same method: 7
riffle shuffles. What do you expect? The resulting 100 shuffled decks should
not be all the same! We need to analyze if a method is good enough over a large number of scenarios.
Takeaway 2. To test if a method is good enough, we need to check for patterns over several uses of the method to account for possible variability.
Intermission: Constructing a plausible deck at end-of-the-game
When you finish playing you don’t have either an ordered deck, nor a random one. In the case of the English deck, you will probably have pairs, straights or any other. In the case of Ticket to Ride, you will have a semi-structured deck:
- Part of the desk will be totally random, if you started the game with a well shuffled deck.
- The rest will be in chunks of colored cards.
- The proportion of these two parts is unknown.
So, at the end of the game, the deck will look like the image below. We have a sorted deck as a reference, and also a totally random one. The two 'ticket decks' (end-of-the-game decks) created have a mix of these two types of behavior. On the next sections, we will only apply methods only to used decks. See Appendix A3, for more details on the game.
Measuring randomness
Now, we set out to define a simple measure of randomness based on the concepts already mentioned before such as patterns and repeatability. We will do a simple count method, such as the image below for a test small deck. Some sequences (2 colors) are not present, and some are twice. If I shuffle again, will the count table stay the same?
The general procedure to be used is shown in the diagram below, will be done thousands of times to get a good sample of the variability shuffled decks could have.
Finally getting some results
- Doing just a few shuffles.
- Doing several shuffles.
- Understanding how the number of shuffles influences the deck.
R1. Doing two riffles, or just two piles
We can see at
first glance that both methods have some differences, and we will go over each
one. Note that we have 81 possible patterns (9 colors by 9 colors) such as: green-red, red-red, yellow-multicolor, etc.
- Pile shuffling has some peaks at same color patterns (such as red-red or green-green). This is expected: we started with a deck that already has some same color cards next to each other and the method used was not enough to set them apart.
- Both methods have a higher average at the beginning. This is related to multicolor
cards and it’s because there are more of them (14) vs any other color (any
other color has 12). So, it’s more likely that one of those patterns emerges.
These patterns are of the type (multicolor)-(any color). And why at the beginning? Because I ordered the patterns that way when listing them.
- Riffle shuffling has some peaks, right before the valleys. This is due to patterns of the
type (any color)-(multicolor). Similar to the previous item.
- Riffle shuffling has some valleys, at the same place that pile shuffling
has peaks. This
happens because for a well shuffled deck same color patterns are less likely.
Let’s do an example: you have 2 orange cards and 2 green ones. These are all the
possibilities: orange-orange (two times), orange—green(four times), green-orange (four
times) and green-green (two times). See the image below.
Takeaway 3. With just a few shuffles most
patterns appear with almost the same frequency in the deck over a large number
of games.
R2. Doing 7 riffles, or 9 piles
This comparison will raise some important questions which will then allow us to reach the end result (R3) which we set out to answer.
We can see in the image below that the average ocurrence of each pattern is almost identical for both methods. We can still see some differences, but they’re due to two facts we already mentioned.
- Same color patterns are less likely that mixed color patterns
- Multicolored cards are higher in number that any other color
These facts will stay the same. Even a third method was added: ask the computer to randomize the deck as best it can and we get the same result. So if we look at the average ocurrence of any pattern we know that 7 riffle shuffles or 9 piles are enough to have a random ocurrence of any pattern for a large number of shuffled decks.
So, what do we do now? Surely there must be differences between riffle and pile shuffling. That leads us to include another statistical definition here: the standard deviation. I want to avoid a formal definition, so we’ll give an example as shown in the table. What method would you prefer?
Even if the standard deviation were too large, patterns could occur on average with the same frequency as others. What we need is for shuffled decks to be as closely similar to each other as possible, so that our games are really consistent and feel random.
We'll calculate the standard deviation for each method, and for each possible pattern. Now we can see a difference between these two methods! The riffle shuffle has in general a lower deviation than pile shuffling. The key question you should ask yourself here is this one: what will happen if I do 4 riffle shuffles instead of 7, and 13 piles instead of 9. Will this result stay the same?
Takeaway 4. For a large number of games, doing 7 riffle shuffles versus 9 piles shuffling will yield the same average occurrence of patterns, but in general with a lower standard deviation.
R3. How do results change when we change the number of times we shuffle?
- Look at the variation at how in general the standard deviation changes when changing either: number of shuffles, method or sequence.
- Or, creating a general value of the standard deviation that it is representative of all patterns. This is the option we used and the general value is just the average value of the standard deviation. This allows us to differentiate the scenarios by method and number of shuffles, only.
What we get, is a really interesting result after all this work. We can see that the deviation for riffle shuffling falls rapidly after 3 shuffles and then slowly after 15. As for the pile method, the standard deviation also falls after 4, but much more slowly.
So, we can get a pretty well shuffled deck with 7 to 10 riffle shuffles, but we can't get a good one with pile shuffling. Even if we did 30 piles, which is impractical, we would get something just as good as doing 3 riffle shuffles.
Summary
I believe a summary is in order, given that the work done here involves several steps. Then, we’ll end with some brief thoughts.
- The whole point was to explore how can I be sure that I’m shuffling correctly. If I’m bad at one method, what are the implications of using another one?
- For multiple riffle shuffling, we saw that some patterns emerged. This made us question if that meant that the method was faulty or not.
- We tried another method, pile shuffling and found that it can be close enough to riffle shuffling. At this point, we were only seeing qualitatively if there were differences or not. But that can only take us so far, we needed to include a quantitative method.
- We started to count the numbers of patterns in each shuffled deck and we did it a large number of times. Then, we extracted two important statistics: the average of occurrences of each pattern and how much each shuffle deviated from an ideal quantity.
- We found that doing riffle or pile shuffling a few times was enough to get a consistent average across all patterns, but that riffle shuffling allows us to obtain that with a lower deviation.
Closing thoughts
I learned a lot creating this blog and now I remember back a few months ago while doing pile
shuffling on Ticket to Ride decks at home, thinking if I wasn't hindering my game nights by doing this particular method. But after all this work, I can prove
my past self wrong and show that, indeed, I need to properly riffle shuffle my decks.
If you’re interested in delving deeper into notation and more formal definitions, please refer to the References.
The
Appendices at the end go over some mathematical relevant observations but are
not needed to understand the main blog.
Disclaimer
Most images used were of my own making or they’re
the result from my code on Python, tables in Excel or diagrams in PowerPoint.
'Ticket to Ride' is a trademarked boardgame that involves constructing railroad routes across various landscapes and cities. It's important to clarify that I have no connections with the creators or proprietors of the 'Ticket to Ride' brand/name/game. Any references made to the game in this blog are solely for educational purposes and do not imply any endorsement or affiliation with the brand or its developers. My perspectives and opinions remain impartial and are not representative of any official relationship with the brand.
The Ticket to Ride image used was created by user 'JIP' under the license CC BY-SA 4.0.
About the author
I’m a Mechanical Engineer with a passion for
Maths and Education. I’ve worked as a tutor for Maths and Physics courses at a
college level. I usually play videogames on my spare time. I have plans to do other blogs which could be interesting, also related to Maths and Physics, so stay tuned!
References
[1] https://en.wikipedia.org/wiki/Shuffling (good starting point, work your way up from here)
[2] Anke van Zuylen and Frans Schalekamp (2004). THE ACHILLES' HEEL OF THE GSR SHUFFLE: A NOTE ON NEW AGE SOLITAIRE. Probability in the Engineering and Informational Sciences, 18, pp 315-328
[3] Vennos A., Michaels A., (2021) Shannon Entropy Loss in Mixed-Radix Conversions. Entropy. 23(8):967.
[4] Diaconis, P., Graham R.L., Kantor, W.M. (1983) The Mathematics of Perfect Shuffles. Advances in Applied Mathematics, 4, pp 175-196.
Appendices
A1. What do you mean by variability in a multiple riffle shuffle?
At every riffle shuffle
we have 2 possibilities (left or right). If we do a riffle shuffle N times. Then the number
of possibilities is really 2^N, which is not really random. So, if we started with a sorted deck and did the method using 7 riffle shuffles (2^7=128) after 128 we will surely encounter a unique deck for a second time. For a bigger number such as 20 riffle shuffles (2^20 ~ 1 million), I would be surprised if you could distinguish the decks from each other, so they will truly feel random.
A2. What about long patterns? At some point we won’t be able to see them all in a finite deck
If we are talking of a
sequence of a finite number of cards: red-blue, green-yellow-white,
green-black-yellow, etc. At some point, if the sequence is long enough we’ll
run into an apparent problem. Let’s
call a n-sequence, a sequence of n cards which can be composed from k possible colors (9 in total in Ticket to Ride). Then,
a n-sequence has k^n possible configurations. We’ll compare this
number to N, which is the total number of cards.
Parameter |
Quantity |
Total cards N |
110 |
Number of 1-sequences |
9 |
Number of 2-sequences |
9 x 9 = 81 |
Number of 3-sequences |
9 x 9 x 9 = 729 |
… |
|
If we try to count for each possible 3-sequence how many times it appears in a
110-card deck, even if by some miracle you can manage to order the cards and
each 3-sequence appears only once, then it means that only you can see some of them (108 to be precise).
This is when repeatability
comes into play. If my shuffle method is consistent
over repetitions and creates a truly randomized deck, then we will encounter over time all possible sequences equally.
A3. How did you create a used deck at the end of the game?
If you know how to play Ticket to Ride, you are probably wondering how did I create a random deck, on demand, that truly captures the reality of the game. If you looked closely enough, you noticed that it has some errors. I’ll give some comments here on what was done, and what wasn’t covered:
- From
the Ticket to Ride standard map I counted the number of roads for any color. I
summarized this in a distribution that selects a random road and tells me:
length and color.
- I did not include the fact the multicolored cards can only be played in singles or pairs, in groups with colored cards. I treated it just as another color.
-
I
did not account for the fact that there are definite objectives (from one city
to another). So the selection of roads does not necessarily reflect that. This
is probably the most important note here, but also the hardest to work out.
To truly create a real deck at the end of the game short of creating a bot that
plays the game over and over, I would have to account for the network of roads
and possible ways to win which I deemed not useful for the purposes of this blog.
In summary, the decks used for the explanation, are in the opinion of the author, sufficiently close to reality. Over a large number of end-of-the-game or used decks, the main difference will focus on how well a shuffle method works, even though some subtleties are lost. Working this out could be an entirely new blog from the ground up.
A4. What do you mean by random on your computer
The code used Python as
programming language. When randomness was needed the random library was
used. You can see more details on the library or some discussion about it.
But in practical terms, the method creates something that looks pretty random and you will have a hard time proving otherwise. So, this is random enough for our purposes because we only need something to compare our shuffles to.
A5. Average and standard deviation formal definitions, with examples. [Recommended Appendix]
We will briefly define this statistics in case you need a refresher or more guidance:
First, we'll define the N ocurrences of a pattern or variable with these symbols: x1, x2, x3, ..., xN.
Numerical Example. The ocurrences of a green-red pattern in used decks in Ticket to Ride: 3, 0, 1, 2, 4.
Average
Then, the average of N values is a representative value of a list of numbers and it's calculated by summing them up, and then dividing by the total.
Numerical Example. Based on the previous numerical example, the average would be:
Day-to-day example. You want a general idea of how much you spend on bills every month. You calculate the total, for each month, and then take an average to get an idea of how much could next month be.
Standard deviation
Finally, we define the standard deviation of N values as a way to measure the spread of the values.
Numerical example: Based on the previous numerical example, the standard deviation would be:
Day-to-day example. You already have the total bill amount for each month, but you see that they change a lot from month to month. You calculate the standard deviation, because you need a margin of error. For instance, if the average is $200 and the standard deviation $50, then next month could most likely cost from $150 to $250.
More comments on the standard deviation
To round up here, I'll make two comments on how it's useful and why it's defined like that.
First, for the numerical example we get 2 as average, and 1.41 as standard deviation. We could say that most values then, will be in the range of 0.59 (2 minus 1.41) and 3.41 (2 plus 1.41). That's a practical use of these definitions, because we're getting more information than just an average.
Second, you could be wondering why do we use such a complicated formula. We could've used the range (the maximum value minus the minimum value). Or we could've avoided using powers and roots. It seems unnecessarily complicated.
To answer this, as succintly as possible, I can first say, that there is no easy answer (sorry!). But to give you some food for thought (maybe you can get an answer) I'll show a method to convince yourself:
- Come up with a new variable that captures the spread of the values. I'll use the range and compare myself to the standard deviation (STD).
- Create two sets of numbers. For instance,
- Set A. 1, 2, 3, 4 and 5. Range: 4. STD=1.41
- Set B. 1, 3, 3, 3 and 5. Range: 4. STD=1.26
- We get the same range, but the sets have some slights differences
- Now, let's do a more extreme example, with two new sets.
- Set A. 1, 2, 3, 4 and 5. Range: 4. STD=1.41
- Set B. 1, 2, 3, 4 and 50. Range: 49. STD=19.02
- Now we have kind of an opposite situation. The sets are similar, but the range is really different. What would've happened if we had used the standard definition?
- For the first two sets, the STD value is similar, but it picked up some differences that the range couldn't.
- For the second two sets, the STD has a large difference, but not as much as the range. So it 'understood' that most values are near lower values. Then, it also picked up more information.
- On both cases, we could say that the standard deviation picked up more information that our variable.
- Homework for you. Come up with new sets to test this and new possible definitions. Is the standard deviation still better?
Note: You can check some other simple examples here and a more technical argument on the why we use the standard deviation here.
Something that occurred to me while reading - as a general method, perhaps you could make a matrix for each position in the original deck (just imagining the cards as labelled 1-52 etc), ending up at given position in the final deck. I suppose an ideally random shuffle would have a uniform value across the entire matrix, with biased shuffles having peaks in the response. However, it may well be a less peaky response might have a tendency to create more patterns (for whatever reason) - which is after all what starts the complaints around the table haha.
ReplyDeleteHi, I aprecciate you taking the time to comment. That method is in fact used in some papers, because it's 'ignorant' or 'agnostic' of the colors, kinds, or whatever the desk is composed of.
DeleteFor instance, for an English deck you could have two methods: see pattern ocurrence and this 'original position' tracking. the first, is not really clear how to apply (should all patterns appear with equal probability? probably not), and the second, is not that intuitive, in my opinion. the second reason is why I focused on a deck that was easier to explain.
Could you explain the process of pile shuffling as I've never heard of it. Do you take a small random number of cards and try to randomly put it on different piles? Of course I could just google it but maybe a bit more extensive of an explanation is a good idea.
ReplyDeleteI'll explain the most simple version, and comment on how you could change things up.
Deletea. you decide a number of piles (7, 13, 20). I usually use 9.
b. you take 1 card from the top side of the deck and position it in pile '1'.
c. you continue to take cards, and position them in pile '2', '3', ..., '9'. in order
d. now, with your 10th card, you continue in pile '1' and so on.
e. when you're finished, you stack piles with one another in any order you want.
this is the version used in this blog, where step (e) was done randomly (computer randomness).
some comments:
i. you could put cards in any order, which means that piles won't have a similar 'height' all the time.
ii. you could stack the piles in a specific order. for instance, if it's a 3x3 (1-2-3 | 4-5-6 | 7-8-9) pile grid. you could do them in order, or follow a pattern. I usually do, in real life, 1-6-8, 2-4-9 and 3-5-7. and then stack those three in a single pile.