The bug in the random sort
We needed to randomize a list of values, so I quickly wrote the following code:
What do you expect the output to be?
- A –> 2431
- B –> 2571
- C -> 4998
The number vary slightly, but they are surprisingly consistent overall. C is about 50% likely to be at the head of the list, but I wanted the probability to be 33% obviously.
Can you figure out why this is the case? In order to sort, we need to compare the values. And we do that in a random fashion. We start by comparing A and B, and they have 50% change of either one of them being first.
Then we compare the winner (A | B) to C, and there is a 50% chance of C being first. In other words, because we compare C to the top once, it has a 50% chance of being first. But for A and B, they need to pass two 50% chances to get to the top, so they are only there 25% of the time.
When we figured out why this is happening, I immediately thought of the Monty Hall problem.
The right solution is the Fisher-Yates algorithm, and here is how you want to implement it.
Woah, already finished? 🤯
If you found the article interesting, don’t miss a chance to try our database solution – totally for free!