Joseph Nebus has written a series of posts on the entropy in basketball results: for a single team, for both teams, and in the win-loss results for a 64-team tournament like March Madness. For the final question he gets an answer of about 48 bits. That is, the probability of guessing the winners and losers of a tournament correctly is on the order of 1 in $2^{48}$.
From blind guessing one gets 1 in $2^{63}$, quoted for example in this USA Today story, with the caveat that:
Even so, allowing for some knowledge of college basketball and taking it account the norms of the NCAA tournament, the odds of a perfect bracket are still about 1 in 128 billion, according to DePaul math professor Jay Bergen.
This refers to Jeff Bergen’s video, “where does 1 in 128 billion come from”. Note 128 billion is roughly $2^{37}$. Bergen’s strategy is to assume that the top seeds always win, since this is the most likely outcome.
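To compare these estimates on one scale, odds quoted as "1 in N" can be converted to bits by taking a base-2 logarithm. The following is a minimal sketch; the helper name `bits_from_odds` is made up for illustration.

```python
import math

def bits_from_odds(one_in_n: float) -> float:
    """Return log2(N) for odds quoted as '1 in N' (hypothetical helper)."""
    return math.log2(one_in_n)

# Blind guessing: 63 games, each a coin flip.
print(bits_from_odds(2**63))   # 63.0

# Bergen's figure of 1 in 128 billion.
print(bits_from_odds(128e9))   # a little under 37

# Going the other way: 48 bits corresponds to 1 in 2**48.
print(2**48)                   # 281474976710656
```

So the three estimates sit at roughly 63, 48 and 37 bits.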
The fact that two reasonable people gave two such different answers is an example of just how hard it is to estimate small probabilities. But both of these models gave up on using empirical data after the first round. Yet matchups between the “favorite” seeds should happen fairly often, and there will be data! Let’s look at win probabilities by seed as compiled at mcubed.net. In a tournament where all the favorites win, we’ll have:
- in the first round, four matches each of 1 vs. 16, 2 vs. 15, …, 8 vs. 9, one in each region
- in the second round, four matches each of 1 vs. 8, 2 vs. 7, 3 vs. 6, 4 vs. 5, one in each region
- in the third round, four matches each of 1 vs. 4, 2 vs. 3, one in each region
- in the fourth round, four matches of 1 vs. 2, one in each region
- in the Final Four, three matches of 1 vs. 1, from different regions, of course
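The list above can be generated mechanically. The sketch below assumes every favorite wins, in which case each round in a region pairs the best remaining seed with the worst remaining seed. That pairing shortcut only holds for this all-favorites scenario, and the function names are illustrative.

```python
def chalk_round(seeds):
    """Pair best remaining seed with worst, second-best with second-worst, etc."""
    seeds = sorted(seeds)
    n = len(seeds)
    return [(seeds[i], seeds[n - 1 - i]) for i in range(n // 2)]

def chalk_region():
    """Matchups per round within one region, assuming every favorite wins."""
    seeds = list(range(1, 17))
    rounds = []
    while len(seeds) > 1:
        games = chalk_round(seeds)
        rounds.append(games)
        seeds = [min(game) for game in games]  # the lower-numbered seed advances
    return rounds

rounds = chalk_region()
for number, games in enumerate(rounds, start=1):
    print(f"round {number}: {games}")

# Four regions, plus three games among the four regional champions.
total_games = 4 * sum(len(games) for games in rounds) + 3
print(total_games)  # 63
```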
Historically, 1 seeds are 124-0 against 16 seeds, 2 seeds are 117-7 against 15 seeds, and so on until 8 seeds are 79-69 against 9 seeds. So the probability of picking all eight first-round games in one region perfectly is

$$\frac{124}{124} \cdot \frac{117}{124} \cdots \frac{79}{148},$$

and the probability of getting all 32 first-round games right is the fourth power of this, about $7 \times 10^{-5}$, or one in 14,000. (The different denominators correspond to different numbers of times each matchup has occurred, presumably due to changes in the tournament structure; the 64-team field only dates back to 1985.) Oddly enough, the Washington Post reports that nobody ever seems to pick a perfect first round. This isn’t a contradiction – nobody is boring enough to pick the strategy with the highest expected value for that bet, when the bet most people are interested in is trying to win their pool.
The probability of picking a perfect second round in any given region is the product of the favorites’ historical win rates in the 1-vs-8, 2-vs-7, 3-vs-6 and 4-vs-5 games; for all four regions it’s the fourth power of this.
The third round in each region consists of a 1-vs-4 game and a 2-vs-3 game, where the favorites win with probability 46/68 and 36/59 respectively; the probability of picking all eight third round games correctly is $\left(\frac{46}{68} \cdot \frac{36}{59}\right)^4 \approx 0.029$.
The fourth round in each region is a 1-vs-2 game, where the 1 seed has historically won with probability 38/69; the probability of picking all four correctly is $\left(\frac{38}{69}\right)^4 \approx 0.092$.
Finally, the probability of picking all three Final Four games correctly is 1/8 – the model knows nothing beyond seeding.
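The arithmetic for the last three stages can be checked directly from the win-loss fractions quoted above. This is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

REGIONS = 4

# Historical favorite win rates quoted in the text.
p_1v4 = Fraction(46, 68)
p_2v3 = Fraction(36, 59)
p_1v2 = Fraction(38, 69)

# Third round: one 1-vs-4 and one 2-vs-3 game in each of four regions.
third_round = (p_1v4 * p_2v3) ** REGIONS

# Fourth round: one 1-vs-2 game in each of four regions.
fourth_round = p_1v2 ** REGIONS

# Final Four: three games between 1 seeds, each treated as a coin flip.
final_four = Fraction(1, 2) ** 3

print(f"third round:  {float(third_round):.3f}")   # about 0.029
print(f"fourth round: {float(fourth_round):.3f}")  # about 0.092
print(f"final four:   {final_four}")               # 1/8
```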
Multiplying this all out, I get that the probability of picking all 63 games correctly is about $2.4 \times 10^{-11}$, or about one in 42 billion, in a generic tournament. For what it’s worth, FiveThirtyEight gave 1 in 1.6 billion this year and 1 in 7.4 billion last year, using a model that actually knew something about basketball.
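As a rough consistency check on the figures quoted above, one can divide the overall result by every stage except the second round and see what second-round probability that implies. This is a back-of-envelope sketch, not the original calculation: it starts from the rounded values "one in 14,000" and "one in 42 billion", so the result is only approximate, and `implied_second_round` is a name invented here.

```python
from fractions import Fraction

# Rounded figures quoted in the text.
first_round = 1 / 14_000
overall = 1 / 42e9

# Stages computed from the quoted win-loss fractions.
third_round = float((Fraction(46, 68) * Fraction(36, 59)) ** 4)
fourth_round = float(Fraction(38, 69) ** 4)
final_four = 1 / 8

# Whatever is left over must be the second-round factor (all four regions).
implied_second_round = overall / (
    first_round * third_round * fourth_round * final_four
)
print(f"{implied_second_round:.4f}")          # roughly 0.001
print(f"{implied_second_round ** 0.25:.2f}")  # implied per-region probability
```

Under these assumptions the second round comes out at roughly one in a thousand for all four regions combined.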