Sampling error in sports and politics

Laura McLay asks why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament. Nate Silver famously predicted the winner in all 50 states; but in the NCAA basketball tournament, it’s difficult to get much above the low-70-percent range in predictive accuracy. (Silver himself has pointed this out.)

One thing that’s not mentioned, though, is that a basketball game is simply a smaller sample than voting. The basic unit of basketball analysis is the possession; a typical Division I college basketball game might include 150 or so possessions. (Averages per team are at team rankings.) If you let two basketball teams go at it for hundreds of thousands or even millions of possessions, the chance that the better team would win the game would be much higher.
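One way to see the effect is to simulate games of different lengths. Here’s a rough sketch; the per-possession scoring probabilities (52% versus 48%) are made up for illustration, not real team numbers:

```python
import random

def win_probability(possessions, p_a=0.52, p_b=0.48, trials=10_000, seed=0):
    """Estimate how often the better team (A) outscores team B when each
    team gets `possessions` possessions, scoring on each possession
    independently with its own (made-up) probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a = sum(rng.random() < p_a for _ in range(possessions))
        b = sum(rng.random() < p_b for _ in range(possessions))
        wins += a > b
    return wins / trials
```

With 75 possessions a side, the better team wins only about 70% of these simulated games; give each team ten times as many possessions and upsets become far rarer, which is exactly the sampling-error point.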

In short, basketball games are subject to sampling error; voting is not.

Sum of powers of i

James Tanton asks:

What is 1 + i + i^2 + i^3 + i^4 + i^5 + \cdots?

Of course it’s 1/(1-i), right, by the usual formula for summing a geometric series? But this says that

1 + z + z^2 + \cdots = {1 \over 1-z}

when |z|<1.  And |i| = 1, so it doesn’t work here. But who cares? Start taking partial sums. The sum is (after simplifying using i^2 = -1, i^4 = 1):

1 + i - 1 - i + 1 + i - 1 - i + \cdots

and we can write down partial sums: 1, 1+i, i, 0, 1, 1+i, i, 0, \cdots. The average of these partial sums tends to (1+i)/2, which is exactly 1/(1-i). It’s a complex version of Grandi’s series (1 - 1 + 1 - 1 + 1 - \cdots = 1/2), and indeed the argument I’ve outlined here is Cesàro summation.
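The averaging is easy to watch numerically. A quick sketch (the function name is mine; note that repeatedly multiplying by i is exact in complex floating point, since each step only negates or swaps components):

```python
def cesaro_mean(n):
    """Average of the first n partial sums of 1 + i + i^2 + i^3 + ...."""
    term = 1 + 0j      # current power of i
    partial = 0j       # running partial sum
    total = 0j         # running sum of partial sums
    for _ in range(n):
        partial += term
        total += partial
        term *= 1j
    return total / n

print(cesaro_mean(4))  # the partial sums 1, 1+i, i, 0 average to (0.5+0.5j)
```

As n grows the averages stay within O(1/n) of (1+i)/2, since the partial sums just cycle through 1, 1+i, i, 0.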

Weekly links for March 25

John Cook has an incomplete post about sphere volumes for which he asked for some help in recognizing some familiar formulas.

Andrew Gelman writes for the New York Times on how much we slow down when running longer distances and comments on his blog on where one might get the data.

Peter Cameron has an extended series on Fibonacci numbers: one, two, three, four, five, six, seven, eight.

How to add up quickly, from Plus magazine, on accelerating series convergence.

From Michael Trick, the Indiegogo fundraiser for the traveling salesman movie.

On the distribution of time-to-proof of mathematical conjectures, by Ryohei Hisano and Didier Sornette. (I learned about this paper from Samuel Arbesman’s book The Half-life of Facts: Why Everything We Know Has an Expiration Date.)

Numberphile on statistics on match day as collected by Opta Sports.

The New York Times on Mayor Bloomberg’s geek squad.

Oscar Boykin at the Northeast Scala Symposium gives a talk Programming isn’t math.

Are the Oxbridge bumps races the longest-running Markov chain Monte Carlo simulation in the world?

How deep is a tennis tournament compared to March Madness?

From the Wall Street Journal: a print article about the use of natural language processing in reinventing the smartphone keyboard and an accompanying interview with Ben Medlock, chief technology officer of SwiftKey.

From the BigML blog, bedtime for boosting.

Chris Wilson of Yahoo Research blogs about social network analysis based on Senate votes. Also at the Washington Post. (Democrats are more cohesive than Republicans.)

Jeff Rosenthal spoke in 2010 at Harvard on How to discuss statistics on live television; this was the inaugural Pickard memorial lecture, which was recently posted on YouTube.

March Madness links

Probably too late to use these in filling out your brackets, but these may be of interest:

Nate Silver’s advancement probabilities and bracket.

How to pick a winning bracket using analytics, from Laura McLay.

John Ezekowitz at the Harvard Sports Analysis Collective predicts the tournament and predicts the upsets.

And finally, Jordan Ellenberg’s math bracket, created by picking the school with the better math department to win.

Edited to add: a late breaking post from Laura McLay.

Pi day

Here’s a roundup of pi-related links.

A poetic proof of the irrationality of pi.

Liz Landau on Daniel Tammet and pi.

A baker’s dozen of pie recipes, presented as a pie chart.

The Bayesian Biologist’s pi day special: estimating π using Monte Carlo.

Pi approximation day: “a holiday for people who are GOOD ENOUGH, just not transcendental”.

John Cook has five posts on computing π.

Jordan Ellenberg talks about pi day for Wisconsin Public Radio.

The Exploratorium in San Francisco will be unveiling its pi shrine today.

The Aperiodical has a podcast on memorizing pi.

Numberphile’s pi videos, including calculating pi using pies.

Vi Hart’s singing pi-gram.

From Dave Richeson: who first proved that C/D is a constant?

Solution to the gambling machine puzzle

From the New York Times “Numberplay” blog:

An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him whether y > x or x > y.

If the customer guesses correctly, he is given y dollars. If x = y, he’s given y/2 dollars. And if he’s wrong about which is larger, he’s given nothing.

The entrepreneur plans to charge his customers $40 for the privilege of playing the game. Would you play?

Clearly the strategy is to guess that y > x if x is small, and to guess that y < x if x is large. Say you’re told x = 60. If you guess x is the larger variable, then conditional on your guess being correct (which has probability 0.6) you win an average of 30 dollars (halfway between 0 and 60). If your guess is incorrect you win nothing. Similarly, if you guess x is the smaller variable, then conditional on your guess being correct (which has probability 0.4) you win an average of 80 dollars (halfway between 60 and 100). So your expected winnings are 18 dollars if you guess x is the larger variable, and 32 if you guess x is the smaller variable. You should guess x is the smaller variable — that is, 60 is “small”.

This is surprising at first — 60 is closer to 100 than it is to 0, and if you’re just trying to guess correctly you’d guess that 60 was the larger of x and y. But the payoff is the unseen number y, and if x is the smaller variable then that biases the value of y upwards.
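The x = 60 arithmetic above, spelled out (amounts in dollars; variable names are mine):

```python
x = 60  # the number you're shown

# Guess "x is larger": right with probability x/100 = 0.6, and then y is
# uniform on (0, 60), so the average payout is 30 dollars.
exp_larger = (x / 100) * (x / 2)

# Guess "x is smaller": right with probability (100 - x)/100 = 0.4, and
# then y is uniform on (60, 100), so the average payout is 80 dollars.
exp_smaller = ((100 - x) / 100) * ((x + 100) / 2)

print(exp_larger, exp_smaller)  # 18.0 32.0
```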

To simplify the analysis, I’m going to say that you’re given u and v, which are x/100 and y/100; so they’re uniformly distributed between 0 and 1. You’re told u, you get to guess if v is larger or smaller than u, and if you get it right you get 100v dollars.

You’re told u. If you guess u is the larger of the two variables, then conditional on your guess being correct — which has probability u — you win on average u/2 hundred dollars. And if you guess u is the smaller, then conditional on your guess being correct — which has probability 1-u — you win on average (1+u)/2 hundred dollars. So your expected winnings are L(u) = u(u/2) = u^2/2 if you guess u is the larger, and S(u) = (1-u)(1+u)/2 = (1-u^2)/2 if you guess u is the smaller — all money is in units of one hundred dollars.

So you should guess u is the larger variable exactly when L(u) > S(u); that is, when u^2 > (1-u^2), or u > 1/\sqrt{2} \approx 0.71.

What is the expected payoff? It’s an easy integral, namely

\int_0^1 \max(L(u), S(u)) \: du = \int_0^{1/\sqrt{2}} {1-u^2 \over 2} \: du + \int_{1/\sqrt{2}}^1 {u^2 \over 2} \: du = {\sqrt{2} + 1 \over 6}

and that’s about 0.4024 — the expected value of this game is $40.24. So you should play! But on the other hand the casino operator might still make money, because are people really going to sit down and work out the optimal strategy?
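Both the threshold and the expected payoff are easy to check by simulation. A minimal sketch (the function name and trial counts are mine, not from the Numberplay post):

```python
import random

def expected_payoff(threshold, trials=200_000, seed=1):
    """Play the game many times: after seeing x, guess 'y > x' exactly
    when x/100 is below the threshold; collect y dollars when right.
    (Ties x == y have probability 0 and are ignored.)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        guess_y_larger = x / 100 < threshold
        if guess_y_larger == (y > x):
            total += y
    return total / trials

print(expected_payoff(2 ** -0.5))  # near 100*(sqrt(2)+1)/6, about $40.24
print(expected_payoff(0.5))       # the naive threshold pays less, about $37.50
```

The $37.50 for the naive threshold 1/2 comes from the same integral with the breakpoint moved to 1/2, so the simulation also confirms that 1/\sqrt{2} is the better cutoff.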

Lots of solutions were offered at the post giving the puzzle; the Bayesian Biologist had a simulation-based approach.

(Finally, you might notice that I ignored the possibility where x = y. That’s not because I’m forgetful, but because it happens with probability 0.)

Weekly links for March 11

Brian Hayes on the baby Gauss story.

Deep Impact: Unintended Consequences of Journal Rank, by Bjorn Brembs and Marcus Munafo. h/t Jordan Ellenberg; Cathy O’Neil’s comments.

Frank Farris on forbidden symmetries in the Notices of the AMS, via Scientific American.

Natalie Wolchover profiles Doron Zeilberger, evangelist of mathematics using computers.

Neil deGrasse Tyson, On Being Round.

Phase Plots of Complex Functions: A Journey in Illustration, by Elias Wegert and Gunter Semmler.

Evelyn Lamb has two posts on the four-color theorem: one, two.

An interview with Tim Harford (Financial Times’ “Undercover Economist”, More or Less presenter).

The return of Dow 36,000, or false extrapolation

Hassett and Glassman argue that Dow 36,000 is attainable again, within three to five years, because the Dow has gone up from 6547 to 14397 in four years; that means a growth rate of about 21.8% per year (the fourth root of 14397/6547), and growth at that rate for five years puts the Dow near 38,500.

This is an astounding feat of extrapolation. I wonder – the Dow last peaked at 14164 on October 9, 2007. By March 9, 2009 it was at 6547. Would Hassett and Glassman seriously have suggested, four years ago today, that it should drop by more than half in every seventeen-month period — putting it at 741 today?

I think not.
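Both extrapolations are a couple of lines of arithmetic, using the 6547 close from March 9, 2009 (a back-of-envelope check, nothing more):

```python
low, now, prior_peak = 6547.0, 14397.0, 14164.0

# Forward: annualize the four-year rebound, then compound five more years.
annual = (now / low) ** (1 / 4)   # roughly 1.218, i.e. about 21.8% per year
print(round(now * annual ** 5))   # in the neighborhood of 38,500

# Backward: apply the same logic to the 17-month crash, compounded
# over the 48 months since the low.
crash_ratio = low / prior_peak    # drop from the Oct 2007 peak to the Mar 2009 low
print(round(low * crash_ratio ** (48 / 17)))  # about 741
```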