When the US Air Force discovered the flaw of averages

An in-depth description of Napier’s bones.

Cathy O’Neil on being an ethical data scientist.

Artisanal integers (not just a page that gives some out – some actual math content here!)

John Allen Paulos talks with Glen Whitney about his new book, which I am ashamed to admit I haven’t read yet.

Posted in Uncategorized | 1 Comment

Do Super Bowl babies exist?

So a couple times during the Super Bowl, there have been commercials claiming that there are post-Super-Bowl baby booms – that is, nine months after the Super Bowl, there’s a surge in births in the city of the winning team.

This seems a lot easier to gather data for than some of the other things you hear this claimed about (blackouts, blizzards). Here’s what I could find on those:

This is easily proven or disproven, after the data wrangling (which means, let’s face it, that it’s hard). The NBER appears to have the necessary data (the tropical storm paper above links to it) although I don’t know this data set at all.  Have fun, demographers!


Super Bowl squares with other moduli

People are interested in the odds for the “Super Bowl Squares” game: see for example the Harvard Sports Analysis Collective in 2013 and Mike Beuoy writing for FiveThirtyEight in 2014. The way the game works is as follows:

• players pay money into a pool.
• a 10 by 10 grid is made, and the rows and columns are marked 0 through 9.
• one team’s name is written corresponding to the rows, and the other to the columns.
• the squares of the grid are assigned randomly to the players, proportionally to the amount of money they paid.
• after each quarter of the Super Bowl, look at the last digit of the number of points each team has scored. This gives a row and a column, and the person who has the corresponding square gets some money (say, one-tenth of the pool).
• at the end of the game, do the same. The person who has the corresponding square gets a lot of money.

This game suffers from a flaw – there are lots of squares that are pretty much worthless, so if the random assignment hands you one of those squares, you essentially can’t win. I couldn’t get quarter-by-quarter data, but below you can see the number of times each game score occurred, where the scores are reduced mod 10 (i.e. we look at just the last digit). Data is from pro-football-reference.com. Obviously using (winner, loser) isn’t exactly the same as using (home team, away team) or some other assignment of teams done before the game, but I don’t think the conclusions here are very sensitive to that.
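The tally itself is easy to reproduce once you have the scores. Here’s a minimal sketch in Python, with a handful of real final scores standing in for the full pro-football-reference.com data set (loading all of that is left to the reader; the function name is mine):

```python
from collections import Counter

def square_counts(scores, m=10):
    """Count how often each (winner mod m, loser mod m) square pays off."""
    return Counter((w % m, l % m) for w, l in scores)

# A handful of real final scores, standing in for the full
# pro-football-reference.com data set:
games = [(20, 17), (17, 10), (27, 20), (12, 12), (31, 28)]

counts = square_counts(games, m=10)
print(counts[(7, 0)])  # 17-10 and 27-20 both land on square (7, 0): prints 2
```

The same function with `m=9`, `m=7`, or `m=6` gives the other grids discussed below.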

The most common squares are (0, 7) (Note: I’ll refer to squares by (winner score, loser score), which doesn’t agree with the picture but does agree with the way scores are usually read), which occurs 611 times (including the most common single score, 20-17, which has occurred 248 times), and (7, 0) which occurs 610 times (led by 133 occurrences of 17-10 and 102 occurrences of 27-20). On the flip side, (2, 2) has only occurred six times (two games each of 12-12 and 42-32, and one each of 22-12 and 42-22). If you know anything about football, you know that scores come in sevens and threes, for the most part, and this has the property of making certain last digits a lot more common than others.

But there’s an easy fix. What if we play mod 9? Then the distribution of historical scores looks like this:

There’s still some unevenness, no doubt about it.  But there aren’t terrifying white gaps signifying scores that never happen.  The most common square is now (4, 1), which occurs 361 times, most commonly as 13-10, 31-28, or 31-10. But even the lowly (2, 2) occurs 64 times in the historical record, most frequently as 38-20, 20-20, or 29-20. (In fact, all but two of the (2, 2) games had at least one team scoring exactly 20.)

And you don’t even have to do division to reduce a number mod 9 – just add the digits of the score up and repeat until you get a single-digit number. 9 counts as 0.
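In code, this “casting out nines” trick looks like the following sketch (the function name is mine):

```python
def digital_root_mod9(n):
    """Reduce n mod 9 by summing its digits repeatedly; 9 counts as 0."""
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n % 9

print(digital_root_mod9(31))  # 3 + 1 = 4
print(digital_root_mod9(27))  # 2 + 7 = 9, which counts as 0
```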

What if you don’t have a lot of friends and want to do a smaller pool? Mod 6 works well, and has the advantage that you can assign the squares by rolling a die:

The most common square is (0, 3) (most frequently represented by 24-21 or 30-27) and the least common is (4, 5) (most frequently represented by 28-17 or 34-17, which at least sound like plausible football scores).

But whatever you do, don’t play mod 7:

This is basically a fancy way of betting on how many field goals each team will score: “0” means no field goals, “3” means one field goal, and so on. Also it defeats the purpose of gambling, which is to make the game more interesting – a touchdown plus extra point doesn’t change anything.

Go… um… seriously, I can’t remember who’s playing.  All I know is that the people I know back in San Francisco are complaining and perhaps vandalizing statues.

Uber vs. taxis simulation and explanation of it from Kevin McLaughlin.

An NFL scheduling quirk explains how certain teams can pile up the wins against weak opponents.

Inside the Wall Street Journal’s prediction calculator (for predicting ethnicity from names).

The recently departed Marvin Minsky on What makes mathematics hard to learn?

From the Notices of the AMS:
George Andrews reports on The Man Who Knew Infinity (the new Ramanujan movie) and the editors explain Gauss curvature.

Gunnar Carlsson at Ayasdi writes on How Topological Data Analysis provides a glimpse into what may be powering the Trump engine. (This may all make a little more sense – or less – after tonight’s caucuses.)

Richard Nisbett talks to EDGE about what’s wrong with multiple regression analysis.

Erik Bernhardsson analyzed 50,000 fonts using deep neural networks. (It’s like Metafont, but with neural networks and more data.)

John Cook asks what are the next areas of math to be applied?

Nicolas Kruchten at MLDB on machine learning meets economics. (ROC is not the One True Criterion for model evaluation.)

Videos of curve-drawing machines (silent and with little explanation, but oddly hypnotic)

Robert Bosch, Robert Fathauer, and Henry Segerman on numerically balanced dice – that is, many-sided dice that are optimally fair even if they’re physically a bit unbalanced.

Steve Paulson interviews Frank Wilczek for Nautilus: Beauty is physics’ secret weapon.

Nick Berry at DataGenetics explains Hamming codes for error correction.


Thue-Morse and fair sharing

Matt Parker on “the fairest sharing sequence”, the Thue-Morse sequence. The sequence is

0110 1001 1001 0110 1001 0110 0110 1001…

which is generated as follows:

• start with 0
• repeatedly invert the sequence, replacing all 0s with 1s and vice versa, and concatenate this to the original sequence

So this gives, in sequential steps: 0, 01, 0110, 01101001, …
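The invert-and-append rule takes only a few lines of code; here’s a sketch (the function name is mine):

```python
def thue_morse(n_doublings):
    """Build the Thue-Morse sequence by repeated invert-and-append."""
    seq = [0]
    for _ in range(n_doublings):
        seq = seq + [1 - b for b in seq]  # append the inverted copy
    return seq

print(thue_morse(3))  # [0, 1, 1, 0, 1, 0, 0, 1]
```

Equivalently, the nth term is the parity of the number of 1s in the binary expansion of n.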

It’s an interesting sequence, but why the title?

This is a follow-up to a video in which he proposes a puzzle: partition the numbers 0 through $2^{k+1}-1$ into two groups such that the sum of the 1st, 2nd, …, kth powers of each group are the same. For example, for $k = 2$ we have $0 + 3 + 5 + 6 = 1 + 2 + 4 + 7$ and $0^2 + 3^2 + 5^2 + 6^2 = 1^2 + 2^2 + 4^2 + 7^2$.

The solution, as you may have guessed, is that the numbers on one side of the equality should be the positions of 0s among the first $2^{k+1}$ elements of the Thue-Morse sequence, and the numbers on the other side should be the positions of the 1s, where indexing starts at 0. This fact is proven in Allouche and Shallit’s paper “The ubiquitous Prouhet-Thue-Morse sequence” – see section 5.1.

So why is this about fair sharing? Let’s consider the $k = 2$ case again. We have $0^p + 3^p + 5^p + 6^p = 1^p + 2^p + 4^p + 7^p$ for $p = 0, 1, 2$. That means, then, that we have

$f(0) + f(3) + f(5) + f(6) = f(1) + f(2) + f(4) + f(7)$

for any quadratic polynomial $f$. So say that we’re divvying up some goods, and I get the 0th, 3rd, 5th, and 6th most valuable items while you get the 1st, 2nd, 4th, and 7th. (We are computer scientists, so we start counting from zero.) If the value of the $n$th item can be written as a quadratic in $n$, then we each end up with the same total value.
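This is easy to check numerically. A quick sketch, with an arbitrary quadratic of my choosing standing in for the item values:

```python
def thue_morse(n_doublings):
    """Thue-Morse sequence via repeated invert-and-append."""
    seq = [0]
    for _ in range(n_doublings):
        seq = seq + [1 - b for b in seq]
    return seq

t = thue_morse(3)  # first 2^(k+1) = 8 terms for k = 2
zeros = [i for i, b in enumerate(t) if b == 0]  # [0, 3, 5, 6]
ones = [i for i, b in enumerate(t) if b == 1]   # [1, 2, 4, 7]

# Equal sums of 0th, 1st, and 2nd powers:
for p in range(3):
    assert sum(i ** p for i in zeros) == sum(i ** p for i in ones)

# Hence equal totals for any quadratic f (this one is arbitrary):
f = lambda n: 3 * n ** 2 - 5 * n + 11
print(sum(f(i) for i in zeros) == sum(f(i) for i in ones))  # True
```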

(How does this generalize to the case where $f$ is not polynomial? That seems relevant – for example, what if $f(x) = e^{-x}$ or $1/(x+1)$? In a paper which independently rediscovered the Thue-Morse sequence, Robert Richman did some experiments which suggest that this still holds as long as $f$ isn’t “too far” from polynomial. This got written up in the Guardian as how to pour the perfect cup of coffee. Thue was Norwegian, and I hear Norwegians like coffee, so I think he’d approve.)
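One can at least experiment. Here’s a sketch comparing the Thue-Morse split against naive alternation for $f(x) = e^{-x}$ (the choice of $f$ and cutoff are mine, not Richman’s; for such a rapidly decaying $f$ the Thue-Morse gap doesn’t vanish, but it is smaller):

```python
import math

def thue_morse(n_doublings):
    """Thue-Morse sequence via repeated invert-and-append."""
    seq = [0]
    for _ in range(n_doublings):
        seq = seq + [1 - b for b in seq]
    return seq

def split_gap(f, n_doublings):
    """|sum of f over the 0-positions minus sum over the 1-positions|."""
    t = thue_morse(n_doublings)
    s0 = sum(f(i) for i, b in enumerate(t) if b == 0)
    s1 = sum(f(i) for i, b in enumerate(t) if b == 1)
    return abs(s0 - s1)

f = lambda x: math.exp(-x)
n = 4  # 2^4 = 16 items
tm_gap = split_gap(f, n)
alt_gap = abs(sum((-1) ** i * f(i) for i in range(2 ** n)))  # 0101... split
print(tm_gap, alt_gap)  # the Thue-Morse split comes out closer to even
```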

(Via Metafilter.)

Strings of digits, Mersenne primes

Dave Radcliffe observed that the digits of M(74207281) contain 8929592 distinct seven-digit substrings, and 20021565 distinct eight-digit substrings. (So if you pick an arbitrary 7-digit string, you have an 89% chance of finding it in this number; if you pick an arbitrary 8-digit string, you have a 20% chance.)

Note that M(74207281) is, as the BBC put it, the largest known prime number, discovered in Missouri; it’s $2^{74207281} - 1$. In base 2 it has a very nice expansion as a sequence of 1s – but in base 10 there’s no reason its digits should look special.

Does this seem right? In particular, what would we expect for a random string of digits of this length? The number has $\lceil 74207281 \times \log_{10}(2) \rceil = 22338618$ digits – call this $d$, so I don’t have to write it out over and over again. Therefore it contains $d-6$ seven-digit strings.

The probability that it contains some given seven-digit string is approximately $1 - \exp(-d/10^7)$. Any “slot” of seven digits has probability $10^{-7}$ of holding a particular string. If all the slots were independent, the number of occurrences of that string would be distributed as $Binomial(d-6, 10^{-7})$, which is essentially $Poisson(d/10^7)$; the string appears somewhere exactly when this count is nonzero, which happens with probability $1 - \exp(-d/10^7)$. The slots are not quite independent – some of them overlap – but the extent of the overlap is very small, so we needn’t worry, and we proceed with the approximation. This works out to about 0.8929, exactly the proportion that Radcliffe reports. Similarly, for eight-digit strings we get a probability of 0.2002.

Both of these are right to four digits – just what we expect from this sort of situation! Think of 8929592, above, as a realization of $Binomial(n, p)$ with $n = 10^7, p = 0.8929$ – we flip a “biased coin” with probability 0.8929 of success to determine whether each string, from 0000000 to 9999999, appears in our random string. This has mean $np$ and standard deviation $\sqrt{np(1-p)} \approx 978$ – so we should expect the last three or four digits of any realization to differ from those of the mean. In general, $Binomial(n, p)$ has a standard deviation somewhat less than $\sqrt{n}$ – so in realizations of it we expect half the digits to be “right” (i.e. the same as in $np$) and half to be wrong. (I swear I’ve read before about this rule of thumb of half the digits being right, but this is hard to Google.)
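All of this arithmetic fits in a few lines; a quick sketch (the observed counts are Radcliffe’s):

```python
import math

d = math.ceil(74207281 * math.log10(2))  # decimal digits of the prime
print(d)  # 22338618

for m, observed in [(7, 8929592), (8, 20021565)]:
    p = 1 - math.exp(-d / 10 ** m)     # P(a given m-digit string appears)
    mean = 10 ** m * p                 # expected number of distinct strings
    sd = math.sqrt(10 ** m * p * (1 - p))
    print(f"{m}-digit: p = {p:.4f}, mean = {mean:.0f}, "
          f"sd = {sd:.0f}, observed = {observed}")
```

In both cases the observed count sits within about one standard deviation of the mean.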

Henry Segerman on using Mobius transformations to edit spherical video.

Numberphile on quaternions.