Links for November 30

Stian Haklev on Starting data analysis/wrangling with R: Things I wish I’d been told.

Zulko on Things you can do with Python and POV-Ray (a raytracing engine).

Sanjoy Mahajan has a new book out, The Art of Insight in Science and Engineering: Mastering Complexity. This looks like it could be thought of as a more science- or engineering-themed sequel to his Street-Fighting Mathematics.

Nigel Stanford’s Cymatics “is the science of visualizing audio frequencies.”

Does teaching gamblers about probability make them less likely to gamble? (Yes.)

FiveThirtyEight has a nerd holiday gift guide.

From Katie Steckles at the Aperiodical, Twitter’s favorite fictional mathematicians.

Parents strongly cautioned for advanced math

From A. O. Scott’s review of The Imitation Game, the new Alan Turing biopic:

“The Imitation Game” is rated PG-13 (Parents strongly cautioned). Illicit sex, cataclysmic violence and advanced math, most of it mentioned rather than shown.

First, advanced math is reason for caution? Um, no.

Second, what’s the difference between “mentioning” and “showing” mathematics anyway? This seems to be connected to the philosophical use-mention distinction, Gödel numbering, and so on, but I doubt that Scott is trying to make a metamathematical point. It may just mean that people in the movie don’t sit around scribbling on pieces of paper but rather talk about having done so, which is good, because I’ve watched people do math and there’s really nothing to see.

Thanksgiving power law

In honor of Thanksgiving: take a look at Niall MacKay’s paper Of bombs and boats and mice and men: a random tour through some scaling laws.

I mention this because one of the scaling laws there is the time it takes to roast a turkey – you can work out from dimensional analysis that roasting time is proportional to the two-thirds power of mass. Specifically, the thermal diffusivity κ of meat (or any other substance) – that’s the constant in the heat equation – has dimensions of length squared over time. Since the only relevant length scale is the length of the turkey l, roasting time must be proportional to l^2/κ. But assuming a spherical turkey, length is proportional to the cube root of mass, giving the two-thirds law.
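As a quick illustration in R (the 10-pound, 3-hour reference bird is a made-up calibration point, not cooking advice):

# roasting time scales as the 2/3 power of mass; calibrated against a
# hypothetical reference turkey: 10 pounds, 3 hours
roasting.time = function(mass, ref.mass = 10, ref.time = 3){
  ref.time * (mass/ref.mass)^(2/3)
}
roasting.time(20)  # about 4.76 hours: doubling the bird doesn't double the time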

Happy Thanksgiving! And for my readers abroad, happy Thursday.

Links for November 23

Mona Chalabi and Andrew Flowers figure out the most common name in America, using some data by Lee Hartman to correct for the fact that first name and last name are not independent, but tend to cluster based on ethnicity. A fun fact: apparently people avoid alliteration. Does this extend to marital name-changing? For example, say my wife’s first name starts with an L, as does my last name; does that make her less likely to have changed her name upon marriage? She didn’t, but I can’t attribute that to the alliteration.

From Jawbone via the Atlantic, people don’t exercise when it’s cold. (A bit less obvious in the data is that they don’t exercise when it’s hot, either.)

David Mimno at Cornell has a fun word similarity tool based on a corpus of pre-1923 books.

Michael Harris at Slate writes about the Breakthrough Prize, which is trying to be the “Oscars of science”. He’s got a book coming out called Mathematics Without Apologies, which is at least the second in a line of titles riffing on A Mathematician’s Apology, after The Unapologetic Mathematician. (And don’t forget Second-Rate Minds; exposition is work for second-rate minds, according to Hardy.)

Tim Hesterberg at Google has written What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.

From Todd W. Schneider (who’s on fire lately): the Reddit front page is not a meritocracy. (Disclaimer: Schneider, or someone with his same name and similar demographic history, was a year behind me in high school.)

Christian Perfect linked to a short documentary, Logically Policed by Damiano Petrucci, about mathematicians.

Rob Hyndman on visualization of probabilistic forecasts.

Mark Jason Dominus on teaching his daughter algebra.

Emilia Petrisor writes (with code, in an IPython notebook) on domain coloring as a way of visualizing complex-valued functions.

Mike of mikesmathpage and his kids made some videos on the Collatz conjecture and a variation on it due to Conway. (The Conway variation is in his paper “On Unsettleable Arithmetic Problems”.)

From Pip ( = Dick Lipton + Ken Regan) at Gödel’s Lost Letter and P = NP: two vs. three. Why is two so special?

Rasmus Bååth and Christian Robert wrote A Brief Review of All Comic Books Teaching Statistics in Chance.

There’s a second edition of Skiena’s Algorithm Design Manual.

A temperature puzzle

Last Friday morning (the 14th) it was unseasonably cold in Atlanta. On the way to work I noticed my car’s display giving the following temperature readings: 28, 30, 32, 34, 36 – all in Fahrenheit. It somehow knew to avoid the odd numbers.

But in the evening of the same day it read 39, 41, and 43. Now it knew how to avoid the even numbers.

I watched a little more closely for a while – on a trip on Sunday morning it read 46, and then 48. On Monday morning, 50, 52, and 54. Even again.

A puzzle for readers: what’s going on? (I know the answer.)

Links for November 16

When you are listening to corn pop, are you listening to the Central Limit Theorem?

12 Scientific Sculptures: Intangible Data in Physical Form

Kokichi Sugihara uses computation to make three-dimensional illusions.

Markov Chains vs Simulation: Flipping a Million Little Coins.

Brian Hayes on lopsided precinct-by-precinct voting results.

Simon Singh on Homer’s Last Theorem. I’m currently enjoying his book The Simpsons and Their Mathematical Secrets.

How the Google Priority Inbox works.

Terry Tao announces his winter analytic prime number theory course, the announcement doubling as an introduction to why one might be interested in such a subject and the techniques used in it.

Spliddit is a new website by Ariel Procaccia and Jonathan Goldman which implements algorithms for splitting rent, dividing goods, and sharing credit. They claim a twofold mission: “To provide easy access to carefully designed fair division methods, thereby making the world a bit fairer. To communicate to the public the beauty and value of theoretical research in computer science, mathematics, and economics, from an unusual perspective.”

From Numberphile, how math goes into making Pixar’s movies.

Random sums of sines and random walks

John Cook, at his Probability Fact twitter feed (@ProbFact), asked (I’ve cleaned up the notation):

What is the expected amplitude for the sum of N sines with random phase? i.e. sum of \sin(x + \phi_i) where \phi_i ~ uniform[0, 2\pi]

Intuitively one expects something on the order of \sqrt{N}, since we’re adding together what are essentially N independent random variables. It’s not too hard to throw together a quick simulation, without even bothering with any trigonometry, and this was my first impulse. This code just picks the \phi_i uniformly at random, and takes the maximum of f(x) = \sum_i \sin(x + \phi_i) over values of x which are multiples of \pi/100.

x = (0:200)*pi/100   # multiples of pi/100 covering [0, 2*pi]
n = 1:100
num.samples = 100

# estimate the amplitude as the maximum of sum_i sin(x + phi_i) over the grid
max.of.sines = function(phi){
  max(rowSums(outer(x, phi, function(x, y){sin(x + y)})))
}

# average that maximum over k independent draws of n random phases
mean.of.max = function(n, k){mean(replicate(k, max.of.sines(runif(n, 0, 2*pi))))}
averages = sapply(n, function(n){mean.of.max(n, num.samples)})

This is a bit tricky: in the matrix output by outer inside max.of.sines, each column gives the values of a single sine function \sin(x + \phi_i), and rowSums adds the columns together.

We can then plot the resulting averages and fit a model y^2 \sim Cx. I get C \approx 0.7872 from my simulation, which is close enough to \pi/4 to ring a bell:


library(ggplot2)  # for qplot and stat_function

C = lm(averages^2 ~ n + 0)$coefficients

qplot(n, averages, xlab="n", ylab="mean", main="means for 100 samples") +
  stat_function(fun = function(x){sqrt(C*x)})

[Plot: mean of the maximum over 100 samples against n, with the fitted curve \sqrt{Cn}.]

At this point we start thinking theory. If you’re me and you haven’t looked at a trig function in a while, you start at the wikipedia page, and discover that it actually does all the work for you:

\sum_i a_i \sin (x + \delta_i) = a \sin (x+\delta)

where

a^2 = \sum_{i,j} a_i a_j \cos (\delta_i - \delta_j).

That is, the sum of a bunch of sinusoids with a common period is a single sinusoid with the same period, whose amplitude is easily calculated from the amplitudes and phases of the original sinusoids. (There’s a formula for \delta as well, but it’s not relevant here.)

In our case all the a_i are 1, and so we get

a^2 = \sum_{i, j} \cos (\delta_i - \delta_j)

If you take the expectation of both sides, and recognize that E \cos (\delta_i - \delta_j) is 1 if i = j (it’s cos(0)) and 0 if i \not = j (just the average of the cosine function), then you learn E(a^2) = N where N is the number of summands. That agrees with our original guess, and is enough to prove that E(a) \le \sqrt{N} by Jensen’s inequality.
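As a quick sanity check, we can average the double sum \sum_{i,j} \cos(\phi_i - \phi_j) directly over random draws of the phases (a small sketch in the same spirit as the code above):

# check E(a^2) = N by averaging sum_{i,j} cos(phi_i - phi_j) over random phases
mean.a.squared = function(N, trials){
  mean(replicate(trials, {
    phi = runif(N, 0, 2*pi)
    sum(cos(outer(phi, phi, "-")))
  }))
}
mean.a.squared(10, 1000)  # should come out near 10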

To get the exact value of E(a) we can expand on David Radcliffe’s comment: “Same as mean dist from origin after N unit steps in random directions. Agree with sqrt(N*pi/4)”. In particular, consider a random walk in the complex plane, where the steps are given by \exp(i \theta_j) with \theta_j uniform on the interval [0, 2\pi). The position after N steps is

S_N = \sum_{j=1}^N \exp(i \theta_j) = \sum_{j=1}^N (\cos \theta_j + i \sin \theta_j)

and so, breaking up into the real and imaginary components,

|S_N|^2 = \left( \sum_{j=1}^N \cos \theta_j \right)^2 + \left( \sum_{j=1}^N \sin \theta_j \right)^2.

Rewriting the squared sums as double sums gives

|S_N|^2 = \left( \sum_{j=1}^N \sum_{k=1}^N \cos \theta_j \cos \theta_k \right) + \left( \sum_{j=1}^N \sum_{k=1}^N \sin \theta_j \sin \theta_k \right)

and combining the double sums gives

|S_N|^2 = \sum_{j=1}^N \sum_{k=1}^N (\cos \theta_j \cos \theta_k + \sin \theta_j \sin \theta_k)

and by the formula for the cosine of a difference we get

|S_N|^2 = \sum_{j=1}^N \sum_{k=1}^N \cos (\theta_j - \theta_k)

which is exactly the a^2 given above. So the amplitude of our sum of sines is just the distance from the origin in a two-dimensional random walk!

It just remains to show that the expected distance from the origin of the random walk with unit steps in random directions after N steps is \sqrt{\pi N/4}. A good heuristic demonstration is as follows: clearly the distribution of the position (X, Y) is rotationally invariant, i.e. symmetric around the origin. The coordinate X is the sum of N independent variables, each distributed like the cosine of a uniformly chosen angle; that is, each has mean \mu = {1 \over 2\pi} \int_0^{2\pi} \cos \theta \: d\theta = 0 and variance \sigma^2 = {1 \over 2\pi} \int_0^{2\pi} \cos^2 \theta \: d\theta - \mu^2 = 1/2. So the X-coordinate after N steps is approximately normally distributed with variance N/2. The overall distribution, being rotationally symmetric with normal marginals, ought to be approximately jointly normal with X and Y uncorrelated, each with mean 0 and variance N/2; then \sqrt{X^2 + Y^2} is known to be Rayleigh-distributed with scale \sigma = \sqrt{N/2}, and since the mean of a Rayleigh distribution is \sigma \sqrt{\pi/2}, this gives \sqrt{\pi N/4}, finishing the proof modulo that one nasty fact.
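The random walk is also easy to simulate directly (again a quick sketch, with the number of trials chosen arbitrarily):

# mean distance from the origin after N unit steps in random directions,
# compared with the theoretical sqrt(pi*N/4)
walk.distance = function(N){
  theta = runif(N, 0, 2*pi)
  sqrt(sum(cos(theta))^2 + sum(sin(theta))^2)
}
N = 100
mean(replicate(10000, walk.distance(N)))  # simulation
sqrt(pi*N/4)                              # theory: about 8.86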

From the “amusing recommendation engine” files

People who bought Alexandre Grothendieck: A Mathematical Portrait from Amazon also bought some nice-looking Japanese chalk which I’ve seen referred to as dream chalk and praised on MathOverflow. (And a bunch of math books, but you probably expected that.) Mathematicians, as you may know, are one of the last populations on earth to have a fondness for chalk, although I found this ecologist praising chalk.

When I left academia, I remember thinking of it as “putting down the chalk”. I’m not looking to go back to teaching, and I don’t miss having random dust on all my clothes, but I kind of miss it. In the “real world” we use whiteboards, which are superior in some ways (no dust!) but inferior in others: their markers always seem to run out of ink, and people don’t bother to throw the dead ones out, so any work at a whiteboard begins with searching the office for the one set of markers that works.

Actually in the “real world” we mostly sit at desks and use computers. It’s difficult to get real thinking done this way; when I need to think up new ideas, as opposed to writing the code that implements them or communicating the results, I get better work done with a pen and paper. I wonder if this is an artifact of my age – 30, just old enough to remember a time without computers – as observed in The End of Absence by Michael Harris. (He puts it as “people born before 1985”. I’d rephrase this as “1984 or before” because of the symbolic significance of 1984.)

And to marry recommendation engines and surveillance: in the spring Alexis Madrigal pointed out that Amazon’s recommendation engine had created a guide to dealing drugs.

Jeopardy! link roundup

The Jeopardy! Tournament of Champions begins today. Here’s a quick roundup of some Jeopardy!-related links I’ve come across recently:

From Gawker, How a geek cracked the Jeopardy! code. The “geek” in question is Roger Craig, who built a tool to help him study for the show; it seems to have helped, as he won six games, and set the one-day record of $77,000.

Slate had a roundup of some of the most common categories which is pretty interesting. This came out around the time of Watson.

In fact, IBM’s Journal of Research and Development had a whole issue on Watson.

And there’s Eric Feder’s model for the probability of winning at a game of Jeopardy! (based on the state within a single game) and a forum thread discussing it.

The data from the Jeopardy archive has been scraped as well if you want to do some of your own analysis.

The Final Wager analyzes the game theory of Jeopardy! wagers.

What I’m curious about right now are the odds of winning today, given that you won yesterday – and more generally, building a model of player strength, in order to predict things like the odds of winning a tournament. In this year’s tournament two players stand out above the rest: Julia Collins (who won 20 games) and Arthur Chu (who won 11). They’re very different players – here’s a profile in the Wall Street Journal. SB Nation wants them to have an Ultimate Jeopardy Battle of Fire and Lasers. Can I get odds on that battle happening, or on its winner? It seems quite difficult to do this accurately, though – most ranking methods depend on players playing each other repeatedly, but the Jeopardy! format, in which once you’re gone you never come back (except possibly in a tournament), makes the analysis more uncertain.

(Bayes should help.)
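As a toy example of what Bayes might say – and this is only a back-of-the-envelope sketch, not a serious model – put a uniform prior on a champion’s single-game win probability p. After an n-game streak the posterior is Beta(n+1, 1), whose mean, the rule-of-succession estimate (n+1)/(n+2), is the predicted probability of winning the next game:

# rule of succession: uniform Beta(1,1) prior on p, updated on n straight wins
p.win.next = function(n.wins){(n.wins + 1)/(n.wins + 2)}
p.win.next(20)  # Julia Collins: about 0.95
p.win.next(11)  # Arthur Chu: about 0.92

Of course this ignores opponent strength and the tournament format entirely, which is exactly the hard part.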