Are there more dead than alive?

“Do the dead outnumber the living?”, by Wesley Stephenson, from the BBC via getstats.

Answer: yes, by a factor of about 14 to 1. Some 108 billion or so people have been born since 50,000 BC; about 7 billion are alive today, leaving roughly 101 billion dead.

This is one of those questions that I’m surprised to even see asked. The BBC article is based on estimates from the Population Reference Bureau; if you trace them back, they come from an article called How many people have ever lived on earth?, by Carl Haub, the PRB’s chief demographer. As far as I can tell, the estimate comes from simple arithmetic applied to historical estimates of populations and birth rates: for each year from 50,000 BC to the present, multiply the birth rate by the population, then add up the results. That is, however, a lot of arithmetic, and it requires a lot of guesswork.
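To make the method concrete, here’s a minimal sketch in Python. The eras, average populations, and birth rates below are round numbers invented purely for illustration (they are not Haub’s actual figures); the point is only that arithmetic of this shape lands in the neighborhood of a hundred billion births.

```python
eras = [
    # (start year, end year, assumed avg population, assumed births/1000/yr)
    # -- all four columns are made-up illustrative values, NOT Haub's data
    (-50000, -8000, 1e6, 80),
    ( -8000,     1, 1e8, 80),
    (     1,  1750, 3e8, 45),
    (  1750,  2011, 2e9, 30),
]

# births per era ~ population x (crude birth rate) x (length of era in years)
total_births = sum(pop * rate / 1000 * (end - start)
                   for start, end, pop, rate in eras)
print(f"roughly {total_births / 1e9:.0f} billion births")
```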

Is there some quick way to see that there have been at least fourteen billion human births since the beginning of time (that is, twice the number of people now alive), and therefore that there are more people dead than alive?

Weekly links – February 12

Sometimes I run across things I find interesting but have nothing to say about. I’m going to compile them into posts of links that will go up weekly.

Peter Rowlett reminds us that mathematicians are people too, and asks who should be allowed to write popular history of mathematics.

Samuel Hansen interviews people behind the Museum of Mathematics. Opening in late 2012 at 11 East 26th Street, Manhattan, not far from the Museum of Sex at 27th and Fifth.

Laura McLay, what is the conditional probability of being struck by lightning?

Steve Lohr, New York Times, The Age of Big Data.

Samuel Arbesman, The diffusion of monasteries; also applies to Christianity as a whole (from Code and Culture).

A spaghetti monster (that does not fly), by Jos Leys.

The birth of the calculus

An old (1986) BBC documentary on The birth of the calculus, presented by Jeremy Gray of the Open University, author of the recent book Plato’s Ghost: The Modernist Transformation of Mathematics. I believe this is part of the course MA290, “Topics in the History of Mathematics”.

Around three minutes in we learn of a pre-Newtonian trick for finding the tangent to a curve: find the osculating circle and draw its radius, which is normal to the curve; the tangent is then perpendicular to that radius. This was unfamiliar to me. Much of the history will be familiar to many, but there’s always the novelty of seeing (video of) Newton’s actual papers. As you may have heard, Cambridge has recently digitized many of Newton’s papers. Though the video is British, it shows some of Leibniz’s papers too, including the one where he introduces the integral sign.

The same YouTube channel has videos of a few In Our Time episodes from BBC Radio 4. I can’t quite account for these, since they’re just audio with a static picture on top, but the content may be interesting. (I haven’t listened.)

(metafilter)

Vi Hart’s phyllotaxis – with googly eyes! and glitter!

You guys know about Vi Hart, right? Check out her videos on “Spirals, Fibonacci, and Being a Plant”:

(And something you don’t see too often on YouTube: notes and references, in video form, for the preceding three videos. If you like your phyllotaxis in text form, read “The Mathematical Lives of Plants” by Julie Rehmeyer.)

“Are there spirals on other things that start with pine?”

Seriously, this is based on the usual observation that if the angle between successive leaves is 1/\phi of a full turn, where \phi=(1+\sqrt{5})/2, then the leaves tend to be arranged in such a way that they don’t cover each other up. I actually hadn’t known that Lucas numbers show up in phyllotaxis as well; there the relevant angle is 1/(2+\phi) of a circle. And the whole arrangement comes from the mutual repulsion of the growing leaves.
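Here’s a minimal numerical sketch of that claim (my own illustration, nothing from Hart’s videos): place n leaves around the stem at successive multiples of a fixed fraction of a full turn and measure the smallest angular gap between any two of them. A rational fraction makes the leaves stack exactly; with 1/\phi no two leaves ever coincide, and the minimum gap shrinks only like 1/n (up to a constant).

```python
import math

phi = (1 + math.sqrt(5)) / 2

def min_angular_gap(turn, n):
    """Place n leaves at angles 0, turn, 2*turn, ... (mod 1, as fractions of
    a full turn) and return the smallest gap between adjacent leaves."""
    angles = sorted((k * turn) % 1.0 for k in range(n))
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 1.0 - angles[-1])  # the wrap-around gap
    return min(gaps)

for n in (10, 100, 1000):
    # golden angle: the gap stays on the order of 1/n, so leaves never stack;
    # a rational angle (1/4 turn here) repeats after 4 leaves, so the gap hits 0
    print(n, min_angular_gap(1 / phi, n), min_angular_gap(0.25, n))
```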

Less seriously, there are plants with smiley faces. And glitter.

Why do statisticians answer silly questions that no one ever asks?

Via @TimHarford, an article from Significance magazine: Why do statisticians answer silly questions that no one ever asks?, by Matt Briggs. People want to predict the future, and p-values and classical hypothesis testing are not really meant for that; Bayesian methods do better.

(I apologize: the original version of this post had some HTML errors. I’m still getting used to the WordPress software.)

School shouldn’t be an arms race

Stephen Bainbridge, a law professor at UCLA, asks an interesting question: should students profit off his classes by selling their notes? He writes about his strategy: “I’m going to buy some of these note sets and outlines being sold for my classes. I’ll go through them and find all the mistakes. And then I’ll write exam questions testing on those very same mistakes.”

I can see the appeal of this, from a purely mercenary point of view, but I tend to not worry so much about such things. If the students want to shoot themselves in the foot, let them. I’d rather spend my time helping the students who want to learn than trying to trip up the students who are just looking at school as a hoop to be jumped through.

(And yes, I realize that some people can’t take notes for reasons of disability. This isn’t about them.)

via Hacker News.

An ancient magic trick: il laberinto di Ghisi

There’s a Conjuring Arts Research Center somewhere in midtown Manhattan. This video features Bill Kalush, its director, talking about and showing some highlights of the center’s collection of books on the early history of magic.

Of mathematical interest: the booklet features three two-page spreads, each with pictures of the same sixty saints, and each spread is divided into four groups of fifteen. You pick a saint, and on each of the three spreads you point out which of the four groups your saint is in. “Of course” this works on the following principle: the first answer narrows the sixty possible pictures down to fifteen. The pictures are rearranged between spreads so that in the second round, four (or perhaps three) pictures from each first-round group appear in each second-round group; that narrows it down to four (or three). In the third round the picture itself is found.
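Here’s a toy reconstruction of that logic in Python. The particular grouping scheme below is my own invention, one of many that work, and the book’s actual arrangement surely differs; the underlying point is just that three answers, each naming one of four groups, can distinguish 4^3 = 64 possibilities, which is more than the sixty pictures.

```python
# Give each of the sixty saints an index 0..59 and derive its group (0..3)
# on each of the three spreads from that index alone.
def groups(saint):
    return (saint // 15,        # spread 1: four groups of fifteen
            (saint % 15) // 4,  # spread 2: 4 (or 3) from each old group land together
            saint % 4)          # spread 3: separates the remaining 4 (or 3)

# The performer recovers the saint from the three group answers alone:
def identify(answers):
    matches = [s for s in range(60) if groups(s) == answers]
    assert len(matches) == 1    # every answer triple is unambiguous
    return matches[0]

for s in range(60):             # sanity check over all sixty pictures
    assert identify(groups(s)) == s
print("all 60 saints identified from their three answers")
```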

More pictures here (text in Italian). There’s a simulation of the mind-reading trick by Mariano Tomatis, magician and author. Tomatis also refers to a facsimile of the whole (21-spread) book that he’s prepared, and describes the difficulty he had in constructing it so that it would still work: the book is four centuries old, and therefore difficult to read in some places, but of course there is an internal logic to it. He’s also written the books Numeri assassini. Come scoprire con la matematica tutti i misteri del crimine (roughly, “Killer numbers: how to uncover all the mysteries of crime with mathematics”) and La magia dei numeri. Come scoprire con la matematica tutti i segreti del paranormale (“The magic of numbers: how to uncover all the secrets of the paranormal with mathematics”), as well as other books about magic that sound less mathematical.

(via metafilter.)

Are we all descended from Confucius?

Mark Liberman at Language Log asks this question, spurred by a Chinese professor’s claim to be a 73rd-generation descendant of Confucius. His conclusion: well, yes; if anyone in China is descended from Confucius (and documented descendants do exist), then probably everyone in China is. Given a long enough time this would be true with “China” replaced by “the world”, but it probably hasn’t been long enough.

Why are correlations between -1 and 1?

So everyone knows that correlation coefficients are between -1 and 1. The (Pearson) correlation coefficient of x_1, \ldots, x_n and y_1, \ldots, y_n is given by

r = {1 \over n} \sum_{i=1}^n \left( {x_i - \mu_x \over \sigma_x} \right) \left( {y_i - \mu_y \over \sigma_y} \right)

where \mu_x, \mu_y are the means of the x_i and y_i, and \sigma_x, \sigma_y are their (population) standard deviations. Alternatively, after some rearrangement this is

r = {{x_1 y_1 + \cdots + x_n y_n \over n} - \mu_x \mu_y \over \sigma_x \sigma_y}

which is more convenient for calculation, but in my opinion less convenient for understanding. The correlation coefficient will be positive when (x_i-\mu_x)/\sigma_x and (y_i-\mu_y)/\sigma_y usually have the same sign, meaning that larger-than-average values of x go with larger-than-average values of y, and negative when the signs tend to be mismatched.
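As a quick sanity check that the two formulas agree, here’s a short sketch (made-up data, my own code; note the population, divide-by-n, convention for the standard deviations):

```python
import math

x = [1.0, 2.0, 4.0, 5.0, 8.0]   # arbitrary illustrative data
y = [2.0, 1.0, 5.0, 4.0, 9.0]
n = len(x)

mu_x, mu_y = sum(x) / n, sum(y) / n
sigma_x = math.sqrt(sum((xi - mu_x) ** 2 for xi in x) / n)  # population SDs
sigma_y = math.sqrt(sum((yi - mu_y) ** 2 for yi in y) / n)

# r as the average product of z-scores
r1 = sum((xi - mu_x) / sigma_x * (yi - mu_y) / sigma_y
         for xi, yi in zip(x, y)) / n
# r via the rearranged computational form
r2 = (sum(xi * yi for xi, yi in zip(x, y)) / n - mu_x * mu_y) / (sigma_x * sigma_y)

assert abs(r1 - r2) < 1e-12
print(r1, r2)   # equal up to rounding, and between -1 and 1
```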

But why should this be between -1 and 1? That’s not at all obvious from just looking at the formula. From a very informal survey of the textbooks lying around my office, if a text defines random variables then it gives a proof in terms of them. For example, Pitman, Probability, p. 433, has the following proof (paraphrased): Say X and Y are random variables and X^* = (X-E(X))/SD(X), Y^* = (Y-E(Y))/SD(Y) are their standardizations. First define correlation for random variables as Corr(X,Y) = (E(XY)-E(X)E(Y))/(SD(X)SD(Y)). Simple properties of random variables give Corr(X,Y) = E(X^* Y^*). Then observe that E(X^{*2}) = E(Y^{*2}) = 1 and look at

0 \le E[(X^* - Y^*)^2] = E(X^{*2}) + E(Y^{*2}) - 2E(X^* Y^*) = 1 + 1 - 2E(X^* Y^*)

and rearrange to get E(X^* Y^*) \le 1. Similarly, looking at X^* + Y^* gives E(X^* Y^*) \ge -1. Finally, the correlation of a data set is just the correlation of the corresponding random variables: pick one of the n data points uniformly at random and let X and Y be its two coordinates.

This is all well and good if you’re introducing random variables. But one of the texts I’m teaching from this semester (Freedman, Pisani, and Purves, Statistics) doesn’t, and the other (Moore, McCabe, and Craig, Introduction to the Practice of Statistics) introduces the correlation for sets of bivariate data before it introduces random variables. These texts just baldly state that r is between -1 and 1 always — but of course some students ask why.

The inequality we’re talking about is an inequality involving sums of products: it’s really |Cov(X,Y)| \le SD(X) SD(Y). And that reminded me of the Cauchy-Schwarz inequality; but how do you prove Cauchy-Schwarz for people who haven’t taken linear algebra? Wikipedia comes to the rescue. We only need the special case in \mathbb{R}^n, where Cauchy-Schwarz reduces to

\left( \sum_{i=1}^n u_i v_i \right)^2 \le \left( \sum_{i=1}^n u_i^2 \right) \left( \sum_{i=1}^n v_i^2 \right)

for any real numbers u_1, u_2, \ldots, u_n, v_1, v_2, \ldots, v_n. And the proof at Wikipedia is simple: look at the polynomial (in z)

(u_1 z + v_1)^2 + (u_2 z + v_2)^2 + \cdots + (u_n z + v_n)^2.

This is a quadratic in z. As a sum of squares of real numbers it’s nonnegative for every real z, so it has at most one real root, and therefore its discriminant is nonpositive. But we can write it as

(u_1^2 + \cdots + u_n^2) z^2 + 2(u_1 v_1 + \cdots + u_n v_n) z + (v_1^2 + \cdots + v_n^2)

and so its discriminant is

4(u_1 v_1 + \cdots + u_n v_n)^2 - 4 (u_1^2 + \cdots + u_n^2) (v_1^2 + \cdots + v_n^2)

and this being nonpositive is exactly the form of Cauchy-Schwarz we needed.
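A quick numeric sanity check of the discriminant argument (random vectors, a sketch of my own; obviously not a substitute for the proof):

```python
import random

random.seed(0)
u = [random.gauss(0, 1) for _ in range(20)]
v = [random.gauss(0, 1) for _ in range(20)]

# coefficients of (u_1 z + v_1)^2 + ... + (u_n z + v_n)^2 as a quadratic in z
a = sum(ui * ui for ui in u)
b = 2 * sum(ui * vi for ui, vi in zip(u, v))
c = sum(vi * vi for vi in v)

discriminant = b * b - 4 * a * c
assert discriminant <= 0   # i.e. (sum u_i v_i)^2 <= (sum u_i^2)(sum v_i^2)
print("discriminant =", discriminant)
```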

To show that this implies the correlation coefficient is in [-1, 1]: let’s say we have the data (x_1, y_1), \ldots, (x_n, y_n) and we’d like to compute the correlation between the x_i and the y_i. The correlation doesn’t change under linear transformations of the data. So let u_i be the standardization of x_i and let v_i be the standardization of y_i. Then we want the correlation of (u_1, v_1), \ldots, (u_n, v_n). But this is just

{u_1 v_1 + \cdots + u_n v_n \over n}.

By Cauchy-Schwarz we know that

(u_1 v_1 + \cdots + u_n v_n)^2 \le (u_1^2 + \cdots + u_n^2) (v_1^2 + \cdots + v_n^2)

and the right-hand side is n^2, since (u_1^2 + \cdots + u_n^2)/n is the variance of the u_i, which is 1 because the u_i are standardized; so the first factor is n, and similarly for the other factor. Therefore

(u_1 v_1 + \cdots + u_n v_n)^2 \le n^2

and dividing through by n^2 gives that the square of the correlation is bounded above by 1, which is what we wanted.
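Putting the pieces together in code (a minimal sketch using the same divide-by-n conventions as above; the function corr is my own naming, not from any of these texts):

```python
import math, random

def corr(x, y):
    """Pearson r: standardize both lists, then average the products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    u = [(xi - mx) / sx for xi in x]   # standardized x values
    v = [(yi - my) / sy for yi in y]   # standardized y values
    return sum(ui * vi for ui, vi in zip(u, v)) / n

random.seed(1)
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(10)]
    y = [random.gauss(0, 1) for _ in range(10)]
    assert -1 - 1e-12 <= corr(x, y) <= 1 + 1e-12   # the Cauchy-Schwarz bound
print("1000 random data sets, every r in [-1, 1]")
```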

So now I have something to tell my students other than “you need to know about random variables”, which is always nice. Not that it would kill them to know about random variables. But I’m finding that intro stat courses are full of these black boxes that some students will accept and some want to open.