Probability of lots of twins

Dave Radcliffe asked on Twitter: “A class of 380 students has 16 sets of twins. How likely is this to happen purely by chance?” and he links to this article. The school in question is Staples High School in Westport, Connecticut.

Let’s assume for the sake of argument that twins will be in the same grade at the same school. Then we may as well treat each pair of twins as a single entity for school-enrollment purposes. That leaves 380 − 16 = 364 entities, each either a singleton or a pair of twins, and so we’re asking: what’s the probability that at least 16 of them are pairs of twins?

This refers to the class of 2014 at this school, so let’s figure they were born in 1996. This data brief from the CDC gives the rate of twin births in 1996 at about 27 per 1000. But that’s counted per-child. If you count per-pregnancy you should get just over half that; let’s call it 14 per 1000. The probability that a binomial(364, 0.014) random variable is at least 16 is about one in 14,000.
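For anyone who wants to check that tail probability, here is a minimal sketch in Python. It assumes, as above, that the 364 entities are independent and that each is a pair of twins with probability 0.014; the function name binomial_tail is just for illustration.

```python
from math import comb

def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Numbers from the text: 364 entities, 14 twin pregnancies per 1000, 16 pairs.
prob = binomial_tail(364, 0.014, 16)
print(f"P(at least 16 pairs) = {prob:.3g}, i.e. about 1 in {1 / prob:,.0f}")
```

The answer is quite sensitive to the assumed rate, so it is worth rerunning with a higher value of p if you think Westport’s twinning rate is above the national one.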

(The data brief, however, points out one interesting fact – Connecticut has the highest rate of twinning. Different states have quite different rates of twinning, which appear to be explained at least partially by different distributions of age and race of mothers giving birth.)

xkcd’s table of approximations

Randall Munroe’s xkcd has a table of slightly wrong equations and identities.

The table gives that the number of seconds in a year is 75^4 = 31640625 or (using the “RENT method”) 525,600 × 60 = 31536000. (This refers to the song Seasons of Love, which features the number of minutes in a year prominently in its lyrics.) I’ve always been partial to “pi seconds is a nanocentury” (attributed to Tom Duff), which gives π × 10^7 for the number of seconds in a year, but you have to know π for this to be useful.
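To see how close these three approximations are, here is a quick comparison in Python. The reference value is my own assumption: a year of 365.25 days, which is 31,557,600 seconds.

```python
import math

# Reference value: a 365.25-day year (an assumption; other definitions of
# "year" differ slightly).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

approximations = {
    "75^4": 75**4,
    "RENT method (525,600 x 60)": 525_600 * 60,
    "pi x 10^7": math.pi * 10**7,
}

for name, value in approximations.items():
    error = (value - SECONDS_PER_YEAR) / SECONDS_PER_YEAR
    print(f"{name:28s} {value:14,.0f}   relative error {error:+.3%}")
```

All three land within half a percent of the reference value.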

Also, while I’m on the subject: it’s easy to remember that the circumference of the Earth is 4 × 10^7 meters, or 40,000 kilometers; the meter was originally defined to be one ten-millionth of the distance from the North Pole to the equator through Paris. Call this C, or 2πr. The surface area of the Earth is therefore 4πr^2 = C^2/π, which works out to (1.6/π) × 10^15 square meters; may as well let π = 3.2 for the moment and call that 5 × 10^14. Munroe gives 69^8, which I suppose has some amusement value.
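The arithmetic is easy to check. This is a small sketch that treats the Earth as a perfect sphere with circumference exactly 4 × 10^7 meters, and compares C^2/π with the π = 3.2 shortcut and with 69^8; the variable names are mine.

```python
import math

C = 4e7                       # circumference in meters, from the text
r = C / (2 * math.pi)         # radius, from C = 2*pi*r
area = 4 * math.pi * r**2     # surface area of a sphere

# 4*pi*r^2 and C^2/pi are the same quantity.
assert math.isclose(area, C**2 / math.pi)

shortcut = C**2 / 3.2         # pretending pi = 3.2
xkcd = 69**8                  # the value from the xkcd table

print(f"C^2/pi        = {area:.3e} square meters")
print(f"with pi = 3.2 = {shortcut:.3e} square meters")
print(f"69^8          = {xkcd:.3e}")
```

Both shortcuts come out within about two percent of C^2/π.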

Less-than-weekly links for April 24

Game theory used on a British game show.

Using R to solve a simple numerical puzzle.

A calculator that requires an approximate answer before it will give you an exact answer.

The Traveling Salesman movie, premiering in Philadelphia on June 16.

The kaleidoscopic patterns of cathedral ceilings.

From Language Log: longer reviews of wines are correlated with higher ratings (they are!) and whether baboons can tell the difference between English words and non-words (they can, but it looks like that’s because the “non-words” have different letter frequencies).

The average of all fonts

What does the average of all fonts (on one person’s computer) look like?

It turns out that overlaying letters from different fonts on each other doesn’t work too well – you get blurry results, like composite faces. But if you just take evenly-spaced points along the boundary of each letter in each font, and average them together, you get something quite readable.
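Here is a minimal sketch of that second idea in Python, and not the code behind the linked post. It assumes each letter outline is a single closed polygon given as a list of (x, y) vertices, and that all outlines start at corresponding points and run in the same direction. Real glyphs have curves and several contours, so this is only the core of the technique. The names resample_closed and average_outlines are made up for the example.

```python
from math import hypot

Point = tuple[float, float]

def resample_closed(outline: list[Point], n: int) -> list[Point]:
    """Return n points evenly spaced by arc length along a closed polygon."""
    # Pair each vertex with the next one, wrapping around to close the loop.
    segments = list(zip(outline, outline[1:] + outline[:1]))
    lengths = [hypot(b[0] - a[0], b[1] - a[1]) for a, b in segments]
    step = sum(lengths) / n

    points: list[Point] = []
    seg = 0        # index of the segment currently being walked
    start = 0.0    # arc length at the start of that segment
    for i in range(n):
        target = i * step
        # Advance to the segment that contains the target arc length.
        while seg < len(segments) - 1 and start + lengths[seg] <= target:
            start += lengths[seg]
            seg += 1
        (ax, ay), (bx, by) = segments[seg]
        t = (target - start) / lengths[seg] if lengths[seg] else 0.0
        points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points

def average_outlines(outlines: list[list[Point]], n: int) -> list[Point]:
    """Resample every outline to n points and average them point by point."""
    resampled = [resample_closed(o, n) for o in outlines]
    return [
        (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for pts in zip(*resampled)
    ]

# Two stand-in "glyphs": a 1x1 square and a 3x3 square with the same start.
small = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
large = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
average = average_outlines([small, large], n=8)
print(average)
```

Averaging the two squares this way gives the outline of a 2x2 square, which is the kind of sensible in-between shape that pixel overlays fail to produce.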

I’m reminded of Metafont, which is used by TeX to specify fonts; each font is specified by a bunch of parameters, so averaging fonts becomes averaging numbers. There are some nice illustrations of this in the chapter on Metafont in Douglas Hofstadter’s book Metamagical Themas.

(via Hacker News)


Coursera is offering free online classes from Princeton, Stanford, Michigan, and Penn, in a variety of fields.

Andrew Ng is offering a course in Machine Learning, which appears to be the same class as the free Machine Learning class that was offered over the winter. I’m not sure exactly how this compares to the Stanford CS229 class but it looks interesting. It starts today and runs 10 weeks.

The pilot classes were Machine Learning, Introduction to Artificial Intelligence and Introduction to Databases (that link goes to a Stanford press release from August 2011), but Coursera is now offering classes fairly widely spread across the curriculum.

Jordan Ellenberg has written a couple of interesting blog posts on the future of higher education: What, if anything, is the future of the university? and Several attacks on the previous post; see also this Crooked Timber comment thread. Cathy O’Neil suggests that online learning promotes passivity.

A practical question about non-response bias

At Berkeley, student evaluations of courses and instructors are still done on paper forms; we’re supposed to do them in class on a day when attendance is good. This is why I’m not doing them today – it’s Friday, and beautiful weather, and both of those always lower attendance. In practice, though, I tend to do the evaluations on a day when having a slightly shortened class makes sense, as opposed to introducing new ideas at the end of a normal-length class.

But let’s say an instructor is only interested in having the average of their evaluations be as large as possible. Wouldn’t it make sense to do the evaluations on a day when attendance is comparatively low? On a day with high attendance you’re likely to have the marginally interested students there, who would give lower evaluations. I would assume that the students who come every day like the class more.

(This is actually testable, if you could get everyone to fill out the evaluation. You could ask students “how often did you come to class?” and compare their self-reported attendance with their evaluations of the class.)
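To make the argument concrete, here is a toy model in Python. Everything in it is made up for illustration: the eight students, their attendance rates and ratings, and the assumption that a student with usual attendance rate p shows up with probability p^k, where a larger k stands for a less attractive day to come to class. It is not based on any real evaluation data.

```python
# Hypothetical students: (usual attendance rate, rating on a 1-7 scale).
# By assumption, students who come more often rate the class higher.
students = [
    (0.95, 6.5), (0.90, 6.0), (0.85, 6.0), (0.70, 5.0),
    (0.60, 5.0), (0.40, 4.0), (0.30, 3.5), (0.20, 3.0),
]

def attendance_weighted_mean(students, k):
    """Mean rating, weighting each student by rate**k.

    k = 0 weights everyone equally (everyone fills out the form);
    k = 1 is an ordinary day; larger k models a day when marginal
    students are disproportionately likely to stay away.
    """
    weights = [rate**k for rate, _ in students]
    total = sum(w * rating for w, (_, rating) in zip(weights, students))
    return total / sum(weights)

for label, k in [("everyone responds", 0), ("ordinary day", 1), ("sunny Friday", 3)]:
    print(f"{label:18s} mean rating {attendance_weighted_mean(students, k):.2f}")
```

Under these assumptions the average goes up as attendance gets more selective. Note that if a bad day simply scaled everyone’s attendance by the same factor, the weighted average would not move at all, so the effect depends on marginal students dropping out faster than committed ones.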