Niue standard time

Seen on the web page of Fun with Algorithms: “A full paper should be submitted by January 23, 2012, 11:59 pm, Alofi, Niue, standard time.”

The conference is in Venice, Italy, and it’s organized by Italians, so why Niue?

Alofi is the capital of Niue, an island country in the South Pacific, where the time is 11 hours behind UTC. (Although Niue is in free association with New Zealand and uses New Zealand currency, it is not the same day in Niue as in New Zealand; New Zealand is 12 hours ahead of UTC, or 23 hours ahead of Niue.) This is the time zone furthest behind UTC that contains any population; there’s a UTC-12, but it covers only some uninhabited islands. So presumably the choice was made so that if people acted as if the deadline were January 23, 2012, 11:59 pm, their local time, they would still get their papers in on time.

American Samoa is in the same time zone; the independent country of Samoa is UTC+13, one day later. (Samoa was UTC-11 until it skipped December 30, 2011; this moved it from being three hours behind the US west coast to being three hours ahead of the Australian east coast.) I can’t find “Samoa time” being used the same way.

There is at least one other use of “Niue standard time” in a non-Niuean context.

Via David Eppstein. I have nothing to do with this conference, other than that I support fun.

All the president’s men

All the presidents of the United States, except Martin van Buren, are descended from King John of England — as shown by BridgeAnne d’Avignon, a seventh-grade student in Watsonville, California. You can buy a poster at we are all related. Via quora and metafilter.

This is less impressive than it seems at first glance. (But still impressive; it’s easy for me to sit here and criticize.) All the presidents but van Buren and Eisenhower are at least partially of British descent. That’s actually a larger number than I expected, and it implies some combination of:

  • people of British descent are more likely to be president, or
  • a very large proportion of Americans are at least partially of British descent.

On the other hand, as I mentioned a couple of weeks ago, if we go far enough back everyone has a common ancestor. I wouldn’t be surprised if someone digs back through van Buren’s family tree and turns up such an ancestor.

Mathematicians go drinking – a puzzle

Here’s a puzzle: r people are in a bar, sitting around a round table. A certain person pays for the first round of drinks. The person to their right pays for the second round. The person two people to the right of that person pays for the third round. In general, whoever paid for the nth round, the person n people to their right pays for the (n+1)st round.

Question the first: For a given value of r, is there somebody who never pays for any drinks?

Question the second: How many somebodies are there?

(As of this writing I know the answer to the first of these questions but not the second. Metasolvers should feel free to use this information.)

For concreteness, say there are five people, named Alice, Bob, Carol, Dave, and Eve. Alice pays for the first round. The person to her right, Bob, pays for the second round. Carol is skipped and Dave pays for the third round. Eve and Alice are skipped and Bob pays for the fourth round. Carol, Dave, and Eve are skipped and Alice pays for the fifth round. And so on. So Alice pays for two rounds out of every five; so does Bob. Dave pays for one round out of every five. Carol and Eve never pay.
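Since the payer of round n sits 1 + 2 + \cdots + (n-1) = n(n-1)/2 seats to the right of the first payer, the question is really about which residues the triangular numbers hit mod r. Here is a quick Python sketch (the function names are my own) that replays the example above:

```python
def payers(r, rounds):
    """Seat of the payer of each round; seat 0 is the first payer."""
    out, seat = [], 0
    for n in range(rounds):
        out.append(seat % r)
        seat += n + 1  # the next payer sits n+1 seats further right
    return out

# The five-person example: Alice=0, Bob=1, Carol=2, Dave=3, Eve=4.
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
print([names[p] for p in payers(5, 5)])
# -> ['Alice', 'Bob', 'Dave', 'Bob', 'Alice']

def never_pays(r):
    """Seats never hit by a triangular number mod r (the residues are periodic with period dividing 2r)."""
    hit = {n * (n + 1) // 2 % r for n in range(2 * r)}
    return sorted(set(range(r)) - hit)

print(never_pays(5))  # -> [2, 4], i.e. Carol and Eve
```

Experimenting with `never_pays` for various r is one way to attack the two questions.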

Unfortunately my friends know I’m a mathematician, and this sounds enough like a math problem that they’d be suspicious if I suggested this as a paying-for-drinks scheme. That’s probably for the best.

Weekly links: February 19

Phil Keenan, at Meandering through Mathematics, explores how the logarithm is calculated numerically. Interestingly, calculating it as the inverse of the exponential seems to be more efficient than using the Taylor series.

Hopefully this one doesn’t remind anybody of a traumatic experience they had five days ago: XKCD on the Valentine dilemma. On a related note, you might be interested in William Spaniel’s Game Theory 101 videos or the free online Stanford game theory course by Matthew Jackson and Yoav Shoham (economics and CS, respectively) that’s starting soon.

Cathy O’Neil compares the recent Elsevier journal boycott to the Occupy movement.

The collapse of the Soviet Union and the productivity of American mathematicians, an NBER working paper by George J. Borjas and Kirk B. Doran. The basic claim is that when Soviet mathematicians were able to emigrate to the US in the 1990s, this pushed aside mathematicians already in the US rather than “growing the mathematical pie”.

Niles Johnson has a video on visualizing seven-manifolds.

Cary and Michael Huang have an interactive thing-to-play-with on the scale of the universe. This is even more impressive when I find out they’re half my age.

Birth timing and Valentine’s Day. (Also Halloween.)

At MathOverflow: how to motivate and present ε-δ proofs to undergraduates.

From the New York Times magazine: How companies learn your secrets.

John Nash’s letter to the NSA (via Turing’s Invisible Hand and hacker news)

John Baez has a series of posts on the roots of polynomials having integer coefficients, titled “The Beauty of Roots”: part one, part two, part three, easy version of slides from a talk, similar slides with some proofs. (Joint work between Baez, Dan Christensen, and Sam Derbyshire.)

Zachary Abel

Zachary Abel is a Ph.D. student in math at MIT, which would predispose me to like him because I was an undergrad there N years ago. You might remember him from the documentary Hard Problems, about the 2006 International Mathematical Olympiad. According to Amazon the actors in that movie are named “this is documentary” and “so there are no actors”. The documentary is by George Csicsery, also the director of N is a Number: A Portrait of Paul Erdős and Julia Robinson and Hilbert’s Tenth Problem.

Abel has made some awesome mathematical sculptures from office supplies. On Thursday night I was at Noisebridge (a hacker space in San Francisco which reminds me of nothing so much as my college dorm) and at some point I was playing with some zometools and I couldn’t figure out how to make an octahedron out of them. If you were there, I was the one cursing. Zachary Abel, on the other hand, makes much more awesome things, and he wants you to think he made them out of stuff he had lying around. Binder clips are in short enough supply chez nous, though, that I’m pretty sure I don’t have 132 of them. The pictures of the sculptures available on Abel’s web site come with brief explanations as well. He also has an interesting math blog.

There’s apparently an actor of the same name but my filter bubble seems to work in such a way that I haven’t heard of him, and the mathematician comes up first in Google for me.

The waiting room principle

An issue of sampling bias: doctors think people are sicker than they actually are, because most of the patients they see are sick. This is also known as the “clinician’s illusion”. Say you’re a doctor, and you have six “well” patients that you see once a year and one “sick” patient that you see once a month. Then out of your eighteen annual appointments, two-thirds will be with “sick” patients even though only one-seventh of your patients are sick! Patricia and Jacob Cohen wrote about this in 1984; here’s a recent explanation from a web site about addiction.

This is also a hazard of teaching, and perhaps related to the “80-20” rule: you spend 80 percent of your time dealing with 20 percent of the students. Certainly grading feels this way to me. I teach classes where the questions generally have “right” answers, and I give partial credit; the students who get things right are easy to grade, while the ones who get things wrong take more time, since I have to read their wrong answers carefully to decide where they went wrong. So my impression of the average student is probably less favorable than the actual average student. Perhaps grading feels less like this on more open-ended assignments. I’d be interested to hear.

Another educational example is that by simply making all classes at an institution the same size, one can reduce the average class size experienced by students without actually having to hire more faculty. Say your institution has one class of thirty students and one of sixty. Then if you pick a student uniformly at random, one-third will say “there are thirty students in my class” and two-thirds will say “there are sixty students in my class”, for an average of (1/3)(30)+(2/3)(60)=50. If you rebalance the classes to have forty-five students in each class, then the average class size experienced by students is 45. (The average class size experienced by students, by the way, is always greater than or equal to the average class size experienced by instructors, with equality if and only if all classes are the same size.)
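The arithmetic above, and the inequality in the parenthetical, can be checked in a few lines. A Python sketch (the function names are my own):

```python
import random

def avg_size_for_students(sizes):
    """Average class size as experienced by a uniformly random student."""
    # a class of size s is reported by s students, so each size is weighted by itself
    return sum(s * s for s in sizes) / sum(sizes)

def avg_size_for_instructors(sizes):
    return sum(sizes) / len(sizes)

print(avg_size_for_students([30, 60]))  # 50.0, as in the text
print(avg_size_for_students([45, 45]))  # 45.0 after rebalancing

# Spot-check the inequality (a case of Cauchy-Schwarz) on random class lists.
random.seed(0)
for _ in range(1000):
    sizes = [random.randint(1, 200) for _ in range(random.randint(1, 20))]
    assert avg_size_for_students(sizes) >= avg_size_for_instructors(sizes) - 1e-9
```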

Finally: how many children, including you, did your mother have? The average, averaging over women, is somewhere around 2 in the US. But say that the proportion of women having i children is p_i. Then if there are N total mothers, p_i N of them have i children, so there are i p_i N children in families of i children. The total number of children is (p_1 + 2p_2 + \cdots) N. So the average family size, averaging over children, is

{\sum_{i=1}^\infty i (i p_i N) \over \sum_{i=1}^\infty i p_i N} = {\sum i^2 p_i \over \sum i p_i}.

Let \mu be the mean number of children per woman, and let \sigma^2 be the variance; then this is {\mu^2 + \sigma^2 \over \mu}, or \mu + (\sigma^2/\mu). Again, the mean as experienced by the children is substantially larger than the mean as experienced by the parents.
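Here is a quick check of that identity on an invented distribution (the particular p_i below are made up purely for illustration):

```python
from fractions import Fraction as F

# Hypothetical family-size distribution: half of mothers have 1 child,
# three-tenths have 2, one-fifth have 3.
p = {1: F(1, 2), 2: F(3, 10), 3: F(1, 5)}

mu = sum(i * pi for i, pi in p.items())       # mean, averaging over mothers
ex2 = sum(i * i * pi for i, pi in p.items())  # E[i^2]
var = ex2 - mu * mu

avg_over_children = ex2 / mu                  # \sum i^2 p_i / \sum i p_i
assert avg_over_children == mu + var / mu     # the identity in the text
print(mu, avg_over_children)                  # 17/10 versus 35/17
```

Mothers see a mean of 1.7 children; the children themselves see families averaging about 2.06.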

Yesterday’s coincidence

Yesterday I was preparing a lecture where I would need to compute means and standard deviations. I happened to pick five integers x_1, \ldots, x_5 such that their mean and the mean of their squares are both integers. (Alternatively, their sum and the sum of their squares were both multiples of 5.) What are the chances of that?

Incidentally, the integers I chose were 1, 2, 4, 5, 8, so their sum is 1+2+4+5+8 = 20 and their sum-of-squares is 1^2+2^2+4^2+5^2+8^2 = 110, both divisible by 5.

Of course we need some sort of model here. Let’s assume that however we pick integers, they’re uniformly distributed modulo 5 — that is, we have probability 1/5 of choosing a multiple of 5, 1/5 of choosing a number that’s one more than a multiple of 5, and so on. Then the obvious guess is as follows:
(1) the probability that the sum of five randomly chosen integers is divisible by 5 is one-fifth;
(2) the probability that the sum of their squares is divisible by 5 is also one-fifth;
(3) these two events are independent.
As it turns out all of these are true. It’s easy to see that (1) is true. What’s the probability that the sum of n randomly chosen integers is divisible by five? Consider the sum of the first n-1 randomly chosen integers. Whatever this sum is, the probability that we’ll pick an integer that makes the sum of the first n a multiple of five is one-fifth.

However this argument doesn’t work for (2). Look at the squares of the positive integers: they’re 1, 4, 9, 16, 25, and so on. Dividing them by 5 and taking remainders we get 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, \ldots. In fact one-fifth of the squares are divisible by 5; two-fifths are one more than a multiple of five; two-fifths are one less than a multiple of five.

So how could a sum of five squares work out to be a multiple of five? It could consist of:
(A) five multiples of five, like 5^2 + 10^2 + 15^2 + 20^2 + 25^2. This happens with probability 1/5^5.
(B) five squares which are one more than a multiple of 5, like 1^2 + 4^2 + 6^2 + 9^2 + 11^2. This happens with probability (2/5)^5 = 32/5^5.
(C) five squares which are one less than a multiple of 5 — which also happens with probability 32/5^5.
(D) one square which is one more than a multiple of five; one which is one less than a multiple of five; and three multiples of 5, like 1^2 + 2^2 + 5^2 + 10^2 + 15^2 = 355. The probability of this is the number of ways of choosing which of our five integers is one more than a multiple of 5 and which is one less — that’s 5 \times 4 = 20 — times the probability of getting a square that’s one more than a multiple of five, then one that’s one less, then three multiples of 5, which is (2/5)(2/5)(1/5)^3 = 4/5^5, for a total of 80/5^5.
(E) two squares which are one more than a multiple of five; two which are one less; one multiple of five, like 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55. The probability of this is {5 \times 4 \over 2} {3 \times 2 \over 2} (2/5)^4 (1/5) = 480/5^5.

These add up to (1+32+32+80+480)/5^5 = 625/5^5 = 5^4/5^5, or exactly one-fifth! Is there a simple argument for this?

Finally, how can we have a set of integers where both the sum and the sum of squares are multiples of 5? Working through the cases, the possibilities are either that all five integers have different remainders upon division by 5, which happens with probability 5!/5^5 = 120/5^5, or that all five have the same remainder, which happens with probability 5/5^5. And these add up to… (120+5)/5^5 = 125/5^5 = 1/25, which is exactly (1/5) \times (1/5).

As it turns out, though, the sum of five random integers mod 5 and the sum of their squares mod 5 are not independent as random variables. Twenty-one of the twenty-five possible outcomes (sum, sum of squares) have probability 1/25. However, (0,1) and (0,4) both have probability 6/125, and (0,2) and (0,3) both have probability 4/125. I have no idea why things work out this way, as this is just the result of a brute-force calculation.
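That brute-force calculation is short enough to reproduce. A sketch in Python:

```python
from itertools import product
from collections import Counter

# Joint distribution of (sum mod 5, sum of squares mod 5) over all 5^5
# tuples of residues; counts are out of 3125.
joint = Counter()
for t in product(range(5), repeat=5):
    joint[(sum(t) % 5, sum(x * x for x in t) % 5)] += 1

assert joint[(0, 0)] == 125                   # probability 1/25, as computed above
assert joint[(0, 1)] == joint[(0, 4)] == 150  # probability 6/125
assert joint[(0, 2)] == joint[(0, 3)] == 100  # probability 4/125
# every cell whose sum-coordinate is nonzero is exactly 125, i.e. probability 1/25
assert all(joint[(a, s)] == 125 for a in range(1, 5) for s in range(5))
```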

Street addresses on the Oakland/Berkeley border

San Francisco colored by address numbers, via cartophile. Yellow means lower house numbers, blue means higher numbers, as far as I can tell; the highest numbers are on very long streets like Mission and Geary. Parts of San Francisco have nice orderly grids of streets and street numbers that reflect that; others less so. I especially get confused in the Mission; 16th and Mission, for example, is 2000 Mission, while 16th and Valencia is 500 Valencia. Since there are numbered streets I want the house numbers to line up with the street numbers — I want both to be 1600. (For the most part Market and the streets running parallel to and south of it, like Mission, are off by four hundred from where I expect them to be, consistently. I can live with that, I can remember the offset.)

I live near the Oakland/Berkeley border. Oakland also has numbered streets, and at least in the flatlands the house numbers reflect the street numbering — so 4799 Telegraph is between 47th and 48th Streets, on the corner of 48th. Berkeley has very few numbered streets (and I don’t spend much time in the part of town where they are) but generally has a grid of streets, and the house numbering seems to respect it — the address 2100 on an east-west street will be at Shattuck Avenue (which runs north-south), for example, and the address 2500 on a north-south street will be at Dwight Way (which runs east-west). So in both cases the analogue of this picture would look like the western parts of San Francisco. But it can get a little confusing along the border. In Oakland numbers increase as you head north and west, and even numbers are on the north and east sides of streets. In Berkeley numbers increase as you head south and east, and even numbers are on the south and west sides of streets. Notice that there are four independent parameters here and the two cities clash on all four, which almost seems like a conscious decision.

Two very nice coordinate systems. Both have been useful in helping me navigate. But they don’t mesh.

This is particularly confusing if you walk the length of Alcatraz Avenue. Those not familiar with the Oakland-Berkeley borderlands are thinking “wait! isn’t Alcatraz on an island in the bay?” Those people are right, but if you extend the street into the bay it comes reasonably close to hitting Alcatraz Island. Alcatraz Avenue runs east-west for two miles roughly along the Oakland-Berkeley border, which it crosses three times. This wreaks havoc on the street numbering.

From Claremont Avenue to just west of College Avenue it’s in Berkeley; numbers decrease heading westbound, as they do in Berkeley, from the mid-2700s to the mid-2600s.

Then suddenly we’re in Oakland. On the south side of College the numbers drop from 2636 (Berkeley) to 317, 319, 321 (Oakland). Alcatraz Avenue remains in Oakland until Dover Street (the next intersection west of Shattuck); for example, on the north side of the street the numbers go 770, 774, 778, 784 (the last house in Oakland), then 1917 (the first house in Berkeley), 1909, 1901, and keep decreasing. (This is all from poking around maps on Zillow, the real estate site, since they show every lot.)

Numbers reach 1269 just west of Idaho Street; then they jump back to the Oakland numbering scheme as the street crosses the border again, starting at 1046 and increasing westbound – to just under 1100 at San Pablo Avenue, where the street ends.

So the numbers go down, then up, then down, then up; the ranges of numbers that exist are roughly 310-790 and 1040-1100 (both in Oakland) and 1260-1920 and 2630-2750 (both in Berkeley). It’s lucky that there’s no duplication, at least, because people don’t generally seem too aware of where the municipal border is, despite the “Nuclear Free Zone” signs that Berkeley posts. (Oakland’s nuclear-free, too, but they’re quieter about it.) I live in Oakland, if I remember correctly in the sixth house over the border. My landlord says the house is in Berkeley, presumably to get more rent.

Some statistics on complexity of numbers

The sequence A025280 in the On-Line Encyclopedia of Integer Sequences gives the “complexity of n: number of 1’s required to build n using [addition, multiplication, and exponentiation]”. Let’s call this the “exponential complexity” and denote it by c_e(n).

I’ll abbreviate sums of 1s by the numbers they sum to. So the complexity of 4 is 4 (you can write it as 1+1+1+1, which I’ll abbreviate as 4, or (1+1) \times (1+1), which I’ll abbreviate as 2 \times 2). But the complexity of 6 is 5 (it’s 2 \times 3) and so the complexity of 7 is 6 (it’s (2 \times 3) + 1). 36 has complexity 7, since it’s (2 \times 3)^2.

There are some variants of it: A005245 doesn’t allow exponentiation, and A091334 allows subtraction. (This makes a practical difference: 15 = 3 \times 5 has complexity 8 if subtraction isn’t allowed, but 15 = 4^2 - 1 gives a complexity of 7 if it is. Not allowing subtraction makes computation easier, though: with only addition, multiplication, and exponentiation we compute everything in terms of smaller numbers.) Call the version in terms of addition and multiplication only c_m(n), for “multiplicative complexity”.

R. K. Guy, in Some suspiciously simple sequences, suggests that the complexity of n is bounded above by some multiple of \log n.

Can we get some asymptotic trend? The following code in R produces the multiplicative and exponential complexities of the integers from 1 up to 10000. (R is probably not the ideal language for this, since it’s essentially a number-theoretic problem, but I want to make boxplots later.)

n <- 10^4

# commul[i] = multiplicative complexity c_m(i)
commul <- rep(0, n); commul[1] <- 1; commul[2] <- 2
for (i in 3:n) {
  # best split of i as a sum of two smaller pieces (vectorized over all splits)
  m <- min(i, commul[1:floor(i/2)] + commul[(i-1):(i - floor(i/2))])
  # best split of i as a product, by trial division
  # (the j > 1 guard protects against R's descending 2:1 ranges)
  for (j in 2:floor(sqrt(i))) {
    if (j > 1 && i %% j == 0) m <- min(m, commul[j] + commul[i/j])
  }
  commul[i] <- m
}

# comexp[i] = exponential complexity c_e(i)
comexp <- rep(0, n); comexp[1] <- 1; comexp[2] <- 2
for (i in 3:n) {
  m <- min(i, comexp[1:floor(i/2)] + comexp[(i-1):(i - floor(i/2))])
  for (j in 2:floor(sqrt(i))) {
    if (j > 1 && i %% j == 0) m <- min(m, comexp[j] + comexp[i/j])
  }
  # is i a perfect j-th power, for j = 2, ..., floor(log2(i))?
  for (j in 2:floor(log(i, 2))) {
    b <- round(i^(1/j))
    if (j > 1 && b >= 2 && b^j == i) m <- min(m, comexp[b] + comexp[j])
  }
  comexp[i] <- m
}

We know the complexity of an integer i is bounded above by i. First we run over the pairs (1, i-1), (2, i-2), \ldots, (\lfloor i/2 \rfloor, \lceil i/2 \rceil) and compute the sums of their complexities; then we find factors of i by trial division and do the same for products; then we check whether i is a perfect square, cube, and so on, up to the \lfloor \log_2 i \rfloor-th power.
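The same dynamic program can be cross-checked in a few lines of Python (restricted here to small n to keep it quick; the function name is my own):

```python
import math

def complexities(n, allow_powers):
    """c[i] = fewest 1s needed to build i from +, * (and ^ if allow_powers)."""
    c = [0] * (n + 1)
    c[1] = 1
    for i in range(2, n + 1):
        best = i  # i as 1 + 1 + ... + 1
        for a in range(1, i // 2 + 1):  # sums
            best = min(best, c[a] + c[i - a])
        for d in range(2, math.isqrt(i) + 1):  # products, by trial division
            if i % d == 0:
                best = min(best, c[d] + c[i // d])
        if allow_powers:
            for e in range(2, i.bit_length()):  # perfect e-th powers
                b = round(i ** (1 / e))
                if b >= 2 and b ** e == i:
                    best = min(best, c[b] + c[e])
        c[i] = best
    return c

cm = complexities(100, allow_powers=False)  # c_m
ce = complexities(100, allow_powers=True)   # c_e
print(cm[6], cm[7], ce[36])  # 5, 6, 7 -- matching the examples above
```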

So we end up with the multiplicative complexity and the exponential complexity of each integer. And we can plot them:

(Multiplicative complexity is in black; exponential complexity is in red.)

Indeed they both seem to grow logarithmically. As you’d expect, exponential complexities are smaller, and there are some quite noticeable downward spikes at large powers: for example the exponential complexity of 4096 is 9 (it’s 4^(2 \times 3)) while its multiplicative complexity is 24 (as 4 \times 4 \times 4 \times 4 \times 4 \times 4). If you divide out the logarithms — that is, plot n against c_m(n)/\log n or c_e(n)/\log n — all the black points lie between two horizontal lines (around 2.8 and 3.6), and all the red points appear to lie below some horizontal line as well. There can’t be a horizontal line that all the red points lie above, other than the x-axis: if n has multiplicative complexity bounded above by c \log n, then 2^n has exponential complexity bounded above by 2 + c \log n, while \log 2^n = n \log 2 grows much faster, so the ratio near powers of 2 tends to zero. But powers and numbers very near them are sparse, so perhaps almost all integers have exponential complexity above some logarithmic bound.

We can then make a boxplot for the numbers c_m(2)/log(2), c_m(3)/log(3), \ldots, c_m(1000)/log(1000); one for c_m(1001)/log(1001), \ldots, c_m(2000)/log(2000); and so on, in groups of 1000. We can do the same for the exponential complexities.

Is there some limiting distribution? Can we say, for example, that in the limit half of integers have multiplicative complexity less than 3.1 times their natural logarithm?  I don’t know.

A webcomic of romance and math (and some other things)

Some of you may know the webcomic xkcd. I’ve compiled a (possibly incomplete) list of probability and statistics-related comics from xkcd. A lot of these seem to also have something to do with love, which is appropriate for today; but that may not be more than you’d expect from a random sample. xkcd calls itself “A webcomic of romance, sarcasm, math, and language.”

Data mining with jellybeans. I put a story adapted from this on a final exam last year, and for zero points extra credit asked my students where I borrowed the story from. None of them knew.

Sustainable extrapolation (although shouldn’t this be fit to a logistic curve, not an exponential?)

But not even a better curve-fitting model will save our intrepid hero in extrapolating number of husbands.

How do we really know that statistics courses teach that correlation does not imply causation? Maybe the sort of people who take statistics courses just know that anyway.

Boxplots can be used to identify your statistically significant other. (See also Language Log.) If you don’t have one, you can take part in statistical voyeurism — someone near you is having sex. (And in Paris, Amélie Poulain tells us that fifteen couples are having an orgasm right now.)

What’s the conditional probability of dying by lightning strike, given that you stay out in a storm?

Sports as weighted random number generators should be required reading for all sports announcers. (Sometimes I have to watch baseball games muted.)

Probability used to be my favorite branch of math because it had so many real-life applications. (But see Stephen Jay Gould, The median isn’t the message.)

Schools are still teaching the null hypothesis.

My normal approach is useless here. (Not really about probability or statistics, but this was posted on a bulletin board at Penn for a while before I knew the source. It may or may not have been published on February 12, 2007.)

Psychic reminds me of the stockbroker’s coin flip scam.

Analyzing the size of dating pools, based on the “half your age plus seven” rule for the youngest person you can date without things seeming sketchy. Oddly enough, I caught myself actually doing this math a couple days ago. And not for practical purposes; the person I’m sort of seeing right now is a few months younger than me. Someone’s actually done the analysis (found via reddit). However, I’d posit that the “half your age plus seven” rule is just a linear approximation and shouldn’t be extrapolated like this.

And if you’re at a loss for what to get your statistically significant other today, don’t do this.

(Andromeda Yelton has an index to science topics in XKCD which was useful in compiling this post.)