Links for November 9

Pierre Cartier wrote a long profile of Grothendieck for the new journal Inference: International Review of Science.

Jonathan Touboul wrote a paper, The hipster effect: why nonconformists all look the same. Here’s a popular summary by Gabe Bergado at Mic. Basically, if you don’t want to look like everyone else, but it takes you some time to figure out what everyone else is doing, you’ll end up synchronizing with the other people with the same preference for nonconformity.

Portrait of the Hilbert curve by Aldo Cortesi.

Becca Cudmore and Jennifer Daniel at Nautilus show us five ways to lie with charts.

Here’s an interesting paper on best practices for scientific computing. One of the authors, Greg Wilson, is from an organization called Software Carpentry which teaches programming to scientific researchers.

From Natalie Wolchover and Peter Byrne at Quanta, In a multiverse, what are the odds? (First in a series; the second one should be out tomorrow.)

jasmcole has written about the mathematics of stereographic lampshades, which are made so that the light that shines through them makes interesting patterns on your walls. This was inspired by a blog post of Alex Bellos on the work of Henry Segerman and Saul Schleimer. Of course this can be made into reality with a 3-D printer, and you can buy it at Segerman’s Shapeways store.

Is it professional for a professor to ask “surprise” questions on a test?, from Academia Stack Exchange. (Short version: the question is poorly phrased, but yes, and perhaps it is part of the professor’s duty, because being able to figure out things you haven’t seen before is usually one of the things you should learn in a course.)

Posted in Uncategorized | Leave a comment

We always think we’re right, but we don’t think we’re always right.

Jordan Ellenberg on how many states Nate Silver is going to get wrong, according to Nate Silver. (This refers to the elections of US Senators taking place tomorrow.) For each state Silver gives a probability of winning; we can give a probability that Silver will be wrong, which is just his own predicted probability that the underdog wins. The answer is an expected value of 2.5. Silver has been saying since the 2012 election that he got lucky in calling all fifty states correctly. In some sense it would have been more impressive if he’d missed a couple, which would have shown his predictions were calibrated correctly. (I remember trying to explain this to colleagues at my job at the time, where I’d been for a bit over a month; I think I did so successfully, but it’s a subtle point.)
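
The calculation can be sketched in a few lines of Python. The probabilities below are made up for illustration (they are not Silver’s actual forecasts), and the function name is my own. By linearity of expectation the expected number of misses is the sum of the underdog probabilities, and that holds whether or not the races are independent.

```python
def expected_misses(favorite_probs):
    """Expected number of wrong calls if the forecaster always picks the favorite.

    favorite_probs: forecast probability that the favorite wins each race
    (each value should be at least 0.5). The chance of a miss in a race is
    the underdog's probability, 1 - p, and expectations add across races.
    """
    return sum(1.0 - p for p in favorite_probs)


# Hypothetical forecast probabilities, for illustration only.
example_probs = [0.99, 0.95, 0.90, 0.75, 0.60, 0.55]
print(round(expected_misses(example_probs), 2))
```

Note that this gives only the expected count. The probability of a perfect record depends on how the errors are correlated, which this sketch does not model.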

Silver’s famous 50-for-50 2012 presidential predictions are still available; according to his own predictions, he would have expected to get about 1.8 states wrong, on average. It’s hard to say just how good going 50-for-50 is, though, because the errors are correlated.

However, it almost never makes sense to look at binary outcomes; it is better to look at the continuous outcomes that they collapse. (For example, when looking at sports data use difference in points scored instead of win-loss records.) Andrew Mooney at the Boston Globe did exactly this, and saw that 68% of the time Silver got within his stated one-standard-deviation margin of error, and 96% of the time within two standard deviations.
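
For comparison, here is a short sketch of what those two percentages would be if the forecast errors were exactly normally distributed. That normality is my assumption for the sake of the benchmark, not something the article states, and the function name is my own.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lands within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    # Prints the benchmark coverage, as a percentage, for 1 and 2 standard deviations.
    print(k, round(100 * normal_coverage(k), 1))
```

The benchmarks come out to roughly 68% and 95%, so the observed 68% and 96% are about what a well-calibrated forecaster should get.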

Index of ignorance, or just innumeracy?

From Zach Wener-Fligner at Quartz, <a href="http://qz.com/288707/everything-you-think-you-know-about-the-news-is-probably-wrong/">Everything you think you know about the news is probably wrong</a>, based on <a href="https://www.ipsos-mori.com/researchpublications/researcharchive/3466/Perceptions-are-not-reality-10-things-the-world-gets-wrong.aspx">this Ipsos MORI study</a> of online panels in fourteen countries: Australia, Belgium, Canada, France, Germany, Hungary, Italy, Japan, Poland, South Korea, Spain, Sweden, Great Britain and the United States of America. Ipsos MORI compute an “index of ignorance” – but to some extent this may just be an index of innumeracy.

For example, the average American, when asked, guessed that 24% of girls aged 15-19 give birth each year. The actual value is 3%. In every country surveyed people were off by a factor of at least five. I’d posit that this is not a question of being uninformed so much as innumerate. If 24% of girls aged 15-19 give birth each year, and nobody gives birth before 15, then the average number of children of a woman at age twenty would be 1.2. Do people seriously think the average twenty-year-old woman has more than one child? I doubt it.
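
The sanity check is just a multiplication, sketched here in Python. It assumes the same annual rate applies at each of the five ages 15 through 19 and counts births rather than distinct mothers; the function name is my own.

```python
def expected_births_by_end(annual_birth_rate, years):
    """Average number of births per woman after `years` years at a constant annual rate."""
    return annual_birth_rate * years


YEARS_15_TO_19 = 5  # ages 15, 16, 17, 18 and 19

guessed = expected_births_by_end(0.24, YEARS_15_TO_19)  # the average American guess
actual = expected_births_by_end(0.03, YEARS_15_TO_19)   # the actual value
print(round(guessed, 2), round(actual, 2))
```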

The other questions were percentage of Muslims (most people overestimate), Christians (most underestimate), immigrants (overestimate), percentage who voted in the last major election (underestimate), percentage “unemployed and looking for work” (overestimate), and life expectancy of a child born in 2014 (pretty much right on).

One of these numbers is not like the others. We’ll all die someday, and we all have some idea of how long people live, so we naturally get this right. But the others are asking for percentages, and I don’t think most people could tell the difference between “10% of people have this trait” and “20% of people have this trait” just by guessing. South Koreans and Japanese overestimate the number of Christians – and those are the two countries on this list in which Christians are a minority. I wonder, if you looked at the estimates people gave for a lot of these percentages, if they’d show a peak at 50%, the thought process being “well, people with trait X exist, but not everybody has trait X, so what’s a number in between? I’ll just pick the simplest rational number between 0 and 1.”

I’m a bit puzzled about the unemployment numbers, though. These are generally fairly loudly trumpeted in the media, so I’d expect people to at least give estimates in line with those ranges, and yet, for example, the US guess is 32 percent. (The percentage of people “unemployed and looking for work” is actually lower than the unemployment rate, by definition, as the unemployment rate is the percentage of people in the labor force who are looking for work – the unemployment rate has the same numerator and a smaller denominator.) Even if people think about the experiences of those close to them instead of the public at large, on average this shouldn’t change things unless unemployed people happen to have lots of friends and family who take these surveys.
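
To make the numerator and denominator point concrete, here is a small sketch with made-up round numbers (they are not real labor statistics, and the function names are my own).

```python
def unemployment_rate(unemployed, labor_force):
    """Unemployed people as a fraction of the labor force."""
    return unemployed / labor_force


def share_unemployed(unemployed, population):
    """Unemployed people as a fraction of the whole population surveyed."""
    return unemployed / population


# Hypothetical counts, for illustration only.
population = 1000   # everyone in the group being asked about
labor_force = 650   # those working or looking for work (a subset of population)
unemployed = 40     # those looking for work (a subset of labor_force)

print(round(unemployment_rate(unemployed, labor_force), 3))
print(round(share_unemployed(unemployed, population), 3))
```

Because the labor force is a subset of the population, the second figure can never exceed the first.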

I’d also be interested to see how estimates correlated with political views. For example, are people who think there are more immigrants more likely to be anti-immigration? Do people who think the unemployment rate is higher support policies that would stimulate their nations’ economies?

Links for November 2

John Rauser of Pinterest gave a talk at the Strata Conference / Hadoop World on statistics without the agonizing pain – you replace the pain of statistics with the pain of simulation, which for an audience of programmers is much less painful. Via Revolution Analytics.

Surfacing this week on Hacker News was this article by NASA engineer Don Pettit on “The tyranny of the rocket equation”.

Tim Gowers writes on the results of an experiment concerning computer-generated mathematical writing.

Vi Hart made a scary Halloween video featuring candy corn and the Sierpinski calendar.

The video from the Online Encyclopedia of Integer Sequences conference is available online.

A place value puzzle

Fawn Nguyen, a middle-school math teacher, has written on finding the greatest product of a three-digit number and a two-digit number made up from some set of five digits. For example: if you’re given the digits 8, 7, 5, 4, and 2, you’d have (to pick a product at random) 745 \times 82 = 61090. The question is to write the largest possible such product (ideally without doing the multiplication explicitly).

In this case it’s pretty obvious that if you can make one factor larger by just switching around its digits, you should do it: so 754 \times 82 > 745 \times 82. But how can you move around the digits between the factors? Which is larger, 854 \times 72 or 754 \times 82? The trick here is to rewrite as 10 \times 85.4 \times 72 and 10 \times 75.4 \times 82, and recall that of two pairs of numbers with the same sum, the ones closer together have a larger product. That is, for nonnegative a and b, (x-a)(x+a) > (x-b)(x+b) if and only if a < b, since (x-a)(x+a) = x^2 - a^2. Since 85.4 + 72 = 75.4 + 82, we can conclude that 85.4 \times 72 < 75.4 \times 82.
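
The “same sum, closer together” rule is easy to check mechanically. The helper below is a sketch of my own (the name closer_pair is not from Nguyen’s post): given two pairs with equal sums, it returns the pair with the smaller gap, which by the identity above is the pair with the larger product.

```python
import math


def closer_pair(pair1, pair2):
    """Given two pairs with the same sum, return the pair with the larger product.

    Writing a pair as (x - a, x + a), its product is x**2 - a**2. With the sum
    (and hence x) fixed, the pair with the smaller gap has the larger product.
    """
    if not math.isclose(sum(pair1), sum(pair2)):
        raise ValueError("pairs must have the same sum")
    return min(pair1, pair2, key=lambda p: abs(p[0] - p[1]))


# Moving the leading digits between the factors (after scaling by 10).
winner = closer_pair((85.4, 72), (75.4, 82))
print(winner)

# Direct multiplication agrees with the rule.
print(854 * 72, 754 * 82)
```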

So 754 \times 82 is larger. But now we can switch the 2 and the 4 by the same sort of logic: 754 + 82 = 752 + 84 and so 754 \times 82 < 752 \times 84. This, it turns out, is the best we can do, as we can check by brute force – but how do we know this holds up generally? I haven’t used the differences between digits explicitly, only their ordering, so perhaps everything only depends on the order of the digits. Let’s let our digits be a, b, c, d, e, with a > b > c > d > e. Then there are just ten possible products to look at if we’re trying to find the largest one, since the digits in each factor have to be decreasing:

abc \times de, abd \times ce, abe \times cd, acd \times be, ace \times bd, ade \times bc, bcd \times ae, bce \times ad, bde \times ac, cde \times ab.

(Note to pedants: juxtaposition of letters means juxtaposition of the corresponding digits, so when I write ad I mean 10a + d, and so on.)

We want to show that bce \times ad is the largest of these products. We can make “moves” of the form bde \times ac = 100 \times b.de \times ac > 100 \times c.de \times ab = cde \times ab, for example, to show that it’s larger than cde \times ab; the inequality in the middle follows from b.de + ac = c.de + ab. If I’m not mistaken, any time two of these ten products differ by just switching two letters we can prove an inequality between them. And after a seriously grungy case analysis (which I won’t bore you all with) I believe we get that bce \times ad is always the largest product. In any case, for this particular problem there are only {9 \choose 5} = 126 possibilities so you could check by brute force. (Not a good exercise for people who you’re trying to teach to think about place value, but also not a bad programming exercise…)
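
Here is one way the brute-force check might look in Python. It is a sketch rather than anything from the original discussion: the function names are mine, and it assumes the five digits are distinct and drawn from 1 through 9, matching the {9 \choose 5} count above.

```python
from itertools import combinations, permutations


def best_product(digits):
    """Try every arrangement of five digits as (three-digit) x (two-digit).

    Returns (product, three_digit_factor, two_digit_factor) for the largest product.
    """
    best = None
    for p in permutations(digits):
        three = 100 * p[0] + 10 * p[1] + p[2]
        two = 10 * p[3] + p[4]
        candidate = (three * two, three, two)
        if best is None or candidate > best:
            best = candidate
    return best


def conjectured_product(digits):
    """The arrangement bce x ad, where a > b > c > d > e are the sorted digits."""
    a, b, c, d, e = sorted(digits, reverse=True)
    three = 100 * b + 10 * c + e
    two = 10 * a + d
    return (three * two, three, two)


def count_matches():
    """Count the 5-digit subsets of 1..9 where bce x ad attains the maximum product."""
    digit_sets = list(combinations(range(1, 10), 5))
    matches = sum(
        best_product(s)[0] == conjectured_product(s)[0] for s in digit_sets
    )
    return matches, len(digit_sets)


print(best_product((8, 7, 5, 4, 2)))
print(count_matches())
```

Each digit set needs only 120 arrangements, so the whole check is about fifteen thousand multiplications.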

Is there a general rule, when the number of digits is not five? Certainly we want to spread out the large digits, but how exactly?

Income inequality, social mobility, and sample size

Matt O’Brien at the Washington Post’s Wonkblog has an infographic that contains the following information:

Quintile of income distribution                 first  second  third  fourth  fifth
% of college graduates from poor families          16      17     26      21     20
% of high school dropouts from rich families       16      35     30       5     14

This comes from a paper entitled Equality of opportunity: definitions, trends, and interventions by Richard V. Reeves and Isabel V. Sawhill. The second row is from their figure 10, the first from their figure 11. Rich and poor families are those in the top and bottom income quintiles; the table is looking at their children’s income at age 40.

The interpretation that O’Brien suggests is that “Even poor kids who do everything right don’t do much better than rich kids who do everything wrong. Advantages and disadvantages, in other words, tend to perpetuate themselves.”

And that is true, but there’s something interesting I can’t help but see here – the distribution of incomes for high school dropouts from rich families appears to have two peaks. Are there some of these “rich” who have gotten a leg up from their families while others didn’t? More likely, though, is that the sample size involved is just too small to make detailed claims like this. (And the 80th percentile is hardly rich.) I bet it’s possible to get a genuinely two-peaked distribution like that in a society with multiple castes that hardly overlap, but that’s not the situation in the US – we have a lot of income inequality but there are smooth gradations between the different segments of the income distribution.
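
To get a feel for how much a quintile percentage can wobble, here is a sketch of the usual standard error of a sample proportion. The sample sizes are hypothetical (I don’t know how many rich-family dropouts were actually in the data), and the formula assumes a simple random sample.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion p estimated from n independent observations."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, for illustration only.
for n in (50, 500, 5000):
    # Standard error, in percentage points, of a quintile share whose true value is 20%.
    print(n, round(100 * proportion_standard_error(0.20, n), 1))
```

With only a few dozen people in the group, a swing of several percentage points in any one quintile is unremarkable.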

Polling and the wisdom of crowds

From The Fix at the Washington Post: Americans think the Republicans will win control of the Senate. See also the New York Times’ Upshot, which references this paper by David Rothschild and Justin Wolfers. In some sense, by asking me who I think is going to win an election you’re looking at not just who I’m going to vote for but who I think my friends are going to vote for, from talking to them. For example, if hypothetically I’m part of one party’s base but I know a lot of swing voters, I might think of who my swing-voting friends say they’re going to vote for and say that that candidate will win.

Essentially you’re inviting me to construct an ad hoc estimator of how the election will turn out by observing my social network. My own voting behavior is a biased estimator of the final election results; explicitly inviting me to think about what will happen invites me to remove that bias.
