Oscars edition

Nate Silver fivethirtyeights the Oscars. (Yes, that’s a verb.) That is, he predicts who’s going to win Academy Awards tonight by looking at who’s won (or been nominated for) awards previously in this awards season, weighting the results in proportion to how well those results have predicted Oscar results in the past. See also his 2009 and 2011 (behind NYT paywall) attempts at the same, which try to take some other variables into account; Silver seems to believe that he may have overfit, hence the simplification.

Meanwhile, John Lopez of Vanity Fair reports on a 2008 paper by Jonas Krauss, Stefan Nann, Daniel Simon, Kai Fischbach, and Peter Gloor, “Predicting Movie Success and Academy Awards Through Sentiment and Social Network Analysis”; at least at the time, the IMDB comments section gave lots of useful information. But there was no Twitter at the time of the paper (which was based on data from 2006); the folks at Topsy have an Oscars Index.

(I will refrain from predicting, because unlike Nate Silver I don’t have minions to clean the data for me.)

(Bi-)weekly links for February 18

Larry Wasserman: statistics declares war on machine learning.

Natalie Wolchover at Wired: In Mysterious Pattern, Math and Nature Converge, on random matrix theory.

A draft book by John Hopcroft and Ravi Kannan, CS theory for the information age (large PDF). Used in this CMU course by Venkatesan Guruswami and Ravi Kannan on modern mathematics for computer science, emphasizing high-dimensional geometry, probability, and other non-discrete mathematics.

2^57885161 − 1 is prime, says GIMPS. Liz Landau blogged about it and people at Metafilter talked about it.

Daniel Navarro of the University of Adelaide has a free e-book, Learning statistics with R: A tutorial for psychology students and other beginners.

sarah-marie belcastro writes Adventures in Mathematical Knitting for American Scientist.

Simpson’s paradox in the wild

Found on Wikipedia by Kate Owens: a chart of education by income and race. At each level of education, white Americans outearn Asian Americans. But overall, Asian Americans outearn white Americans. How does this happen?

The answer, of course, is that Asian Americans have a higher level of education overall. If the two groups had the same overall level of education, white Americans would outearn Asian Americans. It’s an example of Simpson’s paradox in the wild. (Note: one example of Simpson’s paradox at the Wikipedia article involves characters called “Lisa” and “Bart”.)
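To see the mechanism in miniature, here is a small sketch with entirely made-up numbers (they are not the 2003 data from the chart, and the group names "A" and "B" and the income figures are invented for illustration). Group B earns more than group A at every education level, but group A has more of its members at the higher levels.

```python
# Hypothetical figures, invented purely to show the mechanism (not the 2003 data).
# For each group: education level -> (share of the group, average income in $1000s).
groups = {
    "A": {"high school": (0.2, 28), "bachelor's": (0.4, 48), "advanced": (0.4, 68)},
    "B": {"high school": (0.5, 30), "bachelor's": (0.3, 50), "advanced": (0.2, 70)},
}

def average_income(income_group, share_group):
    """Average income using one group's incomes and another group's education mix."""
    return sum(
        groups[share_group][level][0] * groups[income_group][level][1]
        for level in groups[income_group]
    )

# B earns more than A at every single education level...
b_wins_each_level = all(
    groups["B"][level][1] > groups["A"][level][1] for level in groups["A"]
)

# ...yet A earns more overall, because A is concentrated in the higher-paying levels.
overall_a = average_income("A", "A")
overall_b = average_income("B", "B")

# Give B the same education mix as A and the reversal disappears.
b_with_a_mix = average_income("B", "A")

print(b_wins_each_level, round(overall_a, 1), round(overall_b, 1), round(b_with_a_mix, 1))
```

The last number is the "same overall level of education" counterfactual: with A's education mix, B comes out ahead overall as well as level by level.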

The Wikipedia chart is based on 2003 data. I would like to be able to reconstruct this with present data, but unfortunately more recent data seems to not break out Asian Americans separately.

Fractal broccoli

Did you know that broccoli is fractal in nature? It’s self-similar – little bits of broccoli look like big bits of broccoli.

To illustrate this, here’s a big piece of broccoli from tonight’s dinner at God Plays Dice headquarters:

20130215-185829.jpg

And here’s a small piece of broccoli, against a backdrop of a smaller pattern:

20130215-185935.jpg

They look quite similar!

I’m not the first to notice this: see Fractal Broccoli for the Gardening Geek and Fractal Broccoli with a Macro Lens, which features better photography. But what do you want before dinner?
My art department has a variety of fabric backdrops, mostly from recent quilting pursuits. More about that, perhaps, in a future post.

Super Bowl edition

From Freakonomics this morning: Just how bad are football pundits at picking winners? Not bad – they're right about half of the time, against the spread. I’m not surprised that individuals picking can’t beat the Vegas line consistently – my understanding is that those individuals who consistently make money on sports betting are doing it by taking advantage of those rare occasions when Vegas misses something, and are only betting on some very small minority of games.

But what kind of success could someone expect picking not against the spread, but just trying to pick a winner? Sean J. Taylor wrote something about this back in November in which he observed that “These rankings only are only about 70-75% accurate, while optimal ranking almost always breaks 80%.” By “optimal ranking” he means an ordering of the teams done retrospectively, at the end of the season; “these rankings” are various methods which attempt to assign a rating to each team based on its statistics and then pick the team with the higher rating to win the game. The disparity here is because the “optimal ranking” model is inherently overfitting: it is fit to the very results it is then scored on.

As for rating systems, the simple rating system is an example, where the rating works out to be the margin by which a team would beat an average opponent on a neutral field. Interestingly, that rating system, as implemented at pro-football-reference.com, has the 49ers at 10.2 points better than an average team and the Ravens at 2.9 points better. But the 49ers are only a 4-point favorite today. I don’t really care about football so I’m not going to comment.
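For the curious, here is a minimal sketch of how a rating like this can be computed. It assumes the usual formulation: a team's rating is its average point margin plus the average rating of its opponents. The three-team schedule, the function names, and the damped iteration are my own illustration, not necessarily how pro-football-reference.com computes it.

```python
from collections import defaultdict

def simple_ratings(games, sweeps=200):
    """games: list of (team, opponent, margin for team). Returns {team: rating}."""
    margins = defaultdict(list)    # team -> point margins, one per game
    opponents = defaultdict(list)  # team -> opponents faced, one per game
    for team, opp, margin in games:
        margins[team].append(margin)
        opponents[team].append(opp)
        margins[opp].append(-margin)
        opponents[opp].append(team)

    ratings = {team: 0.0 for team in margins}
    for _ in range(sweeps):
        # Target: average margin plus average rating of the opponents faced.
        target = {
            t: sum(margins[t]) / len(margins[t])
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in ratings
        }
        # Move halfway toward the target each sweep; the damping avoids oscillation.
        ratings = {t: 0.5 * ratings[t] + 0.5 * target[t] for t in ratings}
    return ratings

def neutral_field_spread(rating_a, rating_b):
    """Predicted margin of team a over team b on a neutral field."""
    return rating_a - rating_b

# Hypothetical round robin: A beat B by 7, B beat C by 3, A beat C by 10.
toy_games = [("A", "B", 7), ("B", "C", 3), ("C", "A", -10)]
toy_ratings = simple_ratings(toy_games)
print({team: round(r, 2) for team, r in toy_ratings.items()})

# The two ratings quoted above: 49ers 10.2, Ravens 2.9.
implied = neutral_field_spread(10.2, 2.9)
print(round(implied, 1))
```

Taking the difference of the two quoted ratings gives the 49ers by 7.3 on a neutral field, noticeably more than the 4-point line.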

Also, I recently came across a series of articles from 2009 at Math Goes Pop! on the “Super Bowl Squares” betting pool: one, two, three. One interesting variant is to use the score mod 9 instead of the score mod 10 as the thing being bet on – due to football scores typically coming in sevens and threes, and the arithmetic fact 7+3 = 10, some last digits are far more common in football scores than others, but this effect goes away if you work mod 9.
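A toy count illustrates the mod 9 versus mod 10 point. The sketch below builds scores from touchdowns worth 7 and field goals worth 3 only, and treats every combination of 0 to 5 touchdowns and 0 to 4 field goals as equally likely. Both of those are simplifying assumptions of mine (real scores are nothing like uniform), so this shows the arithmetic rather than actual odds.

```python
from collections import Counter

def residue_counts(modulus, max_touchdowns=5, max_field_goals=4):
    """Count how often each residue of the score appears over all toy combinations."""
    counts = Counter()
    for touchdowns in range(max_touchdowns + 1):
        for field_goals in range(max_field_goals + 1):
            score = 7 * touchdowns + 3 * field_goals
            counts[score % modulus] += 1
    return counts

mod10 = residue_counts(10)
mod9 = residue_counts(9)

# Print the count for each residue 0, 1, 2, ... under each modulus.
for name, counts, modulus in (("mod 10", mod10, 10), ("mod 9", mod9, 9)):
    print(name, [counts[r] for r in range(modulus)])
```

In this toy setup the last digits 0 and 7 each turn up five times while 2 and 5 turn up once, whereas every residue mod 9 turns up three or four times.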

And you guys know about Facebook’s football map (also by Sean J. Taylor!), right?

Weekly links for January 27

Hollywood Hates Math, a montage of movie clips in which math gets a bad rap.

From Smithsonian magazine, Origami: A blend of sculpture and mathematics, featuring some of Erik Demaine’s origami.

Are upsets in women’s tennis more common because women play three sets, not five?

A mathematical card trick.

Margaret Wertheim writes about tactile fractals. This one is on exhibit at the University of Southern California.

Vi Hart’s guide to comments and George Hart on knot theory.

Gary Marcus and Ernest Davis on what Nate Silver gets wrong from the New Yorker.

Gary King, Professor of Government at Harvard, is offering a class on Advanced Quantitative Research Methodology, available online to the general public.

Markov chains and skill and luck

Markov chains are a hundred years and four days old, which brings to mind Using Markov chains to analyze Candy Land. As you may know, you can’t even lose in Candy Land on purpose! The results are entirely determined by the initial shuffle. Since it’s a game for children, this is a shame; I think that at least if you have really little kids, you want to be able to lose on purpose, or at least tilt the odds in their favor. Michael Mauboussin has argued that this is a sign that Candy Land is entirely a game of luck. Okay, Mauboussin didn’t talk about Candy Land specifically, but he argues in Untangling Skill and Luck that

There’s a simple and elegant test of whether there is skill in an activity: ask whether you can lose on purpose. If you can’t lose on purpose, or if it’s really hard, luck likely dominates that activity. If it’s easy to lose on purpose, skill is more important.

As far as I can tell, you can only lose in Candy Land by stacking the deck, which doesn’t really count – if I’m going to play games with my future children and I’m going to let them win, I don’t want to have to resort to stacking the deck. This is mostly because I’m not Persi Diaconis and so any stacking I did would be ruined by my shuffle.

Mauboussin has also argued (following Tom Tango, sabermetrician) that although a shortened season is good enough for the cream to rise to the top in the NBA (like last year), the same isn’t true in the NHL (like this year).

Power laws for plant lifespans?

I’ve been a bit slow at posting lately – I moved, got sick, and so on – but here we go again.

Yunfan Tan posts some wonderful time-lapse pictures of plants dying, linking to a paper, Allometric scaling of plant life history by Núria Marbà, Carlos M. Duarte, and Susana Agustí, which shows that “both population mortality and population birth rates scale as the −¼ power and plant lifespan as the ¼ power of plant mass across plant species spanning from the tiniest phototrophs to the largest trees.”

The pictures are nice, but as Cosma Shalizi (blog post, slides from talk) and Michael Mitzenmacher have pointed out, it’s all too easy to think you have a power law when you really don’t.

A new dartboard

A new dartboard is in use in the darts world championships currently being held, reports Alex Bellos at his Guardian blog. Because in darts one has to end on a double, parity becomes important – but previous dartboard designs had clusters of odd and even numbers. The design, by David Percy at Salford in the UK, tries to separate odds and evens in addition to separating large and small numbers as much previous work had done; you can read Percy’s Mathematics Today article.

I’m having trouble thinking of other games where such a drastic change to the field of play could be implemented. One possible example might be Scrabble. The dynamics of Scrabble and of its clone Words with Friends have always felt just a little different to me because WWF has a different arrangement of premium squares which make very high-scoring plays possible.