Income inequality, social mobility, and sample size

Matt O’Brien at the Washington Post’s Wonkblog has an infographic that contains the following information:

quintile of income distribution               first  second  third  fourth  fifth
% of college graduates from poor families        16      17     26      21     20
% of high school dropouts from rich families     16      35     30       5     14

This comes from a paper entitled Equality of opportunity: definitions, trends, and interventions by Richard V. Reeves and Isabel V. Sawhill. The second row is from their figure 10, the first from their figure 11. Rich and poor families are those in the top and bottom income quintiles; the table is looking at their children’s income at age 40.

The interpretation that O’Brien suggests is that “Even poor kids who do everything right don’t do much better than rich kids who do everything wrong. Advantages and disadvantages, in other words, tend to perpetuate themselves.”

And that is true, but there’s something interesting I can’t help but see here: the distribution of incomes for high school dropouts from rich families appears to have two peaks. Are there some of these “rich” who have gotten a leg up from their families while others didn’t? More likely, though, the sample size involved is just too small to support detailed claims like this. (And the 80th percentile is hardly rich.) I bet a genuinely two-peaked distribution like that is possible in a society with multiple castes that hardly overlap, but that’s not the situation in the US: we have a lot of income inequality, but there are smooth gradations between the different segments of the income distribution.

Polling and the wisdom of crowds

From The Fix at the Washington Post: Americans think the Republicans will win control of the Senate. See also the New York Times’ Upshot, which references this paper by David Rothschild and Justin Wolfers. In some sense, by asking me who I think is going to win an election you’re looking at not just who I’m going to vote for but also who I think my friends are going to vote for, based on talking to them. For example, if hypothetically I’m part of one party’s base but I know a lot of swing voters, I might think of who my swing-voting friends say they’re going to vote for and say that that candidate will win.

Essentially you’re inviting me to construct an ad hoc estimator of how the election will turn out by observing my social network. My own voting behavior is a biased estimator of the final election results; explicitly inviting me to think about what will happen invites me to remove that bias.

Links for October 26

Mona Chalabi at FiveThirtyEight on queueing theory as applied to grocery stores.

Heuristics for estimating life expectancy, from Decision Science News.

Natalie Wolchover at Quanta Magazine, At the Far Ends of a New Universal Law, on the Tracy-Widom distribution from random matrix theory.

How to tell the temperature using crickets from Priceonomics. (Supposedly this eventually goes back to the Arrhenius equation but a quick Google only finds me unsupported claims of this fact. Google Scholar is a little better.)

Better Explained has an interactive guide to the Fourier transform and the law of sines. (I bet at least one of these is old, but I came across them this week.)

Colm Mulcahy has rounded up a bunch of Martin Gardner’s puzzles for the BBC, in honor of the 100th anniversary of Gardner’s birth, and also his top ten Scientific American columns. The CBC program The Nature of Things did a show on Martin Gardner in 1996.

I’m still working out what to think of this app that solves math problems by pointing your phone at them.

Tiny Data, Approximate Bayesian Computation and the Socks of Karl Broman applies Bayesian computation to doing the laundry.

A book on the foundations of data science (high-dimensional geometry, Markov chains, etc.) by John Hopcroft and Ravindran Kannan of Microsoft Research is available online.

Users of MathOverflow have compiled a list of obscure names in mathematics, i.e. theorems whose names don’t tell you what the theorem is about or who discovered it.

Michael Jordan is interviewed by IEEE Spectrum and comments on how that process was disillusioning.

When to buy airplane tickets

From Yahoo Travel: what day of the week to buy airplane tickets for the best deal. Short version: round trip domestic airfares average about $430 on weekends and about $500 on weekdays, so buy on the weekend. The Yahoo piece is, in turn, a condensation of this piece from the Wall Street Journal. The WSJ piece acknowledges that a portion of this is because price-insensitive business travelers buy on weekdays and price-sensitive leisure travelers buy on weekends.

(Are business travelers really price insensitive? Sure doesn’t seem like it where I work, and lots of places have policies that basically require the employee to book at the lowest price unless they jump through a whole bunch of bureaucratic hoops. Whereas if I’m an individual buying a ticket, I can pay a little more for the more favorable schedule without asking anyone. But I digress…)

Seems to me that the big elephant in the room is that business travelers travel on different routes than leisure travelers. And if I’m trying to buy plane tickets for myself and hoping to be able to time this purchase, I don’t care what prices I could get on other tickets being bought by other people on the same day as me.

We could reproduce the phenomenon these articles are showing as follows. Imagine an airline with two routes. Say that tickets on route A, a business-heavy route, cost 600 dollars regardless of the day, and tickets on route B, a leisure-heavy route, cost 300 dollars. On weekdays, two-thirds of tickets purchased are on A and the average price is 500; on weekends only half of tickets are on A and the average price is 450. This whole thing may be a less severe form of Simpson’s paradox. I’m saying it’s less severe because a true Simpson’s paradox would have it actually being more expensive to buy tickets on the weekend for any given route.

It’s not impossible that it’s actually cheaper to buy tickets for a given route on the weekend – but looking at simple averages won’t prove it.
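The two-route example above is easy to check by hand, but a minimal sketch makes the mechanism explicit. The fares and purchase shares are the hypothetical ones from the example, not real airline data, and the names `FARES`, `MIX` and `average_fare` are just labels chosen for this illustration.

```python
from fractions import Fraction

# Hypothetical fares from the two-route example; they do not depend on the day.
FARES = {"A": 600, "B": 300}

# Share of tickets purchased on each route, by type of day (also from the example).
MIX = {
    "weekday": {"A": Fraction(2, 3), "B": Fraction(1, 3)},
    "weekend": {"A": Fraction(1, 2), "B": Fraction(1, 2)},
}

def average_fare(mix, fares):
    """Average price paid, weighting each route's fare by its share of purchases."""
    return sum(share * fares[route] for route, share in mix.items())

averages = {day: average_fare(mix, FARES) for day, mix in MIX.items()}
for day, avg in averages.items():
    print(day, float(avg))
```

This prints 500.0 for weekdays and 450.0 for weekends, even though no individual ticket is a dollar cheaper on the weekend: the whole gap comes from the change in the mix of routes.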

Simulating a bet on a whole series from bets on individual games

From Mind Your Decisions, a puzzle about gambling:

Your friend wants to make an even-payoff bet on the outcome of the entire World Series. That is, he wants to make a $100 bet so that if his team is the champion he will win $100, and if his team loses he will lose all of his money.

The problem is he uses a bookie that takes bets only on individual games, and not the entire outcome. The bookie is, however, offering even-payout bets for each game and for any dollar amount.

How much should your friend bet on each game so that he can simulate an even-payout $100 bet on the outcome of the entire series?

For notational simplicity, I’m going to measure money in units of $100, so you start with 1. And for concreteness, let’s say you want to bet on the Giants against the Royals. (I used to live in San Francisco and have never been anywhere near Kansas City.) The goal is to put together a series of bets that will leave you with 2 if the Giants win and 0 if they lose.

The “probabilities” that I’m going to mention are probabilities computed as if all games are independent and equally likely to be won by either team; of course this is not true in reality. (The finance folks have a name for this; it’s been a while since I looked at any finance. What is it?)

The answer can be summarized as follows: To determine what to bet on the Giants in game n, before game n but after game n-1:

  • determine the probability that the Giants will win the series if they win game n; call this p^+;
  • determine the probability that the Giants will win the series if they lose game n; call this p^-;
  • bet p^+ - p^-

Now, note that the winning probability before game n must be p = (p^+ + p^-)/2.

By following this strategy, if the Giants win your bankroll goes up by p^+ - p^-, and the probability of the Giants winning goes up by p^+ - p or (p^+ - p^-)/2; that is, the change in your bankroll is twice the change in probability. This is also true if the Giants lose. At the beginning your bankroll is 1 and the probability of a Giants win is 1/2, so your bankroll is always twice the win probability. In the end it’s 2 if the Giants win and 0 if they lose, simulating the desired bet.
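The strategy above can be checked by brute force. The following is a sketch under the same assumptions as the text (a best-of-seven series, independent games, each a fair coin flip); the function names `win_prob`, `bet` and `final_bankroll` are made up for this illustration. It computes p^+ - p^- at every score and plays through all possible sequences of results.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

WINS_NEEDED = 4  # best-of-seven series (assumed)

@lru_cache(maxsize=None)
def win_prob(giants, royals):
    """Probability the Giants take the series from this score, each game a fair coin flip."""
    if giants == WINS_NEEDED:
        return Fraction(1)
    if royals == WINS_NEEDED:
        return Fraction(0)
    return (win_prob(giants + 1, royals) + win_prob(giants, royals + 1)) / 2

def bet(giants, royals):
    """Stake on the Giants in the next game: p^+ - p^-."""
    return win_prob(giants + 1, royals) - win_prob(giants, royals + 1)

def final_bankroll(outcomes):
    """Play through game results (True = Giants win), starting from a bankroll of 1."""
    bankroll, giants, royals = Fraction(1), 0, 0
    for giants_won in outcomes:
        if WINS_NEEDED in (giants, royals):
            break  # series already decided, no more bets
        stake = bet(giants, royals)
        if giants_won:
            bankroll += stake
            giants += 1
        else:
            bankroll -= stake
            royals += 1
    return bankroll, giants == WINS_NEEDED

# Every possible sequence of seven results; the series always ends within seven games.
results = {final_bankroll(seq) for seq in product([True, False], repeat=7)}
print(results)
print(bet(0, 0))
```

The only outcomes that appear are a bankroll of 2 with a Giants win and a bankroll of 0 with a Giants loss. The first bet, `bet(0, 0)`, is 5/16 of a unit, or $31.25 on game 1.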

On a related note, people’s guesses about how scores proceed in an NFL game are wrong.

Fund Samuel Hansen’s kickstarter

Hopefully this isn’t too little, too late: you should fund Samuel Hansen’s Kickstarter for Relatively Prime: Series 2, an excellent series of long-form “stories from the mathematical domain”. Samuel is the creative force behind such excellent science podcasts as Combinations and Permutations, Strongly Connected Components, Science Sparring Society, and (with Peter Rowlett) Math/Maths, and he did a series of Relatively Prime a couple years ago, so you know it’ll be good.

And because I know you were wondering, there’s a site that can tell you the probability that a Kickstarter will be funded.

Margins of error on Atlanta-area traffic signs

Every work day, in the evening on the way home, I pass a sign on Georgia 400 a few miles north of I-285. On a good day it will read something like:

“I-285: 4-6 MIN / I-85: 11-13 MIN”

and on a bad day it’ll read something like

“I-285: 10-12 MIN / I-85: 32-34 MIN”

I figure this sign is somewhere in Sandy Springs, Georgia, although it may be in Roswell, the next city north; see a google map. There are other, similar signs that I also pass on my commute but this is the one I pay attention to.

But what’s interesting about these signs is that, no matter how long they claim the drive will take, the range is always two minutes wide. You’d expect the error on the estimated time from my sign to I-285 to be smaller than the error on the estimated time from my sign to I-85: the drive to I-285 is part of the drive to I-85, and it would be quite strange for errors on the segment from the sign to I-285 to be negatively correlated with errors on the segment from I-285 to I-85. Presumably these plus-or-minus-one-minute “errors” are purely cosmetic, there to remind people that the estimates are not always correct. I assume the system doing the estimation has some internal estimate of its margin of error, presumably calibrated on past estimates; why not just use that? Perhaps that would be a level of sophistication beyond what people are used to handling. In weather forecasting, for example, we regularly see probabilities of precipitation, but not error bars around temperature forecasts.
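The argument about correlated errors can be made a little more precise. As a sketch, write E_1 for the error in the estimated time from the sign to I-285 and E_2 for the error on the segment from I-285 to I-85; these symbols are introduced here for illustration, and treating the errors as random variables with finite variance is an assumption. The error on the whole trip to I-85 is then E_1 + E_2, and

```latex
\operatorname{Var}(E_1 + E_2)
  = \operatorname{Var}(E_1) + \operatorname{Var}(E_2) + 2\operatorname{Cov}(E_1, E_2)
```

So the estimate to I-85 is at least as uncertain as the estimate to I-285 whenever Cov(E_1, E_2) >= -Var(E_2)/2. In particular this holds whenever the two errors are uncorrelated or positively correlated, which is why a range of the same width for both destinations looks cosmetic.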

Another, less mathematical, thing about these signs: where Georgia 400 crosses I-285, in the morning (when traffic is generally heading towards 400, not away from it) a sign often reads “I-285 SPEEDS : EAST 55+ MPH WEST 55+ MPH”. I suppose they don’t want to just come out and admit that people go at least 70 when there’s no traffic; the speed limit is 65 in good conditions.