Bi-weekly links for July 15

A visualization of normal versus fat-tailed distributions

Archimedes: Separating Myth from Science at the New York Times.

Daniel Walsh plays detective with rolling shutter photos: given a picture of a moving propeller, can you tell how fast it was moving?

Dave Richeson uses a kayak to measure the perimeter of a lake.

From Brain Facts, some visualizations of nonlinear systems.

A profile of Aaron Clauset’s research on power laws and terrorism.

Academic doesn’t have a PhD problem. It has an attitude problem.

Gil Kalai givces a solution to auction-based tic tac toe.

the Guardian has a video in which Paul Klemperer talks about geometry and the banking crisis. Surprisingly it’s tropical geometry to the rescue in resource allocation problems, as seen in this paper by Klemperer and Elizabeth Baldwin.

Brian Whitman tells us how music recommendation works and doesn’t work.

From yhat, handwritten digit recognition with node and python.

Amazon rankings redux

I asked about a month ago are Amazon rankings Zipfian?

Morris Rosenthal, a self-publishing author, has an interest in this subject for obvious reasons; he’s got some interesting-looking resutls for both paper books and e-books. Roughly speaking, yes; the most interesting thing I notice is that the slope of the rank vs. estimated sales curve (on a log-log scale) is higher both at the head (best-seller) and in the tail of the distribution compared to its bulk. What to make of this, I don’t know.

Chad Orzel, a physicist and author of How To Teach Physics to Your Dog took his own shot at this question a couple years ago.

A quick bound on shuffles

David Eppstein asks: how many riffles does it take until all permutations are possible?

A riffle shuffle permutation, in the mathematics of shuffling cards, is a permutation of the cards that can be obtained by a single riffle shuffle — that is, you cut the deck into two packets and then interleave the packets. There are 2^n - n distinct riffle shuffles of n cards. For example, consider a five-card deck, which is initially in the order 1, 2, 3, 4, 5. Then a riffle shuffle consists of:

  • cutting the deck in one of the six possible ways: 12345/ (no cut), 1234/5, 123/45, 12/345, 1/2345, /12345 (also no cut) where the slash represents the cut.
  • riffling. For example look at 123/45. There are {5 \choose 2} = 10 possible ways to make a permutation from this – we decide which two slots to put the 4 and the 5 in, and then everything else is forced. For example if we decide that 4 and 5 will go in the second and fourth positions, we must get 14253. In general if we have k cards in the left-hand pile and n-k cards in the right-hand pile, we have $\latex n \choose k$ possible shuffles. Furthermore we can decompose any one of these permutations into two “rising sequences” in exactly one way, so it can come from exactly one cut — with a single exception. That exception is the identity permutation 123\ldotsn, which we can obtain from any of the n+1 possible cuts — so we must subtract n for the duplications. (If you put probabilities on this it becomes the Gilbert-Shannon-Reeds model.

Eppstein asks how many riffle shuffles it takes for each permutation to have nonzero probability. Since there are 2^n - n outcomes of a single riffle shuffle, in k iterations there are at most (2^n - n)^k possible results. There will actually be less, because there are relations among the permutation subgroup generated by the riffle shuffles. (In less fancy language, there are sequences of different shuffles which give the same result. It’s the pedigree collapse of shuffling.)

Let’s replace this with 2^{nk} to make the math easier. Now, there are n! permutations; this is greater than (n/e)^n by a standard bound. So just in order to have n! possible sequences of shuffles of length k, we have to have 2^{nk} > (n/e)^n, or, taking nth roots of both sides, 2^k > n/e. Taking base-2 logs, we get k > \log_2 n - \log_2 e.

Eppstein gives a dynamical systems argument that you need at least $\lceil \log_2 n \rceil$ shuffles – and it turns out that that’s enough. This is in comparison to the classic result of Bayer and Diaconis, claiming that you need $3/2 \log_2 n$ shuffles to get a well-shuffled deck.

A Russian puzzle

Dave Richeson tweeted about a puzzle from Futility Closet (original source a Russian mathematical olympiad): can you split the integers 1, 2, …, 15 into two groups A and B, with 13 elements in A and 2 elements in B, so that the sum of the elements of A is the product of the elements of B?

Think about it for a moment. There’s of course the temptation to brute-force it, which is doable, but there’s a more elegant solution.

This got me thinking – when can you split the integers 1, 2, …, n into two groups A and B, where B has two elements, so that the sum of the elements of A is the product of the elements of B?

Say B contains x and y. Then their product is of course xy. The sum of the elements of A is 1 + 2 + ... + n - (x+y) = n(n+1)/2 - (x+y). Setting these equal and rearranging gives

n(n+1)/2 + 1 = xy + x + y + 1

where we’ve added 1 to make the factorization work out – this becomes

n(n+1)/2 + 1 = (x+1)(y+1).

So the problem is reduced to finding factorizations of n(n+1)/2 + 1, which satisfy two conditions:

  • x and y can’t be equal (for specificity we’ll say x < y), and
  • x and y are both at most n.

Since we have y ≤ n, we’re going to have x ≥ (n+1)/2 + 1/n. n must be at least 2, so we can just write x ≥ (n/2) + 1. So we’re looking for factors of n(n+1)/2 + 1 in the interval [n/2+1, n]. Here’s some brute-force Python code to find all such solutions:

import math

def solutions(n):
    out = []
    total = n*(n+1)/2+1
    xmin = int(math.ceil(n/2.0) + 1)
    xmax = n
    for x in range(xmin, xmax+1):
        if total % (x+1) == 0:
            y = total/(x+1)-1
            if x < y:
    return out

def all_solutions(n):
    out = []
    for i in range(2, n+1):
        sols = solutions(i)
        for sol in sols:
            sol.insert(0, i)
    return out

solutions takes an integer n as input and returns pairs [x, y] which are solutions to the problem. For example solutions(17) returns [[10, 13]].
And all_solutions takes an integer N and returns all triples [n, x, y] with n \le N which are solutions to the problem — that is, where xy equals the sum of all the integers up to n except for x and y. The first few solutions are:

n x y
10 6 7
17 10 13
26 15 21
36 22 28
37 21 31
45 27 36
50 28 43
61 42 43
65 36 57
67 42 52
78 45 66
82 45 73
91 52 78
94 57 76
101 55 91
102 70 73
110 70 85
122 66 111
136 76 120
138 87 108

So it appears that there’s nothing particularly special about the number 15 in the initial puzzle. There are plenty of values n for which you can’t do this, and plenty for which you can. Also, there are values of n for which there are multiple solution pairs (x, y), although not surprisingly they are rare. The smallest such n is 325, for which x = 171, y = 307 and x = 175, y = 300 are both solutions. In this case $latex n(n+1)/2 + 1 = 52976 = 24 \times 7 \times 11 \times 43$, from which 52976 has (5)(2)(2)(2) = 40 factors. A typical number of this size has about log(52976) \approx 11 factors. This abundance of factors makes it more likely that 52976 would have two factorizations of the sort we’re looking for. And in fact 52976 = 172 \times 308 = 176 \times 301.

Solutions to this problem appear to have some interesting statistical properties… more on that in a future post.

Weekly links for July 1

Simulated car design using genetic algorithms.

From the arXiv:
The Supreme Court is a spin glass.
How should traffic signals be timed on two-way streets?

From the June Notices of the AMS:
Judith R. Goodstein and Donald Babbitt’s article of E. T. Bell and Caltech mathematics between the wars (of Men of Mathematics and Bell numbers fame).
Richard Hoshino and Ken-ichi Kawarabayashi, “Graph Theory and Sports Scheduling”. As you might suspect from the names of the authors, they’re Japanese; the numbers they use in their problem apply to Japanese pro baseball (NPB), and their work has been used in actual scheduling of NPB.

Bryna Kra on mathematics as a toolbox for the sciences in the Chronicle of Higher Education.

Joel Grus can analyze data and has a two-year-old daughter, so naturally he looked at the most boyish and girlish colors and eigenshirts for children’s T-shirts.

Alex Bellos at the Guardian shows us mathematical food items.

William Beaty on the physics behind traffic jams.

Tom Fawcett has a gallery of visualization of results from machine learning classifiers.

Jon McLoone asks is there any point to the 12 times table?

Rafe Kinsey, at the University of Michigan, is teaching a freshman writing course on math, writing, and the world in the fall of 2013.

The boy who loved math: the improbable life of Paul Erdos is an illustrated children’s book.

John Cook on statistical evidence vs. legal evidence.

Gurmeet Manku has a collection of “75 combinatorial puzzles for mathematicians and computer scientists.”.

Celebrities die e at a time. (via Reddit)

From Nautilus magazine: how to insure against a rainy day and taming the unfriendly skies.