## Will the rest of the hurricane season be worse?

Isaias will be the ninth tropical storm of the season, likely to form on July 30. This is unusually early. From CNN on July 28: “Once it is given the name Isaias — pronounced (ees-ah-EE-as) — it will be the earliest storm to begin with an “I” on record. The previous record was set on August 7, 2005, part of the busiest season to date.”  Storms are named in alphabetical order, so a storm starting with “I”  is the ninth storm of the season.

This did seem early to me. The recent example that I remember Irma, which made landfall on the mainland US on September 10, 2017. (I spent a day seeing lots of people with Florida license plates driving around Atlanta, thinking “at least I don’t live in Florida!”, and then the next day having my power out and being glad that the trees the storm took out in my backyard were ones that needed taking out anyway.)

What does this mean for the rest of the hurricane season? Should we expect a busy one?

There’s an R package, HURDAT, that compiles the NOAA hurricane database. For each year, we can count the number of storms. First let’s make a list of all the storms:

library(HURDAT) library(lubridate) al <- get_hurdat(basin = 'AL') cutoff_month = 7 cutoff_day = 29 storms = al %>% group_by(Key) %>% filter(Status %in% c('TS', 'HU')) %>% mutate(rk = rank(DateTime)) %>% filter(rk == 1) %>% mutate(season_part = ifelse(month(DateTime) < cutoff_month | month(DateTime) == cutoff_month & day(DateTime) <= cutoff_day, 'early', 'late')) %>% select(Key, Name, DateTime) 

and count the storms by time in the season:

storm_counts = storms %>% mutate(yr = year(DateTime)) %>% group_by(yr, season_part) %>% count()

But how do we know about are storms? Not all storms were observed in the past, because some stay out to sea. The “satellite era” began in 1966, according to this article about this year’s Tropical Storm Gonzalo (the earliest “G” storm ever). So let’s plot the number of storms before and after July 29, separately for each of the two “eras”.

satellite_start = 1966 storm_counts %>% spread(season_part, n, 0) %>% mutate(satellite = ifelse(yr >= satellite_start, 'satellite', 'pre-satellite')) %>% ggplot() + facet_wrap(~satellite) + geom_jitter(aes(x=early, y=late)) + stat_smooth(aes(x=early, y=late, group = satellite), method = 'lm') + scale_x_continuous(paste0('storms before ', cutoff_month, '/', cutoff_day)) + scale_y_continuous(paste0('storms after ', cutoff_month, '/', cutoff_day)) 

But 2005 is a big outlier! I remember that season. Of course it was the year of Katrina. And in December there was a hurricane named Epsilon, which I could not take seriously because epsilon is small. Indeed if we drop it the effect of early-season storms on late-season ones gets much smaller.

So there’s a small effect – but not as large as might be expected! So how many storms will we have this season? Calling predict(storm_model, data.frame(early = 9), interval = 'prediction', level = 0.95), gets a 95% prediction interval for the number of storms in the rest of the season: 12.5 plus or minus 7.2. So we could reasonably have as few as 5 more or as many as 20 more, for a total of 14 to 29.

A 50% prediction interval goes from 10 to 15 more storms in the remaining part of the season, for a total of 19 to 24. For comparison, the quartiles of the distribution of the number of storms after 7/29 in the satellite era are 7 and 12. So if we knew nothing about what’s happened so far, we’d give a 50% prediction interval of 16 to 21 storms for the number at the end of the season.

Let’s see what happens. And let’s hope these storms stay far out to sea, because pandemics and hurricanes don’t seem very compatible.

## Double (or triple) and sort

Here’s a sequence I keep mulling over in my head:

1, 2, 4, 8, 16, 23, 46, 29, 58, 116, 223, 446, …

Does it grow without bound? First we should specify what it is. Each number is double the one before, but with the digits sorted into ascending order. The doubling makes the number larger… but the sorting makes it smaller, so perhaps this sequence tops out somewhere?

This sequence actually becomes periodic pretty quickly. The full sequence is:

1, 2, 4, 8, 16, 23, 46, 29, 58, 116, 223, 446, 289, 578, 1156, 1223, 2446, 2489, 4789, 5789, 11578, 12356, 12247, 24449, 48889, 77789, 155578, 111356, 122227, 244445, 48889, …
and then it repeats with period 6. The key is in the final step: once we get a number that ends with 5, there’s a 0 in its double so the next term is actually shorter. (It is surprisingly hard to keep the mental arithmetic straight to the point where I can do this in my head.) There are other such loops: if you start with 3 you eventually reach

69, 138, 267, 345, 69, …

(note that everything stays divisible by 3 here!). If you start with 7 you get to

167, 334, 668, 1336, 2267, 3445, 689, 1378, 2567, 1345, 269, 358, 167, …

9, 18, 36, 27, 45, 9, …

And there’s a series of these sequences. Every number in that period-6 loop that you reach if you start with 1 has a repeated digit, and you can have less or more of those:

49, 89, 178, 356, 127, 245, 49, …
489, 789, 1578, 1356, 1227, 2445, 489, …
4889, 7789, 15578, 11356, 12227, 24445, 4889, …

The structure is quite a bit richer than the triple-and-sort sequence, presumably because divisibility by three “plays well” with rearranging digits. There, as far as I can tell, no matter what number you start with you end up with 45. (Tripling that give 135, tripling again gives 405, which sorts back to 45.) The reason for this seems to be that any repeated digits, when tripled, give you repeated 3s or 6s, and tripling those gives repeated 0s. For example, consider the start number 9261632697:

• Tripling gives 27784898091 – which incidentally has a zero – and sorting that gives 1247788899.
• Tripling again gives 3743366697 – the repeated 7s and 8s “turn into” repeated 3s and 6s — and sorting gives 3334666779.
• At this point everything collapses – tripling gives 10004000337, and sorting gives 13347.
• Another collapsing step: 13347 × 3 = 40041, which sorts to 144.
• The remainder of the sequence is uninteresting:  234, 27, 18, 45, 135, 45, 135, …

This is the “typical” behavior – but it’s a bit surprising to me that it always works. I suspect a proof would be in terms of vectors that show the number of each digit, rather than with the raw numbers themselves.

## How full is the pool?

Riddler Express for July 3, 2020:

Suppose a queue of swimmers arrives at [a] pool when it opens at 9 a.m. One at a time, each person randomly picks a lane from among the lanes that are available (i.e., the lane has no swimmer already and is not adjacent to any lanes with swimmers), until no more lanes are available.

At this point, what is the expected number of swimmers in the pool?

For five lanes the pool will be full when lanes 1, 3, and 5; 1 and 4; or 2 and 5 are occupied.

If lane 2 is first to be filled then we will certainly end up with just two occupied lanes (2 and 4); similarly with 4.

If lane 3 is first we certainly end up with three occupied lanes, since 1 and 5 remain and they don’t interfere with each other.

If the first swimmer uses lane 1 we actually have to think. The remaining lanes are 3, 4, and 5. If the second swimmer fills one of the “new” edge lanes (3 or 5) first then we’ll end up with three full lanes; if the second swimmer fills up 4 we end up with two full lanes. So if the first swimmer fills lane 1 we have an expectation of 2 2/3 lanes full, and similarly for lane 5.

Putting it all together the expected number of full lanes is $(2/5) (2) + (1/5) (3) + (2/5) (8/3) = 37/15$.

This is sort of a discrete version of “Renyi’s parking problem”: if you have a street of length x and cars of length 1 park at random, how full is the street expected to be when nobody else can park?

In the n-lane case there’s a recursive solution. Let f(n) be the expected number of full lanes in an n-lane pool. Clearly f(1) = 1, f(2) = 1; we already worked out f(5) = 8/3.

To find f(n), consider where the first swimmer goes.

• If they use lane 1 or lane n then there is one remaining sub-pool of n-2 lanes.
• If they use lane 2 or lane n-1 then there is one remaining sub-pool of n-3 lanes.
• If they use lane $k$ with $3 \le k \le n-2$ then there are two remaining sub-pools of k-2 and n-k-1 lanes.

So this gives us, if $n \ge 4$,
$f(n) = 1 + {2 \over n} f(n-2) + {2 \over n} f(n-3) + \sum_{k=3}^{n-2} {1 \over n} (f(k-2) + f(n-k-1))$$with the base cases f(1) = 1, f(2) = 1, f(3) = 5/3. If we compute this for large n we get $f(n) \sim 0.4323n$, which agrees with the Monte Carlo simulations in Georgiou, Kranakis, and Krizanc 2009. (Simulation isn’t necessary here but it’s necessary if you consider an analogous problem on a grid.) The constant $0.4323$ is $(1-e^{-2})/2$. Kranakis and Krizanc have some online notes on the urinal problem. as well, and the problem goes back at least to a problem posed in the SIAM Review in 1962 by Freedman and Shepp. (2020-07-07: fixed numerical expression of constant 0.4323) ## The smallest non-obvious composite It’s been a while, I know. My partner sent me a link to this tweet yesterday: Testing divisibility by 17 is hard! There’s a test: 10A + B is divisible by 17 if and only if A – 5B is. (Proof: 10A + B is divisible by 17 if and only if 12(10A + B) is, since 12 and 17 are relatively prime; then observe 12(10A + B) – 17(7A + B) = A – 5B.). But nobody has internalized this test. But 51 is also divisible by 3, and that’s easy to see if you know the usual test: a number is divisible by 3 if and only if the sum of its digits is. 5 + 1 = 6. So assume you know this test. Also assume you know how to test for numbers that are divisible by 2 or 5 (you can look at the last digit), 11 (for two-digit numbers you can just check if the two digits are the same), and you know the squares. Then the smallest number that looks prime but isn’t is the product of the first two distinct primes I haven’t mentioned, 7 × 13 = 91. Back in 2016 Christian Lawson-Perfect collected data on this from a game he made that asked people which numbers were prime; 51 is the number people got wrong most often. But 91 is the composite most often identified as prime that’s not a multiple of three. 57 also looks prime. ## A tale of zloty and forint I had a wonderful vacation in Vienna and Budapest in May. (Vienna is entirely irrelevant to this story.) My wife every so often talks about taking a trip to Poland for no particular reason. (Readers: why do I want to visit Poland?) So I started looking through Wikitravel to answer the above question, and every so often it mentions a price. I need to get the exchange rate to put it in context. So I type “usd to pln” into Google: In the chart, it looks like the zloty has been appreciating against the dollar. Is that because the dollar has been going down or because the zloty has been going up? Well, let’s see if the forint has made the same move. Type “usd to huf” into Google: It has! Hmm, I guess the dollar isn’t doing so well in the past few months. You might wonder why I checked the exchange rate for Hungarian forints and not for euros. The reason is because I know the forint was trading at around 275-280 to the dollar in May, whereas I can’t remember what the euro was. It looks like it was around$1.15 US but I used “vacation math” and just treated prices in euros as if they were in US dollars.

And then because I like typing random things into Google, try “pln to huf”:

Is there some currency arbitrage opportunity? Of course not, the math works out perfectly:

Can you even directly trade zloty for forint, or would you end up going via some major currency anyway?

Also, somewhere in Budapest you can see this:

and here’s a closeup of the text at the bottom:

A transcription of the text (sorry if I have introduced any errors):

A Rubik-kocka bármely keverés utan legfeljebb 20 tekerésből kirakható, de az osszetettsége miatt osszesen 43252003274489856000-féle eltérő állás hozható létre. “Mindig van megoldas és nem is csak egy!”(Rubik Ernő)

I do not read Hungarian (nor any language related to it, because there are very few such languages) but I know what this says. I suspect some of you do too.

## The riddle of the number chain

From Itay Bavly, a chain-link number problem:

You start with the integers from one to 100, inclusive, and you want to organize them into a chain. The only rules for building this chain are that you can only use each number once and that each number must be adjacent in the chain to one of its factors or multiples. For example, you might build the chain:

4, 12, 24, 6, 60, 30, 10, 100, 25, 5, 1, 97

You have no numbers left to place after 97, leaving you with a finished chain of length 12.

What is the longest chain you can build?

It turns out this question is literally as old as I am. Carl Pomerance wrote a paper On the longest simple path in the divisor graph, C. Pomerance, Proc. Southeastern Conf. Combinatorics, Graph Theory, and Computing, Boca Raton, Florida, 1983, Cong. Num. 40 (1983), 291–304. The result of this paper is that if $f(n)$ is the length of the longest chain in $\{1, 2, \ldots, n\}$, then $f(n) = o(n)$. That is, as $n$ gets large, we can’t “use up” a constant fraction of the integers. This paper was dedicated to Erdos on his seventieth birthday.

But the paper is not constructive. The natural thing to do is to think of this as a graph-theory problem. Construct a graph with vertices $1, 2, \ldots, n and draw an edge between$latex a\$ to $b$ if $a | b$. But finding the longest path in a graph is NP-hard.

For example, with $n = 10$ this graph appears as follows:

and you can see that 7 is going to cause something of a problem. If we just avoid it we can find the path

9, 3, 6, 1, 4, 8, 2, 10, 5

so $f(n) \ge 9$. You can also see something of the structure of this problem – numbers which share prime factors cluster together in the graph, and we end up putting numbers with the same prime factors together in the sequence. Here all the multiples of three are at the beginning, all the multiples of two in the middle, and all the multiples of five at the end.

In fact $f(n) = 9$, which we can figure out by arguing around 7. We can’t have a path that has 7 in the middle because 7 has degree 1 in this graph, so if 7 occurs at all it must occur at the start or the end. Without loss of generality let’s say we have a path that starts with 7, of length 10 We must have 7, 1, and then a path in this graph that goes through all eight vertices:

but such a path clearly doesn’t exist – if it existed it would have to go between the two vertices of degree 1, which are 9 and 5, but one can’t navigate properly around 2.

So what to do? I know nothing better than trial and error. We can quickly build the following path of length 30:

64, 32, 96, 48, 16, 8, 24, 72, 36, 12, 4, 2, 6, 18, 54, 27, 9, 81, 3, 1, 5, 15, 45, 90, 30, 10, 60, 20, 40, 80

which basically zig-zags through the 3-smooth numbers (that is, numbers with all factors 2 and 3) and then the numbers you get if you multiply those by 5. Then we want to insert little excursions out along different primes. For example between 3 and 1 we can insert

3, 51, 17, 68, 34, 1

which is essentially the path 3, 1, 4, 2 multiplied by 17. Doing a few of these insertions gives a length-49 sequence

64, 32, 96, 48, 16, 8, 24, 72, 36, 12,
4, 84, 28, 7, 14,
2, 88, 44, 22, 11, 99, 33, 66,
6, 18, 54, 27, 9, 81,
3, 51, 17, 68, 34,
1, 38, 76, 19, 95,
5, 15, 45, 90, 30, 10, 60, 20, 40, 80

where I’ve bolded the newly inserted bits, which go “out along” the primes 7, 11, 17, and 19 respectively. (Don’t ask me what happened to 13.) We can keep hacking away at this, for example sticking some multiples of 25 in between 5 and 15 to get

64, 32, 96, 48, 16, 8, 24, 72, 36, 12,
4, 84, 28, 7, 14,
2, 88, 44, 22, 11, 99, 33, 66,
6, 18, 54, 27, 9, 81,
3, 51, 17, 68, 34,
1, 38, 76, 19, 95,
5, 100, 50, 25, 75, 15, 45, 90, 30, 10, 60, 20, 40, 80

Since there are no 13s in here, let’s put them in. One way to do this is in the second line, which we can replace with the second and third lines here:

64, 32, 96, 48, 16, 8, 24, 72, 36, 12,
4, 56, 14, 42, 21, 84, 28,
7, 91, 13, 39, 78, 26, 52
,
2, 88, 44, 22, 11, 99, 33, 66,
6, 18, 54, 27, 9, 81,
3, 51, 17, 68, 34,
1, 38, 76, 19, 95,
5, 100, 50, 25, 75, 15, 45, 90, 30, 10, 60, 20, 40, 80

and this has length 62. Things get trickier now – the idea becomes to look for numbers that aren’t in the sequence but have all their factors small. 63 isn’t in the sequence and it feels insertable, and if we replace 27, 9, 81, 3 with 27, 81, 9, 63, 3 we can do it. A few more little moves like this get us up to length 67:

64, 32, 96, 48, 16, 8, 24, 72, 36, 12,
4, 84, 28, 56, 14, 98, 49,
7, 91, 13, 39, 78, 26, 52,
2, 88, 44, 22, 11, 99, 33, 66,
6, 18, 54, 27, 81,
9, 63, 21, 42,
3, 51, 17, 68, 34,
1, 38, 76, 19, 95,
5, 80, 40, 20, 100, 50, 25, 75,
15, 45, 90, 30, 60, 10, 70, 35

which is the solution I submitted.

The “missing” numbers now are

23 29 31 37 41 43 46 47 53 55 57 58 59 61 62 65 67 69 71 73 74 77 79 82 83 85 86 87 89 92 93 94 97

and you can see what the basic problem ends up being.  All of these have at least one prime factor that’s greater than or equal to 11. Can we squeeze any of these in? In particular 55 and 77 seem like they could be inserted along this basic route, although I can’t figure out how.

## Picturing math

The Metropolitan Museum of Art (New York) has an exhibit Picturing Math: Selections from the Department of Drawings and Prints, running from January 31 to May 8.  I was in New York mid-January, and I may have reason to be there again in June or July, so this seems like they’re setting their schedules to toy with me.  Fortunately, you can “view exhibition objects” at their web site or you can look at this article about the exhibit by Allison Meier for Hyperallergic.

## Do people on busier roads get more yard signs?

I live in Georgia’s sixth congressional district.  As you may have heard, we’re having a special election to replace Tom Price, the new secretary of health and human services.  Price was a popular incumbent, but Trump barely scraped out a win in this district, so there’s a lot of national attention on this race.  And even if there weren’t, Georgia has a system for special elections where everyone will run in a primary (on April 18), and if nobody gets 50% of the vote (which there won’t), then there will be a runoff between the top two candidates on June 20.

And I do mean everyone.  There are 11 declared Republican candidates and 5 declared Democrats.    What little polling there’s been indicates that we should end up with a standard partisan election in June, with Karen Handel (former Secretary of State of Georgia and former candidate for Governor and Senator) for the Republicans, and Jon Ossoff, former congressional aide, current maker of investigative films, and person who is younger than me and therefore unelectable due to things he posted on Facebook, for the Democrats.

But this post is not about politics.  It’s about yard signs.  If I’m not mistaken – and I very well may be mistaken – I’m seeing more yard signs than I did for the presidential election.  Now, I live in an American suburb, which means that there are houses on main roads that people pass by,  and there are houses in subdivisions that nobody drives by unless they live in the subdivision. And I’m noticing that yard signs seem to be more prevalent on the main roads.

Is this true? It’s hard to count and drive at the same time – it’s easy enough to count the signs, but counting the houses is tricky, and without the denominator this would be worthless – and I wouldn’t have been able to get a large enough sample anyway.  But it seems to make sense.  I live on one of those roads that nobody drives down unless they live on it.  If I had a sign, who would see it?   I’m not much for putting up signs anyway, but if I were, I’d be more likely to if I lived on a road where the sign would have more effect. As it turns out, I’m not the only one. The paper Understanding Visible Political Participation: An Analysis of Yard Sign-Displays during the 2008 Presidential Election came up with a model to predict which properties display yard signs, and traffic volume (subjectively coded into six categories from “dead-end” to “major artery” does turn out to be significant.  I found out about this from a Slate article by Sasha Issenberg which rounds up a lot of the research on signs.

By the way, this is one of those things that’s very hard to Google, because the advice on  how to put up signs swamps the research on signs.  But the conventional wisdom seems to agree that signs on busier roads are more valuable.  One sign vendor writes (emphasis added):

1. The most effective way is to combine your dooknocking [sic] with your yard sign recruitment. Target the busiest streets for door to door work first, and for those people who are friendly, ask them if it would be okay to put a sign in the yard.

Perhaps this is why nobody bothers to canvass my neighborhood and offer me a sign.

## A sequence of Fibonacci facts

The Hampshire College Summer Studies in Mathematics program has what it calls an “Interesting Test” that interested students are invited to take as part of their application. Their web page offers a sample of past test problems. One of these, under the heading “Problems which at first seem to lack sufficient information”, is as follows:

From the third term on, each term of the sequence of real numbers $a_1, a_2, \ldots, a_{10}$ is the sum of the preceding two terms; that is, $a_n = a_{n-1} + a_{n-2}$ for $n = 3, 4, 5, \ldots, 10$. If $a_7 = 17$, what is the sum of all the ten terms?

I’d heard this one before (though it’s hard to find a source for things like this – the cloesst I could find was this Math StackExchange question. The solution is as follows: we can write $a_3, a_4, \ldots, a_{10}$ as linear combinations of $a_1, a_2$. In particular we have $a_3 = a_1 + a_2, a_4 = a_1 + 2a_2, \ldots, a_{10} = 21a_1 + 34 a_2$, where the general term is $a_n = F_{n-2} a_1 + F_{n-1} a_n$ (we’ll use this later). Here $F_n$ is the $n$th Fibonacci number, where $F_1 = F_2 = 1, F_n = F_{n-1} + F_{n-2}$. Adding everything up we get

$a_1 + \ldots + a_{10} = \left( 1 + 0 + 1 + 1 + 2 + 3 + 5 + 8 + 13 + 21\right) a_1 + \left( 0 + 1 + 1 + 2 + 3 + 5 + 8 + 13 + 21 + 34 \right) a_2$

and doing those sums gives $a_1 + \ldots + a_{10} = 55 a_1 + 88 a_2$. But $a_7 = 5 a_1 + 8 a_2$, and so the sum is just $11a_7 = 11 \cdot 17 = 187$. For example, consider the sequence with $a_1 = 5, a_2 = -1$:

$5, -1, 4, 3, 7, 10, 17, 27, 44, 71$

which you can easily verify sums to 187. In the case where you actually have the whole sequence, this is a reasonably well-known “arithmetic trick” that one can show grade school children! The HCSSiM version, of course, doesn’t let you see the whole sequence.

But are there similar tricks to this with shorter or longer sequences? That’s usually the way with the Fibonacci numbers; facts about them tend to be parametrizable. A bit of experimentation shows us that with six terms we have

$a_1 + \ldots + a_6 = 8 a_1 + 12 a_2 = 4 (2a_1 + 3a_2) = 4 a_5.$

See for example the series $4, 1, 5, 9, 14, 23$; its sum, 56, is four times its fifth term. And with fourteen terms we have

$a_1 + a_2 + \ldots + a_{14} = 377 a_1 + 609 a_2 = 29 (13a_1 + 21a_2) = 29 a_9$

(I’ll leave you to come up with your own example.) It would appear that we have $a_1 + \ldots + a_{4k+2} = L_{2k+1} a_{2k+3}$, where the $L_j$ are the Lucas numbers $L_1 = 1, L_2 = 3, L_n = L_{n-1} + L_{n-2}$. Consider the sum of the first $4k+2$ terms satisfying a Fibonacci recurrence:

$a_1 + a_2 + \ldots + a_{4k+2} = F_{4k+2} a_1 + (F_{4k+3}-1) a_2.$

Now we have $F_{4k+2} = F_{2k+1} L_{2k+1}$ and $F_{4k+3}-1 = F_{2k+2} L_{2k+1}$. These are both special cases of the identity givnen in Wikipedia, $F_{n+k} + (-1)^k F_{n-k}= L_k F_n$. So we have

$a_1 + a_2 + \ldots + a_{4k+2} = F_{2k+1} L_{2k+1} a_1 + F_{2k+2} L_{2k+1} a_2 = L_{2k+1} (F_{2k+1} a_1 + F_{2k+2} a_2) = L_{2k+1} a_{2k+3}$.

So the sum of a fourteen-term Fibonacci sequence is equal to a multiple of the ninth term; the sum of a ten-term Fibonacci sequence, a multiple of the seventh term; the sum of a six-term Fibonacci sequence, a multiple of the fifth term. What of a sum of a two-term Fibonacci sequence? We get $a_1 + a_2 = L_1 a_3$. Recall $L_1 = 1$ — the sum of the first two terms is, trivially, a multiple of the third term.

(I don’t claim this is original. Only that I had some fun thinking about it. That probably means I need to get a life.)

## Another Trump dump

Terry Tao’s open thread for mathematicians on the immigration executive order, not exactly a follow-up to it ought to be common knowledge that Donald Trump is not fit for the presidency of the United States of America. Tao writes:

mathematical research is a transnational activity, in that the specific nationality of individual members of a research team or research community are (or should be) of no appreciable significance for the purpose of advancing mathematics.

Not surprisingly, then, given the current political situation, 90 percent of people identifying themselves as “mathematician” donating money to US political campaigns give to Democrats, according to analysis by Verdant Labs. Only 85 percent of those calling themselves “Professor of Mathematics” do.  One possible explanation is the those identifying as “professor of mathematics” identify more with their position as educators, and education doesn’t have the same “transnational” quality.

You can download the data from the FEC if you want! I suspect there’s something interesting in here… let’s see how my new laptop handles those twenty million records.

And please point me to interesting analysis of FEC data on campaign contributions that’s already been done! A couple things I’ve seen from the data science community:

• Someone named Gary did some analysis in November of 2015, for the 2016 presidential cycle as it had unfolded thus far in Ohio
• Verdant Labs has also looked at the politics of first names .

But there is surprisingly little out there, given that this is such an interesting data set!

If you want to do something more useful than navel-gazing at data, I recently came across Data for Democracy, and Moon Duchin is organizing a summer school on the geometry of redistricting, including some training in being an expert witness in gerrymandering cases. Will someone write the gerrymandr R package?