Which party will be president if there’s a 50-50 Senate?

FiveThirtyEight currently gives Joe Biden an 88% chance of winning the US presidential election, and Donald Trump 12%.

The Economist gives Biden 92, Trump 7.

Both have Biden ahead by 8.4 percent in the popular vote. FiveThirtyEight has 53.6 to 45.2 (with 1.2 percent to third parties), while the Economist has 54.2 to 45.8 (with no third parties – I presume they’re measuring shares of the two-party vote). My assumption is that the difference in these odds comes from FiveThirtyEight’s model assuming a larger correlation between states than the Economist’s, and therefore giving a wider distribution around that center point.

Both sites also have a model of the Senate election. FiveThirtyEight expects the Democrats to have 51.5 seats after the election, with a 74% chance of control; the Economist expects 52.5 seats for the Democrats, with a 76% chance of control. Recall that if the Senate is tied, 50-50, then the Vice President (Kamala Harris for the Democrats, or Mike Pence for the Republicans) breaks the tie; that is, Senate control belongs to the party holding the White House. So what do the models say about that tie?

FiveThirtyEight presents the diagram below, where the 50/50 bar is split between the parties:

If you hover over the red part of the 50-50 bar you get “1.7% chance” (of a 50-50 Senate and Republican president); if you hover over the blue part you get “11.2% chance” (of a 50-50 Senate and Democratic president). That is, conditional on a 50-50 Senate, FiveThirtyEight gives a probability of 0.112/0.129, or about 87%, of a Democratic president. (This is different from the 88% figure above.)
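The arithmetic behind that conditional probability is simple enough to sketch (here in Python; the 11.2% and 1.7% are the figures read off FiveThirtyEight’s chart, and everything else follows from them):

```python
# Joint probabilities read off FiveThirtyEight's Senate chart.
p_tie_and_dem = 0.112  # 50-50 Senate AND Democratic president
p_tie_and_rep = 0.017  # 50-50 Senate AND Republican president

p_tie = p_tie_and_dem + p_tie_and_rep    # P(50-50 Senate)
p_dem_given_tie = p_tie_and_dem / p_tie  # P(Dem president | 50-50 Senate)

print(round(p_tie, 3))            # 0.129
print(round(p_dem_given_tie, 3))  # 0.868
```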

The Economist, on the other hand, explicitly says that conditional on a 50-50 Senate, there’s an 18% chance of a Democratic presidency:

Which one of these probabilities is more realistic? Where do they come from?

The Economist writes, on their methodology page:

In presidential-election years, the model also simulates which party will control the vice-presidency, which casts the tiebreaking vote in case of a 50-50 split, based on the simulated national popular vote for the House.

FiveThirtyEight’s Senate forecast methodology page doesn’t seem to make a statement about this; they mention that the 2020 Senate model is “mostly unchanged since 2018”, and of course there was no presidential election in 2018.

My instinct is that 50-50 is a bad night for Democrats. The Democrats start with 35 seats not up for re-election. Both sites agree on which are the 15 most Democratic-leaning Senate seats up for election. So let’s say that Democrats win those 15 states and no others, for a 50-50 Senate. For the sake of argument, assume that every state with a Senate seat up for grabs chooses the same party for the Senate and the presidency. So let’s fill in an Electoral College map with those 15 states in blue, and the nineteen other states with Senate seats at stake in red, to get the map below. (15 + 19 = 34, you say? Well, Georgia has two Senate seats at stake.)


Next let’s fill in those states that don’t have a Senate election, but are safe for one party or the other. For the Democrats, California, Washington, Hawaii, New York, Maryland, Vermont, Connecticut (and DC). For the Republicans, Utah, North Dakota, Missouri, and Indiana. (I’m old enough to remember when Missouri was a swing state.) Here’s the map you get.


So in a world where the Senate is 50-50 in what is probably the most likely way, it looks like the Democrats are right on the cusp of winning the presidency – FiveThirtyEight is probably right after all, to color the 50-50 bar mostly blue. I just hope we don’t get that 269-269 map, partially because it’ll be exhausting and partially because then I should have written a post on how a tied Electoral College gets thrown to the House instead of writing this one.

Zip codes divide the country

From r/dataisbeautiful by alekazam13, a chart of how many people live in regions where the zip codes start with each digit. The distribution is surprisingly uneven.

To give some context: zip codes in the US (what other countries would call “postal codes”) have five digits. The first digit corresponds to one of ten regions of the US (these regions don’t exist anywhere outside of the zip code system), the next two to a postal service sorting center, and the last two to an individual post office. There are something like 40,000 zip codes, out of 100,000 possible five-digit numbers. Here’s a map of the regions, public domain from Wikipedia:

Of course, ZIP codes were invented some time ago; they were introduced in 1963! So what if we use 1960 census figures? There was some imbalance, but perhaps less than there is today.

first digit                    9     3     7     1     2     4     8     6     0     5
population (millions), 2019    53.5  46.8  40.6  33.2  32.9  32.7  23.8  23.7  23.7  17.3
population (millions), 1960    21.2  17.9  17.0  28.5  16.6  25.…
Distribution of population by first digit of zip code, 2019 and 1960

The population of the US right now is about 332 million; so in an ideal world you’d have 33.2 million in each bucket. If you want to divide the US into ten regions, all made up of adjacent states (we’ll make some exceptions for Alaska, Hawaii, and Puerto Rico), and it’s 1963, you run into a problem pretty quickly. The population of the US at that time was 179 million. New England (that’s the six states in yellow to the far northeast; note that New York and New Jersey are not part of New England) had 10.4 million people, not enough to plausibly call it a tenth of the country. New York (labeled “10-14” above) had 16.8 million, nearly a tenth all on its own. The only sensible decision was to skip over New York and add New Jersey. (Why New York didn’t get its own first digit but has to share with Pennsylvania, I don’t know.) The “5” and especially “8” regions had very low populations back then; these were quite rural parts of the country and I assume had more post offices per capita. But since then we figured out how to get people to live in the desert.

I suspect if the system were invented today, then, the regions would look a bit different. In particular:

  • California would get its own region (at 39 million, it’s fully 12% of the US population)
  • Texas (29 million) would get a region nearly to itself, sharing with Oklahoma (4 million)
  • Georgia plus Florida (11 + 21) would be a region – these are two states that have grown quite a bit since 1960.
  • The six New England states (15) plus New York (19) would be a region

As is usual with these sorts of things, you nibble around the edges and then there end up being lots of ways to divide the middle of the country, none of which are any good. (I tried.) The rough design criteria seem to be:

  • divide the country into ten sets of states of roughly equal population;
  • such that each region is contiguous (probably Alaska and Hawaii should be in the same regions as Washington and California, respectively, and Puerto Rico with somewhere on the East Coast);
  • and such that the regions don’t “look funny”, but what does that even mean?

In other words, the criteria for forming congressional districts or similar. (Without gerrymandering.)

Ultimately, with computerized sorting, having a system where zip codes are “interpretable” doesn’t really matter. And the bureaucracy of the Postal Service came up with a different solution than the bureaucracy of Bell Telephone, which was inventing area codes at around the same time. Area codes seem random, although with the design principle that more populous areas got codes with smaller digit sums, which took less time to dial on a rotary phone. My understanding is that similar area codes were deliberately placed far apart geographically, in order to reduce confusion. I’ve never actually seen that written down, though.

Figuring out when a book was written from the names in it

My daughter likes the book Knuffle Bunny Too: A Case of Mistaken Identity, by Mo Willems. Maybe I like it more than she does; she’s old enough to pick out her own books now, and doesn’t pick this one. One day Trixie brings her bunny to school, and it turns out that another child has the same bunny and they become friends! (The children, not the bunnies.)

I’ve read this book enough times that my mind can wander while I read it. There’s a list of names embedded in it, of the other kids that Trixie wants to show the bunny to: Amy, Meg, Margot, Jane, Leela, Rebecca, Noah, Robbie, Toshi, Casey, Conny, Parker, Brian.

So… from this list of names, can we figure out when Trixie was born?

The R package babynames is really useful for this kind of question. This is a wrapper around the Social Security Administration’s baby names data, which gives the number of births of people with each name in the US, each year, for names that were given to at least five babies. It goes from the most common baby names of 1880:

> head(babynames)
A tibble: 6 x 5
year sex name n prop

1 1880 F Mary 7065 0.07238359
2 1880 F Anna 2604 0.02667896
3 1880 F Emma 2003 0.02052149
4 1880 F Elizabeth 1939 0.01986579
5 1880 F Minnie 1746 0.01788843
6 1880 F Margaret 1578 0.01616720

to the least common male baby names of 2017:

> tail(babynames)
A tibble: 6 x 5
year sex name n prop

1 2017 M Zyhier 5 2.55e-06
2 2017 M Zykai 5 2.55e-06
3 2017 M Zykeem 5 2.55e-06
4 2017 M Zylin 5 2.55e-06
5 2017 M Zylis 5 2.55e-06
6 2017 M Zyrie 5 2.55e-06

Here’s code to generate a list of “typical” names from each year ending in 0:

library(dplyr)
library(babynames)

for (y in seq(1880, 2010, by = 10)){
  nms = babynames %>% filter(year == y) %>% 
    sample_n(size = 10, weight = prop, replace = TRUE) %>% 
    arrange(name) %>% pull(name)
  cat(y, ' : ', paste0(nms, collapse = ', '), '\n', sep = '')
}

which gives output

1880 : Alice, Amy, Callie, Cora, Ella, George, Izora, John, Ulysses, Will
1890 : Agnes, Ben, Frank, Frank, John, Lottie, Lula, Margaret, Mildred, Onie
1900 : Alice, Charlie, Daniel, Dorothy, Elda, Joseph, Joseph, Mary, Monroe, Walter
1910 : Beatrice, Eulah, Francisco, Hoyt, James, John, Joseph, Mabel, Susie, Sylvester
1920 : Blanche, Concetta, Isom, Jane, Jean, Katherine, Mary, Mary, Presley, William
1930 : Bobbie, Charles, Herbert, James, Mack, Marilyn, Raymond, Robert, Salvatore, William
1940 : Alton, Bobbie, Bobby, Dorothy, Frank, Helen, Jerry, Joanne, John, Robert
1950 : Ann, Donald, Donna, Elizabeth, John, Joseph, Ronald, Shirley, Susan, Thomas
1960 : Colleen, Darlene, Don, Glen, Johnny, Mark, Phyllis, Ronald, Steve, Steven
1970 : Christopher, Jeffrey, Kathy, Leonard, Lorene, Paula, Raymond, Rebecca, Sally, Sarah
1980 : Adam, Christee, Christina, Curt, Garrett, Jacob, Maurice, Melissa, Ruby, Todd
1990 : Brian, Candice, Damien, Gaspar, John, Marina, Nicholas, Sarah, Vicente, Walter
2000 : Annalycia, Chrishauna, Cristian, Daniel, Devon, Elizabeth, Maggie, Payton, Sebastian, William
2010 : Ella, Joshua, Kenlee, Lois, Lucian, Makayla, Monique, Noah, Nyasia, Pete

Note that this isn’t the ten most common names in each year. Some names appear twice in some years (Joseph in 1900, Mary in 1920). Some rare names appear (there were only 200 babies named Nyasia born in 2010), but that’s to be expected in a random sample of names. But if you read through this list, at least if you’re American, you see an evolution from “old-fashioned” names to “normal” names to “names people are giving to their kids that sound totally weird”.

So this might be possible. We can plot the frequency of the names occurring in the relevant passage against time:

knuffle_bunny_names =  c("Amy", "Meg", "Margot", "Jane", "Leela", "Rebecca", "Noah", "Robbie", "Toshi", "Casey", "Conny", "Parker", "Brian")
mins = babynames %>% group_by(year) %>% summarize(minprop = min(prop))
grid = expand.grid(year = unique(babynames$year), name = knuffle_bunny_names)
props = grid %>% left_join(babynames) %>% group_by(year, name) %>% summarize(prop = sum(prop)) %>%
  left_join(mins) %>% mutate(corrected_prop = ifelse(is.na(prop), minprop * 4/5, prop))
props %>% ggplot() + geom_line(aes(x=year, y=corrected_prop, group = name, color = name)) + 
  scale_y_log10('name frequency', breaks = c(10^((-6):(-2)), 3*(10^((-6):(-2))))) + 
  scale_x_continuous('birth year') +
  ggtitle('Frequency of names appearing in Knuffle Bunny Too')

Note the use of the log scale on the y-axis; without that you just learn that everyone was naming their kids Amy and Brian in the 1970s. Names that don’t occur in the data set for a given year are assumed to occur four times, which is the line along the bottom. The Social Security program wasn’t introduced until 1937, and didn’t originally cover all workers, so data coverage is sparse for births pre-1920 or so. But we already knew Trixie isn’t that old.

The probability that 13 randomly chosen kids born in year y have those particular thirteen names is just 13! times the product of the name frequencies:

props %>% group_by(year) %>% 
  summarize(n = n(), total_prob = factorial(13) * exp(sum(log(corrected_prop)))) %>% 
  ggplot() + geom_line(aes(x=year, y=total_prob)) + 
  scale_y_log10('probability of name set', breaks = 10^((-46):(-38))) +
  ggtitle('Probability that 13 randomly chosen newborns have\nthe names of the children from Knuffle Bunny Too')
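As a sanity check on that formula, here’s the same computation as a Python sketch, working in logs to avoid underflow. The frequencies below are made up for illustration, not the real babynames values:

```python
import math

# Hypothetical per-name frequencies for one year; NOT real babynames values.
freqs = [0.001, 0.0005, 0.002, 0.0001, 0.003, 0.0008, 0.004,
         0.0002, 0.001, 0.0015, 0.0001, 0.0007, 0.002]  # 13 names

# P(13 random newborns get exactly these 13 distinct names, one each)
# = 13! * product of the frequencies. Sum logs to avoid underflow.
log_prob = math.lgamma(14) + sum(math.log(f) for f in freqs)
print(math.exp(log_prob))
```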

So this set of names has the largest probability of occurring in 2000, followed by 1996, 1997, and 2003. The “right answer” is 2001, according to a 2016 New York Times profile of Mo Willems, at least if Trixie is meant to be Willems’ actual child. The book was published in 2007, and in it Trixie goes to pre-K (likely a class of ages 4 or 5); the previous book in the series was published in 2004 and in it Trixie couldn’t even “speak words” yet.

Sixth notes

Sticky Notes is an excellent podcast about classical music, by the conductor Joshua Weilerstein. A recent episode was on Bruckner’s fourth symphony, a piece I wasn’t familiar with. In Bruckner’s work, there appears the “Bruckner rhythm”, which is two quarter notes followed by three quarter note triplets. And in musical terminology, that’s entirely correct!

But if the terminology made mathematical sense, that would be a “sixth note”, which somehow I had never realized. A few people have pointed this out (here, here, here). It seems like this is most natural to programmers creating software that has to deal with music. For example – and I can’t believe I remember this – QBasic had a PLAY command that played music through the computer speaker, specified note by note. For the main theme of the first movement of Bruckner’s fourth symphony, something like

PLAY "O3 E-4 < B-4 A-6 G6 F6 E-4"

would have played the example

Main theme of first movement of Bruckner’s fourth symphony

(That’s a tenor clef; the fourth line from the bottom is middle C.) The - represents flat, the O3 represents the starting octave, and the < represents an octave change. (I cannot guarantee this code actually works, because it’s 2020.)

The PLAY command mishandles the other tricky bit of notation, which is “dotted” notes. For example, C4 is a quarter-note C. C4. is a dotted-quarter-note C, that is, one and a half times as long as a quarter note. But C4.. is one and a half times as long as that, or 2.25x the original quarter note. A double-dotted note in ordinary musical notation, though, is 1.75x as long. If I remember correctly, I thought it was 2.25x because I found the PLAY command before I ran across double dots in any actual music.
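The two dot conventions are easy to compare: in standard notation each successive dot adds half of the previous addition, so d dots multiply the base duration by 2 − 2^(−d), while PLAY apparently just multiplies by 1.5 per dot. A quick sketch (in Python, since I no longer have a QBasic interpreter handy):

```python
def notation_length(base, dots):
    # Standard notation: d dots multiply the base duration by 2 - 2**(-d)
    # (one dot -> 1.5x, two dots -> 1.75x, ...).
    return base * (2 - 2 ** -dots)

def play_length(base, dots):
    # QBasic PLAY's (apparent) rule: each dot multiplies the duration by 1.5.
    return base * 1.5 ** dots

quarter = 0.25  # a quarter note, as a fraction of a whole note
print(notation_length(quarter, 2) / quarter)  # 1.75
print(play_length(quarter, 2) / quarter)      # 2.25
```

The two rules agree on a single dot (both give 1.5x) and diverge from the second dot on.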

Also, in music, the ordinary fact “2 + 2 = 4” becomes “3 + 3 = 5” – that is, two thirds make a fifth. Every so often on Music Stack Exchange one sees people saying musical notation and nomenclature don’t make sense, which is true if you’re coming at things fresh, but is also an example of how Stack Exchange sites get weird as you go conceptually further away from StackOverflow.

Another baseball inequality

John Cook posted about the “baseball inequality”. In his formulation, if you have two lists of k positive numbers each, n_1, \ldots, n_k and d_1, \ldots, d_k, then

\min_{1 \le i \le k} {n_i \over d_i} \le {n_1 + n_2 + n_3 + \cdots + n_k \over d_1 + d_2 + d_3 + \cdots + d_k} \le \max_{1 \le i \le k} {n_i \over d_i}.

This has the interpretation that the batting average of a team is between the batting average of the best player and that of the worst player.
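A quick numerical check of the inequality, with made-up hit and at-bat counts (the numbers here are hypothetical):

```python
# Hypothetical hits (n) and at-bats (d) for a four-player team.
hits = [30, 45, 10, 60]
at_bats = [100, 160, 50, 180]

averages = [h / ab for h, ab in zip(hits, at_bats)]
team_average = sum(hits) / sum(at_bats)

# The team batting average is squeezed between the worst and best players'.
assert min(averages) <= team_average <= max(averages)
print(min(averages), round(team_average, 3), max(averages))
```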

This is not what I expected from the headline “baseball inequality” and the list of numbers. What I was expecting was the following. If the two lists are in numerical order, x_1 \le x_2 \le \cdots \le x_k and y_1 \le y_2 \le \cdots \le y_k, then

x_1 y_1 + \cdots + x_k y_k \ge x_{\sigma(1)} y_1 + \cdots + x_{\sigma(k)} y_k

for any permutation \sigma of 1, 2, …, k. This is actually what’s called the rearrangement inequality. Its use in baseball is in setting the batting order. If you want to arrange the batting order so your team gets the most hits, then you want the players with the best batting averages (the highest y) to be earlier in the batting order and therefore get the most at-bats (the highest x). (One-line idea of proof: if any of the x_i are out of order, you can increase the sum by putting them in order.)
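The rearrangement inequality itself is easy to confirm by brute force for a small lineup; here’s a Python sketch with hypothetical at-bat counts and batting averages:

```python
from itertools import permutations

at_bats = [3, 4, 4, 5]                   # at-bats by lineup slot, ascending
averages = [0.200, 0.250, 0.280, 0.310]  # batting averages, ascending

# Expected hits when the best hitters get the most at-bats:
sorted_pairing = sum(x * y for x, y in zip(at_bats, averages))

# No permutation of the batting order does better.
best = max(sum(x * y for x, y in zip(perm, averages))
           for perm in permutations(at_bats))
assert best == sorted_pairing
print(sorted_pairing)
```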

Reality is a bit more complicated, because:

  • first, the goal of baseball is of course to score runs, not just to get hits, so you want to count walks and extra-base hits as well; hence sorting players by OPS (on-base percentage plus slugging percentage) should be better than sorting by batting average. (OPS doesn’t make dimensional sense, because you’re adding two fractions with different denominators, but let’s ignore that.)
  • second, there are interactions between the players – in order to score runs you usually need to get multiple hits in close succession. Since batting orders are cyclic, you perhaps don’t want to have your worst hitter going immediately before your best hitter, and indeed some teams have tried batting the pitcher eighth. (I’m a National League fan; don’t talk to me about designated hitters.)

These are probably problems that are best solved by simulation, and I’ve got a day job.

Would the 2016 US presidential election result be different if the Electoral College were bigger?


A quick way to see this: Trump won 306 electoral votes, Clinton 232. (There were faithless electors, but we can ignore them for the purposes of this question.) Trump won 30 states, Clinton 21. For the purposes of this post, DC is a state, which it probably also should be in reality.

Each state has as many electoral votes as it has senators and representatives combined. (DC gets 3.) Imagine breaking up the electoral votes in each state into two classes: the “senatorial” electoral votes (two per state), and the “representational” electoral votes (the other ones). Then Trump won the “senatorial” electoral votes by 60 to 42, leaving a 246-190 margin among the “representational” votes.

In 2000, on the other hand, Bush won 271 electoral votes in 30 states, and Gore 267 in 21 states; the “representational” votes were therefore split as 211 for Bush, 225 for Gore. If there were, say, twice as many Representatives (870 instead of 435 – and let’s keep the math simple and say DC gets 6 electoral votes in this scenario), and every state had twice as many “representational” electoral votes, then Bush would have had about (60 + 211 × 2) = 482 EV, and Gore (42 + 225 × 2) = 490 EV. Also in this world Nate Silver’s web site is named nineseventytwo.com (which is available, as of this writing). In fact, Bush would have won with any number of representatives less than 491; Gore with more than 655; and in between, the lead swings back and forth, according to a 2003 analysis by Michael G. Neubauer and Joel Zeitlin.

In short: 2000 was the way it was because small states are overrepresented in the Electoral College; 2016 was the way it was because the Electoral College is winner-take-all.

Balancing the centrifuge – a Riddler puzzle

From The Riddler:

Quoc’s lab has a microcentrifuge, a piece of equipment that can separate components of a liquid by spinning around very rapidly. Liquid samples are pipetted into small tubes, which are then placed in one of the microcentrifuge’s 12 slots evenly spaced in a circle.

For the microcentrifuge to work properly, each tube must hold the same amount of liquid. Also, importantly, the center of mass of the samples must be at the very center of the circle — otherwise, the microcentrifuge will not be balanced and may break.

Quoc notices that there is no way to place exactly one tube in the microcentrifuge so that it will be balanced, but he can place two tubes (e.g., in slots 1 and 7).

Now Quoc needs to spin exactly seven samples. In which slots (numbered 1 through 12, as in the diagram above) should he place them so that the centrifuge will be balanced? Extra credit: Assuming the 12 slots are distinct, how many different balanced arrangements of seven samples are there?


I’ve seen this one before, but I’ll take it on as a programming challenge.

I’ll say that hole k is at coordinates (\sin 2\pi k/12, \cos 2\pi k/12). This is nonstandard, but the result looks like a clock, so it’s easier to visualize:

The centrifuge layout.

Now, there are {12 \choose 7} = 792 ways that we could pick 7 out of the 12 holes. To generate a list of these I can use the following R function:

combinations = function(n, k){
  if (k == 0 & n == 0){return(list())} else {
    if (k == 0){return(list(c()))} else {
      if (n == k){return(list(1:n))} else {
        without_n = combinations(n-1, k)
        with_n = lapply(combinations(n-1, k-1), function(x){c(x, n)})
        return(c(without_n, with_n))
      }
    }
  }
}

To generate the subsets of [n] of size k, I just take:

  • the subsets that don’t contain n (which are therefore subsets of [n-1] of size k), and
  • the subsets that do contain n (which are therefore subsets of [n-1] of size k-1, with n adjoined)
    (I learned this a good two decades ago, from section 3.6 of Herb Wilf’s notes East Side, West Side.)

Then iterate over the subsets to find their centers of mass:

n = 12
k = 7
x = sin(2*pi*(1:n)/n)  # hole coordinates, clock-face layout
y = cos(2*pi*(1:n)/n)

our_comb = combinations(n, k)
com_x = rep(NA, length(our_comb))
com_y = rep(NA, length(our_comb))
for (i in 1:length(our_comb)){
  com_x[i] = mean(x[our_comb[[i]]])
  com_y[i] = mean(y[our_comb[[i]]])
}
So our_comb runs through the combinations, and com_x and com_y are the x- and y-coordinates of the centers of mass. Plotting the centers of mass gives an attractive symmetric pattern. Here I’ve plotted transparent points, so the darker points are the ones that arise from more distinct combinations. The outermost points represent the most imbalanced centrifuges – for example, the one closest to the top represents loading slots 9, 10, 11, 12, 1, 2, and 3. The main question is: how many points are on top of each other at the very center?

Centers of mass of all subsets of 7 of the 12 centrifuge positions.

I can extract those subsets from the list our_comb and build them into a matrix for looking at. There’s a small problem in that we used floating-point arithmetic, so the coordinates don’t come out exactly at zero, but it’s enough to take the ones that are “close enough”:

epsilon = 10^(-6)
balanced_indices = which(abs(com_x) < epsilon & abs(com_y) < epsilon)
balanced_combs =  t(matrix(unlist(our_comb[balanced_indices]), nrow = k))

and the matrix balanced_combs, which has one row for each combination that balances the centrifuge, is

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    2    3    6    7    9   10
 [2,]    1    2    4    5    8    9   10
 [3,]    1    2    5    6    7   10   11
 [4,]    2    3    4    7    8   10   11
 [5,]    2    3    5    6    9   10   11
 [6,]    1    2    5    6    8    9   12
 [7,]    1    3    4    7    8    9   12
 [8,]    1    4    5    6    9   10   12
 [9,]    1    4    5    7    8   11   12
[10,]    2    3    6    7    8   11   12
[11,]    3    4    5    8    9   11   12
[12,]    3    4    6    7   10   11   12

(The order here is lexicographic order, but you have to read the rows backwards.)
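That matrix is easy to cross-check; here’s the same count redone from scratch in Python with itertools, using the same clock-face coordinates and the same “close enough to zero” tolerance:

```python
from itertools import combinations
from math import sin, cos, pi, hypot

n, k = 12, 7
# Slot j sits at (sin(2*pi*j/n), cos(2*pi*j/n)), laid out like a clock face.
pos = {j: (sin(2 * pi * j / n), cos(2 * pi * j / n)) for j in range(1, n + 1)}

# Keep the 7-subsets whose vector sum is (numerically) zero.
balanced = [c for c in combinations(range(1, n + 1), k)
            if hypot(sum(pos[j][0] for j in c),
                     sum(pos[j][1] for j in c)) < 1e-6]
print(len(balanced))  # 12
```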

If you can make sense of this pile of numbers without making some pictures, good for you! But I am human, though a mathematician, so let’s plot. Here big black dots represent the loaded spots, and small blue dots represent the non-loaded spots.

(And yes, it’s in base R graphics. It’s been a while since I’ve used those – at my day job it’s all ggplot all the time.)

par(mfrow = c(4, 3), mar = c(1, 1, 1, 1))
for (i in 1:nrow(balanced_combs)){
  full = balanced_combs[i,]
  empty = setdiff(1:n, balanced_combs[i,])
  plot(x[full], y[full], pch = 19, cex = 2,
       asp = 1, axes = FALSE, xlab = '', ylab = '', 
       xlim = c(-1.25, 1.25), ylim = c(-1.25, 1.25))
  points(x[empty], y[empty], pch = 19, cex = 1, col = 'lightblue')
}

So there are 12 ways to load the centrifuge… but we can clearly see that all of them are really just rotations of a single pattern. To me the visually most convenient representative of that pattern is the third one in the third row, which has loadings at 12, 1, 4, 5, 7, 8, and 11.

We can play a little game of connect-the-dots to see why this pattern is balanced. Connect 12, 4, and 8 to form an equilateral triangle. Connect 1 and 7; connect 5 and 11. Like this:

12-hole centrifuge loaded with 7; symmetric subsets specified

Each of those three subsets has its center of mass at the center of the circle, so the whole arrangement does too.
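Verifying that decomposition numerically takes only a few lines (Python again, same clock-face coordinates as before):

```python
from math import sin, cos, pi

def center_of_mass(slots, n=12):
    # Slot j sits at (sin(2*pi*j/n), cos(2*pi*j/n)) on the clock face.
    xs = [sin(2 * pi * j / n) for j in slots]
    ys = [cos(2 * pi * j / n) for j in slots]
    return sum(xs) / len(slots), sum(ys) / len(slots)

# The equilateral triangle, the two diameters, and their union all balance.
for group in [(12, 4, 8), (1, 7), (5, 11), (12, 1, 4, 5, 7, 8, 11)]:
    cx, cy = center_of_mass(group)
    assert abs(cx) < 1e-9 and abs(cy) < 1e-9
print("all balanced")
```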

That decomposition turns out to be the key to the general problem of which centrifuge sizes can be loaded with which number of tubes and remain balanced. Stay tuned.

Rainbow cats considered harmful

Partner: you’re stressed! fill out this!

Me: I can’t do that, because rainbow color maps are considered harmful. And if we’re going to use one anyway, shouldn’t it be green for “go” and red for “stop”?

Partner: just shut up and color.

(The original version of this meme didn’t have the colors, so you can make whatever palette you want, as in this example.)


Twenty-nine, in straight lines

Evelyn Lamb has a delightful page-a-day calendar. Today (yes, a few days late) I learned that TWENTY NINE is the only number whose English name is written with a number of straight-line strokes equal to its value. (This is in a sans serif font; in particular, I is one stroke, not three.)

English is surprisingly rich in numbers whose names use only straight-line strokes. In the Latin alphabet the letters made entirely of straight lines are A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z. That leads to the following straight-line words that appear in number names, with the number of straight lines that make them up:

– having more strokes than their value: FIVE (10), NINE (11), ELEVEN (19), TWELVE (18), FIFTEEN (20), NINETEEN (24)
– having fewer strokes than their value: TEN (9), TWENTY (18), FIFTY (12), NINETY (16)

Any number that equals its own stroke count must combine elements from both these lists, and it’s not hard to see that TWENTY NINE is the only one that works. The full list of straight-line numbers in English is: 5, 9, 10, 11, 12, 15, 19, 20, 25, 29, 50, 55, 59, 90, 95, 99. (All the larger numbers include “HUNDRED” or “THOUSAND” or something ending in “-ION”, so we can stop there.)
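The stroke arithmetic is easy to script. Here’s a Python sketch; the per-letter stroke counts are my own tallies for a sans-serif font, so treat them as assumptions:

```python
# Straight-line stroke counts per letter, sans serif (my own tallies).
STROKES = {'A': 3, 'E': 4, 'F': 3, 'H': 3, 'I': 1, 'K': 3, 'L': 2,
           'M': 4, 'N': 3, 'T': 2, 'V': 2, 'W': 4, 'X': 2, 'Y': 3, 'Z': 3}

def stroke_count(word):
    # Total straight-line strokes, or None if any letter needs a curve.
    letters = word.replace(' ', '')
    if not all(c in STROKES for c in letters):
        return None
    return sum(STROKES[c] for c in letters)

print(stroke_count('TWENTY NINE'))  # 29 -- equal to its value
print(stroke_count('TEN'))          # 9
print(stroke_count('FORTY'))        # None: O and R need curves
```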

Evelyn suggests this as part of how to memorize the largest known prime (it’s a Mersenne prime, and she suggests doing it in binary so every bit is 1, so the hard part is remembering where you are).

It’s hard to even find straight-line numbers in other languages, because a lot of the alphabet is missing.  They include:

  • German ZWEI (2), ZEHN (10), ELF (11)
  • Dutch EEN (1), TWEE (2), ZEVEN (7), TIEN (10), ELF (11) (edited 9/3: also TWAALF (12))
  • Norwegian has a lot: EN (1), FEM (5), ATTE (8), NI (9), TI (10), ELLEVE (11), FEMTEN (15), ATTEN (18), NITTEN (19), FEMTI (50), ATTI (80), NITTI (90) (and also 55 = FEMTIFEM, 58, 59, 85, 88, 89, 95, 98, 99)
  • As does Danish: EN, FEM, NI, TI, ELLEVE, FEMTEN, ATTEN, NITTEN, with the same meanings as in Norwegian, but then the bigger numbers are formed irregularly.
  • Swedish has fewer: EN (1), FEM (5), ATTA (8), ELVA (11) – the numbers are very similar to Norwegian, but the “-teen” ending is “-ton”, not “-ten” as in Norwegian.
  • Spanish VEINTE (20), MIL (1000), and MIL VEINTE (1020)
  • Italian VENTI (20), MILLE (1000), and MILLEVENTI (1020)
  • Portuguese VINTE (20), MIL (1000) and MIL E VINTE (1020)
  • French MILLE (1000), but not VINGT (20)

ELVA (Swedish for 11) is the only other one I could find that also has the self-referential property – along with the Chinese numerals 一, 二, and 三 (one, two, and three strokes, respectively), if you want to stray from the alphabetic world. (Edited 9/3: also Dutch TIEN = 10, which I inexplicably missed before.)

(This post was edited to add the list of numbers and to clarify that ELVA is not the only straight-line number outside of English, but the only one I could find with this self-referential property.)

The second derivative I compute every day

Every evening I compute an approximate second derivative. It comes from the data at the COVID Tracking Project. Specifically, it’s

(P(t) - P(t-7)) - (P(t-7) - P(t-14))

where P(t) is the cumulative number of positive coronavirus cases in the US through day t. So this is the number of new cases this week, minus the number of new cases last week. Every day I hope it’s negative. Lately it has been. Yesterday it was

(5884053 – 5595343) – (5595343 – 5282758) = 288710 – 312585 = -23875.

Who would have thought 288 thousand cases in a week would seem good? But derivatives matter. That would have been terrifying on June 29 (282k cases from June 22 – 29; 194k cases from June 15 – 22). But now it feels like a reason for maybe a little bit of hope.
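For what it’s worth, the daily ritual is one line of code (a Python sketch; the three numbers are the cumulative counts quoted above):

```python
def weekly_second_difference(p_now, p_week_ago, p_two_weeks_ago):
    # (new cases this week) - (new cases last week)
    return (p_now - p_week_ago) - (p_week_ago - p_two_weeks_ago)

# Cumulative US positive counts from the COVID Tracking Project, as above.
print(weekly_second_difference(5884053, 5595343, 5282758))  # -23875
```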

The most hopeless plots, I think, are the cumulative ones. The number of positive tests never goes down. Early on in all this we all wanted to “flatten the curve”… some people seemed to be referring to this curve, which could never be flat.


What we want to know is how many new cases there are. How much should we worry right now? A lot, it seems. But less than before?


And there is something heartening about seeing a negative number. If we plot the approximate second derivative, we can get negative numbers!


Here we can see, broadly, four phases: the growth of March and early April; the slow decline that lasted through mid-June; the “second wave” of late June and July; the decline of August.

But what’s not visible here is the terror of the fast growth of March, because that was growth from a low base. If we plot the growth rate

((P(t) - P(t-7)) - (P(t-7) - P(t-14))) / (P(t-7) - P(t-14))

then the original terror is clear.


This was the period of doubling times, when cases were doubling multiple times a week. I’ve suppressed the growth rates over 100 percent per week – for example there were 44867 new cases in the US between March 16 and 23, compared to 5744 between March 9 and March 16, nearly three doublings in a week. Those days were, quite literally, off the charts.