2021

2021 x 1202 = 2429242, a palindrome.

That is, when you take 2021 and multiply it by its digit-reversal, you get a palindrome.

This is rare – you (if you are young) will see it again in 2101, 2102, 2111, 2201, and then not until five-digit years. It follows from the digits in 2021 being small – according to the On-Line Encyclopedia of Integer Sequences, this is a property of integers not ending in 0 with sum of squares of digits < 10.

In general we can view an integer as a polynomial – in the case of 2021, 2x^3 + 2x + 1 – evaluated at x = 10. Call this polynomial f(x). Then its coefficient-reversal is x^{deg(f(x))} f(1/x), where deg(f(x)) is the degree of the polynomial f(x). For example, if f(x) = 2x^3 + 2x + 1 then we get the reversal x^3 (2/x^3 + 2/x + 1) = 2 + 2x^2 + x^3. Then we can show that g(x) = f(x) f(1/x) x^{deg(f(x))} is its own coefficient-reversal. It has degree deg g(x) = 2 deg f(x). Upon substituting 1/x for x and multiplying by x^{2 deg f(x)} we get

f(1/x) f(x) (1/x)^{deg f(x)} x^{2 deg f(x)} = f(x) f(1/x) x^{deg f(x)}

which is g(x) itself.

Now if the coefficients of g(x) are all less than 10, we can interpret this as a fact about integers. The middle coefficient of g(x) is just the sum of the squares of the coefficients of f(x) – for example,

(2x^3 + 2x + 1) (x^3 + 2x^2 + 2) = 2x^6 + 4x^5 + 2x^4 + 9x^3 + 2x^2 + 4x + 2

with middle coefficient 2^2 + 2^2 + 1^2 = 9.
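The polynomial picture can be checked by multiplying the digit sequences directly. The sketch below is in R; the helper names (`digits`, `reverse_number`, `reversal_product_coeffs`, `is_palindrome`) are made up for this illustration and do not come from any package. It treats the digits as coefficients, convolves them with their reversal, and then lists the four-digit years after 2021 that meet the digit criterion quoted above.

```r
# Digits of a non-negative integer, most significant first.
digits <- function(n) as.integer(strsplit(sprintf("%.0f", as.numeric(n)), "")[[1]])

# The integer obtained by writing the digits of n backwards.
reverse_number <- function(n) as.numeric(paste(rev(digits(n)), collapse = ""))

# Coefficients of f(x) times its coefficient-reversal, highest degree first.
reversal_product_coeffs <- function(n) {
  d <- digits(n)
  r <- rev(d)
  out <- integer(2 * length(d) - 1)
  for (i in seq_along(d)) {
    for (j in seq_along(r)) {
      out[i + j - 1] <- out[i + j - 1] + d[i] * r[j]
    }
  }
  out
}

is_palindrome <- function(n) {
  d <- digits(n)
  all(d == rev(d))
}

coeffs <- reversal_product_coeffs(2021)
print(coeffs)                        # coefficients of g(x)
print(sum(digits(2021)^2))           # the middle coefficient
print(2021 * reverse_number(2021))   # g(10); no carrying since every coefficient is below 10

# Four-digit years after 2021 that meet the digit criterion.
years <- 2022:2999
ok <- sapply(years, function(y) y %% 10 != 0 && sum(digits(y)^2) < 10)
print(years[ok])
print(sapply(years[ok], function(y) is_palindrome(y * reverse_number(y))))
```

The last line only confirms that the years picked out by the criterion do give palindromes; it does not search for years that might work for some other reason.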

For the proof that the sum of the squares is the largest coefficient, wave your hands and say “Cauchy-Schwarz”, then look at Proposition 10 of On Polynomial Pairs of Integers by Martianus Frederic Ezerman, Bertrand Meyer, and Patrick Sole.

Some other interesting properties of the number 2021: it’s a product of two consecutive primes and a value of Euler’s prime-generating polynomial. These don’t contradict each other – the polynomial n^2 + n + 41 is prime when evaluated at 0, 1, 2, …, 39, and 2021 = 44^2 + 44 + 41.
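Both properties are easy to confirm by machine. This is a small R sketch; `is_prime` and `euler` are throwaway helpers written for the illustration, and trial division is only sensible because the numbers are tiny.

```r
# Trial-division primality test; fine for numbers this small.
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Euler's polynomial n^2 + n + 41.
euler <- function(n) n^2 + n + 41

print(all(sapply(euler(0:39), is_prime)))  # is every value for n = 0, ..., 39 prime?
print(euler(44))                           # value at n = 44
print(43 * 47)                             # the product of the two primes
print(Filter(is_prime, 43:47))             # primes from 43 to 47, to check they are consecutive
```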

Applied circle packing

Twelve circular muffins fit nicely on a circular plate.

Twelve muffins on a plate.

Yes, I know they’re not quite uniform in size. What do you want? My sous-chef is two years old. Also she was not helping but rather running around under the dining room table.

Anyway, this apparently is not the optimal packing – that is, the one that maximizes the ratio (muffin radius)/(plate radius), although it is a piece of the optimal packing in the infinite plane. You could fit slightly larger muffins if you packed them like this:

Optimal packing of twelve circles in a circle

Image from the Wikipedia article Circle packing in a circle. The proof is due to Ferenc Fodor, The Densest Packing of 12 Congruent Circles in a Circle, Beiträge zur Algebra und Geometrie, Contributions to Algebra and Geometry 41 (2000) ?, 401–409. The radius of the plate is 4.029… times the radius of the muffin. (This is {2 \over \sqrt{3} x_0} + 1 where x_0 is the smallest positive root of 9x^5 - 15x^4 + 7x^3 - 3x + 1.)

As it turns out, the ratio for the packing I discovered isn’t all that far off from this constant. Let the radius of the muffin be 1, and draw triangles as below.

The center of the plate is at the center of the packing, which is the center of the red equilateral triangle. If this triangle has side 2, then the distance from its center to any of its vertices is 2/\sqrt{3}. This is the length of the shortest side of the blue triangle.

The blue triangle therefore has sides of length 2/\sqrt{3} and 2, with an angle of 150 degrees between them. The long side of the blue triangle, by the law of cosines, is given by

\sqrt{(2/\sqrt{3})^2 + 2^2 - 2 (2/\sqrt{3}) (2) \cos 5\pi/6} = \sqrt{4/3 + 4 - (8/\sqrt{3}) (-\sqrt{3}/2)} = \sqrt{4/3 + 4 + 4} = \sqrt{28/3}.

The distance from the center to the edge of the plate is then \sqrt{28/3} + 1, the length of the long side of the blue triangle plus the green line segment which is a single muffin radius. If you’re working this out in your head while watching the aforementioned sous-chef run around at the park, though, you wonder about the numerical value of this constant and think maybe you shouldn’t pull out your phone-calculator. Fortunately it’s easy to work out approximately: \sqrt{28/3} = 3 \sqrt{1 + 1/27}, and remembering \sqrt{1+x} \approx 1 + x/2 for small x this is very close to 3 (1 + 1/54) = 3 + 1/18 \approx 3.055. So the radius of the (idealized) plate is about 4.055 times the radius of the (idealized) muffin, not all that far off from the 4.029… due to Fodor.
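If you do pull out a computer afterwards, both constants can be checked numerically. The R sketch below uses base R's `polyroot` on the quintic quoted above; the variable names are mine, and the tolerance used to decide that a root is real is an arbitrary choice.

```r
# Coefficients of 9x^5 - 15x^4 + 7x^3 - 3x + 1, lowest degree first,
# which is the order polyroot() expects.
roots <- polyroot(c(1, -3, 0, 7, -15, 9))

# Keep the (numerically) real, positive roots and take the smallest.
real_pos <- Re(roots)[abs(Im(roots)) < 1e-8 & Re(roots) > 0]
x0 <- min(real_pos)

fodor_ratio <- 2 / (sqrt(3) * x0) + 1   # plate radius / muffin radius, optimal packing
hex_ratio   <- sqrt(28 / 3) + 1         # the arrangement worked out with the triangles
approx      <- 3 + 1 / 18 + 1           # the mental-arithmetic estimate

print(c(x0 = x0, fodor = fodor_ratio, hex = hex_ratio, approx = approx))
```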

44 candles

Hanukkah candles can be bought in sets of 44. My older daughter came home from preschool with a box, which is how I came to know this.

That number surprised me at first – but on each of the eight nights of Hanukkah you light one candle (the shamash) and then use it to light the number of candles corresponding to the night. So you need 2 + 3 + … + 9 candles; summing the series gives 11 x 8 / 2 = 44.
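The same count in a couple of lines of R (the variable name `per_night` is just for illustration):

```r
# Candles lit on night k: k regular candles plus the shamash.
per_night <- (1:8) + 1
print(per_night)        # 2 3 4 5 6 7 8 9
print(sum(per_night))   # 44
```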

It does seem like they should give you a few extras in the box, though, in case something goes wrong.

Difference of cubes

My partner and I tried to have “When I’m Sixty-Four” played at our wedding, but we didn’t because I couldn’t find the sheet music.

It’s my birthday. When I’m sixty-four our second child, who will arrive in a few days, will also have an age which is a cube.

We’ll never be prime at the same time, though.
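The claim can be checked by brute force. The R sketch below assumes an age gap of 37 years (64 minus 27), which is an inference from the cubes and not something stated above; `is_prime` is a throwaway helper.

```r
# TRUE when n is prime: no divisor between 2 and floor(sqrt(n)).
is_prime <- function(n) n >= 2 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)

gap <- 37                      # assumed age gap (64 - 27)
child_ages <- 0:120
both_prime <- sapply(child_ages, function(a) is_prime(a) && is_prime(a + gap))

print(child_ages[both_prime])  # child ages at which both ages are prime
print(any(both_prime))
```

The reason is parity: with an odd gap one of the two ages is always even, so the only hope is a child aged 2, and 2 + 37 = 39 = 3 x 13.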

Another odd crossword clue

Actually posted on Tuesday, November 3. Not election news!

Thursday, October 29 New York Times crossword, by Kurt Weller. 36 Across: “Like all prime numbers except one”, three letters.

Answer: same as yesterday’s 34 Across no.

Of course 1 isn’t prime! But I think in mathematical discourse if you spell out the number you’re not referring to it directly. (This week’s posts notwithstanding, mathematical content doesn’t come up enough in crosswords to be sure what the conventions are there.) There is one prime number that is not odd, namely two.

Hopefully this post sees the light of day on Thursday; my cell service is spotty due to Zeta. Not the function having to do with primes, the storm. (I did say there would be a lot of storms.)

Two-thirds of all Fibonacci numbers are…

Today’s New York Times crossword puzzle (October 28, 2020), by Peter Gordon, 34 across. Three letters, “like two-thirds of all Fibonacci numbers”.

The answer is this sequence.

To get the Fibonacci numbers you start with 1 and 1, and then each number is the sum of the two before it. I’ve put the even numbers in brackets.

1, 1, [2], 3, 5, [8], 13, 21, [34], 55, 89, [144], …

So it’s not just some random smattering of these numbers that happens to be even, but every third one. To see this, let’s build an addition table for odd and even numbers:

+    | even | odd
even | even | odd
odd  | odd  | even
Addition table for even and odd numbers

Then if you start with two odd numbers, just following this gets you

odd, odd, even, odd, odd, even, …

and this will repeat itself forever. (If you started with “odd, even” or “even, odd” you’d get the same pattern, but shifted; if you started with “even, even” then the sequence would stay even forever.)
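The parity pattern is easy to see on the first few terms. This is a small R sketch; the choice of 30 terms is arbitrary.

```r
n <- 30
fib <- numeric(n)
fib[1:2] <- 1
for (k in 3:n) fib[k] <- fib[k - 1] + fib[k - 2]

parity <- ifelse(fib %% 2 == 0, "even", "odd")
print(head(parity, 12))
print(which(fib %% 2 == 0))   # positions of the even terms
print(mean(fib %% 2 == 1))    # fraction of odd terms among the first n
```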

(The Twitter hashtag #NYTXW mostly missed this, preferring to focus on, and complain about, the fact that the puzzle was built around a too-long quote from Sex and the City.)

Note that 34 (the position of this clue in the puzzle) is a Fibonacci number. I’d like to think this was intentional.

Which party will be president if there’s a 50-50 Senate?

FiveThirtyEight currently gives Joe Biden an 88% chance of winning the US presidential election, and Donald Trump 12%.

The Economist gives Biden 92, Trump 7.

Both have Biden ahead by 8.4 percent in the popular vote. FiveThirtyEight has 53.6 to 45.2 (with 1.2 percent to third parties), while the Economist has 54.2 to 45.8 (with no third parties – I presume they’re measuring share of the two-party vote). My assumption is that the difference in these odds is due to FiveThirtyEight’s model putting a larger correlation between states than the Economist’s, and therefore giving a wider distribution around that center point.

Both sites also have a model of the Senate election. FiveThirtyEight expects the Democrats to have 51.5 seats after the election, with a 74% chance of control; the Economist expects 52.5 seats for the Democrats, with a 76% chance of control. Recall that if the Senate is tied, 50-50, then the Vice President (Kamala Harris for the Democrats, or Mike Pence for the Republicans) breaks the tie; that is, Senate control belongs to the party holding the White House. So what do the models say about that tie?

FiveThirtyEight presents the diagram below, where the 50/50 bar is split between the parties:

If you hover over the red part of the 50-50 bar you get “1.7% chance” (of a 50-50 Senate and Republican president); if you hover over the blue part you get “11.2% chance” (of a 50-50 Senate and Democratic president). That is, conditional on a 50-50 Senate, FiveThirtyEight gives a probability of 0.112/0.129, or about 87%, of a Democratic president. (This is different from the 88% figure above.)
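Spelled out as a calculation in R, with the two hover values as inputs (the variable names are mine):

```r
p_tie_and_rep <- 0.017   # 50-50 Senate and Republican president
p_tie_and_dem <- 0.112   # 50-50 Senate and Democratic president

# P(Democratic president | 50-50 Senate) = P(both) / P(50-50 Senate)
p_tie <- p_tie_and_rep + p_tie_and_dem
p_dem_given_tie <- p_tie_and_dem / p_tie

print(p_tie)
print(round(p_dem_given_tie, 3))
```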

The Economist, on the other hand, explicitly says that conditional on a 50-50 Senate, there’s an 18% chance of a Democratic presidency:

Which one of these probabilities is more realistic? Where do they come from?

The Economist writes, on their methodology page:

In presidential-election years, the model also simulates which party will control the vice-presidency, which casts the tiebreaking vote in case of a 50-50 split, based on the simulated national popular vote for the House.

FiveThirtyEight’s Senate forecast methodology page doesn’t seem to make a statement about this; they mention that the 2020 Senate model is “mostly unchanged since 2018”, and of course there was no presidential election in 2018.

My instinct is that 50-50 is a bad night for Democrats. The Democrats start with 35 seats not up for re-election. Both sites agree on the 15 most Democratic-leaning Senate seats that are up for election. So let’s say that Democrats win those 15 states and no others, for a 50-50 Senate. For the sake of argument, assume that every state that has a Senate seat up for grabs chooses the same party for the Senate and the presidency. So let’s fill in an Electoral College map with those 15 states in blue, and the nineteen other states with Senate seats at stake in red, to get the map below. (15 + 19 = 34, you say? Well, Georgia has two Senate seats at stake.)


Click the map to create your own at 270toWin.com

Next let’s fill in those states that don’t have a Senate election, but are safe for one party or the other. For the Democrats, California, Washington, Hawaii, New York, Maryland, Vermont, Connecticut (and DC). For the Republicans, Utah, North Dakota, Missouri, and Indiana. (I’m old enough to remember when Missouri was a swing state.) Here’s the map you get.


Click the map to create your own at 270toWin.com

So in a world where the Senate is 50-50 in what is probably the most likely way, it looks like the Democrats are right on the cusp of winning the presidency – FiveThirtyEight is probably right, after all, to color the 50-50 bar mostly blue. I just hope we don’t get that 269-269 map, partly because it’ll be exhausting and partly because then I should have written a post on how a tied Electoral College gets thrown to the House instead of writing this one.

Zip codes divide the country

From r/dataisbeautiful by alekazam13, a chart of how many people live in regions where the zip codes start with each digit. The distribution is surprisingly uneven.

To give some context: zip codes in the US (what other countries would call “postal codes”) have five digits. The first digit corresponds to one of ten regions of the US (these regions don’t exist anywhere outside of the zip code system), the next two to a postal service sorting center, and the last two to an individual post office. There are something like 40,000 zip codes, and of course 100,000 possible five-digit numbers. Here’s a map of the regions, public domain from Wikipedia:

Of course, ZIP codes were invented some time ago; they were introduced in 1963! So what if we use 1960 census figures? There was some imbalance, but perhaps less than there is today.

first digit                 |    9 |    3 |    7 |    1 |    2 |    4 |    8 |    6 |    0 |    5
population (millions), 2019 | 53.5 | 46.8 | 40.6 | 33.2 | 32.9 | 32.7 | 23.8 | 23.7 | 23.7 | 17.3
population (millions), 1960 | 21.2 | 17.9 | 17.0 | 28.5 | 16.6 | 25.2 |  6.2 | 18.0 | 16.6 | 12.1
Distribution of population by first digit of zip code, 2019 and 1960

The population of the US right now is about 332 million; so in an ideal world you’d have 33.2 million in each bucket. If you want to divide the US into ten regions, all made up of adjacent states (we’ll make some exceptions for Alaska, Hawaii, and Puerto Rico), and it’s 1963, you run into a problem pretty quickly. The population of the US at that time was 179 million. New England (that’s the six states in yellow to the far northeast; note that New York and New Jersey are not part of New England) had 10.4 million people, not enough to plausibly call it a tenth of the country. New York (labeled “10-14” above) had 16.8 million, nearly a tenth all on its own. The only sensible decision was to skip over New York and add New Jersey. (Why New York didn’t get its own first digit but has to share with Pennsylvania, I don’t know.) The “5” and especially “8” regions had very low populations back then; these were quite rural parts of the country and I assume had more post offices per capita. But since then we figured out how to get people to live in the desert.
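To put rough numbers on the imbalance, here are the figures from the table in R, with two crude measures of unevenness. The column names are mine, and both measures (the largest region over the smallest, and the standard deviation over the mean) are just one way of looking at it and need not agree with each other.

```r
zip <- data.frame(
  first_digit = c(9, 3, 7, 1, 2, 4, 8, 6, 0, 5),
  pop_2019 = c(53.5, 46.8, 40.6, 33.2, 32.9, 32.7, 23.8, 23.7, 23.7, 17.3),
  pop_1960 = c(21.2, 17.9, 17.0, 28.5, 16.6, 25.2, 6.2, 18.0, 16.6, 12.1)
)
pops <- zip[, c("pop_2019", "pop_1960")]

# Total population (millions) covered by the table in each year.
totals <- colSums(pops)
print(totals)

# Ratio of the most to the least populous region in each year.
imbalance <- sapply(pops, function(p) max(p) / min(p))
print(round(imbalance, 2))

# Spread relative to an even split (coefficient of variation).
cv <- sapply(pops, function(p) sd(p) / mean(p))
print(round(cv, 2))
```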

I suspect if the system were invented today, then, the regions would look a bit different. In particular:

  • California would get its own region (at 39 million, it’s fully 12% of the US population)
  • Texas (29 million) would get a region nearly to itself, sharing with Oklahoma (4 million)
  • Georgia plus Florida (11 + 21) would be a region – these are two states that have grown quite a bit since 1960.
  • The six New England states (15) plus New York (19) would be a region

As is usual with these sorts of things, you nibble around the edges and then there end up being lots of ways to divide the middle of the country, none of which are any good. (I tried.) The rough design criteria seem to be:

  • divide the country into ten sets of states of roughly equal population;
  • such that each region is contiguous (probably Alaska and Hawaii should be in the same regions as Washington and California, respectively, and Puerto Rico with somewhere on the East Coast);
  • and such that the regions don’t “look funny”, but what does that even mean?

In other words, the criteria for forming congressional districts or similar. (Without gerrymandering.)

Ultimately, with computerized sorting, having a system where zip codes are “interpretable” doesn’t really matter. And the bureaucracy of the Postal Service came up with a different solution than the bureaucracy of Bell Telephone, which was inventing area codes at not all that different a time. Area codes seem random, although with the design principle that more populous areas get codes with smaller sums of digits, which took less time to dial. My understanding is that similar area codes were deliberately put far apart geographically, in order to reduce confusion. I’ve never actually seen that written down, though.

Figuring out when a book was written from the names in it

My daughter likes the book Knuffle Bunny Too: A Case of Mistaken Identity, by Mo Willems. Maybe I like it more than she does; she’s old enough to pick out her own books now, and doesn’t pick this one. One day Trixie brings her bunny to school, and it turns out that another child has the same bunny and they become friends! (The children, not the bunnies.)

I’ve read this book enough times that my mind can wander while I read it. There’s a list of names embedded in it, of the other kids that Trixie wants to show the bunny to: Amy, Meg, Margot, Jane, Leela, Rebecca, Noah, Robbie, Toshi, Casey, Conny, Parker, Brian.

So… from this list of names, can we figure out when Trixie was born?

The R package babynames is really useful for this kind of question. This is a wrapper around the Social Security Administration’s baby names data, which gives the number of births of people with each name in the US, each year, for names that were given to at least five babies. It goes from the most common baby names of 1880:

> head(babynames)
# A tibble: 6 x 5
   year sex name          n       prop
1  1880 F   Mary       7065 0.07238359
2  1880 F   Anna       2604 0.02667896
3  1880 F   Emma       2003 0.02052149
4  1880 F   Elizabeth  1939 0.01986579
5  1880 F   Minnie     1746 0.01788843
6  1880 F   Margaret   1578 0.01616720

to the least common male baby names of 2017:

> tail(babynames)
# A tibble: 6 x 5
   year sex name       n     prop
1  2017 M   Zyhier     5 2.55e-06
2  2017 M   Zykai      5 2.55e-06
3  2017 M   Zykeem     5 2.55e-06
4  2017 M   Zylin      5 2.55e-06
5  2017 M   Zylis      5 2.55e-06
6  2017 M   Zyrie      5 2.55e-06

Here’s code to generate a list of “typical” names from each year ending in 0:

library(babynames)
library(dplyr)

set.seed(1)
for (y in seq(1880, 2010, by = 10)) {
  # draw ten names from year y, weighted by how common each name was
  sampled = babynames %>% filter(year == y) %>%
    sample_n(size = 10, weight = prop, replace = TRUE) %>%
    pull(name)
  cat(y, ' : ', paste0(sort(sampled), collapse = ', '), '\n', sep = '')
}

which gives output

1880 : Alice, Amy, Callie, Cora, Ella, George, Izora, John, Ulysses, Will
1890 : Agnes, Ben, Frank, Frank, John, Lottie, Lula, Margaret, Mildred, Onie
1900 : Alice, Charlie, Daniel, Dorothy, Elda, Joseph, Joseph, Mary, Monroe, Walter
1910 : Beatrice, Eulah, Francisco, Hoyt, James, John, Joseph, Mabel, Susie, Sylvester
1920 : Blanche, Concetta, Isom, Jane, Jean, Katherine, Mary, Mary, Presley, William
1930 : Bobbie, Charles, Herbert, James, Mack, Marilyn, Raymond, Robert, Salvatore, William
1940 : Alton, Bobbie, Bobby, Dorothy, Frank, Helen, Jerry, Joanne, John, Robert
1950 : Ann, Donald, Donna, Elizabeth, John, Joseph, Ronald, Shirley, Susan, Thomas
1960 : Colleen, Darlene, Don, Glen, Johnny, Mark, Phyllis, Ronald, Steve, Steven
1970 : Christopher, Jeffrey, Kathy, Leonard, Lorene, Paula, Raymond, Rebecca, Sally, Sarah
1980 : Adam, Christee, Christina, Curt, Garrett, Jacob, Maurice, Melissa, Ruby, Todd
1990 : Brian, Candice, Damien, Gaspar, John, Marina, Nicholas, Sarah, Vicente, Walter
2000 : Annalycia, Chrishauna, Cristian, Daniel, Devon, Elizabeth, Maggie, Payton, Sebastian, William
2010 : Ella, Joshua, Kenlee, Lois, Lucian, Makayla, Monique, Noah, Nyasia, Pete

Note that this isn’t the ten most common names in each year. Some names appear twice in some years (Joseph in 1900, Mary in 1920). Some rare names appear (there were only 200 babies named Nyasia born in 2010), but that’s to be expected in a random sample of names. But if you read through this list, at least if you’re American, you see an evolution from “old-fashioned” names to “normal” names to “names people are giving to their kids that sound totally weird”.

So this might be possible. We can plot the frequency of the names occurring in the relevant passage against time:

library(babynames)
library(dplyr)
library(ggplot2)

knuffle_bunny_names = c("Amy", "Meg", "Margot", "Jane", "Leela", "Rebecca", "Noah", "Robbie", "Toshi", "Casey", "Conny", "Parker", "Brian")
# smallest proportion recorded in each year
mins = babynames %>% group_by(year) %>% summarize(minprop = min(prop))
# every (year, name) pair, so that a name missing from a year shows up as NA
grid = expand.grid(year = unique(babynames$year), name = knuffle_bunny_names, stringsAsFactors = FALSE)
props = grid %>% left_join(babynames, by = c("year", "name")) %>%
  group_by(year, name) %>% summarize(prop = sum(prop)) %>% ungroup() %>%  # add up F and M
  left_join(mins, by = "year") %>%
  mutate(corrected_prop = ifelse(is.na(prop), minprop * 4/5, prop))
props %>% ggplot() + geom_line(aes(x = year, y = corrected_prop, group = name, color = name)) +
  scale_y_log10('name frequency', breaks = c(10^((-6):(-2)), 3*(10^((-6):(-2))))) +
  scale_x_continuous('birth year') +
  ggtitle('Frequency of names appearing in Knuffle Bunny Too')

Note the use of the log scale on the y-axis; without that you just learn that everyone was naming their kids Amy and Brian in the 1970s. Names that don’t occur in the data set for a given year are assumed to occur four times, which is the line along the bottom. The Social Security program wasn’t introduced until 1937, and didn’t originally cover all workers, so data coverage is sparse for births pre-1920 or so. But we already knew Trixie isn’t that old.

The probability that 13 randomly chosen kids born in year y have those particular thirteen names is just 13! times the product of the name frequencies:

# product of the 13 name frequencies in each year (computed as a sum of logs), times 13!
props %>% group_by(year) %>%
  summarize(n = n(), total_prob = factorial(13) * exp(sum(log(corrected_prop)))) %>%
  ggplot() + geom_line(aes(x = year, y = total_prob)) +
  scale_y_log10('probability of name set', breaks = 10^((-46):(-38))) +
  ggtitle('Probability that 13 randomly chosen newborns have\nthe names of the children from Knuffle Bunny Too')

So this set of names has the largest probability of occurring in 2000, followed by 1996, 1997, and 2003. The “right answer” is 2001, according to a 2016 New York Times profile of Mo Willems, at least if Trixie is meant to be Willems’ actual child. The book was published in 2007, and in it Trixie goes to pre-K (likely a class of ages 4 or 5); the previous book in the series was published in 2004 and in it Trixie couldn’t even “speak words” yet.
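The 13! factor in the probability calculation can be sanity-checked on a tiny case. The R sketch below uses three made-up names with made-up frequencies (these are hypothetical numbers, not from the babynames data) and compares the formula with a brute-force enumeration of every way to assign names to three kids.

```r
p <- c(A = 0.5, B = 0.3, C = 0.2)   # hypothetical name frequencies
k <- length(p)

# The formula from the text: k! times the product of the frequencies.
formula_prob <- factorial(k) * exp(sum(log(p)))

# Enumerate every ordered assignment of names to the k kids and add up the
# probability of those in which each name appears exactly once.
assignments <- expand.grid(rep(list(names(p)), k), stringsAsFactors = FALSE)
uses_each_once <- apply(assignments, 1, function(row) length(unique(row)) == k)
row_prob <- apply(assignments, 1, function(row) prod(p[row]))
enumerated_prob <- sum(row_prob[uses_each_once])

print(c(formula = formula_prob, enumerated = enumerated_prob))
```

The same reasoning gives the 13! for the thirteen distinct names in the book: each of the 13! orderings of the kids has the same probability.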