# Spelling and prime factorization

Ben Zimmer writes a column for the New York Times, “On Language”. His June 25, 2010 column was entitled Ghoti. It’s not about beards. That’s not a misspelling of “goatee”. Rather, it’s a misspelling of “fish” (the “gh” of “enough”, the “o” of “women”, and the “ti” of “action”) that’s traditionally attributed to George Bernard Shaw.

In this column we learn about the absurd respellings that Alexander Ellis, a mid-nineteenth-century spelling reformer, came up with, and about some calculations he did. He thought “scissors” should be spelled “sizerz” (okay, that’s not bad, although how would you spell “sizers”, as in “people who size”?), but at least it’s not spelled “schiesourrhce” (“combining parts of SCHism, sIEve, aS, honOUr, myRRH and sacrifiCE.”).

And Ellis gave three different numbers for the number of possible spellings of “scissors”: 1745226, 58366440, and 81997920. To guess where these came from, the first thing that comes to mind is to find their prime factorizations. Why? Well, say someone told us “there are twelve ways to spell cat”. We’d logically think that they’d come up with, say, three ways to spell the first sound of that word (say, “c”, “k”, and “ck”), two ways to spell the second sound (“a” and “ah”), and two ways to spell the third sound (“t” and “tt”), for a total of $3 \times 2 \times 2 = 12$ spellings:

cat, catt, caht, cahtt, kat, katt, kaht, kahtt, ckat, ckatt, ckaht, ckahtt
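The list above is just the Cartesian product of the per-sound options. Here is a minimal Python sketch that reproduces it; the variable names (`first_sound`, `second_sound`, `third_sound`) are my own labels, not anything from Ellis or Zimmer.

```python
from itertools import product

# Candidate spellings for each of the three sounds in "cat",
# taken from the example in the text.
first_sound = ["c", "k", "ck"]
second_sound = ["a", "ah"]
third_sound = ["t", "tt"]

# Every spelling is one choice per sound, so the count is the
# product of the list lengths: 3 * 2 * 2.
spellings = ["".join(parts)
             for parts in product(first_sound, second_sound, third_sound)]

print(len(spellings))
print(", ".join(spellings))
```

Because the count is a product of list lengths, any count produced this way factors into the sizes of the individual lists.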

Of course English doesn’t work that way: you can spell the first sound of “cat” as “ck”, but not at the beginning of a word! Zimmer tells us that Ellis acknowledged this. Still, if you assume the calculation was done this way, then twelve is an easy number to get, while eleven and thirteen are less likely, being primes. The numbers obtained in this way should be products of relatively small numbers, and therefore shouldn’t have large prime factors. And indeed we get

$1745226 = 2 \times 3^8 \times 7 \times 19$

$58366440 = 2^3 \times 3^3 \times 5 \times 11 \times 17^3$

$81997920 = 2^5 \times 3^6 \times 5 \times 19 \times 37$
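For anyone who wants to check these factorizations, trial division is plenty for numbers of this size. This is a small Python sketch; `prime_factorization` is a helper name I made up for illustration.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 2, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        # Divide out p as many times as it goes in.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    # Whatever is left (if anything) is a single prime factor.
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


for n in (1745226, 58366440, 81997920):
    print(n, prime_factorization(n))
```

The largest prime that shows up in any of the three is 37, which fits the guess that each number is a product of smallish counts.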

and these could conceivably be products of six relatively small numbers. For example:

$1745226 = 9 \times 193914 = 9 \times 9 \times 21546 = 9 \times 9 \times 14 \times 1539$
$= 9 \times 9 \times 14 \times 9 \times 171 = 9 \times 9 \times 14 \times 9 \times 9 \times 19$

$58366440 = 20 \times 2918322 = 20 \times 18 \times 162129 = 20 \times 18 \times 17 \times 9537$
$= 20 \times 18 \times 17 \times 17 \times 561 = 20 \times 18 \times 17 \times 17 \times 17 \times 33$

$81997920 = 20 \times 4099896 = 20 \times 19 \times 215784 = 20 \times 19 \times 24 \times 8991$
$= 20 \times 19 \times 24 \times 27 \times 333 = 20 \times 19 \times 24 \times 27 \times 9 \times 37$

Where did I get these from? Let’s consider how I went from $20 \times 18 \times 162129$ to $20 \times 18 \times 17 \times 9537$ in my decomposition of 58366440. I’ve already written $58366440 = 20 \times 18 \times 162129$. I know I’m going to have to write $162129$ as a product of four numbers, so they’re going to be near $162129^{1/4} \approx 20.07$. It turns out that $162129/17$ is an integer, namely $9537$, and no factor of $162129$ is closer to its fourth root than 17 is. (That is, 18, 19, 20, 21, 22, and 23 are not factors of 162129.)

This is a greedy algorithm, and these aren’t optimal decompositions in the sense of having the smallest sum. For example, in the last one I could replace 24 and 9, which multiply to 216, with 18 and 12, which have the same product but a smaller sum. But there’s no reason to expect that Ellis’ products had this property anyway; some sounds can be spelled in more ways than others. In particular, the last one of these is unlikely to be what Ellis came up with, because the word “scissors” has two of the same sound, so I’d expect two of the factors to be the same. But what do you want from a greedy algorithm?
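The greedy procedure described above can be sketched in a few lines of Python. The function names `divisors` and `greedy_factors` are mine, and two details are assumptions the text doesn't settle: the root is computed in floating point, and a tie between two equally close divisors would go to the smaller one (no ties actually occur for these three numbers).

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


def greedy_factors(n, k):
    """Write n as a product of k factors, greedily.

    At each step, with j factors still to choose, pick the divisor
    of the remaining number closest to its j-th root, then divide
    it out. The last factor is whatever remains.
    """
    factors = []
    for j in range(k, 1, -1):
        target = n ** (1.0 / j)
        # min() keeps the first (smallest) divisor on a tie.
        d = min(divisors(n), key=lambda x: abs(x - target))
        factors.append(d)
        n //= d
    factors.append(n)
    return factors


for n in (1745226, 58366440, 81997920):
    print(n, greedy_factors(n, 6))
```

Run with six factors, this should give back the same three decompositions written out above, in the same order.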

By the way, it’s not terribly hard to write down rules for going from spelling to pronunciation that work reasonably well. It seems like the same should be true of the reverse.
