Spelling and prime factorization

Ben Zimmer writes a column for the New York Times, “On Language”. His June 25, 2010 column was titled “Ghoti”. It’s not about beards; that’s not a misspelling of “goatee”. Rather, it’s a misspelling of “fish” (the “gh” of “enough”, the “o” of “women”, and the “ti” of “action”) that’s traditionally attributed to George Bernard Shaw.

In this column we learn about the absurd respellings that Alexander Ellis, a mid-nineteenth-century spelling reformer, came up with. He also did some calculations. He thought “scissors” should be spelled “sizerz” (okay, that’s not bad, although how would you spell “sizers”, as in “people who size”?), but at least it’s not spelled “schiesourrhce” (“combining parts of SCHism, sIEve, aS, honOUr, myRRH and sacrifiCE.”).

And Ellis gave three different numbers for the number of possible spellings of “scissors”: 1745226, 58366440, and 81997920. To guess where these came from, the first thing that comes to mind is finding their prime factorizations. Why? Well, say someone told us “there are twelve ways to spell cat”. We’d logically think that they’d come up with, for example, three ways to spell the first sound of that word (“c”, “k”, and “ck”), two ways to spell the second sound (“a” and “ah”), and two ways to spell the third sound (“t” and “tt”), for a total of 3 \times 2 \times 2 = 12 spellings:

cat, catt, caht, cahtt, kat, katt, kaht, kahtt, ckat, ckatt, ckaht, ckahtt
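The count in the “cat” example is just a Cartesian product: pick one spelling for each sound. As a minimal Python sketch (the option lists are the made-up ones from the example, not Ellis’ data), `itertools.product` generates exactly the twelve spellings listed above, in the same order:

```python
from itertools import product

# Hypothetical options from the "cat" example: three spellings for the
# first sound, two for the vowel, two for the final consonant.
options = [["c", "k", "ck"], ["a", "ah"], ["t", "tt"]]

# A spelling chooses one option per sound, so the spellings are the
# Cartesian product of the option lists; their count is 3 * 2 * 2.
spellings = ["".join(parts) for parts in product(*options)]

print(len(spellings))
print(", ".join(spellings))
```

The same product structure is why a count built this way tends to have only small prime factors.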

Of course English doesn’t work that way: you can spell the first sound of “cat” as “ck”, but not at the beginning of a word! Zimmer tells us that Ellis acknowledged this. Still, if you assume the calculation was done this way, then twelve is an easy number to get, while eleven and thirteen are less likely: being prime, they would require a single sound with eleven or thirteen spellings. The numbers obtained in this way should be products of relatively small numbers, and therefore shouldn’t have large prime factors. And indeed we get

1745226 = 2 \times 3^8 \times 7 \times 19
58366440 = 2^3 \times 3^3 \times 5 \times 11 \times 17^3
81997920 = 2^5 \times 3^6 \times 5 \times 19 \times 37

and these could conceivably be products of six relatively small numbers (six being the number of pieces Ellis glued together to make “schiesourrhce”). For example:

1745226 = 9 \times 193914 = 9 \times 9 \times 21546 = 9 \times 9 \times 14 \times 1539
= 9 \times 9 \times 14 \times 9 \times 171 = 9 \times 9 \times 14 \times 9 \times 9 \times 19

58366440 = 20 \times 2918322 = 20 \times 18 \times 162129 = 20 \times 18 \times 17 \times 9537
= 20 \times 18 \times 17 \times 17 \times 561 = 20 \times 18 \times 17 \times 17 \times 17 \times 33

81997920 = 20 \times 4099896 = 20 \times 19 \times 215784 = 20 \times 19 \times 24 \times 8991
= 20 \times 19 \times 24 \times 27 \times 333 = 20 \times 19 \times 24 \times 27 \times 9 \times 37
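The prime factorizations quoted above, and the claim that Ellis’ numbers have no large prime factors, can be checked with a few lines of trial division. This is a minimal Python sketch; `factorize` and `decompositions` are names made up for illustration. It also confirms that each six-factor decomposition above multiplies back to its number.

```python
import math

def factorize(n):
    """Prime factorization of n as {prime: exponent}, by trial division.
    (Plenty fast for eight-digit numbers like Ellis' counts.)"""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever is left over is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

# The six-factor decompositions listed above, keyed by their product.
decompositions = {
    1745226: [9, 9, 14, 9, 9, 19],
    58366440: [20, 18, 17, 17, 17, 33],
    81997920: [20, 19, 24, 27, 9, 37],
}

for count, parts in decompositions.items():
    assert math.prod(parts) == count  # the decomposition really multiplies out
    f = factorize(count)
    pretty = " * ".join(f"{p}^{e}" if e > 1 else str(p) for p, e in f.items())
    print(f"{count} = {pretty}; largest prime factor {max(f)}")
# 1745226 = 2 * 3^8 * 7 * 19; largest prime factor 19
# 58366440 = 2^3 * 3^3 * 5 * 11 * 17^3; largest prime factor 17
# 81997920 = 2^5 * 3^6 * 5 * 19 * 37; largest prime factor 37
```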

Where did I get these decompositions from? Let’s consider how I went from 20 \times 18 \times 162129 to 20 \times 18 \times 17 \times 9537 in my decomposition of 58366440. I’ve already written 58366440 = 20 \times 18 \times 162129. I know I’m going to have to write 162129 as a product of four numbers, so they should be near 162129^{1/4} \approx 20.07. It turns out that 162129/17 is an integer, namely 9537, and no factor of 162129 is closer to its fourth root than 17 is. (That is, 18, 19, 20, 21, 22, and 23 are not factors of 162129.)

This is a greedy algorithm, and these aren’t optimal decompositions in the sense of having the smallest sum. For example, in the last one I could replace 24 and 9, which multiply to 216, with 18 and 12, which have the same product but a smaller sum. But there’s no reason to expect that Ellis’ products had this property anyway; some sounds can be spelled in more ways than others. In particular, the last of these is unlikely to be what Ellis came up with, because the word “scissors” contains the same sound twice (the “z” sound, spelled “ss” in the middle and “s” at the end), so I’d expect two of the factors to be the same. But what do you want from a greedy algorithm?
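Here is a minimal Python sketch of the greedy procedure, assuming “closest” means smallest absolute difference from the j-th root of whatever remains when j factors are still to be chosen. The names `divisors`, `greedy_split`, and `min_sum_split` are hypothetical helpers for illustration, not anything Ellis used. For contrast, `min_sum_split` does an exhaustive search for the decomposition with the smallest sum, under the extra assumption that every factor is at least 2.

```python
import math

def divisors(n):
    """All positive divisors of n, in increasing order."""
    small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))

def greedy_split(n, k):
    """Greedily split n into k factors: with j factors still to choose,
    take the divisor of the remainder closest to its j-th root."""
    factors = []
    for j in range(k, 1, -1):
        target = n ** (1 / j)
        best = min(divisors(n), key=lambda d: abs(d - target))
        factors.append(best)
        n //= best
    factors.append(n)  # the last factor is whatever is left
    return factors

def min_sum_split(n, k, smallest=2):
    """Exhaustively find k nondecreasing factors, each >= smallest, whose
    product is n and whose sum is minimal; None if no such split exists."""
    if k == 1:
        return [n] if n >= smallest else None
    best = None
    for d in divisors(n):
        if d < smallest:
            continue
        if d ** k > n:  # k factors all >= d would overshoot n
            break
        rest = min_sum_split(n // d, k - 1, d)
        if rest is not None and (best is None or d + sum(rest) < sum(best)):
            best = [d] + rest
    return best

for count in (1745226, 58366440, 81997920):
    g = greedy_split(count, 6)
    m = min_sum_split(count, 6)
    print(count, "greedy:", g, "sum", sum(g), "| min-sum:", m, "sum", sum(m))
```

Run on Ellis’ three numbers, `greedy_split(n, 6)` reproduces the three decompositions above. The exhaustive search can only match or beat the greedy sum; for 81997920, swapping 24 and 9 for 18 and 12 already does better.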

By the way, it’s not terribly hard to write down rules for going from spelling to pronunciation that work reasonably well. It seems like the same should be true of the reverse.

I’m looking for a job! See my LinkedIn profile.


3 thoughts on “Spelling and prime factorization”

  1. Long ago, I set up a LinkedIn profile. I didn’t find LinkedIn useful, and attempted to delete it. It didn’t get deleted, but I cannot get back on. (And so many people have asked me to that I do wish I could.) Anyway, I don’t know of any jobs right now, but I would have glanced at your profile, and I can’t.

  2. Thanks anyway, Sue. (I’ve actually heard a lot of complaints about LinkedIn today, because there was a hacking incident where a lot of people’s passwords were stolen.)
