So it was pointed out to me yesterday that my name (“Michael Lugo”) has each of the five vowels (a, e, i, o, u) exactly once. Such a word is called supervocalic.
What are the chances of that?
A quick calculation: assume that letters have the frequencies given in the Wikipedia article for English. In particular the frequencies of A, E, I, O, and U are 8.167%, 12.702%, 6.966%, 7.507%, and 2.758%. The frequency of all other letters combined is 61.900%; call this $q$. Call the product of the five vowel frequencies $c$; the value of $c$ is about $1.496 \times 10^{-6}$.
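As a quick check on the arithmetic, here is a minimal sketch that recomputes both constants from the five frequencies quoted above. The names `VOWEL_FREQ`, `q`, and `c` are just labels chosen for this illustration.

```python
import math

# Vowel frequencies quoted in the text, written as fractions rather than percents.
VOWEL_FREQ = {"A": 0.08167, "E": 0.12702, "I": 0.06966, "O": 0.07507, "U": 0.02758}

# q: combined frequency of every letter that is not one of the five vowels.
q = 1.0 - sum(VOWEL_FREQ.values())

# c: product of the five vowel frequencies.
c = math.prod(VOWEL_FREQ.values())

print(f"q = {q:.5f}")
print(f"c = {c:.3e}")
```

The product comes out near 1.5 in a million, which is why supervocalic strings are rare even before the consonants are accounted for.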
Now assume that names are created by picking letters independently at random. To construct a string of length $n$ in which each vowel appears exactly once, we must:
- decide where the vowels will appear. We can do this in $n(n-1)(n-2)(n-3)(n-4)$ ways; here order matters, since the first position chosen gets the A, the second the E, and so on.
- put A, E, I, O, U in those five pre-chosen positions, and consonants in the others; the probability of this is $c q^{n-5}$.
So a string of length $n$ has probability $n(n-1)(n-2)(n-3)(n-4) \, c \, q^{n-5}$ of having each vowel occur exactly once. For several values of $n$ I give those probabilities in the following table:
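The count-and-multiply argument above gives the probability n(n-1)(n-2)(n-3)(n-4) times c times q^(n-5) for a string of length n. A minimal sketch that evaluates it follows; the function name `supervocalic_probability` and the range 5 to 20 are my own choices for illustration, not anything fixed by the argument.

```python
import math

# Vowel frequencies quoted in the text, as fractions.
VOWEL_FREQ = [0.08167, 0.12702, 0.06966, 0.07507, 0.02758]
q = 1.0 - sum(VOWEL_FREQ)   # frequency of all non-vowel letters
c = math.prod(VOWEL_FREQ)   # product of the five vowel frequencies

def supervocalic_probability(n):
    """Probability that a random string of length n has each vowel exactly once."""
    # math.perm(n, 5) = n(n-1)(n-2)(n-3)(n-4): ordered choices of vowel positions.
    # It is 0 when n < 5, so strings that are too short get probability 0.
    return math.perm(n, 5) * c * q ** (n - 5)

# Print the probabilities for an illustrative range of lengths.
for n in range(5, 21):
    print(f"{n:2d}  {supervocalic_probability(n):.5f}")

# Length at which the probability peaks, searched over an illustrative range.
best = max(range(5, 60), key=supervocalic_probability)
print("most likely length:", best)
```

Under these assumptions the peak lands at length 13, and the value there stays just under 0.005.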
For example, a name of length 11 (like mine) has probability about 0.0047 of containing each vowel exactly once. The string length which is most likely to be supervocalic is 13, with probability about 0.00498; that makes sense, as a typical string is thirty-eight percent vowels, and five is about thirty-eight percent of thirteen. It’s hard to go much further with this, though, because I don’t have the distribution of lengths of names. But whatever the distribution of name lengths, the proportion of supervocalic names is a weighted average of these probabilities, so it is bounded above by their maximum, which is just under one in two hundred. Special, but not that special. (My instinct is that supervocalic names are probably a bit more likely than this, because the distribution of the number of vowels in a name of length $n$ is probably more tightly concentrated than a binomial.)
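One way to sanity-check the figure for length 11 is to simulate the model directly: draw letters independently with the quoted frequencies and count how often each vowel shows up exactly once. This is only a sketch; the function `simulate`, the fixed seed, and the trial count are assumptions made for the illustration, and all consonants are lumped into a single placeholder symbol since only the vowels matter.

```python
import random

# Five vowels plus "-" standing in for any consonant.
SYMBOLS = "AEIOU-"
WEIGHTS = [0.08167, 0.12702, 0.06966, 0.07507, 0.02758, 0.61900]

def simulate(n, trials, seed=0):
    """Estimate the chance that a random length-n string has each vowel exactly once."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        s = rng.choices(SYMBOLS, weights=WEIGHTS, k=n)
        if all(s.count(v) == 1 for v in "AEIOU"):
            hits += 1
    return hits / trials

estimate = simulate(11, 100_000)
print(f"simulated probability for n = 11: {estimate:.4f}")
```

With this many trials the estimate is noisy in the fourth decimal place, so it should be read as agreeing with 0.0047 only roughly.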
Ken Jennings has a list of sets that contain exactly one such word, many of which contain fewer than a couple hundred elements, but it’s hard to say what that means in this context. For more words with this property, see the message boards; in particular there are some nine-letter examples. Getting much shorter than that seems to interfere with euphony, which my model doesn’t take into account. There have been 250 major league baseball players who have each vowel at least once; many have each vowel exactly once. Many of them are named Charlie; few, it seems, are named Michael, because your typical baseball player is more likely to go by Mike.
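The difference between "each vowel at least once" and "each vowel exactly once" is easy to test mechanically. Here is a small sketch; the function names are hypothetical, and it treats only a, e, i, o, u as vowels (so y counts as a consonant), matching the definition at the top of the post.

```python
VOWELS = "aeiou"

def has_every_vowel(name):
    """True if each of a, e, i, o, u appears at least once (case-insensitive)."""
    letters = name.lower()
    return all(v in letters for v in VOWELS)

def is_supervocalic(name):
    """True if each of a, e, i, o, u appears exactly once (case-insensitive)."""
    letters = name.lower()
    return all(letters.count(v) == 1 for v in VOWELS)

for text in ["Michael Lugo", "education", "unequivocally", "Mike Lugo"]:
    print(text, has_every_vowel(text), is_supervocalic(text))
```

Running either function over a list of names gives the kind of counts quoted above; "unequivocally" is an example that passes the first test but fails the second, because u appears twice.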