My name is supervocalic

So it was pointed out to me yesterday that my name (“Michael Lugo”) has each of the five vowels (a, e, i, o, u) exactly once. Such a word is called supervocalic.
What are the chances of that?

A quick calculation: assume that letters have the frequencies given in the Wikipedia article for English. In particular the frequencies of A, E, I, O, and U are 8.167%, 12.702%, 6.966%, 7.507%, 2.758%. The frequency of all other letters combined is 61.900%; call this q. Call the product of these c; the value of c is about $1.50 \times 10^{-6}$ .

Now assume that names are created by picking letters independently at random. To construct a string in which each vowel appears exactly once, we must:

decide where the vowels will appear. We can do this in $n(n-1)(n-2)(n-3)(n-4) = n!/(n-5)!$ ways; here order matters.
put A, E, I, O, U in those five pre-chosen positions, and consonants in the others; the probability of this is $C q^{n-5}$ .

So a string of length n has probability ${n! \over (n-5)!} C q^{n-5}$ of having each vowel occur exactly once. For $n = 6, 7, \ldots, 20$ I give those probabilities in the following table:

6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
0.0007	0.0014	0.0024	0.0033	0.0041	0.0047	0.0050	0.0050	0.0048	0.0045	0.0040	0.0035	0.0030	0.0025	0.0021

For example, a name of length 11 (like mine) has probability about 0.0047 of containing each vowel exactly once. The string length which is most likely to be supervocalic is 13; that makes sense, as a typical string is thirty-eight percent vowels, and five is about thirty-eight percent of thirteen. It’s hard to go much further with this, though, because I don’t have the distribution of lengths of names. But whatever the distribution of name lengths, the proportion of supervocalic names is bounded above by one in two hundred. Special, but not that special. (My instinct is that supervocalic names are probably a bit more likely than this, because the distribution of the number of vowels in a name of length $n$ is probably more tightly concentrated than a binomial.)

Ken Jennings has a list of sets that contain exactly one such word, many of which contain less than a couple hundred elements, but it’s hard to say what that means in this context. For more words with this property, see the message boards; in particular there are some nine-letter examples. Getting much shorter than that seems to interfere with euphony, which my model doesn’t take into account. There have been 250 major league baseball players who have each vowel at least once; many have each vowel exactly once. Many of them are named Charlie; few, it seems, are named Michael, because your typical baseball player is more likely to go by Mike.

Happy Pi Day!

March 14 is Pi Day, in recognition of the fact that the first three digits of the decimal approximation of π are 3.14.

My alma mater, MIT, will make admissions decisions available at 6:28 pm (US Eastern time). Click the link, there are funny pictures.

They used to do it at 1:59 pm (so the date and time formed 3/14 1:59) but the move to 6:28 means that they can satisfy the tauists.

On the probability of coincidences (but not really)

I am teaching statistics at Berkeley, where we use the Freedman, Pisani, and Purves text for many of the introductory courses. Since the text is titled Statistics, like a lot of other books, it’s referred to by the names of its authors. But “Freedman, Pisani, and Purves” is too long, so generally we call it “FPP”.

I’ve heard some people who have been around the department for a while call it “Freedman”, and indeed it does have some stylistic similarities with Freedman’s other books, such as Statistical Models: Theory and Practice. And since we’re talking about coincidences, I believe Pisani is not the CNBC correspondent of the same name, or the financial trainer, or any of various others with the same name. (Pisani’s reminiscences upon the death of Blackwell tell us that he was a lecturer at Berkeley for a long time, which seems like something these people would mention in their bios; he also got his PhD in 1970, and these two Pisanis seem younger.) While I’m on the subject, this is a wonderful, well-written book for introductory statistics classes, although sometimes it suffers from using language too well… my sense is that Kids These Days¹ aren’t used to reading, and some are more comfortable with cookbook-style textbooks that will just tell them what to do. I try to break this down in my classes, but as anyone who’s taught (or learned!) knows, that’s not easy.

I also participate on the “community weblog” metafilter. There FPP refers to a “front page post”, i. e. a post made to the “front page” of the site, usually consisting of links to one or more related web sites. It would be hopelessly meta to make a front page post about Freedman, Pisani, and Purves, but I don’t have it in me. An interesting question: how many three-letter acronyms should we expect exist in my life, given that there’s a collision between two of them? By analogy with the birthday problem, the number must be at least on the order of the square root of the number of possible such acronyms, i. e. a hundred or so — surprisingly small.

The text doesn’t abbreviate “simple random sample” to SRS, but I’m tempted to sometimes in my lecture notes. But of course that conflicts with the abbreviation for “stratified random sample”. Or for “sexual reassignment surgery”.

(On titles of books: there are at least three books in my library entitled Algebra, namely those by Artin, Hungerford, and Lang. This barely counts as a coincidence, because that is the most obvious title for an algebra book.)

1. I use “Kids These Days” ironically, being still in my twenties.

Taalman and Rosenhouse on Sudoku

Jason Rosenhouse and Laura Taalman have written an excellent book, Taking Sudoku Seriously. Despite the title, this is not primarily a puzzle book; rather it uses Sudoku, and the idea that mathematicians are essentially puzzle solvers, as a gateway into explaining to the general reader what it is that mathematicians do. The book actually contains very few standard Sudoku puzzles but a large number of variations on the standard theme. This is more interesting than standard Sudoku, I find; once you learn how to do standard Sudokus you essentially are just executing an algorithm mindlessly.

(Don’t borrow this book from a library, as I did, if you intend to do the puzzles, unless you’re okay with hogging a copy machine for a while; also, even then, some puzzles in this book involve color in a meaningful way.)

Some Sudoku-related facts from this book that I found interesting:

How many filled-in Sudoku squares are there? The authors begin by carefully working through counting “Shidoku” puzzles; this is like Sudoku, but 4-by-4 instead of 9-by-9. The method is essentially to fill in the upper left 2-by-2 square; then finish the first row and first column. If we start with

1	2	3	4
3	4
2
4

then there are three ways to complete this board. Each of these represents 96 possible Shidokus, obtained by switching the third and fourth columns, or not; switching the third and fourth rows, or not; and permuting the numbers 1, 2, 3, 4. So there are 288 possible Shidokus. But many of these are equivalent; it turns out that there are only two essentially different Shidoku puzzles:

1	2	3	4
3	4	1	2
2	1	4	3
4	3	2	1

and

1	2	3	4
3	4	1	2
2	3	4	1
4	1	2	3

and all others can be reached from these by various symmetries, including relabeling the digits, permutation of certain rows, columns, or blocks, and rotations or reflections. These symmetries form the Shidoku group; unfortunately the orbits of its action on the set of Shidokus are of different sizes! (There are 96 Shidoku puzzles equivalent to the first one above, and 192 equivalent to the second, for a total of 288.

Determining the number of Sudoku puzzles is harder. There are 948,109,639,680 ways to fill in the first three rows of a Sudoku, satisfying the Sudoku constraints. (See the Wikipedia article on the mathematics of Sudoku.) This actually gives a nice heuristic argument for the total number of ways to fill in a Sudoku. There are $(9!)^3 \approx 4.78 \times 10^{16}$ ways to fill in the first three rows of a Sudoku if we just force each row to have nine different numbers and ignore the constraint on the blocks. So if we fill in the first three rows by filling each row with a permutation chosen uniformly at random, then the probability of satisfying the block constraint is $948109639680/(9!)^3 \approx 1.98 \times 10^{-5}$ ; call this number $p$ . Now imagine filling all nine rows of a Sudoku with permutations chosen uniformly at random; there are $(9!)^9$ ways to do this. But now the probability that the top three blocks satisfy the block constraint is $p$ ; the same is true for each of the three rows of blocks and each of the three columns. If we assume these constraints are independent, which they’re not, then we estimate that the total number of 9-by-9 squares with each row containing nine different numbers, and each set of three blocks forming a row or column being valid in Sudoku, is $(9!)^9 p^6$. But these are just filled Sudokus, and that number is about $6.6 \times 10^{21}$. Perhaps coincidentally, but perhaps not, the actual number of Sudokus is about $6.7 \times 10^{21}$, as shown by Felgenhauer and Jarvis ; the heuristic is due to Kevin Kilfoil.

How are sudoku puzzles constructed? One way to do it is to set a “mask” — a set of squares within the puzzle which will be filled in — and then populate randomly and count the number of solutions. Then make random changes to the numbers filling in this mask, generally but not always reducing the number of solutions, (The authors state that something like this was used to form the puzzles in the book by Riley and Taalman, No-Frills Sudoku.)

Sudoku puzzles can be encoded in terms of graph colorings. Each square in Sudoku becomes a node, and we draw an edge between two vertices that can’t be filled with the same number; then the question is to extend a partial coloring of the Sudoku graph to a full coloring.

But they can also be encoded in terms of polynomials For these purposes we’ll play Sudoku with the numbers -2, -1, 1, 2, 3, 4, 5, 6, 7. Let each cell of the Sudoku be represented by a variable. Then for each cell we have an equation $(w+2)(w+1)(w-1)(w-2)(w-3)(w-4)(w-5)(w-6)(w-7) = 0encoding the fact that it contains one of these nine numbers, and for each region of nine cells (row, column, or block) we have two equations$ latex x_1 + x_2 + \cdots + x_9 = 25, x_1 x_2 \cdots x_9 = 10080$

encoding the fact that each of the nine numbers is contained in the region exactly once. (Why not use the usual numbers, 1 through 9? Because there is another set of nine integers that has sum 45 and product 362880; can you find it?)

I’ll close with an extended quote that I think bears repeating (pp. 115-116):

Your authors have been teaching mathematics for quite some time now, and it has been our persistent experience that this lesson [that hard puzzles are more interesting to solve than easy puzzles], obvious when pondering Sudoku puzzles, seems to elude students struggling for the first time with calculus or linear algebra. When presented with a problem requiring only a mechanical, algorithmic solution, the average student will dutifully carry out the necessary steps with grim determination. Present instead a problem requiring an element of imagination or one where it is unclear how to proceed, and you must brace yourself for the inevitable wailing and rending of garments.
This reaction is not hard to understand. In a classroom there are unpleasant things like grades and examinations to consider, not to mention the annoying realities of courses taken only to fulfill a requirement. Students have more on their minds than the sheer joy of problem solving. They welcome the mechanical problems precisely because it is clear what is expected of them. A period of frustrated confusion can be amusing when working on a puzzle, because there is no price to be paid for failing to solve it. The same experience in a math class carries with it fears of poor grades.

Finally, if you don’t have time to spare to read the whole book, look at Taalman’s article Taking Sudoku Seriously. If you like your math in podcast form, listen to Sol Lederman’s interview with Rosenhouse and Taalman. If you could care less about math but want books of Sudoku: No Frills Sudoku (standard Sudoku, but starting with an astonishingly low 18 clues each), Naked Sudoku (Sudoku variants with no starting numbers), Color Sudoku (more graphically interesting).

Weekly links for March 11

Why 24/192 music downloads make no sense (sampling theory).

Numbers API, an API for interesting facts about numbers.

“Numeracy crisis” threatens first world economies

Learning by doing. “Imagine the impact on the arts if we required every aspiring instrumentalist to complete 12 years of theory and careful study of the masters before being allowed to pick up an instrument and play.” The article is saying that science education has this effect, and describes how to bring undergraduate students into the lab. What about in mathematics?

Cinderella plots for March Madness, and March Mathness from Princeton. (In case you’re wondering: I went to MIT for undergrad and Penn for grad school, so it pains me to say this, but I’m rooting for Harvard.)

Bayes’ Theorem illustrated.

The College Mathematics Journal has an entire issue dedicated to Martin Gardner. This appears to be freely available online.

The Berlin Numeracy Test and a paper about its development. I’m proud to report that “Technically, relative to the general population, [I am] among the most statistically literate in the world.”

Risk: what’s your perspective?, a guide for healthcare professionals.

Cross Validated has a list of the most interesting statistical paradoxes.

Frank Morgan, Can math survive without the bees?, on the ancient conjecture that hexagonal (honeycomb) tiling is the most efficient way to partition the plane.

Cathy O’Neil asks what a group of quants working for the people (as opposed to for big corporations) would do.

There’s more to weight loss than counting calories.

From McSweeney’s: Hipster Logic Problems and Mathematical Translations of Popular Refrains.

Mathematical origami exhibit at UC Santa Cruz, opening April 8.

A card trick from Diaconis and Graham

I previously wrote about some mathematics of the I Ching that I got from Diaconis and Graham’s book Magical Mathematics.

Diaconis and Graham write (p. 184) that “We have occasionally taught a class on mathematics and magic tricks.” (An aside: this book has the annoying habit of using the mathematical “we” in situations where it’s clear that one author or the other is meant. So I was not sure whether Diaconis and Graham taught this class jointly. But Graham’s vita indicates that he has never been at Harvard, while Diaconis has, so I assume “we” refers to Diaconis.)

Anyway, there’s a well-known trick named Miraskill. The original version of the trick is something like this YouTube video. Shuffle a deck of cards and turn over two at a time. Each red-black pair is put in the center; each red-red pair to the left; each black-black pair to the left. Then the number of red-red and black-black pairs are the same.

The explanation is simple — the number of red cards and the number of black cards in the deck are the same, so the number after removing the mixed pairs is the same as well.

Diaconis and Graham tell how Joe Fendel, at the time a Harvard undergrad, suggested the following variation. Start with a deck of 27 cards, labeled with the twenty-seven possible triplets of rock, paper, scissors. Shuffle the deck and remove one card, and set it aside.

Then go through the deck two cards at a time, looking at the first symbol on each card. Each time the two symbols are the same, discard. Each time the two symbols are not the same, the losing symbol (under the usual rock-paper-scissors rules) scores a (negative) point. After running through 26 cards (13 pairs), two of the three symbols will have scores with the same parity (even or odd). The one of these that defeats the other at rock-paper-scissors is the first symbol on the card set aside. Run through the deck again looking at the second symbols, and then the third symbols.

For example, say the twenty-six cards, in order, have first symbols

RR PS PP RP SP RP SS RP RS PS SP RS SR.

Then mark the loser of each pair in bold (for example, in “PS”, scissors beats paper) to get

RR PS PP RP SP RP SS RP RS PS SP RS SR.

There are 4 P’s, 3 R’s, 3 S’s. The two 3s have the same parity. Rock beats scissors, so the first symbol on the missing card must be rock. And indeed if you count the symbols I gave, there are 8 rocks, 9 papers, and 9 scissorses, so the missing symbol is rock.

I’m not convinced it makes a good trick — there’s too much bookkeeping — but figuring out how it works is a nice little exercise.

Say, for example, the missing symbol is rock. Then there will be 8 rocks, 9 papers, and 9 scissorses among the 26 symbols that we see. After we remove the tied pairs, there will be an even number of rocks, and an odd number each of papers and scissorses.

Now let r be the number of rock-paper pairs that appear; let p be the number of paper-scissors pairs; let s be the number of scissors-rock pairs. (In each case the letter is chosen to correspond to the losing symbol, which gets the point.) Then since the total number of rocks is even, r + s is even; similarly r + p and s + p are odd.

Since r + s is even, r and s have the same parity; p has the opposite parity. (Formally, we’re solving the system $r+s = 0, r+p = 1, s+p = 1$ over $\mathbb{Z}_2$ , and noting that it has exactly two solutions.) Finally, rock beats scissors. If the missing symbol is scissors or paper we just permute the symbols.

I’m still not convinced it makes a good magic trick. I suspect my bar for interesting “self-working” magic tricks (the magician’s term of art for tricks that work based on some principle such as this, instead of sleight-of-hand) is high; if I instantly feel that I could figure out how something works, it’s not that interesting.

Not that I could perform the trick well; that takes some showmanship that I don’t have. But I know enough probability to know that I shouldn’t count cards – I’d be no good at it. And I know enough discrete mathematics to not be impressed by this trick

Some questions from the Museum of Math’s fundraiser

Numberplay: Math as a spectator sport. From Numberplay, a sub-blog of Wordplay, the crossword blog of the New York Times. The Museum of Mathematics, which is supposed to open later this year, held a fundraising event which included a mathematical tournament. (MCed by Will Shortz! I’m kind of curious how much math he knows.)

Go there and look at the questions; I’m going to solve some of them here, with my own comments. (In the contest these were multiple-choice. Since multiple-choice questions feel wrong outside of tests, and this blog is not a test, I’m omitting the choices.)

Moon creatures, let us suppose, have a unit of distance that we shall call a “lunar”. It was adopted because the moon’s surface area, if expressed in square lunars, exactly equals the moon’s volume in cubic lunars. The moon’s diameter is 2,160 miles. How many miles long is a lunar?

This is actually an interesting way to measure area. We’ll figure out how many lunars there are in a moon-radius. Since the surface area has to equal the volume, we get 4πr² = (4/3)πr³. Solving for r gives r = 3, so there are three lunars in a moon-radius, or six in a moon-diameter. The answer is 360. (The moon then has area and volume 36π.)

This feels roughly analogous to the definition of “radian” for circles, but is not quite the same. In fact, solving a similar equation for circumference = area, 2πr = πr², we get r = 2, suggesting we should take half the radius of a circle as the natural measure and giving 4π for both the circumference and the area. (The Tauists might be glad to hear that.)

Call a set of integers “spacy’ if it contains no more than one out of any three consecutive integers. How many subsets of {1, 2, 3, ……,12} are spacy?

Let g(n) be the number of spacy subsets of $\{1, 2, ..., n\}$ . Then clearly $g(0) = 1, g(1) = 2, g(2) = 3, g(3) = 4$ — in each case the spacy sets are just the empty set and the one-element subsets.

Then for larger n, $g(n) = g(n-1) + g(n-3)$ . How can you make a spacy subset of $\{1, 2, \ldots, n\}$ ? Well, any spacy subset of $\{ 1, 2, \ldots, n-1 \}$ is a spacy subset of $\{ 1, 2, \ldots, n \}$ which does not contain $lates n$, and there are $g(n-1)$ of those. And any spacy subset of $\{ 1, 2, \ldots, n \}$ which contains $n$ must, when you remove $n$ from it, give a spacy subset of $\{ 1, 2, \ldots, n-3 \}$ . There are $g(n-3)$ of those. Any spacy subset of $\latex \{1, 2, \ldots, n\}$ either contains or does not contain $n$ , and so $g(n) = g(n-1) + g(n-3)$ .

For example, the spacy subsets of $\{ 1, 2 \}$ are $\emptyset, 1$ , and $2$ (for notational ease I’m dropping the braces$. The spacy subsets of $\{ 1, 2, 3, 4 \}$ are $\emptyset, 1, 2, 3, 4, 14$ . So the spacy subsets of $\{ 1, 2, 3, 4, 5 \}$ are $\emptyset, 1, 2, 3, 4, 14$ (from the spacy subsets of $\{ 1, 2, 3, 4\}$ ) and $5, 15, 25$ (from adding a 5 to each spacy subset of $\{ 1, 2\}$ .)

Interestingly, the winners in this contest all came out of the financial community: the affiliations mentioned in the post at the Times are Two Sigma Investments, Renaissance Technologies, Morgan Stanley, and Citadel. A few explanations for this present themselves:

people who can solve these sorts of problems quickly tend to end up in that industry;
this was a fundraising event and people in finance have lots of money;
this event took place in New York; you’d have different results if you tried this in, say, San Francisco

The last of these explanations would be easy to check.

Kerrich, Ulam, and the scope of classes

One of the classic stories of experimental probability comes from the internment camps of World War II. John Kerrich, a South African mathematician, was interned in the Hald camp in Denmark as he had the misfortune to visit Copenhagen at exactly the wrong time. As Freedman, Pisani, and Purves, Statistics (4th ed, p. 273), put it, “to pass the time he carried out a series of experiments in rpobability theory. One experiment involved tossing a coin 10,000 times.” In June of 1945, after the war in Europe ended, he published a little book entitled “An Experimental Introduction to the Theory of Probability”, so-called because it includes the results of many rather long experiments in the flipping of coins, the selection of balls from urns, and so on. The coin-flipping experiment opens the book. I suppose now we would have probability textbooks that take the same approach but using simulated results from a random number generator instead of the results of actual experiments. I’m also reminded of the story of how Stanislaw Ulam invented the Monte Carlo method — he was playing solitaire while sick and realized that although it would be tremendously difficult to calculate the probability of winning, it would be easy to determine it by repeated play. Indeed this seems to have been in the air around that time; the Wikipedia article cites Ulam asking the same question in 1946.

However it is not the purpose of this post to comment on this book. I have a copy of this book in my hands right now but I have not read it. But I just want to quote from Kerrich’s preface (p. 6):

It is hoped that students of these pages will never have to reject any of the ideas given here, no matter how much they may refine them as their knowledge of the subject grows.

Kerrich has previously told us that he is teaching to a “very mixed class of students and colleagues”, some of whom are well-grounded in mathematics and some of whom are not. I feel that this echoes, more elegantly, a statement I’m fond of making about teaching at different levels. In high school we lie to the students and we don’t tell them that we’re lying. In college we lie to them and tell them that we’re lying. In graduate school we don’t lie to them.

How long does it take to observe that 10 is more common than 9 with three dice?

A classical problem in probability, which I learned about from Freedman, Pisani, and Purves’ book Statistics (pp. 238-240 in the fourth edition) is that of Galileo’s dice. Galileo was asked why, when rolling three fair dice, a sum of ten occurs more often than a sum of nine; he answered this question in Concerning an Investigation on Dice (from the University of York’s history of statistics page). It had previously been argued that since

10 = 6 + 3 + 1 = 6 + 2 + 2 = 5 + 4 + 1 = 5 + 3 + 2 = 4 + 4 + 2 = 4 + 3 + 3

and

9 = 6 + 2 + 1 = 5 + 3 + 1 = 5 + 2 + 2 = 4 + 4 + 1 = 4 + 3 + 2 = 3 + 3 + 3

and therefore there are six ways to get a sum of 9 and six ways to get a sum of 10, then both should be equally likely. But Galileo writes that

Nevertheless, although 9 and 12 can be made up in as many ways as 10
and 11, and therefore they should be considered as being of equal utility to these, yet
it is known that long observation has made dice-players consider 10 and 11 to be more
advantageous than 9 and 12.

Galileo observes that, for example, 6 + 3 + 1 (or any other sum with three different summands) corresponds to six different outcomes of the dice. If we distinguish the dice by calling them, say, red, white, and green (Galileo was Italian), then a red 6, white 3, green 1 is different than red 1, white 6, green 3, and so on; there are six such orders. There are similarly three different ways to roll 6 + 2 + 2 (or any sum with two of one summand and one of another) and one way to roll 3 + 3 + 3. So the number of ways to roll a 10 is 6 + 3 + 6 + 6 + 3 + 3 = 27 and the number of ways to roll a 9 is 6 + 6 + 3 + 3 + 6 + 1 = 25; so 10 is slightly more likely, as had been observed.

In a review of Joseph Mazur’s What’s Luck Got to Do with It? I mention that to tell that 10 occurs more often than 9, empirically, requires thousands of rolls. I haven’t seen the calculation written out explicitly and I was teaching this problem recently; it seems like it would be a good idea to write it out.

So say that I bet on the sum of three dice coming up 10, and you bet on it coming up 9. If the sum is 10, you’ll pay me one dollar; if the sum is 9, I’ll pay you one dollar; in any other case there is no payment. Then define a random variable $X$ , which is my net winnings on a single roll; we have $P(X=1) = 27/216$ (the probability of rolling 10), $P(X=-1) = 25/216$ (the probability of rolling 9), and $P(X=0) = 164/216$ (the probability of rolling something else).

Then it’s not hard to compute that the mean of X is 27/216 – 25/216 = 2/216 ≈ 0.0093, and the variance is $E(X^2)-E(X)^2 = 52/216 - (2/216)^2 = 0.2406$ . So after n rolls, for large n, the distribution of my net winnings is approximately normal with mean 0.0093n and variance 0.2406n, by the central limit theorem.

How large does this have to be for, say, the chance of my net winnings being positive to be 95%? The chances that my net winnings are negative are about

$\Phi \left( {-0.0093n \over \sqrt{0.2406n}} \right) = \Phi \left( -0.0189 \sqrt{n} \right)$

and for this to be 0.05 we need $0.0189\sqrt{n} = \Phi^{-1}(1-0.05) = 1.645$ . So $n = (1.645/0.0189)^2 \approx 7600$ . From this I usually conclude in class (somewhat facetiously) that the players of these dice games had too much time on their hands. (In practice I suppose that the observation that 10 coming up more often than 9 came from tabulation of many players playing this game, not a single player playing many thousands of times.)

The probability that seven people have different first initials

Last week I looked at some data collected by Rick Wicklin to determine if first and last initials are independent. (Conclusion: no.) This brought to mind a question closer to home. Literally. I live in a house with seven people. We all have different first initials. What’s the probability of that? (This is good to know if, say, you want to write a single initial on your food and assure it doesn’t get eaten; this might be the context in which I noticed it. Although I may have noticed it because we have a spreadsheet that magically keeps track of household expenses, I don’t remember.)

The quick-and-dirty way to solve this is, of course, simulation. The distribution of first initials from Wicklin’s data can be found in R. Assuming the matrix of counts by first and last initial is x, we can get the vector of frequencies of first initials
firsts = rep(0,26); for(i in 1:26){firsts[i]=sum(x[i,1:26])}
and we can extract the distribution explicitly:

j	m	s	d	c	b	k	a	r	l	t	p	e	g	w	n	h	v	f	y	i	o	z	x	u	q
579	403	390	387	342	306	294	289	264	240	230	166	134	125		75	62	61	42	40	23	15	11	10	7	5	2

I won’t explicitly use the distribution, but if you’re curious: R has a built-in vector letters which contains the letters of the alphabet. letters[order(-firsts)] puts the letters in the order coming from sorting the frequencies of first initials in descending order; that gives the first row. The second row is just sort(firsts, T). I follow Howard Wainer’s dictum of not listing in alphabetical order, a practice he memorably calls “Alabama first”.

Then to take a sample of size 7 with replacement from this distribution – to simulate my house – we can run
sample(1:26, 7, replace=TRUE, prob=freqs)
where I’m just sampling from the vector 1 to 26 because it’s easier that way. Some samples (the first six off the presses) are
16 11 18 10 7 10 10 13 18 3 7 18 13 1 10 12 3 4 4 23 13 8 6 13 19 11 4 4 13 3 2 20 14 3 2 5 1 10 1 26 19 20
which correspond to the septuples of initials PKRJGJJ, MRCGRMA, JLCDDWM, HFMSKDD, MCBTNCB, EAJAZST. In particular each of these has at least one repeated initial, so we start to get the sense that seven people chosen at random having all different initials is relatively rare.
To get the length of such a sample we can call it s and run length(table(s)) – table generates a frequency table, and s gives its length. (This may or may not be the fastest way.)

So the single line of R (alright, for an obnoxiously literal definition of “line”)
x = rep(0,7); for(s in 1:10^6){i = length(table(sample(1:26, 7, replace=T, prob=freqs))); x[i]=x[i]+1}
gives the frequency table of the number of different first initials in samples of size 7, over a million simulations. The resulting table is

number of different birthdays	1	2	3	4	5	6	7
number of runs	2	213	7636	80396	300451	424405	186897

In particular the probability of having seven different first initials is around 0.187. In comparison, the “traditional” birthday problem (where all birthdays are equally likely… well, except for February 29, which I noticed earlier this week when I taught the birthday problem on February 29) gives us that the probability of seven people having all different initials is

${(26)(25)(24)(23)(22)(21)(20) \over 26^7} = 0.41277$

and so collisions really do become much less likely. A collision among seven people is about as likely as it would be if there were fifteen possible different initials, all equally likely. The probability of this is $(15!)/(8! \times 15^7) \approx 0.1898$ . And there are other senses in which there are “effectively” only about fifteen possible first initials. For example, if $p_i$ is the probability that a random person’s first initial is the $i$th letter, then $(\sum_i p_i^2)^{-1} \approx 0.0710$; this number, roughly $1/14$ , is the probability that two people chosen at random from the population has the same birthday.

In fact, a naive approximation to the probability that no two of these seven people have the same first initial comes as follows: there are ${7 \choose 2} = 21$ pairs of people. Each pair has probability $\sum_i p_i^2 = 0.0710$ of coincidence. So the expected number of coincidences is $21(0.0710) = 1.491$ . If we assume that the distribution of the number of coincidences is Poisson (which it’s not!), the probability of at least one coincidence is $exp(-1.491) = 0.225$ . Not bad. (The same model gives $e^{-21/26} \approx 0.446$ for the uniform distribution, where the correct answer is 0.413.

Alternatively, the Shannon entropy of the distribution of first initials is $2.80$, which is the same as the Shannon entropy of a uniform distribution on a set of size $e^{2.80} \approx 16.4$. (You know, if such a set existed.)