The probability that seven people have different first initials

Last week I looked at some data collected by Rick Wicklin to determine if first and last initials are independent. (Conclusion: no.) This brought to mind a question closer to home. Literally. I live in a house with seven people. We all have different first initials. What’s the probability of that? (This is good to know if, say, you want to write a single initial on your food and ensure it doesn’t get eaten; this might be the context in which I noticed it. Although I may have noticed it because we have a spreadsheet that magically keeps track of household expenses; I don’t remember.)

The quick-and-dirty way to solve this is, of course, simulation. The distribution of first initials from Wicklin’s data can be found in R. Assuming the matrix of counts by first and last initial is x, we can get the vector of frequencies of first initials
firsts = rep(0,26); for(i in 1:26){firsts[i]=sum(x[i,1:26])}
and we can extract the distribution explicitly:

j m s d c b k a r l t p e g w n h v f y i o z x u q
579 403 390 387 342 306 294 289 264 240 230 166 134 125 75 62 61 42 40 23 15 11 10 7 5 2

I won’t explicitly use the distribution, but if you’re curious: R has a built-in vector letters which contains the letters of the alphabet. letters[order(-firsts)] puts the letters in the order that comes from sorting the frequencies of first initials in descending order; that gives the first row. The second row is just sort(firsts, decreasing=TRUE). I follow Howard Wainer’s dictum of not listing in alphabetical order, a practice he memorably calls “Alabama first”.
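If you don’t have Wicklin’s matrix x loaded, the two rows above are enough to rebuild the vector. What follows is a small sketch, not the code used for the post: the counts are copied from the table, and the names ranked, top_letters and top_counts are introduced here purely for illustration.

```r
# Counts of first initials, copied from the table above (most common first).
ranked <- setNames(
  c(579, 403, 390, 387, 342, 306, 294, 289, 264, 240, 230, 166, 134,
    125, 75, 62, 61, 42, 40, 23, 15, 11, 10, 7, 5, 2),
  strsplit("jmsdcbkarltpegwnhvfyiozxuq", "")[[1]])

# Reorder alphabetically so that firsts[i] is the count for the i-th letter.
firsts <- ranked[letters]

# The two rows of the table: letters by descending frequency, then the counts.
top_letters <- letters[order(-firsts)]
top_counts <- sort(unname(firsts), decreasing = TRUE)
```

Keeping firsts in alphabetical order is what lets a sampled index such as 10 be read straight back as the tenth letter, J.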

Then to take a sample of size 7 with replacement from this distribution – to simulate my house – we can run
sample(1:26, 7, replace=TRUE, prob=firsts)
where I’m just sampling from the vector 1 to 26 because it’s easier that way. Some samples (the first six off the presses) are

16 11 18 10 7 10 10
13 18 3 7 18 13 1
10 12 3 4 4 23 13
8 6 13 19 11 4 4
13 3 2 20 14 3 2
5 1 10 1 26 19 20

which correspond to the septuples of initials PKRJGJJ, MRCGRMA, JLCDDWM, HFMSKDD, MCBTNCB, EAJAZST. In particular each of these has at least one repeated initial, so we start to get the sense that seven people chosen at random having all different initials is relatively rare.
To get the number of different initials in such a sample we can call it s and run length(table(s)); table generates a frequency table, and length gives the number of entries in it, that is, the number of distinct values in s. (This may or may not be the fastest way.)

So the single line of R (alright, for an obnoxiously literal definition of “line”)
counts = rep(0,7); for(s in 1:10^6){i = length(table(sample(1:26, 7, replace=TRUE, prob=firsts))); counts[i]=counts[i]+1}
gives the frequency table of the number of different first initials in samples of size 7, over a million simulations. The resulting table is

number of different initials 1 2 3 4 5 6 7
number of runs 2 213 7636 80396 300451 424405 186897
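For anyone who wants to rerun this, here is a somewhat tidier sketch of the same simulation. It makes a few assumptions of its own: the counts are typed in from the table of initials earlier in the post (in alphabetical order), the seed is arbitrary, and it uses 10^5 runs rather than 10^6 to keep it quick, so the numbers will differ slightly from the table above.

```r
# First-initial counts for a..z, copied from the table earlier in the post.
firsts <- c(289, 306, 342, 387, 134, 40, 125, 61, 15, 579, 294, 240, 403,
            62, 11, 166, 2, 264, 390, 230, 5, 42, 75, 7, 23, 10)

set.seed(1)     # arbitrary seed, only so the run is reproducible
n_sims <- 1e5   # fewer runs than the 10^6 used in the post

# For each simulated household, count the distinct initials among 7 people.
distinct <- replicate(
  n_sims,
  length(unique(sample(1:26, 7, replace = TRUE, prob = firsts))))

# tabulate() keeps empty bins, so counts always has 7 entries.
counts <- tabulate(distinct, nbins = 7)
proportions <- counts / n_sims
```

Here proportions[7] estimates the probability that all seven initials differ. Note that sample() rescales prob internally, so raw counts work as weights.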

In particular the probability of having seven different first initials is around 0.187. In comparison, the “traditional” birthday problem (where all birthdays are equally likely… well, except for February 29, which I noticed earlier this week when I taught the birthday problem on February 29) gives us that the probability of seven people having all different initials is

$\frac{(26)(25)(24)(23)(22)(21)(20)}{26^7} \approx 0.41277$

and so with the real distribution of initials a collision is much more likely than the uniform model suggests. In fact, seven people all having different initials is about as likely as it would be if there were only fifteen possible initials, all equally likely; the probability of that is $15!/(8! \times 15^7) \approx 0.1898$. And there are other senses in which there are “effectively” only about fifteen possible first initials. For example, if $p_i$ is the probability that a random person’s first initial is the $i$th letter, then $\sum_i p_i^2 \approx 0.0710$; this number, roughly 1/14, is the probability that two people chosen at random from the population have the same first initial.
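The simulation isn’t strictly necessary: the probability that k independent draws are all different is k! times the k-th elementary symmetric polynomial of the $p_i$, which a short loop can compute exactly. This is a sketch that goes a step beyond the post, not the method used above; the function name prob_all_distinct is made up for illustration, and the counts are again typed in from the table of initials.

```r
# First-initial counts for a..z, copied from the table earlier in the post.
firsts <- c(289, 306, 342, 387, 134, 40, 125, 61, 15, 579, 294, 240, 403,
            62, 11, 166, 2, 264, 390, 230, 5, 42, 75, 7, 23, 10)
p <- firsts / sum(firsts)

# P(k draws all distinct) = k! * e_k(p), where e_k is the k-th elementary
# symmetric polynomial, built up one probability at a time.
prob_all_distinct <- function(p, k) {
  e <- c(1, numeric(k))   # e[j + 1] holds e_j of the probabilities seen so far
  for (q in p) {
    # Go downwards so e[j] is still the old value when e[j + 1] is updated.
    for (j in k:1) e[j + 1] <- e[j + 1] + q * e[j]
  }
  factorial(k) * e[k + 1]
}

exact     <- prob_all_distinct(p, 7)                # real initials
uniform26 <- prob_all_distinct(rep(1 / 26, 26), 7)  # 26 equally likely
uniform15 <- prob_all_distinct(rep(1 / 15, 15), 7)  # 15 equally likely
```

The two uniform cases reproduce the falling-factorial formulas, which is a useful sanity check on the loop.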

In fact, a naive approximation to the probability that no two of these seven people have the same first initial comes as follows: there are ${7 \choose 2} = 21$ pairs of people. Each pair has probability $\sum_i p_i^2 \approx 0.0710$ of coincidence. So the expected number of coincidences is 21(0.0710) = 1.491. If we assume that the distribution of the number of coincidences is Poisson (which it’s not!), the probability of no coincidence is $\exp(-1.491) \approx 0.225$. Not bad, compared with the simulated 0.187. (The same model gives $e^{-21/26} \approx 0.446$ for the uniform distribution, where the correct answer is 0.413.)
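The arithmetic in this approximation is easy to check in R. The sketch below assumes the counts from the table of initials earlier in the post, entered in alphabetical order; the variable names are illustrative only.

```r
# First-initial counts for a..z, copied from the table earlier in the post.
firsts <- c(289, 306, 342, 387, 134, 40, 125, 61, 15, 579, 294, 240, 403,
            62, 11, 166, 2, 264, 390, 230, 5, 42, 75, 7, 23, 10)
p <- firsts / sum(firsts)

match_prob <- sum(p^2)          # chance that two random people share an initial
n_pairs    <- choose(7, 2)      # 21 pairs among seven people
expected   <- n_pairs * match_prob   # expected number of coinciding pairs

# Poisson approximation to P(no coincidence at all).
poisson_real    <- exp(-expected)
poisson_uniform <- exp(-n_pairs / 26)  # same model with 26 equally likely initials

effective_size <- 1 / match_prob   # roughly 14 "effective" initials
```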

Alternatively, the Shannon entropy of the distribution of first initials is $2.80$, which is the same as the Shannon entropy of a uniform distribution on a set of size $e^{2.80} \approx 16.4$. (You know, if such a set existed.)
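The entropy figure can be checked the same way. This is a sketch under the same assumptions as before (counts typed in from the table of initials, natural logarithms, made-up variable names).

```r
# First-initial counts for a..z, copied from the table earlier in the post.
firsts <- c(289, 306, 342, 387, 134, 40, 125, 61, 15, 579, 294, 240, 403,
            62, 11, 166, 2, 264, 390, 230, 5, 42, 75, 7, 23, 10)
p <- firsts / sum(firsts)

# Shannon entropy in nats; every p is positive, so log(p) is finite.
entropy <- -sum(p * log(p))

# Size of the uniform distribution with the same entropy.
effective_size <- exp(entropy)
```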

30 thoughts on “The probability that seven people have different first initials”

  1. Munford (TAS, 1977) showed that ANY deviation from uniform frequencies increases the probability of a match (or, in your language, decreases the probability of no match). This is not very noticeable for the standard Birthday Problem because even though the empirical distribution of birthdays is not uniform (see http://blogs.sas.com/content/iml/2011/09/09/the-most-likely-birthday-in-the-us/), the deviation from uniformity is relatively small. Consequently, the empirical probabilities of a match are not very different from the probabilities obtained by assuming a uniform distribution. As you’ve noticed, however, the distribution of initials is FAR from uniform, so there is a big difference between empirical estimates (obtained via simulation) and the exact probabilities assuming uniformity.

