Last week I looked at some data collected by Rick Wicklin to determine if first and last initials are independent. (Conclusion: no.) This brought to mind a question closer to home. Literally. I live in a house with seven people. We all have different first initials. What’s the probability of that? (This is good to know if, say, you want to write a single initial on your food and ensure it doesn’t get eaten; this might be the context in which I noticed it. Although I *may* have noticed it because we have a spreadsheet that magically keeps track of household expenses, I don’t remember.)

The quick-and-dirty way to solve this is, of course, simulation. The distribution of first initials from Wicklin’s data can be found in R. Assuming the matrix of counts by first and last initial is `x`, we can get the vector of frequencies of first initials

`firsts = rowSums(x[1:26, 1:26])`

and we can extract the distribution explicitly:

| j | m | s | d | c | b | k | a | r | l | t | p | e | g | w | n | h | v | f | y | i | o | z | x | u | q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 579 | 403 | 390 | 387 | 342 | 306 | 294 | 289 | 264 | 240 | 230 | 166 | 134 | 125 | 75 | 62 | 61 | 42 | 40 | 23 | 15 | 11 | 10 | 7 | 5 | 2 |

I won’t explicitly use the distribution, but if you’re curious: R has a built-in vector `letters` which contains the letters of the alphabet. `letters[order(-firsts)]` puts the letters in the order coming from sorting the frequencies of first initials in descending order; that gives the first row. The second row is just `sort(firsts, decreasing = TRUE)`. I follow Howard Wainer’s dictum of not listing in alphabetical order, a practice he memorably calls “Alabama first”.
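If you don't have the full matrix `x` handy, the row totals can be rebuilt from the table above. This is only a sketch: `by_freq`, `ranked_letters` and `ranked_counts` are names I'm making up here, and the counts are copied by hand from the table rather than computed from Wicklin's data.

```r
# Counts copied from the table above, in the order shown there (most common first).
by_freq <- c(j = 579, m = 403, s = 390, d = 387, c = 342, b = 306, k = 294,
             a = 289, r = 264, l = 240, t = 230, p = 166, e = 134, g = 125,
             w = 75, n = 62, h = 61, v = 42, f = 40, y = 23, i = 15, o = 11,
             z = 10, x = 7, u = 5, q = 2)

# Reindex alphabetically so that position i corresponds to the i-th letter,
# which is how `firsts` is indexed in the post.
firsts <- unname(by_freq[letters])

# Recover the two rows of the table from the alphabetical vector.
ranked_letters <- letters[order(-firsts)]
ranked_counts  <- sort(firsts, decreasing = TRUE)
```

The counts sum to 4502 people in total.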

Then to take a sample of size 7 with replacement from this distribution – to simulate my house – we can run

`sample(1:26, 7, replace=TRUE, prob=firsts)`

where I’m just sampling from the vector 1 to 26 because it’s easier that way. Some samples (the first six off the presses) are

16 11 18 10 7 10 10

13 18 3 7 18 13 1

10 12 3 4 4 23 13

8 6 13 19 11 4 4

13 3 2 20 14 3 2

5 1 10 1 26 19 20

which correspond to the septuples of initials PKRJGJJ, MRCGRMA, JLCDDWM, HFMSKDD, MCBTNCB, EAJAZST. In particular each of these has at least one repeated initial, so we start to get the sense that seven people chosen at random having all different initials is relatively rare.
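Translating a sample of indices back into initials is a one-liner with R's built-in `LETTERS` vector. Here it is on the first sample above (the name `initials` is just for illustration):

```r
# First simulated household from the list above, as indices into the alphabet.
s <- c(16, 11, 18, 10, 7, 10, 10)

# LETTERS is R's built-in vector of upper-case letters; index and glue together.
initials <- paste(LETTERS[s], collapse = "")
initials
```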

To get the number of distinct initials in such a sample we can call it `s` and run `length(table(s))` – `table` generates a frequency table, and `length` gives its length. (This may or may not be the fastest way.)

So the single line of R (alright, for an obnoxiously literal definition of “line”)

`counts = rep(0,7); for(run in 1:10^6){i = length(table(sample(1:26, 7, replace=TRUE, prob=firsts))); counts[i]=counts[i]+1}`

gives the frequency table of the number of different first initials in samples of size 7, over a million simulations. The resulting table is

| number of different initials | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| number of runs | 2 | 213 | 7636 | 80396 | 300451 | 424405 | 186897 |
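For what it's worth, the same tally can be written without an explicit loop. The following is a sketch, not the code that produced the table: `count_distinct` is a hypothetical helper, it uses `unique` instead of `table`, and the demo runs on a uniform distribution over 26 letters with far fewer runs so that it is self-contained.

```r
# Tally how many distinct values appear in each of `runs` samples of size k,
# drawn with replacement according to the weights in `prob`.
count_distinct <- function(prob, k = 7, runs = 10000) {
  n_letters <- length(prob)
  distinct <- replicate(
    runs,
    length(unique(sample(n_letters, k, replace = TRUE, prob = prob)))
  )
  # tabulate() counts occurrences of 1..k, including zeros for unseen values.
  tabulate(distinct, nbins = k)
}

set.seed(1)                                # only to make the demo repeatable
demo_counts <- count_distinct(rep(1, 26))  # uniform weights, 26 letters
demo_counts / sum(demo_counts)             # estimated distribution
```

Passing the vector of first-initial counts as `prob` (with `runs = 10^6`) should give a table like the one above, up to simulation noise.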

In particular the probability of having seven different first initials is around 0.187. In comparison, the “traditional” birthday problem (where all birthdays are equally likely… well, except for February 29, which I noticed earlier this week when I *taught* the birthday problem on February 29), applied to 26 equally likely initials, gives us that the probability of seven people having all different initials is

$\frac{26 \cdot 25 \cdot 24 \cdot 23 \cdot 22 \cdot 21 \cdot 20}{26^7} \approx 0.413$

and so collisions really do become much less likely under uniformity. With the real distribution, a collision among seven people is about as likely as it would be if there were fifteen possible different initials, all equally likely; in that case the probability of no collision is $\frac{15 \cdot 14 \cdots 9}{15^7} \approx 0.190$. And there are other senses in which there are “effectively” only about fifteen possible first initials. For example, if $p_i$ is the probability that a random person’s first initial is the $i$th letter, then $\sum_i p_i^2 \approx 0.0710$; this number, roughly $1/14$, is the probability that two people chosen at random from the population have the same first initial.
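The uniform-case numbers come from the usual birthday-problem product, which is easy to check in R. A small sketch (`uniform_all_distinct` is a name I'm introducing, not anything built in):

```r
# Probability that k people all get different values when each of n_letters
# values is equally likely: (n/n) * ((n-1)/n) * ... * ((n-k+1)/n).
uniform_all_distinct <- function(n_letters, k) {
  prod((n_letters - 0:(k - 1)) / n_letters)
}

p26 <- uniform_all_distinct(26, 7)  # 26 equally likely initials
p15 <- uniform_all_distinct(15, 7)  # 15 equally likely initials
round(c(p26, p15), 3)               # 0.413 0.190
```

The second value is the one to compare with the simulated 0.187.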

In fact, a naive approximation to the probability that no two of these seven people have the same first initial comes as follows: there are $\binom{7}{2} = 21$ pairs of people. Each pair has probability $\sum_i p_i^2 \approx 0.0710$ of coincidence. So the expected number of coincidences is $21 \times 0.0710 \approx 1.49$. If we assume that the distribution of the number of coincidences is Poisson (which it’s not!), the probability of no coincidence is $e^{-1.49} \approx 0.225$, against the simulated 0.187. Not bad. (The same model gives $e^{-21/26} \approx 0.446$ for the uniform distribution, where the correct answer is 0.413.)
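Here is a sketch of the Poisson shortcut next to an exact calculation. The exact version is my own addition, not something used in the post: it relies on the fact that the probability of $k$ distinct values is $k!$ times the $k$th elementary symmetric polynomial of the $p_i$, built up one letter at a time. Both function names are hypothetical.

```r
# Poisson-style approximation: exp(-(number of pairs) * P(a given pair matches)).
poisson_no_match <- function(p, k) {
  p <- p / sum(p)                      # accept raw counts or probabilities
  exp(-choose(k, 2) * sum(p^2))
}

# Exact probability that k draws are all different (requires k >= 1).
exact_no_match <- function(p, k) {
  p <- p / sum(p)
  e <- c(1, rep(0, k))                 # e[j + 1] holds e_j of the letters seen so far
  for (q in p) {
    for (j in k:1) {                   # go downwards so each letter is used at most once
      e[j + 1] <- e[j + 1] + q * e[j]
    }
  }
  factorial(k) * e[k + 1]
}

uniform26 <- rep(1, 26)
round(c(poisson = poisson_no_match(uniform26, 7),
        exact   = exact_no_match(uniform26, 7)), 3)  # 0.446 and 0.413
```

Fed the vector of first-initial counts instead of `uniform26`, the exact function should land close to the simulated 0.187.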

Alternatively, the Shannon entropy of the distribution of first initials is $2.80$, which is the same as the Shannon entropy of a uniform distribution on a set of size $e^{2.80} \approx 16.4$. (You know, if such a set existed.)
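Both “effective number of initials” figures are short computations on the vector of probabilities. A minimal sketch, assuming natural logarithms for the entropy (as the $e^{2.80}$ above implies); `effective_sizes` is a made-up helper name:

```r
# Two ways of turning a distribution into an "effective" number of outcomes.
effective_sizes <- function(p) {
  p <- p / sum(p)                # accept raw counts or probabilities
  p <- p[p > 0]                  # treat 0 * log(0) as 0
  h <- -sum(p * log(p))          # Shannon entropy in nats
  c(entropy = h,
    exp_entropy = exp(h),                 # size of a uniform set with the same entropy
    inverse_collision = 1 / sum(p^2))     # size of a uniform set with the same match probability
}

effective_sizes(rep(1, 26))  # for a uniform distribution both sizes are 26
```

On the first-initial counts, these should correspond to the 16.4 above and to the reciprocal of 0.0710, which is about 14.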

Munford (TAS, 1977) showed that ANY deviation from uniform frequencies increases the probability of a match (or, in your language, decreases the probability of no match). This is not very noticeable for the standard Birthday Problem because even though the empirical distribution of birthdays is not uniform (see http://blogs.sas.com/content/iml/2011/09/09/the-most-likely-birthday-in-the-us/), the deviation from uniformity is relatively small. Consequently, the empirical probabilities of a match are not very different from the probabilities obtained by assuming a uniform distribution. As you’ve noticed, however, the distribution of initials is FAR from uniform, so there is a big difference between empirical estimates (obtained via simulation) and the exact probabilities assuming uniformity.
