What proportion of months contain parts of six calendar weeks?

What proportion of months have that annoying property that, on an old-fashioned paper calendar, the 23rd and 30th, or 24th and 31st, have to be scrunched up into a single box? Or, on a computerized calendar, six rows are necessary? Or, if we don’t want to refer to a particular calendar format, that it contains parts of six (Sunday-to-Saturday) calendar weeks?

For example, consider a 30-day month that starts on a Saturday, like September 2012, which is the next example of this phenomenon:

 Sun Mon Tue Wed Thu Fri Sat 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

31-day months that start on Friday or Saturday also have this property. To identify this month we can quickly code up Zeller’s congruence for the day of the week:

zeller = function(m, d, y){
if(m K = y %% 100;
J = y - 100*K;
h = (d + floor(13*(m+1)/5) + K + floor(K/4) + floor(J/4) - 2*J) %% 7;
h;
}

This returns 0 for Saturday, 1 for Sunday, …, 6 for Friday.

Then put the lengths of the months of the year in a vector:

lengths = c(31,28,31,30,31,30,31,31,30,31,30,31);

(You may object “what about leap year!” — but that doesn’t concern us, as February, even in leap year, can never require six rows.)

The sixweeks function returns TRUE if a month contains parts of six (Sunday to Saturday) calendar weeks, and FALSE otherwise:

sixweeks = function(days, first){
((days == 30) && (first == 0)) || ((days == 31) && (first == 0)) || ((days == 31) && (first == 6))}

Now the Gregorian calendar has a period of 400 years. So we just run over some 400-year period and run sixweeks on every week. The result is a vector containing the number of each month which fall within parts of six calendar weeks, in that 400-year cycle.

counts = rep(0, 12);

for(y in 2000:2399){
for(m in 1:12){
first = zeller(m, 1, y);
days = lengths[m];
counts[m] = counts[m] + sixweeks(days, first);
}
}

The output is (cleaned up a bit, and with the month names inserted):

 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 116 0 112 60 116 56 116 112 56 116 56 116

These numbers add up to 1032 (run sum(counts)). So in each 400-year period, 1032 months out of 4800, or exactly 21.5%, include parts of six calendar weeks. 116 of these months are January, 0 are February, 112 are March, and so on. (If you believe that a calendar week is Monday to Sunday, because you take the dictates of the ISO too seriously or because you’re European, it’s not hard to adapt the sixweeks function to that; instead of 1032 you get 1028.)

Could we have predicted this number without the need for computation? We can come pretty close. Seven out of every 12 months is a 31-day month; of those about two-sevenths should start on a Friday or a Saturday. Similarly, four out of every 12 months is a 30-day months, and one-seventh of those should start on a Saturday. So the probability that a randomly chosen month contains parts of six calendar weeks ought to be quite close to
${7 \over 12} \times {2 \over 7} + {4 \over 12} \times {1 \over 7} = {18 \over 84} \approx 0.214$
and indeed we come pretty close!  In fact this post is historically backwards. I did this calculation first and then went to the computer and wrote the code to check it.