From r/dataisbeautiful by alekazam13, a chart of how many people live in regions where the zip codes start with each digit. The distribution is surprisingly uneven.
To give some context: zip codes in the US (what other countries would call “postal codes”) have five digits. The first digit corresponds to one of ten regions of the US (these regions don’t exist anywhere outside of the zip code system), the next two to a postal service sorting center, and the last two to an individual post office. There are something like 40,000 zip codes, and of course 100,000 possible ten-digit numbers. Here’s a map of the regions, public domain from Wikipedia:
Of course, ZIP codes were invented some time ago; they were introduced in 1963! So what if we use 1960 census figures? There was some imbalance, but perhaps less than there is today.
|population (millions), 2019||53.5||46.8||40.6||33.2||32.9||32.7||23.8||23.7||23.7||17.3|
|population (millions), 1960||21.2||17.9||17.0||28.5||16.6||25.2||6.2||18.0||16.6||12.1|
The population of the US right now is about 332 million; so in an ideal world you’d have 33.2 million in each bucket. If you want to divide the US into ten regions, all made up of adjacent states (we’ll make some exceptions for Alaska, Hawaii, and Puerto Rico), and it’s 1963, you run into a problem pretty quickly. The population of the US at that time was 179 million. New England (that’s the six states in yellow to the far northeast; note that New York and New Jersey are not part of New England) had 10.4 million people, not enough to plausibly call it a tenth of the country. New York (labeled “10-14” above) had 16.8 million, nearly a tenth all on its own. The only sensible decision was to skip over New York and add New Jersey. (Why New York didn’t get its own first digit but has to share with Pennsylvania, I don’t know.). The “5” and especially “8” regions had very low populations back then; these were quite rural parts of the country and I assume had more post offices per capita. But since then we figured out how to get people to live in the desert.
I suspect if the system were invented today, then, the regions would look a bit different. In particular:
- California would get its own region (at 39 million, it’s fully 12% of the US population)
- Texas (29 million) would get a region nearly to itself, sharing with Oklahoma (4 million)
- Georgia plus Florida (11 + 21) would be a region – these are two states that have grown quite a bit since 1960.
- The six New England states (15) plus New York (19) would be a region
As is usual with these sorts of things, you nibble around the edges and then there end up being lots of ways to divide the middle of the country, none of which are any good. (I tried.). The rough design criteria seem to be:
- divide the country into ten sets of states of roughly equal population;
- such that each region is contiguous (probably Alaska and Hawaii should be in the same regions as Washington and California?, respectively, and Puerto Rico with somewhere on the East Coast);
- and such that the regions don’t “look funny”, but what does that even mean?
In other words, the criteria for forming congressional districts or similar. (Without gerrymandering.)
Ultimately with computerized sorting having a system where zip codes are “interpretable” doesn’t really matter. And the bureaucracy of the Postal Service came up with a different solution than the bureaucracy of Bell Telephone, which was inventing area codes at not all that different a time. Area codes seem random, although with the design principle that more populous areas get codes with smaller sums of digits, which took less time to dial. My understanding is that similar area codes were deliberately put far apart geographically, in order to reduce confusion. I’ve never actually seen that written down, though.