Carl Erickson, the president of a small software company, writes that sick time follows a logarithmic distribution. His terminology is a bit nonstandard, but here’s what he’s saying. Take all the people who have worked for his company and list the amount of sick time they’ve taken, in hours. Sort that list in descending order. Then the xth entry on the list will be about -52 ln x + 236. There were a total of 86 employees.
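To make the fitted curve concrete, here is a minimal Python sketch that evaluates it at a few ranks. The function name `predicted_hours` is my own, and the constants are simply the ones quoted above; nothing here uses Erickson's actual data.

```python
import math

N_EMPLOYEES = 86  # total number of employees quoted above

def predicted_hours(rank):
    """Fitted sick hours for the employee at a given rank (1 = most sick time)."""
    return -52 * math.log(rank) + 236

# Evaluate the fit at the top, near the top, the middle, and the bottom of the list.
for rank in (1, 10, 43, N_EMPLOYEES):
    print(rank, round(predicted_hours(rank), 1))
```

The fit gives 236 hours at the top of the list and a little over 4 hours at rank 86, so it stays positive across the whole range of employees.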
Does this translate into a more standard statement about the distribution of the amount of sick time taken? Let F(z) be the probability that someone took at least z hours of sick time. The xth entry on the sorted list is an amount that exactly x of the 86 employees met or exceeded, so we have F(236 – 52 ln x) = x/86. Let z = 236 – 52 ln x and solve for x in terms of z. This gives x = exp((236 – z)/52). So we get
F(z) = exp((236 – z)/52) / 86
and as commenters at Hacker News pointed out, exp(236/52) is about 94, which is close to 86, so we very roughly have
F(z) = exp(-z/52)
which also gives F(0) = 1, as it must, because amounts of sick time are nonnegative. This is exactly the tail of the exponential distribution, which is memoryless. So sick time is approximately exponential with mean 52 hours.
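As a rough check on how much the approximation costs, the sketch below compares the F(z) that comes straight from the fit with the pure exponential tail. The function names are hypothetical, and the sample values of z are arbitrary choices of mine.

```python
import math

def tail_fitted(z):
    """F(z) taken directly from the fit: exp((236 - z)/52) / 86."""
    return math.exp((236 - z) / 52) / 86

def tail_exponential(z):
    """Tail of an exponential distribution with mean 52 hours."""
    return math.exp(-z / 52)

# The constant that is being approximated by 86.
print(math.exp(236 / 52))

# Compare the two tails at a few arbitrary values of z (in hours).
for z in (0, 52, 104, 208):
    print(z, round(tail_fitted(z), 3), round(tail_exponential(z), 3))
```

The two functions differ by the constant factor exp(236/52)/86, roughly 1.09, at every z. In particular the unapproximated fit has F(0) slightly above 1, which is one sign that the fitted curve is only a rough description near the bottom of the list.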
Is there a good theoretical reason that this should happen, though? The exponential distribution is memoryless, but I don’t see why sick time should be, especially since we’re talking about the total amount of time that people spend sick, not the time they spend dealing with any single illness. Or is this just an example of everything looking linear (with the right variable transformation) if you try hard enough?