Yesterday I was preparing a lecture where I would need to compute means and standard deviations. I happened to pick five integers such that their mean and the mean of their squares are both integers. (Alternatively, their sum and the sum of their squares were both multiples of 5.) What are the chances of that?

Incidentally, the integers I chose were , so their sum is and their sum-of-squares is , both divisible by 5.

Of course we need some sort of model here. Let’s assume that however we pick integers, they’re uniformly distributed modulo 5 — that is, we have probability 1/5 of choosing a multiple of 5, 1/5 of choosing a number that’s one more than a multiple of 5, and so on. Then the obvious guess is as follows:

(1) the probability that the sum of five randomly chosen integers is divisible by 5 is one-fifth;

(2) the probability that the sum of their squares is divisible by 5 is also one-fifth;

(3) these two events are independent.

As it turns out all of these are true. It’s easy to see that (1) is true. What’s the probability that the sum of n randomly chosen integers is divisible by five? Consider the sum of the first n-1 randomly chosen integers. Whatever this sum is, the probability that we’ll pick an integer that makes the sum of the first n a multiple of five is one-fifth.

However this argument doesn’t work for (2). Look at the squares of the positive integers: they’re 1, 4, 9, 16, 25, and so on. Dividing them by 5 and taking remainders we get . In fact one-fifth of the squares are divisible by 5; two-fifths are one more than a multiple of five; two-fifths are one less than a multiple of five.

So how could a sum of five squares work out to be a multiple of five? It could consist of:

(A) five multiples of five, like . This happens with probability .

(B) five squares which are one more than a multiple of 5, like . This happens with probability .

(C) five squares which are one less than a multiple of 5 — which also happens with probability .

(D) one square which is one more than a multiple of five; one which is one less than a multiple of five; and three multiples of 5, like . The probability of this is the number of ways of choosing which of our five integers is one more than a multiple of 5 and which is one less — that’s — times the probability of getting a square that’s one more than a multiple of five, then one that’s one less, then three multiples of 5, which is $(2/5)(2/5)(1/5)^3 = 4/5^5$, for a total of $80/5^5$.

(E) two squares which are one more than a multiple of five; two which are one less; one multiple of five, like . The probability of this is .

These add up to , or exactly one-fifth! Is there a simple argument for this?

Finally, how can we have a set of integers where both the sum and the sum of squares is a multiple of 5? Working through the cases, the possibilities are either that all five integers have *different* remainders when dividing by 5, which happens with probability , or that all five have the *same* remainder, which happens with probability . And these add up to… .

As it turns out, though, the sum of five random integers mod 5 and the sum of their squares mod 5 are not independent. Twenty-one of the possible twenty-five outcomes (sum, sum of squares) have probability . However, and both have probability $6/125$, and and both have probability $4/125$. I have no idea why things work out this way, as this is just the result of a brute-force calculation.

A couple observations from playing with this problem, for other numbers of integers:

1. Since the sum of two numbers is even exactly when the sum of their squares is even, these two events won’t be independent in general.

2. There are only two ways for a sum of four squares to be divisible by four: all four numbers are even, or all four are odd. The probability that this happens is 1/8, not 1/4. I wrote a program to calculate these probabilities: 1/n is the probability that a sum of n squares is divisible by n only if n is 1, 2, 3, 5, 7, 11, 13, 15, or possibly more than 30. I don’t know what these numbers have in common!

Owen, that’s a nice observation about the parity. A bit of simulation I did yesterday gives probabilities near 2/n^2 for the sum and the sum of squares of n integers both being divisible by n, when n is even. In some sense it looks like the parity is the only real constraint.

Looking at the sequence of numbers you give, I’m kind of reminded of the primes (although why nothing between 15 and 30?) and it wouldn’t surprise me if there’s a characterization of that sequence in terms of prime factorizations. I haven’t had a chance to play with this problem more, though.

I also tried the problem computationally, but unfortunately my results disagree with Owen’s. Specifically, I get that the sum of n squares is divisible by n with probability 1/n if and only if n is 2 or an odd squarefree number.

I have checked up to n = 500. Our results first disagree at n = 17. I assume that is because 17^17 exceeds the capacity of a 64-bit integer.

Speaking of an easy (or rather mechanical) way to calculate the second probability. One could “cheat” and expand (2x^4+2x+1)^5 and add up the coefficients for the 0,5,10,15,20 powers.

For instance http://www.wolframalpha.com/input/?i=%282x%5E4%2B2x%2B1%29%5E5