Why are correlations between -1 and 1?

So everyone knows that correlation coefficients are between -1 and 1. The (Pearson) correlation coefficient of x_1, \ldots, x_n and y_1, \ldots, y_n is given by

r = \sum_{i=1}^n {1 \over n}\left( {x_i - \mu_x \over \sigma_x} \right) \left( {y_i - \mu_y \over \sigma_y} \right)

where \mu_x, \mu_y are the means of the x_i and y_i, and \sigma_x, \sigma_y are their (population) standard deviations. Alternatively, after some rearrangement this is

r = {{x_1 y_1 + \cdots + x_n y_n \over n} - \mu_x \mu_y \over \sigma_x \sigma_y}

which is more convenient for calculation, but in my opinion less convenient for understanding. The correlation coefficient is positive when (x_i-\mu_x)/\sigma_x and (y_i-\mu_y)/\sigma_y usually have the same sign, meaning that larger-than-average values of x go with larger-than-average values of y, and negative when the signs tend to be mismatched.
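
If you want to check numerically that the two forms agree, here's a quick sketch in Python (the data set is made up purely for illustration; note that numpy's std defaults to the population standard deviation used here):

    import numpy as np

    # Made-up data, just to compare the two formulas for r.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()   # population standard deviations

    # First form: the average product of the standardized values.
    r1 = np.mean((x - mu_x) / sigma_x * ((y - mu_y) / sigma_y))

    # Second form: (mean of the products minus product of the means) / (product of the SDs).
    r2 = ((x * y).mean() - mu_x * mu_y) / (sigma_x * sigma_y)

    print(r1, r2, np.corrcoef(x, y)[0, 1])   # all three agree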

But why should this be between -1 and 1? That’s not at all obvious from just looking at the formula. From a very informal survey of the textbooks lying around my office, if a text defines random variables then it gives a proof in terms of them. For example, Pitman, Probability, p. 433, has the following proof (paraphrased): Say X and Y are random variables and X^* = (X-E(X))/SD(X), Y^* = (Y-E(Y))/SD(Y) are their standardizations. First define correlation for random variables as Corr(X,Y) = (E(XY)-E(X)E(Y))/(SD(X)SD(Y)). Simple properties of random variables give Corr(X,Y) = E(X^* Y^*). Then observe that E(X^{*2}) = E(Y^{*2}) = 1 and look at

0 \le E[(X^*-Y^*)^2] = 1+1-2E(X^* Y^*)

and rearrange to get that E(X^* Y^*) \le 1. Similarly, looking at X^* + Y^* gives E(X^* Y^*) \ge -1. Finally, the correlation of a data set is just the correlation of a pair of random variables (X, Y) that takes each value (x_i, y_i) with probability 1/n, so the same bounds apply.
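
Writing out the rearrangement for both bounds, using linearity of expectation and E(X^{*2}) = E(Y^{*2}) = 1:

0 \le E[(X^* - Y^*)^2] = 2 - 2E(X^* Y^*), \qquad 0 \le E[(X^* + Y^*)^2] = 2 + 2E(X^* Y^*),

and the first gives E(X^* Y^*) \le 1 while the second gives E(X^* Y^*) \ge -1.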

This is all well and good if you’re introducing random variables. But one of the texts I’m teaching from this semester (Freedman, Pisani, and Purves, Statistics) doesn’t, and the other (Moore, McCabe, and Craig, Introduction to the Practice of Statistics) introduces the correlation for sets of bivariate data before it introduces random variables. These texts just baldly state that r is between -1 and 1 always — but of course some students ask why.

The inequality we’re talking about is an inequality involving sums of products: it’s really |Cov(X,Y)| \le SD(X) SD(Y). And that reminded me of the Cauchy-Schwarz inequality. But how do you prove Cauchy-Schwarz for people who haven’t taken linear algebra? Wikipedia comes to the rescue. We only need the special case in \mathbb{R}^n, where Cauchy-Schwarz reduces to

\left( \sum_{i=1}^n u_i v_i \right)^2 \le \left( \sum_{i=1}^n u_i^2 \right) \left( \sum_{i=1}^n v_i^2 \right)

for any real numbers u_1, u_2, \ldots, u_n, v_1, v_2, \ldots, v_n. And the proof at Wikipedia is simple: look at the polynomial (in z)

(u_1 z + v_1)^2 + (u_2 z + v_2)^2 + \cdots + (u_n z + v_n)^2.

This is a quadratic in z, at least when the u_i aren’t all zero (if they all are, both sides of the inequality we want are zero and there’s nothing to prove). As a sum of squares of real numbers it’s nonnegative, so it has at most one real root, and therefore its discriminant is nonpositive. But we can write it as

(u_1^2 + \cdots + u_n^2) z^2 + 2(u_1 v_1 + \cdots + u_n v_n) z + (v_1^2 + \cdots + v_n^2)

and so its discriminant is

4(u_1 v_1 + \cdots + u_n v_n)^2 - 4 (u_1^2 + \cdots + u_n^2) (v_1^2 + \cdots + v_n^2)

and this being nonpositive is exactly the form of Cauchy-Schwarz we needed.
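
If you want to double-check that discriminant computation without expanding by hand, here's a quick symbolic check in Python with sympy, for n = 3 (the variable names are just for this check):

    import sympy as sp

    # The quadratic in z from above, for n = 3, and the claimed discriminant.
    z, u1, u2, u3, v1, v2, v3 = sp.symbols('z u1 u2 u3 v1 v2 v3')
    p = (u1*z + v1)**2 + (u2*z + v2)**2 + (u3*z + v3)**2
    claimed = 4*(u1*v1 + u2*v2 + u3*v3)**2 \
              - 4*(u1**2 + u2**2 + u3**2)*(v1**2 + v2**2 + v3**2)

    print(sp.simplify(sp.discriminant(p, z) - claimed) == 0)   # True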

To show that this implies that the correlation coefficient is in [-1, 1]: let’s say we have the data (x_1, y_1), \ldots, (x_n, y_n) and we’d like to compute the correlation between the x_i and the y_i. The correlation doesn’t change if we shift the data or rescale it by positive factors, so let u_i be the standardized x_i and v_i the standardized y_i. Then we want the correlation of (u_1, v_1), \ldots, (u_n, v_n). But this is just

{u_1 v_1 + \cdots + u_n v_n \over n}.

By Cauchy-Schwarz we know that

(u_1 v_1 + \cdots + u_n v_n)^2 \le (u_1^2 + \cdots + u_n^2) (v_1^2 + \cdots + v_n^2)

and the right-hand side is n^2: since the u_i are standardized, (u_1^2 + \cdots + u_n^2)/n is their variance, which is 1, so u_1^2 + \cdots + u_n^2 = n, and similarly for the v_i. Therefore

(u_1 v_1 + \cdots + u_n v_n)^2 \le n^2

and dividing through by n^2 gives that the square of the correlation is bounded above by 1, which is what we wanted.
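
Here's the whole argument as a short numerical sketch (the data are simulated purely for illustration): standardize the data, and Cauchy-Schwarz pins the correlation into [-1, 1].

    import numpy as np

    # Simulated data, just for illustration.
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 0.5 * x + rng.normal(size=50)

    n = len(x)
    u = (x - x.mean()) / x.std()   # standardized x's (population SD)
    v = (y - y.mean()) / y.std()   # standardized y's

    lhs = np.dot(u, v) ** 2              # (u_1 v_1 + ... + u_n v_n)^2
    rhs = np.dot(u, u) * np.dot(v, v)    # (sum of u_i^2)(sum of v_i^2), which is n^2
    r = np.dot(u, v) / n                 # the correlation coefficient

    print(lhs <= rhs, np.isclose(rhs, n**2))   # Cauchy-Schwarz, and the right-hand side is n^2
    print(r, -1 <= r <= 1)

Equality in Cauchy-Schwarz happens exactly when the u_i and v_i are proportional, which is why r = \pm 1 only when the data points lie exactly on a line.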

So now I have something to tell my students other than “you need to know about random variables”, which is always nice. Not that it would kill them to know about random variables. But I’m finding that intro stat courses are full of these black boxes that some students will accept and some want to open.

6 thoughts on “Why are correlations between -1 and 1?”

    1. Fair enough. To standardize a set of numbers $(x_1, x_2, \ldots, x_n)$ you just subtract their mean and then divide by their standard deviation — so you get numbers that indicate how far above or below the mean they are, in units of their standard deviation. Similarly for random variables.

  1. There’s a small error in the notation in the Pitman proof. In particular, what we really want is

    0 \le E[(X^* - Y^*)^2] = 1 + 1 - 2E[X^* Y^*]

    The way it's written, it looks as though the entire expectation is squared (which gives a trivial and useless result), whereas we really want the expression within the expectation to be squared.

  2. Thanks a lot from France for the part with random variables. I don’t know whether the Cauchy-Schwarz inequality works in that case, but your demonstration is nice without it. Of course, as you wrote in this article, the case with n data values and no random variables works well with it.
