# Random sums of sines and random walks

John Cook, at his Probability Fact Twitter feed (@ProbFact), asked (I’ve cleaned up the notation):

What is the expected amplitude for the sum of $N$ sines with random phase? i.e. sum of $\sin(x + \phi_i)$ where $\phi_i \sim \mathrm{uniform}[0, 2\pi]$

Intuitively one expects something on the order of $\sqrt{N}$, since we’re adding together what are essentially $N$ independent random variables. It’s not too hard to throw together a quick simulation, without even bothering with any trigonometry, and this was my first impulse. This code just picks the $\phi_i$ uniformly at random, and takes the maximum of $f(x) = \sum_i \sin(x + \phi_i)$ over the values of $x$ in $[0, 2\pi]$ which are multiples of $\pi/100$.

    x = (0:200)*pi/100   # grid over [0, 2*pi] in steps of pi/100
    n = 1:100            # numbers of summands to try
    num.samples = 100    # random draws per value of n

    # maximum over the grid of sum_i sin(x + phi_i)
    max.of.sines = function(phi){
      max(rowSums(outer(x, phi, function(x,y){sin(x+y)})))
    }

    # average of that maximum over k random draws of n phases
    mean.of.max = function(n, k){mean(replicate(k, max.of.sines(runif(n, 0, 2*pi))))}
    averages = sapply(n, function(n){mean.of.max(n, num.samples)})

The one tricky part is the matrix that outer builds inside max.of.sines: each column gives the values of a single sine function $\sin(x + \phi_i)$ on the grid, and rowSums adds the columns together to give $f(x)$ at each grid point.
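To see the shape of that matrix, here is a toy version with three grid points and two phases. The names x.small, phi.small, m and f.small are made up for this illustration and are not part of the simulation above.

```r
# Toy inputs (hypothetical, chosen only for illustration)
x.small = c(0, pi/2, pi)
phi.small = c(0, pi/2)

# Entry [r, c] of m is sin(x.small[r] + phi.small[c])
m = outer(x.small, phi.small, function(x, y){sin(x + y)})
dim(m)   # 3 rows (grid points), 2 columns (one per phase)

# Each row sum is sin(x) + cos(x) at that grid point,
# because sin(x + pi/2) = cos(x)
f.small = rowSums(m)
f.small
```

The row sums come out (up to rounding) as 1, 1 and -1, which are the values of $\sin x + \cos x$ at $0$, $\pi/2$ and $\pi$.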

We can then plot the resulting averages and fit a model $y^2 \sim Cx$. I get $C \approx 0.7872$ from my simulation, which is close enough to $\pi/4$ to ring a bell:

    library(ggplot2)   # provides qplot and stat_function

    C = lm(averages^2~n+0)$coefficients

    qplot(n, averages, xlab="n", ylab="mean", main="means for 100 samples") + stat_function(fun = function(x){sqrt(C*x)})

At this point we start thinking theory. If you’re me and you haven’t looked at a trig function in a while, you start at the Wikipedia page, and discover that it actually does all the work for you:

$\sum_i a_i \sin (x + \delta_i) = a \sin (x+\delta)$

where

$a^2 = \sum_{i,j} a_i a_j \cos (\delta_i - \delta_j)$.

That is, the sum of a bunch of sinusoids with a common period is a single sinusoid with the same period, and an amplitude easily calculated from the amplitudes and phases of the original sinusoids. There’s a formula for $\delta$ as well, but it’s not relevant here.

In our case all the $a_i$ are 1 and so we get

$a^2 = \sum_{i, j} \cos (\delta_i - \delta_j)$

If you take the expectation of both sides, and recognize that $E \cos (\delta_i - \delta_j)$ is 1 if $i = j$ (it’s $\cos(0)$) and 0 if $i \ne j$ (the difference of two independent uniform phases is itself uniform modulo $2\pi$, so this is just the average of the cosine function over a period), then you learn $E(a^2) = N$ where $N$ is the number of summands. That agrees with our original guess, and is enough to prove that $E(a) \le \sqrt{N}$ by Jensen’s inequality.
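A quick Monte Carlo check of $E(a^2) = N$, computing $a^2$ straight from the double sum of cosines. This is only a sketch: the seed, the value of N, the number of draws k and the helper name amp.squared are all arbitrary choices of mine.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
N = 50        # number of summands (illustrative choice)
k = 5000      # number of random draws (illustrative choice)

# a^2 = sum over all pairs (i, j) of cos(delta_i - delta_j)
amp.squared = function(delta){ sum(cos(outer(delta, delta, "-"))) }

a2 = replicate(k, amp.squared(runif(N, 0, 2*pi)))

mean(a2)                  # should be close to N
mean(sqrt(pmax(a2, 0)))   # estimate of E(a); pmax guards against rounding below 0
sqrt(N)                   # the Jensen upper bound on E(a)
```

The estimate of $E(a)$ should land noticeably below $\sqrt{N}$, so the Jensen bound is not tight.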

To get the exact value of $E(a)$ we can expand on David Radcliffe’s comment: “Same as mean dist from origin after N unit steps in random directions. Agree with sqrt(N*pi/4)”. In particular, consider a random walk in the complex plane, where the steps are given by $\exp(i \theta_j)$ with each $\theta_j$ uniform on the interval $[0, 2\pi)$. Its position after $N$ steps is

$S_N = \sum_{j=1}^N \exp(i \theta_j) = \sum_{j=1}^N (\cos \theta_j + i \sin \theta_j)$

and so, breaking up into the real and imaginary components,

$|S_N|^2 = \left( \sum_{j=1}^N \cos \theta_j \right)^2 + \left( \sum_{j=1}^N \sin \theta_j \right)^2$.

Rewriting the squared sums as double sums gives

$|S_N|^2 = \left( \sum_{j=1}^N \sum_{k=1}^N \cos \theta_j \cos \theta_k \right) + \left( \sum_{j=1}^N \sum_{k=1}^N \sin \theta_j \sin \theta_k \right)$

and combining the double sums gives

$|S_N|^2 = \sum_{j=1}^N \sum_{k=1}^N (\cos \theta_j \cos \theta_k + \sin \theta_j \sin \theta_k)$

and by the formula for the cosine of a difference we get

$|S_N|^2 = \sum_{j=1}^N \sum_{k=1}^N \cos (\theta_j - \theta_k)$

which is exactly the $a^2$ given above. So the amplitude of our sum of sines is just the distance from the origin in a two-dimensional random walk!
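Here is a small numerical check of that equivalence for one random set of phases, as a sketch rather than a proof. It compares the distance from the origin of the walk, the amplitude from the double-sum formula, and the maximum of the summed sines on a fine grid. The seed, the choice of 10 phases, the grid size and all variable names are my own assumptions.

```r
set.seed(2)                   # arbitrary seed, only for reproducibility
theta = runif(10, 0, 2*pi)    # 10 random phases (illustrative choice)

# Distance from the origin after 10 unit steps in the directions theta
walk.distance = Mod(sum(exp(1i * theta)))

# Amplitude from the double-sum formula for a^2
formula.amplitude = sqrt(sum(cos(outer(theta, theta, "-"))))

# Maximum of the summed sines on a fine grid over one period
x.fine = seq(0, 2*pi, length.out = 100001)
grid.max = max(rowSums(outer(x.fine, theta, function(x, y){sin(x + y)})))

c(walk.distance, formula.amplitude, grid.max)
```

The three numbers should agree to many decimal places; the grid maximum can only fall short of the true amplitude by the grid's resolution.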

It just remains to show that the expected distance from the origin of the random walk with unit steps in random directions after $N$ steps is $\sqrt{\pi N/4}$. A good heuristic demonstration is as follows: clearly the distribution of the position $(X, Y)$ is rotationally invariant, i.e. symmetric around the origin. The coordinate $X$ is the sum of $N$ independent variables each of which is distributed like the cosine of a uniformly chosen angle; that is, each has mean $\mu = {1 \over 2\pi} \int_0^{2\pi} \cos \theta \: d\theta = 0$ and variance $\sigma^2 = {1 \over 2\pi} \int_0^{2\pi} \cos^2 \theta \: d \theta - \mu^2 = 1/2$. So by the central limit theorem the $X$-coordinate after $N$ steps is approximately normally distributed with mean 0 and variance $N/2$. The overall distribution, being rotationally symmetric with normal marginals, ought to be approximately jointly normal with $X$ and $Y$ both having mean 0, variance $N/2$, and uncorrelated; then $\sqrt{X^2 + Y^2}$ is known to be Rayleigh-distributed with scale parameter $\sqrt{N/2}$. A Rayleigh distribution with scale $s$ has mean $s \sqrt{\pi/2}$, so the expected distance is $\sqrt{N/2} \cdot \sqrt{\pi/2} = \sqrt{\pi N/4}$, which finishes the proof modulo that one nasty fact.
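The heuristic is easy to test by simulating the walk directly. The sketch below draws k walks of N steps each and compares the average distance from the origin with $\sqrt{\pi N/4}$, and the sample variance of the $X$-coordinate with $N/2$. The seed, N and k are arbitrary choices of mine, not values from the simulation earlier in the post.

```r
set.seed(3)   # arbitrary seed, only for reproducibility
N = 100       # steps per walk (illustrative choice)
k = 20000     # number of walks (illustrative choice)

# Each column of theta is one walk: N uniformly random step directions
theta = matrix(runif(N * k, 0, 2*pi), nrow = N)

# Final position of each walk, then its distance from the origin
X = colSums(cos(theta))
Y = colSums(sin(theta))
distance = sqrt(X^2 + Y^2)

mean(distance)     # simulated mean distance
sqrt(pi * N / 4)   # Rayleigh prediction, about 8.86 for N = 100
var(X)             # should be close to N/2
```

Storing the angles as a matrix with one walk per column lets colSums do all the summing at once, in the same spirit as the outer and rowSums trick in the first simulation.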