# Explaining banding in a scatterplot of Goldbach’s function

David Radcliffe asks for an explanation of the “bands” in the scatterplot of the number of solutions to p + q = 2n in primes. To give an example, we have

2 × 14 = 28 = 23 + 5 = 17 + 11 = 11 + 17 = 5 + 23
2 × 15 = 30 = 23 + 7 = 19 + 11 = 17 + 13 = 13 + 17 = 11 + 19 = 7 + 23
2 × 16 = 32 = 29 + 3 = 19 + 13 = 13 + 19 = 3 + 29

and so, denoting this function by f, the elements of this sequence corresponding to n = 14, 15, 16 are f(14) = 4, f(15) = 6, and f(16) = 4 respectively. (Note that the counts here are of ordered sums, with 23 + 5 and 5 + 23 both counting; if you use unordered sums everything works out pretty much the same way, since every sum except those like p + p appears twice and I’m going to talk about ratios and inequalities and the like.)
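To make the definition concrete, here is a minimal sketch of how one might compute the ordered count $f(n)$ by brute force. The helper names `primes_up_to` and `goldbach_count` are my own for illustration, not anything from the original plots, and the approach (a plain sieve plus a set lookup) is only one reasonable choice.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return a list of all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Cross off every multiple of i starting from i*i.
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def goldbach_count(n):
    """Number of ORDERED pairs of primes (p, q) with p + q == 2 * n."""
    primes = primes_up_to(2 * n)
    prime_set = set(primes)
    # Each prime p contributes one ordered pair when 2n - p is also prime.
    return sum(1 for p in primes if (2 * n - p) in prime_set)


for n in (14, 15, 16):
    print(n, goldbach_count(n))
```

This reproduces the counts 4, 6, 4 from the worked example above.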

My re-rendering of something similar to the original scatterplot is here:

[Plot: $f(n)$ against $n$]

There are heuristic arguments that $f(n) \approx n/(\log n)^2$, so let’s divide by that to get a plot of the “normalized” number of solutions:

[Plot: $f(n) (\log n)^2 / n$ against $n$]

There are definitely bands in these plots. Indeed the situation for n = 14, 15, 16 is typical: f(n) “tends to be” larger when n is divisible by 3 than when it isn’t. A handwaving justification for this is as follows: consider primes modulo 3. All primes (with the trivial exception of 3) are congruent to 1 or 2 modulo 3, and by the prime number theorem for arithmetic progressions these are equally likely. (For some data on this, see Granville and Martin on prime races, which is a nice expository paper.) So if we add two primes p and q together, there are four equally likely cases:

• p is of form 3n+1, q is of form 3n+1, p+q is of form 3n+2
• p is of form 3n+1, q is of form 3n+2, p+q is of form 3n
• p is of form 3n+2, q is of form 3n+1, p+q is of form 3n
• p is of form 3n+2, q is of form 3n+2, p+q is of form 3n+1

So if we just add primes together, we get multiples of three fully half the time, and the remaining half of the results are evenly split between integers of forms 3n+2 and 3n+1.
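The case analysis above can be checked mechanically. The sketch below is an illustration of mine (the function name `sum_class_counts` is hypothetical): it enumerates ordered pairs of nonzero residues modulo a prime and tallies the residue of each sum. It assumes, as the heuristic does, that every nonzero residue class is equally likely, which only makes sense when the modulus is prime.

```python
from collections import Counter


def sum_class_counts(m):
    """Tally ordered pairs (a, b) of nonzero residues mod m by (a + b) % m.

    Assumes m is prime, so that every nonzero residue class contains primes.
    """
    counts = Counter()
    for a in range(1, m):
        for b in range(1, m):
            counts[(a + b) % m] += 1
    return dict(sorted(counts.items()))


print(sum_class_counts(3))
```

For the modulus 3 this gives two pairs summing to a multiple of 3 and one pair for each of the other two classes, matching the four cases listed above.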

We can make the bands “go away” by plotting, instead of $f(n)$, the function which is $f(n)/2$ when $n$ is divisible by 3 and $f(n)$ otherwise. Call this $f_3(n)$. But there’s still some banding:

[Plot: $f_3(n)$ against $n$]

Naturally we look to the next prime, 5. A given prime is equally likely to be of the form 5n+1, 5n+2, 5n+3, or 5n+4; if we work through the combinations we can see that there are 4 ways to pair these up to get a multiple of 5, and 3 ways to get each of the forms 5n+1, 5n+2, 5n+3, 5n+4. So it seems natural to penalize multiples of 5 by multiplying their $f(n)$ by 3/4; the banding then is even less strong, as you can see below.

[Plot: $f_3(n)$, with multiples of 5 scaled by 3/4, against $n$]

The natural thing to do here is to just iterate over primes. For an odd prime $p$ there are $p-1$ ways to pair up the residue classes $1, 2, \ldots, p-1 \pmod p$ to get the residue class 0 (i.e. multiples of $p$) and $p-2$ ways to get each of the classes $1, 2, \ldots, p-1$. That is, multiples of $p$ are more likely than nonmultiples to be sums of two randomly chosen primes, by a factor of $(p-1)/(p-2)$. (The prime 2 needs no correction: every prime other than 2 is odd, and we only ever look at the even numbers $2n$.) Correcting for this, let’s plot $n$ against $f^*(n) = f(n) \times \left( \prod_{p|n,\ p>2} {p-2 \over p-1} \right)$;

in this case you get the plot below.

[Plot: $f^*(n)$ against $n$]

The lack of banding in this plot is basically the extended Goldbach conjecture. Although I didn’t know this when I started writing, apparently the scatterplot of $f$ is known as Goldbach’s comet: see e.g. Richard Tobin or Ben Vitale or this MathOverflow post.
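As a rough sketch of the correction factor in $f^*$, the code below multiplies $(p-2)/(p-1)$ over the distinct odd primes dividing $n$. The names `odd_prime_factors` and `correction` are hypothetical, trial division is just the simplest factoring method that works at this scale, and the prime 2 is skipped on the assumption that it needs no correction (its factor would otherwise be zero).

```python
from fractions import Fraction


def odd_prime_factors(n):
    """Distinct odd prime factors of a positive integer n, by trial division."""
    factors = []
    while n % 2 == 0:
        n //= 2  # discard all factors of 2; they get no correction
    d = 3
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        factors.append(n)  # whatever remains is a single odd prime
    return factors


def correction(n):
    """Product of (p - 2) / (p - 1) over the distinct odd primes p dividing n."""
    c = Fraction(1)
    for p in odd_prime_factors(n):
        c *= Fraction(p - 2, p - 1)
    return c


# f(14) = 4, f(15) = 6, f(16) = 4 are the counts from the worked example.
for n, f_n in ((14, 4), (15, 6), (16, 4)):
    print(n, f_n * correction(n))
```

For the three worked values this gives $f^*(14) = 4 \times 5/6 = 10/3$, $f^*(15) = 6 \times 3/8 = 9/4$, and $f^*(16) = 4$. Exact fractions are used only to keep the arithmetic transparent; floats would do just as well for plotting.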

And although this is a number-theoretic problem, much of this is an exercise in statistical model fitting; I proceeded by making a plot, checking out the residuals compared to some model to see if there was a pattern, and fitting a new model which accounted for those residuals. However, in this case there was a strong theory backing me up, so this is, thankfully, not a pure data mining exercise.
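One way to do the “check the residuals for a pattern” step without looking at a plot is to compare group means. The sketch below is my own illustration (the names `ordered_goldbach_counts` and `mean_ratio` and the cutoffs `max_n = 2000` and `start = 10` are arbitrary assumptions): it normalizes $f(n)$ by $n/(\log n)^2$ and compares the average for multiples of a prime $q$ with the average for nonmultiples.

```python
import math


def ordered_goldbach_counts(max_n):
    """Return a list f where f[n] counts ordered prime pairs summing to 2n."""
    limit = 2 * max_n
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    primes = [i for i, flag in enumerate(is_prime) if flag]
    f = [0] * (max_n + 1)
    for n in range(2, max_n + 1):
        count = 0
        for p in primes:
            if p >= 2 * n:
                break  # larger primes cannot be part of a sum equal to 2n
            if is_prime[2 * n - p]:
                count += 1
        f[n] = count
    return f


def mean_ratio(f, q, start=10):
    """Mean normalized count over multiples of q, divided by that over the rest."""
    multiples, others = [], []
    for n in range(start, len(f)):
        normalized = f[n] * math.log(n) ** 2 / n
        (multiples if n % q == 0 else others).append(normalized)
    return (sum(multiples) / len(multiples)) / (sum(others) / len(others))


f = ordered_goldbach_counts(2000)
for q in (3, 5):
    print(q, round(mean_ratio(f, q), 3), "heuristic:", round((q - 1) / (q - 2), 3))
```

If the heuristic is right, the ratio for each $q$ should land near $(q-1)/(q-2)$, that is, near 2 for $q = 3$ and near 4/3 for $q = 5$; a ratio well away from 1 is the numerical counterpart of a visible band.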