Pythagoras goes linear

Let x_i and y_i be independent and uniform on [0, 1]. Let w_i be the smaller of the two, and let z_i be the larger. Let h_i = \sqrt{w_i^2 + z_i^2}. So (x_i, y_i) is a random point in the unit square, and h_i is its distance from the origin. We can predict this distance using linear regression. For example, in R, we can pick 10^4 such points and execute the code

x <- runif(10^4, 0, 1)   # first coordinate: 10^4 uniform draws on [0, 1]
y <- runif(10^4, 0, 1)   # second coordinate
w <- pmin(x, y)          # elementwise minimum: the shorter leg
z <- pmax(x, y)          # elementwise maximum: the longer leg
h <- sqrt(w^2 + z^2)     # true distance from the origin
lm(h ~ 0 + w + z)        # least-squares fit with no intercept

to fit a linear model of the form h = aw+bz. The least-squares model here is, for this particular simulation, h = 0.4278w + 0.9339z, with R^2 = 0.9995. In other words, the formula

h = 0.4278 \min(x,y) + 0.9339 \max(x,y)

appears to predict h as a linear function of \min(x,y) and \max(x,y) quite well, and so the hypotenuse of a right triangle is approximately 0.4278 times its shorter leg, plus 0.9339 times its longer leg. For a particularly famous special case, try x = 3, y = 4; then we predict the hypotenuse is 0.4278(3) + 0.9339(4) = 5.019, quite close to the true value of 5.
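The same check goes through for a few other famous right triangles; here is a quick sketch in R, reusing the simulated coefficients above:

legs <- rbind(c(3, 4), c(5, 12), c(8, 15))   # legs of some famous right triangles
pred <- 0.4278 * pmin(legs[, 1], legs[, 2]) + 0.9339 * pmax(legs[, 1], legs[, 2])
true <- sqrt(legs[, 1]^2 + legs[, 2]^2)
cbind(pred, true)                            # about 5.02, 13.35, 17.43 vs. 5, 13, 17

The 3-4-5 triangle fares best of the three; the other two predictions are off by between 2 and 3 percent.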

Andrew Gelman and Deborah Nolan, in Teaching Statistics: A Bag of Tricks, give a very similar example, with slightly different numerical parameters, and quip that “if Pythagoras knew about multiple regression, he might never have discovered his famous theorem” (p. 146). They fit a model that is allowed to have a nonzero constant term; I choose to fit a model with zero constant term. I think our anachronistic Pythagoras would have had the sense to observe that if we double x and y, we should double the hypotenuse as well.

The natural question, to me, is to determine the “true” constants. So what constants a and b give the linear function ax + by that best approximates \sqrt{x^2+y^2} when we restrict to 0 < x < y < 1? The reason for the triangular region is that we’re restricting to the case where x is the smaller of the two and y is the larger. To be consistent with our Pythagoras-as-linear-regressor model, we’ll make the approximation in the least-squares sense. So we want to minimize

f(a,b) = \int_0^1 \int_0^y \left( \sqrt{x^2+y^2} - (ax+by) \right)^2 \: dx \: dy

as a function of a and b. This is a calculus problem. Expand the integrand to get

\int_0^1 \int_0^y x^2+y^2+a^2 x^2 + b^2 y^2 + 2ab xy - 2ax \sqrt{x^2+y^2} - 2by \sqrt{x^2+y^2} \: dx \: dy.

The polynomials are easy to integrate; the square-root terms somewhat less so, if it’s been a while since you’ve done freshman calculus.
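The two antiderivatives doing the heavy lifting, each easy to check by differentiating, are

\int x \sqrt{x^2+y^2} \: dx = {1 \over 3} (x^2+y^2)^{3/2}

and

\int \sqrt{x^2+y^2} \: dx = {x \over 2} \sqrt{x^2+y^2} + {y^2 \over 2} \sinh^{-1} {x \over y},

evaluated from x = 0 to x = y; the second is where the \sinh^{-1} 1 below comes from. Putting all the pieces together, we get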

f(a,b) = {1 \over 12} a^2 + {1 \over 4} ab + {1 \over 4} b^2 + {1 \over 3} - C_1 a - C_2 b

where C_1 = (2\sqrt{2}-1)/6 and C_2 = (\sqrt{2}+\sinh^{-1} 1)/4. Differentiating, we get

{\partial \over \partial a} f(a,b) = {1 \over 6} a + {1 \over 4} b - C_1

and

{\partial \over \partial b} f(a,b) = {1 \over 4} a + {1 \over 2} b - C_2.

Set both of these equal to zero and solve to get
a = 24C_1 - 12C_2 = 5 \sqrt{2} - 4 - 3 \sinh^{-1} 1 \approx 0.4269

and

b = -12C_1 + 8C_2 = -2\sqrt{2} + 2 + 2 \sinh^{-1} 1 \approx 0.9343

which are tolerably close to the coefficients that came out of the regression. (Those coefficients had standard errors of 0.0009 and 0.0005 respectively.)
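If you want to check the arithmetic, the closed forms are easy to evaluate in R (asinh is base R’s inverse hyperbolic sine):

C1 <- (2*sqrt(2) - 1)/6                     # from the x*sqrt(x^2+y^2) integral
C2 <- (sqrt(2) + asinh(1))/4                # from the y*sqrt(x^2+y^2) integral
c(a = 24*C1 - 12*C2, b = -12*C1 + 8*C2)     # 0.4269471 0.9343200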

Of course our hypothetical Pythagoras couldn’t have done these integrals, and would not have liked that the answers turn out to be irrational. Perhaps he would have just said that the length of the hypotenuse of a right triangle was three-sevenths of the shorter leg (3/7 ≈ 0.4286), plus fourteen-fifteenths of the longer leg (14/15 ≈ 0.9333).

15 thoughts on “Pythagoras goes linear”

  1. Very nice. This reminds me of an old shortcut that I read while doing graphics work on a 386: sqrt( x^2 + y^2 ) is approximately 1/2 min(x,y) + max(x,y). Translated into your above equations, this is a = 1/2, b = 1.

    This is very easy to pull off when your x and y are integers or fixed-point numbers. I suppose you could pull off fixed-point versions of your coefficients, too.

    One advantage a = 1/2, b = 1 has over a = 0.4269, b = 0.9343, though, is that axis-aligned distances are exact in the former and underestimated in the latter.
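    Both formulas are scale-invariant, so a quick R sketch over the ratio of the legs is enough to compare worst-case relative errors:

    r <- seq(0.001, 1, by = 0.001)    # ratio shorter/longer leg; longer leg = 1
    relerr <- function(a, b) abs(a*r + b - sqrt(r^2 + 1)) / sqrt(r^2 + 1)
    max(relerr(1/2, 1))               # classic shortcut: about 11.8%, at r = 1/2
    max(relerr(0.4269, 0.9343))       # least-squares fit: about 6.5%, as r -> 0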

  2. I tried computing the exact value of the coefficient of determination. I might be wrong, but I get

    \frac{50-30 \sqrt{2}+18 \sinh^{-1}(1)-22 \sqrt{2} \sinh^{-1}(1)+7 \sinh^{-1}(1)^2}{8-4 \sqrt{2} \sinh^{-1}(1)-2 \sinh^{-1}(1)^2}

    This is approximately equal to 0.995607, which has one less 9 than reported in the original post.

  3. I tried a similar computation when you consider the hypotenuse as a function only of the longer side. Then you get r^2 of 90% (correlation is ~95%).

    As a function of only the shorter side, you get r^2 of about 55% (correlation is ~74%). How does that compare with results in a science other than physics?

    (I computed all of these values exactly, but they’re not pretty …)
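    For anyone who would rather simulate than integrate, continuing the R session from the post (plain squared correlations, i.e. allowing an intercept):

    cor(h, z)^2   # longer leg only: r^2 of about 0.90
    cor(h, w)^2   # shorter leg only: r^2 of about 0.55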
