Reverse Anscombe

At Cross Validated, someone asked why they get wildly different histograms from the same data. The user Glen_b gave an excellent answer built around an example in which data sets that differ from each other only by an added constant have very different-looking histograms. Other commenters suggest using kernel density estimates or cumulative distribution plots, neither of which would fail on this particular example.
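To make the effect concrete, here is a minimal sketch in Python. The six data values, the bin width of 1, and the helper `bin_counts` are all made up for illustration; this is not Glen_b's example, only a toy with the same flavor. The bin edges stay fixed at the integers while the data are shifted by 0.5.

```python
from collections import Counter
from math import floor


def bin_counts(values, origin=0.0, width=1.0, n_bins=4):
    """Count values in fixed bins [origin + k*width, origin + (k+1)*width).

    Hypothetical helper for this illustration; values that fall outside
    the n_bins bins are silently ignored.
    """
    counts = Counter(floor((v - origin) / width) for v in values)
    return [counts.get(k, 0) for k in range(n_bins)]


# Made-up data: pairs of points straddling the integer bin edges.
data = [0.9, 1.1, 1.9, 2.1, 2.9, 3.1]
# The "same" data, shifted by a constant.
shifted = [v + 0.5 for v in data]

original_hist = bin_counts(data)
shifted_hist = bin_counts(shifted)

print(original_hist)  # [1, 2, 2, 1]  (peaked in the middle)
print(shifted_hist)   # [0, 2, 2, 2]  (flat)
```

The gaps between consecutive sorted values are identical in the two data sets, so an empirical cumulative distribution plot of one is just a translation of the other. Only the binning makes them look different.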

Anscombe’s quartet comes to mind: four bivariate data sets with the same mean and variance of each coordinate and the same correlation, which look wildly different when plotted. This is sort of a reverse-Anscombe: here data sets that are essentially the same, differing only by a constant shift, produce wildly different-looking summaries once they are binned into histograms.
