[Offtopic] Line fitting [was Re: Numpy outlier removal]

Chris Angelico
Mon Jan 7 20:43:46 CET 2013

Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Anyone can fool themselves into placing a line through a subset of non-
> linear data. Or, sadly more often, *deliberately* cherry picking fake
> clusters in order to fool others. Here is a real world example of what
> happens when people pick out the data clusters that they like based on
> visual inspection:
> http://www.skepticalscience.com/images/TempEscalator.gif

And sensible people will notice that, even drawn like that, it shows only
a ~0.6 deg increase across ~30 years. That's hardly statistically
significant, given that weather patterns have been known to follow
cycles at least that long. But that's nothing to do with drawing lines
through points, and more to do with how much data you collect before
you announce a conclusion, and how easily a graph can prove any point
you like.

Statistical analysis is a huge science. So is lying. And I'm not sure
most people can pick one from the other.


