[Offtopic] Line fitting [was Re: Numpy outlier removal]
steve+comp.lang.python at pearwood.info
Tue Jan 8 03:06:35 CET 2013
On Tue, 08 Jan 2013 06:43:46 +1100, Chris Angelico wrote:
> On Tue, Jan 8, 2013 at 4:58 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Anyone can fool themselves into placing a line through a subset of non-
>> linear data. Or, sadly more often, *deliberately* cherry picking fake
>> clusters in order to fool others. Here is a real world example of what
>> happens when people pick out the data clusters that they like based on
>> visual inspection:
> And sensible people will notice that, even drawn like that, it's only a
> ~0.6 deg increase across ~30 years. Hardly statistically significant,
Well, I don't know about "sensible people", but the magnitude of an effect
has little to do with whether or not it is statistically significant.
Given noisy data, statistical significance relates to
whether or not we can be confident that the effect is *real*, not whether
it is a big effect or a small effect.
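To make the distinction concrete, here is a minimal sketch using only the standard library. It fits an ordinary least-squares line and reports the slope's t-statistic, which is the slope divided by its standard error. The data are simulated, and the parameters are arbitrary assumptions chosen for illustration: a tiny trend measured 360 times with little noise, compared with a trend fifty times steeper measured only four times with a lot of noise. The function name `slope_and_t` is hypothetical.

```python
import math
import random


def slope_and_t(xs, ys):
    """Ordinary least-squares slope of ys on xs, plus its t-statistic.

    The t-statistic is slope / standard error. A |t| well above about 2
    suggests the trend is unlikely to be pure noise, whatever its size.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite t-statistic.
    t = math.inf if se == 0 else slope / se
    return slope, t


rng = random.Random(2013)

# Tiny trend, modest noise, many points (e.g. 30 years of monthly readings).
xs_small = list(range(360))
ys_small = [0.02 * x + rng.gauss(0, 0.1) for x in xs_small]

# Trend fifty times steeper, but huge noise and only four points.
xs_big = list(range(4))
ys_big = [1.0 * x + rng.gauss(0, 50) for x in xs_big]

print("small effect:", slope_and_t(xs_small, ys_small))
print("big effect:  ", slope_and_t(xs_big, ys_big))
```

With these settings the tiny trend typically comes out with a t-statistic in the hundreds, while the steep but noisy trend is usually indistinguishable from zero.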
Here's an example: assume that you are on a fixed salary with a constant
weekly income. If you happen to win the lottery one day, and consequently
your income for that week quadruples, that is a large effect that fails
to have any statistical significance -- it's a blip, not part of any long-
term change in income. You can't conclude that you'll win the lottery
every week from now on.
On the other hand, if the government changes the rules relating to tax,
deductions, etc., even by a small amount, your weekly income might go
down, or up, by a single dollar. Even though that is a tiny effect, it is
*not* a blip, and will be statistically significant. In practice, it
takes a certain number of data points to reach that confidence level.
Your accountant, who knows the tax laws, will conclude that the change is
real immediately, but a statistician who sees only the pay slips may take
some months before she is convinced that the change is signal rather than
noise. With only three weeks' pay slips in hand, the statistician cannot
be sure that the difference is not just some accounting error or other
fluke, but each additional data point increases the confidence that the
difference is real and not just some temporary aberration.
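The pay-slip scenario can be sketched the same way. This is illustrative only: it assumes, hypothetically, that each pay slip carries a couple of dollars of random week-to-week variation (overtime, rounding, corrections), compares a year of pay slips from before the one-dollar change with a growing number from after it, and uses Welch's t-statistic as a stand-in for whatever test the statistician actually applies. The constants `BASE` and `NOISE_SD` are made-up values.

```python
import math
import random
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t-statistic for the difference in means (after - before)."""
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se


rng = random.Random(42)
BASE = 1000.0    # hypothetical weekly pay before the tax change
CHANGE = 1.0     # the one-dollar change from the text
NOISE_SD = 2.0   # hypothetical week-to-week noise on the pay slips

# One year of pay slips before the change.
before = [BASE + rng.gauss(0, NOISE_SD) for _ in range(52)]

# Pay slips after the change arrive one week at a time.
after = []
for week in range(1, 157):
    after.append(BASE + CHANGE + rng.gauss(0, NOISE_SD))
    if week in (3, 13, 26, 52, 156):
        print(f"{week:3d} weeks after the change: t = {welch_t(before, after):.2f}")
```

The effect stays at one dollar throughout; only the number of pay slips grows, and the t-statistic tends to grow with it because the standard error of the difference shrinks as more weeks accumulate.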
The other meaning of "significant" has nothing to do with statistics, and
everything to do with "a difference is only a difference if it makes a
difference". 0.2° per decade doesn't sound like much, not when we
consider daily or yearly temperatures that typically have a range of tens
of degrees between night and day, or winter and summer. But that view
misunderstands the nature of long-term climate versus daily weather, and
glosses over the fact that we're talking only about an average while
ignoring changes to the variability of the climate: a small increase in
the average can lead to a large increase in extreme events.
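The point about extremes can be checked with a little arithmetic. As a simplifying assumption, not a claim about real climate data, treat daily temperature anomalies as normally distributed with unit standard deviation, shift only the mean, and count a day as "extreme" when it lies more than two standard deviations above the old average. The helper `prob_above` is hypothetical.

```python
import math


def prob_above(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for a normal distribution with the given mean and sd."""
    z = (threshold - mean) / sd
    # Upper tail of the normal distribution via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


threshold = 2.0  # "extreme": two standard deviations above the old average
baseline = prob_above(threshold)
for shift in (0.0, 0.2, 0.5):
    p = prob_above(threshold, mean=shift)
    print(f"mean shifted by {shift:.1f} sd: P(extreme) = {p:.4f} ({p / baseline:.2f}x)")
```

Under these assumptions, shifting the mean by half a standard deviation roughly triples the chance of an extreme day, even though the variability itself is unchanged.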
> given that weather patterns have been known to follow cycles at least
> that long.
That is not a given. "Weather patterns" don't last for thirty years.
Perhaps you are talking about climate patterns? In which case, well, yes,
we can see a very strong climate pattern of warming on a time scale of
decades, with no evidence that it is a cycle.
There are, of course, many climate cycles that take place on a time frame
of years or decades, such as the North Atlantic Oscillation and the El
Nino Southern Oscillation. None of them are global, and as far as I know
none of them are exactly periodic. They are noise in the system, and
certainly not responsible for linear trends.