[Offtopic] Line fitting [was Re: Numpy outlier removal]
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Jan 7 20:23:31 EST 2013
On Mon, 07 Jan 2013 22:32:54 +0000, Oscar Benjamin wrote:
> An example: Earlier today I was looking at some experimental data. A
> simple model of the process underlying the experiment suggests that two
> variables x and y will vary in direct proportion to one another and the
> data broadly reflects this. However, at this stage there is some
> non-normal variability in the data, caused by experimental difficulties.
> A subset of the data appears to closely follow a well defined linear
> pattern but there are outliers and the pattern breaks down in an
> asymmetric way at larger x and y values. At some later time either the
> sources of experimental variation will be reduced, or they will be
> better understood but for now it is still useful to estimate the
> constant of proportionality in order to check whether it seems
> consistent with the observed values of z. With this particular dataset I
> would have wasted a lot of time if I had tried to find a computational
> method to match the line that to me was very visible so I chose the line
> visually.
If you mean:
"I looked at the data, identified that the range a < x < b looks linear
and the range x > b does not, then applied least squares (or some other
recognised, objective technique for fitting a line) to the data in that
linear range"
then I'm completely cool with that. That's fine, with the understanding
that this is the first step in either fixing your measurement problems,
fixing your model, or at least avoiding extrapolation into the non-linear
range.
But that is not fitting a line by eye, which is what I am talking about.
If on the other hand you mean:
"I looked at the data, identified that the range a < x < b looked linear,
so I laid a ruler down over the graph and pushed it around until I was
satisfied that the ruler looked more or less like it fitted the data
points, according to my guess of what counts as a close fit"
that *is* fitting a line by eye, and it is entirely subjective and
extremely dodgy for anything beyond quick and dirty back of the envelope
calculations[1]. That's okay if all you want is to get something within
an order of magnitude or so, or a line roughly pointing in the right
direction, but that's all.
[...]
> I also think it would
> be highly foolish to go so far with refusing to eyeball data that you
> would accept the output of some regression algorithm even when it
> clearly looks wrong.
I never said anything of the sort.
I said, don't fit lines to data by eye. I didn't say not to sanity-check
that your straight-line fit is reasonable by eyeballing it.
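A rough sketch of such a check, assuming a line has already been fitted by some objective method: compute the residuals over the whole data set and look at them (or plot them). The data and the candidate slope and intercept below are hypothetical, and the residuals helper is just an illustrative name.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the value predicted by the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data and a hypothetical candidate fit, y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 10, 9, 9, 8]
res = residuals(xs, ys, slope=2.0, intercept=0.0)

# Print x and residual side by side so any systematic drift is visible.
for x, r in zip(xs, res):
    print(x, r)
```

Residuals that scatter around zero suggest the line is reasonable over that range. A run that is all one sign, like the last three points here, says the line should not be trusted there.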
[1] Or if your data is so accurate and noise-free that you hardly have to
care about errors, since there clearly is one and only one straight line
that passes through all the points.
--
Steven