[Python-Dev] range objects in 3.x
Fernando Perez
fperez.net at gmail.com
Wed Sep 28 22:55:27 CEST 2011
On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:
> The audience for numpy is a small minority of Python users, and they
Certainly, though I'd like to mention that scientific computing is a major
success story for Python, so hopefully it's a minority with something to
contribute <wink>
> tend to be more sophisticated. I'm sure they can cope with two functions
> with different APIs <wink>
No problem with having different APIs, but in that case I'd hope the
builtin wouldn't be named linspace, to avoid confusion. In numpy/scipy we
try hard to avoid collisions with existing builtin names; hopefully in
this case we can prevent the reverse by having a dialogue.
> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of the
> linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in standard
> Python it should return an iterator or something like a range object.
Sure, no problem there.
> * Why does num have a default of 50? That seems to be an arbitrary
> choice.
Yup. linspace was modeled after matlab's identically named command:
http://www.mathworks.com/help/techdoc/ref/linspace.html
but I have no idea why the author went with 50 instead of 100 as the
default (not that 100 is any better, just that it was matlab's choice).
Given how linspace is often used for plotting, 100 is arguably a more
sensible choice to get reasonable graphs on normal-resolution displays at
typical sizes, absent adaptive plotting algorithms.
> * It arbitrarily singles out the end point for special treatment. When
> integrating, it is just as common for the first point to be singular as
> the end point, and therefore needing to be excluded.
Numerical integration is *not* the focus of linspace(): in numerical
integration, if an end point is singular you have an improper integral and
*must* approach the singularity much more carefully than by simply
dropping the last point and hoping for the best. Whether you can get away
with using (desired_end_point - very_small_number) --the dumb, naive
approach-- or not depends a lot on the nature of the singularity.
Since numerical integration is a complex and specialized domain and the
subject of an entire subcomponent of the (much bigger than numpy) scipy
library, there's no point in arguing the linspace API based on numerical
integration considerations.
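To make that concrete, here's a pure-Python toy (my own illustration, not
anything from numpy/scipy): the exact integral of 1/sqrt(1-x) over [0, 1]
is 2, yet a fixed-step rule applied to [0, 1 - eps] gives answers that
depend heavily on the choice of eps:

```python
import math

def trapezoid(f, a, b, n):
    # Plain composite trapezoid rule with n panels on [a, b];
    # no special handling of endpoint singularities.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The integral of 1/sqrt(1 - x) on [0, 1] is exactly 2, but the
# integrand blows up at x = 1.  Stopping "just short" of the
# singularity gives answers that depend heavily on how short:
f = lambda x: 1.0 / math.sqrt(1.0 - x)
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, trapezoid(f, 0.0, 1.0 - eps, 1000))
```

The three results disagree noticeably with one another, which is the
point: there is no universally safe "very small number" to subtract, so
the singularity has to be handled properly, not dodged.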
Now, I *suspect* (but don't remember for sure) that the option to have it
right-hand-open-ended was to match the mental model people have for range:
In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm not arguing this was necessarily a good idea, just my theory on how it
came to be. Perhaps R. Kern or one of the numpy lurkers in here will
pitch in with a better recollection.
> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1. , 1.33333333, 1.66666667, 2. ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1. , 1.25, 1.5 , 1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.
I find it very natural. It's important to remember that *the whole point*
of linspace's existence is to provide arrays with a known, fixed number of
points:
In [17]: npts = 10
In [18]: len(linspace(0, 5, npts))
Out[18]: 10
In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10
So the invariant to preserve is *precisely* the number of points, not the
step size. As Guido has pointed out several times, the value of this
function is precisely to steer people *away* from thinking of step sizes
in a context where they are more likely than not going to get it wrong.
So linspace focuses on a guaranteed number of points, and lets the
step-size chips fall where they may.
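A pure-Python sketch of that counting behaviour (mine, not numpy's
actual implementation, which also takes care to hit the endpoint
exactly; assumes num > 1):

```python
def linspace(start, stop, num=50, endpoint=True):
    # The invariant is the number of points returned (num), never the
    # step: dropping the endpoint divides the same span into num
    # subintervals instead of num - 1, so the step shrinks.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]

print(linspace(1, 2, 4))                  # [1.0, 1.33..., 1.66..., 2.0]
print(linspace(1, 2, 4, endpoint=False))  # [1.0, 1.25, 1.5, 1.75]
```

Both calls return exactly 4 points, reproducing the arrays quoted above.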
> * The retstep argument changes the return signature from => array to =>
> (array, number). I think that's a pretty ugly thing to do. If linspace
> returned a special iterator object, the step size could be exposed as an
> attribute.
Yup, it's not pretty but understandable in numpy's context, a library that
has a very strong design focus around arrays, and numpy arrays don't have
writable attributes:
In [20]: a = linspace(0, 10)
In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/fperez/<ipython-input-21-ded7f1198857> in <module>()
----> 1 a.stepsize = 0.1
AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'
So while not the most elegant solution (and I agree that with a different
return object a different approach can be taken), I think it's a practical
compromise that works well for numpy.
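For illustration, here's roughly what such a return object could look
like in pure Python (the class name and API are hypothetical, just to
show step-as-attribute instead of retstep):

```python
class LinSpace:
    # Hypothetical iterable that a builtin linspace() could return.
    # Unlike an ndarray, a plain Python object can carry the computed
    # step as an attribute, so no (array, step) tuple return is needed.
    def __init__(self, start, stop, num=50, endpoint=True):
        self.start, self.stop, self.num = start, stop, num
        self.step = (stop - start) / ((num - 1) if endpoint else num)

    def __len__(self):
        return self.num

    def __iter__(self):
        return (self.start + i * self.step for i in range(self.num))

sp = LinSpace(0, 1, 5)
print(sp.step)   # 0.25 -- an attribute, not a second return value
print(list(sp))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```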
> * I'm not sure that start/end/count is a better API than
> start/step/count.
Guido has argued this point quite well, I think, but let me add that many
years of experience and millions of lines of numerical code beg to
differ. start/end/count is *precisely* the right api for this problem,
and exposing step directly is very much the wrong thing to do here.
I should add that numpy does provide an 'arange' function that matches
the built-in range() api, but returns an array instead of a list/
iterator. It does allow floating-point steps, but its docstring carries
the following warning about them:
Docstring:
arange([start,] stop[, step,], dtype=None, maskna=False)
Return evenly spaced values within a given interval.
Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
but returns a ndarray rather than a list.
When using a non-integer step, such as 0.1, the results will often not
be consistent. It is better to use ``linspace`` for these cases.
# END docstring
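The underlying hazard is easy to reproduce in pure Python. This is a
sketch of the failure mode (naive repeated addition, not numpy's actual
arange implementation): rounding error accumulates enough to change how
many points you get.

```python
def frange(start, stop, step):
    # Naive float range over the half-open interval [start, stop),
    # accumulating by repeated addition -- the classic failure mode.
    vals = []
    x = start
    while x < stop:
        vals.append(x)
        x += step
    return vals

vals = frange(0.0, 1.0, 0.1)
print(len(vals))   # 11, not the 10 you might expect: ten 0.1's sum
print(vals[-1])    # to 0.9999999999999999, which is still < 1.0
```

With linspace the count is an explicit argument, so this whole class of
surprise goes away.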
> * This one is pure bike-shedding: I don't like the name linspace.
Sure, in numpy's case it was chosen purely to make existing matlab users
more comfortable, I think. I don't particularly like it either (I don't
come from a matlab background myself), FWIW.
I do hope, though, that the chosen name is *not*:
- 'interval'. In mathematics an interval is determined solely by its
endpoints, and contains *all* elements between them in the underlying
ordered set.
- 'interpolate' or similar: numerical interpolation is a whole 'nother
topic and I think this name would be more likely to confuse people
expecting function interpolation than anything.
But thanks for looking into this, and I do hope that feedback from the
numpy/scipy users and accumulated experience is useful.
Cheers,
f