[Python-Dev] range objects in 3.x
Fernando Perez
fperez.net at gmail.com
Wed Sep 28 22:55:27 CEST 2011
On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:
> The audience for numpy is a small minority of Python users, and they
Certainly, though I'd like to mention that scientific computing is a major
success story for Python, so hopefully it's a minority with something to
contribute <wink>
> tend to be more sophisticated. I'm sure they can cope with two functions
> with different APIs <wink>
No problem with having different APIs, but in that case I'd hope the
builtin wouldn't be named linspace, to avoid confusion. In numpy/scipy we
try hard to avoid collisions with existing builtin names; hopefully in
this case we can prevent the reverse by having a dialogue.
> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of the
> linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in standard
> Python it should return an iterator or something like a range object.
Sure, no problem there.
> * Why does num have a default of 50? That seems to be an arbitrary
> choice.
Yup. linspace was modeled after matlab's identically named command:
http://www.mathworks.com/help/techdoc/ref/linspace.html
but I have no idea why the author went with 50 instead of 100 as the
default (not that 100 is any better, just that it was matlab's choice).
Given how linspace is often used for plotting, 100 is arguably a more
sensible choice to get reasonable graphs on normal-resolution displays at
typical sizes, absent adaptive plotting algorithms.
> * It arbitrarily singles out the end point for special treatment. When
> integrating, it is just as common for the first point to be singular as
> the end point, and therefore needing to be excluded.
Numerical integration is *not* the focus of linspace(): in numerical
integration, if an end point is singular you have an improper integral and
*must* approach the singularity much more carefully than by simply
dropping the last point and hoping for the best. Whether you can get away
with using (desired_end_point - very_small_number) --the dumb, naive
approach-- or not depends a lot on the nature of the singularity.
Since numerical integration is a complex and specialized domain and the
subject of an entire subcomponent of the (much bigger than numpy) scipy
library, there's no point in arguing the linspace API based on numerical
integration considerations.
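To make that concrete, here's a pure-Python toy (my own illustration, not
anything from numpy/scipy): the exact integral of 1/sqrt(1-x) over [0, 1]
is 2, yet a fixed-step rule applied to [0, 1 - eps] gives answers that
depend heavily on the choice of eps:

```python
import math

def trapezoid(f, a, b, n):
    # Plain composite trapezoid rule with n panels on [a, b];
    # no special handling of endpoint singularities.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The integral of 1/sqrt(1 - x) on [0, 1] is exactly 2, but the
# integrand blows up at x = 1.  Stopping "just short" of the
# singularity gives answers that depend heavily on how short:
f = lambda x: 1.0 / math.sqrt(1.0 - x)
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, trapezoid(f, 0.0, 1.0 - eps, 1000))
```

The three results disagree noticeably with one another, which is the
point: there is no universally safe "very small number" to subtract, so
the singularity has to be handled properly, not dodged.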
Now, I *suspect* (but don't remember for sure) that the option to have it
right-hand-open-ended was to match the mental model people have for range:
In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm not arguing this was necessarily a good idea, just my theory on how it
came to be. Perhaps R. Kern or one of the numpy lurkers in here will
pitch in with a better recollection.
> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1. , 1.33333333, 1.66666667, 2. ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1. , 1.25, 1.5 , 1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.
I find it very natural. It's important to remember that *the whole point*
of linspace's existence is to provide arrays with a known, fixed number of
points:
In [17]: npts = 10
In [18]: len(linspace(0, 5, npts))
Out[18]: 10
In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10
So the invariant to preserve is *precisely* the number of points, not the
step size. As Guido has pointed out several times, the value of this
function is precisely to steer people *away* from thinking of step sizes
in a context where they are more likely than not going to get it wrong.
So linspace focuses on a guaranteed number of points, and lets the
step-size chips fall where they may.
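A pure-Python sketch of that counting behaviour (mine, not numpy's
actual implementation, which also takes care to hit the endpoint
exactly; assumes num > 1):

```python
def linspace(start, stop, num=50, endpoint=True):
    # The invariant is the number of points returned (num), never the
    # step: dropping the endpoint divides the same span into num
    # subintervals instead of num - 1, so the step shrinks.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]

print(linspace(1, 2, 4))                  # [1.0, 1.33..., 1.66..., 2.0]
print(linspace(1, 2, 4, endpoint=False))  # [1.0, 1.25, 1.5, 1.75]
```

Both calls return exactly 4 points, reproducing the arrays quoted above.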
> * The retstep argument changes the return signature from => array to =>
> (array, number). I think that's a pretty ugly thing to do. If linspace
> returned a special iterator object, the step size could be exposed as an
> attribute.
Yup, it's not pretty but understandable in numpy's context, a library that
has a very strong design focus around arrays, and numpy arrays don't have
writable attributes:
In [20]: a = linspace(0, 10)
In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/fperez/<ipython-input-21-ded7f1198857> in <module>()
----> 1 a.stepsize = 0.1
AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'
So while not the most elegant solution (and I agree that with a different
return object a different approach can be taken), I think it's a practical
compromise that works well for numpy.
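For illustration, here's roughly what such a return object could look
like in pure Python (the class name and API are hypothetical, just to
show step-as-attribute instead of retstep):

```python
class LinSpace:
    # Hypothetical iterable that a builtin linspace() could return.
    # Unlike an ndarray, a plain Python object can carry the computed
    # step as an attribute, so no (array, step) tuple return is needed.
    def __init__(self, start, stop, num=50, endpoint=True):
        self.start, self.stop, self.num = start, stop, num
        self.step = (stop - start) / ((num - 1) if endpoint else num)

    def __len__(self):
        return self.num

    def __iter__(self):
        return (self.start + i * self.step for i in range(self.num))

sp = LinSpace(0, 1, 5)
print(sp.step)   # 0.25 -- an attribute, not a second return value
print(list(sp))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```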
> * I'm not sure that start/end/count is a better API than
> start/step/count.
Guido has argued this point quite well, I think, but let me add that many
years of experience and millions of lines of numerical code beg to
differ. start/end/count is *precisely* the right api for this problem,
and exposing step directly is very much the wrong thing to do here.
I should add that numpy does provide an 'arange' function that matches
the built-in range() api, but returns an array instead of a list/
iterator. It does allow floating-point steps, but its docstring carries
the following warning about them:
Docstring:
arange([start,] stop[, step,], dtype=None, maskna=False)
Return evenly spaced values within a given interval.
Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
but returns a ndarray rather than a list.
When using a non-integer step, such as 0.1, the results will often not
be consistent. It is better to use ``linspace`` for these cases.
# END docstring
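The underlying hazard is easy to reproduce in pure Python. This is a
sketch of the failure mode (naive repeated addition, not numpy's actual
arange implementation): rounding error accumulates enough to change how
many points you get.

```python
def frange(start, stop, step):
    # Naive float range over the half-open interval [start, stop),
    # accumulating by repeated addition -- the classic failure mode.
    vals = []
    x = start
    while x < stop:
        vals.append(x)
        x += step
    return vals

vals = frange(0.0, 1.0, 0.1)
print(len(vals))   # 11, not the 10 you might expect: ten 0.1's sum
print(vals[-1])    # to 0.9999999999999999, which is still < 1.0
```

With linspace the count is an explicit argument, so this whole class of
surprise goes away.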
> * This one is pure bike-shedding: I don't like the name linspace.
Sure, in numpy's case it was chosen purely to make existing matlab users
more comfortable, I think. I don't particularly like it either (I don't
come from a matlab background myself), FWIW.
I do hope, though, that the chosen name is *not*:
- 'interval'. In mathematics an interval is determined solely by its
endpoints, and contains *all* elements between them in the underlying
ordered set.
- 'interpolate' or similar: numerical interpolation is a whole 'nother
topic and I think this name would be more likely to confuse people
expecting function interpolation than anything.
But thanks for looking into this, and I do hope that feedback from the
numpy/scipy users and accumulated experience is useful.
Cheers,
f