[Python-ideas] Float range class

Thu Jan 8 23:14:24 CET 2015

On Thu, Jan 8, 2015 at 2:05 PM, Chris Barker <chris.barker at noaa.gov> wrote:

>> I would think that a floating range class would necessarily use
>> multiplication rather than repeated addition (to allow indexing at
>> arbitrary points), which would avoid cumulative floating-point errors
>> (although it would still have a smaller floating point error at the end),
>> and for the same reason the final value would have to be pre-computed
>> rather than using a naive ">=", which would allow it to be a bit smarter.
>>
>
> That's the trick -- range() (and arange) does not specify a final value,
> it specifies a final value not to include. This is well defined and easy to
> understand for integers, but not so for floating point. But you are right
> about the multiplication and pre-computing of final value -- that's a good
> reason to provide this as a built-in -- it's very easy to implement, but
> even easier to implement badly.
>
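The quoted point about repeated addition versus multiplication is easy to demonstrate with a step of 0.1. The helpers `by_addition` and `by_multiplication` below are hypothetical names used only for this illustration:

```python
def by_addition(start, step, n):
    """Return n values by repeatedly adding step; rounding errors accumulate."""
    values = []
    x = start
    for _ in range(n):
        values.append(x)
        x += step
    return values


def by_multiplication(start, step, n):
    """Return n values as start + i*step; each value is rounded independently."""
    return [start + i * step for i in range(n)]


print(by_addition(0.0, 0.1, 11)[-1])        # 0.9999999999999999
print(by_multiplication(0.0, 0.1, 11)[-1])  # 1.0
```

Computing each value as `start + i*step` stops the error from growing with `i`. Each value is still rounded, though, and that remaining problem is discussed below.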

The implementation based on `start + i*step` doesn't solve the problem of
"unexpected" off-by-one sequences ("unexpected", that is, for those not
familiar with the vagaries of floating point calculations).

Suppose frange uses the "half open" convention, and consider

    frange(start=0.3, stop=0.9, step=0.3)

With perfect math, it should be [0.3, 0.6]. But check:

    >>> start = 0.3
    >>> stop = 0.9
    >>> step = 0.3
    >>> [start + i*step for i in range(4) if start + i*step < stop]
    [0.3, 0.6, 0.8999999999999999]
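
Pre-computing the length, as suggested in the quoted message, does not help here either. Below is a minimal sketch, assuming the half-open convention and a length computed as `math.ceil((stop - start) / step)`. The class name `FRange` is hypothetical; it is not an existing or proposed API. Because `0.9 - 0.3` evaluates to `0.6000000000000001`, the computed length is 3 rather than 2:

```python
import math


class FRange:
    """Hypothetical half-open float range: start + i*step for 0 <= i < len."""

    def __init__(self, start, stop, step):
        if step == 0:
            raise ValueError("step must not be zero")
        self.start, self.stop, self.step = start, stop, step
        # Pre-compute the number of values, mimicking range()'s half-open rule.
        self._len = max(0, math.ceil((stop - start) / step))

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        # Values come from multiplication, so no error accumulates across items.
        if i < 0:
            i += self._len
        if not 0 <= i < self._len:
            raise IndexError("FRange index out of range")
        return self.start + i * self.step


r = FRange(0.3, 0.9, 0.3)
print(len(r))   # 3, not the mathematically expected 2
print(list(r))  # [0.3, 0.6, 0.8999999999999999]
```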

Now suppose frange uses the "closed" convention, and consider

    frange(start=0.4, stop=1.2, step=0.4)

Here we naively expect [0.4, 0.8, 1.2].  Take a look at what we get:

    >>> start = 0.4
    >>> stop = 1.2
    >>> step = 0.4
    >>> [start + i*step for i in range(4) if start + i*step <= stop]
    [0.4, 0.8]

That's because the third value is actually 1.2000000000000002:

    >>> [start + k*step for k in range(3)]
    [0.4, 0.8, 1.2000000000000002]
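
The root cause is that 0.4 and 1.2 are not exactly representable in binary floating point. As a diagnostic sketch only (not a suggestion for how frange should work), repeating the closed-interval calculation with exact rational arithmetic from the standard `fractions` module gives the naively expected result:

```python
from fractions import Fraction

# Construct from strings so the values are exactly 2/5 and 6/5,
# not the nearest binary floats.
start = Fraction("0.4")
stop = Fraction("1.2")
step = Fraction("0.4")

values = [start + i * step for i in range(4) if start + i * step <= stop]
print([float(v) for v in values])  # [0.4, 0.8, 1.2]

# By contrast, Fraction(0.4) captures the float's exact binary value,
# which is slightly larger than 2/5.
print(Fraction(0.4) > Fraction(2, 5))  # True
```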

I've fixed at least two bugs in scipy caused by exactly this kind of naive
use of numpy's arange function.  The fix is to use np.linspace, as Chris
explained.  These days, I only use np.arange with integers (or integral
floating point values);  if I want a uniformly spaced sequence of floats
with a non-integer step, I use np.linspace.  If I don't want the last
point, I use the `endpoint=False` argument of `np.linspace` (e.g. `x =
np.linspace(0, 1, num=4, endpoint=False)` generates [0, 0.25, 0.5, 0.75]).
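
For readers without NumPy at hand, the count-based approach can be sketched in pure Python. The `linspace` function below is a hypothetical stand-in that follows the documented meaning of `np.linspace`'s `num` and `endpoint` parameters. It is not NumPy's actual implementation:

```python
def linspace(start, stop, num, endpoint=True):
    """Return num evenly spaced floats from start toward stop.

    With endpoint=True the interval is closed and the last value is stop
    itself; with endpoint=False, stop is excluded.
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return []
    div = (num - 1) if endpoint else num
    if div == 0:
        return [float(start)]
    step = (stop - start) / div
    values = [start + i * step for i in range(num)]
    if endpoint:
        values[-1] = float(stop)  # avoid a last value like 1.2000000000000002
    return values


print(linspace(0, 1, num=4, endpoint=False))  # [0.0, 0.25, 0.5, 0.75]
print(linspace(0.4, 1.2, num=3)[-1])          # 1.2
```

The caller states how many points they want, and the spacing is derived from that count. This inverts arange's approach, where the count is derived from a spacing.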

Warren

(Chris, sorry for the duplicate email.  Once again I forgot to "reply all".)

>>> But, at least in my own experience, I use arange when I want an
>>> interval-based range, and linspace when I want a count-based range.
>>>
>>
> I would argue (and do!) that you should not do this -- if you know what
> you are doing with FP, then fine, but it really is tricky. You would be
> better off computing the count you want and then using linspace anyway. I
> suppose an interval-based API to something like linspace would be a nice
> convenience, though.
>
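An interval-based convenience of the kind suggested here could compute the count first and then space the points evenly. In the sketch below, `frange_closed` and its `rel_tol` parameter are hypothetical. It rounds `(stop - start) / step` to the nearest integer. If the interval is not close to a whole number of steps, it raises an error rather than silently gaining or losing a point:

```python
import math


def frange_closed(start, stop, step, rel_tol=1e-9):
    """Hypothetical closed-interval float range computed via a point count."""
    if step == 0:
        raise ValueError("step must not be zero")
    n_steps = (stop - start) / step
    n = round(n_steps)
    if n < 0 or not math.isclose(n_steps, n, rel_tol=rel_tol, abs_tol=rel_tol):
        raise ValueError("stop - start is not a whole number of steps")
    if n == 0:
        return [float(start)]
    # Same idea as linspace(start, stop, n + 1): divide the interval evenly
    # and pin the final value to stop.
    delta = (stop - start) / n
    values = [start + i * delta for i in range(n + 1)]
    values[-1] = float(stop)
    return values


print(len(frange_closed(0.4, 1.2, 0.4)))  # 3
print(frange_closed(0.4, 1.2, 0.4)[-1])   # 1.2
```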
> I haven't managed to come up with a quick and easy example where this
> matters, but they DO happen.
>
> I guess I'm arguing that a range-like object for FP should be a closed
> rather than open interval -- specifying the starting and end points. That
> is because defining an open interval with numbers of finite precision,
> where it is hard to know in advance what the interval actually contains,
> is just too ugly and complex.
>
> I see your point that sometimes you want a specific delta, and sometimes
> you want a specific end point, but I suspect that most of the time you want
> a specific delta you ALSO want a specific end point and/or want to know how
> many values you are going to get.
>
> In fact, in the most common use of integer range, you really are defining
> the number of values you want.
>
> And note that the range convention of starting at zero and not including
> the stop value was designed to match python indexing convention, i.e.:
>
> for i in range(len(sequence)):
>     ...
>
> is natural and easy to write, and does what's expected. And also:
>
> for i in range(n):
>     ....
>
> will loop n times.
>
> but neither of these apply to floating point ranges.
>
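The integer properties described above can be checked directly. The illustrative sketch below also shows why the corresponding count calculation is fragile for floats:

```python
# Integers: a half-open range has exactly stop - start elements,
# and adjacent ranges tile without overlap or gaps.
a, b, c = 2, 5, 9
assert len(range(a, b)) == b - a
assert list(range(a, b)) + list(range(b, c)) == list(range(a, c))

# Floats: the "number of steps" from 0.3 to 0.9 with step 0.3 is not exactly 2,
# so a count derived from it can be off by one.
n_steps = (0.9 - 0.3) / 0.3
print(n_steps)  # 2.0000000000000004
```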
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>