[Python-ideas] Float range class
Andrew Barnert
abarnert at yahoo.com
Sat Jan 10 11:47:43 CET 2015
On Jan 10, 2015, at 2:17, "M.-A. Lemburg" <mal at egenix.com> wrote:
> On 10.01.2015 09:37, Andrew Barnert wrote:
>> On Jan 9, 2015, at 23:03, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
>>
>>> On Jan 9, 2015, at 21:13, Guido van Rossum <guido at python.org> wrote:
>>>
>>>> - numeric tricks to avoid intermediate overflows or numeric instabilities, plus optimizations
>>>
>>> I'll leave this to others. (If no one steps up, I can look at the numpy code, but I'm sure this list has people who know it off the top of their heads.)
>>
>> Actually, here's what numpy does:
>>
>> step = (stop-start)/float((num-1))
>> y = arange(0, num) * step + start
>> y[-1] = stop
>>
>> De-vectorize that, and it's exactly your too-naive algorithm (except for the off-by-1), with a simple fix for inexact stop.
>>
>> From a quick check, the timing is almost identical (6.89us vs. 6.49us to enumerate linspace(1, 2, 5000)).
>>
>> It doesn't _normally_ accumulate rounding errors. For example, whether I use the same test or much smaller numbers, every 5th element (and sometimes the one immediately after it) is off by one digit, but the intervening elements are identical. However, if you get down to denormal values, then the underflow _does_ accumulate. For example, linspace(0, 2e-323, 10) should give (0, 0, 5e-324, 5e-324, ...), and the smart version does that, but the naive version--and numpy--gives all 0s. And if you build a simple 4.2 fixed-point type and ask for linspace(0, .1, 100), again you get all 0s. On the other hand, the naive version avoids overflow problems at the other end of the scale, so maybe that's a wash?
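(De-vectorized into plain Python, the numpy code above is roughly the following; the function name is mine:)

def devectorized_linspace(start, stop, num):
    # same arithmetic as the vectorized numpy snippet quoted above
    step = (stop - start) / (num - 1)
    result = [start + i * step for i in range(num)]
    result[-1] = stop  # patch up the possibly-inexact last value
    return result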
>
> Hmm, I guess that depends on what floats you "normally" use ;-)
> You can't even represent 0.1 accurately in IEEE floats:
Well, sure, but a very small linspace having spurious underflows to 0 (with the naive version), or a very large one having spurious overflows to inf (with the clever one), is a more serious problem than 0.1 being off by part of one digit, isn't it?
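To make the underflow case concrete (a quick sketch; the variable names are mine):

start, stop, num = 0.0, 2e-323, 10
step = (stop - start) / (num - 1)               # 2e-323 / 9 underflows and rounds to 0.0
naive = [start + i * step for i in range(num)]  # -> all zeros
# the weighted-average version quoted further down computes
# (steps - i) * start + i * stop before dividing, so nothing underflows
# and the per-element result comes out as (0, 0, 5e-324, 5e-324, ...) instead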
> http://grouper.ieee.org/groups/754/faq.html#binary-decimal
>
> Since we're not after performance here (use numpy for that), I
> think it's more important to get as accurate as possible and
> not have the last interval be larger or smaller than the rest,
> or having the interval points further right in the list be off
> of the exact value by an increasing amount (with positive or
> negative delta).
Sure, but all of that is true of both versions. Neither one accumulates errors or has interval points further right in the list off by an increasing amount (except for very small or very large values, respectively, where each has that problem). Both versions have errors of up to one digit, distributed evenly throughout the range. So I don't think your criteria help to judge here, because they judge both versions to be equally good/bad.
The naive version happens to give exactly the same errors as np.linspace on most platforms. And the fact that it's used by numpy itself seems to imply that people have thought this through and chosen that implementation, so it's probably worth looking into why they did so.
On the other hand, the clever version gives nicer-looking results on some common values (as your example shows). I can find cases where the naive version looks nicer, but they're clearly not going to come up as often.
The naive version works with types like datetime.
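For example (a quick sketch using the same start + i*delta formula; nothing here is meant as a final API):

from datetime import datetime

def naive_frange(start, stop, steps):
    # only needs subtraction, division of the delta by an int,
    # multiplication by an int, and addition, all of which
    # datetime/timedelta support
    delta = (stop - start) / steps
    for i in range(steps + 1):
        yield start + i * delta

list(naive_frange(datetime(2015, 1, 1), datetime(2015, 1, 11), 10))
# eleven datetimes, one day apart; the weighted-average formula would
# have to multiply a datetime by an int, which isn't defined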
I don't know if any of those considerations are important enough to "win" here.
> The version I posted does not have these problems. It also has
> the usual fp rounding issues per interval value, but those errors
> don't accumulate, since you're not using the same delta
> across the whole range.
>
> I'm sure it can be improved, but it's certainly a more accurate
> approach than the naive "accumulate rounding errors and then fix
> the end point" approach :-)
You'd think so, and yet tests don't seem to bear that out.
The reason for fixing the endpoint isn't to deal with accumulated errors (which, again, don't exist), just to deal with the possible single-digit error.
> def naive_frange(start, stop, steps):
>     start = float(start)
>     stop = float(stop)
>     steps = int(steps)
>     delta = (stop - start) / steps
>     for i in range(steps + 1):
>         yield start + i * delta
>
> def frange(start, stop, steps):
>     start = float(start)
>     stop = float(stop)
>     steps = int(steps)
>     for i in range(steps + 1):
>         yield ((steps - i) * start + i * stop) / steps
You've got an off-by-one error in both versions: if you ask for 10 steps, you get 11 values. (Guido's two versions had the exact same error--in fact, yours are just slightly rearranged, more verbose, iterator versions of his list comps.)
Also, the conversions are unnecessary for the case you're trying to solve (pass in integers and the /steps ensures that they'll be converted to float automatically), and harmful in cases you haven't thought of (your algorithm would work correctly for complex or np.float32, but the conversions prevent it from being used).
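(For what it's worth, if the parameter is meant to be the number of points, as with np.linspace, something along these lines would avoid both problems; this is just a sketch, not a proposal:)

def frange(start, stop, num):
    # num evenly spaced points, num - 1 intervals; no forced float()/int()
    # conversions, so complex, np.float32, Fraction, etc. also work
    n = num - 1
    for i in range(num):
        yield ((n - i) * start + i * stop) / n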
> Example:
>
> naive_frange: [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0]
>
> frange: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Your own example demonstrates that the naive version _doesn't_ accumulate error. Otherwise, 0.9 would be farther off than 0.7, and it's not.
And again, look at my example. With 5000 numbers from 1 to 2, most are correct, but when index%5==0 they're off by one digit, and when index%5==1 they're sometimes off by one digit. The errors are distributed almost perfectly evenly through the range, being no larger or more common at the high end than at the low end.
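If anyone wants to run that kind of comparison themselves, here's a rough sketch (the function names are mine; Fraction is only used to get correctly rounded reference values):

from fractions import Fraction

def naive(start, stop, num):
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def clever(start, stop, num):
    n = num - 1
    return [((n - i) * start + i * stop) / n for i in range(num)]

def exact(start, stop, num):
    # correctly rounded reference values via exact rational arithmetic
    s, e, n = Fraction(start), Fraction(stop), num - 1
    return [float(s + Fraction(i, n) * (e - s)) for i in range(num)]

ref = exact(1.0, 2.0, 5000)
for name, fn in [("naive", naive), ("clever", clever)]:
    off = [i for i, (v, r) in enumerate(zip(fn(1.0, 2.0, 5000), ref)) if v != r]
    print(name, len(off), "values differ; first few indices:", off[:10])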
So, unless you can find some examples that I can't (and explain why numpy would go with an erroneous implementation), I think you may be trying to fix a problem that doesn't exist (as both Guido and I did before you, and in exactly the same way).