[Numpy-discussion] 0/0 == 0?

Fri Oct 3 21:17:33 EDT 2014

On Sat, Oct 4, 2014 at 12:40 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Sat, Oct 4, 2014 at 12:21 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Fri, Oct 3, 2014 at 8:12 AM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris
>>>> <charlesr.harris at gmail.com> wrote:
>>>>>
>>>>> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>
>>>>>> Out[1] has an integer divided by an integer, and you can't represent nan
>>>>>> as an integer. Perhaps something weird was happening with type promotion
>>>>>> between versions?
>>>>>
>>>>>
>>>>> Also note that in python3 the '/' operator does float rather than integer
>>>>> division.
>>>>>
>>>>>>>> np.array(0) / np.array(0)
>>>>> __main__:1: RuntimeWarning: invalid value encountered in true_divide
>>>>> nan
>>>>
>>>> Floor division still acts the same though:
>>>>
>>>>>>> np.array(0) // np.array(0)
>>>> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide
>>>> 0
>>>>
>>>> The seterr warning system makes a lot of sense for IEEE754 floats,
>>>> which are specifically designed so that 0/0 has a unique well-defined
>>>> answer. For ints though this seems really broken to me. 0 / 0 = 0 is
>>>> just the wrong answer. It would be nice if we had something reasonable
>>>> to return, but we don't, and I'd rather raise an error than return the
>>>> wrong answer.
>>>
>>> Well, actually, that's the really nice thing about seterr for ints!
>>> CPUs have hardware floating point exception flags to work with. We had
>>> to build one for ints. If you want an error, you can get an error. *I*
>>> don't want an error, and I don't have to have one!
>>
>> Sure, that's fine for integer computation corner cases that have
>> well-defined outputs, like wraparound. But it doesn't make sense for
>> divide-by-zero.
>>
>> The key thing about the IEEE754 exception design is that it gives you
>> the option of either raising an error immediately or else letting it
>> propagate through the computation as a nan until you reach an
>> appropriate place to handle it.
>>
>> With ints we don't have nan, so we don't have the second option. Our
>> options are either raise an error immediately, or else return some
>> nonsense value that will just cause you to get some meaningless
>> result, with no way to detect or recover from this situation. (Why
>> don't we define 0 / 0 == -72? It would make just as much sense.)
>>
>> The second option is terrible enough that I kinda don't believe you
>> when you say you want it. Maybe I'm missing something but...
>
> I fix the values after-the-fact because one *can* detect and recover
> from this situation with just a smidgen of forethought.
>
> <not-real-code>
>
> mask = (denominator == 0)
> x = numerator // denominator
> # We don't care about the masked cases. Fill them with a value that
> # will be harmless/ignored downstream. Here, it's 0. It might be something
> # else in other contexts.
> x[mask] = 0
>
> </not-real-code>

I don't find this argument very convincing, except as an argument for
having a deprecation period. In the unusual case where this is what
you want, it's trivial and more explicit to write it directly --
bringing errstate into it is just Rube Goldbergian. E.g.:

mask = (denominator == 0)
x = np.floor_divide(numerator, denominator, where=~mask)
x[mask] = 0
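
For anyone who wants to run this, here is a self-contained sketch of the same
idea. The sample arrays are made up for illustration, and pre-filling an
explicit out= array is an addition on my part so that the skipped slots start
at 0 instead of holding uninitialized memory:

```python
import numpy as np

# Hypothetical sample data; any integer arrays of matching shape would do.
numerator = np.array([7, 0, -9, 4])
denominator = np.array([2, 0, 0, -3])

mask = (denominator == 0)

# Pre-fill the output with the fill value, then divide only where the
# denominator is nonzero. Masked slots are skipped by the ufunc, so the
# division by zero should never be evaluated and errstate() is not involved.
x = np.zeros_like(numerator)
np.floor_divide(numerator, denominator, out=x, where=~mask)

print(x)
```

The fill value (0 here) is whatever is harmless downstream; swap in
np.full_like(numerator, fill) if a different one is wanted.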

>> Even more egregiously, numpy currently treats the integer
>> divide-by-zero case identically with the floating-point one -- so if
>> you want 0 / 0 to be an error (as you have to if you care about
>> getting correct results), then you have to make 0.0 / 0.0 an error as
>> well.
>
> If you would like to introduce a separate `integer_divide` setting for
> errstate() and make it raise by default, I'd be marginally okay with
> that. In the above pattern, I'd be wrapping it with an errstate()
> context manager anyways to silence the warning, so silencing the
> default exception would be just as easy. However, nothing else in
> errstate() raises by default, so this would be the odd special case.

In a perfect world (which may or may not match the actual world) I
actually would prefer integer wraparound to be treated as its own
category (instead of being lumped with float overflow-to-inf), and to
raise by default (with the option to enable wraparound explicitly). Unexpected
inf's give you correct results in some cases and obviously-broken
results in others; unexpected wraparound tends to produce silent bugs
in basically all integer-using code [1]; lumping them together is
pretty suboptimal. But that's a whole 'nother discussion...
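
To make the contrast concrete, a small sketch (the values are chosen purely for
illustration, and the midpoint computation mirrors the binary-search bug
described in [1]). It assumes NumPy's usual behaviour that integer array
arithmetic wraps without consulting errstate, while float overflow is reported
through it:

```python
import numpy as np

# Two int32 values whose sum does not fit in int32.
lo = np.array([2_000_000_000], dtype=np.int32)
hi = np.array([2_100_000_000], dtype=np.int32)

with np.errstate(over="raise"):
    # Integer array addition wraps around silently, even with over="raise".
    naive_mid = (lo + hi) // 2
    # Rearranged so that no intermediate value exceeds the int32 range.
    safe_mid = lo + (hi - lo) // 2

    # Float overflow, by contrast, is caught by the same errstate setting.
    try:
        np.array([1e308]) * 10.0
        float_overflow_raised = False
    except FloatingPointError:
        float_overflow_raised = True

print(naive_mid, safe_mid, float_overflow_raised)
```

The naive midpoint comes out negative with no warning at all, which is exactly
the kind of silent bug I mean.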

-n

[1] http://googleresearch.blogspot.co.uk/2006/06/extra-extra-read-all-about-it-nearly.html

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org