[Python-ideas] checking for identity before comparing built-in objects
Robert Kern
robert.kern at gmail.com
Wed Oct 10 15:23:38 CEST 2012
On 10/10/12 2:25 AM, Mike Graham wrote:
> I'm sometimes surprised at the creativity and passion behind solutions
> to this issue.
>
> I've been a Python user for some years now, including time dealing
> with stuff like numpy where you're fairly likely to run into NaNs.
> I've been an active member of several support communities where I can
> confidently say I have encountered tens of thousands of Python
> questions. Not once can I recall ever having or seeing anyone have an
> actual problem that I had or someone else had due to the way Python
> handles NaN. As far as I can tell, it works _perfectly_.
>
> I appreciate the aesthetic concerns, but I really wish someone would
> explain to me what's actually broken and in need of fixing.
While I also don't think that anything needs to be fixed, I must say that in my
years of monitoring tens of thousands of Python questions, there have been a few
legitimate problems with the NaN behavior. It does come up from time to time.
The most frequent problem is checking whether a list contains a NaN. The obvious
thing for many users to try is:
nan in list_of_floats
This is a reasonable prediction based on how membership tests work for most
objects in Python, but it is quite wrong: a NaN never compares equal to anything,
including itself. However, because list.__contains__() checks for identity before
equality, it can look like it works when people test it out:
>>> nan = float('nan')
>>> nan in [1.0, 2.0, nan]
True
Then they write their code doing the wrong thing, thinking that they have tested
their approach.
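To make the trap concrete, here is a minimal sketch of the case where the
identity shortcut does not apply. The names other_nan, found_by_in and
found_by_isnan are only illustrative, and math.isnan() is one common way to
do the check, not the only one:

```python
import math

nan = float('nan')
other_nan = float('nan')   # a second, distinct NaN object

values = [1.0, 2.0, other_nan]

# `nan is other_nan` is False and `nan == other_nan` is False,
# so neither the identity check nor the equality check matches.
found_by_in = nan in values

# Testing each element explicitly does not depend on object identity.
found_by_isnan = any(math.isnan(x) for x in values)

print(found_by_in)      # False
print(found_by_isnan)   # True
```

In real code the NaN in the list usually comes from a computation or from
parsing input, so it is almost never the same object as the one being searched
for.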
I classify this as a wart: it breaks reasonable predictions from users, requires
more special-case knowledge about NaNs to use correctly, and can trap users
who do try to experiment to determine the behavior. But I think that the cost of
acquiring and retaining such knowledge is not so onerous as to justify the cost
of any of the attempts to fix the wart.
The other NaN wart (unrelated to this thread) is that sorting a list of floats
containing a NaN will usually leave the list unsorted because "inequality
comparisons with a NaN always return False" breaks the assumptions of timsort
and other sorting algorithms. You should remember this, as you once demonstrated
the problem:
http://mail.python.org/pipermail/python-ideas/2011-April/010063.html
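As a rough illustration of why sorting goes wrong (a sketch, not the example
from the linked message), every ordering comparison involving a NaN is False
in both directions, so the sort cannot tell where the NaN belongs. The
variable names here are only illustrative:

```python
nan = float('nan')

# Every ordering comparison against a NaN is False, in both directions.
comparisons = [nan < 1.0, nan > 1.0, 1.0 < nan, 1.0 > nan]
print(comparisons)   # [False, False, False, False]

# With such inconsistent answers, the result of sorting depends on where
# the NaN happens to sit in the input; it need not be in ascending order.
values = [3.0, nan, 1.0]
result = sorted(values)
print(result)
```

The exact output of the sort depends on the position of the NaN in the input,
which is why no particular result is shown here.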
This is a real problem, so much so that numpy works around it by forcing our
sorts to always place NaNs at the end of the array. Unfortunately, lists do not
have the luxury of cheaply knowing the type of all of the objects in the list,
so this is not an option for them.
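For a list that the caller already knows contains only floats, a similar
NaN-last ordering can be approximated by hand with a sort key. This is only a
sketch under that assumption; nan_last_key is a hypothetical helper, not
something provided by Python or numpy:

```python
import math

def nan_last_key(x):
    """Sort key placing NaNs after all other floats (assumes x is a float)."""
    if math.isnan(x):
        return (True, 0.0)    # all NaNs share one key, so they stay stable
    return (False, x)         # ordinary floats sort by value, before NaNs

values = [3.0, float('nan'), 1.0, 2.0]
result = sorted(values, key=nan_last_key)
print(result)   # [1.0, 2.0, 3.0, nan]
```

The key function has to inspect every element, which is exactly the per-item
cost that a general-purpose list sort cannot assume is acceptable.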
Real problems, but nothing that motivates a change, in my opinion.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco