[Python-ideas] checking for identity before comparing built-in objects

Wed Oct 10 15:23:38 CEST 2012

On 10/10/12 2:25 AM, Mike Graham wrote:

> I'm sometimes surprised at the creativity and passion behind solutions
> to this issue.
>
> I've been a Python user for some years now, including time dealing
> with stuff like numpy where you're fairly likely to run into NaNs.
> I've been an active member of several support communities where I can
> confidently say I have encountered tens of thousands of Python
> questions. Not once can I recall ever having or seeing anyone have an
> actual problem that I had or someone else had due to the way Python
> handles NaN. As far as I can tell, it works _perfectly_.
>
> I appreciate the aesthetic concerns, but I really wish someone would
> explain to me what's actually broken and in need of fixing.

While I also don't think that anything needs to be fixed, I must say that in my 
years of monitoring tens of thousands of Python questions, there have been a few 
legitimate problems with the NaN behavior. It does come up from time to time.

The most frequent problem is checking whether a list contains a NaN. The obvious 
thing for many users to try is:

   nan in list_of_floats

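This is a reasonable prediction based on what one normally does with most objects 
in Python, but it is quite wrong: a NaN compares unequal to everything, including 
itself, so a NaN that came out of a computation or a separate float('nan') call 
will not be found. However, because list.__contains__() checks for identity 
before equality, it can look like it works when people test it out: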

   >>> nan = float('nan')
   >>> nan in [1.0, 2.0, nan]
   True

Then they write code that does the wrong thing, believing that they have tested 
their approach.
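A minimal sketch of the trap and of a value-based check, assuming the list holds only numbers (math.isnan raises TypeError for non-numeric items). The name contains_nan is a hypothetical helper, not a standard-library function:

```python
import math

def contains_nan(values):
    """Hypothetical helper: True if any element of *values* is a NaN.

    Tests each value with math.isnan instead of relying on ==, so it does
    not depend on which NaN object happens to be in the list.
    Assumes every element is a number.
    """
    return any(math.isnan(x) for x in values)

nan = float('nan')
data = [1.0, 2.0, nan]
other = nan + 1.0  # still a NaN, but a different float object

print(nan in data)         # True: identity check finds the very same object
print(other in data)       # False: not the same object, and NaN != NaN
print(contains_nan(data))  # True: checks the value, not the identity
```

The test that "worked" interactively only succeeded because the same NaN object was used on both sides of `in`.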

I classify this as a wart: it breaks reasonable predictions from users, requires 
extra special-case knowledge about NaNs to use correctly, and can trap users who 
do try to experiment to determine the behavior. But I think that the cost of 
acquiring and retaining such knowledge is not so onerous as to justify the cost 
of any of the attempts to fix the wart.

The other NaN wart (unrelated to this thread) is that sorting a list of floats 
containing a NaN will usually leave the list unsorted, because the rule that 
"inequality comparisons with a NaN always return False" breaks the ordering 
assumptions of timsort and other sorting algorithms. You should remember this, 
as you once demonstrated the problem:

   http://mail.python.org/pipermail/python-ideas/2011-April/010063.html
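A short illustration of the comparisons behind this. The exact output of the two sorted() calls is deliberately not shown, since it depends on the sort implementation and on where the NaN starts out:

```python
nan = float('nan')

# Every comparison involving a NaN is False, in both directions,
# so a NaN looks neither smaller nor larger than any number.
print(nan < 1.0, nan > 1.0, nan == 1.0)  # False False False
print(nan == nan)                         # False

# list.sort() only asks "is a < b?". With a NaN present those answers are
# no longer a consistent ordering, so the result may not be sorted and can
# change with the NaN's starting position.
print(sorted([3.0, nan, 1.0, 2.0]))
print(sorted([nan, 3.0, 1.0, 2.0]))
```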

This is a real problem, so much so that numpy works around it by making our 
sorts always place NaNs at the end of the array. Unfortunately, lists do not 
have the luxury of cheaply knowing the types of all of the objects they contain, 
so this is not an option for them.
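When you do know that a list holds only floats, a similar "NaNs last" effect can be sketched with a sort key. This is only an analogy to numpy's behavior, not how numpy implements its sorts, and sort_nans_last is a hypothetical helper name:

```python
import math

def sort_nans_last(values):
    """Hypothetical helper: sorted copy of *values* with NaNs moved to the end.

    Assumes every element is a float. The key (math.isnan(x), x) starts with
    False for numbers and True for NaNs, so every NaN sorts after every
    number, and numbers are only ever compared with other numbers.
    """
    return sorted(values, key=lambda x: (math.isnan(x), x))

nan = float('nan')
data = [3.0, nan, 1.0, 2.0]
print(sort_nans_last(data))  # [1.0, 2.0, 3.0, nan]
```

The key function has to call math.isnan on every element, which is exactly the per-element type knowledge a general-purpose list sort cannot assume.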

Real problems, but nothing that motivates a change, in my opinion.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco