On 10/10/12 2:25 AM, Mike Graham wrote:
> I'm sometimes surprised at the creativity and passion behind solutions to this issue.
> I've been a Python user for some years now, including time dealing with stuff like numpy where you're fairly likely to run into NaNs. I've been an active member of several support communities where I can confidently say I have encountered tens of thousands of Python questions. Not once can I recall ever having or seeing anyone have an actual problem that I had or someone else had due to the way Python handles NaN. As far as I can tell, it works _perfectly_.
> I appreciate the aesthetic concerns, but I really wish someone would explain to me what's actually broken and in need of fixing.
While I also don't think that anything needs to be fixed, I must say that in my years of monitoring tens of thousands of Python questions, there have been a few legitimate problems with the NaN behavior. It does come up from time to time.
The most frequent problem is checking if a list contains a NaN. The obvious thing to do for many users:
nan in list_of_floats
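This is a reasonable prediction based on what one normally does for most objects in Python, but it is quite wrong: a NaN never compares equal to anything, including another NaN, so an equality-based membership test cannot find one. However, because list.__contains__() checks for identity before equality, it can look like it works when people test it out: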
nan = float('nan')
nan in [1.0, 2.0, nan]
Then they write their code doing the wrong thing thinking that they tested their approach.
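To make the trap concrete, here is a minimal sketch of how the same test behaves with a NaN that is not the identical object, along with one way to check reliably. The helper name contains_nan is hypothetical, not something from the standard library, and it assumes the list holds only floats.

```python
import math

nan = float('nan')
floats = [1.0, 2.0, nan]

# The identity check succeeds because this is the very same object
# that was placed in the list.
same_object_found = nan in floats

# A NaN produced elsewhere is a different object, and NaN != NaN,
# so neither the identity check nor the equality check succeeds.
other_nan = float('nan')
other_nan_found = other_nan in floats


def contains_nan(values):
    """Return True if any element is a NaN (hypothetical helper)."""
    return any(math.isnan(x) for x in values)


print(same_object_found)     # True
print(other_nan_found)       # False
print(contains_nan(floats))  # True
```

Testing each element with math.isnan() does not depend on object identity, so it gives the same answer no matter where the NaN came from.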
I classify this as a wart: it breaks reasonable predictions from users, requires special-case knowledge about NaNs to use correctly, and can trap users who do try to experiment to determine the behavior. But I think that the cost of acquiring and retaining such knowledge is not so onerous as to justify the cost of any of the attempts to fix the wart.
The other NaN wart (unrelated to this thread) is that sorting a list of floats containing a NaN will usually leave the list unsorted because "inequality comparisons with a NaN always return False" breaks the assumptions of timsort and other sorting algorithms. You should remember this, as you once demonstrated the problem:
This is a real problem, so much so that numpy works around it by forcing our sorts to always place NaNs at the end of the array. Unfortunately, lists do not have the luxury of cheaply knowing the type of all of the objects in the list, so this is not an option for them.
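For a plain list that is known to contain only floats, a user can get the same "NaNs last" result by hand. The following is a rough sketch under that assumption. The helper name sort_nans_last is hypothetical, and this is not how numpy implements its sorts; it simply keeps NaNs out of the comparisons altogether.

```python
import math


def sort_nans_last(values):
    """Sort floats ascending with NaNs moved to the end (hypothetical helper)."""
    # Separate the NaNs first so the sort never compares against one.
    numbers = [x for x in values if not math.isnan(x)]
    nans = [x for x in values if math.isnan(x)]
    return sorted(numbers) + nans


data = [3.0, float('nan'), 1.0, 2.0]
result = sort_nans_last(data)

print(result[:3])             # [1.0, 2.0, 3.0]
print(math.isnan(result[3]))  # True
```

The extra pass over the data is the price of not knowing the element types in advance, which is exactly the luxury lists lack.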
Real problems, but nothing that motivates a change, in my opinion.