
Mark Dickinson writes:
On Wed, Mar 24, 2010 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Steven D'Aprano writes:
> I suspect that's a feature, not a bug.
Right: distinct nans (i.e., those with different id()) are treated as distinct set elements or dict keys.
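For readers following along, the behaviour described here can be reproduced with a short sketch. It assumes CPython, where each call to float("nan") returns a new object; the variable names are illustrative only.

```python
# Two NaN objects with different id() values.
nan_a = float("nan")
nan_b = float("nan")

# IEEE 754: a NaN compares unequal to everything, including itself.
print(nan_a == nan_a)        # False

# Set and dict lookups check identity before equality, so the same
# object is stored once, but a distinct NaN object becomes a new element.
seen = {nan_a, nan_b, nan_a}
print(len(seen))             # 2
print(nan_a in seen)         # True  (found by identity)
print(float("nan") in seen)  # False (a fresh object, equal to nothing)
```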
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases, would you not have to test each result for NaN-hood, then hash on a proxy such as the string "NaN"?
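A minimal sketch of the proxy-key approach mentioned above, compared with counting the raw values directly. The helper name histogram and the sample data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def histogram(values):
    """Count values, mapping every float NaN to the proxy key "NaN"."""
    counts = Counter()
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        counts["NaN" if is_nan else v] += 1
    return counts

values = [1.0, float("nan"), 2.0, float("nan"), 1.0]

# Counting the raw values gives each distinct NaN object its own key.
naive = Counter(values)
print(len(naive))      # 4

# With the proxy key, all NaNs land in a single bucket.
hist = histogram(values)
print(len(hist))       # 3
print(hist["NaN"])     # 2
```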
So what alternative behaviour would you suggest, and how would you implement it?
I don't have an alternative behavior to suggest. I'm not suggesting that it's a bug, I'm suggesting that it's a wart: useless, ugly, and in some presumably rare/buggy cases, it could lead to nasty behavior. The example I have in mind is computing a histogram of function values for a very large sample of inputs. (This is a pathological example, of course: things where NaNs are representable generally won't be used directly as keys in a dictionary used to represent a histogram. Rather, they would be mapped to a representative value as the key.) If there are a lot of NaNs, the dictionary could get unexpectedly large. That's not Python's fault, of course:
Meanwhile IEEE 754 requires that nans compare unequal to themselves, breaking reflexivity. So there have to be some compromises somewhere.
Indeed. IEEE 754 compatibility *is* a feature.
One alternative would be to prohibit putting nans into sets and dicts by making them unhashable; I'm not sure what that would gain, though.
I would find that more intuitive. While NaNs aren't mutable, they're similar to mutable values in that their value is not deterministic in a certain sense. OTOH, since the only example I can think of where I would personally want to check whether a NaN is in a container is pathological, my intuition is hardly reliable.
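To make the "unhashable NaN" alternative concrete, here is a rough sketch using a hypothetical float subclass, StrictFloat. It is not an existing Python type or a proposed patch; it only shows what refusing to hash NaNs would look like from the user's side.

```python
import math

class StrictFloat(float):
    """Hypothetical float subclass whose NaN values refuse to be hashed."""

    def __hash__(self):
        if math.isnan(self):
            raise TypeError("unhashable value: NaN")
        # Ordinary values hash exactly like a plain float.
        return float.__hash__(self)

# Non-NaN values still work as set elements and dict keys.
ok = {StrictFloat(1.5), StrictFloat(2.5)}
print(len(ok))  # 2

# A NaN is rejected at insertion time instead of silently
# becoming a key that can never be looked up by value.
try:
    bad = {StrictFloat("nan")}
except TypeError as exc:
    print("rejected:", exc)
```

Under this scheme, code that builds a histogram would fail loudly on the first NaN and so would be forced to choose a proxy key explicitly.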