On Mon, Oct 8, 2012 at 12:03 PM, Rob Cliffe <firstname.lastname@example.org> wrote:
On 08/10/2012 19:39, Guido van Rossum wrote:
Does this mean that the following behaviour of lists is a bug?
>>> x=float('NAN')
>>> [x]==[x], [x]<=[x], [x]>=[x]
(True, True, True)
No. That's a special case in the comparisons for sequences.
[Now that I'm back at a real keyboard I can elaborate...]
This applies to all container comparisons: without the rule that if two contained items reference the same object they are to be considered equal without calling their __eq__, containers couldn't take the shortcut that a container is always equal to itself (i.e. c1 is c2 => c1 == c2). Without this shortcut, container comparisons would be much more expensive: any time a large container was compared to itself, it would be forced to recursively compare all the contained items. You might say that it has to do this anyway when comparing to a container that is not itself, but if the answer is "unequal" the comparison can stop as soon as two unequal items are found, whereas if the answer is "equal" you end up comparing all items. For two different containers there is no possible shortcut, but comparing a container to itself is quite common and really does deserve the shortcut. We discussed this in the past and always came to the same conclusion: despite the rules for NaN, the shortcut for containers is required. A similar shortcut exists for 'x in [x]' BTW.
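The rule can be observed directly in an interactive CPython 3 session. This is only a small illustration; the names A and y are chosen here for the example and are not from the thread:

```python
# Plain CPython 3, no imports needed.
x = float("nan")
A = [x]

print(x == x)     # False: NaN never compares equal to itself with ==
print(A == A)     # True: the item is compared with itself, so identity wins
print(x in A)     # True: membership tests identity before calling ==

y = float("nan")  # a second, distinct NaN object
print(y in A)     # False: y is not x, and y == x is False
print([y] == A)   # False: same reason, item by item
```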
Thank you for elaborating; I was going to ask what the justification for the special case was. You have explained why
x=float('NAN'); A=[x]; A==A
returns True, but not, as far as I can see, why
x=float('NAN'); A=[x]; B=[x]; A==B, [x]==[x]
returns (True, True), where neither comparison is of a container with itself.
It's so that when the container is iterating over pairs of elements, it can check for item identity (a simple pointer comparison) first, which makes a pretty big difference in speed.
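A minimal pure-Python model of that per-element check, assuming CPython-like semantics. It is a sketch, not CPython's actual C code, and sequence_eq and CountingEq are hypothetical names introduced only for illustration. Identity is tested first and == is called only for distinct objects; the counter shows that comparing two different lists holding the same objects never invokes __eq__ at all:

```python
def sequence_eq(s, t):
    """Hypothetical model of identity-first sequence equality (a sketch)."""
    if len(s) != len(t):
        return False              # different lengths can never be equal
    for a, b in zip(s, t):
        if a is b:                # cheap pointer test first
            continue
        if not a == b:            # fall back to __eq__ only for distinct objects
            return False          # stop at the first unequal pair
    return True


class CountingEq:
    """Hypothetical element type that counts how often __eq__ is invoked."""
    calls = 0

    def __eq__(self, other):
        CountingEq.calls += 1
        return self is other


items = [CountingEq() for _ in range(1000)]
copy = list(items)                # a different list holding the same objects

CountingEq.calls = 0
print(items == copy)              # True
print(CountingEq.calls)           # 0: every pair was identical, so __eq__ never ran

CountingEq.calls = 0
print(sequence_eq(items, copy), CountingEq.calls)  # True 0
```

The identity test costs one pointer comparison per pair, whereas __eq__ may run arbitrary Python code; that gap is the speed difference being described, and it also explains why [x]==[x] is True for a NaN x.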