[Python-ideas] checking for identity before comparing built-in objects
Guido van Rossum
guido at python.org
Mon Oct 8 21:46:43 CEST 2012
On Mon, Oct 8, 2012 at 12:03 PM, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>
> On 08/10/2012 19:39, Guido van Rossum wrote:
>>
>> Does this mean that the following behaviour of lists is a bug?
>>>>>>>
>>>>>>> x=float('NAN')
>>>>>>> [x]==[x], [x]<=[x], [x]>=[x]
>>>>
>>>> (True, True, True)
>>>
>>> No. That's a special case in the comparisons for sequences.
>>
>> [Now that I'm back at a real keyboard I can elaborate...]
>>
>> This applies to all container comparisons: without the rule that if
>> two contained items reference the same object they are to be
>> considered equal without calling their __eq__, containers couldn't
>> take the shortcut that a container is always equal to itself (i.e. c1
>> is c2 => c1 == c2). Without this shortcut, container comparisons would
>> be much more expensive: any time a large container was compared to
>> itself, it would be forced to recursively compare all the contained
>> items. You might say that it has to do this anyway when comparing to a
>> container that is not itself, but if the anser is "unequal" the
>> comparison can stop as soon as two unequal items are found, whereas if
>> the answer is "equal" you end up comparing all items. For two
>> different containers there is no possible shortcut, but comparing a
>> container to itself is quite common and really does deserve the
>> shortcut. We discussed this in the past and always came to the same
>> conclusion: despite the rules for NaN, the shortcut for containers is
>> required. A similar shortcut exists for 'x in [x]' BTW.
>>
> Thank you for elaborating, I was going to ask what the justification for the
> special case was.
> You have explained why
>
>>>> x=float('NAN'); A=[x]; A==A
> True
>
> but not as far as I can see why
>
>>>> x=float('NAN'); A=[x]; B=[x]; A==B, [x]=[x]
> (True, True)
>
> where neither of the results is comparing a container to itself.
It's so that when the container is iterating over pairs of elements it
can check for item identity (a simple pointer comparison) first, which
makes a pretty big difference in speed.
--
--Guido van Rossum (python.org/~guido)
More information about the Python-ideas
mailing list