[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

Ron Adam rrr at ronadam.com
Mon Aug 7 23:37:42 CEST 2006


Michael Foord wrote:
> David Hopwood wrote:[snip..]
>>   
>>>> we should, of course, continue to use the one we always used (for
>>>> "ascii", there is no difference between the two).
>>>>       
>>> +1
>>>
>>> This seems the most (only ?) logical solution.
>>>     
>> No; always considering Unicode and non-ASCII byte strings to be distinct
>> is just as logical.

Yes, that's true.  (But it can't be done prior to Py3k, of course.)  Consider 
the comparison of ...

    [3] == (3,)   ->  False

These are not the same thing, even though it would be trivial to treat them 
as equivalent.  So how smart should an equivalence comparison be? 
I think testing for interchangeability and/or taking context into 
account is going down a very difficult road.  That is what the string 
to Unicode comparison does: it assumes the string is in the default 
encoding, which it may not be.
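To make the assumption concrete, here is a rough sketch of that kind of comparison, written for Python 3 so that it runs today.  The helper name implicit_decode_equal and its assumed_encoding parameter are made up for illustration; this is not how the interpreter implements the comparison, and returning False on a failed decode is only one possible fallback.

```python
def implicit_decode_equal(text, raw, assumed_encoding="ascii"):
    """Hypothetical helper: compare text with a byte string by first
    decoding the bytes with an *assumed* encoding."""
    try:
        return text == raw.decode(assumed_encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the assumed encoding, so treat
        # the two values as unequal (an illustrative choice).
        return False


e_acute = "\u00e9"                       # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = e_acute.encode("latin-1")  # b'\xe9'
utf8_bytes = e_acute.encode("utf-8")      # b'\xc3\xa9'

# Pure ASCII data: the assumption is harmless.
print(implicit_decode_equal("abc", b"abc"))

# Non-ASCII data: the answer depends entirely on the assumed encoding.
print(implicit_decode_equal(e_acute, latin1_bytes))             # ascii assumed
print(implicit_decode_equal(e_acute, latin1_bytes, "latin-1"))  # right guess
print(implicit_decode_equal(e_acute, utf8_bytes, "latin-1"))    # wrong guess
```

The same text compares equal or unequal to the same bytes depending on a guess that the comparison itself has no way to check.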

Purity in this would insist that comparing floats and integers always 
returns False, but there is little ambiguity about whether numerical 
values are equivalent.  The rules for comparing them are fairly well 
established.  So numerical equivalence can be the exception when 
comparing values of differing types; it's the expected behavior as well 
as the established practice in programming.
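A small illustration of the numeric exception, and of what it means for dictionary keys (the variable name d is just for the example):

```python
# Values of different numeric types compare equal when they are
# numerically equal, and they hash alike as well.
print(3 == 3.0)
print(hash(3) == hash(3.0))

# As a consequence, they act as the same dictionary key: the second
# assignment replaces the value but keeps the original key object.
d = {3: "int key"}
d[3.0] = "float key"
print(d)
print(len(d))
```

The equal hashes are what keep equal keys from ending up in two separate slots.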


> Except there has been an implicit promise in Python for years now that 
> ascii byte-strings will compare equally to the unicode equivalent: lots 
> of code assumes this. Breaking this is fine in principle - but for Py3K 
> not Py 2.x.

Also true.  And I hope that a bytes to Unicode comparison in Py3k will 
always return False, just like [3] == (3,) always returns False.
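For what it's worth, a sketch of that stricter behavior as it can be tried in a Python 3 interpreter (the names text_key, bytes_key and table are only for illustration; note that running Python 3 with the -b option warns on such comparisons):

```python
text_key = "abc"
bytes_key = b"abc"

# Text and bytes never compare equal, even for pure ASCII content.
same = (text_key == bytes_key)
print(same)

# So both can live side by side as distinct dictionary keys.
table = {text_key: "text", bytes_key: "bytes"}
print(len(table))

# Equality has to be requested explicitly, by naming the encoding.
decoded_same = (text_key == bytes_key.decode("ascii"))
print(decoded_same)
```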


> That means Martin's solution is the best for the current problem. (IMHO 
> of course...)

I think (IMHO) in this particular case, maintaining "backwards 
compatibility" should take precedence (until Py3k), and it should be 
given as the stated reason for the continued behavior in the 
documentation as well.  So Unicode to string comparisons would be the 
second exception to the rule of not converting data from one form to 
another when comparing two objects.  At least for pre-Py3k.

Are there other cases where different types of objects compare equal? 
(Not including those where the user writes or overrides a method to get 
that functionality of course.)
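A quick way to probe the question is to try a few built-in pairs.  The pairs below are just the ones that come to mind, not a complete list, and the names pairs and results are only for the example:

```python
# Each entry: (label, left operand, right operand)
pairs = [
    ("bool vs int", True, 1),
    ("set vs frozenset", {1, 2}, frozenset([1, 2])),
    ("list vs tuple", [3], (3,)),
]

results = {}
for label, left, right in pairs:
    # Only count it when the two types really differ.
    results[label] = (type(left) is not type(right)) and (left == right)
    print(label, results[label])
```

The first two pairs compare equal across types, while the list and tuple do not.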


Cheers,
    Ron



