[Python-Dev] unicode hell/mixing str and unicode as dictionary keys
Ron Adam
rrr at ronadam.com
Mon Aug 7 23:37:42 CEST 2006
Michael Foord wrote:
> David Hopwood wrote:[snip..]
>>
>>>> we should, of course, continue to use the one we always used (for
>>>> "ascii", there is no difference between the two).
>>>>
>>> +1
>>>
>>> This seems the most (only ?) logical solution.
>>>
>> No; always considering Unicode and non-ASCII byte strings to be distinct
>> is just as logical.
Yes, that's true (though it can't be done before Py3k, of course).
Consider the comparison:
[3] == (3,) -> False
These are not the same thing, even though it may be trivial to treat
them as equivalent. So how smart should an equivalence comparison be?
I think testing for interchangeability and/or taking context into
account is going down a very difficult road. Yet that is what the
string-to-Unicode comparison does: it assumes the byte string is in the
default encoding, which it may not be.
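To make that assumption concrete, here is a rough model of the Python 2
behavior written in Python 3, where `bytes` stands in for the old `str`
type and `str` stands in for `unicode`. The helper `py2_style_equal` and
the `DEFAULT_ENCODING` constant are hypothetical; the sketch assumes an
"ascii" default encoding and simply returns False when decoding fails,
which is not claimed to match any particular 2.x release exactly.

```python
# Hypothetical model of Python 2's implicit str/unicode coercion, written for Python 3.
# bytes plays the role of Python 2 str; str plays the role of Python 2 unicode.
DEFAULT_ENCODING = "ascii"  # assumed value of the default encoding


def py2_style_equal(byte_string: bytes, text: str) -> bool:
    """Decode the byte string with the assumed default encoding, then compare."""
    try:
        decoded = byte_string.decode(DEFAULT_ENCODING)
    except UnicodeDecodeError:
        # The bytes were not in the default encoding, so the assumption fails.
        # This sketch treats that as "not equal" rather than raising.
        return False
    return decoded == text


print(py2_style_equal(b"abc", "abc"))                             # → True
print(py2_style_equal("caf\u00e9".encode("utf-8"), "caf\u00e9"))  # → False
```

The second call returns False even though the bytes are a valid UTF-8
encoding of the same text: the comparison only "works" when the guessed
encoding happens to be the right one.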
Purity here would insist that comparing floats and integers always
return False, but there is little ambiguity about whether numerical
values are equivalent. The rules for comparing them are well
established, so numerical equivalence can be the exception when
comparing values of differing types; it's the expected behavior as well
as established practice in programming.
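A small illustration of that numeric exception, assuming a current
Python 3 interpreter rather than the 2.x releases under discussion:
values of different numeric types that denote the same number compare
equal and hash equal, so they collapse into a single dictionary key.

```python
from decimal import Decimal
from fractions import Fraction

# The same number expressed as four different numeric types.
values = [3, 3.0, Fraction(3, 1), Decimal(3)]
print(all(v == 3 for v in values))  # → True

# Equal values with equal hashes share one dict slot: the first key (int 3)
# is kept and each later insertion only overwrites the stored value.
table = {}
for v in values:
    table[v] = type(v).__name__

print(table)  # → {3: 'Decimal'}
```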
> Except there has been an implicit promise in Python for years now that
> ascii byte-strings will compare equally to the unicode equivalent: lots
> of code assumes this. Breaking this is fine in principle - but for Py3K
> not Py 2.x.
Also true. And I hope that a bytes-to-Unicode comparison in Py3k will
always return False, just as [3] == (3,) always returns False.
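For comparison, this is how Python 3 behaves today: bytes and str never
compare equal, even for pure-ASCII content, so they remain distinct
dictionary keys and any equivalence has to be requested explicitly by
decoding. The variable names below are illustrative.

```python
raw = b"spam"   # bytes (the old byte string)
text = "spam"   # str (the old unicode)

print(raw == text)       # → False
print([3] == (3,))       # → False, the analogous container case

# Because they never compare equal, both can live in one dict.
keys = {raw: "bytes", text: "str"}
print(len(keys))         # → 2

# Equivalence only holds once an encoding is chosen explicitly.
print(raw.decode("ascii") == text)  # → True
```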
> That means Martin's solution is the best for the current problem. (IMHO
> of course...)
I think (IMHO) in this particular case maintaining backwards
compatibility should take precedence (until Py3k), and it should also be
the stated reason for the continued behavior in the documentation.
Unicode-to-string comparison would then be the second exception to the
rule of not converting data between forms when comparing two objects, at
least before Py3k.
Are there other cases where different types of objects compare equal?
(Not including those where the user writes or overrides a method to get
that functionality of course.)
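As a partial answer, assuming a current Python 3 interpreter, a few
built-in types compare equal across type boundaries without any
user-defined `__eq__`. This list is illustrative, not exhaustive.

```python
# Built-in cross-type equalities in Python 3 (no user-defined __eq__ involved).
print(True == 1)                    # → True  (bool vs int)
print({1, 2} == frozenset({1, 2}))  # → True  (set vs frozenset)
print({"a": 1}.keys() == {"a"})     # → True  (dict keys view vs set)
print(1 == 1.0 == complex(1, 0))    # → True  (int, float and complex)
```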
Cheers,
Ron