[Python-Dev] Dicts are broken Was: unicode hell/mixing str and unicode as dictionary keys
"Martin v. Löwis"
martin at v.loewis.de
Mon Aug 7 15:00:20 CEST 2006
M.-A. Lemburg wrote:
> Python just doesn't know the encoding of the 8-bit string, so it can't
> make any assumptions about it. As a result, it raises an exception to
> inform the programmer.
Oh, Python does make an assumption about what the encoding is: it assumes
it is the system encoding (i.e. "ascii"). Invoking the ascii codec then
raises an exception, because the string clearly isn't ASCII.
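As a rough illustration of that sequence, here is a minimal sketch in
Python 3 syntax, where the implicit coercion no longer exists and has to
be spelled out. The helper name coerce_and_compare and its encoding
parameter are hypothetical; the "ascii" default merely stands in for the
system encoding mentioned above.

```python
# Sketch only: emulates the implicit decode that happens when a unicode
# string is compared with an 8-bit string. `coerce_and_compare` is a
# hypothetical helper, not part of any library.

def coerce_and_compare(text, raw, encoding="ascii"):
    """Decode the 8-bit string `raw` with `encoding`, then compare."""
    return text == raw.decode(encoding)

# Pure-ASCII bytes decode cleanly, so the comparison simply succeeds.
print(coerce_and_compare("foo", b"foo"))  # True

# A byte outside the ASCII range makes the ascii codec raise.
try:
    coerce_and_compare("\xe4", b"\xe4")
except UnicodeDecodeError as exc:
    print("ascii codec failed:", exc.reason)
```

The point of the sketch is that an encoding is always chosen; the
exception comes from applying that choice, not from the absence of one.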
> It is quite possible that the string uses an encoding under which
> the Unicode string is indeed equal to the string.
So what? Python uses the system encoding for this operation.
What does it matter that the result would be different if it
had used a different encoding?
The strings are unequal under the system encoding; it's irrelevant
that they might be equal under a different encoding.
The same holds for the ASCII part (i.e. where you don't get an
exception):
py> u"foo" == "sbb"
False
py> u"foo".encode("rot13") == "sbb"
True
So the strings compare as unequal, even though they compare
equal if treated as rot13. That doesn't stop Python from considering
them unequal.
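For readers on Python 3, where str.encode("rot13") is no longer
available, a comparable sketch goes through the codecs module. The
function name equal_under is hypothetical; "rot_13" is the codec's
canonical name.

```python
import codecs

# Sketch only: `equal_under` is a hypothetical helper that compares two
# strings after transforming the first one with a named text codec.

def equal_under(left, right, codec):
    """Return True if `left`, transformed by `codec`, equals `right`."""
    return codecs.encode(left, codec) == right

print("foo" == "sbb")                       # False
print(equal_under("foo", "sbb", "rot_13"))  # True
```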
Regards,
Martin