[Python-Dev] unicode hell/mixing str and unicode as dictionary keys
"Martin v. Löwis"
martin at v.loewis.de
Mon Aug 7 14:46:45 CEST 2006
M.-A. Lemburg wrote:
>> There's no disputing that an exception should be raised
>> if the string *must* be interpretable as characters in
>> order to continue. But that's not true here if you allow
>> for the interpretation that they're simply objects of
>> different (duck) type and therefore unequal.
>
> Hmm, given that interpretation, 1 == 1.0 would have to be
> False.
No, but 1 == 1.5 would have to be False (and actually is).
In that analogy, int relates to float as ascii-bytes to
Unicode: some values are shared between int and float (e.g.
1 and 1.0), other values are not shared (e.g. 1.5 has no
equivalent in int). An int equals a float only if both
values originate from the shared subset.
Now, int is a (nearly) true subset of float, so nearly
every int has a float equivalent (actually, very large ints
cannot be represented exactly as a float, but Python
ignores that).
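To make the analogy concrete, here is a minimal sketch. The variable names are only illustrative, and the choice of 2**53 + 1 assumes floats are IEEE 754 doubles. It checks only the int -> float -> int round trip, not the result of a mixed comparison.

```python
# Values in the shared subset compare equal; values outside it do not.
shared = (1 == 1.0)        # 1 and 1.0 denote the same value
not_shared = (1 == 1.5)    # 1.5 has no equivalent in int

# An int with no exact float equivalent (assuming IEEE 754 doubles):
# 2**53 + 1 needs 54 significant bits, one more than a double holds.
big = 2**53 + 1
round_trip = int(float(big))      # the conversion rounds to 2**53
survives = (round_trip == big)    # the original value is lost

print(shared, not_shared, survives)
```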
> Note that you do have to interpret the string as characters
> if you compare it to Unicode and there's nothing wrong with
> that.
Consider this:
py> int(3+4j)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't convert complex to int; use int(abs(z))
py> 3 == 3+4j
False
So even though the conversion raises an exception, the
values are determined to be not equal. Again, because int
is a nearly true subset of complex, the conversion goes
the other way, but *if* the comparison used the complex->int
conversion, then the TypeError should be taken as
a guarantee that the objects don't compare equal.
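As a rough sketch of that hypothetical complex->int comparison: the helper name equal_via_int is made up for illustration and is not how Python actually compares numbers. A failed conversion is simply reported as "not equal".

```python
def equal_via_int(value, other):
    """Hypothetical helper: compare value with int(other).

    A TypeError from the conversion is taken as a guarantee
    that the two objects are not equal.
    """
    try:
        return value == int(other)
    except TypeError:
        return False

print(equal_via_int(3, 3 + 4j))   # conversion fails, so not equal
print(equal_via_int(3, 3.0))      # conversion succeeds, values match
```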
Expanding this view to Unicode should mean that a unicode
string U equals a byte string B if
U.encode(system_encoding) == B or B.decode(system_encoding) == U,
and that they are unequal otherwise (e.g. if the conversion
fails with a "not convertible" exception). Which of the
two conversions is selected is arbitrary; we should, of
course, continue to use the one we always used (for
"ascii", there is no difference between the two).
Regards,
Martin
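
The equality rule proposed above can be sketched as follows. This is only an illustration under stated assumptions, not an actual implementation: the helper names and SYSTEM_ENCODING are hypothetical, and the sketch uses the Python 3 spellings str (Unicode string) and bytes (byte string).

```python
SYSTEM_ENCODING = "ascii"  # assumption: stands in for the system encoding

def equal_by_encoding(u, b, encoding=SYSTEM_ENCODING):
    """U equals B if U.encode(encoding) == B; a failed conversion means unequal."""
    try:
        return u.encode(encoding) == b
    except UnicodeError:
        return False

def equal_by_decoding(u, b, encoding=SYSTEM_ENCODING):
    """U equals B if B.decode(encoding) == U; a failed conversion means unequal."""
    try:
        return b.decode(encoding) == u
    except UnicodeError:
        return False

samples = [
    ("abc", b"abc"),                 # pure ASCII, same characters
    ("abc", b"abd"),                 # pure ASCII, different characters
    ("L\u00f6wis", b"L\xf6wis"),     # non-ASCII, Latin-1 bytes
    ("L\u00f6wis", b"L\xc3\xb6wis"), # non-ASCII, UTF-8 bytes
]
results = [(equal_by_encoding(u, b), equal_by_decoding(u, b)) for u, b in samples]
print(results)
```

With "ascii" as the encoding, both directions give the same answer for every sample: only the first pair compares equal, and the non-ASCII pairs are unequal instead of raising an exception.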