[Python-Dev] Unicode and comparisons
Guido van Rossum
guido@python.org
Tue, 04 Apr 2000 07:51:42 -0400
> Fredrik's bug report made me dive a little deeper into compares
> and contains tests.
>
> Here is a snapshot of what my current version does:
>
> >>> '1' == None
> 0
> >>> u'1' == None
> 0
> >>> '1' == 'aäöü'
> 0
> >>> u'1' == 'aäöü'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: UTF-8 decoding error: invalid data
>
> >>> '1' in ('a', None, 1)
> 0
> >>> u'1' in ('a', None, 1)
> 0
> >>> '1' in (u'aäöü', None, 1)
> 0
> >>> u'1' in ('aäöü', None, 1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: UTF-8 decoding error: invalid data
>
> The decoding errors occur because 'aäöü' is not a valid
> UTF-8 string (Unicode comparisons coerce both arguments
> to Unicode by interpreting normal strings as UTF-8
> encodings of Unicode).
>
> Question: is this behaviour acceptable, or should I go
> even further and mask decoding errors during compares
> and contains tests too?
I think this is right -- I expect it will catch more errors than it
will cause.
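
The two choices in the question above can be sketched roughly as follows. This is only an illustration written in present-day Python 3 syntax, not the interpreter code under discussion: strict_equal and masked_equal are hypothetical names, and the byte values assume 'aäöü' was entered as Latin-1.

```python
# Sketch only: Python 3 syntax, not the interpreter discussed in this thread.
# Assumption: 'aäöü' was typed on a Latin-1 terminal, giving these bytes.
LATIN1_BYTES = b'a\xe4\xf6\xfc'
# The same four characters encoded as UTF-8, for contrast.
UTF8_BYTES = b'a\xc3\xa4\xc3\xb6\xc3\xbc'


def strict_equal(text, raw):
    """Coerce raw to Unicode as UTF-8, then compare; decoding errors propagate."""
    return text == raw.decode('utf-8')


def masked_equal(text, raw):
    """Hypothetical alternative: treat undecodable bytes as simply unequal."""
    try:
        return strict_equal(text, raw)
    except UnicodeError:
        return False
```

With these definitions, strict_equal('1', LATIN1_BYTES) raises a UnicodeError because the byte 0xe4 opens a multi-byte UTF-8 sequence that the following byte does not continue, while masked_equal('1', LATIN1_BYTES) quietly returns False. The strict form is the one that surfaces mis-encoded data.
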
This made me go out and see what happens if you compare a numeric
class instance (one that defines __int__) to another int -- it doesn't
even call the __int__ method! This should be fixed in 1.7 when we do
the smart comparisons and rich coercions (or was it the other way
around? :-).
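
A minimal sketch of the __int__ observation, assuming a present-day Python 3 interpreter rather than the one I tried it on; the Wrapped class and its int_calls counter are hypothetical and exist only to show whether __int__ gets called.

```python
# Sketch only: Python 3 semantics, shown to illustrate the point above.
class Wrapped:
    """Hypothetical numeric class that defines __int__ and nothing else."""

    def __init__(self, value):
        self.value = value
        self.int_calls = 0  # counts how often __int__ is invoked

    def __int__(self):
        self.int_calls += 1
        return self.value


w = Wrapped(3)

# The comparison never consults __int__, so the two objects compare unequal.
plain_equal = (w == 3)
calls_after_compare = w.int_calls

# An explicit conversion does call __int__, and then the values match.
explicit_equal = (int(w) == 3)
calls_after_convert = w.int_calls
```

Here plain_equal comes out false with calls_after_compare still at zero, while explicit_equal is true after exactly one __int__ call.
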
--Guido van Rossum (home page: http://www.python.org/~guido/)