
Fredrik's bug report made me dive a little deeper into compares and contains tests.
Here is a snapshot of what my current version does:
    >>> '1' == None
    0
    >>> u'1' == None
    0
    >>> '1' == 'aäöü'
    0
    >>> u'1' == 'aäöü'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: UTF-8 decoding error: invalid data

    >>> '1' in ('a', None, 1)
    0
    >>> u'1' in ('a', None, 1)
    0
    >>> '1' in (u'aäöü', None, 1)
    0
    >>> u'1' in ('aäöü', None, 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: UTF-8 decoding error: invalid data
The decoding errors occur because 'aäöü' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode).
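The coercion rule above can be emulated in modern Python 3 with a minimal sketch. Python 3 never coerces bytes to str implicitly, so the helper names `coerce_to_unicode` and `mixed_eq` are hypothetical. They do explicitly what the 1.6-era comparison did implicitly: decode 8-bit strings as UTF-8 before comparing. The Latin-1 encoding of 'aäöü' is not valid UTF-8, so strict decoding fails just as in the transcript.

```python
def coerce_to_unicode(value):
    """Hypothetical helper: treat 8-bit strings (bytes) as UTF-8-encoded text."""
    if isinstance(value, bytes):
        # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
        return value.decode("utf-8")
    return value


def mixed_eq(a, b):
    """Hypothetical comparison that coerces both arguments before comparing."""
    return coerce_to_unicode(a) == coerce_to_unicode(b)


latin1 = "aäöü".encode("latin-1")  # b'a\xe4\xf6\xfc', not valid UTF-8
utf8 = "aäöü".encode("utf-8")      # valid UTF-8 for the same text

print(mixed_eq("aäöü", utf8))  # True
try:
    mixed_eq("1", latin1)
except UnicodeDecodeError as exc:
    # The error surfaces from the comparison itself, as in the transcript.
    print("decoding error:", exc.reason)
```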
Question: is this behaviour acceptable, or should I go even further and mask decoding errors during compares and contains tests too?
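The two policies in the question can be compared side by side in the following sketch. `mixed_eq`, `mixed_contains` and the `mask_errors` flag are hypothetical names chosen for illustration. In strict mode a decoding failure propagates out of the compare or contains test. In masking mode the failure is silently treated as "not equal".

```python
def mixed_eq(a, b, mask_errors=False):
    """Hypothetical compare: decode bytes as UTF-8, then compare as text.

    With mask_errors=True, a UnicodeDecodeError is swallowed and the
    operands are reported as unequal instead of raising."""
    try:
        if isinstance(a, bytes):
            a = a.decode("utf-8")
        if isinstance(b, bytes):
            b = b.decode("utf-8")
    except UnicodeDecodeError:
        if mask_errors:
            return False
        raise
    return a == b


def mixed_contains(item, container, mask_errors=False):
    """Hypothetical contains test built on mixed_eq, like 'in' on a tuple."""
    return any(mixed_eq(item, x, mask_errors) for x in container)


bad = "aäöü".encode("latin-1")  # not valid UTF-8
print(mixed_contains("1", (bad, None, 1), mask_errors=True))  # False
```

Masking makes the contains test quietly answer False. The strict default makes the invalid data visible at the point of comparison, which is the trade-off being asked about.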
I think this is right -- I expect it will catch more errors than it will cause.

This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)
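The `__int__` observation still holds in Python 3, where `==` never consults `__int__`. A class has to opt in through a rich comparison method (`__eq__`, from PEP 207, which landed in Python 2.1). A minimal sketch follows; the class names `IntLike` and `IntLikeEq` are hypothetical.

```python
class IntLike:
    """Numeric-ish class that only defines __int__."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


class IntLikeEq(IntLike):
    """Same class plus a rich comparison that delegates to __int__."""

    def __eq__(self, other):
        if isinstance(other, int):
            return int(self) == other
        # Let Python try the reflected operation or fall back to identity.
        return NotImplemented


print(IntLike(3) == 3)    # False: __int__ is never called
print(IntLikeEq(3) == 3)  # True: __eq__ converts via __int__
print(3 == IntLikeEq(3))  # True: int.__eq__ defers, reflected __eq__ runs
```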