
Guido van Rossum wrote:
Fredrik's bug report made me dive a little deeper into compares and contains tests.
Here is a snapshot of what my current version does:
>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data

>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
The decoding errors occur because the 8-bit string 'aäöü' is not valid UTF-8: Unicode comparisons coerce both arguments to Unicode, interpreting ordinary strings as UTF-8-encoded Unicode.
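The coercion rule can be mimicked in present-day Python as a rough sketch. Assumptions: Python 3 `bytes` stand in for the 1.6-era 8-bit strings, and `coerce_to_text` and `unicode_eq` are hypothetical helpers, not part of any Python API (Python 3 itself no longer coerces, so `'1' == b'1'` is simply False there).

```python
def coerce_to_text(obj):
    # Hypothetical helper: treat 8-bit strings (bytes) as UTF-8-encoded text,
    # mirroring the coercion rule described in the message.
    if isinstance(obj, bytes):
        return obj.decode("utf-8")  # raises UnicodeDecodeError on invalid data
    return obj

def unicode_eq(text, other):
    # Hypothetical comparison of a str with another object:
    # non-string operands compare unequal, bytes are decoded first.
    if not isinstance(other, (str, bytes)):
        return False
    return text == coerce_to_text(other)

latin1 = "aäöü".encode("latin-1")  # b'a\xe4\xf6\xfc': valid Latin-1, invalid UTF-8
print(unicode_eq("1", None))   # False
print(unicode_eq("1", b"1"))   # True
try:
    unicode_eq("1", latin1)
except UnicodeDecodeError as exc:
    print("decoding failed:", exc)
```

The Latin-1 bytes for 'aäöü' fail in exactly the way the session shows: the data is fine as Latin-1 but not as UTF-8, so the decode step, not the comparison itself, raises.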
Question: is this behaviour acceptable, or should I go even further and mask decoding errors during compares and contains tests too?
I think this is right -- I expect it will catch more errors than it will cause.
Ok, I'll only mask the TypeErrors then. (UnicodeErrors are subclasses of ValueErrors and thus do not get masked.)
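A contains test that masks only TypeError can be sketched as follows. Assumptions: `strict_compare` and `contains` are hypothetical functions, and the rule that mismatched non-string types raise TypeError is invented here purely to have something to mask. Because UnicodeError derives from ValueError rather than TypeError, a decoding failure still propagates.

```python
def strict_compare(left, right):
    # Hypothetical comparison: str vs bytes decodes the bytes as UTF-8;
    # any other mixed-type pair is treated as "cannot compare".
    if isinstance(left, str) and isinstance(right, bytes):
        right = right.decode("utf-8")  # may raise UnicodeDecodeError
    elif type(left) is not type(right):
        raise TypeError(
            f"cannot compare {type(left).__name__} with {type(right).__name__}"
        )
    return left == right

def contains(item, container):
    # Sketch of a contains test that masks TypeError but not ValueError.
    for element in container:
        try:
            if strict_compare(item, element):
                return True
        except TypeError:
            continue  # incomparable types count as "not equal"
    return False

print(issubclass(UnicodeError, ValueError))  # True
print(contains("1", ("a", None, 1)))         # False: TypeErrors for None and 1 are masked
try:
    contains("1", (b"a\xe4\xf6\xfc", None, 1))
except UnicodeDecodeError as exc:
    print("not masked:", type(exc).__name__)
```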
This made me go out and see what happens if you compare an instance of a numeric class (one that defines __int__) to an int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).
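Guido's observation still holds under the rich comparison protocol Python later adopted (PEP 207): `==` never consults `__int__`, so a class has to opt in through `__eq__`. The sketch below assumes Python 3; `Number` and `ComparableNumber` are hypothetical classes.

```python
class Number:
    # Hypothetical numeric class that only defines __int__.
    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

class ComparableNumber(Number):
    # Opts in explicitly: compare against ints via int(self).
    def __eq__(self, other):
        if isinstance(other, int):
            return int(self) == other
        return NotImplemented  # let Python try the reflected operation

print(Number(3) == 3)            # False: == never consults __int__
print(ComparableNumber(3) == 3)  # True
print(3 == ComparableNumber(3))  # True: int.__eq__ returns NotImplemented, so the reflected __eq__ runs
```

Returning `NotImplemented` rather than False lets the other operand have a say, which is the same "try both sides" idea the smart comparisons were meant to provide.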
Not sure ;-) I think both go hand in hand.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/