
Fredrik's bug report made me dig a little deeper into comparison and containment tests. Here is a snapshot of what my current version does:
>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
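The following minimal sketch in modern Python 3 makes the coercion rule concrete. It illustrates the rule under stated assumptions and is not the code of this version:

* Python 3 `bytes` stands in for the 8-bit string type and `str` stands in for Unicode.
* `coerce_eq`, `coerce_contains` and the `mask_errors` flag are hypothetical names.

When the operands are mixed, the sketch decodes the 8-bit side as UTF-8. With `mask_errors=True`, a decoding failure counts as "not equal" instead of raising. This corresponds to the masking option raised in the question below.

```python
# Hypothetical model of the comparison rule discussed in this post.
# Assumption: Python 3 `bytes` plays the old 8-bit string, `str` plays Unicode.

def coerce_eq(a, b, mask_errors=False):
    """Equality test that decodes a bytes operand as UTF-8 when the other is str."""
    try:
        if isinstance(a, str) and isinstance(b, bytes):
            b = b.decode("utf-8")
        elif isinstance(a, bytes) and isinstance(b, str):
            a = a.decode("utf-8")
    except UnicodeDecodeError:
        if mask_errors:
            # Masking option: an undecodable operand simply compares unequal.
            return False
        raise
    return a == b


def coerce_contains(item, container, mask_errors=False):
    """Approximate `item in container` as an element-by-element coerce_eq test."""
    return any(coerce_eq(item, x, mask_errors) for x in container)


# 'aäöü' as Latin-1 bytes: a valid 8-bit string but not valid UTF-8.
LATIN1 = "aäöü".encode("latin-1")

print(coerce_eq(b"1", LATIN1))                    # False: both 8-bit, no decoding
print(coerce_eq("1", LATIN1, mask_errors=True))   # False: decode error masked
try:
    coerce_eq("1", LATIN1)                        # mixed operands: decode fails
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```

In this sketch, containment is built on the equality test. As a result, whichever behaviour `==` has for mixed strings carries over to `in` unchanged, matching the pattern in the sessions above.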
The decoding errors occur because the 8-bit string 'aäöü' is not valid UTF-8. Unicode comparisons coerce both arguments to Unicode, interpreting normal 8-bit strings as UTF-8 encodings of Unicode.

Question: is this behaviour acceptable, or should I go even further and mask decoding errors during comparison and containment tests too?

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/