"Martin v. Loewis" wrote:
Question: is this behaviour acceptable, or should I go even further and mask decoding errors during comparisons and containment tests too?
I always thought it is a core property of cmp that it works between all objects.
It does, but not necessarily without exceptions. I could easily mask the decoding errors too and then have cmp() work exactly as for strings, but the outcome may be different to what the user had expected due to the failing conversion. Sorting order may then look quite unsorted...
Because of that,
    >>> x=[u'1','aäöü']
    >>> x.sort()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: UTF-8 decoding error: invalid data
fails. As always with cmp, I'd expect to get a consistent outcome here (i.e. cmp should give a total order on objects).
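To make the trade-off concrete, here is a minimal sketch of what masking a decoding error during a sort could look like. It is written for Python 3, where the plain string has to be spelled as a bytes literal. The helper name text_key and the Latin-1 fallback are assumptions made for illustration only; neither is proposed anywhere in this thread.

```python
def text_key(obj):
    """Hypothetical helper: map bytes and str onto one comparable str."""
    if isinstance(obj, bytes):
        try:
            return obj.decode("utf-8")
        except UnicodeDecodeError:
            # Mask the failure instead of raising: Latin-1 maps every
            # byte to a code point, so this fallback cannot fail.
            return obj.decode("latin-1")
    return obj


# The second item holds Latin-1 bytes, which are not valid UTF-8.
items = ["1", b"a\xe4\xf6\xfc"]
items.sort(key=text_key)
```

Without the key function, Python 3 refuses to order str against bytes and raises TypeError. With it the sort always succeeds, but where an undecodable item ends up depends entirely on the fallback, which is the "may look unsorted" risk described above.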
OTOH, I'm not so sure why cmp between plain and unicode strings needs to perform UTF-8 conversion? IOW, why is it desirable that
    >>> 'a' == u'a'
    1
This is needed to enhance inter-operability between Unicode and normal strings. Note that they also have the same hash value (provided both use the ASCII code range), making them interchangeable in dictionaries:
    >>> d={u'a':1}
    >>> d['a'] = 2
    >>> d[u'a']
    2
    >>> d['a']
    2
This is per design.
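The rule behind this is that two keys are interchangeable in a dictionary only if they compare equal and also hash alike. A small sketch of that rule follows, using int and float as stand-ins, because in Python 3 a str and a bytes object never compare equal, so the exact session above cannot be reproduced there. The variable names are illustrative.

```python
# Two different types whose values compare equal and hash alike.
same_value = (1 == 1.0)
same_hash = (hash(1) == hash(1.0))

d = {1: "first"}
d[1.0] = "second"   # finds the existing slot for 1 and overwrites its value

keys = list(d)      # the original key object is kept
```

After this runs, d still has a single entry: the lookup with 1.0 landed on the key 1, just as 'a' and u'a' share one slot in the session above.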
Anyway, I'm not objecting to that outcome - I only think that, to get cmp consistent, it may be necessary to drop this result. If it is not necessary, all the better.
--
Marc-Andre Lemburg
______________________________________________________________________
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/