Re: [Python-Dev] Re: Unicode and comparisons

4 Apr 2000


      "Martin v. Loewis" wrote:
...
...
Question: is this behaviour acceptable or should I go even further
and mask decoding errors during compares and contains tests too?
I always thought it is a core property of cmp that it works between
all objects.
It does, but not necessarily without exceptions. I could easily
mask the decoding errors too and then have cmp() work exactly
as for strings, but the outcome may differ from what the
user expected because of the failed conversion. The sorted
result may then look quite unsorted...
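
To make the trade-off concrete, here is a minimal sketch in present-day
Python 3, where bytes stands in for the plain strings of this thread and
str for Unicode strings. The helper names (as_text, strict_cmp,
masking_cmp) and the type-name fallback are assumptions made for
illustration, not the code under discussion. The byte string assumes
'aäöü' was entered in a Latin-1 encoding, which is not valid UTF-8.

```python
from functools import cmp_to_key


def as_text(obj):
    """Decode bytes as UTF-8; text passes through unchanged."""
    return obj.decode("utf-8") if isinstance(obj, bytes) else obj


def strict_cmp(a, b):
    """Compare after decoding; a decoding error propagates to the caller."""
    x, y = as_text(a), as_text(b)
    return (x > y) - (x < y)


def masking_cmp(a, b):
    """Like strict_cmp, but hide decoding errors behind a fallback order."""
    try:
        return strict_cmp(a, b)
    except UnicodeDecodeError:
        # Hypothetical fallback: order the operands by their type names.
        x, y = type(a).__name__, type(b).__name__
        return (x > y) - (x < y)


bad = b"a\xe4\xf6\xfc"  # assumed Latin-1 bytes for 'aäöü', invalid as UTF-8
items = ["b", bad, "a", b"c"]

try:
    sorted(items, key=cmp_to_key(strict_cmp))
except UnicodeDecodeError as exc:
    print("strict sort failed:", exc)

# The masked sort never raises, but the fallback is not consistent:
# bad ties with b"c", b"c" sorts after "a", yet bad sorts before "a".
print(sorted(items, key=cmp_to_key(masking_cmp)))
```

Because the fallback ordering contradicts the decoded ordering, the
result of the masked sort depends on which pairs the sort happens to
compare, which is one way a list can end up looking unsorted.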
...
Because of that,
...
...
...
>>> x=[u'1','aäöü']
>>> x.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
fails. As always in cmp, I'd expect to get a consistent outcome here
(i.e. cmp should give a total order on objects).
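
One way to get a consistent ordering that never raises can be sketched
with a sort key instead of a comparison function. This is Python 3, with
bytes standing in for plain strings; total_key and its two-bucket scheme
are hypothetical, one possible design rather than anything proposed in
this thread. The byte string assumes 'aäöü' in a Latin-1 encoding.

```python
def total_key(obj):
    """Map text and byte strings onto keys that are always comparable.

    Bucket 0 holds everything that is, or decodes to, text.
    Bucket 1 holds byte strings that are not valid UTF-8; they sort
    after all text and are ordered among themselves by raw bytes.
    """
    if isinstance(obj, bytes):
        try:
            return (0, obj.decode("utf-8"))
        except UnicodeDecodeError:
            return (1, obj)
    return (0, obj)


x = ["1", b"a\xe4\xf6\xfc"]  # second item is not decodable as UTF-8
x.sort(key=total_key)        # no exception; undecodable bytes go last
print(x)
```

The tuple's first element keeps text and undecodable bytes from ever
being compared with each other, so every pair of keys has a defined
order.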
OTOH, I'm not so sure why cmp between plain and unicode strings needs
to perform UTF-8 conversion? IOW, why is it desirable that
...
...
...
>>> 'a' == u'a'
1
This is needed to enhance interoperability between Unicode
and normal strings. Note that they also have the same hash
value (provided both use the ASCII code range), making them
interchangeable in dictionaries:
...
...
...
>>> d={u'a':1}
>>> d['a'] = 2
>>> d[u'a']
2
>>> d['a']
2
This is by design.
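
The rule behind this is that objects which compare equal must also hash
equal, otherwise a dictionary lookup would miss. Python 3 no longer
treats bytes and str as equal, so the sketch below uses a hypothetical
wrapper class, PlainString, to mimic the behaviour described above. The
class and its ASCII-only equality rule are assumptions for illustration.

```python
class PlainString:
    """Hypothetical byte string that interoperates with str for ASCII."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def __eq__(self, other):
        if isinstance(other, PlainString):
            return self.raw == other.raw
        if isinstance(other, str):
            try:
                return self.raw.decode("ascii") == other
            except UnicodeDecodeError:
                return False  # non-ASCII bytes never equal a text string
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equal, so ASCII data reuses str's hash.
        try:
            return hash(self.raw.decode("ascii"))
        except UnicodeDecodeError:
            return hash(self.raw)


d = {"a": 1}
d[PlainString(b"a")] = 2   # same hash and equal, so this hits the same slot
print(d["a"], d[PlainString(b"a")], len(d))
```

If __hash__ returned something unrelated to the text hash, the second
assignment would create a separate entry even though the keys compare
equal.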
...
Anyway, I'm not objecting to that outcome - I only think that, to get
cmp consistent, it may be necessary to drop this result. If it is not
necessary, all the better.
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
