[Python-Dev] Re: Unicode and comparisons

Peter Funk pf@artcom-gmbh.de
Tue, 4 Apr 2000 23:14:59 +0200 (MEST)


Hi!

Guido van Rossum:
> > I always thought it is a core property of cmp that it works between
> > all objects.
> 
> Not any more.  Comparisons can raise exceptions -- this has been so
> since release 1.5.  This is rarely used between standard objects, but
> not unheard of; and class instances can certainly do anything they
> want in their __cmp__.

Python 1.6a1 (#6, Apr  2 2000, 02:32:06)  [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a = '1'
>>> b = 2
>>> a < b
0
>>> a > b  # Newbies are normally baffled here
1
>>> a = 'ä'
>>> b =  u'ä'
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected end of data

IMO we will have a *very* hard time explaining *this* behaviour 
to newbies!  

Unicode objects are similar to normal string objects from the user's POV.
It is unintuitive that objects that are far less similar (like for 
example numbers and strings) compare the way they do now, while the 
attempt to compare a Unicode string with a standard string object 
containing the same character raises an exception.
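
A minimal sketch of where the error above probably comes from, written for
a current Python 3 interpreter rather than 1.6a1. It assumes the 8-bit
string held 'ä' as a single Latin-1 byte (0xE4), which is a guess about the
terminal setup; the UTF-8 decoding step is taken from the error message in
the traceback. The helper name try_utf8 is made up for illustration.

```python
# Assumption: the terminal delivered 'ä' as one Latin-1 byte (0xE4).
raw = '\u00e4'.encode('latin-1')

def try_utf8(data):
    """Return (text, None) on success or (None, reason) on a decode error."""
    try:
        return data.decode('utf-8'), None
    except UnicodeDecodeError as exc:
        return None, exc.reason

# 0xE4 starts a multi-byte UTF-8 sequence, but no further bytes follow,
# so decoding the lone byte as UTF-8 fails.
text, reason = try_utf8(raw)
print(text, reason)

# Decoding with the encoding the byte was actually written in works,
# and the result compares equal to the Unicode character.
same = raw.decode('latin-1') == '\u00e4'
print(same)
```

So the comparison itself is not the problem; the implicit conversion of the
8-bit string to Unicode is what fails when the bytes are not valid UTF-8.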

Mit freundlichen Grüßen (Regards), Peter
(BTW: using a 12-year-old US keyboard and a custom xmodmap all the time 
to write umlauts and lots of other interesting chars: ÷× ± ²³ ½¼ ° µ «» ¿? ¡! ;-)