[Python-Dev] Re: Unicode and comparisons

Tue, 04 Apr 2000 23:47:16 +0200

Peter Funk wrote:
> 
> Hi!
> 
> Guido van Rossum:
> > > I always thought it is a core property of cmp that it works between
> > > all objects.
> >
> > Not any more.  Comparisons can raise exceptions -- this has been so
> > since release 1.5.  This is rarely used between standard objects, but
> > not unheard of; and class instances can certainly do anything they
> > want in their __cmp__.
> 
> Python 1.6a1 (#6, Apr  2 2000, 02:32:06)  [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> a = '1'
> >>> b = 2
> >>> a < b
> 0
> >>> a > b  # Newbies are normally baffled here
> 1
> >>> a = 'ä'
> >>> b =  u'ä'
> >>> a < b
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: UTF-8 decoding error: unexpected end of data
> 
> IMO we will have a *very* hard time explaining *this* behaviour
> to newbies!
> 
> Unicode objects are similar to normal string objects from the user's POV.
> It is unintuitive that objects that are far less similar (like for
> example numbers and strings) compare the way they do now, while the
> attempt to compare a Unicode string with a standard string object
> containing the same character raises an exception.

I don't think newbies will really want to get into the UTF-8
business right from the start... when they do, they probably
know about the above problems already.
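
For reference, the error in the session above can be reproduced
outside the comparison. This is only a sketch under assumptions:
it assumes a Python that keeps byte strings (bytes) and Unicode
strings (str) as separate types, and that the 8-bit 'ä' was typed
in a Latin-1 terminal, so it holds the single byte 0xE4. The
helper name decode_as_utf8 is made up for illustration.

```python
# Assumption: the 8-bit string 'ä' holds the single Latin-1 byte 0xE4.
latin1_bytes = "\xe4".encode("latin-1")   # b'\xe4'
utf8_bytes = "\xe4".encode("utf-8")       # b'\xc3\xa4'


def decode_as_utf8(data):
    """Hypothetical helper: return (text, None) on success,
    or (None, reason) when the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc.reason


# 0xE4 announces a three-byte UTF-8 sequence, but no bytes follow it,
# so the decoder gives up. CPython reports "unexpected end of data".
text, reason = decode_as_utf8(latin1_bytes)
print(text, reason)

# The same character encoded as UTF-8 decodes without complaint.
text_ok, reason_ok = decode_as_utf8(utf8_bytes)
print(text_ok == "\xe4", reason_ok)
```

So the exception is about the encoding of the 8-bit data, not
about the comparison as such.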

Changing this behaviour to silently swallow the decoding
error would do more harm than good, IMHO.
Newbies sure would find (u'a' not in 'aäöü') == 1 just
as surprising...
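
To make that concrete, here is a rough sketch of what a "swallow
the error" policy could look like. It is not what the interpreter
does; it assumes separate bytes and str types, Latin-1 data in the
8-bit string, and a made-up helper name lenient_contains.

```python
def lenient_contains(needle, haystack_bytes):
    """Hypothetical policy: treat a failed UTF-8 decode as 'not found'
    instead of letting the exception propagate."""
    try:
        return needle in haystack_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # The decoding error is swallowed here, which hides the real problem.
        return False


# Assumption: 'aäöü' stored as Latin-1 bytes, i.e. b'a\xe4\xf6\xfc'.
haystack = "a\xe4\xf6\xfc".encode("latin-1")

# 'a' is plainly the first byte, yet the lenient test says it is absent,
# because the bytes after it are not valid UTF-8.
print(lenient_contains("a", haystack))   # False

# With pure ASCII data the same test behaves as expected.
print(lenient_contains("a", b"abc"))     # True
```

The wrong answer comes back without any hint that the data was
never decoded, which is harder to track down than the exception.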

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/