[Python-Dev] Unicode and comparisons

Tue, 04 Apr 2000 17:52:52 +0200

Guido van Rossum wrote:
> 
> > Fredrik's bug report made me dive a little deeper into compares
> > and contains tests.
> >
> > Here is a snapshot of what my current version does:
> >
> > >>> '1' == None
> > 0
> > >>> u'1' == None
> > 0
> > >>> '1' == 'aäöü'
> > 0
> > >>> u'1' == 'aäöü'
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeError: UTF-8 decoding error: invalid data
> >
> > >>> '1' in ('a', None, 1)
> > 0
> > >>> u'1' in ('a', None, 1)
> > 0
> > >>> '1' in (u'aäöü', None, 1)
> > 0
> > >>> u'1' in ('aäöü', None, 1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeError: UTF-8 decoding error: invalid data
> >
> > The decoding errors occur because 'aäöü' is not a valid
> > UTF-8 string (Unicode comparisons coerce both arguments
> > to Unicode by interpreting normal strings as UTF-8
> > encodings of Unicode).
> >
> > Question: is this behaviour acceptable or should I go
> > even further and mask decoding errors during compares
> > and contains tests too?
> 
> I think this is right -- I expect it will catch more errors than it
> will cause.

Ok, I'll only mask the TypeErrors then. (UnicodeErrors are
subclasses of ValueErrors and thus do not get masked.)
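
As a rough illustration of that masking rule, here is a minimal sketch
in present-day Python 3 syntax, not the actual C implementation. The
helper names coerce_to_unicode, unicode_equal and unicode_contains are
hypothetical, and a bytes object stands in for the "normal string"
of the examples above:

```python
def coerce_to_unicode(value):
    """Return value as str, decoding bytes as UTF-8 (hypothetical helper)."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        # Invalid UTF-8 raises UnicodeDecodeError, a ValueError subclass.
        return value.decode("utf-8")
    raise TypeError("cannot coerce %s to Unicode" % type(value).__name__)


def unicode_equal(left, right):
    """Compare after coercion, masking only TypeError."""
    try:
        return coerce_to_unicode(left) == coerce_to_unicode(right)
    except TypeError:
        # Uncoercible operands (None, ints, ...) simply compare unequal.
        return False


def unicode_contains(needle, items):
    """Containment test built on unicode_equal; decoding errors propagate."""
    return any(unicode_equal(needle, item) for item in items)


# UnicodeError derives from ValueError, so "except TypeError" lets it through.
assert issubclass(UnicodeError, ValueError)
assert unicode_equal("1", None) is False
assert unicode_contains("1", ("a", None, 1)) is False
```

With these definitions, unicode_equal("1", b"a\xe4\xf6\xfc") raises a
UnicodeDecodeError instead of returning a false value, because the
Latin-1 bytes are not valid UTF-8.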

> This made me go out and see what happens if you compare a numeric
> class instance (one that defines __int__) to another int -- it doesn't
> even call the __int__ method!  This should be fixed in 1.7 when we do
> the smart comparisons and rich coercions (or was it the other way
> around? :-).

Not sure ;-) I think both go hand in hand.
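
For what it's worth, the same effect can be reproduced in present-day
Python 3, where comparison still never consults __int__. The sketch
below is only an illustration under that assumption; the class names
PlainNumber and RichNumber are made up, and the __eq__ method shows
one possible shape of a "smart" comparison, not a settled design:

```python
class PlainNumber:
    """Defines __int__ only; counts how often it is called."""

    def __init__(self, value):
        self.value = value
        self.int_calls = 0

    def __int__(self):
        self.int_calls += 1
        return self.value


class RichNumber(PlainNumber):
    """Adds an explicit equality method that understands plain ints."""

    def __eq__(self, other):
        if isinstance(other, RichNumber):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        # Let Python try the other operand or fall back to identity.
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


plain = PlainNumber(3)
assert (plain == 3) is False   # falls back to identity comparison
assert plain.int_calls == 0    # __int__ was never consulted
assert int(plain) == 3         # explicit conversion still works

assert RichNumber(3) == 3      # handled by RichNumber.__eq__
assert 3 == RichNumber(3)      # reflected call reaches the same method
```

The comparison only becomes numeric once the class itself says how to
compare against an int; the conversion hook alone is not enough.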

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/