[Python-Dev] decoding errors when comparing strings

Fredrik Lundh <effbot@telia.com>
Wed, 26 Jul 2000 09:49:44 +0200

guido wrote:
> > summary: the current interpreter throws an "ASCII decoding
> > error" exception if you compare 8-bit and unicode strings, and
> > the 8-bit string happens to contain a character in the 128-255
> > range.
> Doesn't bother me at all.  If I write a user-defined class that raises
> an exception in __cmp__ you can get the same behavior.  The fact that
> the hashes were the same is a red herring; there are plenty of values
> with the same hash that aren't equal.
> I see the exception as a useful warning that the program isn't
> sufficiently Unicode aware to work correctly.  That's a *good* thing
> in my book -- I'd rather raise an exception than silently fail.

I assume that means you're voting for alternative 3:

    "a third alternative would be to keep the exception, and make
    the dictionary code exception proof."

because the following isn't exactly good behaviour:

>>> a = "\x84"
>>> b = unicode(a, "iso-8859-1")
>>> d = {}
>>> d[a] = "a"
>>> d[b] = "b"
>>> len(d)
UnicodeError: ASCII decoding error: ordinal not in range(128)
>>> len(d)

(in other words, the dictionary implementation misbehaves if items
with the same hash value cannot be successfully compared)
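
A minimal sketch of the same situation using a user-defined class, along the lines guido mentions above. It assumes a present-day CPython, where the 8-bit/unicode comparison from the session above is no longer available. The class `Touchy` and the helper `try_insert` are hypothetical names invented for this illustration: every instance hashes to the same value and every comparison raises, so the second insertion forces the dictionary to compare two keys that cannot be compared.

```python
class Touchy:
    """Hypothetical key type: constant hash, comparison always raises."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Every instance lands in the same hash bucket.
        return 42

    def __eq__(self, other):
        # Stand-in for the "ASCII decoding error" raised during comparison.
        raise ValueError("cannot compare these keys")


def try_insert(d, key, value):
    """Store value under key and report whether the dictionary raised."""
    try:
        d[key] = value
    except ValueError as exc:
        return "raised: %s" % exc
    return "stored"


d = {}
first = try_insert(d, Touchy("a"), "a")   # empty dict, nothing to compare against
second = try_insert(d, Touchy("b"), "b")  # same hash, so the keys get compared
print(first)
print(second)
print(len(d))
```

In this sketch the exception surfaces at the assignment that triggered the comparison, not at a later, unrelated call such as len(d), and the failed key is not stored. That is roughly what "exception proof" dictionary code would look like from the caller's side.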