[Python-Dev] decoding errors when comparing strings

Guido van Rossum guido@beopen.com
Wed, 26 Jul 2000 07:57:02 -0500


> > I see the exception as a useful warning that the program isn't
> > sufficiently Unicode aware to work correctly.  That's a *good* thing
> > in my book -- I'd rather raise an exception than silently fail.
> 
> I assume that means you're voting for alternative 3:
> 
>     "a third alternative would be to keep the exception, and make
>     the dictionary code exception proof."

Yes.

> because the following isn't exactly good behaviour:
> 
> >>> a = "„"
> >>> b = unicode(a, "iso-8859-1")
> >>> d = {}
> >>> d[a] = "a"
> >>> d[b] = "b"
> >>> len(d)
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> >>> len(d)
> 2
> 
> (in other words, the dictionary implementation misbehaves if items
> with the same hash value cannot be successfully compared)

Good point.  This would happen if you used flakey class instances as
keys as well!
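The str/unicode pair from the example no longer exists in Python 3, where comparing bytes with str normally just returns False. The same "flakey key" situation is easy to reproduce with a class whose `__eq__` raises, though. The sketch below assumes a current CPython. `FlakeyKey` is a hypothetical stand-in, not anything from the thread. Here the dictionary code does check the comparison result, so the exception comes out of the assignment itself and the dict is left unchanged.

```python
class FlakeyKey:
    """Hypothetical key type whose equality test always raises,
    standing in for the str/unicode pair in the example above."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Every instance shares one hash value, so two distinct keys always
        # collide and the dict is forced to compare them with __eq__.
        return 42

    def __eq__(self, other):
        raise ValueError("cannot compare FlakeyKey(%r)" % self.name)


d = {}
a, b = FlakeyKey("a"), FlakeyKey("b")
d[a] = "a"              # empty dict: no comparison needed
try:
    d[b] = "b"          # hash collision forces a comparison, which raises here
except ValueError as exc:
    print("d[b] = 'b' raised:", exc)
print(len(d))           # 1: the failed insertion left the dict unchanged
print(d[a])             # a: found by identity, so __eq__ is never called
```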

(Note that the exception really happens on the d[b] = "b" statement,
but because the dict implementation doesn't check for exceptions, it
gets reported too late.  We've seen this kind of bug before in
Python.)
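The "reported too late" effect can be modelled with a toy hash table in Python. This is a hypothetical illustration, not CPython's dict code. `PendingErrorDict` mimics a C function that ignores the error indicator: a failed key comparison is stashed as a pending error, the insertion carries on, and the error only surfaces at the next call that checks for it. With `check_errors=True` it raises at the offending insertion instead, which corresponds to the "exception proof" behaviour of alternative 3.

```python
class Uncomparable:
    """Hypothetical key: all instances collide, and comparing any two raises."""

    def __hash__(self):
        return 0

    def __eq__(self, other):
        raise TypeError("comparison failed")


class PendingErrorDict:
    """Toy separate-chaining hash table (illustration only)."""

    def __init__(self, nbuckets=8, check_errors=False):
        self._buckets = [[] for _ in range(nbuckets)]
        self._pending = None          # stands in for a C-level error indicator
        self._check = check_errors

    def _keys_equal(self, k1, k2):
        # Like a C compare function that signals failure through a flag
        # rather than through its return value.
        try:
            return k1 is k2 or k1 == k2
        except Exception as exc:
            self._pending = exc
            return False

    def _raise_pending(self):
        if self._pending is not None:
            exc, self._pending = self._pending, None
            raise exc

    def __setitem__(self, key, value):
        h = hash(key)
        bucket = self._buckets[h % len(self._buckets)]
        for i, (kh, k, _) in enumerate(bucket):
            if kh == h and self._keys_equal(k, key):
                bucket[i] = (h, key, value)
                return
            if self._check:
                self._raise_pending()     # exception proof: fail right here
        bucket.append((h, key, value))    # buggy mode: error ignored, entry added

    def __len__(self):
        self._raise_pending()             # buggy mode: the stale error surfaces here
        return sum(len(b) for b in self._buckets)


buggy = PendingErrorDict()
buggy[Uncomparable()] = "a"
buggy[Uncomparable()] = "b"      # error silently stashed, entry added anyway
try:
    len(buggy)                   # the stale error is reported here instead
except TypeError as exc:
    print("len() raised:", exc)
print(len(buggy))                # 2, matching the transcript quoted earlier

strict = PendingErrorDict(check_errors=True)
strict[Uncomparable()] = "a"
try:
    strict[Uncomparable()] = "b"  # raised by the statement that caused it
except TypeError:
    pass
print(len(strict))               # 1
```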

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)