[Python-Dev] decoding errors when comparing strings

Tim Peters tim_one@email.msn.com
Wed, 26 Jul 2000 04:09:27 -0400


[Guido]
> ...
> I see the exception as a useful warning that the program isn't
> sufficiently Unicode aware to work correctly.  That's a *good* thing
> in my book -- I'd rather raise an exception than silently fail.

[Fredrik Lundh]
> I assume that means you're voting for alternative 3:
>
>     "a third alternative would be to keep the exception, and make
>     the dictionary code exception proof."
>
> because the following isn't exactly good behaviour:
>
> >>> a = "ä"
> >>> b = unicode(a, "iso-8859-1")
> >>> d = {}
> >>> d[a] = "a"
> >>> d[b] = "b"
> >>> len(d)
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> >>> len(d)
> 2
>
> (in other words, the dictionary implementation misbehaves if items
> with the same hash value cannot be successfully compared)

Hmm.  That's a bug in the dict implementation that's independent of Unicode
issues, then -- and I can provoke similar behavior with classes that raise
exceptions from __cmp__, without using Unicode anywhere.  So, ya, the dict
bugs have to be fixed.  Nobody needs to vote on *that* part <wink>.  I'll
look into it "soon", unless somebody else does first.
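
A minimal sketch of the non-Unicode case, assuming present-day Python 3,
where __eq__ takes the place of __cmp__.  The class name Grumpy and the
helper try_insert are hypothetical, made up for illustration.  Two keys with
the same hash force the dict to compare them, and the comparison raises.  A
current interpreter reports that error at the insert that triggered it,
instead of letting it surface at a later, unrelated call the way len(d) does
in the quoted session.

```python
class Grumpy:
    """Hypothetical key type: every instance collides, and comparing raises."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Same hash for all instances, so the dict must compare keys.
        return 42

    def __eq__(self, other):
        # Stand-in for a __cmp__ that raises (or a failed ASCII decode).
        raise ValueError("cannot compare %s" % self.name)


def try_insert(d, key, value):
    """Store value under key; return the comparison error, or None."""
    try:
        d[key] = value
    except ValueError as exc:
        return exc
    return None


d = {}
first = try_insert(d, Grumpy("a"), "a")   # empty dict: nothing to compare
second = try_insert(d, Grumpy("b"), "b")  # collision forces a comparison

print(first, repr(second), len(d))
```

Here the first insert succeeds, the second hands back the ValueError, and the
dict is left holding only the first key.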