[Python-Dev] decoding errors when comparing strings
Wed, 26 Jul 2000 11:11:37 +0200
Guido van Rossum wrote:
> > (revisiting an old thread on mixed string comparisons)
> I think it's PEP time for this one...
> > summary: the current interpreter throws an "ASCII decoding
> > error" exception if you compare 8-bit and unicode strings, and
> > the 8-bit string happens to contain a character in the 128-255
> > range.
> Doesn't bother me at all. If I write a user-defined class that raises
> an exception in __cmp__ you can get the same behavior. The fact that
> the hashes were the same is a red herring; there are plenty of values
> with the same hash that aren't equal.
> I see the exception as a useful warning that the program isn't
> sufficiently Unicode aware to work correctly. That's a *good* thing
> in my book -- I'd rather raise an exception than silently fail.
> Note that it can't break old code unless you try to do new things with
> the old code: the old code couldn't have supported Unicode because it
> doesn't exist in Python 1.5.2.
Hmm, so you do want exceptions to be raised for coercion errors
during compare?
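To make the behaviour under discussion concrete, here is a minimal
sketch in present-day Python syntax. It is not the interpreter's code:
compare_mixed is a hypothetical helper that imitates the implicit
ASCII coercion by decoding the 8-bit string explicitly before
comparing.

```python
def compare_mixed(byte_string, text):
    """Compare an 8-bit string with a Unicode string.

    Hypothetical helper: it mimics the implicit coercion by decoding
    the 8-bit string as ASCII first, so bytes in the 128-255 range
    raise UnicodeDecodeError instead of quietly comparing unequal.
    """
    return byte_string.decode("ascii") == text


# Pure ASCII data compares without trouble.
ascii_result = compare_mixed(b"abc", "abc")

# A byte in the 128-255 range makes the comparison itself raise.
try:
    compare_mixed(b"\xe5bc", "\xe5bc")
    raised = False
except UnicodeDecodeError:
    raised = True
```

The exception comes out of the comparison, which is the case the
question above is about: a coercion error surfacing during compare.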
For the record:
PyObject_Compare() currently only does coercion for number
slot compatible types (ok, all Python instances have these
slots...). Unicode objects are handled in a special case to
allow the PyUnicode_Compare() API to do all necessary conversions
and then proceed with the compare.
The coercion proposal will have to deal with all this in 2.1.
My version of the proposal (see Python Pages) puts all the power
into the hands of the operators themselves rather than using
a centralized coercion method. In this scenario, the special
casing in PyObject_Compare() for Unicode would no longer
be necessary, since the Unicode compare slot could then take
care of the needed conversions (essentially PyUnicode_Compare()
would become that slot).
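As a rough illustration of the operator-driven idea, the sketch below
uses present-day Python rather than the C API. The Text class and its
ASCII-only conversion rule are assumptions made up for the example and
are not part of the proposal; the point is only that the type's own
compare slot performs the conversion, so the central compare routine
needs no special case.

```python
class Text:
    """Hypothetical Unicode-like type that owns its comparison logic."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # The compare slot itself converts the other operand.
        if isinstance(other, Text):
            return self.value == other.value
        if isinstance(other, str):
            return self.value == other
        if isinstance(other, bytes):
            # Assumed rule: 8-bit data must be ASCII. Anything else
            # raises UnicodeDecodeError, surfacing the coercion error.
            return self.value == other.decode("ascii")
        # Unknown type: decline, so the other operand's slot is tried.
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


same = Text("abc") == b"abc"        # converted inside Text.__eq__
reflected = b"abc" == Text("abc")   # bytes declines, Text.__eq__ is tried
unrelated = Text("abc") == 42       # both sides decline, identity fallback
```

Returning NotImplemented for unknown operand types is what lets each
type keep control of its own conversions without a centralized
coercion step.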
Python Pages: http://www.lemburg.com/python/