[Python-Dev] decoding errors when comparing strings

M.-A. Lemburg mal@lemburg.com
Sun, 16 Jul 2000 00:27:16 +0200


Moshe Zadka wrote:
> 
> On Sat, 15 Jul 2000, Fredrik Lundh wrote:
> 
> > paul wrote:
> > > As soon as you find a character out of the ASCII range in one of the
> > > strings, I think that you should report that the two strings are
> > > unequal.
> >
> > sounds reasonable -- but how do you flag "unequal" in cmp?  which
> > value is "larger" if all that we know is that they're different...
> 
> We can say something like "beyond the ASCII range, every unicode character
> is larger than any regular 8-bit character", and compare
> lexicographically.
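A minimal sketch of the ordering Moshe proposes, written in modern Python with `bytes` standing in for 8-bit strings and `str` for Unicode. The function names `cmp_8bit_vs_unicode`, `_byte_key` and `_unicode_key` are hypothetical, and the choice that an ASCII Unicode character compares equal to the same ASCII byte is an assumption. Each character is mapped to an integer key, and the key sequences are then compared lexicographically:

```python
def _byte_key(b: int) -> int:
    # An 8-bit character keeps its raw value (0..255).
    return b

def _unicode_key(ch: str) -> int:
    # Assumption: ASCII code points equal the same ASCII byte; anything
    # beyond ASCII is shifted above the whole 8-bit range (0..255).
    cp = ord(ch)
    return cp if cp < 128 else 256 + cp

def cmp_8bit_vs_unicode(s8: bytes, u: str) -> int:
    """Hypothetical cmp()-style result (-1, 0, 1) for an 8-bit vs a Unicode string."""
    left = [_byte_key(b) for b in s8]      # iterating bytes yields ints
    right = [_unicode_key(ch) for ch in u]
    # List comparison in Python is lexicographic.
    return (left > right) - (left < right)
```

With this scheme `cmp_8bit_vs_unicode(b"\xff", "\u00e9")` is -1, because any non-ASCII Unicode character sorts after every possible byte, while `cmp_8bit_vs_unicode(b"abc", "abc")` is 0.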

The usual approach in Python's comparison logic is to fall back to
comparing type names when coercion fails. I think that is the right
model for this case: decoding fails, and thus coercion becomes
impossible.

PyObject_Compare() already has this logic; we'd just have to re-enable
it for Unicode, which is currently handled as a special case so that
the decoding error is passed through.

Note that Unicode objects which can't be coerced would then always
compare larger than 8-bit strings, since the type name "unicode" sorts
after "string" ("unicode" > "string").
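A rough sketch of the type-name fallback described above, again using modern Python `bytes` and `str` as stand-ins for the 8-bit string and Unicode types. The function name `fallback_cmp` and the `encoding` parameter are hypothetical, and ASCII is assumed as the default decoding; this is not the actual PyObject_Compare() code. The 8-bit string is coerced by decoding it, and only when decoding fails are the type names "string" and "unicode" compared instead:

```python
def fallback_cmp(s8: bytes, u: str, encoding: str = "ascii") -> int:
    """Hypothetical cmp()-style compare with a type-name fallback."""
    try:
        # Coercion: decode the 8-bit string to Unicode.
        coerced = s8.decode(encoding)
    except UnicodeDecodeError:
        # Coercion is impossible, so compare the type names instead.
        left, right = "string", "unicode"
    else:
        left, right = coerced, u
    return (left > right) - (left < right)
```

When decoding fails, the result is -1 regardless of the string contents, so the Unicode object always compares larger: `fallback_cmp(b"\xe9", "\u00e9")` is -1 even though both hold "é" in Latin-1.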

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/