[Python-Dev] Re: \ud800 crashes interpreter (PR#384)

M.-A. Lemburg mal@lemburg.com
Wed, 05 Jul 2000 17:40:19 +0200


Bill Tutt wrote:
> 
> > MAL wrote:
> > ... I wonder why compiling "print u'\uD800'" causes the
> > hash value to be computed ...
> 
> That's an easy one. Com_addconst() (or something it calls) calls
> PyObject_Hash() during the compilation process.

Ah ok.
 
> Re: UTF-8
> There's no reason why you can't support surrogates in UTF-8, while still not
> supporting them in slice notation.

True.
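To illustrate Bill's point, here is a minimal sketch (not the actual codec; the function name utf8_from_utf16_units is hypothetical). It is a UTF-8 encoder that works on 16-bit code units, as a narrow build stores them. It joins a high/low surrogate pair into a single 4-byte UTF-8 sequence and rejects a lone surrogate such as u'\uD800' instead of passing it through. Slicing the code-unit sequence can still split a pair; the encoder only needs to handle the pairs it sees intact.

```python
def utf8_from_utf16_units(units):
    """Encode a list of 16-bit code units to UTF-8 bytes.

    A high surrogate followed by a low surrogate is combined into one
    code point (one 4-byte UTF-8 sequence); an unpaired surrogate is
    rejected with ValueError instead of being passed through.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Surrogate pair -> supplementary-plane code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Lone high or low surrogate: refuse it.
            raise ValueError("unpaired surrogate U+%04X at index %d" % (u, i))
        else:
            cp = u
            i += 1
        # cp is never a surrogate here, so UTF-8 encoding succeeds.
        out += chr(cp).encode("utf-8")
    return bytes(out)


print(utf8_from_utf16_units([0xD800, 0xDC00]))  # b'\xf0\x90\x80\x80'
```

The pair [0xD800, 0xDC00] yields the same bytes as '\U00010000'.encode('utf-8').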

> It's certainly the easiest way to fix the problem.

Well, it doesn't really fix the problem... your note only made
it clear that the change in default encoding (be it ASCII
or whatever the locale defines) has the unwanted side effect
of breaking the hash/cmp rule (objects that compare equal must
have equal hash values) for non-ASCII 8-bit strings vs. Unicode.
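A small model of the breakage, under stated assumptions: the default encoding is UTF-8 (a hypothetical site.py setting), mixed comparisons decode the 8-bit string with it, and both string types hash their integer units (byte values vs. code points) with the same function. That function is written in the spirit of the classic CPython string hash, not copied from it. All names here are hypothetical.

```python
# Hypothetical setting, as a site.py might choose; not a real API.
DEFAULT_ENCODING = "utf-8"


def classic_hash(units):
    """Multiplicative string hash in the spirit of the classic CPython one,
    applied to a list of integers (byte values or code points)."""
    if not units:
        return 0
    x = units[0] << 7
    for u in units:
        x = ((1000003 * x) ^ u) & 0xFFFFFFFF  # keep the value 32-bit
    return x ^ len(units)


def hash_8bit(data: bytes) -> int:
    # 8-bit strings hash their raw byte values.
    return classic_hash(list(data))


def hash_unicode(text: str) -> int:
    # Unicode strings hash their code points.
    return classic_hash([ord(c) for c in text])


def equal_mixed(data: bytes, text: str) -> bool:
    # Mixed comparison coerces the 8-bit side through the default encoding.
    return data.decode(DEFAULT_ENCODING) == text


def hash_cmp_rule_holds(data: bytes, text: str) -> bool:
    # The rule: if the objects compare equal, their hashes must match.
    return not equal_mixed(data, text) or hash_8bit(data) == hash_unicode(text)


print(hash_cmp_rule_holds(b"abc", "abc"))                        # True
print(hash_cmp_rule_holds("\u00e9".encode("utf-8"), "\u00e9"))  # False
```

ASCII content passes because its byte values equal its code points. u'\u00e9' compares equal to its two-byte UTF-8 form but hashes differently, so a dictionary lookup with one can miss an entry stored under the other.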

Perhaps the solution is to push the default encoding down all
the way (with some trickery this is possible now, since
changing the default encoding is only allowed in site.py), or
simply to state that the hash/cmp rule only holds for objects
with ASCII-only contents.
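One possible reading of the first option, sketched with hypothetical names: hash an 8-bit string through the default encoding, so that it hashes like the Unicode string it compares equal to. Because the hash is cached on the object, this is only safe if the default encoding cannot change after startup, which is what restricting changes to site.py provides.

```python
class EightBitString:
    """Hypothetical model of an 8-bit string whose hash is computed
    through the process-wide default encoding."""

    default_encoding = "utf-8"  # hypothetical; set once at startup, as site.py would

    def __init__(self, data: bytes):
        self.data = data
        self._hash = None  # cached after first use, like CPython string objects

    def _as_text(self) -> str:
        # Raises UnicodeDecodeError for bytes invalid in the default encoding.
        return self.data.decode(self.default_encoding)

    def __eq__(self, other):
        if isinstance(other, EightBitString):
            return self.data == other.data
        if isinstance(other, str):
            # Mixed comparison goes through the default encoding.
            return self._as_text() == other
        return NotImplemented

    def __hash__(self):
        # Hash the decoded text so equal 8-bit and Unicode strings hash alike.
        if self._hash is None:
            self._hash = hash(self._as_text())
        return self._hash


s = EightBitString("\u00e9".encode("utf-8"))
print(s == "\u00e9", hash(s) == hash("\u00e9"))  # True True
```

Bytes that are not valid in the default encoding raise UnicodeDecodeError here; a real implementation would need a policy for those, which the ASCII-only statement of the rule sidesteps.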

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/