[Python-Dev] Re: \ud800 crashes interpreter (PR#384)

M.-A. Lemburg mal@lemburg.com
Tue, 04 Jul 2000 22:58:46 +0200

billtut@microsoft.com wrote:
> Full_Name: Bill Tutt
> Version: CVS
> OS: NT
> Submission from: tide70.microsoft.com (
> u'\ud800' causes the interpreter to crash
> example:
> print u'\ud800'
> What happens:
> The code fails to compile because, while adding the constant, the unicode_hash
> function is called, which for some reason requires the UTF-8 string format.

The reasoning at the time was that dictionaries should treat a
Unicode object and its equivalent 8-bit string as the same key,
e.g. 'abc' works just as well as u'abc'.
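
To make the key-equivalence idea concrete, here is a minimal sketch in
present-day Python. The class name UText is hypothetical and only stands in
for the Unicode object of the time; the assumption is that it compares equal
to its UTF-8 bytes and therefore hashes those bytes. It is not the actual
unicode_hash implementation.

```python
class UText:
    """Hypothetical stand-in for a Unicode object that equals its UTF-8 bytes."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, UText):
            return self.text == other.text
        if isinstance(other, bytes):
            # Equality is defined through the UTF-8 form (an assumption).
            return self.text.encode("utf-8") == other
        return NotImplemented

    def __hash__(self):
        # Hashing must agree with equality, so it also goes through UTF-8.
        # This encode step raises for a lone surrogate such as '\ud800'.
        return hash(self.text.encode("utf-8"))


table = {b"abc": 1}
found = table[UText("abc")]  # same key as b"abc"

try:
    hash(UText("\ud800"))
    lone_surrogate_hashable = True
except UnicodeEncodeError:
    lone_surrogate_hashable = False
```

The lookup succeeds only because both the hash and the comparison go through
the same encoding, which is also why a string that cannot be encoded cannot
be hashed in this scheme.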

UTF-8 was the default encoding back then. I'm not sure how
to fix the hash value given the new strategy with respect to
the default encoding...

According to the docs, objects that compare equal should have the
same hash value. That would require calculating the hash value
using the default encoding, which would not only cause huge
performance problems but could effectively render Unicode useless,
because not all default encodings are lossless. (OK, one could
work around this by falling back to some other way of calculating
the hash value in case the conversion fails.)
Looks like we have a problem here :-/
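
The fallback mentioned in parentheses could look roughly like the sketch
below, written in present-day Python. The function name text_hash, the
"ascii" default, and the choice of UTF-16-LE with the "surrogatepass" error
handler as the fallback are all assumptions made for illustration, not
anything that exists in the interpreter.

```python
def text_hash(text, default_encoding="ascii"):
    """Hash text via the default encoding, with a fallback (hypothetical helper)."""
    try:
        # Preferred path: hash the encoded form so that the text and its
        # equivalent byte string end up with the same hash value.
        data = text.encode(default_encoding)
    except UnicodeEncodeError:
        # Fallback path: the text has no byte-string equivalent in the
        # default encoding, so any stable representation will do.
        data = text.encode("utf-16-le", "surrogatepass")
    return hash(data)


same_as_bytes = text_hash("abc") == hash(b"abc")
surrogate_hash = text_hash("\ud800")  # no exception on the fallback path
```

Note that this keeps the "equal objects hash equal" rule only for strings
that survive the default encoding, and it still pays for an encode on every
hash, so the performance concern above remains.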

> The conversion to UTF-8 fails (completely bogus), the compiler only notes that
> compilation failed, and clears the stored exception info.
> When compilation finishes, it remembers that it failed and returns.
> The interpreter then proceeds to crash in PyErr_NormalizeException() because the
> UTF-8 conversion exception info isn't there anymore.
> Suggested fix:
> Changing the UTF-8 conversion code to emit 4 bytes for surrogate characters.
> _______________________________________________
> Python-bugs-list maillist  -  Python-bugs-list@python.org
> http://www.python.org/mailman/listinfo/python-bugs-list
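
For reference, one reading of the suggested fix is that a high/low surrogate
pair is combined into a single code point, which then takes 4 bytes in UTF-8.
The sketch below, in present-day Python, shows that arithmetic. The function
names are hypothetical, and the report does not say what should happen to a
lone surrogate such as \ud800, so this sketch simply rejects it.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point (hypothetical helper)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def surrogate_pair_to_utf8(high, low):
    """Encode a surrogate pair as the 4-byte UTF-8 form of its code point."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


encoded = surrogate_pair_to_utf8(0xD800, 0xDC00)  # code point U+10000
```

Every code point from U+10000 upward needs 4 bytes in UTF-8, so any valid
pair yields a 4-byte result.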

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/