[Python-Dev] Re: \ud800 crashes interpreter (PR#384)

Fredrik Lundh fredrik@pythonware.com
Wed, 5 Jul 2000 11:14:21 +0200

mal wrote:
> > Given the new 7-bit-ASCII-as-default-encoding-for-8-bit-strings
> > convention, shouldn't just hashing the character values work
> > fine?  That is, hash('abc') should == hash(u'abc'), no conversion
> > required.
> Yes, and it does so already for pure ASCII values. The problem
> comes from the fact that the default encoding can be changed to
> a locale specific value (site.py does the lookup for you), e.g.
> given you have defined LANG to be us_en, Python will default
> to Latin-1 as default encoding.

footnote: in practice, the locale-based lookup in site.py is a
Unix-only feature.

I suggest adding code to the _locale module (or maybe sys is
better?) which can be used to dig up a suitable encoding on
non-Unix platforms.  On Windows, the encoding name should be
"cp%d" % GetACP(), i.e. the active ANSI code page.
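
A rough sketch of the idea, written in present-day Python rather
than as C code inside _locale or sys.  The helper name
guess_platform_encoding is hypothetical, and calling GetACP()
through ctypes is an assumption made for illustration (ctypes was
not available when this was written).  Falling back to "ascii"
when the codec is unknown is also an assumption, chosen to match
the 7-bit ASCII default discussed above.

```python
import codecs
import locale
import sys


def guess_platform_encoding():
    """Return a codec name for the platform's default text encoding.

    Hypothetical helper: a sketch of the proposal, not an existing API.
    """
    if sys.platform == "win32":
        import ctypes

        # GetACP() returns the active ANSI code page as an integer,
        # e.g. 1252, which maps to the codec name "cp1252".
        name = "cp%d" % ctypes.windll.kernel32.GetACP()
    else:
        # Elsewhere, ask the locale machinery (driven by LANG, LC_ALL, ...).
        name = locale.getpreferredencoding(False) or "ascii"

    try:
        # Only return names that Python actually has a codec for.
        codecs.lookup(name)
    except LookupError:
        name = "ascii"  # assumed fallback: the 7-bit ASCII default
    return name


if __name__ == "__main__":
    print(guess_platform_encoding())
```

The codecs.lookup() check matters because a platform can report a
code page or locale encoding for which no codec is installed.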

I'll look into this later today.