[Python-Dev] String encoding

Greg Stein gstein@lyra.org
Tue, 23 May 2000 03:57:21 -0700 (PDT)

I still think that having any kind of global setting is going to be
troublesome. Whether it is per-thread or not, it still means that Module
Foo cannot alter the value without interfering with Module Bar.
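
Greg's objection can be illustrated with a toy model of one process-wide setting. The names `_default_encoding`, `set_string_encoding()`, `to_text()`, `bar_parse()` and `foo_setup()` are hypothetical stand-ins, not the proposed sys hook itself. The sketch only shows that when one "module" flips the shared value, another module's results change even though its own code did not.

```python
# Hypothetical process-wide setting, standing in for the proposed
# sys.setstringencoding() hook.
_default_encoding = "ascii"


def set_string_encoding(name):
    """Change the global default used by to_text() (hypothetical)."""
    global _default_encoding
    _default_encoding = name


def to_text(data):
    """Decode bytes using whatever the global default currently is."""
    return data.decode(_default_encoding)


def bar_parse(data):
    """'Module Bar': silently depends on the global default."""
    return to_text(data)


def foo_setup():
    """'Module Foo': changes the global default for its own purposes."""
    set_string_encoding("latin-1")


payload = b"caf\xc3\xa9"  # UTF-8 bytes for "caf\u00e9"

set_string_encoding("utf-8")
before = bar_parse(payload)  # Bar gets the text it expects
foo_setup()                  # Foo switches the shared setting
after = bar_parse(payload)   # Bar's result changes underneath it

# Naming the encoding at the call site avoids the shared state entirely.
explicit = payload.decode("utf-8")
```

Passing the encoding explicitly at each call, as the last line does, sidesteps the problem because no module can change another module's choice.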


On Tue, 23 May 2000, M.-A. Lemburg wrote:

> The recent discussion about repr() et al. brought up the idea
> of a locale-based string encoding again.
>
> A support module for querying the encoding used in the current
> locale, together with the experimental hook to set the string
> encoding, could yield a compromise that satisfies ASCII, Latin-1
> and UTF-8 proponents.
>
> The idea is to use the site.py module to customize the interpreter
> from within Python (rather than making the encoding a compile-time
> option). This is easily doable using the (yet to be written)
> support module and the sys.setstringencoding() hook.
>
> The default encoding would be 'ascii' and could then be changed
> to whatever the user or administrator wants it to be on a
> per-site basis. Furthermore, the encoding should be settable on
> a per-thread basis inside the interpreter (Python threads
> do not seem to inherit any per-thread globals, so the
> encoding would have to be set for all new threads).
>
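
The per-thread behaviour described above can be sketched with Python's `threading.local()`. A value set in one thread is not visible in the threads it starts, so each new thread falls back to the default unless something sets it again. `set_thread_encoding()`, `get_thread_encoding()` and `DEFAULT_ENCODING` are hypothetical names for illustration only.

```python
import threading

DEFAULT_ENCODING = "ascii"  # assumed site-wide default

# Each thread sees its own attributes on this object; a new thread
# starts with none of the values set by the thread that created it.
_state = threading.local()


def set_thread_encoding(name):
    """Set the string encoding for the calling thread only (hypothetical)."""
    _state.encoding = name


def get_thread_encoding():
    """Return this thread's encoding, or the default if it was never set."""
    return getattr(_state, "encoding", DEFAULT_ENCODING)


set_thread_encoding("latin-1")  # affects the main thread only
seen_in_new_thread = []


def worker():
    # Nothing was inherited from the main thread, so this sees the default.
    seen_in_new_thread.append(get_thread_encoding())


t = threading.Thread(target=worker)
t.start()
t.join()
```

After the join, the main thread still reports "latin-1" while the worker recorded "ascii", which is why the proposal says the encoding would have to be set again for every new thread.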
> E.g. a site.py module could look like this:
>
> """
> import locale, sys
>
> # Get encoding, defaulting to 'ascii' in case it cannot be
> # determined
> defenc = locale.get_encoding('ascii')
>
> # Set main thread's string encoding
> sys.setstringencoding(defenc)
> """
>
> This would cause the Unicode implementation to assume
> defenc as the encoding of strings.
>
> Minor nit: due to the implementation, the C parser markers
> "s" and "t" and the hash() value calculation will still need
> to work with a fixed encoding, which remains UTF-8. C APIs
> which want to support Unicode should be fixed to use "es"
> or to query the object directly and then apply the proper,
> possibly OS-dependent, conversion.
>
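
One way to read the note about hash(): a byte string and a Unicode string that compare equal must also hash equally, but the same bytes decode to different text under different encodings. A hash tied to a settable default would therefore change for an unchanged object. The Python 3 sketch below assumes a hypothetical "decode with the current default, then hash" rule and shows a dict lookup failing once the default switches. It is an illustration, not how any Python version actually implements hash().

```python
raw = b"\xc3\xa9"  # one unchanged byte string

as_utf8 = raw.decode("utf-8")      # 1 character: U+00E9
as_latin1 = raw.decode("latin-1")  # 2 characters: U+00C3 U+00A9

# Under the hypothetical rule, a dict filled while the default was UTF-8
# effectively stores the key as the UTF-8 decoding of raw ...
table = {as_utf8: "stored"}

# ... and after the default switches to Latin-1 the same bytes map to
# different text, so the lookup no longer finds the entry.
found_after_switch = as_latin1 in table
```

Keeping one fixed encoding for hashing, as the note says, avoids this kind of silent breakage in dicts and sets.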
> Before starting to implement the above, I'd like to
> hear some comments...
