[Python-Dev] String encoding

M.-A. Lemburg mal@lemburg.com
Tue, 23 May 2000 13:14:46 +0200


Greg Stein wrote:
> 
> I still think that having any kind of global setting is going to be
> troublesome. Whether it is per-thread or not, it still means that Module
> Foo cannot alter the value without interfering with Module Bar.

True. 

The only reasonable place to alter the setting is in
site.py for the main thread. I think the setting should be
inherited by child threads, but I'm not sure whether this is
possible or not.
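Whether a child thread can pick up its parent's setting depends on where the setting lives. The proposal keeps it inside the interpreter, so the following is only a model. Suppose the setting were instead held in a Python-level `threading.local`. Inheritance could then be emulated in two steps: read the value when the thread object is created, which happens in the parent, and install it when `run()` starts in the child. The names `get_thread_encoding`, `set_thread_encoding` and `InheritingThread` are hypothetical, and the code is written for modern Python 3.

```python
import threading

# Hypothetical per-thread store for the default string encoding.
# Values in a threading.local are NOT copied into newly started threads.
_state = threading.local()
SITE_DEFAULT = "ascii"  # the proposed interpreter-wide default


def get_thread_encoding():
    """Return the calling thread's encoding, or the site default."""
    return getattr(_state, "encoding", SITE_DEFAULT)


def set_thread_encoding(name):
    """Set the encoding for the calling thread only."""
    _state.encoding = name


class InheritingThread(threading.Thread):
    """A Thread that starts out with its creator's encoding."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # __init__ runs in the parent thread, so this reads the parent's value.
        self.inherited_encoding = get_thread_encoding()

    def run(self):
        # run() executes in the child thread: install the value, then run target.
        set_thread_encoding(self.inherited_encoding)
        super().run()


if __name__ == "__main__":
    set_thread_encoding("latin-1")  # e.g. done once for the main thread in site.py
    seen = {}
    plain = threading.Thread(target=lambda: seen.update(plain=get_thread_encoding()))
    child = InheritingThread(target=lambda: seen.update(child=get_thread_encoding()))
    for t in (plain, child):
        t.start()
        t.join()
    print(seen)  # {'plain': 'ascii', 'child': 'latin-1'}
```

A plain `threading.Thread` started from the same parent still sees the site default, because `threading.local` values are never copied to new threads. Only threads created through the subclass inherit the setting.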
 
Modules that would need to change the setting are better
(re)designed in a way that doesn't rely on the setting at all, e.g.
by working exclusively on Unicode, which avoids the need
in the first place.
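Here is a rough illustration of that design, written in modern Python 3, where `str` is Unicode text. The module decodes byte strings once, at its boundary, with an explicitly passed codec and works on Unicode internally, so no global or per-thread default is ever consulted. `to_text` and `make_greeting` are hypothetical names chosen for the example.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as str, decoding bytes with an explicit codec.

    The codec is always passed in, so the result never depends on a
    process-wide or per-thread default encoding.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def make_greeting(name, encoding="utf-8"):
    # All internal work is done on str (Unicode) objects.
    return "Hello, " + to_text(name, encoding) + "!"


print(make_greeting(b"J\xc3\xbcrgen"))         # Hello, Jürgen!
print(make_greeting(b"J\xfcrgen", "latin-1"))  # Hello, Jürgen!
```

The caller, not a global setting, decides how incoming bytes are interpreted, which is what lets Module Foo and Module Bar coexist.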

And then, no one is forced to alter the ASCII default to begin
with :-) The good thing about exposing this mechanism in Python
is that it gets user attention...
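The quoted proposal depends on a yet-to-be-written `locale.get_encoding('ascii')`. A minimal stand-in can be built from the existing `locale.getpreferredencoding()` and `codecs.lookup()`, assuming the locale's preferred encoding is what a site wants. It falls back to the given default when the locale reports nothing usable. `get_encoding` here is a hypothetical helper, not a standard-library function. `sys.setstringencoding()` is the experimental hook under discussion, so it appears only in a comment.

```python
import codecs
import locale


def get_encoding(default="ascii"):
    """Return the locale's preferred encoding, or `default` if unusable.

    Hypothetical stand-in for the proposed locale.get_encoding(); it only
    uses the existing locale.getpreferredencoding() and codecs.lookup().
    """
    # False: query the current locale without calling setlocale().
    name = locale.getpreferredencoding(False)
    if not name:
        return default
    try:
        # Normalize the name (e.g. "UTF8" -> "utf-8") and check a codec exists.
        return codecs.lookup(name).name
    except LookupError:
        return default


# In site.py the result would be handed to the proposed hook, e.g.
#   sys.setstringencoding(get_encoding("ascii"))
# That hook is not available in current Python, so it is not called here.
print(get_encoding("ascii"))
```

Validating the name with `codecs.lookup()` keeps a misconfigured locale from installing an encoding that no codec can handle. In that case the ASCII default simply stays in effect.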

> Cheers,
> -g
> 
> On Tue, 23 May 2000, M.-A. Lemburg wrote:
> 
> > The recent discussion about repr() et al. brought up the idea
> > of a locale based string encoding again.
> >
> > A support module for querying the encoding used in the current
> > locale together with the experimental hook to set the string
> > encoding could yield a compromise which satisfies ASCII, Latin-1
> > and UTF-8 proponents.
> >
> > The idea is to use the site.py module to customize the interpreter
> > from within Python (rather than making the encoding a compile
> > time option). This is easily doable using the (yet to be written)
> > support module and the sys.setstringencoding() hook.
> >
> > The default encoding would be 'ascii' and could then be changed
> > to whatever the user or administrator wants it to be on a per
> > site basis. Furthermore, the encoding should be settable on
> > a per thread basis inside the interpreter (Python threads
> > do not seem to inherit any per-thread globals, so the
> > encoding would have to be set for all new threads).
> >
> > E.g. a site.py module could look like this:
> >
> > """
> > import locale,sys
> >
> > # Get encoding, defaulting to 'ascii' in case it cannot be
> > # determined
> > defenc = locale.get_encoding('ascii')
> >
> > # Set main thread's string encoding
> > sys.setstringencoding(defenc)
> >
> > This would cause the Unicode implementation to assume
> > defenc as the encoding of strings.
> > """
> >
> > Minor nit: due to the implementation, the C parser markers
> > "s" and "t" and the hash() value calculation will still need
> > to work with a fixed encoding, which is still UTF-8. C APIs
> > which want to support Unicode should be fixed to use "es"
> > or query the object directly and then apply a proper, possibly
> > OS-dependent, conversion.
> >
> > Before starting off into implementing the above, I'd like to
> > hear some comments...
> >
> > Thanks,
> > --
> > Marc-Andre Lemburg
> > ______________________________________________________________________
> > Business:                                      http://www.lemburg.com/
> > Python Pages:                           http://www.lemburg.com/python/
> >
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev@python.org
> > http://www.python.org/mailman/listinfo/python-dev
> >
> 
> --
> Greg Stein, http://www.lyra.org/
