[I18n-sig] Changing case
Wed, 12 Apr 2000 09:59:25 +0200
Guido van Rossum wrote:
> > Perhaps we should just loosen the used encoding for u"...chars..."
> > using #pragmas and/or cmd line switches. Then people around the
> > world would at least have a simple way to write programs which
> > still work everywhere, but can be written using any of the
> > encodings known to Python. 8-bit "...chars..." would then
> > be interpreted as before: user defined data using a user
> > defined encoding (the string->Unicode conversion would still
> > need to make the UTF-8 assumption, though).
> This sounds like my proposal. Let's do it.
Thinking about this some more: while adding a flag to designate
the u"" encoding would be easy, should the encoded string also
be able to contain \uXXXX and similar escape sequences? If yes, we'd
need a two level approach:
1. decode the input encoding to Unicode
2. decode the embedded \uXXXX et al. escape sequences (now within
We'd need a new codec for step 2, and this codec would have to be able
to translate Unicode to Unicode -- nothing difficult, but a
new technique since all others currently do 8-bit <-> Unicode.
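To illustrate the two-level approach, here is a rough sketch in present-day
Python (not the interpreter this thread was written against). The function
name decode_literal and its parameters are made up for this example, and a
regular expression stands in for the Unicode-to-Unicode codec mentioned
above. It only handles \uXXXX, not the other escapes, and it does not treat
an escaped backslash specially.

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_literal(raw, source_encoding):
    """Hypothetical helper: turn the raw bytes of a u"..." body into text."""
    # Step 1: decode the input encoding to Unicode (8-bit -> Unicode).
    text = raw.decode(source_encoding)
    # Step 2: expand embedded \uXXXX escapes (Unicode -> Unicode).
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # Source bytes written in Latin-1, containing a literal \u20ac escape.
    raw = "caf\u00e9 \\u20ac".encode("latin-1")
    print(decode_literal(raw, "latin-1"))
```

Keeping the two steps separate means step 2 never sees encoded bytes, so the
same escape handling works whatever input encoding was chosen in step 1.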
"Draft proposal"ing here:
Let's start the experiment with a command line switch
until #pragma handling has been properly defined. #pragmas
should then be used for scripts read from files to ensure
that they work elsewhere in the world.
What command line switch should we use... -e as in
We'd also need an environment variable to make things easier,
The value should be available within Python as e.g. sys.encoding.
The given encoding would only be used by the compiler (the part
that translates u"..." strings into objects). Usage in scripts
is then up to user-land routines (via sys.encoding).
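As a sketch of how the switch and the environment variable might combine,
something like the following could pick the encoding name. Everything here
is an assumption for illustration: the function resolve_source_encoding, the
environment variable name PYTHONENCODING, the "-e NAME" form of the switch,
the rule that the switch overrides the environment, and the UTF-8 default.
None of it is settled by the proposal above.

```python
import codecs


def resolve_source_encoding(argv, environ, default="utf-8"):
    """Hypothetical: choose the encoding the compiler would use for u"..."."""
    name = default
    # Assumed environment variable name; not defined anywhere in the proposal.
    if "PYTHONENCODING" in environ:
        name = environ["PYTHONENCODING"]
    # Assumed precedence: an explicit "-e NAME" switch wins over the environment.
    for i, arg in enumerate(argv):
        if arg == "-e" and i + 1 < len(argv):
            name = argv[i + 1]
            break
    # codecs.lookup raises LookupError for unknown encodings and
    # returns a CodecInfo whose .name is the normalized codec name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(resolve_source_encoding(["-e", "shift_jis"], {}))
```

The returned name is what a value such as sys.encoding could expose to
user-land routines.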
To make all this work without too many hassles we'd need
(at least the most commonly used) CJKV codecs in the core
distribution. How big would these be ? Would someone contribute
them... Tamito ?
Python Pages: http://www.lemburg.com/python/