[Python-Dev] #pragmas in Python source code

M.-A. Lemburg mal@lemburg.com
Tue, 18 Apr 2000 00:01:38 +0200

Paul Prescod wrote:
> "M.-A. Lemburg" wrote:
> >
> > ...
> > The current need for #pragmas is really very simple: to tell
> > the compiler which encoding to assume for the characters
> > in u"...strings..." (*not* "...8-bit strings..."). The idea
> > behind this is that programmers should be able to use other
> > encodings here than the default "unicode-escape" one.
> I'm totally confused about this. Are we going to allow UCS-2 sequences
> in the middle of Python programs that are otherwise ASCII?

The idea is to make life a little easier for programmers
whose native script is not easily writable using ASCII, e.g.
the whole Asian world.

While originally only the encoding used within the quotes of
u"..." was targeted (on the i18n SIG), there has now been
some discussion on this list about whether to move forward
in a whole new direction: that of allowing whole Python scripts
to be written in any of many different encodings. The compiler
would then convert the scripts first to Unicode and then to 8-bit
strings as needed.
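
To make the difference for u"..." literals concrete, here is a rough
sketch in present-day Python 3 (not the implementation being discussed
here). It shows the same two-character text once in the default
"unicode-escape" form and once in a native encoding. The sample text and
the choice of Shift_JIS are purely illustrative assumptions.

```python
# Illustrative only: the sample text and the encoding are assumptions.
text = "\u65e5\u672c"  # two CJK characters

# What has to be typed with the default encoding: ASCII-only escapes.
escaped = text.encode("unicode_escape")

# What could be typed directly if a native encoding were declared.
native = text.encode("shift_jis")

# Both byte forms decode back to the same Unicode text.
assert escaped == b"\\u65e5\\u672c"
assert escaped.decode("unicode_escape") == text
assert native.decode("shift_jis") == text
```

The escaped form stays pure ASCII but is hard to read and write by hand;
the native form is what an editor configured for that script would save.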

Using this technique, which was introduced by Fredrik Lundh,
we could in fact have Python scripts which are encoded in
UTF-16 (two bytes per character) or other more obscure
encodings. The Python interpreter would only see Unicode
and Latin-1.
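
A minimal sketch of that pipeline, written in present-day Python 3 and
not meant as the actual compiler change: the raw script bytes are decoded
to Unicode first, the Unicode text is compiled, and 8-bit data is produced
only where needed. The helper name compile_encoded_source and the sample
script are hypothetical.

```python
def compile_encoded_source(raw, encoding, filename="<script>"):
    """Hypothetical helper: decode raw source bytes, then compile the text."""
    source = raw.decode(encoding)  # step 1: bytes -> Unicode
    return compile(source, filename, "exec")

# A tiny script saved as UTF-16 (the codec writes a byte order mark).
script = 'greeting = "gr\u00fc\u00df dich"\n'
raw = script.encode("utf-16")

namespace = {}
exec(compile_encoded_source(raw, "utf-16"), namespace)

# The program only ever sees Unicode text ...
assert namespace["greeting"] == "gr\u00fc\u00df dich"

# ... and step 2, an 8-bit (Latin-1) string, is derived only on demand.
eight_bit = namespace["greeting"].encode("latin-1")
assert eight_bit == b"gr\xfc\xdf dich"
```

Decoding before compiling means the rest of the toolchain never has to
know which encoding the file on disk used.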

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/