[Python-Dev] #pragmas in Python source code

Tue, 18 Apr 2000 12:56:55 +0200

Guido van Rossum wrote:
> > Using this technique which was introduced by Fredrik Lundh
> > we could in fact have Python scripts which are encoded in
> > UTF-16 (two bytes per character) or other more obscure
> > encodings. The Python interpreter would only see Unicode
> > and Latin-1.
>
> Wouldn't it make more sense to have the Python compiler *always* see
> UTF-8 and to use a simple preprocessor to deal with encodings?

to some extent, this depends on what the "everybody" in
CP4E means -- if you were to do user-testing on non-americans,
I suspect "why cannot I use my own name as a variable name"
might be as common as "why are SPAM and spam two different
variables?".

and if you're willing to address both issues in Py3K, it's much
easier to use a simple internal representation, and handle
encodings on the way in and out.  and PY_UNICODE* strings are
easier to process than UTF-8 encoded char* strings...
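
a minimal sketch of the "decode on the way in, encode on the way out"
idea, written in present-day Python rather than anything that existed
when this was posted. the function name transcode_upper and the sample
name are hypothetical, chosen only for illustration; the point is that
the processing step sees one representation regardless of how the
input was encoded:

```python
def transcode_upper(raw: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: decode, process as Unicode, re-encode."""
    text = raw.decode(src)       # encoding handled on the way in
    result = text.upper()        # processing sees only Unicode text
    return result.encode(dst)    # encoding handled on the way out


name = "Söderström"              # illustrative sample, not from the thread

# The same text arriving in two different source encodings...
from_utf16 = transcode_upper(name.encode("utf-16-le"), "utf-16-le", "utf-8")
from_latin1 = transcode_upper(name.encode("latin-1"), "latin-1", "utf-8")

# ...produces identical output, because the middle step never sees bytes.
assert from_utf16 == from_latin1

# Indexing the decoded string addresses characters; the UTF-8 form is
# variable-width, so its byte length differs from the character count.
assert name[1] == "ö"
assert len(name) == 10
assert len(name.encode("utf-8")) == 12
```

the last three assertions hint at why fixed-width strings are easier
to process: the second character can be indexed directly, while the
UTF-8 bytes have to be scanned to find character boundaries.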

</F>