[Python-Dev] PEP 263 -- Python Source Code Encoding

Guido van Rossum guido@python.org
Tue, 26 Feb 2002 16:53:55 -0500


> In phase 2, the encoding will apply to all strings. So it will not be
> possible to put arbitrary byte sequences in a string literal, at least
> if the encoding disallows certain byte sequences (like UTF-8, or
> ASCII). Since this is currently possible, we have a backwards
> compatibility problem.

I would say that any program that currently uses non-ASCII characters
in string literals (whether Unicode or 8-bit literals) is, strictly
speaking, undefined.  For cases where a specific encoding is used, the
solution is easy: add an explicit encoding declaration.  Other cases
are simply garbage and should use \xDD escapes instead.
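
A minimal sketch of the two remedies, written for present-day Python 3
(an assumption; there, 8-bit string literals are spelled b'...').  The
helper name run() and the sample strings are hypothetical, and the
sources are compiled from bytes so the example itself stays pure ASCII:

```python
# Remedy 1: the file really is in a specific encoding, so declare it.
# Byte 0xE9 is "e with acute accent" in Latin-1.
declared_source = b"# -*- coding: latin-1 -*-\ntext = u'caf\xe9'\n"

# Remedy 2: the bytes are arbitrary data, so spell them with \xDD escapes.
# This source is pure ASCII; the compiler resolves the escapes.
escaped_source = b"data = b'\\xff\\xfe\\x00'\n"


def run(source):
    """Compile a source given as bytes and return its resulting namespace."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


text = run(declared_source)["text"]   # decoded using the declared encoding
data = run(escaped_source)["data"]    # three raw bytes, no encoding involved
```

In the first case the declaration tells the compiler how to decode the
raw byte; in the second no declaration is needed because the source
never leaves ASCII.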

Maybe an implementation phase 1a should be introduced that warns about
the occurrence of non-ASCII characters anywhere in the source code
when no encoding is specified.
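
As a rough illustration only, such a phase 1a check could look like the
sketch below.  The function names are hypothetical, the pattern is a
simplified form of the declaration comment that PEP 263 describes, and
the sketch uses Python 3 bytes handling; it is not the implementation
proposed for the compiler:

```python
import re
import warnings

# Simplified form of the PEP 263 declaration pattern (an assumption).
CODING_RE = re.compile(br"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(source):
    """Return the encoding named in a comment on the first two lines, or None."""
    for line in source.splitlines()[:2]:
        if line.lstrip().startswith(b"#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1).decode("ascii")
    return None


def non_ascii_lines(source):
    """Return the 1-based numbers of lines containing bytes >= 0x80."""
    return [number
            for number, line in enumerate(source.splitlines(), 1)
            if any(byte >= 0x80 for byte in line)]


def warn_if_undeclared(source, filename="<source>"):
    """Warn once per offending line when no encoding is declared.

    Returns the list of offending line numbers (empty if nothing to report).
    """
    if find_declared_encoding(source) is not None:
        return []
    offending = non_ascii_lines(source)
    for number in offending:
        warnings.warn("%s:%d: non-ASCII character but no encoding declared"
                      % (filename, number))
    return offending
```

The check looks at raw bytes on purpose: without a declaration there is
no reliable way to decode the file, so any byte outside the ASCII range
is reported wherever it occurs, including in comments.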

--Guido van Rossum (home page: http://www.python.org/~guido/)