[Python-Dev] PEP 263 -- Python Source Code Encoding

M.-A. Lemburg mal@lemburg.com
Tue, 26 Feb 2002 21:48:57 +0100

Finn Bock wrote:
> [MAL]
> >I consider the above PEP ready for review by the developers.
> >Please comment.
> The pep seems to dictate that the source by default must be read as
> latin-1:
> """
> Python will default to Latin-1 as standard encoding if no other
> encoding hints are given.
> """
> Jython already reads the Python source with the default Java encoding
> which usually depends on the PC's locale.
> If a small loophole could be added to that requirement, then the PEP
> has my full support.

Hmm, in phase two we will need to decode the source code
file into Unicode using some encoding and then re-encode the
8-bit string parts using that same encoding. The only
requirement we have for that is round-trip safety, so that
string literals turn out as the same value you see in the
source file.
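
To illustrate the round-trip requirement, here is a minimal sketch
in present-day Python 3. It is not the phase-two implementation,
and the helper name is_round_trip_safe is made up for this example.
It decodes raw bytes to Unicode, re-encodes them with the same
codec, and checks that the bytes come back unchanged.

```python
def is_round_trip_safe(raw, encoding):
    """Return True if decode followed by encode gives back the same bytes."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        # Bytes that cannot be decoded at all cannot round-trip.
        return False


# Every possible byte value, as could appear in an 8-bit string literal.
all_bytes = bytes(range(256))

latin1_ok = is_round_trip_safe(all_bytes, "latin-1")
utf8_ok = is_round_trip_safe(all_bytes, "utf-8")
valid_utf8_ok = is_round_trip_safe("caf\xe9".encode("utf-8"), "utf-8")

print(latin1_ok, utf8_ok, valid_utf8_ok)
```

Latin-1 maps each byte to exactly one code point, so it round-trips
arbitrary input. UTF-8 round-trips only input that is valid UTF-8
to begin with.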

Now, Unicode literals are explicit about this: unicode-escape
is a latin-1 codec with some escaping knowledge. I'm not sure
how to get this in line with the "any round-trip safe encoding"
requirement.
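
The Latin-1 behaviour of unicode-escape can be seen directly. This
is a small sketch using the codec as it exists in Python 3 under
the name "unicode_escape"; the sample bytes are made up for the
example. Bytes outside of escape sequences are taken as Latin-1
code points, while \uXXXX sequences are expanded.

```python
# One raw 0xE9 byte plus a literal backslash-u escape sequence.
raw = b"caf\xe9 \\u20ac"

# unicode_escape: 0xE9 becomes U+00E9, the escape becomes U+20AC.
text = raw.decode("unicode_escape")

# Plain latin-1 maps 0xE9 the same way but leaves the escape untouched.
plain = raw.decode("latin-1")

print(repr(text))
print(repr(plain))
```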

OTOH, if Jython users write source code which depends on the
PC's locale, then they are bound to write non-portable code,
so settling on one fixed encoding would certainly help here.

What I don't understand is why you read the file using the
PC's locale. Wouldn't it be possible to set the file encoding
prior to reading from it?
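
To show what I mean by setting the encoding up front, here is a
rough sketch in Python 3 rather than in Jython's Java reader, so
take it as an illustration only. The function name read_source and
the sample file contents are made up for the example.

```python
import os
import tempfile


def read_source(path, encoding):
    """Read a source file with an explicit encoding, ignoring the locale."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Write a small source file containing one non-ASCII Latin-1 byte.
fd, path = tempfile.mkstemp(suffix=".py")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"s = 'caf\xe9'\n")
    # The caller picks the encoding; the platform default plays no part.
    source = read_source(path, "latin-1")
finally:
    os.remove(path)

print(repr(source))
```

With the encoding passed explicitly, the same file decodes to the
same text on every machine, whatever the locale is set to.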

