[I18n-sig] Strawman Proposal (2): Encoding attributes

Paul Prescod paulp@ActiveState.com
Thu, 08 Feb 2001 15:45:26 -0800


Python source files may declare their encoding in the first few lines.
An encoding declaration must be found before the first statement in the
source file.

The encoding declaration is not a pragma. It does not show up in the
parse tree and has no semantic meaning for the compiler itself. It is
conceptually handled in a pre-compile "encoding sniffing" step, which
reads the file as Latin-1 in order to find the declaration.
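
A likely reason Latin-1 suits the sniffing step: it maps each of the 256
byte values to exactly one character, so decoding cannot fail whatever
the file's real encoding is, and the ASCII bytes of the declaration come
through unchanged. A minimal sketch in present-day Python, assuming the
real encoding is ASCII-compatible (the sample file contents are made up
for illustration):

```python
# Hypothetical file contents: a declaration line followed by KOI8-R text.
raw = b'#?encoding="koi8-r"\n' + "s = 'привет'\n".encode("koi8-r")

# Latin-1 assigns a character to every byte value, so this cannot raise.
sniffed = raw.decode("latin-1")

# The declaration is pure ASCII, so it is readable after the Latin-1 decode.
first_line = sniffed.split("\n", 1)[0]

# Re-encoding gives back the original bytes, so sniffing loses nothing.
round_trip = sniffed.encode("latin-1")
```

The non-ASCII part of `sniffed` is gibberish at this point, which is
fine: the sniffing step only needs to read the declaration.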

The encoding declaration has the following basic syntax:

#?encoding="<some string>"

<some string> is the encoding name and must be associated with a
registered codec. The appropriate codec is used to decode the source
file. The decoded result is passed to the compiler. Once the decoding is
done, the encoding declaration has no other effect. In other words, it
does not further affect the interpretation of string literals with
non-ASCII characters or anything else.
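
The sniff-then-decode sequence described above can be sketched roughly
as follows. This is an illustration under stated assumptions, not a
reference implementation: the function name `decode_source`, the regular
expression, the ASCII default, and the "stop at the first line that is
neither blank nor a comment" rule are all mine, since the proposal only
gives the basic syntax.

```python
import codecs
import re

# Assumed pattern for the proposed declaration line.
DECL_RE = re.compile(r'^#\?encoding="([^"]+)"')


def decode_source(raw, default="ascii"):
    """Return (encoding_name, decoded_text) for raw source bytes."""
    # Sniff with Latin-1, which can decode any byte sequence.
    sniffed = raw.decode("latin-1")
    encoding = default
    for line in sniffed.split("\n"):
        stripped = line.strip()
        match = DECL_RE.match(stripped)
        if match:
            encoding = match.group(1)
            break
        # Crude stand-in for "before the first statement": stop at the
        # first line that is neither blank nor a comment.
        if stripped and not stripped.startswith("#"):
            break
    # codecs.lookup raises LookupError if no codec is registered
    # under the declared name.
    codec_info = codecs.lookup(encoding)
    text, _consumed = codec_info.decode(raw)
    return encoding, text
```

The declaration line stays in the decoded text as an ordinary comment,
so whatever consumes the result needs no special handling for it.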

The encoding declaration SHOULD be present in all Python source files
encoded in any character encoding other than 7-bit ASCII. Some future
version of Python may make this an absolute requirement.