[I18n-sig] Strawman Proposal (2): Encoding attributes

M.-A. Lemburg mal@lemburg.com
Fri, 09 Feb 2001 11:10:33 +0100

Paul Prescod wrote:
> Python source files may declare their encoding in the first few lines.
> An encoding declaration must be found before the first statement in the
> source file.
> The encoding declaration is not a pragma. It does not show up in the
> parse tree and has no semantic meaning for the compiler itself. It is
> conceptually handled in a pre-compile "encoding sniffing" step. This
> step is done using the Latin 1 encoding.

I'd rather restrict this to ASCII, since codec names must be ASCII
anyway. This would also allow detecting wrongly formatted source files,
in addition to making UTF-16 detection possible.
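
To illustrate the ASCII-only sniffing step, here is a minimal sketch. It
assumes the #?encoding="..." syntax from the proposal, and it is written
for a current Python 3 interpreter rather than the releases under
discussion. The names sniff_encoding and DECLARATION are hypothetical,
as is the choice to stop scanning at the first line that is neither
blank nor a comment.

```python
import codecs
import re

# Hypothetical pattern for the proposed declaration: #?encoding="<name>"
DECLARATION = re.compile(r'^#\?encoding="([^"]+)"\s*$')


def sniff_encoding(data, default="ascii"):
    """Return the declared encoding of raw source bytes.

    Assumption: only leading blank and comment lines are inspected, and
    each of them must be pure ASCII.
    """
    # A UTF-16 byte order mark can never be mistaken for ASCII text.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    for raw_line in data.splitlines():
        stripped = raw_line.strip()
        if stripped and not stripped.startswith(b"#"):
            break  # first statement reached; stop looking
        # Raises UnicodeDecodeError if the header contains non-ASCII bytes.
        line = stripped.decode("ascii")
        match = DECLARATION.match(line)
        if match:
            name = match.group(1)
            codecs.lookup(name)  # raises LookupError for unknown codecs
            return name
    return default
```

With this sketch, a file whose header contains non-ASCII bytes is
rejected before any codec is chosen, which is the "wrong format"
detection mentioned above.
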
> The encoding declaration has the following basic syntax:
> #?encoding="<some string>"
> <some string> is the encoding name and must be associated with a
> registered codec. The appropriate codec is used to decode the source
> file. 

Decode to what target format? Unicode, or the current locale's encoding?
And what would happen if the decoding step fails?
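
One possible answer, sketched only to make the question concrete: the
target is assumed to be Unicode, and a failed decode is assumed to be
reported as an error before compilation starts. The helper name
decode_source and the use of SyntaxError are hypothetical choices, not
something the proposal specifies. The sketch targets a current Python 3
interpreter.

```python
def decode_source(data, encoding):
    """Decode raw source bytes to a Unicode string.

    Assumption: a decoding failure aborts compilation with a SyntaxError.
    The reported line number is found by counting newline bytes, so it is
    only meaningful for byte-oriented encodings such as Latin-1 or UTF-8.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte.
        line_number = data.count(b"\n", 0, exc.start) + 1
        raise SyntaxError(
            "cannot decode source as %s (line %d)" % (encoding, line_number)
        ) from exc
```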

> The decoded result is passed to the compiler. Once the decoding is
> done, the encoding declaration has no other effect. In other words, it
> does not further affect the interpretation of string literals with
> non-ASCII characters or anything else.

But if it doesn't affect the interpretation of string literals, then
what benefit do we gain from knowing the encoding?

> The encoding declaration SHOULD be present in all Python source files
> encoded in any character encoding other than 7-bit ASCII. Some future
> version of Python may make this an absolute requirement.

I think that such a scheme is indeed possible, but not until we
have made all strings default to Unicode. Then decoding to Unicode
would be the proper thing to do.

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/