[Python-Dev] PEP 263 -- Python Source Code Encoding

Fredrik Lundh fredrik@pythonware.com
Sat, 2 Mar 2002 09:40:43 +0100


Jason wrote:
> The problem I have with PEP 263 right now is that the
> "-*- coding: -*-" magic is really sort of being abused.

really?

> I gather that "coding:" is supposed to specify the
> encoding (what MIME calls "charset") of the file.
> But under PEP 263, it only refers to the Unicode string
> literals within the program.  Everything else must still
> be treated as 8-bit text.

from the current version (revision 1.9) of the PEP:

    "The complete Python source file should use a single
    encoding."

> For example, I'm not sure what effect "coding: utf-16"
> would have.  (?)

    "Only ASCII compatible encodings are allowed."
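
a minimal sketch of what "ASCII compatible" rules out, assuming the test is simply "the 128 ASCII characters encode to the same single bytes as in ASCII". is_ascii_compatible is a hypothetical helper for illustration, not the check the tokenizer actually performs:

```python
def is_ascii_compatible(encoding: str) -> bool:
    """Hypothetical heuristic: do all 128 ASCII characters encode
    to exactly the bytes plain ASCII would produce?"""
    ascii_chars = "".join(chr(i) for i in range(128))
    try:
        return ascii_chars.encode(encoding) == ascii_chars.encode("ascii")
    except UnicodeEncodeError:
        return False

print(is_ascii_compatible("latin-1"))  # True
print(is_ascii_compatible("utf-8"))    # True
print(is_ascii_compatible("utf-16"))   # False: BOM plus two bytes per character
```

so "coding: utf-16" is simply not a legal declaration: the tokenizer could not even find the "#" and "coding:" bytes in such a file without first knowing the encoding.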

> For another example, if you have UTF-8 Unicode string
> literals in your program but you also have 8-bit
> Latin-1 plain str string literals in the same program,
> how should you mark it?

    "Embedding of differently encoded data is not
    allowed"
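
a small sketch of why mixing can't work, assuming the file is read with one codec from top to bottom (is_uniformly_encoded is a hypothetical helper, not part of the PEP):

```python
# Hypothetical helper: can the whole file be read with its one declared codec?
def is_uniformly_encoded(raw: bytes, encoding: str) -> bool:
    try:
        raw.decode(encoding)  # one codec for the entire file, literals included
        return True
    except UnicodeDecodeError:
        return False

header = b"# -*- coding: utf-8 -*-\n"
utf8_literal = 'u = u"caf\u00e9"\n'.encode("utf-8")     # e-acute stored as C3 A9
latin1_literal = 's = "caf\u00e9"\n'.encode("latin-1")   # e-acute stored as E9
mixed = header + utf8_literal + latin1_literal

print(is_uniformly_encoded(header + utf8_literal, "utf-8"))  # True
print(is_uniformly_encoded(mixed, "utf-8"))                  # False: lone E9 byte
# Latin-1 accepts every byte, but garbles the UTF-8 literal instead:
print(mixed.decode("latin-1").splitlines()[1])               # u = u"cafÃ©"
```

whichever label you pick, one of the two literals ends up either rejected or silently wrong.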

> Therefore I argue that it makes no sense to use "coding:" to
> label a Python file, because the file doesn't consist of Unicode
> text.

    "the proposed solution should be implemented in two phases:

    1. Implement the magic comment detection and default encoding
       handling, but only apply the detected encoding to Unicode
       literals in the source file.

    2. Change the tokenizer/compiler base string type from char* to
       Py_UNICODE* and apply the encoding to the complete file."
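
a rough sketch of the phase 1 magic comment detection, assuming the pattern given in the final text of PEP 263 (revision 1.9 may have phrased it slightly differently) and assuming "ascii" as the fallback when no comment is found. detect_source_encoding is a hypothetical name:

```python
import re

# Magic-comment pattern as given in the final PEP 263 text (assumed here).
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def detect_source_encoding(lines, default="ascii"):
    """Return the encoding named on line 1 or 2, else `default` (assumed ASCII).

    `lines` are raw byte lines; the comment itself must be plain ASCII,
    which is exactly why only ASCII-compatible encodings can be declared.
    """
    for line in lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default

source = [b"#!/usr/bin/env python\n", b"# -*- coding: latin-1 -*-\n", b"print 'hello'\n"]
print(detect_source_encoding(source))  # latin-1
```

in phase 1 the returned name would only be used to decode u"..." literals; phase 2 decodes the whole file with it. (for what it's worth, later Python versions expose the same logic as tokenize.detect_encoding.)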

</F>