PEP 263 status check
"Martin v. Löwis"
martin at v.loewis.de
Fri Aug 6 10:08:22 CEST 2004
John Roth wrote:
> In fact, I think that a scan and update program in the tools
> directory might be a very good idea - just walk through a
> Python library, scan and update everything that doesn't
> have a declaration.
Good idea. I'll see whether I can write something before 2.4,
but contributions are definitely welcome.
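A minimal sketch of such a scan-and-update tool, written for a current Python 3 interpreter (`bytes.isascii` needs 3.7 or later). The function names `has_declaration`, `update_source` and `scan_tree` are hypothetical. The sketch assumes the caller already knows which encoding the undeclared files were written in; it does not try to guess. It leaves pure-ASCII files and files that start with a UTF-8 signature alone, and puts the coding line on line 2 when line 1 is a `#!` line.

```python
import codecs
import os
import re

# The declaration pattern from PEP 263, applied to raw bytes.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# Line 2 is only consulted if line 1 is blank or a comment.
BLANK_OR_COMMENT_RE = re.compile(rb"^[ \t\f]*(?:#|\r?\n|$)")


def has_declaration(data: bytes) -> bool:
    """Return True if data already declares its encoding (cookie or UTF-8 BOM)."""
    if data.startswith(codecs.BOM_UTF8):
        return True  # the UTF-8 signature counts as a declaration
    lines = data.splitlines(keepends=True)[:2]
    if lines and CODING_RE.match(lines[0]):
        return True
    return (len(lines) == 2
            and BLANK_OR_COMMENT_RE.match(lines[0]) is not None
            and CODING_RE.match(lines[1]) is not None)


def update_source(data: bytes, encoding: str) -> bytes:
    """Insert a coding line into non-ASCII source that lacks one."""
    if data.isascii() or has_declaration(data):
        return data  # nothing to do
    cookie = b"# -*- coding: " + encoding.encode("ascii") + b" -*-\n"
    if data.startswith(b"#!"):
        # Keep the #! line first; the declaration goes on line 2.
        first, _sep, rest = data.partition(b"\n")
        return first + b"\n" + cookie + rest
    return cookie + data


def scan_tree(root: str, encoding: str, dry_run: bool = True) -> list:
    """Walk root; return the .py files that need (or received) a declaration."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            new = update_source(data, encoding)
            if new != data:
                changed.append(path)
                if not dry_run:  # only rewrite files when explicitly asked
                    with open(path, "wb") as f:
                        f.write(new)
    return changed
```

Running `scan_tree("/path/to/lib", "latin-1")` first in dry-run mode lists the affected files, so a human can confirm that the assumed encoding is really the right one before passing `dry_run=False`.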
> My specific question there was how the code handles the
> combination of UTF-8 as the encoding and a non-ascii
> character in an 8-bit string literal. Is this an error? The
> PEP does not say so. If it isn't, what encoding will
> it use to translate from unicode back to an 8-bit
UTF-8 is not in any way special wrt. the PEP. Notice that
UTF-8 is *not* Unicode - it is an encoding of Unicode, just
like ISO-8859-1 or us-ascii (although the latter two only
encode a subset of Unicode). Yes, the byte string literals
will be converted back to an "8-bit encoding", but the 8-bit
encoding will be UTF-8! IOW, byte string literals are always
converted back to the source encoding before execution.
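The round trip can be modelled in a few lines. This is only a simulation of the Python 2 behaviour described above: in Python 3 a non-ASCII character inside a `b"..."` literal is a SyntaxError, and plain `"..."` literals are Unicode strings. `byte_literal_value` is a hypothetical helper, not a parser API, and its literal extraction is deliberately naive (no escapes, no string prefixes). It shows that decoding with the declared encoding and then encoding back gives exactly the bytes that were in the file.

```python
def byte_literal_value(source: bytes, encoding: str) -> bytes:
    """Model the value a PEP 263 parser gives the first "..." byte string."""
    # 1. The whole file is decoded to Unicode using the declared encoding.
    text = source.decode(encoding)
    # 2. Locate the first double-quoted literal (naive: no escapes or prefixes).
    start = text.index('"') + 1
    end = text.index('"', start)
    # 3. The literal's characters are encoded back to the *source* encoding.
    return text[start:end].encode(encoding)


utf8_src = 's = "café"\n'.encode("utf-8")
latin1_src = 's = "café"\n'.encode("latin-1")

print(byte_literal_value(utf8_src, "utf-8"))      # b'caf\xc3\xa9'
print(byte_literal_value(latin1_src, "latin-1"))  # b'caf\xe9'
```

So a UTF-8 source file yields UTF-8 byte strings and a Latin-1 file yields Latin-1 byte strings; nothing is ever converted to a different 8-bit encoding.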
> Another project for people who care about this
> subject: tools. Of the half zillion editors, pretty printers
> and so forth out there, how many check for the encoding
> line and do the right thing with it? Which ones need to
> be updated?
I know IDLE, Eric, Komodo, and Emacs do support encoding
declarations. I know PythonWin doesn't, although I once
wrote patches to add such support. A number of editors
(like notepad.exe) do the right thing only if the document
has the UTF-8 signature.
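For tool authors working with current Python 3, the standard library already implements the detection: `tokenize.detect_encoding` checks for the UTF-8 signature and for a PEP 263 coding line in the first two lines. A sketch of how an editor or pretty printer might use it follows; `declared_encoding` and `decode_for_display` are hypothetical names. Note that Python 3 falls back to UTF-8 when no declaration is present, whereas PEP 263 made ASCII the default.

```python
import io
import tokenize


def declared_encoding(data: bytes) -> str:
    """Return the encoding a Python 3 interpreter would use for this source."""
    # detect_encoding reads at most two lines through the given readline callable.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def decode_for_display(data: bytes) -> str:
    """Decode source text the way an encoding-aware editor should."""
    return data.decode(declared_encoding(data))


print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1
print(declared_encoding(b"\xef\xbb\xbfx = 1\n"))                 # utf-8-sig
print(declared_encoding(b"x = 1\n"))                             # utf-8
```

Decoding with `utf-8-sig` also strips the signature, so the editor buffer does not start with a stray U+FEFF.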
Of course, editors don't necessarily need to actively
support the feature, as long as the declared encoding is
the one they use anyway. They won't display source in
other encodings correctly, but some of them have no
notion of multiple encodings at all.