[Python-3000] PEP 3131 accepted
python at zesty.ca
Sat May 26 12:33:23 CEST 2007
Ka-Ping Yee wrote:
> Alas, the coding directive is not good enough. Have a look at this:
> That's an image of a text editor containing some Python code. Can you
> tell whether running it (post-PEP-3131) will delete your .bashrc file?
Martin v. Löwis wrote:
> I would think that it doesn't (i.e. allowed should stay at 0).
> Why does os.remove get invoked?
Mike Klaas wrote:
> Perhaps a letter in the encoding declaration is non-ascii, nullifying
> the encoding enforcement and allowing a cyrillic 'a' in allowed = 0?
You got it.
See the actual source file at
There are three things going on here:
 1.  All three occurrences of "allowed" look the same.  And
     it seems they are truly the same, because the coding
     declaration on line 2 says the file is ASCII.  But in
     fact, they aren't the same -- one of them contains a
     Cyrillic "a", which changes the meaning of the program.

 2.  But how is that possible when the coding declaration
     says the file is ASCII?  If you believe it, then you
     also expect the coding declaration itself to be ASCII,
     i.e., a real coding declaration.  But it isn't -- the
     word "coding" contains a Cyrillic "c".

 3.  Then why doesn't Python complain about this non-ASCII
     character on line 2 of the file, since ASCII is supposed
     to be the default encoding?  Because there is a UTF-8 BOM
     at the beginning of the file.
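
The three points above can be reproduced in memory without touching
any file. What follows is only a sketch, not the original source file:
the shebang placeholder on line 1, the constant names and the harmless
assignments are assumptions made for illustration. It relies on
Python 3 accepting non-ASCII identifiers (PEP 3131) and on compile()
applying the usual source-encoding detection to bytes input.

```python
# Hypothetical reconstruction of the trick; nothing is deleted here.
BOM = b"\xef\xbb\xbf"   # UTF-8 byte order mark
CYR_A = "\u0430"        # CYRILLIC SMALL LETTER A, looks like Latin "a"
CYR_ES = "\u0441"       # CYRILLIC SMALL LETTER ES, looks like Latin "c"

source = (
    "#!/usr/bin/env python\n"                  # assumed placeholder for line 1
    "# -*- " + CYR_ES + "oding: ascii -*-\n"   # fake declaration on line 2
    "allowed = 0\n"                            # genuine ASCII name
    + CYR_A + "llowed = 1\n"                   # look-alike, a different name
)
data = BOM + source.encode("utf-8")

# compile() accepts bytes; the BOM selects UTF-8 and the fake
# declaration is treated as an ordinary comment.
namespace = {}
exec(compile(data, "<demo>", "exec"), namespace)

names = sorted(k for k in namespace if k != "__builtins__")
print(len(names), "distinct names:", [ascii(n) for n in names])
print("allowed =", namespace["allowed"])
```

The ASCII name keeps its value of 0 while the look-alike is bound
separately, which is exactly what makes the two spellings dangerous
when they render identically.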
PEP 263 tries to prevent confusion by making Python complain
if the coding declaration conflicts with the already-set
UTF-8 encoding. But even though line 2 looks like a coding
declaration, Python doesn't recognize it as one, because the
Cyrillic letter keeps it from matching, so you get no warning.
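
The difference between a genuine and a look-alike declaration can be
checked with the standard library's tokenize.detect_encoding(), which
follows the same PEP 263 rules. This is a rough sketch: the helper
name detect() and the two sample byte strings are assumptions for
illustration, and it runs under Python 3, where the default source
encoding is UTF-8 rather than ASCII.

```python
import io
import tokenize

BOM = b"\xef\xbb\xbf"    # UTF-8 byte order mark
CYR_ES = "\u0441"        # CYRILLIC SMALL LETTER ES, looks like Latin "c"


def detect(data):
    """Return the detected source encoding, or None if Python objects."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return encoding
    except SyntaxError:
        # Raised when a real declaration conflicts with the BOM.
        return None


# A real ASCII declaration after a BOM: the conflict is reported.
real = BOM + b"# -*- coding: ascii -*-\nallowed = 0\n"

# A look-alike declaration after a BOM: not recognized, so no conflict.
fake = BOM + ("# -*- " + CYR_ES + "oding: ascii -*-\nallowed = 0\n").encode("utf-8")

print("real declaration:", detect(real))
print("fake declaration:", detect(fake))
```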
The conclusion is that one cannot rely on the coding declaration
to know what the encoding is, because one cannot know what the
coding declaration says. We would be able to rely on it, if only
it were encoded in ASCII. But the enabling of UTF-8 by a BOM at the
beginning of the file is an invisible override. This invisible
override is the source of the danger. If we want to be able to
read the coding declaration with any confidence, we should get rid
of the invisible override.
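
Until then, the override can at least be made visible. The sketch
below is one possible check, not part of any proposal in this thread:
the function name audit() and its report format are assumptions. It
flags a leading UTF-8 BOM and names every non-ASCII character it
finds, so a look-alike letter shows up by its Unicode name.

```python
import unicodedata

BOM = b"\xef\xbb\xbf"    # UTF-8 byte order mark


def audit(data):
    """List (line, column, description) for a BOM and non-ASCII characters.

    Lines and columns are 1-based; the BOM is reported at (0, 0).
    Assumes the data is valid UTF-8.
    """
    findings = []
    if data.startswith(BOM):
        findings.append((0, 0, "UTF-8 BOM"))
        data = data[len(BOM):]
    text = data.decode("utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((lineno, col, name))
    return findings


# Hypothetical sample with a fake declaration and a look-alike name.
sample = BOM + "# -*- \u0441oding: ascii -*-\n\u0430llowed = 1\n".encode("utf-8")
for finding in audit(sample):
    print(finding)
```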