[issue6011] python doesn't build if prefix contains non-ascii characters
report at bugs.python.org
Sat Dec 25 22:46:49 CET 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
On Friday 24 December 2010 at 14:46 +0000, Baptiste Carvello wrote:
> the patch solves the bug for me as well (using locale "C", the
> filesystem encoding is utf-8). However, I do not understand why the
> patch checks that the shebang line decodes with both utf-8 and the
> file's encoding. The shebang line is only used by the kernel to locate
> the interpreter, so none of these should matter. Or have I misunderstood
> the patch?
The shebang is read by 3 different functions:
a) the shell reads the first line: if it starts with "#!", it's a
shebang: the shell reads the command and its options and executes it
b) Python searches for a "#coding:xxx" pattern in the first or the second
line using a binary parser
c) Python reads the file using the file's encoding: the encoding declared in
the #coding:xxx header, or UTF-8 by default
(a) The shell reads the file as a binary file; it doesn't care about the
encoding. It reads byte strings and passes them to the kernel.
(b) The parser starts with the default encoding, UTF-8. Even if the file
encoding is not UTF-8, all lines before the #coding:xxx cookie are read as
UTF-8 (Python only looks for the cookie in the first or the second line).
The shebang has to be on the first line, so the cookie cannot
be written before the shebang => the shebang has to be decodable from
UTF-8
(c) If the file encoding is not UTF-8, a #coding:xxx cookie is used and the
whole file (including the shebang) has to be decodable from this
encoding => the shebang has to be decodable from the file encoding
So the shebang has to be decodable from UTF-8 and from the file
encoding.
I should maybe add a comment about that in the patch.
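The double requirement can be sketched as a small check. This is only an illustration of the reasoning, not the code of copy_script.patch; the function name and its parameters are hypothetical.

```python
def shebang_is_decodable(shebang: bytes, file_encoding: str = "utf-8") -> bool:
    """Return True if the shebang decodes from UTF-8 and from the file encoding.

    Hypothetical helper: UTF-8 covers step (b), where the cookie parser reads
    the first line, and file_encoding covers step (c), where the whole file is
    decoded. An unknown encoding name raises LookupError, which is not handled.
    """
    for encoding in ("utf-8", file_encoding):
        try:
            shebang.decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


if __name__ == "__main__":
    print(shebang_is_decodable(b"#!/usr/bin/python3\n"))
    print(shebang_is_decodable(b"#!/home/haypo/tmp/py3k\xff/bin/python3.2\n"))
```

Note that a byte such as 0xff is valid in Latin-1, so a Latin-1 cookie alone does not make such a shebang acceptable: the UTF-8 check of step (b) still fails.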
Example of an issue with (b):
File "./build/scripts-3.2/2to3", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in
file ./build/scripts-3.2/2to3 on line 1, but no encoding declared; see
http://python.org/dev/peps/pep-0263/ for details
The shebang is b'#!/home/haypo/tmp/py3k\xff/bin/python3.2\n', my locale
encoding is UTF-8 and the file has no encoding cookie (so it is
read as UTF-8).
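A similar failure can be observed from Python itself with the standard tokenize.detect_encoding() function, which looks for the cookie in the first two lines. This is a rough sketch: the detect() wrapper is a hypothetical helper, and the interpreter's own parser reports a different message than the tokenize module does.

```python
import io
import tokenize


def detect(source: bytes):
    """Return the detected source encoding, or None if detection fails.

    Hypothetical wrapper around tokenize.detect_encoding(), which raises
    SyntaxError when a line it inspects is not valid UTF-8.
    """
    try:
        encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError:
        return None
    return encoding


# Shebang taken from the example above: the prefix contains the byte 0xff.
shebang = b"#!/home/haypo/tmp/py3k\xff/bin/python3.2\n"

if __name__ == "__main__":
    # ASCII shebang followed by a cookie on the second line: detection works.
    print(detect(b"#!/usr/bin/python3\n# coding: latin-1\n"))
    # Non-UTF-8 shebang without a cookie: detection fails.
    print(detect(shebang + b"print('hi')\n"))
    # A cookie on the second line does not help, the first line is read first.
    print(detect(shebang + b"# coding: latin-1\nprint('hi')\n"))
```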
copy_script.patch fixes an issue if the configure prefix is not ASCII
(especially if the prefix is not decodable from UTF-8).
Python tracker <report at bugs.python.org>