[New-bugs-announce] [issue13758] compile() should not encode 'filename' (at least on Windows)
Terry J. Reedy
report at bugs.python.org
Wed Jan 11 04:46:44 CET 2012
New submission from Terry J. Reedy <tjreedy at udel.edu>:
The 3.2.2 doc for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ('<string>' is commonly used)."
I am not sure what 'recognizable' is supposed to mean, but as I understand it, it would be user-specific and any string containing a fake 'filename' should be accepted and attached to the output code object as the .co_filename attribute. (At least on Windows.)
In fact, compile() has a hidden restriction: it encodes 'filename' with the local filesystem encoding. It tosses the bytes result (at least on Windows) but lets a UnicodeEncodeError terminate compilation. The effect is to add an undocumented and spurious dependency to code that has nothing to do with real files or the local machine.
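A minimal sketch of the expectation above, assuming a Python 3 interpreter. The fake name and the helper function are made up for illustration; they are not taken from the python-list thread. On an interpreter with the restriction described here, compile() could raise UnicodeEncodeError instead, so the sketch handles both outcomes rather than assuming one.

```python
# Hypothetical fake filename containing non-ASCII characters.
fake_name = "<生成的代码>"


def compile_with_fake_name(source, name):
    """Return (code, error); exactly one of the two is None."""
    try:
        return compile(source, name, "exec"), None
    except UnicodeEncodeError as exc:
        return None, exc


code, error = compile_with_fake_name("x = 1", fake_name)
if code is not None:
    # ascii() keeps the output printable on any console encoding.
    print(type(code.co_filename).__name__, ascii(code.co_filename))
else:
    print("compile() rejected the fake name:", error)
```

If compilation succeeds, the name is attached unchanged as .co_filename, which is the behavior the doc sentence suggests.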
In #10114, msg118845, Victor Stinner justified this with
"co_filename attribute is used to display the traceback: Python opens the related file, read the source code line and display it."
If the filename is fake, it cannot do that. (Perhaps the doc should warn users to make sure that fake filenames do not match any possibly real filenames ;-). The traceback mechanism could ignore UnicodeEncodeErrors just as well as it now ignores IO(?)Errors when open('fakename') does not work.
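As a rough illustration of how tracebacks already cope with a fake name, the sketch below compiles failing code under the made-up name '<fake>' and formats the resulting traceback. It assumes only the standard traceback and linecache modules. The name is reported in the "File" line even though there is no file to read.

```python
import linecache
import traceback

# '<fake>' is an illustrative name; it does not refer to any real file.
code = compile("1/0", "<fake>", "exec")

try:
    exec(code, {})
except ZeroDivisionError:
    report = traceback.format_exc()

# The traceback names '<fake>' without needing to open it.
print(report)

# linecache is what the traceback machinery consults for source lines;
# for a name with no backing file the lookup does not raise.
print(repr(linecache.getline("<fake>", 1)))
```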
Victor continues "On Windows, co_filename is directly used because Windows accepts unicode for filenames." This is not true, in that on at least some Windows systems compile() tries to encode the filename with the mbcs codec, which in turn uses the hidden local codepage. I believe that for most or all codepages, this will raise errors even for some valid Unicode filenames.
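The codepage problem can be sketched without Windows. The mbcs codec exists only there, so the example below uses cp1252 as an assumed stand-in for a local codepage; the filename and the helper are also made up for illustration.

```python
def can_encode(name, codec):
    """Return True if `name` can be encoded with `codec`."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A valid Unicode filename with Chinese characters (illustrative only).
name = "中文.py"

# cp1252 stands in for a local Windows codepage; utf-8 is the contrast.
for codec in ("cp1252", "utf-8"):
    print(codec, "ok" if can_encode(name, codec) else "UnicodeEncodeError")
```

A single-byte Western codepage has no representation for these characters, so any step that encodes the name with it must fail, however valid the name is as Unicode.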
I do not know whether the stored .co_filename attribute type for *nix is str, as on Windows, or bytes. If the latter, the doc should say so.
If compile() continues to filter fake filenames, which I oppose, the doc should also say so and say what it does.
This issue came up on python-list when someone used a Chinese filename and mbcs rejected it.
components: Interpreter Core
stage: test needed
title: compile() should not encode 'filename' (at least on Windows)
versions: Python 3.2, Python 3.3
Python tracker <report at bugs.python.org>