[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Guido van Rossum guido at python.org
Mon Aug 31 04:34:43 CEST 2009


On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett at python.org> wrote:
> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido at python.org> wrote:
>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett at python.org> wrote:
>>> I am going through and running the entire test suite using importlib
>>> to ferret out incompatibilities. I have found a bunch, although all
>>> rather minor (raising a different exception typically; not even sure
>>> they are worth backporting as anyone reliant on the old exceptions
>>> might get a nasty surprise in the next micro release), and now I am
>>> down to my last failing test suite: test_import.
>>>
>>> Ignoring the execution bit problem (http://bugs.python.org/issue6526
>>> but I have no clue why this is happening), I am bumping up against
>>> TestPycRewriting.test_incorrect_code_name. Turns out that import
>>> resets co_filename on a code object to __file__ before exec'ing it to
>>> create a module's namespace in order to ignore the file name passed
>>> into compile() for the filename argument. Now I can't change
>>> co_filename from Python as it's a read-only attribute and thus can't
>>> match this functionality in importlib w/o creating some custom code to
>>> allow me to specify the co_filename somewhere (marshal.loads() or some
>>> new function).
>>>
>>> My question is how important is this functionality? Do I really need
>>> to go through and add an argument to marshal.loads or some new
>>> function just to set co_filename to something that someone explicitly
>>> set in a .pyc file? Or I can let this go and have this be the one
>>> place where builtins.__import__ and importlib.__import__ differ and
>>> just not worry about it?
>>
>> ISTR that Bill Janssen once mentioned a file replication mechanism
>> whereby there were two names for each file: the "canonical" name on a
>> replicated read-only filesystem, and the longer "writable" name on a
>> unique master copy. He ended up with the filenames in the .pyc files
>> being pretty bogus (since not everyone had access to the writable
>> filesystem). So setting co_filename to match __file__ (i.e. the name
>> under which the module is being imported) would be a nice service in
>> this case.
>>
>> In general this would happen whenever you pre-compile a bunch of .py
>> files to .pyc/.pyo and then copy the lot to a different location. Not
>> a completely unlikely scenario.

> Well, to get this level of compatibility I am going to need to add
> some magical API somewhere then to overwrite a code object's "file"
> location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the
extra mile...

> I will either add an argument to marshal.loads to specify an
> overriding file path or add an imp.exec that takes a file path
> argument to override the code object with.

Remember, there are many code objects created from one pyc file.
Adding it to marshal.load*() makes sense because then it's usable for
other purposes too, and that attacks the issue from the root. (In
import.c it's done by update_compiled_module() right after
read_compiled_module(), which is a thin wrapper around marshal.load().)
I'm not sure how imp.exec would make sure that introspection of the
loaded code objects always gets the right thing.
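
To illustrate the point about many code objects living in one pyc file, here is a minimal sketch. It is not what import.c or marshal does internally. It walks the nested code objects found in co_consts and rebuilds each one with a new co_filename. It assumes Python 3.8 or later, where code objects have a replace() method (not available when this thread was written). The helper name with_filename and both paths are made up for the example.

```python
import types


def with_filename(code, filename):
    """Hypothetical helper: return a copy of `code` in which co_filename
    is `filename` for the code object itself and every nested one."""
    new_consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=new_consts)


# A module with a function and a nested function: three code objects in all.
source = "def f():\n    def g():\n        pass\n    return g\n"

# Pretend this was compiled on a build host and then copied elsewhere.
stale = compile(source, "/build/host/mod.py", "exec")
fixed = with_filename(stale, "/installed/mod.py")

namespace = {}
exec(fixed, namespace)
inner = namespace["f"]()  # the nested function g
```

Patching only the top-level code object would leave f and g reporting the old path, which is why the fix has to reach every code object produced from the file.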

>> (I was going to comment on the execution bit issue but I realized I'm
>> not even sure if you're talking about import.c or not. :-)
>
> So it turns out a bunch of execution/write bit stuff has come up in
> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
> been opening up the bytecode files with 'wb' and writing out the file.
> But test_import tests that no execution bit gets set and that no write
> bit gets added if the source file lacks it. I guess I can use
> posix.chmod and posix.stat to copy the source file's read and write
> bits and always mask out the execution bits. I hate this low-level
> file permission stuff.

It's no fun -- see the layers of #ifdefs in open_exclusive() in
import.c. (Though I think you won't need to worry about VMS. :-) But
it's somewhat important to get it right from a security POV. I would
use os.open() and wrap an io.BufferedWriter around it.
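
A rough sketch of that suggestion, under assumptions of my own: the helper name open_bytecode_for_writing is hypothetical, the permission handling is POSIX-oriented (Windows largely ignores these mode bits), and the resulting mode is still subject to the process umask. It copies the source file's permission bits, masks out the execution bits, creates the file exclusively with os.open(), and wraps the descriptor in an io.BufferedWriter.

```python
import io
import os
import stat
import tempfile


def open_bytecode_for_writing(source_path, bytecode_path):
    """Hypothetical helper: open bytecode_path for writing with the source
    file's permission bits, minus every execution bit."""
    mode = stat.S_IMODE(os.stat(source_path).st_mode)
    mode &= ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    # Remove any stale file first so that O_EXCL creates a fresh one
    # with exactly the mode requested (modulo umask).
    try:
        os.unlink(bytecode_path)
    except FileNotFoundError:
        pass

    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(bytecode_path, flags, mode)
    # FileIO takes ownership of the descriptor; BufferedWriter adds buffering.
    return io.BufferedWriter(io.FileIO(fd, "w"))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    pyc = os.path.join(tmp, "mod.pyc")
    with open(src, "w") as f:
        f.write("x = 1\n")
    os.chmod(src, 0o755)  # an executable source file

    with open_bytecode_for_writing(src, pyc) as out:
        out.write(b"placeholder bytes, not real bytecode")

    pyc_mode = stat.S_IMODE(os.stat(pyc).st_mode)
```

Because the mode is supplied at creation time, a read-only source file yields a read-only bytecode file while the descriptor returned by os.open() remains writable for this one write.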

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

