On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum
wrote: On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon
wrote: I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.
Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).
My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?
ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.
In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.
Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah.
Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...
I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.
Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root. (in import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load()) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing.
(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)
So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit get set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff.
It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it. -- --Guido van Rossum (home page: http://www.python.org/~guido/)