Mailman 3 how important is setting co_filename for a module being imported to what __file__ is set to? - Python-Dev

how important is setting co_filename for a module being imported to what file is set to?

Brett Cannon

Aug. 30, 2009

11:28 p.m.

I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import. Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function). My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it? -Brett

Show replies by date

Robert Collins

August 2009

12:13 a.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:

...

Just to be clear, this would show up if I: had a python tree built and run stuff from it symlinked to that tree from somewhere else ran stuff from that somewhere else - because the pyc is already on disk? Thats been an invaluable 'wtf' debugging tool at various times, because the odd provenance of the path in the pyc makes it extremely clear that what is being loaded isn't what one had thought was being loaded. OTOH, always showing the path that the pyc was *actually found at* would fix the weirdness that occurs when you mv a python tree from one place to another. -Rob

Brett Cannon

12:23 a.m.

On Sun, Aug 30, 2009 at 17:13, Robert Collins<robertc@robertcollins.net> wrote:

...

Right; the code object would think it was loaded from the original location it was created at instead of where it actually is. Now why someone would want to move their .pyc files around instead of recompiling I don't know short of not wanting to send someone source. -Brett

Guido van Rossum

12:26 a.m.

On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon<brett@python.org> wrote:

...

I already mentioned replication; it could also just be a matter of downloading a tarball with .py and .pyc files. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Glyph Lefkowitz

3:13 a.m.

On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum <guido@python.org> wrote:

...

Also, if you're using Python in an embedded context, bytecode compilation (or even filesystem access!) can be prohibitively slow, so an uncompressed .zip file full of compiled .pyc files is really the way to go. I did this a long time ago on an XScale machine, but recent inspection of the Android Python scripting stuff shows a similar style of deployment (c.f. /data/data/com.google.ase/python/lib/python26.zip).

Guido van Rossum

12:24 a.m.

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case. In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario. (I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Brett Cannon

12:34 a.m.

On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah. I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.

...

(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit get set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff. -Brett

Guido van Rossum

2:34 a.m.

On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett@python.org> wrote:

...

On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

...

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...

...

Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root. (in import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load()) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing.

...

It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Brett Cannon

2:43 a.m.

On Sun, Aug 30, 2009 at 19:34, Guido van Rossum<guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett@python.org> wrote:

...
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

...
Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...

Definitely, which is why I will do it, just not tonight as I am tired of compatibility fixing for now. =)

...

That was my thinking.

...

Basically it would be imp.exec(module, code, path) and it would tweak the code object before execution based on introspecting what the module had set for __file__. But might as well add the support to marshal.

...

I will have to see what of that is implemented in C or in Python. I have always tried to keep all pure Python code out of importlib for bootstrapping reasons in order to keep the possibility of using importlib as the implementation of import. But maybe I should not be worrying about that right at the moment and instead do what keeps the code simple. -Brett

Benjamin Peterson

2:51 a.m.

2009/8/30 Brett Cannon <brett@python.org>:

...

You can use the C implementation of io, _io, which has a full buffering implementation. Of course, that also makes it a better harder for other implementations which may wish to use importlib because the io library would have to be completely implemented... -- Regards, Benjamin

Brett Cannon

5:43 a.m.

On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson<benjamin@python.org> wrote:

...

True. I guess it's a question of whether making importlib easier to maintain and as minimally reliant on C-specific modules is more/less important than trying to bootstrap it in for CPython for __import__ at some point. -Brett

Nick Coghlan

1:13 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon wrote:

...

I'd suggest preferring _io, but falling back to the Python io module if the accelerated version doesn't exist. You should get the best of both worlds that way (no bootstrap issues in CPython and other implementations with an _io module, but a still functional importlib in other implementations). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Brett Cannon

4:34 p.m.

On Mon, Aug 31, 2009 at 06:13, Nick Coghlan<ncoghlan@gmail.com> wrote:

...

Well, all important code is in importlib._bootstrap which lacks a single import statement; all dependent modules are injected externally in importlib.__init__. That allows for the possibility of C code to import importlib/_bootstrap.py along with the buit-in modules required, and then inject those built-in modules into importlib's global namespace. This is why I have functions in there that are duplications of things found elsewhere. That means that while I have named the module _io and I use _io.FileIO, you could also easily inject io with the name _io and have everything just work if you were trying to bootstrap. The deal is that if I want to keep up the bootstrap goal I need to continue to restrict myself to either built-in modules or thing I know I can choose to expose later in built-in modules. This is why when Guido suggested os.open I said I will have to see how it is implemented because if it doesn't come from posix or is easy to duplicate I won't be able to use it. -Brett

Gregory P. Smith

7:01 a.m.

On Sun, Aug 30, 2009 at 5:24 PM, Guido van Rossum <guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

8-9 years ago while using py2exe on windows to create stand along binaries out of Python programs for distribution we ran into this issue... The compiled .pyc's that py2exe bundles up contained the pathname to the source code on the development build system. When you get a stacktrace python would look for the source code based on those.... Really horrible if your build system used a windows drive letter other than C such as D: as it could cause windows to pop up a dialog asking the user to insert a CD or spin up a spun down optical disc or ask for a floppy, etc. ;)

Antoine Pitrou

9:24 a.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

Why can't we simply make co_filename a writable attribute instead of inventing some complicated API?

Nick Coghlan

1:19 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Antoine Pitrou wrote:

...

I thought of that question as well, but the later exchange between Guido and Brett made me realise that a lot more than the top level module code object is affected here - the adjustment also needs to be propagated to the code objects created by the module for functions and generators and so forth. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Michael Foord

1:27 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Nick Coghlan wrote:

...

Even if it is not necessary or sufficient it still sounds like a useful change. When writing tools that generate modules or manipulate code objects these read-only attributes are a great nuisance. Michael

...

Cheers, Nick.

-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog

Antoine Pitrou

1:32 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

I'm not sure I understand. There's a single type of "code object" and it's PyCodeObject. Making the attribute writable (from Python code) at that level should be sufficient. (then of course the recursive machinery needed to mutate all code objects created in a module may be slightly inefficient if written in Python, but at least it's possible to write it) Regards Antoine.

Benjamin Peterson

2:55 p.m.

2009/8/31 Antoine Pitrou <solipsis@pitrou.net>:

...

Because code objects are supposed to be a immutable hashable object? -- Regards, Benjamin

Antoine Pitrou

3:10 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Benjamin Peterson <benjamin <at> python.org> writes:

...

...
Why can't we simply make co_filename a writable attribute instead of

inventing

...

...
some complicated API?

Because code objects are supposed to be a immutable hashable object?

Right, but co_filename is used neither in tp_hash nor in tp_richcompare. Regards Antoine.

Brett Cannon

4:27 p.m.

On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou<solipsis@pitrou.net> wrote:

...

I didn't suggest this since I assumed co_filename was made read-only for a reason back when the design decision was made. But if the original safety concerns are not there then I am happy to simply change the attribute to writable. -Brett

Guido van Rossum

4:33 p.m.

On Mon, Aug 31, 2009 at 9:27 AM, Brett Cannon<brett@python.org> wrote:

...

Hm... I still wonder if there would be bad side effects of making co_filename writable, but I can't think of any, so maybe you can make this work... The next step would be to not write it out when marshalling a code object -- this might save a bit of space in pyc files too! (I guess for compatibility you might want to write it as an empty string.) Of course, tracking down all the code objects in the return value of marshal.load*() might be a bit tricky -- API-wise I still think that making it an argument to marshal.load*() might be simpler. Also it would preserve the purity of code objects. (Michael: it would be fine if *other* implementations of Python made co_filename writable, as long as you can't think of security issues with this.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Brett Cannon

4:57 p.m.

On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido@python.org> wrote:

...

I would only want to consider stripping out the filename from the marshal format if a filename argument to marshal.load* was required to guarantee that code objects always in some sensible state. Otherwise everyone would end up with tracebacks that made no sense by default. But adding a required argument to marshal.load* would be quite the pain for compatibility.

...

OK, so what does co_filename get used for? I think it is referenced to open files for use in printing out the traceback. Python won't be able to open files that you can't as a user, so that shouldn't be a security risk. All places where co_filename is referenced would need to gain a check or start using some new C function/macro which verified that co_filename was a string and not some number or something else which wouldn't get null-terminated and thus lead to buffer overflow. A quick grep for co_filename turns up 17 uses in C code, although having to add some check would ruin the purity Guido is talking about and make a single attribute on code objects something people have to be careful about instead of having a guarantee that all attributes have some specific type of value. I'm with Guido; I would rather add an optional argument to marshal.load*. It must be a string and, if present, is used to override co_filename in the resulting code object. Once we have had the argument around we can then potentially make it a required argument and have file paths in the marshal data go away (or decide to default to some string constant when people don't specify the path argument). -Brett

Guido van Rossum

5:02 p.m.

On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon<brett@python.org> wrote:

...

Well... It would be, but consider this: marshal.load() already takes a file argument; in most cases you can extract the name from the file easily. And for marshal.loads(), I'm not sure that the filename baked into the data is all that reliable anyways.

...

You could also do the validation on assignment.

...

Actually that sounds like a fine transitional argument. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Brett Cannon

7 p.m.

On Mon, Aug 31, 2009 at 10:02, Guido van Rossum<guido@python.org> wrote:

...

On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon<brett@python.org> wrote:

...
On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido@python.org> wrote:

...
Hm... I still wonder if there would be bad side effects of making co_filename writable, but I can't think of any, so maybe you can make this work... The next step would be to not write it out when marshalling a code object -- this might save a bit of space in pyc files too! (I guess for compatibility you might want to write it as an empty string.)

I would only want to consider stripping out the filename from the marshal format if a filename argument to marshal.load* was required to guarantee that code objects always in some sensible state. Otherwise everyone would end up with tracebacks that made no sense by default. But adding a required argument to marshal.load* would be quite the pain for compatibility.

Well... It would be, but consider this: marshal.load() already takes a file argument; in most cases you can extract the name from the file easily. And for marshal.loads(), I'm not sure that the filename baked into the data is all that reliable anyways.

...
...
Of course, tracking down all the code objects in the return value of marshal.load*() might be a bit tricky -- API-wise I still think that making it an argument to marshal.load*() might be simpler. Also it would preserve the purity of code objects.

(Michael: it would be fine if *other* implementations of Python made co_filename writable, as long as you can't think of security issues with this.)

OK, so what does co_filename get used for? I think it is referenced to open files for use in printing out the traceback. Python won't be able to open files that you can't as a user, so that shouldn't be a security risk. All places where co_filename is referenced would need to gain a check or start using some new C function/macro which verified that co_filename was a string and not some number or something else which wouldn't get null-terminated and thus lead to buffer overflow.

You could also do the validation on assignment.

...
A quick grep for co_filename turns up 17 uses in C code, although having to add some check would ruin the purity Guido is talking about and make a single attribute on code objects something people have to be careful about instead of having a guarantee that all attributes have some specific type of value.

I'm with Guido; I would rather add an optional argument to marshal.load*. It must be a string and, if present, is used to override co_filename in the resulting code object. Once we have had the argument around we can then potentially make it a required argument and have file paths in the marshal data go away (or decide to default to some string constant when people don't specify the path argument).

Actually that sounds like a fine transitional argument.

I will plan to take this approach then; http://bugs.python.org/issue6811 will track all of this. Since this is a 3.2 thing I am not going to rush to implement this. -Brett

Antoine Pitrou

7:27 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

I still don't understand what the point is of this complicated approach (adding an argument to marshal.load()) compared to the simple and obvious approach (making co_filename mutable). Besides, the latter would let you code the recursive renaming algorithm in Python, which is the whole point of importlib (rewriting most code in Python), isn't it? Regards Antoine.

Brett Cannon

7:59 p.m.

On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou<solipsis@pitrou.net> wrote:

...

If we add the argument to marshal.load* we can eventually drop the file location string from marshal data entirely by requiring people to specify the filename to use when the code object is created. Making co_filename mutable simply doesn't allow for this case unless we decide a default value should be used instead.

...

Sure, but I am not about to re-implement marshal in pure Python just because importlib uses it. -Brett

Brett Cannon

8:39 p.m.

On Mon, Aug 31, 2009 at 12:59, Brett Cannon<brett@python.org> wrote:

...

I should also mention that I am +0 on the marshal.load* change. I could be convinced to try to pursue a mutable co_filenme direction, but considering the BDFL likes the marshal.load* approach and it opens up the possibility of compacting the marshal format I am leaning towards sticking with this initial direction. -Brett

Antoine Pitrou

8:47 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

I am really not opinionated on this one. I was just pointing out that choosing a non-obvious solution generally requires good reasons to do so. The marshal format compaction sounds like premature optimization, since nobody seems to have formulated such a request. Regards Antoine.

Fred Drake

9:04 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

On Aug 31, 2009, at 4:47 PM, Antoine Pitrou wrote:

...

Every time I've been bitten by the wrong co_filename values (usually from tracebacks), changing the way marshal creates code objects to use a values passed in has been the thing that made the most sense to me. The feature request that's involved here, getting correct co_filename values, can be implemented in different ways, sure. This particular change produces the least impact in the because it *doesn't* change the mutability of code objects. I for one appreciate that, mostly because I'm simply wary of making code objects mutable in this way having unexpected side effects in some library. -Fred -- Fred Drake <fdrake at acm.org>

Nick Coghlan

September 2009

10:36 a.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Fred Drake wrote:

...

"linecache" was the one that immediately popped into mind for me (not saying it is affected, just saying it would be the first place I would like for possible side effects). I think this is one of those situations where something has behaved a certain way for so long that it would be really hard to be confident that we had considered all the possible ramifications of changing it. Modifying marshal to allow Python code to override the stored value instead of making it a free-for-all preserves the post-compile immutability characteristic while still letting Brett make importlib mimic the behaviour of import.c. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Robert Collins

August 2009

12:13 a.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:

...

Brett Cannon

12:23 a.m.

On Sun, Aug 30, 2009 at 17:13, Robert Collins<robertc@robertcollins.net> wrote:

...

Guido van Rossum

12:26 a.m.

On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon<brett@python.org> wrote:

...

I already mentioned replication; it could also just be a matter of downloading a tarball with .py and .pyc files. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Glyph Lefkowitz

3:13 a.m.

On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum <guido@python.org> wrote:

...

Guido van Rossum

12:24 a.m.

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...

Brett Cannon

12:34 a.m.

On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

...

(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

Guido van Rossum

August 2009

2:34 a.m.

On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett@python.org> wrote:

...

On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

...

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...

...

Brett Cannon

2:43 a.m.

On Sun, Aug 30, 2009 at 19:34, Guido van Rossum<guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett@python.org> wrote:

...
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido@python.org> wrote:

...
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

...
Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...

Definitely, which is why I will do it, just not tonight as I am tired of compatibility fixing for now. =)

...

That was my thinking.

...

Benjamin Peterson

2:51 a.m.

2009/8/30 Brett Cannon <brett@python.org>:

...

Brett Cannon

5:43 a.m.

On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson<benjamin@python.org> wrote:

...

Nick Coghlan

1:13 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon wrote:

...

Brett Cannon

4:34 p.m.

On Mon, Aug 31, 2009 at 06:13, Nick Coghlan<ncoghlan@gmail.com> wrote:

...

Gregory P. Smith

August 2009

7:01 a.m.

On Sun, Aug 30, 2009 at 5:24 PM, Guido van Rossum <guido@python.org> wrote:

...

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett@python.org> wrote:

...
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

Antoine Pitrou

9:24 a.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

Why can't we simply make co_filename a writable attribute instead of inventing some complicated API?

Nick Coghlan

1:19 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Antoine Pitrou wrote:

...

Michael Foord

1:27 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Nick Coghlan wrote:

...

Cheers, Nick.

-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog

Antoine Pitrou

1:32 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

Benjamin Peterson

2:55 p.m.

2009/8/31 Antoine Pitrou <solipsis@pitrou.net>:

...

Because code objects are supposed to be a immutable hashable object? -- Regards, Benjamin

Antoine Pitrou

August 2009

3:10 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Benjamin Peterson <benjamin <at> python.org> writes:

...

...
Why can't we simply make co_filename a writable attribute instead of

inventing

...

...
some complicated API?

Because code objects are supposed to be a immutable hashable object?

Right, but co_filename is used neither in tp_hash nor in tp_richcompare. Regards Antoine.

Brett Cannon

4:27 p.m.

On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou<solipsis@pitrou.net> wrote:

...

Guido van Rossum

4:33 p.m.

On Mon, Aug 31, 2009 at 9:27 AM, Brett Cannon<brett@python.org> wrote:

...

Brett Cannon

4:57 p.m.

On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido@python.org> wrote:

...

Guido van Rossum

5:02 p.m.

On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon<brett@python.org> wrote:

...

You could also do the validation on assignment.

...

Actually that sounds like a fine transitional argument. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Brett Cannon

7 p.m.

On Mon, Aug 31, 2009 at 10:02, Guido van Rossum<guido@python.org> wrote:

...

On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon<brett@python.org> wrote:

...
On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido@python.org> wrote:

...
Hm... I still wonder if there would be bad side effects of making co_filename writable, but I can't think of any, so maybe you can make this work... The next step would be to not write it out when marshalling a code object -- this might save a bit of space in pyc files too! (I guess for compatibility you might want to write it as an empty string.)

I would only want to consider stripping out the filename from the marshal format if a filename argument to marshal.load* was required to guarantee that code objects always in some sensible state. Otherwise everyone would end up with tracebacks that made no sense by default. But adding a required argument to marshal.load* would be quite the pain for compatibility.

Well... It would be, but consider this: marshal.load() already takes a file argument; in most cases you can extract the name from the file easily. And for marshal.loads(), I'm not sure that the filename baked into the data is all that reliable anyways.

...
...
Of course, tracking down all the code objects in the return value of marshal.load*() might be a bit tricky -- API-wise I still think that making it an argument to marshal.load*() might be simpler. Also it would preserve the purity of code objects.

(Michael: it would be fine if *other* implementations of Python made co_filename writable, as long as you can't think of security issues with this.)

OK, so what does co_filename get used for? I think it is referenced to open files for use in printing out the traceback. Python won't be able to open files that you can't as a user, so that shouldn't be a security risk. All places where co_filename is referenced would need to gain a check or start using some new C function/macro which verified that co_filename was a string and not some number or something else which wouldn't get null-terminated and thus lead to buffer overflow.

You could also do the validation on assignment.

...
A quick grep for co_filename turns up 17 uses in C code, although having to add some check would ruin the purity Guido is talking about and make a single attribute on code objects something people have to be careful about instead of having a guarantee that all attributes have some specific type of value.

I'm with Guido; I would rather add an optional argument to marshal.load*. It must be a string and, if present, is used to override co_filename in the resulting code object. Once we have had the argument around we can then potentially make it a required argument and have file paths in the marshal data go away (or decide to default to some string constant when people don't specify the path argument).

Actually that sounds like a fine transitional argument.

I will plan to take this approach then; http://bugs.python.org/issue6811 will track all of this. Since this is a 3.2 thing I am not going to rush to implement this. -Brett

Antoine Pitrou

August 2009

7:27 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

Brett Cannon

7:59 p.m.

On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou<solipsis@pitrou.net> wrote:

...

Sure, but I am not about to re-implement marshal in pure Python just because importlib uses it. -Brett

Brett Cannon

8:39 p.m.

On Mon, Aug 31, 2009 at 12:59, Brett Cannon<brett@python.org> wrote:

...

Antoine Pitrou

8:47 p.m.

New subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon <brett <at> python.org> writes:

...

Fred Drake

9:04 p.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

On Aug 31, 2009, at 4:47 PM, Antoine Pitrou wrote:

...

Nick Coghlan

September 2009

10:36 a.m.

New subject: how important is setting co_filename for a module being imported to what __file__ is set to?

Fred Drake wrote:

...

5676

Age (days ago)

5678

Last active (days ago)

List overview

Download

30 comments

10 participants

participants (10)

Antoine Pitrou
Benjamin Peterson
Brett Cannon
Fred Drake
Glyph Lefkowitz
Gregory P. Smith
Guido van Rossum
Michael Foord
Nick Coghlan
Robert Collins

how important is setting co_filename for a module being imported to what __file__ is set to?

tags

participants (10)

how important is setting co_filename for a module being imported to what file is set to?