__file__ is not always an absolute path
In #7712 I was trying to change regrtest to always run the tests in a temporary CWD (e.g. /tmp/@test_1234_cwd/). The patches attached to the issue add a context manager that changes the CWD, and it works fine when I run ./python -m test.regrtest from trunk/. However, when I try from trunk/Lib/ it fails with ImportErrors (note that the latest patch by Florent Xicluna already tries to work around the problem). The traceback points to "the_package = __import__(abstest, globals(), locals(), [])" in runtest_inner (in regrtest.py), and a "print __import__('test').__file__" there returns 'test/__init__.pyc'. This can be reproduced quite easily:

trunk$ ./python
Python 2.7a2+ (trunk:77941M, Feb 3 2010, 06:40:49)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.getcwd()
'/home/wolf/dev/trunk'
>>> import test
>>> test.__file__  # absolute
'/home/wolf/dev/trunk/Lib/test/__init__.pyc'
>>> os.chdir('/tmp')
>>> test.__file__
'/home/wolf/dev/trunk/Lib/test/__init__.pyc'
>>> from test import test_unicode  # works
>>> test_unicode.__file__
'/home/wolf/dev/trunk/Lib/test/test_unicode.pyc'
>>>
[21]+  Stopped                 ./python
trunk$ cd Lib/
trunk/Lib$ ../python
Python 2.7a2+ (trunk:77941M, Feb 3 2010, 06:40:49)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.getcwd()
'/home/wolf/dev/trunk/Lib'
>>> import test
>>> test.__file__  # relative
'test/__init__.pyc'
>>> os.chdir('/tmp')
>>> from test import test_unicode  # fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name test_unicode
Is there a reason why in the second case test.__file__ is relative?
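The CWD-changing context manager mentioned above might look roughly like the following. This is a minimal sketch, not the patch attached to #7712: the name `temp_cwd`, its `prefix` parameter, and the choice to make sys.path entries absolute before changing directory are all assumptions made for illustration. Making the entries absolute only helps lookups that go through sys.path. A package already imported through '' (like `test` in the second session) keeps the relative `__path__` it got at import time, so this would have to happen before that first import.

```python
import contextlib
import os
import shutil
import sys
import tempfile


@contextlib.contextmanager
def temp_cwd(prefix="test_cwd_"):
    """Hypothetical helper: run the body inside a fresh temporary directory.

    sys.path entries are made absolute first ('' becomes the current
    directory), so lookups through sys.path keep finding the same
    directories after the chdir. Packages that were already imported keep
    whatever (possibly relative) __path__ they got at import time.
    """
    old_cwd = os.getcwd()
    old_path = list(sys.path)
    tmpdir = tempfile.mkdtemp(prefix=prefix)
    try:
        # os.path.abspath('') returns the current directory.
        sys.path[:] = [os.path.abspath(p) for p in sys.path]
        os.chdir(tmpdir)
        yield tmpdir
    finally:
        # Restore the original state even if the body raised.
        os.chdir(old_cwd)
        sys.path[:] = old_path
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Used as `with temp_cwd() as d: ...`, the body runs with `d` as the working directory, and both the old directory and the original sys.path are restored on exit.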
On Sat, Feb 6, 2010 at 12:49 PM, Ezio Melotti wrote:
[snip]
Is there a reason why in the second case test.__file__ is relative?
I haven't tried to repro this particular example, but the reason is that we don't want to have to call getpwd() on every import nor do we want to have some kind of in-process variable to cache the current directory. (getpwd() is relatively slow and can sometimes fail outright, and trying to cache it has a certain risk of being wrong.)

What we do instead is code in site.py that walks over the elements of sys.path and turns them into absolute paths. However, this code runs before '' is inserted in the front of sys.path, so that the initial value of sys.path[0] is ''.

You may want to print the value of sys.path at various points to see for yourself.

--
--Guido van Rossum (python.org/~guido)
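To make the ordering concrete, here is a simplified model of the mechanism described above. It is not the actual site.py or import.c code: the function names and the sample paths (including `plat-linux2` as a relative entry) are made up for illustration. Startup normalization makes every existing entry absolute, '' is only inserted afterwards, and a module found through '' gets a filename built by joining '' with a relative location, which is how `test/__init__.pyc` arises.

```python
import posixpath


def normalize_entries(entries, cwd):
    """Simplified stand-in for the startup pass in site.py: make every
    existing sys.path entry absolute relative to the directory `cwd`."""
    return [posixpath.normpath(posixpath.join(cwd, e)) for e in entries]


def simulated_sys_path(initial_entries, cwd):
    # 1. Normalization runs first ...
    path = normalize_entries(initial_entries, cwd)
    # 2. ... and only afterwards is '' put at the front, so it stays ''.
    path.insert(0, '')
    return path


def init_file_for(entry, package):
    # The filename is the path entry joined with the module's location;
    # joining onto '' leaves the result relative.
    return posixpath.join(entry, package, '__init__.pyc')


path = simulated_sys_path(['/home/wolf/dev/trunk/Lib', 'plat-linux2'],
                          cwd='/home/wolf/dev/trunk/Lib')
print(path)
# ['', '/home/wolf/dev/trunk/Lib', '/home/wolf/dev/trunk/Lib/plat-linux2']
print(init_file_for(path[0], 'test'))  # test/__init__.pyc
print(init_file_for(path[1], 'test'))  # /home/wolf/dev/trunk/Lib/test/__init__.pyc
```

Because '' comes first, a package present in the current directory is found through it, and its recorded location stays relative.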
On 10:29 pm, guido@python.org wrote:
On Sat, Feb 6, 2010 at 12:49 PM, Ezio Melotti wrote:
[snip]
I haven't tried to repro this particular example, but the reason is that we don't want to have to call getpwd() on every import nor do we want to have some kind of in-process variable to cache the current directory. (getpwd() is relatively slow and can sometimes fail outright, and trying to cache it has a certain risk of being wrong.)
Assuming you mean os.getcwd():

exarkun@boson:~$ python -m timeit -s 'def f(): pass' 'f()'
10000000 loops, best of 3: 0.132 usec per loop
exarkun@boson:~$ python -m timeit -s 'from os import getcwd' 'getcwd()'
1000000 loops, best of 3: 1.02 usec per loop
exarkun@boson:~$

So it's about 7x more expensive than a no-op function call. I'd call this pretty quick. Compared to everything else that happens during an import, I'm not convinced this wouldn't be lost in the noise. I think it's at least worth implementing and measuring.

Jean-Paul
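The same comparison can be scripted with the Python API of the timeit module instead of the command line. This is a minimal sketch: the helper name and the loop counts are arbitrary choices, and the absolute numbers will differ by machine, OS and filesystem.

```python
import timeit


def usec_per_call(stmt, setup, number=100_000, repeat=3):
    """Best-of-`repeat` time per execution of `stmt`, in microseconds,
    mirroring the figure that `python -m timeit` reports."""
    best = min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
    return best / number * 1e6


noop = usec_per_call("f()", "def f(): pass")
cwd = usec_per_call("getcwd()", "from os import getcwd")
print(f"no-op call: {noop:.3f} usec, os.getcwd(): {cwd:.3f} usec "
      f"({cwd / noop:.1f}x)")
```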
On Sat, Feb 6, 2010 at 3:22 PM,
On 10:29 pm, guido@python.org wrote:
On Sat, Feb 6, 2010 at 12:49 PM, Ezio Melotti wrote:
[snip]
I haven't tried to repro this particular example, but the reason is that we don't want to have to call getpwd() on every import nor do we want to have some kind of in-process variable to cache the current directory. (getpwd() is relatively slow and can sometimes fail outright, and trying to cache it has a certain risk of being wrong.)
Assuming you mean os.getcwd():
Yes.
[...]
So it's about 7x more expensive than a no-op function call. I'd call this pretty quick. Compared to everything else that happens during an import, I'm not convinced this wouldn't be lost in the noise. I think it's at least worth implementing and measuring.
But it's a system call, and its speed depends on a lot more than the speed of a simple function call. It depends on the OS kernel, possibly on the filesystem, and so on. Also "os.getcwd()" abstracts away various platform details that the C import code would have to replicate.

Really, the approach of preprocessing sys.path makes much more sense. If an app wants sys.path[0] to be an absolute path too they can modify it themselves.

--
--Guido van Rossum (python.org/~guido)
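An application that wants sys.path[0] absolute, as suggested above, could do something like the following. It is a sketch under the assumption that it runs once, early, before any os.chdir() and before importing packages whose `__path__` would otherwise be derived from ''. The function name is hypothetical.

```python
import os
import sys


def absolutize_first_path_entry(path=None):
    """Hypothetical helper: make path[0] (sys.path by default) absolute.

    '' means "the current directory"; os.path.abspath('') returns the
    current directory, so the entry is pinned to where we started.
    """
    if path is None:
        path = sys.path
    if path and not os.path.isabs(path[0]):
        path[0] = os.path.abspath(path[0])
    return path
```

Calling `absolutize_first_path_entry()` with no argument updates sys.path in place. Later os.chdir() calls then no longer change where the first entry points.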
On Feb 06, 2010, at 11:22 PM, exarkun@twistedmatrix.com wrote:
[...]
So it's about 7x more expensive than a no-op function call. I'd call this pretty quick. Compared to everything else that happens during an import, I'm not convinced this wouldn't be lost in the noise. I think it's at least worth implementing and measuring.
I'd like to see the effect on command line scripts that are run often and then exit, e.g. Bazaar or Mercurial. Startup time due to import overhead seems to be a constant battle for those types of projects.

-Barry
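For the kind of measurement asked about here, one rough approach is to time whole short-lived interpreter runs rather than single calls. This is a minimal sketch (the helper name and run count are arbitrary). On Python 3.7 and later, `python -X importtime` additionally breaks down the import cost per module.

```python
import subprocess
import sys
import time


def best_run_ms(args=("-c", "pass"), runs=5):
    """Best-of-`runs` wall-clock time, in milliseconds, for a fresh
    interpreter to start, execute `args`, and exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


print(f"bare startup: {best_run_ms():.1f} ms")
```

Running the same script against interpreters built with and without a proposed import change would show whether the extra call is visible at all next to total startup time.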
Barry Warsaw writes:
exarkun <at> boson:~$ python -m timeit -s 'from os import getcwd' 'getcwd()' 1000000 loops, best of 3: 1.02 usec per loop
[...]
I'd like to see the effect on command line scripts that are run often and then exit, e.g. Bazaar or Mercurial. Start up time due to import overhead seems to be a constant battle for those types of projects.
If os.getcwd() is only called once when "normalizing" sys.path, and if it just takes one microsecond, I don't really see the point. :-)

Antoine.
Antoine Pitrou wrote:
Barry Warsaw writes:
[...]
If os.getcwd() is only called once when "normalizing" sys.path, and if it just takes one microsecond, I don't really see the point. :-)
The problem is that having '' as the first entry in sys.path currently means "do the import relative to the current directory". Unless we want to change the language semantics so we stick os.getcwd() at the front instead of '', __file__ is still going to be relative sometimes.

Alternatively, we could special-case those specific imports to do os.getcwd() at the time of the import. That won't affect the import speed significantly for imports from locations other than '' (i.e. most of them) and will more accurately reflect the true meaning of __file__ in that case (since we put the module in sys.modules, future imports won't see different versions of that module even if the working directory is changed, so the relative value for __file__ becomes a lie as soon as the working directory changes).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
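The alternative of paying for os.getcwd() only when a module is actually found through the '' entry can be modeled like this. It is a simplified sketch: `resolve_entry` and `module_file` are hypothetical names, not functions from import.c.

```python
import os


def resolve_entry(entry):
    """Hypothetical: turn a sys.path entry into the directory to search.

    Only '' pays for an os.getcwd() call, and it does so at import time;
    every other entry is assumed to have been made absolute at startup.
    """
    if entry == '':
        return os.getcwd()
    return entry


def module_file(entry, relpath):
    # __file__ is built from the resolved directory, so a module found via
    # '' still records an absolute location that survives a later chdir.
    return os.path.join(resolve_entry(entry), relpath)


print(module_file('', os.path.join('test', '__init__.py')))
```

The design keeps the common case (absolute entries) free of system calls while fixing `__file__` for the one entry whose meaning depends on the working directory.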
Nick Coghlan writes:
The problem is that having '' as the first entry in sys.path currently means "do the import relative to the current directory". Unless we want to change the language semantics so we stick os.getcwd() at the front instead of '', then __file__ is still going to be relative sometimes.
"Changing the language semantics" is actually what I was thinking about :) Do some people actually rely on the fact that changing the current directory will also change the import path? cheers Antoine.
Antoine Pitrou wrote:
Nick Coghlan writes:
[...]
"Changing the language semantics" is actually what I was thinking about :) Do some people actually rely on the fact that changing the current directory will also change the import path?
I've learned that no matter how insane our current semantics for something may be, someone, somewhere will be relying on them :)

In this case, the current semantics aren't even all that insane. A bit odd maybe, but not insane. I think they're even documented, but I couldn't say exactly where without some digging.

I think we also use the trick of checking for an empty string in sys.path[0] in a couple of places before deciding whether or not to remove it (I seem to recall applying a patch to pydoc along those lines so it worked properly with the -m switch).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Feb 08, 2010 at 12:51:22PM +0000, Antoine Pitrou wrote:
Do some people actually rely on the fact that changing the current directory will also change the import path?
On the interactive prompt, yes. But I guess that's a habit that could be easily un-learnt.

Regards
Floris

--
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org
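The behavior being relied on here is easy to observe directly. The sketch below compares a '' entry with a directory frozen via os.getcwd() when the entry is added, which approximates the proposed semantics. Only '' follows a later os.chdir(). The function names, the temporary directory layout and the module name `cwd_demo_mod` are made up for illustration, and the code runs on current Python 3 importlib rather than the 2.7 import.c under discussion, so it illustrates the semantics, not that code path.

```python
import importlib
import os
import sys
import tempfile


def can_import_after_chdir(freeze_cwd, start_dir, target_dir, name):
    """Start in `start_dir`, put either '' or the then-current directory at
    the front of sys.path, chdir to `target_dir`, and try to import `name`."""
    old_cwd, old_path = os.getcwd(), list(sys.path)
    try:
        os.chdir(start_dir)
        first = os.getcwd() if freeze_cwd else ''
        # Drop any pre-existing '' so only `first` controls CWD lookups.
        sys.path[:] = [first] + [p for p in old_path if p != '']
        os.chdir(target_dir)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
            return True
        except ImportError:
            return False
    finally:
        sys.modules.pop(name, None)
        sys.path[:] = old_path
        os.chdir(old_cwd)


def demo():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # The module exists only in directory b.
        with open(os.path.join(b, "cwd_demo_mod.py"), "w") as f:
            f.write("VALUE = 'found in b'\n")
        return (can_import_after_chdir(False, a, b, "cwd_demo_mod"),
                can_import_after_chdir(True, a, b, "cwd_demo_mod"))


print(demo())  # (True, False): only '' follows the chdir
```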
Guido van Rossum wrote:
What we do instead is code in site.py that walks over the elements of sys.path and turns them into absolute paths. However, this code runs before '' is inserted in the front of sys.path, so that the initial value of sys.path[0] is ''.
You may want to print the value of sys.path at various points to see for yourself.
I ran into the issue on Debian or Ubuntu (can't remember) several years ago. The post-install script of the Python package did something like "cd /usr/lib/pythonX.Y && ./compileall.py", so all pyc files were created relative to the library root of Python. The __file__ attributes of all pre-compiled Python files were relative, too.

Christian
On Sat, Feb 6, 2010 at 4:04 PM, Christian Heimes
Guido van Rossum wrote:
[snip]
I ran into the issue on Debian or Ubuntu (can't remember) several years ago. The post-install script of the Python package did something like "cd /usr/lib/pythonX.Y && ./compileall.py", so all pyc files were created relative to the library root of Python. The __file__ attributes of all pre-compiled Python files were relative, too.
Are you sure you remember this right? The code objects' co_filename attributes will be unmarshalled straight from the bytecode file, which indeed will have the relative path in this case (hopefully we'll finally fix this in 3.2 and 2.7). But if I read the code in import.c correctly, __file__ is set on the basis of the path of the file read, which in turn comes from sys.path, which will have been "absolufied" by site.py. Or maybe this was so long ago that site.py didn't yet do that?

--
--Guido van Rossum (python.org/~guido)
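The distinction drawn here can be seen with marshal. A code object remembers whatever filename it was compiled with, and a .pyc file stores the marshalled code object after a short header, so a relative name given at compile time comes back unchanged when the bytecode is loaded. This is a minimal sketch: the source text and the relative filename `test/__init__.py` are made up, standing in for what compiling from inside the library directory with relative paths would produce.

```python
import marshal
import types

# Compile with a relative filename.
code = compile("def f():\n    return 1\n", "test/__init__.py", "exec")

# Round-trip through marshal, the format used for the body of a .pyc file.
loaded = marshal.loads(marshal.dumps(code))

# Nested code objects (here, the body of f) carry the same filename.
nested = [c for c in loaded.co_consts if isinstance(c, types.CodeType)]

print(loaded.co_filename)     # test/__init__.py
print(nested[0].co_filename)  # test/__init__.py
```

__file__, by contrast, is not stored in the bytecode at all; it is computed by the importer from the path it actually read the file from.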
Guido van Rossum wrote:
Are you sure you remember this right? The code objects' co_filename attributes will be unmarshalled straight from the bytecode file, which indeed will have the relative path in this case (hopefully we'll finally fix this in 3.2 and 2.7). But if I read the code in import.c correctly, __file__ is set on the basis of the path of the file read, which in turn comes from sys.path, which will have been "absolufied" by site.py. Or maybe this was so long ago that site.py didn't yet do that?
I ran into the problem years ago. I can't recall the Python version, but it must have been 2.2 or 2.3, maybe 2.1. I'm not entirely sure how it happened, either. All I can remember is that I traced the cause down to the way compileall was called.

I've tried to reproduce the issue with Python 2.6 but failed. It looks like the code does the right thing.

Christian
On Sat, Feb 6, 2010 at 4:36 PM, Christian Heimes
Guido van Rossum wrote:
[snip]
I ran into the problem years ago. I can't recall the Python version, but it must have been 2.2 or 2.3, maybe 2.1. I'm not entirely sure how it happened, either. All I can remember is that I traced the cause down to the way compileall was called. I've tried to reproduce the issue with Python 2.6 but failed. It looks like the code does the right thing.
Hm. The timing doesn't match. From the svn logs for site.py, it looks like this was introduced in r17768 on 2000-09-28, which puts it before 2.0 was released.

--
--Guido van Rossum (python.org/~guido)
participants (8)
- Antoine Pitrou
- Barry Warsaw
- Christian Heimes
- exarkun@twistedmatrix.com
- Ezio Melotti
- Floris Bruynooghe
- Guido van Rossum
- Nick Coghlan