Re: [Python-Dev] __file__ is not always an absolute path

On 6 Feb, 11:53 pm, guido@python.org wrote:
On Sat, Feb 6, 2010 at 3:22 PM, <exarkun@twistedmatrix.com> wrote:
On 10:29 pm, guido@python.org wrote:
[snip]
I haven't tried to repro this particular example, but the reason is that we don't want to have to call getpwd() on every import nor do we want to have some kind of in-process variable to cache the current directory. (getpwd() is relatively slow and can sometimes fail outright, and trying to cache it has a certain risk of being wrong.)
Assuming you mean os.getcwd():
Yes.
exarkun@boson:~$ python -m timeit -s 'def f(): pass' 'f()'
10000000 loops, best of 3: 0.132 usec per loop
exarkun@boson:~$ python -m timeit -s 'from os import getcwd' 'getcwd()'
1000000 loops, best of 3: 1.02 usec per loop
exarkun@boson:~$

So it's about 7x more expensive than a no-op function call. I'd call this pretty quick. Compared to everything else that happens during an import, I'm not convinced this wouldn't be lost in the noise. I think it's at least worth implementing and measuring.
But it's a system call, and its speed depends on a lot more than the speed of a simple function call. It depends on the OS kernel, possibly on the filesystem, and so on.
Do you know of a case where it's actually slow? If not, how convincing should this argument really be? Perhaps we can measure it on a few platforms before passing judgement. For reference, my numbers are from Linux 2.6.31 and my filesystem (though I don't think it really matters) is ext3. I have eglibc 2.10.1 compiled by gcc version 4.4.1.
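A minimal sketch of how such a measurement could be scripted with the standard timeit module, so the same file can be run unchanged on each platform. The helper names (noop, per_call_usec, compare_getcwd) are made up for illustration and are not part of anything proposed in this thread.

```python
import os
import timeit


def noop():
    """Baseline: an empty Python function call."""
    pass


def per_call_usec(func, number=100000, repeat=3):
    """Return the best-of-`repeat` cost of one call to `func`, in microseconds."""
    timer = timeit.Timer(func)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


def compare_getcwd(number=100000):
    """Measure os.getcwd() against a no-op call and report the ratio."""
    baseline = per_call_usec(noop, number)
    getcwd = per_call_usec(os.getcwd, number)
    return {
        "noop_usec": baseline,
        "getcwd_usec": getcwd,
        "ratio": getcwd / baseline,
    }


if __name__ == "__main__":
    result = compare_getcwd()
    print("no-op call:  %.3f usec" % result["noop_usec"])
    print("os.getcwd(): %.3f usec" % result["getcwd_usec"])
    print("ratio:       %.1fx" % result["ratio"])
```

The numbers depend on the kernel, the filesystem, and the current directory, so a single run says little about other deployments.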
Also "os.getcwd()" abstracts away various platform details that the C import code would have to replicate.
That logic can all be hidden behind a C API which os.getcwd() can then be implemented in terms of. There's no reason for it to be any harder to invoke from C than it is from Python.
Really, the approach of preprocessing sys.path makes much more sense. If an app wants sys.path[0] to be an absolute path too they can modify it themselves.
That may turn out to be the less expensive approach. I'm not sure in what other ways it is the approach that makes much more sense. Quite the opposite: centralizing the responsibility for normalizing this value makes a lot of sense if you consider things like reducing code duplication and, in turn, removing the possibility for bugs.

Adding better documentation for __file__ is another task which I think is worth undertaking, regardless of whether any change is made to how its value is computed. At the moment, the two or three sentences about it in PEP 302 are all I've been able to find, and they don't really get the job done.

Jean-Paul
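One possible reading of the "preprocess sys.path" idea, sketched in Python rather than in the C import code: resolve every relative entry against the current directory once, so that modules imported afterwards are found through absolute paths. The function name absolutize_path_entries is hypothetical, and this is an illustration of the approach, not the change that was actually discussed for the interpreter.

```python
import os
import sys


def absolutize_path_entries(entries, cwd=None):
    """Return a copy of `entries` with relative directories made absolute.

    An empty string in sys.path means "the current directory", so it is
    resolved against `cwd` like any other relative entry.
    """
    if cwd is None:
        cwd = os.getcwd()  # one call up front instead of one per import
    # os.path.join() leaves already-absolute entries untouched.
    return [os.path.normpath(os.path.join(cwd, entry)) for entry in entries]


if __name__ == "__main__":
    # An application that wants absolute paths can opt in itself.
    sys.path[:] = absolutize_path_entries(sys.path)
    print(sys.path[0])
```

The trade-off is that the entries become a snapshot: after a later os.chdir(), a formerly relative entry still points at the old directory.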

On 7 Feb 2010, at 05:27, exarkun@twistedmatrix.com wrote:
Do you know of a case where it's actually slow? If not, how convincing should this argument really be? Perhaps we can measure it on a few platforms before passing judgement.
On Mac OS X at least, system calls are notoriously slow. I think it has to do with Mach overhead, or something…

$ arch -arch ppc /usr/bin/python2.6 -m timeit -s 'def f(): pass' 'f()'
1000000 loops, best of 3: 0.476 usec per loop
$ arch -arch ppc /usr/bin/python2.6 -m timeit -s 'from os import getcwd' 'getcwd()'
10000 loops, best of 3: 21.9 usec per loop
$ arch -arch i386 /usr/bin/python2.6 -m timeit -s 'def f(): pass' 'f()'
1000000 loops, best of 3: 0.234 usec per loop
$ arch -arch i386 /usr/bin/python2.6 -m timeit -s 'from os import getcwd' 'getcwd()'
100000 loops, best of 3: 14.1 usec per loop
$ arch -arch x86_64 /usr/bin/python2.6 -m timeit -s 'def f(): pass' 'f()'
10000000 loops, best of 3: 0.182 usec per loop
$ arch -arch x86_64 /usr/bin/python2.6 -m timeit -s 'from os import getcwd' 'getcwd()'
100000 loops, best of 3: 11 usec per loop

For maximum reproducibility, I used the stock Python 2.6.1 included in Mac OS X 10.6.2.

In other words, ‘os.getcwd()’ is roughly 46 to 60 times as slow as a regular function call when using Mac OS X.

--
Dan Villiom Podlaski Christiansen
danchr@gmail.com
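For comparison, the slowdown implied by the timings posted so far can be worked out directly. The figures below are copied from the Linux and Mac OS X messages in this thread; the dictionary layout and the slowdown helper are only an illustration.

```python
# Timings in microseconds per call, copied from the messages in this thread:
# (no-op function call, os.getcwd() call).
timings = {
    "Linux 2.6.31 (ext3)": (0.132, 1.02),
    "Mac OS X 10.6.2, ppc": (0.476, 21.9),
    "Mac OS X 10.6.2, i386": (0.234, 14.1),
    "Mac OS X 10.6.2, x86_64": (0.182, 11.0),
}


def slowdown(noop_usec, getcwd_usec):
    """How many times more expensive getcwd() is than a no-op call."""
    return getcwd_usec / noop_usec


ratios = {name: slowdown(*pair) for name, pair in timings.items()}

if __name__ == "__main__":
    for name, ratio in sorted(ratios.items()):
        print("%-26s getcwd() costs %.1fx a no-op call" % (name, ratio))
```

On these numbers the ratio is a little under 8 on the Linux machine and between about 46 and 60 on the Mac.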

I did some quick measurements out of curiosity. Performance seems clearly filesystem and OS dependent (and is likely deployment/configuration dependent). I ran each test 3 times to ensure the measurements were consistent. Tests were done with ActivePython 2.6.3.7.

* AIX 5.3:

python26 -m timeit -s 'def f(): pass' 'f()'
1000000 loops, best of 3: 0.336 usec per loop

cwd is NFS mount:
users/baplepil/sandbox> python26 -m timeit -s 'from os import getcwd' 'getcwd()'
1000 loops, best of 3: 1.09 msec per loop

cwd is /tmp:
/tmp> python26 -m timeit -s 'from os import getcwd' 'getcwd()'
1000 loops, best of 3: 323 usec per loop

* Solaris 10 (Sparc):

python26 -m timeit -s 'def f(): pass' 'f()'
1000000 loops, best of 3: 0.495 usec per loop

cwd is NFS mount:
users/baplepil/sandbox> python26 -m timeit -s 'from os import getcwd' 'getcwd()'
100000 loops, best of 3: 12.1 usec per loop

cwd is /tmp:
/tmp> python26 -m timeit -s 'from os import getcwd' 'getcwd()'
100000 loops, best of 3: 4.58 usec per loop

* Windows XP SP2:

python -m timeit -s "def f(): pass; f()"
10000000 loops, best of 3: 0.0531 usec per loop

cwd is network drive (same as previous NFS mount):
R:\...\users\baplepil>python -m timeit -s "from os import getcwd" "getcwd()"
100000 loops, best of 3: 5.14 usec per loop

cwd is C:\temp:
C:\temp>python -m timeit -s "from os import getcwd" "getcwd()"
100000 loops, best of 3: 4.27 usec per loop

2010/2/17 Dan Villiom Podlaski Christiansen <danchr@gmail.com>
On 7 Feb 2010, at 05:27, exarkun@twistedmatrix.com wrote:
Do you know of a case where it's actually slow? If not, how convincing should this argument really be? Perhaps we can measure it on a few platforms before passing judgement.
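Since the figures above vary with the directory the process sits in, a per-directory comparison could be scripted roughly as follows. This is a sketch only: getcwd_usec_in is a hypothetical helper, and the directories in the usage section are placeholders to be replaced with a local and a network-mounted path.

```python
import os
import tempfile
import timeit


def getcwd_usec_in(directory, number=10000, repeat=3):
    """Time os.getcwd() with `directory` as the current directory.

    Returns the best-of-`repeat` cost of one call, in microseconds.
    """
    original = os.getcwd()
    os.chdir(directory)
    try:
        best = min(timeit.repeat(os.getcwd, repeat=repeat, number=number))
    finally:
        os.chdir(original)  # always restore the caller's directory
    return best / number * 1e6


if __name__ == "__main__":
    # Placeholder candidates; substitute an NFS or network-drive path to compare.
    for directory in (os.getcwd(), tempfile.gettempdir()):
        cost = getcwd_usec_in(directory)
        print("%-40s %.2f usec per getcwd()" % (directory, cost))
```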
participants (3)
- Baptiste Lepilleur
- Dan Villiom Podlaski Christiansen
- exarkun@twistedmatrix.com