[Python-Dev] __file__

Robert Collins robertc at robertcollins.net
Mon Mar 1 00:56:20 CET 2010


On Mon, 2010-03-01 at 12:35 +1300, Greg Ewing wrote:
> 
> Yes, although that would then incur higher stat overheads for
> people distributing .pyc files. There doesn't seem to be a
> way of pleasing everyone.
> 
> This is all assuming that the extra stat calls are actually
> a problem. Does anyone have any evidence that they would
> really take significant time compared to loading the module?
> Once you've looked for one file in a given directory, looking
> for another one in the same directory ought to be quite fast,
> since all the relevant directory blocks will be in the
> filesystem cache. 

We've done a bunch of testing in bzrlib. The basic findings are:
 - statting /is/ expensive *if* you don't use the result.
 - loading code is the main cost *once* you have a hot disk cache

Specifically, stats for files that are *not present* incur page-in costs
for the dentries needed to determine that the file is absent. In the special
case of probing for $name.$ext1, ...$ext2, ...$ext3, the second and third
lookups generally hit the same pages as the first on most file systems,
so they don't incur additional page-in costs.

In most file systems stats for files that *are present* also incur a
page-in for the inode of the file. If you then do not read the file,
this is I/O that doesn't really gain anything. 
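
A rough sketch of that probing pattern, in Python 3, for anyone who wants
to experiment. The suffix list and the probe() helper are made up for
illustration and are not what the import machinery actually uses; the
temporary directory will also be in a hot cache, so the timing only shows
the shape of the measurement, not the cold-cache cost.

```python
import os
import tempfile
import time

# Hypothetical suffix list for illustration only; the real import
# machinery probes a different, platform-dependent set of names.
SUFFIXES = [".so", "module.so", ".py", ".pyc"]


def probe(directory, name, suffixes=SUFFIXES):
    """Stat each candidate file; return (hits, misses, elapsed_seconds)."""
    hits, misses = [], []
    start = time.perf_counter()
    for suffix in suffixes:
        path = os.path.join(directory, name + suffix)
        try:
            os.stat(path)
        except OSError:
            # Absent file: the kernel still had to consult the directory.
            misses.append(path)
        else:
            # Present file: the stat also had to fetch the inode.
            hits.append(path)
    return hits, misses, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Only the bytecode file exists, as in a .pyc-only distribution.
        open(os.path.join(d, "foo.pyc"), "wb").close()
        hits, misses, elapsed = probe(d, "foo")
        print(len(hits), "present,", len(misses), "absent,",
              "%.0f us" % (elapsed * 1e6))
```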

Being able to disable .py file usage completely, so that only foo.pyc
and foo/__init__.pyc are probed for, could make a very noticeable
difference to the cold cache startup time.
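
To make the difference concrete, here is a deliberately simplified sketch
of the candidate names per module with and without source files allowed.
The candidates() function and its bytecode_only flag are hypothetical, and
the real importer also probes extension modules and uses its own ordering.

```python
def candidates(name, bytecode_only=False):
    """Return the relative paths a simplified importer would probe for name."""
    suffixes = [".pyc"] if bytecode_only else [".py", ".pyc"]
    paths = []
    for suffix in suffixes:
        paths.append(name + suffix)                 # plain module file
        paths.append(name + "/__init__" + suffix)   # package directory
    return paths


if __name__ == "__main__":
    print(candidates("foo"))
    print(candidates("foo", bytecode_only=True))
```

In this toy model the bytecode-only mode halves the number of names probed
per module per directory on the search path.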

# Startup time for bzr (cold cache):
$ drop-caches
$ time bzr --no-plugins revno
5061

real    0m8.875s
user    0m0.210s
sys     0m0.140s

# Hot cache
$ time bzr --no-plugins revno
5061

real    0m0.307s
user    0m0.250s
sys     0m0.040s


(revno is a small command that reads a small amount of data - just
enough to trigger demand loading of the core repository layers and so
on).
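
As a quick bit of arithmetic on the two runs above: subtracting user and
sys time from real time gives the wall-clock time not accounted for by CPU
work, which in the cold case is presumably mostly waiting on the disk. The
unaccounted() helper is just for illustration.

```python
def unaccounted(real, user, system):
    """Wall-clock seconds not spent executing on the CPU."""
    return real - (user + system)


# Figures copied from the two `time` outputs above.
cold = unaccounted(real=8.875, user=0.210, system=0.140)
hot = unaccounted(real=0.307, user=0.250, system=0.040)

print("cold: %.3fs, hot: %.3fs" % (cold, hot))
```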

strace timings for those two operations:
cold cache:
$ strace -c bzr --no-plugins revno
5061
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.34    0.040000          76       527           read
 28.98    0.020573           9      2273      1905 open
 14.43    0.010248          14       734       625 stat
  0.15    0.000107           0       533           fstat
...

hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
 19.49    0.000159           0       527           read
 16.91    0.000138           1       163           munmap
 10.05    0.000082           2        54           mprotect
  8.46    0.000069           0      2273      1905 open
  0.00    0.000000           0         8           write
  0.00    0.000000           0       367           close
  0.00    0.000000           0       734       625 stat
...
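
The calls and errors columns are identical in both runs, so the share of
failed probes can be worked out directly from them. A small sketch, with
the counts copied from the tables above (failure_rate() is an illustrative
helper, nothing from bzr or strace):

```python
# (calls, errors) from the strace -c summaries; same in cold and hot runs.
COUNTS = {"open": (2273, 1905), "stat": (734, 625)}


def failure_rate(calls, errors):
    """Fraction of calls that returned an error."""
    return errors / calls


for name, (calls, errors) in COUNTS.items():
    rate = failure_rate(calls, errors)
    print("%s: %d of %d failed (%.1f%%)" % (name, errors, calls, rate * 100))
```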

Cheers,
Rob