Optimizing tips for os.listdir

G. S. Hayes sjdevnull at yahoo.com
Wed Sep 29 22:49:47 CEST 2004


sjdevnull at yahoo.com (G. S. Hayes) wrote in message news:<96c2e938.0409271432.23a2b877 at posting.google.com>...
> Nick Craig-Wood <nick at craig-wood.com> wrote in message news:<slrnclg6vb.rjj.nick at irishsea.home.craig-wood.com>...
> > Under a unix based OS the above will translate to 1
> > opendir()/readdir()/closedir() and 1 stat() for each file.  There
> > isn't a quicker way in terms of system calls AFAIK.
> 
> Under Linux, readdir() returns a struct dirent that has a d_type
> member indicating the file type (DT_DIR for directories) so you can
> avoid calling stat() on each file.  I thought some BSD systems did
> this as well.

Off-topic, since it's not really Python-related (though I guess Python
might want to consider exposing this functionality in a portable way
eventually):

As a quick followup, with 10000 files on my machine it takes about
twice as long to use stat() to get this information as to read the
d_type field.  The stat() version also costs an extra 10000 syscalls
(the d_type version makes about 93 syscalls in total, mostly standard
program startup/shutdown costs such as mapping in shared libs and
flushing output on exit).
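
For reference, the two approaches can be sketched in Python roughly as
follows.  This is only an illustration, not the test program measured
above: it assumes Python 3.6 or later, where os.scandir() (added in
3.5, well after this thread) can be used as a context manager, and the
function names are made up for the example.  DirEntry.is_dir() can
usually answer from the type that readdir() reported, and falls back
to a stat-style call when the filesystem does not supply one.

```python
import os
import stat


def subdirs_via_stat(path):
    """List subdirectories with one lstat() call per directory entry."""
    result = []
    for name in os.listdir(path):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            result.append(name)
    return sorted(result)


def subdirs_via_scandir(path):
    """List subdirectories using the entry type cached by os.scandir().

    follow_symlinks=False mirrors lstat(): a symlink that points at a
    directory is not counted as a directory.
    """
    with os.scandir(path) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
        )
```

Both functions should return the same list for a given directory, so
timing them against each other (for example with the timeit module) on
a large directory is one way to check how much the per-file stat()
costs on a particular system.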

On the other hand, they both execute in under a second.  So for most
programs the difference in speed is probably negligible, and the
programming cost of portably choosing which method you want to use
probably isn't worth it in general (I could maybe see it for
specialized applications).


