os.path.walk (was: Re: Optimizing code)

François Pinard pinard at iro.umontreal.ca
Fri Feb 25 08:15:54 EST 2000


Gerrit Holl <gerrit.holl at pobox.com> writes:
> <quote name="Harald Hanche-Olsen" date="951408593">

> > [...] it calls stat() three times on every regular file in the tree:
> > First, in os.path.isfile, second, in os.path.getsize, and third, in
> > os.path.walk, which needs to find out if a filename corresponds to a
> > directory or not.

> I see.

This has bothered me several times already in Python.  Perl has a device
for caching the last `stat' result, quite easy to use, which lets users
optimise precisely such cases.  In many cases, the user has no reason
to think the file system changed enough, recently, to be worth calling
`stat' again.  Of course, one may call `stat' in one's own code and use
the resulting info block, and this is what I do.  But that does not
interface well with os.path.walk.

It would be nice if the Python library maintained a little cache for
`stat', and if there were a way for users to interface with it as wanted.
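
To make the idea concrete, here is a minimal sketch of such a cache,
written for a current Python 3.  The class name `StatCache' and its
methods are hypothetical and not part of the Python library; the sketch
only wraps os.stat and leaves it to the caller to decide when a cached
entry is stale.

```python
import os
import stat


class StatCache:
    """Hypothetical helper: remember os.stat results per path."""

    def __init__(self):
        self._cache = {}

    def stat(self, path):
        # Call os.stat only the first time a path is seen.
        try:
            return self._cache[path]
        except KeyError:
            result = self._cache[path] = os.stat(path)
            return result

    # Unlike os.path.isfile and os.path.isdir, these raise OSError
    # for a path that does not exist.
    def isfile(self, path):
        return stat.S_ISREG(self.stat(path).st_mode)

    def isdir(self, path):
        return stat.S_ISDIR(self.stat(path).st_mode)

    def getsize(self, path):
        return self.stat(path).st_size

    def forget(self, path=None):
        # Drop one entry, or the whole cache when no path is given.
        if path is None:
            self._cache.clear()
        else:
            self._cache.pop(path, None)
```

With this, isfile() followed by getsize() on the same path costs a single
`stat' call, and forget() is there for when the user does have a reason
to think the file system changed.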

By the way, the `find' program has some optimisations to avoid calling
`stat', which yield a very significant speed-up on Unix (I do not know
whether these optimisations can be translated to other OS-es, however).
Could os.path.walk use them, if not already?  The main trick is to save
the number of links on the `.' entry, knowing that we have one link
from the containing directory, one link for the `.' entry itself, and one
`..' link per sub-directory.  Each time we `stat' a sub-directory in the
current one, we decrease the saved count.  When it reaches 2, we know
that no directories remain, and so can spare all remaining `stat' calls.
It even works for the root directory, because the fact that there is no
containing directory is compensated by the fact that `..' points to `/'.
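
The link-count trick can be sketched roughly as follows, again for a
current Python 3.  The function name `subdirs_by_link_count' and its
`lstat' parameter are hypothetical, and this is not how `find' or
os.path.walk is actually written.  It assumes the traditional Unix
convention for directory link counts; some file systems report other
values, so a count below 2 is treated as "unknown" and every entry is
checked.

```python
import os
import stat


def subdirs_by_link_count(path, lstat=os.lstat):
    """Return names of sub-directories of `path`, stopping the lstat
    calls as soon as the link count says no directories remain."""
    links = lstat(path).st_nlink
    # A count below 2 means the file system does not follow the
    # traditional convention, so the shortcut cannot be trusted.
    trust_links = links >= 2
    found = []
    for name in os.listdir(path):
        if trust_links and links == 2:
            break  # only `.' and the parent's entry are left
        mode = lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            found.append(name)
            links -= 1  # one `..' link accounted for
    return found
```

How many calls are saved depends on the order in which os.listdir
returns the entries: the fewer plain files precede the last
sub-directory, the more calls are spared.  On a file system whose link
counts are at least 2 but do not follow the convention, the result could
be wrong, so real code would need a stronger check.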

I use `os.path.walk' often in my own things, so any speed improvement
in that area would be welcome.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





